The following are sample outputs of the pfilestat tool for various scenarios.
Starting with something simple,
Running: dd if=/dev/rdsk/c0d0s0 of=/dev/null bs=56k # x86, 32-bit
# ./pfilestat `pgrep -x dd`
     STATE   FDNUM      Time Filename
      read       3        2% /devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
     write       4        3% /devices/pseudo/mm@0:null
   waitcpu       0        7%
   running       0       16%
   sleep-r       0       69%

     STATE   FDNUM      KB/s Filename
     write       4     53479 /devices/pseudo/mm@0:null
      read       3     53479 /devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0

Total event time (ms): 4999   Total Mbytes/sec: 104
Most of the time we are sleeping on reads ("sleep-r"), which is to be expected:
dd on the raw device takes a simple code path -> read:entry, strategy, biodone,
read:return. The CPU time spent in read() itself is small.
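As a cross-check (not part of pfilestat), a DTrace one-liner along these lines
shows the latency distribution of the read() calls, which for the raw device
should be dominated by the disk I/O wait. The pgrep target is an assumption and
may need adjusting if more than one dd is running:

# dtrace -n 'syscall::read:entry /pid == $target/ { self->ts = timestamp; }
    syscall::read:return /self->ts/ {
        /* latency from read entry to return, in nanoseconds */
        @["read(2) latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }' -p `pgrep -x dd`

Press Ctrl-C to print the distribution.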
Now for the dsk device,
Running: dd if=/dev/dsk/c0d0s0 of=/dev/null bs=56k # x86, 32-bit
# ./pfilestat `pgrep -x dd`
     STATE   FDNUM      Time Filename
     write       4        5% /devices/pseudo/mm@0:null
   waitcpu       0        8%
   running       0       15%
   sleep-r       0       18%
      read       3       53% /devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0

     STATE   FDNUM      KB/s Filename
      read       3     53492 /devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
     write       4     53492 /devices/pseudo/mm@0:null

Total event time (ms): 4914   Total Mbytes/sec: 102
Whoa, we are now spending much more time in read()! I imagine segmap is a
busy bee, copying data through the file system cache. The "running" and
"write" times are hardly different from the raw device run.
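To see where that extra read() CPU time is going, one approach (a sketch,
separate from pfilestat) is to sample the kernel stacks of dd and check
whether segmap routines dominate:

# dtrace -n 'profile-997 /execname == "dd" && arg0/ { @[stack()] = count(); }
    tick-10sec { trunc(@, 5); exit(0); }'

For profile probes arg0 is the sampled kernel PC, so the predicate keeps only
samples taken in kernel context; after ten seconds the five hottest kernel
stacks are printed.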
Now for a SPARC demo of the same,
Running: dd if=/dev/dsk/c0d0s0 of=/dev/null bs=56k # SPARC, 64-bit
# ./pfilestat `pgrep -x dd`
     STATE   FDNUM      Time Filename
     write       4        3% /devices/pseudo/mm@0:zero
   waitcpu       0        7%
   running       0       17%
      read       3       24% /devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:a
   sleep-r       0       54%

     STATE   FDNUM      KB/s Filename
      read       3     13594 /devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:a
     write       4     13606 /devices/pseudo/mm@0:zero

Total event time (ms): 4741   Total Mbytes/sec: 25
I did prime the cache by running this a few times first. There is less
read() time than in the x86 32-bit demo; my guess is that the process more
often exhausts the (faster) segmap cache and reaches the point where it
must sleep. (However, take this comparison with a grain of salt - my
development servers aren't ideal for comparing statistics: one is an
867 MHz Pentium, the other a 360 MHz Ultra 5.)
The file system cache is faster on 64-bit systems due to the segkpm
enhancement in Solaris 10. For details see,
http://blogs.sun.com/roller/page/rmc?entry=solaris_10_fast_filesystem_cache
Now, back to x86.
Running: tar cf /dev/null /
# ./pfilestat `pgrep -x tar`
     STATE   FDNUM      Time Filename
      read      11        0% /extra1/test/amd64/libCstd.so.1
      read      11        0% /extra1/test/amd64/libXm.so
      read      11        0% /extra1/test/amd64/libXm.so.4
      read      11        1% /extra1/test/amd64/libgtk-x11-2.0.so
      read      11        2% /extra1/test/amd64/libgtk-x11-2.0.so.0
   waitcpu       0        2%
      read       9        4% /extra1/5000
     write       3        7% /devices/pseudo/mm@0:null
   running       0       19%
   sleep-r       0       46%

     STATE   FDNUM      KB/s Filename
      read      11       293 /extra1/test/amd64/libgdk-x11-2.0.so
      read      11       295 /extra1/test/amd64/libgdk-x11-2.0.so.0
      read       9       476 /extra1/1000
      read      11       526 /extra1/test/amd64/libCstd.so.1
      read      11       594 /extra1/test/amd64/libXm.so.4
      read      11       594 /extra1/test/amd64/libXm.so
      read      11      1603 /extra1/test/amd64/libgtk-x11-2.0.so.0
      read      11      1606 /extra1/test/amd64/libgtk-x11-2.0.so
      read       9      4078 /extra1/5000
     write       3     21254 /devices/pseudo/mm@0:null

Total event time (ms): 4903   Total Mbytes/sec: 41
Fair enough. tar is cruising along at 21 Mbytes/sec (writes to fd 3!).
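That per-fd throughput can be cross-checked with a sketch like the following
(it sums the bytes requested by each write(), a close approximation here, and
uses the same fds[] array that pfilestat uses to resolve filenames):

# dtrace -n 'syscall::write:entry /pid == $target/ {
        @[arg0, fds[arg0].fi_pathname] = sum(arg2);
    }
    tick-5sec {
        /* bytes over 5 seconds -> KB/s */
        normalize(@, 5 * 1024);
        printa("fd %-4d %-40s %@d KB/s\n", @);
        trunc(@);
    }' -p `pgrep -x tar`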
More interesting is to do the following,
Running: tar cf - / | gzip > /dev/null
# ./pfilestat `pgrep -x tar`
     STATE   FDNUM      Time Filename
      read      11        0% /extra1/test/amd64/libm.so
      read      11        0% /extra1/test/amd64/libm.so.2
      read      11        0% /extra1/test/amd64/libnsl.so
      read      11        0% /extra1/test/amd64/libnsl.so.1
      read      11        0% /extra1/test/amd64/libc.so.1
     write       3        2% <none>
   waitcpu       0        4%
   sleep-r       0        4%
   running       0        6%
   sleep-w       0       78%

     STATE   FDNUM      KB/s Filename
      read      11        74 /extra1/test/amd64/libldap.so
      read      11        75 /extra1/test/amd64/libldap.so.5
      read      11        75 /extra1/test/amd64/libresolv.so.2
      read      11        76 /extra1/test/amd64/libresolv.so
      read      11        97 /extra1/test/amd64/libm.so.2
      read      11        98 /extra1/test/amd64/libm.so
      read      11       174 /extra1/test/amd64/libnsl.so
      read      11       176 /extra1/test/amd64/libnsl.so.1
      read      11       216 /extra1/test/amd64/libc.so.1
     write       3      3022 <none>

Total event time (ms): 4911   Total Mbytes/sec: 6
Whoa now, tar is writing 3 Mbytes/sec - AND spending 78% of its time on
sleep-w, sleeping on writes! Of course, this is because we are piping the
output to gzip, which is spending a while compressing the data. The 78%
matches the time gzip was on the CPU (measured with either "prstat -m" or
dtrace; procfs's pr_pctcpu would take too long to catch up).
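For reference, a sched-provider sketch like this sums gzip's on-CPU time per
interval (prstat -m is the easier route; the 5-second tick simply matches
pfilestat's interval):

# dtrace -n 'sched:::on-cpu /execname == "gzip"/ { self->ts = timestamp; }
    sched:::off-cpu /self->ts/ {
        @["gzip on-CPU (ns)"] = sum(timestamp - self->ts);
        self->ts = 0;
    }
    tick-5sec { printa(@); trunc(@); }'

Around 3.9 of every 5 seconds on-CPU would line up with the 78% seen above.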
Also interesting is,
Running: perl -e 'while (1) {;}' &
Running: perl -e 'while (1) {;}' &
Running: perl -e 'while (1) {;}' &
Running: perl -e 'while (1) {;}' &
Running: tar cf /dev/null /
# ./pfilestat `pgrep -x tar`
     STATE   FDNUM      Time Filename
      read      11        0% /extra1/test/amd64/libxml2.so.2
      read      11        0% /extra1/test/amd64/libgdk-x11-2.0.so.0
      read      11        0% /extra1/test/amd64/libgdk-x11-2.0.so
      read      11        0% /extra1/test/amd64/libCstd.so.1
      read      11        0% /extra1/test/amd64/libgtk-x11-2.0.so.0
      read      11        2% /extra1/test/amd64/libgtk-x11-2.0.so
     write       3        2% /devices/pseudo/mm@0:null
   running       0        8%
   sleep-r       0       22%
   waitcpu       0       65%

     STATE   FDNUM      KB/s Filename
      read      11       182 /extra1/test/amd64/libsun_fc.so
      read      11       264 /extra1/test/amd64/libglib-2.0.so
      read      11       266 /extra1/test/amd64/libglib-2.0.so.0
      read      11       280 /extra1/test/amd64/libxml2.so.2
      read      11       293 /extra1/test/amd64/libgdk-x11-2.0.so
      read      11       295 /extra1/test/amd64/libgdk-x11-2.0.so.0
      read      11       526 /extra1/test/amd64/libCstd.so.1
      read      11       761 /extra1/test/amd64/libgtk-x11-2.0.so.0
      read      11      1606 /extra1/test/amd64/libgtk-x11-2.0.so
     write       3      7881 /devices/pseudo/mm@0:null

Total event time (ms): 4596   Total Mbytes/sec: 13
Now "waitcpu" dominates, as tar competes for CPU cycles with the greedy
infinite-loop perl processes.
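The "waitcpu" figure can be cross-checked with the sched provider's
enqueue/dequeue probes, timing how long tar's threads sit on a run queue
before they get a CPU (a sketch using the standard run queue latency idiom):

# dtrace -n 'sched:::enqueue /args[1]->pr_fname == "tar"/ {
        start[args[0]->pr_lwpid, args[1]->pr_pid] = timestamp;
    }
    sched:::dequeue /start[args[0]->pr_lwpid, args[1]->pr_pid]/ {
        /* time spent runnable but waiting for a CPU */
        @["tar run queue wait (ns)"] =
            sum(timestamp - start[args[0]->pr_lwpid, args[1]->pr_pid]);
        start[args[0]->pr_lwpid, args[1]->pr_pid] = 0;
    }
    tick-5sec { printa(@); trunc(@); }'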