The following are sample outputs of the pfilestat tool for various scenarios.



Starting with something simple,

Running: dd if=/dev/rdsk/c0d0s0 of=/dev/null bs=56k	# x86, 32-bit

   # ./pfilestat `pgrep -x dd`
   
        STATE   FDNUM      Time Filename
         read       3        2% /devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
        write       4        3% /devices/pseudo/mm@0:null
      waitcpu       0        7%
      running       0       16%
      sleep-r       0       69%
   
        STATE   FDNUM      KB/s Filename
        write       4     53479 /devices/pseudo/mm@0:null
         read       3     53479 /devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
   
   Total event time (ms): 4999   Total Mbytes/sec: 104

Most of the time we are sleeping on read, which is to be expected, as dd on
the raw device follows a simple path: read:entry, strategy, biodone, read:return.
The CPU time in read() itself is small.
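
To confirm this, a rough sketch like the following (not part of pfilestat;
the "dd" match and 5-second window are just assumptions for this demo)
compares elapsed time with on-CPU time inside read():

   # dtrace -n '
       syscall::read:entry /execname == "dd"/
       {
               self->ts = timestamp;       /* wall-clock start */
               self->vts = vtimestamp;     /* on-CPU start */
       }
       syscall::read:return /self->ts/
       {
               @["elapsed in read() (ns)"] = sum(timestamp - self->ts);
               @["on-CPU in read() (ns)"]  = sum(vtimestamp - self->vts);
               self->ts = 0;
               self->vts = 0;
       }
       tick-5s { exit(0); }'

For the raw device the elapsed total should dwarf the on-CPU total, since
most of the time is spent blocked waiting for the disk I/O to complete.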



Now for the dsk device,

Running: dd if=/dev/dsk/c0d0s0 of=/dev/null bs=56k	# x86, 32-bit

   # ./pfilestat `pgrep -x dd`
   
        STATE   FDNUM      Time Filename
        write       4        5% /devices/pseudo/mm@0:null
      waitcpu       0        8%
      running       0       15%
      sleep-r       0       18%
         read       3       53% /devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
   
        STATE   FDNUM      KB/s Filename
         read       3     53492 /devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0
        write       4     53492 /devices/pseudo/mm@0:null
   
   Total event time (ms): 4914   Total Mbytes/sec: 102

Whoa, we are now spending much more time in read()! I imagine segmap is
a busy bee, copying data through the file system cache. The "running" and
"write" times are hardly different.
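
To check that guess, a quick kernel profile of dd shows where that read()
CPU time goes; this one-liner (not part of the toolkit, 5-second window
assumed) samples the kernel PC at 997 Hz and keeps the ten hottest
functions, which should include segmap and UFS routines if the guess is right:

   # dtrace -n '
       profile-997 /execname == "dd" && arg0/ { @[func(arg0)] = count(); }
       tick-5s { trunc(@, 10); exit(0); }'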



Now for a SPARC demo of the same,

Running: dd if=/dev/dsk/c0d0s0 of=/dev/null bs=56k	# SPARC, 64-bit

   # ./pfilestat `pgrep -x dd`

        STATE   FDNUM      Time Filename
        write       4        3% /devices/pseudo/mm@0:zero
      waitcpu       0        7%
      running       0       17%
         read       3       24% /devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:a
      sleep-r       0       54%
   
        STATE   FDNUM      KB/s Filename
         read       3     13594 /devices/pci@1f,0/pci@1,1/ide@3/dad@0,0:a
        write       4     13606 /devices/pseudo/mm@0:zero
   
   Total event time (ms): 4741   Total Mbytes/sec: 25

I did prime the cache by running this a few times first. There is less
read() time than in the x86 32-bit demo; I would guess that the process is
more often exhausting the (faster) segmap cache and reaching the point
where it must sleep. (However, take this comparison with a grain of
salt - my development servers aren't ideal for comparing statistics: one
is an 867 MHz Pentium, the other a 360 MHz Ultra 5.)

The file system cache is faster on 64-bit systems due to the segkpm
enhancement in Solaris 10. For details see,
http://blogs.sun.com/roller/page/rmc?entry=solaris_10_fast_filesystem_cache



Now, back to x86.

Running: tar cf /dev/null /

   # ./pfilestat `pgrep -x tar`
   
        STATE   FDNUM      Time Filename
         read      11        0% /extra1/test/amd64/libCstd.so.1
         read      11        0% /extra1/test/amd64/libXm.so
         read      11        0% /extra1/test/amd64/libXm.so.4
         read      11        1% /extra1/test/amd64/libgtk-x11-2.0.so
         read      11        2% /extra1/test/amd64/libgtk-x11-2.0.so.0
      waitcpu       0        2%
         read       9        4% /extra1/5000
        write       3        7% /devices/pseudo/mm@0:null
      running       0       19%
      sleep-r       0       46%
   
        STATE   FDNUM      KB/s Filename
         read      11       293 /extra1/test/amd64/libgdk-x11-2.0.so
         read      11       295 /extra1/test/amd64/libgdk-x11-2.0.so.0
         read       9       476 /extra1/1000
         read      11       526 /extra1/test/amd64/libCstd.so.1
         read      11       594 /extra1/test/amd64/libXm.so.4
         read      11       594 /extra1/test/amd64/libXm.so
         read      11      1603 /extra1/test/amd64/libgtk-x11-2.0.so.0
         read      11      1606 /extra1/test/amd64/libgtk-x11-2.0.so
         read       9      4078 /extra1/5000
        write       3     21254 /devices/pseudo/mm@0:null
   
   Total event time (ms): 4903   Total Mbytes/sec: 41

Fair enough. tar is cruising along at 21 Mbytes/sec (writes to fd 3!).
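
To double-check which file fd 3 is, pfiles lists every open file descriptor
of the process along with its path (on Solaris 10):

   # pfiles `pgrep -x tar`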



More interesting is to do the following,

Running: tar cf - / | gzip > /dev/null

   # ./pfilestat `pgrep -x tar`
   
        STATE   FDNUM      Time Filename
         read      11        0% /extra1/test/amd64/libm.so
         read      11        0% /extra1/test/amd64/libm.so.2
         read      11        0% /extra1/test/amd64/libnsl.so
         read      11        0% /extra1/test/amd64/libnsl.so.1
         read      11        0% /extra1/test/amd64/libc.so.1
        write       3        2% <none>
      waitcpu       0        4%
      sleep-r       0        4%
      running       0        6%
      sleep-w       0       78%
   
        STATE   FDNUM      KB/s Filename
         read      11        74 /extra1/test/amd64/libldap.so
         read      11        75 /extra1/test/amd64/libldap.so.5
         read      11        75 /extra1/test/amd64/libresolv.so.2
         read      11        76 /extra1/test/amd64/libresolv.so
         read      11        97 /extra1/test/amd64/libm.so.2
         read      11        98 /extra1/test/amd64/libm.so
         read      11       174 /extra1/test/amd64/libnsl.so
         read      11       176 /extra1/test/amd64/libnsl.so.1
         read      11       216 /extra1/test/amd64/libc.so.1
        write       3      3022 <none>
   
   Total event time (ms): 4911   Total Mbytes/sec: 6

Whoa now, tar is writing 3 Mbytes/sec - AND spending 78% of its time in
sleep-w, sleeping on writes! Of course, this is because we are piping the
output to gzip, which spends a while compressing the data. The 78%
matches the time gzip was on the CPU (measured with either "prstat -m" or
dtrace; procfs's pr_pctcpu would take too long to catch up).
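
For reference, one way to measure gzip's CPU time with dtrace is to sum its
on-CPU intervals over the same 5-second window; a sketch (not part of the
toolkit):

   # dtrace -n '
       sched:::on-cpu /execname == "gzip"/ { self->ts = timestamp; }
       sched:::off-cpu /self->ts/
       {
               @["gzip on-CPU (ms)"] = sum(timestamp - self->ts);
               self->ts = 0;
       }
       tick-5s { normalize(@, 1000000); exit(0); }'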




Also interesting is,

Running: perl -e 'while (1) {;}' &
Running: perl -e 'while (1) {;}' &
Running: perl -e 'while (1) {;}' &
Running: perl -e 'while (1) {;}' &
Running: tar cf /dev/null /

   # ./pfilestat `pgrep -x tar`

        STATE   FDNUM      Time Filename
         read      11        0% /extra1/test/amd64/libxml2.so.2
         read      11        0% /extra1/test/amd64/libgdk-x11-2.0.so.0
         read      11        0% /extra1/test/amd64/libgdk-x11-2.0.so
         read      11        0% /extra1/test/amd64/libCstd.so.1
         read      11        0% /extra1/test/amd64/libgtk-x11-2.0.so.0
         read      11        2% /extra1/test/amd64/libgtk-x11-2.0.so
        write       3        2% /devices/pseudo/mm@0:null
      running       0        8%
      sleep-r       0       22%
      waitcpu       0       65%
   
        STATE   FDNUM      KB/s Filename
         read      11       182 /extra1/test/amd64/libsun_fc.so
         read      11       264 /extra1/test/amd64/libglib-2.0.so
         read      11       266 /extra1/test/amd64/libglib-2.0.so.0
         read      11       280 /extra1/test/amd64/libxml2.so.2
         read      11       293 /extra1/test/amd64/libgdk-x11-2.0.so
         read      11       295 /extra1/test/amd64/libgdk-x11-2.0.so.0
         read      11       526 /extra1/test/amd64/libCstd.so.1
         read      11       761 /extra1/test/amd64/libgtk-x11-2.0.so.0
         read      11      1606 /extra1/test/amd64/libgtk-x11-2.0.so
        write       3      7881 /devices/pseudo/mm@0:null
   
   Total event time (ms): 4596   Total Mbytes/sec: 13

Now we have "waitcpu" as tar competes for CPU cycles along with the greedy
infinite perl processes.
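
prstat's microstate accounting shows this contention from tar's point of
view: the LAT column is the percentage of time the thread spent waiting
for a CPU, and it should be high here. For example (5-second interval,
one sample):

   # prstat -mL -p `pgrep -x tar` 5 1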