Training courses

Kernel and Embedded Linux

Bootlin training courses

Embedded Linux, kernel,
Yocto Project, Buildroot, real-time,
graphics, boot time, debugging...

Bootlin logo

Elixir Cross Referencer

The following is a demonstration of the threaded.d script,


Here we run a test program called "cputhread" that creates 4 busy threads
that run at the same time. Here we run it on a server with only 1 CPU,

   # threaded.d
   
   2005 Jul 26 02:56:37,
   
     PID: 8516     CMD: cputhread
   
              value  ------------- Distribution ------------- count
                  1 |                                         0
                  2 |@@@@@@@                                  17
                  3 |@@@@@@@@@@@                              28
                  4 |@@@@@@@@@@@                              27
                  5 |@@@@@@@@@@@                              28
                  6 |                                         0
   [...]

In the above output, we can see that cputhread has four busy threads with 
thread IDs 2, 3, 4 and 5. We are sampling at 100 Hertz, and have caught 
each of these threads on the CPU between 17 and 28 times.

Since the above counts add to 100, this is either a sign of a single CPU
server (which it is), or a sign that a multithreaded application may be
running "serialised" - only 1 thread at a time. Compare the above output
to a multi CPU server,



Here we run the same test program on a server with 4 CPUs,

   # threaded.d
   
   2005 Jul 26 02:48:44,
   
     PID: 5218     CMD: cputhread
   
              value  ------------- Distribution ------------- count
                  1 |                                         0
                  2 |@@@@@@@@@@@                              80
                  3 |@@@@@@@@@@                               72
                  4 |@@@@@@@@@                                64
                  5 |@@@@@@@@@@@                              78
                  6 |                                         0
   [...]

This time the counts add to equal 294, so this program is definitely
running on multiple CPUs at the same time, otherwise this total would
not be beyond our sample rate of 100. The distribution of threads on CPU
is fairly even, and the above serves as an example of a multithreaded
application performing well.



Now we run a test program called "cpuserial", which also create 4 busy
threads, however due to a coding problem (poor use of mutex locks) they 
only run one at a time,

   # threaded.d

   2005 Jul 26 03:07:21,
   
     PID: 5238     CMD: cpuserial
   
              value  ------------- Distribution ------------- count
                  2 |                                         0
                  3 |@@@@@@@@@@@@                             30
                  4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             70
                  5 |                                         0
   [...]

The above looks like we are back on a single CPU server with 100 samples
in total, however we are still on our 4 CPU server. Only two threads have
run, and the above distribution is a good indication that they have
run serialised.



Now more of a fringe case. This version of cpuserial again creates 4 threads
that are all busy and hungry for the CPU, and again we run it on a 4 CPU
server,

   # threaded.d
   
   2005 Jul 26 03:25:45,
   
     PID: 5280     CMD: cpuserial

              value  ------------- Distribution ------------- count
                  1 |                                         0
                  2 |@@@@@@@@@@@@@@@                          42
                  3 |@@@@@@@@@@@@@@@@@@                       50
                  4 |@@@@@@                                   15
                  5 |@                                        2
                  6 |                                         0
   [...]

So all threads are running, good. And with a total of 109, at some point
more than one thread was running at the same time (otherwise this would
not have exceeded 100, bearing in mind a sample rate of 100 Hertz). However,
what is not so good is that with 4 CPUs we have only scored 109 samples - 
since all threads are CPU hungry we'd hope that more often they could
run across the CPUs simultaneously; however this wasn't the case. Again,
this fault was created by poor use of mutex locks.