Mesure times in system calls with dtrace
drace syscall times
Sometimes processes are slow without consuming much ressources. In such cases you can try to use truss (or strace on some systems) to find out what the process is waiting for, however, in multithreaded processes this might not help you that much. Also tracing the process from userland will slow down the process further what might not be an option in production einvironments.
Using dtrace will not only be less intrusive to your application flow, it can also summarize the times for you. The following script will not only summarize the times spending in system calls (/opt/DTT/procsystime would do), it will also print you a line for each system call when it retruns
#!/usr/sbin/dtrace -s #pragma D option quiet syscall:::entry / zonename == "myzone" && execname == "tntnet" / { self->t=timestamp; } syscall:::return / zonename == "myzone" && self->t!=0 && execname == "tntnet" / { printf("%d/%d spent %d nsecs in %s\n", pid, tid, timestamp - self->t, probefunc); @function[probefunc] = sum (timestamp - self->t); }
For these who are familiar with awk, the syntax will not surprise you, in / / there are filters on which the code blocks apply, self is a special variable you can use to ensure coherency between threads.
... 14611/52 spent 3415416747 nsecs in pollsys 14611/52 spent 8583 nsecs in gtime 14611/52 spent 5333 nsecs in gtime 14611/52 spent 21666 nsecs in lwp_park 14611/52 spent 2798833 nsecs in nanosleep 14611/52 spent 20010167 nsecs in pollsys 14611/52 spent 18000 nsecs in read 14611/52 spent 9864917 nsecs in nanosleep 14611/52 spent 5417 nsecs in gtime 14611/53 spent 10008364573 nsecs in lwp_park 14611/53 spent 11500 nsecs in gtime 14611/53 spent 5500 nsecs in gtime ^C lseek 29583 fcntl 34666 close 45083 accept 57417 getpid 133751 open 134834 stat 248915 gtime 282585 read 1725499 write 2817832 nanosleep 80435583 pollsys 5699637580 lwp_park 35622920131
You see, that the process is spending the most time on beeing idle and waiting for requests, but tntnet is actually not a really good example since it doesn't appreas to be slow.