[Softdevice-devel] Profiling tools

Martin Wache M.Wache at gmx.net
Tue Apr 5 22:35:14 CEST 2005


Marko Mäkelä wrote:
> On Tue, Apr 05, 2005 at 06:11:32PM +0200, Martin Wache wrote:
> 
>>>Finally, here is the list for libvdr-softdevice:
>>>
>>>samples  %        symbol name
>>>2986     26.1219  cVideoStreamDecoder::DecodePacket(AVPacket*)
>>>1320     11.5475  cStreamDecoder::Action()
>>
>>Actually this is quite interesting. Action() should do nothing except
>>wait for new packets and call DecodePacket() for each packet. I would
>>not expect it to use so much time. Sleeping and the call to DecodePacket
>>should not count, so where does it spend the time? Maybe the mutex... I
>>guess playing around in this function may give some improvements.
> 
> 
> The opannotate command will show the samples for individual machine
> instructions.
> 
> 
>>I never used oprofile before, only gprof, so I don't know exactly what
>>these numbers mean. What exactly is a sample? I assume it is the overall
>>CPU time spent in a function? Is there a statistic like number of calls
>>and time consumed per call?
> 
> 
> Please have a look at http://oprofile.sourceforge.net/.  There are some
> performance counters in every modern processor, and OProfile makes use of
> them.  When you start it, you specify the counter(s) to use and a sampling
> interval (the lower the number, the more frequently the samples will be
> taken).  An NMI interrupt handler saves the samples in a buffer, and the
> oprofile daemon collects the statistics and writes them out to a file.  This
> causes a few percent overhead, and no instrumentation of the running programs
> is necessary.  You can even profile kernel code, and you can start and stop
> collecting samples any time you want.
> 
> Yes, the more samples, the more resources are consumed at that location.
> CPU_CLK_UNHALTED measures CPU consumption.  I've also profiled some code
> (not VDR) using DCU_MISS_OUTSTANDING, which measures data cache misses
> (useful if you want to fine-tune things with __builtin_prefetch()).
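To illustrate the __builtin_prefetch() remark: a minimal sketch, assuming GCC or Clang and an indexed lookup whose access pattern the hardware prefetcher cannot guess. The function name SumGather and the look-ahead distance kAhead are made up for this example and would need tuning against real cache-miss numbers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example, not softdevice code: sums table[index[i]] for all i.
// The indirect access is the kind of pattern where data cache misses show up.
long SumGather(const int *table, const std::size_t *index, std::size_t n) {
    assert((table != nullptr && index != nullptr) || n == 0);

    // Assumed look-ahead distance; the right value depends on the machine.
    const std::size_t kAhead = 8;

    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kAhead < n) {
            // Hint only: request the element needed kAhead iterations from
            // now. Arguments: address, 0 = read access, 1 = low temporal
            // locality. The result of the loop does not depend on the hint.
            __builtin_prefetch(&table[index[i + kAhead]], 0, 1);
        }
        sum += table[index[i]];
    }
    return sum;
}
```

Whether this helps at all can only be told by profiling before and after; on short or cache-resident tables the extra instruction is more likely to cost than to gain.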
> 
> 
>>You can disable USE_SUBPLUGINS in the softdevice makefile and you will 
>>get at least one library for the softdevice.
> 
> 
> Thanks.  I also see a dlopen() call in plugin.c.
> 
> 
>>>If you want to play with oprofile yourself, be sure to enable the following
>>
>>I guess if I find the time I will try gprof. I know it much better and 
>>we can compare the results...
> 
> 
> Of course, this depends on what you mean by "better".  Oh, sorry, I misread
> you: you didn't say "I know it's much better". :-)
I did not want to say that gprof is better. But I like the idea of having
two tools and being able to compare the results.
> 
> I used gprof years ago, when there was no free software tool like oprofile.
> (I didn't have access to Intel's VTune or DCPI for Digital UNIX.)
> As I understand it, gprof does not work with multi-threaded programs, and
There is a wrapper around the pthread library which helps to get around
this problem.
> I think you will have a hard time using it in a dlopen()ed library.  Well,
> actually the dlopen() might not be a problem, as long as all code is compiled
> with gcc -pg.  For my doctoral thesis work, I wrote a program that generates
> some C code on the fly and executes it.  I can't remember, but it could be
> that it might actually have worked on the dlopen()ed code as well.
I just hacked vdr to link the softdevice statically...

> 
> If you want to get meaningful statistics from gprof, you will have to disable
> inlining, which degrades performance further.  Also, you will get junk from
> the initialization and cleanup phases.
Sure, gprof is not perfect ;-)
> 
> With oprofile, you can choose to collect samples for the interesting part
> only, at a performance penalty of a few percent.  Already a run of 10 seconds
> gives pretty good numbers.  The longer you run it, the more accurate it gets
> due to the statistical nature of oprofile.  I'd let it collect samples for
> a couple of minutes to get a pretty accurate view.  But isn't gprof somehow
> statistical as well?
Yes, that's right, the mechanism to collect the running times is also
statistical (and I guess less precise, since it is not done by the
kernel). But with gprof I get accurate function call accounting.
For instance, the GetReltime function seems to use much time. As long
as I don't know how often it is called, I don't know if a single call
uses too much time or if we just call it too often.
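
Without gprof, the same question (expensive call, or just many calls?) could be answered by counting by hand. A minimal sketch, assuming C++11; CallStats, ScopedTimer and GetRelTimeStandIn are names invented for this example and are not the softdevice code:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: accumulates call count and total time for one function.
// Not thread-safe; a real multi-threaded plugin would need atomics or a lock.
struct CallStats {
    explicit CallStats(const char *n) : name(n), calls(0), total(0) {}

    double AverageNs() const {
        return calls ? static_cast<double>(total.count()) / calls : 0.0;
    }
    void Print() const {
        std::printf("%s: %llu calls, %.1f ns/call\n", name,
                    static_cast<unsigned long long>(calls), AverageNs());
    }

    const char *name;
    std::uint64_t calls;
    std::chrono::nanoseconds total;
};

// Measures the lifetime of one scope and adds it to the given CallStats.
class ScopedTimer {
public:
    explicit ScopedTimer(CallStats &stats)
        : stats_(stats), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(
                std::chrono::steady_clock::now() - start_);
        assert(elapsed.count() >= 0);  // steady_clock never runs backwards
        stats_.total += elapsed;
        ++stats_.calls;
    }
    ScopedTimer(const ScopedTimer &) = delete;
    ScopedTimer &operator=(const ScopedTimer &) = delete;

private:
    CallStats &stats_;
    std::chrono::steady_clock::time_point start_;
};

static CallStats g_relTimeStats("GetRelTimeStandIn");

// Stand-in for a cheap but frequently called timing function.
long long GetRelTimeStandIn() {
    ScopedTimer timer(g_relTimeStats);
    return std::chrono::duration_cast<std::chrono::microseconds>(
               std::chrono::steady_clock::now().time_since_epoch())
        .count();
}
```

Calling g_relTimeStats.Print() at shutdown then shows both numbers side by side: a huge call count with a tiny average points at the callers, a small count with a large average points at the function itself. The timer adds its own overhead, so the averages are only a rough guide.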

> 
> Sure, and please tell me if you got it to work.  I have never tried to use
> gprof on a multi-threaded program.
Yes, I have it running, but I used some really dirty hacks. I changed
plugin.c not to dlopen any plugins, and instead of invoking creator() I
just create a new cPluginSoftdevice. Then I link all softdevice files
statically into vdr, as a dependency. Of course this hacked version can't
use any plugins other than the softdevice...

The results are somewhat similar to yours... I don't want to post a
profile now, since it also contains all vdr functions and is really huge.
And I found the reason for the Action() mystery: it seems that when one
calls usleep() with values smaller than 10000 it waits in a busy loop. I
thought it would just sleep for the smallest possible time... So I will
replace all usleeps < 10000, there are a few...
The GetRelTime function is just called very often and thus appears to 
use too much time.
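
The usleep() observation could be checked directly by comparing wall-clock time with the CPU time the thread burns while "sleeping". A minimal sketch, assuming a POSIX system with clock_gettime(); SleepCost and MeasureUsleep are names invented for this example:

```cpp
#include <cassert>
#include <time.h>
#include <unistd.h>

// Hypothetical result type: elapsed wall-clock time versus CPU time consumed
// by the calling thread over the same interval.
struct SleepCost {
    double wallSec;
    double cpuSec;
};

// Reads the given POSIX clock and returns it as seconds.
static double Now(clockid_t clock) {
    timespec ts;
    const int rc = clock_gettime(clock, &ts);
    assert(rc == 0);
    (void)rc;  // silence the unused warning when built with NDEBUG
    return static_cast<double>(ts.tv_sec) + ts.tv_nsec / 1e9;
}

// Calls usleep(usec) 'iterations' times and reports what it cost.
SleepCost MeasureUsleep(useconds_t usec, int iterations) {
    const double wall0 = Now(CLOCK_MONOTONIC);
    const double cpu0 = Now(CLOCK_THREAD_CPUTIME_ID);
    for (int i = 0; i < iterations; ++i) {
        usleep(usec);
    }
    SleepCost cost;
    cost.wallSec = Now(CLOCK_MONOTONIC) - wall0;
    cost.cpuSec = Now(CLOCK_THREAD_CPUTIME_ID) - cpu0;
    return cost;
}
```

If cpuSec comes out close to wallSec for, say, MeasureUsleep(1000, 1000), the thread is spinning rather than sleeping; if cpuSec stays near zero, the sleep really yields the CPU and the time in Action() has to come from somewhere else. The outcome depends on kernel version and scheduling policy, so it is worth running on the actual target box.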

Bye,
Martin


