l4/pkg/valgrind/src/valgrind-3.6.0-svn/callgrind/docs/cl-manual.xml

   1 <?xml version="1.0"?> <!-- -*- sgml -*- -->
   2 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
   3   "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
   4 [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
   5
   6 <chapter id="cl-manual" xreflabel="Callgrind Manual">
   7 <title>Callgrind: a call-graph generating cache and branch prediction profiler</title>
   8
   9
  10 <para>To use this tool, you must specify
  11 <option>--tool=callgrind</option> on the
  12 Valgrind command line.</para>
  13
  14 <sect1 id="cl-manual.use" xreflabel="Overview">
  15 <title>Overview</title>
  16
  17 <para>Callgrind is a profiling tool that records the call history among
  18 functions in a program's run as a call-graph.
  19 By default, the collected data consists of
  20 the number of instructions executed, their relationship
  21 to source lines, the caller/callee relationship between functions,
  22 and the numbers of such calls.
  23 Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
  24 can produce further information about the runtime behavior of an application.
  25 </para>
  26
  27 <para>The profile data is written out to a file at program
  28 termination. For presentation of the data, and interactive control
  29 of the profiling, two command line tools are provided:</para>
  30 <variablelist>
  31   <varlistentry>
  32   <term><command>callgrind_annotate</command></term>
  33   <listitem>
  34     <para>This command reads in the profile data, and prints a
  35     sorted list of functions, optionally with source annotation.</para>
  36
  37     <para>For graphical visualization of the data, try
  38     <ulink url="&cl-gui-url;">KCachegrind</ulink>, which is a KDE/Qt based
  39     GUI that makes it easy to navigate the large amount of data that
  40     Callgrind produces.</para>
  41
  42   </listitem>
  43   </varlistentry>
  44
  45   <varlistentry>
  46   <term><command>callgrind_control</command></term>
  47   <listitem>
  48     <para>This command enables you to interactively observe and control
  49     the status of a program currently running under Callgrind's control,
  50     without stopping the program.  You can get statistics information as
  51     well as the current stack trace, and you can request zeroing of counters
  52     or dumping of profile data.</para>
  53   </listitem>
  54   </varlistentry>
  55 </variablelist>
  56
  57   <sect2 id="cl-manual.functionality" xreflabel="Functionality">
  58   <title>Functionality</title>
  59
  60 <para>Cachegrind collects flat profile data: event counts (data reads,
  61 cache misses, etc.) are attributed directly to the function they
  62 occurred in.  This cost attribution mechanism is
  63 called <emphasis>self</emphasis> or <emphasis>exclusive</emphasis>
  64 attribution.</para>
  65
  66 <para>Callgrind extends this functionality by propagating costs
  67 across function call boundaries.  If function <function>foo</function> calls
  68 <function>bar</function>, the costs from <function>bar</function> are added into
  69 <function>foo</function>'s costs.  When applied to the program as a whole,
  70 this builds up a picture of so-called <emphasis>inclusive</emphasis>
  71 costs, that is, where the cost of each function includes the costs of
  72 all functions it called, directly or indirectly.</para>
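
<para>As a minimal sketch of this propagation (not Callgrind's actual
implementation), the following Python fragment computes inclusive costs
from self costs for a small, made-up call chain. The function names and
cost numbers are assumptions chosen for illustration, and the call graph
is assumed to be acyclic with each function called from a single place.</para>
<programlisting><![CDATA[
```python
# Hypothetical self (exclusive) costs and call graph, for illustration only.
self_cost = {"main": 10, "foo": 30, "bar": 60}
callees = {"main": ["foo"], "foo": ["bar"], "bar": []}


def inclusive_cost(fn):
    """Self cost of fn plus the inclusive cost of every function it calls.

    Assumes an acyclic call graph in which each function has one caller,
    so that no cost is counted twice.
    """
    return self_cost[fn] + sum(inclusive_cost(c) for c in callees[fn])


inclusive = {fn: inclusive_cost(fn) for fn in self_cost}
for fn, cost in inclusive.items():
    print(fn, "self:", self_cost[fn], "inclusive:", cost)
```
]]></programlisting>
<para>In this sketch, the inclusive cost of <function>main</function>
equals the sum of all self costs, because every other function is reached
from it.</para>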
  73
  74 <para>As an example, the inclusive cost of
  75 <function>main</function> should be almost 100 percent
  76 of the total program cost.  Because of costs arising before
  77 <function>main</function> is run, such as
  78 initialization of the run time linker and construction of global C++
  79 objects, the inclusive cost of <function>main</function>
  80 is not exactly 100 percent of the total program cost.</para>
  81
  82 <para>Together with the call graph, this allows you to find the
  83 specific call chains starting from
  84 <function>main</function> in which the majority of the
  85 program's costs occur.  Caller/callee cost attribution is also useful
  86 for profiling functions called from multiple call sites, and where
  87 optimization opportunities depend on changing code in the callers, in
  88 particular by reducing the call count.</para>
  89
  90 <para>Callgrind's cache simulation is based on that of Cachegrind.
  91 Read the documentation for <xref linkend="cg-manual"/> first.  The material
  92 below describes the features supported in addition to Cachegrind's
  93 features.</para>
  94
  95 <para>Callgrind's ability to detect function calls and returns depends
  96 on the instruction set of the platform it is run on.  It works best
  97 on x86 and amd64, and unfortunately currently does not work so well
  98 on PowerPC code.  This is because there are no explicit call or return
  99 instructions in the PowerPC instruction set, so Callgrind has to rely
 100 on heuristics to detect calls and returns.</para>
 101
 102   </sect2>
 103
 104   <sect2 id="cl-manual.basics" xreflabel="Basic Usage">
 105   <title>Basic Usage</title>
 106
 107   <para>As with Cachegrind, you probably want to compile with debugging info
 108   (the <option>-g</option> option) and with optimization turned on.</para>
 109
 110   <para>To start a profile run for a program, execute:
 111   <screen>valgrind --tool=callgrind [callgrind options] your-program [program options]</screen>
 112   </para>
 113
 114   <para>While the simulation is running, you can observe execution with:
 115   <screen>callgrind_control -b</screen>
 116   This will print out the current backtrace. To annotate the backtrace with
 117   event counts, run
 118   <screen>callgrind_control -e -b</screen>
 119   </para>
 120
 121   <para>After program termination, a profile data file named
 122   <computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>
 123   is generated, where <emphasis>pid</emphasis> is the process ID
 124   of the program being profiled.
 125   The data file contains information about the calls made in the
 126   program among the functions executed, together with
 127   <command>Instruction Read</command> (Ir) event counts.</para>
 128
 129   <para>To generate a function-by-function summary from the profile
 130   data file, use
 131   <screen>callgrind_annotate [options] callgrind.out.&lt;pid&gt;</screen>
 132   This summary is similar to the output you get from a Cachegrind
 133   run with cg_annotate: the list
 134   of functions is ordered by exclusive cost, and the costs shown
 135   are exclusive costs as well.
 136   The following two options give access to the additional
 137   features of Callgrind:</para>
 138
 139   <itemizedlist>
 140     <listitem>
 141       <para><option>--inclusive=yes</option>: Instead of using
 142       exclusive cost of functions as sorting order, use and show
 143       inclusive cost.</para>
 144     </listitem>
 145
 146     <listitem>
 147       <para><option>--tree=both</option>: Interleave into the
 148       top level list of functions, information on the callers and the callees
 149       of each function. In these lines, which represent executed
 150       calls, the cost gives the number of events spent in the call.
 151       Indented, above each function, there is the list of callers,
 152       and below, the list of callees. The sum of events in calls to
 153       a given function (caller lines), as well as the sum of events in
 154       calls from the function (callee lines) together with the self
 155       cost, gives the total inclusive cost of the function.</para>
 156      </listitem>
 157   </itemizedlist>
 158
 159   <para>Use <option>--auto=yes</option> to get annotated source code
 160   for all relevant functions for which the source can be found. In
 161   addition to source annotation as produced by
 162   <computeroutput>cg_annotate</computeroutput>, you will see the
 163   annotated call sites with call counts. For all other options,
 164   consult the (Cachegrind) documentation for
 165   <computeroutput>cg_annotate</computeroutput>.
 166   </para>
 167
 168   <para>For a better call graph browsing experience, it is highly recommended
 169   to use <ulink url="&cl-gui-url;">KCachegrind</ulink>.
 170   If your code
 171   has a significant fraction of its cost in <emphasis>cycles</emphasis> (sets
 172   of functions calling each other in a recursive manner), you have to
 173   use KCachegrind, as <computeroutput>callgrind_annotate</computeroutput>
 174   currently does not do any cycle detection, which is important to get correct
 175   results in this case.</para>
 176
 177   <para>If you are additionally interested in measuring the
 178   cache behavior of your program, use Callgrind with the option
 179   <option><xref linkend="clopt.cache-sim"/>=yes</option>. For
 180   branch prediction simulation, use <option><xref linkend="clopt.branch-sim"/>=yes</option>.
 181   Expect a further slowdown by a factor of approximately 2.</para>
 182
 183   <para>If the program section you want to profile is somewhere in the
 184   middle of the run, it is beneficial to
 185   <emphasis>fast forward</emphasis> to this section without any
 186   profiling, and then enable profiling.  This is achieved by using
 187   the command line option
 188   <option><xref linkend="opt.instr-atstart"/>=no</option>
 189   and running, in a shell:
 190   <computeroutput>callgrind_control -i on</computeroutput> just before the
 191   interesting code section is executed. To exactly specify
 192   the code position where profiling should start, use the client request
 193   <computeroutput><xref linkend="cr.start-instr"/></computeroutput>.</para>
 194
 195   <para>If you want to be able to see assembly code level annotation, specify
 196   <option><xref linkend="opt.dump-instr"/>=yes</option>. This will produce
 197   profile data at instruction granularity. Note that the resulting profile
 198   data
 199   can only be viewed with KCachegrind. For assembly annotation, it also is
 200   interesting to see more details of the control flow inside of functions,
 201   i.e. (conditional) jumps. This will be collected by further specifying
 202   <option><xref linkend="opt.collect-jumps"/>=yes</option>.</para>
 203
 204   </sect2>
 205
 206 </sect1>
 207
 208 <sect1 id="cl-manual.usage" xreflabel="Advanced Usage">
 209 <title>Advanced Usage</title>
 210
 211   <sect2 id="cl-manual.dumps"
 212          xreflabel="Multiple dumps from one program run">
 213   <title>Multiple profiling dumps from one program run</title>
 214
 215   <para>Sometimes you are not interested in characteristics of a full
 216   program run, but only of a small part of it, for example execution of one
 217   algorithm.  If there are multiple algorithms, or one algorithm
 218   running with different input data, it may even be useful to get different
 219   profile information for different parts of a single program run.</para>
 220
 221   <para>Profile data files have names of the form
 222 <screen>
 223 callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
 224 </screen>
 225   </para>
 226   <para>where <emphasis>pid</emphasis> is the PID of the running
 227   program, <emphasis>part</emphasis> is a number incremented on each
 228   dump (".part" is skipped for the dump at program termination), and
 229   <emphasis>threadID</emphasis> is a thread identification
 230   ("-threadID" is only used if you request dumps of individual
 231   threads with <option><xref linkend="opt.separate-threads"/>=yes</option>).</para>
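
  <para>The naming scheme can be sketched as a small Python helper. The
  function <computeroutput>dump_file_name</computeroutput> is hypothetical
  and only mirrors the description above; in particular, plain decimal
  formatting of the part and thread numbers is an assumption of this sketch.</para>
<programlisting><![CDATA[
```python
def dump_file_name(pid, part=None, thread_id=None):
    """Build a dump file name following the scheme described above.

    part is None for the dump at program termination; thread_id is None
    unless dumps are separated per thread. Plain decimal formatting of
    the numbers is an assumption of this sketch.
    """
    name = f"callgrind.out.{pid}"
    if part is not None:
        name += f".{part}"
    if thread_id is not None:
        name += f"-{thread_id}"
    return name


names = [
    dump_file_name(1234),                       # dump at termination
    dump_file_name(1234, part=1),               # first intermediate dump
    dump_file_name(1234, part=1, thread_id=2),  # per-thread dump
]
for name in names:
    print(name)
```
]]></programlisting>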
 232
 233   <para>There are different ways to generate multiple profile dumps
 234   while a program is running under Callgrind's supervision.  Nevertheless,
 235   all methods trigger the same action, which is "dump all profile
 236   information since the last dump or program start, and zero cost
 237   counters afterwards".  To allow for zeroing cost counters without
 238   dumping, there is a second action "zero all cost counters now".
 239   The different methods are:</para>
 240   <itemizedlist>
 241
 242     <listitem>
 243       <para><command>Dump on program termination.</command>
 244       This method is the standard way and doesn't need any special
 245       action on your part.</para>
 246     </listitem>
 247
 248     <listitem>
 249       <para><command>Spontaneous, interactive dumping.</command> Use
 250       <screen>callgrind_control -d [hint [PID/Name]]</screen> to
 251       request the dumping of profile information of the supervised
 252       application with PID or Name.  <emphasis>hint</emphasis> is an
 253       arbitrary string you can optionally specify to later be able to
 254       distinguish profile dumps.  The control program will not terminate
 255       before the dump is completely written.  Note that the application
 256       must be actively running for detection of the dump command. So,
 257       for a GUI application, resize the window, or for a server, send a
 258       request.</para>
 259       <para>If you are using <ulink url="&cl-gui-url;">KCachegrind</ulink>
 260       for browsing of profile information, you can use the toolbar
 261       button <command>Force dump</command>. This will request a dump
 262       and trigger a reload after the dump is written.</para>
 263     </listitem>
 264
 265     <listitem>
 266       <para><command>Periodic dumping after execution of a specified
 267       number of basic blocks</command>. For this, use the command line
 268       option <option><xref linkend="opt.dump-every-bb"/>=count</option>.
 269       </para>
 270     </listitem>
 271
 272     <listitem>
 273       <para><command>Dumping at enter/leave of specified functions.</command>
 274       Use the
 275       option <option><xref linkend="opt.dump-before"/>=function</option>
 276       and <option><xref linkend="opt.dump-after"/>=function</option>.
 277       To zero cost counters before entering a function, use
 278       <option><xref linkend="opt.zero-before"/>=function</option>.</para>
 279       <para>You can specify these options multiple times for different
 280       functions. Function specifications support wildcards: e.g. use
 281       <option><xref linkend="opt.dump-before"/>='foo*'</option> to
 282       generate dumps before entering any function starting with
 283       <emphasis>foo</emphasis>.</para>
 284     </listitem>
 285
 286     <listitem>
 287       <para><command>Program controlled dumping.</command>
 288       Insert
 289       <computeroutput><xref linkend="cr.dump-stats"/>;</computeroutput>
 290       at the position in your code where you want a profile dump to happen. Use
 291       <computeroutput><xref linkend="cr.zero-stats"/>;</computeroutput> to only
 292       zero profile counters.
 293       See <xref linkend="cl-manual.clientrequests"/> for more information on
 294       Callgrind specific client requests.</para>
 295     </listitem>
 296   </itemizedlist>
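
  <para>The two actions behind all of these methods can be modelled with a
  few lines of Python. This is only a sketch of the described semantics:
  the class <computeroutput>CostCounters</computeroutput> and its methods
  are hypothetical names, and a single integer stands in for all cost
  counters.</para>
<programlisting><![CDATA[
```python
class CostCounters:
    """Toy model of the 'dump' and 'zero' actions described above."""

    def __init__(self):
        self.cost = 0    # events counted since the last dump or zeroing
        self.dumps = []  # one entry per profile dump written so far

    def count(self, events):
        self.cost += events

    def zero(self):
        """Action 'zero all cost counters now'."""
        self.cost = 0

    def dump(self):
        """Action 'dump everything since the last dump, then zero'."""
        self.dumps.append(self.cost)
        self.zero()


counters = CostCounters()
counters.count(100)  # start-up phase
counters.zero()      # discard start-up costs without dumping
counters.count(40)   # first algorithm
counters.dump()
counters.count(25)   # second algorithm
counters.dump()      # corresponds to the dump at program termination
print(counters.dumps)
```
]]></programlisting>
  <para>Each dump in this model contains only the costs accumulated since
  the previous dump or zeroing, so the dumps describe disjoint parts of
  the run.</para>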
 297
 298   <para>If you are running a multi-threaded application and specify the
 299   command line option <option><xref linkend="opt.separate-threads"/>=yes</option>,
 300   every thread will be profiled on its own and will create its own
 301   profile dump. Thus, the last two methods will only generate one dump
 302   of the currently running thread. With the other methods, you will get
 303   multiple dumps (one for each thread) on a dump request.</para>
 304
 305   </sect2>
 306
 307
 308
 309   <sect2 id="cl-manual.limits"
 310          xreflabel="Limiting range of event collection">
 311   <title>Limiting the range of collected events</title>
 312
 313   <para>For aggregating events (function enter/leave,
 314   instruction execution, memory access) into event numbers,
 315   first, the events must be recognizable by Callgrind, and second,
 316   the collection state must be enabled.</para>
 317
 318   <para>Event collection is only possible if <emphasis>instrumentation</emphasis>
 319   for program code is enabled. This is the default, but for faster
 320   execution (identical to <computeroutput>valgrind --tool=none</computeroutput>),
 321   it can be disabled until the program reaches a state in which
 322   you want to start collecting profiling data.
 323   Callgrind can start without instrumentation
 324   by specifying option <option><xref linkend="opt.instr-atstart"/>=no</option>.
 325   Instrumentation can be enabled interactively
 326   with: <screen>callgrind_control -i on</screen>
 327   and off by specifying "off" instead of "on".
 328   Furthermore, instrumentation state can be programmatically changed with
 329   the macros <computeroutput><xref linkend="cr.start-instr"/>;</computeroutput>
 330   and <computeroutput><xref linkend="cr.stop-instr"/>;</computeroutput>.
 331   </para>
 332
 333   <para>In addition to enabling instrumentation, you must also enable
 334   event collection for the parts of your program you are interested in.
 335   By default, event collection is enabled everywhere.
 336   You can limit collection to a specific function
 337   by using
 338   <option><xref linkend="opt.toggle-collect"/>=function</option>.
 339   This will toggle the collection state on entering and leaving
 340   the specified functions.
 341   When this option is in effect, the default collection state
 342   at program start is "off".  Only events happening while running
 343   inside of the given function will be collected. Recursive
 344   calls of the given function do not trigger any action.</para>
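
  <para>The toggling behaviour can be illustrated with a small Python
  simulation over a made-up event trace. This is a sketch of the semantics
  described above, not of Callgrind's implementation; the trace format and
  the function <computeroutput>collected_events</computeroutput> are
  assumptions of the example.</para>
<programlisting><![CDATA[
```python
def collected_events(trace, toggle_fn):
    """Sum the events that occur while running inside toggle_fn.

    trace is a list of ("enter", name), ("leave", name) or ("event", n).
    """
    collecting = False  # with a toggle function, collection starts "off"
    depth = 0           # nesting depth of toggle_fn, to ignore recursion
    total = 0
    for kind, value in trace:
        if kind == "enter" and value == toggle_fn:
            depth += 1
            if depth == 1:  # only the outermost entry toggles
                collecting = not collecting
        elif kind == "leave" and value == toggle_fn:
            if depth == 1:  # only the outermost exit toggles
                collecting = not collecting
            depth -= 1
        elif kind == "event" and collecting:
            total += value
    return total


trace = [
    ("event", 5),       # before work(): not collected
    ("enter", "work"),
    ("event", 7),
    ("enter", "work"),  # recursive call: no toggle
    ("event", 3),
    ("leave", "work"),
    ("event", 2),
    ("leave", "work"),
    ("event", 4),       # after work(): not collected
]
total = collected_events(trace, "work")
print(total)
```
]]></programlisting>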
 345
 346   <para>It is important to note that with instrumentation disabled, the
 347   cache simulator cannot see any memory access events, and thus, any
 348   simulated cache state will be frozen and wrong without instrumentation.
 349   Therefore, to get useful cache events (hits/misses) after switching on
 350   instrumentation, the cache first must warm up,
 351   probably leading to many <emphasis>cold misses</emphasis>
 352   which would not have happened in reality. If you do not want to see these,
 353   start event collection a few million instructions after you have enabled
 354   instrumentation.</para>
 355
 356   </sect2>
 357
 358   <sect2 id="cl-manual.busevents" xreflabel="Counting global bus events">
 359   <title>Counting global bus events</title>
 360
 361   <para>For access to shared data among threads in
 362   multithreaded code, synchronization is required to avoid race conditions.
 363   Synchronization primitives are usually implemented via atomic instructions.
 364   However, excessive use of such instructions can lead to performance
 365   issues.</para>
 366
 367   <para>To enable analysis of this problem, Callgrind optionally can count
 368   the number of atomic instructions executed. More precisely, for x86/x86_64,
 369   these are instructions using a lock prefix. For architectures supporting
 370   LL/SC, these are the number of SC instructions executed. For both, the term
 371   "global bus events" is used.</para>
 372
 373   <para>The short name of the event type used for global bus events is "Ge".
 374   To count global bus events, use <option><xref linkend="clopt.collect-bus"/>=yes</option>.
 375   </para>
 376   </sect2>
 377
 378   <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
 379   <title>Avoiding cycles</title>
 380
 381   <para>Informally speaking, a cycle is a group of functions which
 382   call each other in a recursive way.</para>
 383
 384   <para>Formally speaking, a cycle is a nonempty set S of functions,
 385   such that for every pair of functions F and G in S, it is possible
 386   to call from F to G (possibly via intermediate functions) and also
 387   from G to F.  Furthermore, S must be maximal -- that is, be the
 388   largest set of functions satisfying this property.  For example, if
 389   a third function H is called from inside S and calls back into S,
 390   then H is also part of the cycle and should be included in S.</para>
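
  <para>The formal definition corresponds to what graph theory calls a
  strongly connected component. The following Python sketch finds such
  sets by simple reachability checks on a made-up call graph; it is meant
  to illustrate the definition and is not the algorithm used by
  KCachegrind. The graph and the function names are assumptions of the
  example.</para>
<programlisting><![CDATA[
```python
def reachable(graph, start):
    """Return all functions callable from start via one or more calls."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph.get(fn, ()))
    return seen


def cycles(graph):
    """Return the maximal sets of mutually reachable functions."""
    reach = {fn: reachable(graph, fn) for fn in graph}
    found = set()
    for f in graph:
        if f in reach[f]:  # f can call itself, directly or indirectly
            members = frozenset(
                g for g in reach[f] if f in reach.get(g, ())
            )
            found.add(members)
    return found


# Hypothetical call graph: A calls B, B calls H, and H calls back into A.
graph = {
    "main": ["A"],
    "A": ["B"],
    "B": ["H", "leaf"],
    "H": ["A"],
    "leaf": [],
}
result = cycles(graph)
print(result)
```
]]></programlisting>
  <para>Here <function>H</function> ends up in the same set as
  <function>A</function> and <function>B</function>, as required by the
  maximality condition, while <function>main</function> and
  <function>leaf</function> belong to no cycle.</para>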
 391
 392   <para>Recursion is quite usual in programs, and therefore, cycles
 393   sometimes appear in the call graph output of Callgrind. However,
 394   the title of this section should raise two questions: What is bad
 395   about cycles which makes you want to avoid them? And: How can
 396   cycles be avoided without changing program code?</para>
 397
 398   <para>Cycles are not bad in themselves, but tend to make performance
 399   analysis of your code harder. This is because inclusive costs
 400   for calls inside of a cycle are meaningless. The definition of
 401   inclusive cost, i.e. self cost of a function plus inclusive cost
 402   of its callees, needs a topological order among functions. For
 403   cycles, this does not hold true: callees of a function in a cycle include
 404   the function itself. Therefore, KCachegrind does cycle detection
 405   and skips visualization of any inclusive cost for calls inside
 406   of cycles. Further, all functions in a cycle are collapsed into artificial
 407   functions with names like <computeroutput>Cycle 1</computeroutput>.</para>
 408
 409   <para>Now, when a program exhibits really big cycles (as is
 410   true for some GUI code, or in general for code using an event or callback based
 411   programming style), you lose the ability to pinpoint
 412   the bottlenecks by following call chains from
 413   <function>main</function>, guided via
 414   inclusive cost. In addition, KCachegrind loses its ability to show
 415   interesting parts of the call graph, as it uses inclusive costs to
 416   cut off uninteresting areas.</para>
 417
 418   <para>Despite the meaninglessness of inclusive costs in cycles, the big
 419   drawback for visualization motivates the possibility to temporarily
 420   switch off cycle detection in KCachegrind, which can lead to
 421   misleading visualization. However, often cycles appear because of
 422   unlucky superposition of independent call chains in a way that
 423   the profile result will see a cycle. Neglecting uninteresting
 424   calls with very small measured inclusive cost would break these
 425   cycles. In such cases, incorrect handling of cycles by not detecting
 426   them still gives meaningful profiling visualization.</para>
 427
 428   <para>It has to be noted that currently, <command>callgrind_annotate</command>
 429   does not do any cycle detection at all. For program executions with function
 430   recursion, it can, for example, print nonsensical inclusive costs far above 100%.</para>
 431
 432   <para>After describing why cycles are bad for profiling, it is worth
 433   talking about cycle avoidance. The key insight here is that symbols in
 434   the profile data do not have to exactly match the symbols found in the
 435   program. Instead, the symbol name could encode additional information
 436   from the current execution context such as recursion level of the
 437   current function, or even some part of the call chain leading to the
 438   function. While encoding of additional information into symbols is
 439   quite capable of avoiding cycles, it has to be used carefully to avoid
 440   symbol explosion, which leads to large memory requirements for Callgrind,
 441   possible out-of-memory conditions, and big profile data files.</para>
 442
 443   <para>A further possibility to avoid cycles in Callgrind's profile data
 444   output is to simply leave out given functions in the call graph. Of course, this
 445   also skips any call information from and to an ignored function, and thus can
 446   break a cycle. Candidates for this typically are dispatcher functions in event
 447   driven code. The option to ignore calls to a function is
 448   <option><xref linkend="opt.fn-skip"/>=function</option>. Aside from
 449   possibly breaking cycles, this is used in Callgrind to skip
 450   trampoline functions in the PLT sections
 451   for calls to functions in shared libraries. You can see the difference
 452   if you profile with <option><xref linkend="opt.skip-plt"/>=no</option>.
 453   If a call is ignored, its cost events will be propagated to the
 454   enclosing function.</para>
 455
 456   <para>If you have a recursive function, you can distinguish the first
 457   10 recursion levels by specifying
 458   <option><xref linkend="opt.separate-recs-num"/>=function</option>.
 459   To do this for all functions, use
 460   <option><xref linkend="opt.separate-recs"/>=10</option>, but this will
 461   give you much bigger profile data files.  In the profile data, you will see
 462   the recursion levels of "func" as the different functions with names
 463   "func", "func'2", "func'3" and so on.</para>
 464
 465   <para>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
 466   in your program, you usually get a "false" cycle "B &lt;&gt; C". Use
 467   <option><xref linkend="opt.separate-callers-num"/>=B</option>
 468   <option><xref linkend="opt.separate-callers-num"/>=C</option>,
 469   and functions "B" and "C" will be treated as different functions
 470   depending on the direct caller. Using the apostrophe for appending
 471   this "context" to the function name, you get "A &gt; B'A &gt; C'B"
 472   and "A &gt; C'A &gt; B'C", and there will be no cycle. Use
 473   <option><xref linkend="opt.separate-callers"/>=2</option> to get a 2-caller
 474   dependency for all functions.  Note that doing this will increase
 475   the size of profile data files.</para>
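
  <para>The renaming by caller context can be sketched in Python as
  follows. The helper <computeroutput>contextual_chain</computeroutput> is
  hypothetical; it appends the direct callers found in a given call chain
  to each function name, using the apostrophe notation from above. How
  Callgrind orders several callers in one name is an assumption of this
  sketch.</para>
<programlisting><![CDATA[
```python
def contextual_chain(chain, callers=1):
    """Rename each function in a call chain by its direct caller context.

    Up to `callers` preceding functions are appended to each name,
    nearest caller first, separated by apostrophes.
    """
    names = []
    for i, fn in enumerate(chain):
        context = chain[max(0, i - callers):i]
        names.append("'".join([fn] + list(reversed(context))))
    return names


first = contextual_chain(["A", "B", "C"])
second = contextual_chain(["A", "C", "B"])
print(" > ".join(first))
print(" > ".join(second))

# Apart from the common root "A", the two chains share no function names,
# so calls between the renamed functions cannot form a cycle.
shared = set(first[1:]).intersection(second[1:])
```
]]></programlisting>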
 476
 477   </sect2>
 478
 479   <sect2 id="cl-manual.forkingprograms" xreflabel="Forking Programs">
 480   <title>Forking Programs</title>
 481
 482   <para>If your program forks, the child will inherit all the profiling
 483   data that has been gathered for the parent. To start with empty profile
 484   counter values in the child, the client request
 485   <computeroutput><xref linkend="cr.zero-stats"/>;</computeroutput>
 486   can be inserted into code to be executed by the child, directly after
 487   <computeroutput>fork</computeroutput>.</para>
 488
 489   <para>However, you will have to make sure that the output file format string
 490   (controlled by <option>--callgrind-out-file</option>) contains
 491   <option>%p</option> (which is true by default). Otherwise, the
 492   outputs from the parent and child will overwrite each other or will be
 493   intermingled, which almost certainly is not what you want.</para>
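
  <para>Why the process ID matters can be seen in a short Python sketch.
  The helper <computeroutput>expand_out_file</computeroutput> is
  hypothetical and handles only <option>%p</option>; the process IDs and
  the pattern <computeroutput>profile.out</computeroutput> are made up for
  the example, and <computeroutput>callgrind.out.%p</computeroutput> is
  assumed to be the default pattern.</para>
<programlisting><![CDATA[
```python
def expand_out_file(pattern, pid):
    """Expand %p in an output file pattern (other specifiers are ignored)."""
    return pattern.replace("%p", str(pid))


parent_pid, child_pid = 4000, 4001  # made-up process IDs

# With %p in the pattern, parent and child write to different files.
with_pid = {expand_out_file("callgrind.out.%p", pid)
            for pid in (parent_pid, child_pid)}

# Without %p, both processes would write to the same file.
without_pid = {expand_out_file("profile.out", pid)
               for pid in (parent_pid, child_pid)}

print(sorted(with_pid))
print(sorted(without_pid))
```
]]></programlisting>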
 494
 495   <para>You will be able to control the new child independently from
 496   the parent via callgrind_control.</para>
 497
 498   </sect2>
 499
 500 </sect1>
 501
 502
 503 <sect1 id="cl-manual.options" xreflabel="Callgrind Command-line Options">
 504 <title>Callgrind Command-line Options</title>
 505
 506 <para>
 507 In the following, options are grouped into classes.
 508 </para>
 509 <para>
 510 Some options allow the specification of a function/symbol name, such as
 511 <option><xref linkend="opt.dump-before"/>=function</option>, or
 512 <option><xref linkend="opt.fn-skip"/>=function</option>. All these options
 513 can be specified multiple times for different functions.
 514 In addition, function specifications are actually patterns: they support
 515 the wildcards '*' (zero or more arbitrary characters) and '?'
 516 (exactly one arbitrary character), similar to file name globbing in the
 517 shell. This feature is especially important for C++, as without
 518 wildcards, the function would have to be specified in full, including
 519 parameter signature. </para>
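
<para>The wildcard semantics can be sketched in Python by translating a
pattern into a regular expression. This only illustrates the matching
rules stated above and is not how Callgrind implements them; the function
<computeroutput>matches</computeroutput> and the C++ symbol used below are
assumptions of the example.</para>
<programlisting><![CDATA[
```python
import re


def pattern_to_regex(pattern):
    """Translate '*' and '?' wildcards into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more arbitrary characters
        elif ch == "?":
            parts.append(".")   # exactly one arbitrary character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, symbol):
    """Return True if the whole symbol name matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(symbol) is not None


# A C++ symbol includes its parameter signature, so a wildcard saves typing.
symbol = "MyClass::run(int, char const*)"
print(matches("MyClass::run*", symbol))
print(matches("fo?", "foo"), matches("fo?", "fo"))
```
]]></programlisting>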
 520
 521 <sect2 id="cl-manual.options.creation"
 522        xreflabel="Dump creation options">
 523 <title>Dump creation options</title>
 524
 525 <para>
 526 These options influence the name and format of the profile data files.
 527 </para>
 528
 529 <!-- start of xi:include in the manpage -->
 530 <variablelist id="cl.opts.list.creation">
 531
 532   <varlistentry id="opt.callgrind-out-file" xreflabel="--callgrind-out-file">
 533     <term>
 534       <option><![CDATA[--callgrind-out-file=<file> ]]></option>
 535     </term>
 536     <listitem>
 537       <para>Write the profile data to
 538             <computeroutput>file</computeroutput> rather than to the default
 539             output file,
 540             <computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>.  The
 541             <option>%p</option> and <option>%q</option> format specifiers
 542             can be used to embed the process ID and/or the contents of an
 543             environment variable in the name, as is the case for the core
 544             option <option><xref linkend="opt.log-file"/></option>.
 545             When multiple dumps are made, the file name
 546             is modified further; see below.</para>
 547     </listitem>
 548   </varlistentry>
 549
 550   <varlistentry id="opt.dump-line" xreflabel="--dump-line">
 551     <term>
 552       <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
 553     </term>
 554     <listitem>
 555       <para>This specifies that event counting should be performed at
 556       source line granularity. This allows source annotation for sources
 557       which are compiled with debug information
 558       (<option>-g</option>).</para>
 559   </listitem>
 560   </varlistentry>
 561
 562   <varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
 563     <term>
 564       <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
 565     </term>
 566     <listitem>
 567       <para>This specifies that event counting should be performed at
 568       per-instruction granularity.
 569       This allows for assembly code
 570       annotation.  Currently the results can only be
 571       displayed by KCachegrind.</para>
 572   </listitem>
 573   </varlistentry>
 574
 575   <varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
 576     <term>
 577       <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
 578     </term>
 579     <listitem>
 580       <para>This option influences the output format of the profile data.
 581       It specifies whether strings (file and function names) should be
 582       identified by numbers. This shrinks the file,
 583       but makes it more difficult
 584       for humans to read (which is not recommended in any case).</para>
 585     </listitem>
 586   </varlistentry>
 587
 588   <varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
 589     <term>
 590       <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
 591     </term>
 592     <listitem>
 593       <para>This option influences the output format of the profile data.
 594       It specifies whether numerical positions are always specified as absolute
 595       values or are allowed to be relative to previous numbers.
 596       This shrinks the file size.</para>
 597     </listitem>
 598   </varlistentry>
 599
 600   <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
 601     <term>
 602       <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
 603     </term>
 604     <listitem>
 605       <para>When enabled, multiple profile data parts are not written
 606       to separate files but are appended to the same output file.
 607       Not recommended.</para>
 608   </listitem>
 609   </varlistentry>
 610
 611 </variablelist>
 612 </sect2>
 613
 614 <sect2 id="cl-manual.options.activity"
 615        xreflabel="Activity options">
 616 <title>Activity options</title>
 617
 618 <para>
 619 These options specify when actions relating to event counts are to
 620 be executed. For interactive control use callgrind_control.
 621 </para>
 622
 623 <!-- start of xi:include in the manpage -->
 624 <variablelist id="cl.opts.list.activity">
 625
 626   <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
 627     <term>
 628       <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
 629     </term>
 630     <listitem>
 631       <para>Dump profile data every <option>count</option> basic blocks.
 632       Whether a dump is needed is only checked when Valgrind's internal
 633       scheduler is run. Therefore, the minimum useful setting is about 100000.
 634       The count is a 64-bit value to make long dump periods possible.
 635       </para>
 636     </listitem>
 637   </varlistentry>
 638
 639   <varlistentry id="opt.dump-before" xreflabel="--dump-before">
 640     <term>
 641       <option><![CDATA[--dump-before=<function> ]]></option>
 642     </term>
 643     <listitem>
 644       <para>Dump when entering <option>function</option>.</para>
 645     </listitem>
 646   </varlistentry>
 647
 648   <varlistentry id="opt.zero-before" xreflabel="--zero-before">
 649     <term>
 650       <option><![CDATA[--zero-before=<function> ]]></option>
 651     </term>
 652     <listitem>
 653       <para>Zero all costs when entering <option>function</option>.</para>
 654     </listitem>
 655   </varlistentry>
 656
 657   <varlistentry id="opt.dump-after" xreflabel="--dump-after">
 658     <term>
 659       <option><![CDATA[--dump-after=<function> ]]></option>
 660     </term>
 661     <listitem>
 662       <para>Dump when leaving <option>function</option>.</para>
 663     </listitem>
 664   </varlistentry>
 665
 666 </variablelist>
 667 <!-- end of xi:include in the manpage -->
 668 </sect2>
 669
 670 <sect2 id="cl-manual.options.collection"
 671        xreflabel="Data collection options">
 672 <title>Data collection options</title>
 673
 674 <para>
 675 These options specify when events are to be aggregated into event counts.
 676 Also see <xref linkend="cl-manual.limits"/>.</para>
 677
 678 <!-- start of xi:include in the manpage -->
 679 <variablelist id="cl.opts.list.collection">
 680
 681   <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
 682     <term>
 683       <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
 684     </term>
 685     <listitem>
 686       <para>Specify if you want Callgrind to start simulation and
 687       profiling from the beginning of the program.
 688       When set to <computeroutput>no</computeroutput>,
 689       Callgrind will not be able
 690       to collect any information, including calls, but it will have at
 691       most a slowdown of around 4, which is the minimum Valgrind
 692       overhead.  Instrumentation can be interactively enabled via
 693       <computeroutput>callgrind_control -i on</computeroutput>.</para>
 694       <para>Note that the resulting call graph will most probably not
 695       contain <function>main</function>, but will contain all the
 696       functions executed after instrumentation was enabled.
 697       Instrumentation can also be programmatically enabled/disabled. See the
 698       Callgrind include file
 699       <computeroutput>callgrind.h</computeroutput> for the macro
 700       you have to use in your source code.</para> <para>For cache
 701       simulation, results will be less accurate when switching on
 702       instrumentation later in the program run, as the simulator starts
 703       with an empty cache at that moment.  Switch on event collection
 704       later to cope with this error.</para>
 705     </listitem>
 706   </varlistentry>
 707
 708   <varlistentry id="opt.collect-atstart" xreflabel="--collect-atstart">
 709     <term>
 710       <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
 711     </term>
 712     <listitem>
 713       <para>Specify whether event collection is enabled at beginning
 714       of the profile run.</para>
 715       <para>To only look at parts of your program, you have two
 716       possibilities:</para>
 717       <orderedlist>
 718       <listitem>
 719         <para>Zero event counters before entering the program part you
 720         want to profile, and dump the event counters to a file after
 721         leaving that program part.</para>
 722         </listitem>
 723         <listitem>
 724           <para>Switch on/off collection state as needed to only see
 725           event counters happening while inside of the program part you
 726           want to profile.</para>
 727         </listitem>
 728       </orderedlist>
 729       <para>The second option can be used if the program part you want to
 730       profile is called many times. Option 1, i.e. creating a lot of
 731       dumps, is not practical here.</para>
 732       <para>Collection state can be
 733       toggled at entry and exit of a given function with the
 734       option <option><xref linkend="opt.toggle-collect"/></option>.  If you
 735       use this option, collection
 736       state should be disabled at the beginning.  Note that the
 737       specification of <option>--toggle-collect</option>
 738       implicitly sets
 739       <option>--collect-atstart=no</option>.</para>
 740       <para>Collection state can also be toggled by inserting the client request
 741       <computeroutput>
 742       <!-- commented out because it causes broken links in the man page
 743       <xref linkend="cr.toggle-collect"/>;
 744       -->
 745       CALLGRIND_TOGGLE_COLLECT
 746       ;</computeroutput>
 747       at the needed code positions.</para>
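      <para>As a rough sketch of the second approach, the following C
      fragment brackets a program part with the client request. It assumes
      the header is installed as <filename>valgrind/callgrind.h</filename>;
      <function>hot_loop</function> and <function>profile_region</function>
      are hypothetical names, and the fallback macro definition is only an
      assumption that lets the fragment build where the header is
      missing.</para>
<programlisting><![CDATA[
```c
#include <assert.h>

/* Sketch only: hot_loop() and profile_region() are hypothetical names. */
#ifdef __has_include
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_TOGGLE_COLLECT
/* Assumed fallback so the sketch builds without the Valgrind headers. */
#  define CALLGRIND_TOGGLE_COLLECT ((void)0)
#endif

/* Stand-in for the program part to be profiled: sums 0..n-1. */
static long hot_loop(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Intended for a run with --collect-atstart=no: the first toggle switches
   collection on, the second one switches it off again. */
long profile_region(long n)
{
    long result;
    assert(n >= 0);
    CALLGRIND_TOGGLE_COLLECT;
    result = hot_loop(n);
    CALLGRIND_TOGGLE_COLLECT;
    return result;
}
```
]]></programlisting>
      <para>When the program is not running under Valgrind, the client
      request has no effect, so the fragment behaves like the plain
      computation.</para>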
 748     </listitem>
 749   </varlistentry>
 750
 751   <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
 752     <term>
 753       <option><![CDATA[--toggle-collect=<function> ]]></option>
 754     </term>
 755     <listitem>
 756       <para>Toggle collection on entry/exit of <option>function</option>.</para>
 757     </listitem>
 758   </varlistentry>
 759
 760   <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
 761     <term>
 762       <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
 763     </term>
 764     <listitem>
 765       <para>This specifies whether information for (conditional) jumps
 766       should be collected.  As above, callgrind_annotate currently is not
 767       able to show you the data.  You have to use KCachegrind to get jump
 768       arrows in the annotated code.</para>
 769     </listitem>
 770   </varlistentry>
 771
 772   <varlistentry id="opt.collect-systime" xreflabel="--collect-systime">
 773     <term>
 774       <option><![CDATA[--collect-systime=<no|yes> [default: no] ]]></option>
 775     </term>
 776     <listitem>
 777       <para>This specifies whether information for system call times
 778       should be collected.</para>
 779     </listitem>
 780   </varlistentry>
 781
 782   <varlistentry id="clopt.collect-bus" xreflabel="--collect-bus">
 783     <term>
 784       <option><![CDATA[--collect-bus=<no|yes> [default: no] ]]></option>
 785     </term>
 786     <listitem>
 787       <para>This specifies whether the number of global bus events executed
 788       should be collected. The event type "Ge" is used for these events.</para>
 789     </listitem>
 790   </varlistentry>
 791
 792 </variablelist>
 793 <!-- end of xi:include in the manpage -->
 794 </sect2>
 795
 796 <sect2 id="cl-manual.options.separation"
 797        xreflabel="Cost entity separation options">
 798 <title>Cost entity separation options</title>
 799
 800 <para>
 801 These options specify how event counts should be attributed to execution
 802 contexts.
 803 For example, they specify whether the recursion level or the
 804 call chain leading to a function should be taken into account,
 805 and whether the thread ID should be considered.
 806 Also see <xref linkend="cl-manual.cycles"/>.</para>
 807
 808 <!-- start of xi:include in the manpage -->
 809 <variablelist id="cmd-options.separation">
 810
 811   <varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
 812     <term>
 813       <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
 814     </term>
 815     <listitem>
 816       <para>This option specifies whether profile data should be generated
 817       separately for every thread. If yes, the file names get "-threadID"
 818       appended.</para>
 819     </listitem>
 820   </varlistentry>
 821
 822   <varlistentry id="opt.separate-callers" xreflabel="--separate-callers">
 823     <term>
 824       <option><![CDATA[--separate-callers=<callers> [default: 0] ]]></option>
 825     </term>
 826     <listitem>
 827       <para>Separate contexts by at most &lt;callers&gt; functions in the
 828       call chain. See <xref linkend="cl-manual.cycles"/>.</para>
 829     </listitem>
 830   </varlistentry>
 831
 832   <varlistentry id="opt.separate-callers-num" xreflabel="--separate-callers2">
 833     <term>
 834       <option><![CDATA[--separate-callers<number>=<function> ]]></option>
 835     </term>
 836     <listitem>
 837       <para>Separate <option>number</option> callers for <option>function</option>.
 838       See <xref linkend="cl-manual.cycles"/>.</para>
 839     </listitem>
 840   </varlistentry>
 841
 842   <varlistentry id="opt.separate-recs" xreflabel="--separate-recs">
 843     <term>
 844       <option><![CDATA[--separate-recs=<level> [default: 2] ]]></option>
 845     </term>
 846     <listitem>
 847       <para>Separate function recursions by at most <option>level</option> levels.
 848       See <xref linkend="cl-manual.cycles"/>.</para>
 849     </listitem>
 850   </varlistentry>
 851
 852   <varlistentry id="opt.separate-recs-num" xreflabel="--separate-recs10">
 853     <term>
 854       <option><![CDATA[--separate-recs<number>=<function> ]]></option>
 855     </term>
 856     <listitem>
 857       <para>Separate <option>number</option> recursions for <option>function</option>.
 858       See <xref linkend="cl-manual.cycles"/>.</para>
 859     </listitem>
 860   </varlistentry>
 861
 862   <varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
 863     <term>
 864       <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
 865     </term>
 866     <listitem>
 867       <para>Ignore calls to/from PLT sections.</para>
 868     </listitem>
 869   </varlistentry>
 870
 871   <varlistentry id="opt.skip-direct-rec" xreflabel="--skip-direct-rec">
 872     <term>
 873       <option><![CDATA[--skip-direct-rec=<no|yes> [default: yes] ]]></option>
 874     </term>
 875     <listitem>
 876       <para>Ignore direct recursions.</para>
 877     </listitem>
 878   </varlistentry>
 879
 880   <varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
 881     <term>
 882       <option><![CDATA[--fn-skip=<function> ]]></option>
 883     </term>
 884     <listitem>
 885       <para>Ignore calls to/from a given function.  E.g. if you have a
 886       call chain A &gt; B &gt; C, and you specify function B to be
 887       ignored, you will only see A &gt; C.</para>
 888       <para>This is very convenient for skipping functions that handle callback
 889       behaviour.  For example, with the signal/slot mechanism in the
 890       Qt graphics library, you only want
 891       to see the function emitting a signal to call the slots connected
 892       to that signal. First, determine the real call chain to see the
 893       functions that need to be skipped, then use this option.</para>
 894     </listitem>
 895   </varlistentry>
 896
 897 <!--
 898     commenting out as it is only enabled with CLG_EXPERIMENTAL.  (Nb: I had to
 899     insert a space between the double dash to avoid XML comment problems.)
 900
 901   <varlistentry id="opt.fn-group">
 902     <term>
 903       <option><![CDATA[- -fn-group<number>=<function> ]]></option>
 904     </term>
 905     <listitem>
 906       <para>Put a function into a separate group. This influences the
 907       context name for cycle avoidance. All functions inside such a
 908       group are treated as being the same for context name building, which
 909       resembles the call chain leading to a context. By specifying function
 910       groups with this option, you can shorten the context name, as functions
 911       in the same group will not appear in sequence in the name. </para>
 912     </listitem>
 913   </varlistentry>
 914 -->
 915
 916 </variablelist>
 917 <!-- end of xi:include in the manpage -->
 918 </sect2>
 919
 920
 921 <sect2 id="cl-manual.options.simulation"
 922        xreflabel="Simulation options">
 923 <title>Simulation options</title>
 924
 925 <!-- start of xi:include in the manpage -->
 926 <variablelist id="cl.opts.list.simulation">
 927
 928   <varlistentry id="clopt.cache-sim" xreflabel="--cache-sim">
 929     <term>
 930       <option><![CDATA[--cache-sim=<yes|no> [default: no] ]]></option>
 931     </term>
 932     <listitem>
 933       <para>Specify if you want to do full cache simulation.  By default,
 934       only instruction read accesses will be counted ("Ir").
 935       With cache simulation, further event counters are enabled:
 936       Cache misses on instruction reads ("I1mr"/"ILmr"),
 937       data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
 938       data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
 939       For more information, see <xref linkend="cg-manual"/>.
 940       </para>
 941     </listitem>
 942   </varlistentry>
 943
 944   <varlistentry id="clopt.branch-sim" xreflabel="--branch-sim">
 945     <term>
 946       <option><![CDATA[--branch-sim=<yes|no> [default: no] ]]></option>
 947     </term>
 948     <listitem>
 949       <para>Specify if you want to do branch prediction simulation.
 950       Further event counters are enabled: Number of executed conditional
 951       branches and related predictor misses ("Bc"/"Bcm"), executed indirect
 952       jumps and related misses of the jump address predictor ("Bi"/"Bim").
 953       </para>
 954     </listitem>
 955   </varlistentry>
 956
 957 </variablelist>
 958 <!-- end of xi:include in the manpage -->
 959 </sect2>
 960
 961
 962 <sect2 id="cl-manual.options.cachesimulation"
 963        xreflabel="Cache simulation options">
 964 <title>Cache simulation options</title>
 965
 966 <!-- start of xi:include in the manpage -->
 967 <variablelist id="cl.opts.list.cachesimulation">
 968
 969   <varlistentry id="opt.simulate-wb" xreflabel="--simulate-wb">
 970     <term>
 971       <option><![CDATA[--simulate-wb=<yes|no> [default: no] ]]></option>
 972     </term>
 973     <listitem>
 974       <para>Specify whether write-back behavior should be simulated, which
 975       allows distinguishing LL cache misses with and without write-backs.
 976       The cache model of Cachegrind/Callgrind does not specify write-through
 977       vs. write-back behavior, and this also is not relevant for the number
 978       of generated miss counts. However, with explicit write-back simulation
 979       it can be decided whether a miss triggers not only the loading of a new
 980       cache line, but also if a write back of a dirty cache line had to take
 981       place before. The new dirty miss events are ILdmr, DLdmr, and DLdmw,
 982       for misses because of instruction read, data read, and data write,
 983       respectively. As they produce two memory transactions, they should
 984       account for a doubled time estimation in relation to a normal miss.
 985       </para>
 986     </listitem>
 987   </varlistentry>
 988
 989   <varlistentry id="opt.simulate-hwpref" xreflabel="--simulate-hwpref">
 990     <term>
 991       <option><![CDATA[--simulate-hwpref=<yes|no> [default: no] ]]></option>
 992     </term>
 993     <listitem>
 994       <para>Specify whether simulation of a hardware prefetcher should be
 995       added which is able to detect stream access in the second level cache
 996       by comparing accesses separately for each page.
 997       As the simulation cannot decide about any timing issues of prefetching,
 998       it is assumed that any hardware prefetch triggered succeeds before a
 999       real access is done. Thus, this gives a best-case scenario by covering
1000       all possible stream accesses.</para>
1001     </listitem>
1002   </varlistentry>
1003
1004   <varlistentry id="opt.cacheuse" xreflabel="--cacheuse">
1005     <term>
1006       <option><![CDATA[--cacheuse=<yes|no> [default: no] ]]></option>
1007     </term>
1008     <listitem>
1009       <para>Specify whether cache line use should be collected. For every
1010       cache line, from loading to it being evicted, the number of accesses
1011       as well as the number of actually used bytes is determined. This
1012       behavior is related to the code which triggered loading of the cache
1013       line. In contrast to miss counters, which show the position where
1014       the symptoms of bad cache behavior (i.e. latencies) happen, the
1015       use counters try to pinpoint the reason (i.e. the code with the
1016       bad access behavior). The new counters are defined in a way such
1017       that worse behavior results in higher cost.
1018       AcCost1 and AcCost2 are counters showing bad temporal locality
1019       for L1 and LL caches, respectively. This is done by summing up
1020       reciprocal values of the numbers of accesses of each cache line,
1021       multiplied by 1000 (as only integer costs are allowed). E.g. for
1022       a given source line with 5 read accesses, a value of 5000 AcCost
1023       means that for every access, a new cache line was loaded and directly
1024       evicted afterwards without further accesses. Similarly, SpLoss1/2
1025       show bad spatial locality for L1 and LL caches, respectively. They
1026       give the <emphasis>spatial loss</emphasis> count of bytes which
1027       were loaded into cache but never accessed. This pinpoints code
1028       accessing data in a way such that cache space is wasted. This hints
1029       at bad layout of data structures in memory. Assuming a cache line
1030       size of 64 bytes and 100 L1 misses for a given source line, the
1031       loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a
1032       value of 3200 for this line, this means that half of the loaded data was
1033       never used, or that, with a better data layout, only half of the cache
1034       space would have been needed.
1035       Please note that for cache line use counters, it currently is
1036       not possible to provide meaningful inclusive costs. Therefore,
1037       inclusive cost of these counters should be ignored.
1038       </para>
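      <para>The arithmetic behind these counters can be illustrated with a
      small C sketch. It is not Callgrind's implementation: the names
      <computeroutput>line_use</computeroutput>,
      <function>ac_cost</function> and <function>sp_loss</function> are
      hypothetical, and integer division is assumed for the reciprocal
      values.</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one cache line's lifetime, from load to eviction. */
struct line_use {
    unsigned accesses;    /* number of accesses while the line was cached */
    unsigned bytes_used;  /* bytes of the line that were actually accessed */
};

/* Temporal locality cost: sum of 1000/accesses over all loaded lines. */
unsigned long ac_cost(const struct line_use *lines, size_t n)
{
    unsigned long cost = 0;
    for (size_t i = 0; i < n; i++) {
        /* Assumption of this sketch: every loaded line has >= 1 access. */
        assert(lines[i].accesses > 0);
        cost += 1000UL / lines[i].accesses;
    }
    return cost;
}

/* Spatial loss: bytes loaded into the cache but never accessed. */
unsigned long sp_loss(const struct line_use *lines, size_t n,
                      unsigned line_size)
{
    unsigned long loss = 0;
    for (size_t i = 0; i < n; i++) {
        assert(lines[i].bytes_used <= line_size);
        loss += line_size - lines[i].bytes_used;
    }
    return loss;
}
```
]]></programlisting>
      <para>For five 64-byte lines that were each accessed once and of which
      only 32 bytes were used, this sketch gives an access cost of 5000 and
      a spatial loss of 160 bytes.</para>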
1039     </listitem>
1040   </varlistentry>
1041
1042   <varlistentry id="opt.I1" xreflabel="--I1">
1043     <term>
1044       <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
1045     </term>
1046     <listitem>
1047       <para>Specify the size, associativity and line size of the level 1
1048       instruction cache.  </para>
1049     </listitem>
1050   </varlistentry>
1051
1052   <varlistentry id="opt.D1" xreflabel="--D1">
1053     <term>
1054       <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
1055     </term>
1056     <listitem>
1057       <para>Specify the size, associativity and line size of the level 1
1058       data cache.</para>
1059     </listitem>
1060   </varlistentry>
1061
1062   <varlistentry id="opt.LL" xreflabel="--LL">
1063     <term>
1064       <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
1065     </term>
1066     <listitem>
1067       <para>Specify the size, associativity and line size of the last-level
1068       cache.</para>
1069     </listitem>
1070   </varlistentry>
1071 </variablelist>
1072 <!-- end of xi:include in the manpage -->
1073
1074 </sect2>
1075
1076 </sect1>
1077
1078 <sect1 id="cl-manual.monitor-commands" xreflabel="Callgrind Monitor Commands">
1079 <title>Callgrind Monitor Commands</title>
1080 <para>The Callgrind tool provides monitor commands handled by the Valgrind
1081 gdbserver (see <xref linkend="manual-core.gdbserver-commandhandling"/>).
1082 </para>
1083
1084 <itemizedlist>
1085   <listitem>
1086     <para><varname>ct.dump [&lt;dump_hint&gt;]</varname> requests to dump the
1087     profile data. </para>
1088   </listitem>
1089
1090   <listitem>
1091     <para><varname>ct.zero</varname> requests to zero the profile data
1092     counters. </para>
1093   </listitem>
1094
1095   <listitem>
1096     <para>It would be nice to have some more Callgrind monitor
1097     commands, such as toggle collect and start instrumentation.
1098     </para>
1099   </listitem>
1100
1101 </itemizedlist>
1102 </sect1>
1103
1104 <sect1 id="cl-manual.clientrequests" xreflabel="Client request reference">
1105 <title>Callgrind specific client requests</title>
1106
1107 <para>Callgrind provides the following specific client requests in
1108 <filename>callgrind.h</filename>.  See that file for the exact details of
1109 their arguments.</para>
1110
1111 <variablelist id="cl.clientrequests.list">
1112
1113   <varlistentry id="cr.dump-stats" xreflabel="CALLGRIND_DUMP_STATS">
1114     <term>
1115       <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>
1116     </term>
1117     <listitem>
1118       <para>Force generation of a profile dump at the specified position
1119       in code, for the current thread only. Written counters will be reset
1120       to zero.</para>
1121     </listitem>
1122   </varlistentry>
1123
1124   <varlistentry id="cr.dump-stats-at" xreflabel="CALLGRIND_DUMP_STATS_AT">
1125     <term>
1126       <computeroutput>CALLGRIND_DUMP_STATS_AT(string)</computeroutput>
1127     </term>
1128     <listitem>
1129       <para>Same as <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>,
1130       but allows a string to be specified to distinguish profile
1131       dumps.</para>
1132     </listitem>
1133   </varlistentry>
1134
1135   <varlistentry id="cr.zero-stats" xreflabel="CALLGRIND_ZERO_STATS">
1136     <term>
1137       <computeroutput>CALLGRIND_ZERO_STATS</computeroutput>
1138     </term>
1139     <listitem>
1140       <para>Reset the profile counters for the current thread to zero.</para>
1141     </listitem>
1142   </varlistentry>
1143
1144   <varlistentry id="cr.toggle-collect" xreflabel="CALLGRIND_TOGGLE_COLLECT">
1145     <term>
1146       <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput>
1147     </term>
1148     <listitem>
1149       <para>Toggle the collection state. This allows events to be ignored
1150       with regard to profile counters. See also options
1151       <option><xref linkend="opt.collect-atstart"/></option> and
1152       <option><xref linkend="opt.toggle-collect"/></option>.</para>
1153     </listitem>
1154   </varlistentry>
1155
1156   <varlistentry id="cr.start-instr" xreflabel="CALLGRIND_START_INSTRUMENTATION">
1157     <term>
1158       <computeroutput>CALLGRIND_START_INSTRUMENTATION</computeroutput>
1159     </term>
1160     <listitem>
1161       <para>Start full Callgrind instrumentation if not already enabled.
1162       When cache simulation is done, this will flush the simulated cache
1163       and lead to an artificial cache warmup phase afterwards with
1164       cache misses which would not have happened in reality.  See also
1165       option <option><xref linkend="opt.instr-atstart"/></option>.</para>
1166     </listitem>
1167   </varlistentry>
1168
1169   <varlistentry id="cr.stop-instr" xreflabel="CALLGRIND_STOP_INSTRUMENTATION">
1170     <term>
1171       <computeroutput>CALLGRIND_STOP_INSTRUMENTATION</computeroutput>
1172     </term>
1173     <listitem>
1174       <para>Stop full Callgrind instrumentation if not already disabled.
1175       This flushes Valgrind's translation cache, and does no additional
1176       instrumentation afterwards: it effectively will run at the same
1177       speed as Nulgrind, i.e. at minimal slowdown. Use this to
1178       speed up the Callgrind run for uninteresting code parts. Use
1179       <computeroutput><xref linkend="cr.start-instr"/></computeroutput> to
1180       enable instrumentation again.  See also option
1181       <option><xref linkend="opt.instr-atstart"/></option>.</para>
1182     </listitem>
1183   </varlistentry>
1184
1185 </variablelist>
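<para>A minimal sketch of how several of these requests might be combined
is given below. It assumes the header is installed as
<filename>valgrind/callgrind.h</filename> and that the program is started
with <option>--instr-atstart=no</option>; <function>fib</function> and
<function>measure_phase</function> are hypothetical names, and the fallback
macro definitions are only assumptions that let the fragment build where the
header is missing.</para>
<programlisting><![CDATA[
```c
#include <assert.h>

#ifdef __has_include
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_ZERO_STATS
/* Assumed fallbacks so the sketch builds without the Valgrind headers. */
#  define CALLGRIND_ZERO_STATS            ((void)0)
#  define CALLGRIND_DUMP_STATS_AT(s)      ((void)(s))
#  define CALLGRIND_START_INSTRUMENTATION ((void)0)
#  define CALLGRIND_STOP_INSTRUMENTATION  ((void)0)
#endif

/* Hypothetical workload: naive recursive Fibonacci. */
static unsigned long fib(unsigned n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Profile only the call to fib(): enable instrumentation, zero the
   counters, run the workload, dump with a distinguishing string, and
   switch instrumentation off again for the rest of the run. */
unsigned long measure_phase(unsigned n)
{
    unsigned long r;
    assert(n < 40);  /* keep the naive recursion short */
    CALLGRIND_START_INSTRUMENTATION;
    CALLGRIND_ZERO_STATS;
    r = fib(n);
    CALLGRIND_DUMP_STATS_AT("after fib");
    CALLGRIND_STOP_INSTRUMENTATION;
    return r;
}
```
]]></programlisting>
<para>As described above, starting instrumentation in the middle of a run
begins with an empty simulated cache, so cache miss counts for such a short
phase should be read with care.</para>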
1186
1187 </sect1>
1188
1189
1190
1191 <sect1 id="cl-manual.callgrind_annotate-options" xreflabel="callgrind_annotate Command-line Options">
1192 <title>callgrind_annotate Command-line Options</title>
1193
1194 <!-- start of xi:include in the manpage -->
1195 <variablelist id="callgrind_annotate.opts.list">
1196
1197   <varlistentry>
1198     <term><option>-h --help</option></term>
1199     <listitem>
1200       <para>Show summary of options.</para>
1201     </listitem>
1202   </varlistentry>
1203
1204   <varlistentry>
1205     <term><option>--version</option></term>
1206     <listitem>
1207       <para>Show version of callgrind_annotate.</para>
1208     </listitem>
1209   </varlistentry>
1210
1211   <varlistentry>
1212     <term>
1213       <option>--show=A,B,C [default: all]</option>
1214     </term>
1215     <listitem>
1216       <para>Only show figures for events A,B,C.</para>
1217     </listitem>
1218   </varlistentry>
1219
1220   <varlistentry>
1221     <term>
1222       <option>--sort=A,B,C</option>
1223     </term>
1224     <listitem>
1225       <para>Sort columns by events A,B,C [event column order].</para>
1226     </listitem>
1227   </varlistentry>
1228
1229   <varlistentry>
1230     <term>
1231       <option><![CDATA[--threshold=<0--100> [default: 99%] ]]></option>
1232     </term>
1233     <listitem>
1234       <para>Percentage of counts (of primary sort event) we are
1235       interested in.</para>
1236     </listitem>
1237   </varlistentry>
1238
1239   <varlistentry>
1240     <term>
1241       <option><![CDATA[--auto=<yes|no> [default: no] ]]></option>
1242     </term>
1243     <listitem>
1244       <para>Annotate all source files containing functions that helped
1245       reach the event count threshold.</para>
1246     </listitem>
1247   </varlistentry>
1248
1249   <varlistentry>
1250     <term>
1251       <option>--context=N [default: 8] </option>
1252     </term>
1253     <listitem>
1254       <para>Print N lines of context before and after annotated
1255       lines.</para>
1256     </listitem>
1257   </varlistentry>
1258
1259   <varlistentry>
1260     <term>
1261       <option><![CDATA[--inclusive=<yes|no> [default: no] ]]></option>
1262     </term>
1263     <listitem>
1264       <para>Add subroutine costs to function calls.</para>
1265     </listitem>
1266   </varlistentry>
1267
1268   <varlistentry>
1269     <term>
1270       <option><![CDATA[--tree=<none|caller|calling|both> [default: none] ]]></option>
1271     </term>
1272     <listitem>
1273       <para>For each function, print its callers, the functions it calls,
1274       or both.</para>
1275     </listitem>
1276   </varlistentry>
1277
1278   <varlistentry>
1279     <term>
1280       <option><![CDATA[-I, --include=<dir> ]]></option>
1281     </term>
1282     <listitem>
1283       <para>Add <option>dir</option> to the list of directories to search
1284       for source files.</para>
1285   </listitem>
1286   </varlistentry>
1287
1288 </variablelist>
1289 <!-- end of xi:include in the manpage -->
1290
1291
1292 </sect1>
1293
1294
1295
1296
1297 <sect1 id="cl-manual.callgrind_control-options" xreflabel="callgrind_control Command-line Options">
1298 <title>callgrind_control Command-line Options</title>
1299
1300 <para>By default, callgrind_control acts on all programs run by the
1301   current user under Callgrind.  It is possible to limit the actions to
1302   specified Callgrind runs by providing a list of pids or program names as
1303   argument.  The default action is to give some brief information about the
1304   applications being run under Callgrind.</para>
1305
1306 <!-- start of xi:include in the manpage -->
1307 <variablelist id="callgrind_control.opts.list">
1308
1309   <varlistentry>
1310     <term><option>-h --help</option></term>
1311     <listitem>
1312       <para>Show a short description, usage, and summary of options.</para>
1313     </listitem>
1314   </varlistentry>
1315
1316   <varlistentry>
1317     <term><option>--version</option></term>
1318     <listitem>
1319       <para>Show version of callgrind_control.</para>
1320     </listitem>
1321   </varlistentry>
1322
1323   <varlistentry>
1324     <term><option>-l --long</option></term>
1325     <listitem>
1326       <para>Show also the working directory, in addition to the brief
1327       information given by default.
1328       </para>
1329     </listitem>
1330   </varlistentry>
1331
1332   <varlistentry>
1333     <term><option>-s --stat</option></term>
1334     <listitem>
1335       <para>Show statistics information about active Callgrind runs.</para>
1336     </listitem>
1337   </varlistentry>
1338
1339   <varlistentry>
1340     <term><option>-b --back</option></term>
1341     <listitem>
1342       <para>Show stack/back traces of each thread in active Callgrind runs. For
1343       each active function in the stack trace, also the number of invocations
1344       since program start (or last dump) is shown. This option can be
1345       combined with -e to show inclusive cost of active functions.</para>
1346     </listitem>
1347   </varlistentry>
1348
1349   <varlistentry>
1350     <term><option><![CDATA[-e [A,B,...] ]]></option> (default: all)</term>
1351     <listitem>
1352       <para>Show the current per-thread, exclusive cost values of event
1353       counters. If no explicit event names are given, figures for all event
1354       types which are collected in the given Callgrind run are
1355       shown. Otherwise, only figures for event types A, B, ... are shown. If
1356       this option is combined with -b, inclusive cost for the functions of
1357       each active stack frame is provided, too.
1358       </para>
1359     </listitem>
1360   </varlistentry>
1361
1362   <varlistentry>
1363     <term><option><![CDATA[--dump[=<desc>] ]]></option> (default: no description)</term>
1364     <listitem>
1365       <para>Request the dumping of profile information. Optionally, a
1366       description can be specified which is written into the dump as part of
1367       the information giving the reason which triggered the dump action. This
1368       can be used to distinguish multiple dumps.</para>
1369     </listitem>
1370   </varlistentry>
1371
1372   <varlistentry>
1373     <term><option>-z --zero</option></term>
1374     <listitem>
1375       <para>Zero all event counters.</para>
1376     </listitem>
1377   </varlistentry>
1378
1379   <varlistentry>
1380     <term><option>-k --kill</option></term>
1381     <listitem>
1382       <para>Force a Callgrind run to be terminated.</para>
1383     </listitem>
1384   </varlistentry>
1385
1386   <varlistentry>
1387     <term><option><![CDATA[--instr=<on|off>]]></option></term>
1388     <listitem>
1389       <para>Switch instrumentation mode on or off. If a Callgrind run has
1390       instrumentation disabled, no simulation is done and no events are
1391       counted. This is useful to skip uninteresting program parts, as there
1392       is much less slowdown (same as with the Valgrind tool "none"). See also
1393       the Callgrind option <option>--instr-atstart</option>.</para>
1394     </listitem>
1395   </varlistentry>
1396
1397   <varlistentry>
1398     <term><option><![CDATA[-w=<dir>]]></option></term>
1399     <listitem>
1400       <para>Specify the startup directory of an active Callgrind run. On some
1401       systems, active Callgrind runs cannot be detected. To be able to
1402       control these, the failed auto-detection can be worked around by
1403       specifying the directory where a Callgrind run was started.</para>
1404     </listitem>
1405   </varlistentry>
1406 </variablelist>
1407 <!-- end of xi:include in the manpage -->
1408
1409 </sect1>
1410
1411 </chapter>