| <?xml version="1.0"?> <!-- -*- sgml -*- --> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" |
| "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" |
| [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]> |
| |
| <chapter id="cl-format" xreflabel="Callgrind Format Specification"> |
| <title>Callgrind Format Specification</title> |
| |
| <para>This chapter describes the Callgrind Profile Format, Version 1.</para> |
| |
| <para>A synonymous name is "Calltree Profile Format". Both names refer to |
| the same format, since Callgrind was previously named Calltree.</para> |
| |
| <para>The format description is meant to help users understand the |
| file contents; more importantly, it enables authors of measurement or |
| visualization tools to write and read this format.</para> |
| |
| <sect1 id="cl-format.overview" xreflabel="Overview"> |
| <title>Overview</title> |
| |
| <para>The profile data format is ASCII based. |
| It is written by Callgrind, and it is upwards compatible |
| with the format used by Cachegrind (i.e. Cachegrind uses a subset). It can |
| be read by callgrind_annotate and KCachegrind.</para> |
| |
| <para>This chapter gives an overview of format features and examples. |
| For detailed syntax, see the format reference.</para> |
| |
| <sect2 id="cl-format.overview.basics" xreflabel="Basic Structure"> |
| <title>Basic Structure</title> |
| |
| <para>Each file has a header part of an arbitrary number of lines of the |
| format "key: value". The lines with key "positions" and "events" define |
| the meaning of cost lines in the second part of the file: the value of |
| "positions" is a list of subpositions, and the value of "events" is a list |
| of event type names. Cost lines consist of subpositions followed by 64-bit |
| counters for the events, in the order specified by the "positions" and "events" |
| header lines.</para> |
| |
| <para>The "events" header line is always required, in contrast to the optional |
| line for "positions", which defaults to "line", i.e. a line number of some |
| source file. In addition, the second part of the file contains position |
| specifications of the form "spec=name". "spec" can be e.g. "fn" for a |
| function name or "fl" for a file name. Cost lines are always related to |
| the function/file specifications given directly before.</para> |
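<para>As an illustration for tool authors, the split between header and body could be handled as in the following Python sketch. It is not taken from Callgrind or KCachegrind. The helpers <computeroutput>parse_header</computeroutput> and <computeroutput>cost_columns</computeroutput> are hypothetical, and the sketch assumes that the header ends at the first line which is neither empty, a "#" comment, nor of the form "key: value":
<screen><![CDATA[
```python
import re

# A header key is a word directly followed by ":" (e.g. "events:").
# Body lines such as "fl=file.f" or "15 90 14 2" never match this.
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9]*):\s*(.*)$")


def parse_header(lines):
    """Return (header, index of the first body line).

    header maps each key to the list of its values, because some keys
    (e.g. "event:") may occur more than once. Empty lines and "#" comments
    are skipped; the header is assumed to end at the first other line that
    is not of the form "key: value".
    """
    header = {}
    for index, raw in enumerate(lines):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = HEADER_RE.match(line)
        if match is None:
            return header, index
        key, value = match.groups()
        header.setdefault(key, []).append(value)
    return header, len(lines)


def cost_columns(header):
    """Return (subposition names, event names) that describe a cost line."""
    # "positions" is optional and defaults to "line"; "events" is required.
    positions = header.get("positions", ["line"])[-1].split()
    events = header["events"][-1].split()
    return positions, events
```
]]></screen></para>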
| |
| </sect2> |
| |
| <sect2 id="cl-format.overview.example1" xreflabel="Simple Example"> |
| <title>Simple Example</title> |
| |
| <para>The event names in the following example are quite arbitrary, and are not |
| related to event names used by Callgrind. In particular, cycle counts matching |
| real processors will probably never be generated by any Valgrind tool, as these |
| are bound to simulations of simple machine models to keep the slowdown acceptable. |
| However, any profiling tool could use the format described in this chapter.</para> |
| |
| <para> |
| <screen>events: Cycles Instructions Flops |
| fl=file.f |
| fn=main |
| 15 90 14 2 |
| 16 20 12</screen></para> |
| |
| <para>The above example gives profile information for event types "Cycles", |
| "Instructions", and "Flops". Thus, cost lines give the number of elapsed CPU |
| cycles, the number of executed instructions, and the number of floating point |
| operations executed while running code corresponding to some source |
| position. As there is no line specifying the value of "positions", it defaults |
| to "line", which means that the first number of a cost line is always a line |
| number.</para> |
| |
| <para>Thus, the first cost line specifies that in line 15 of source file |
| <filename>file.f</filename> there is code belonging to function |
| <function>main</function>. While running, 90 CPU cycles elapsed, and 2 of |
| the 14 instructions executed were floating point operations. Similarly, the |
| next line specifies that there were 12 instructions executed in the context |
| of function <function>main</function> which can be related to line 16 in |
| file <filename>file.f</filename>, taking 20 CPU cycles. If a cost line |
| specifies fewer event counts than given in the "events" line, the rest are |
| assumed to be zero, i.e. no floating point operations were executed in |
| relation to line 16.</para> |
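<para>The following hypothetical Python helper, <computeroutput>parse_cost_line</computeroutput>, sketches how a reader might split a cost line into subpositions and event counts, padding missing trailing counts with zero as just described. It assumes absolute subpositions only; relative ones are described under <xref linkend="cl-format.overview.compression2"/>:
<screen><![CDATA[
```python
def parse_number(text):
    """Number := HexNumber | Digit+ (hexadecimal numbers start with "0x")."""
    return int(text, 16) if text.startswith("0x") else int(text, 10)


def parse_cost_line(line, num_positions, num_events):
    """Split an absolute cost line into (subpositions, event counts).

    Event counts missing at the end of the line are filled in as zero.
    Relative subpositions ("+n", "-n", "*") are not handled here.
    """
    values = [parse_number(field) for field in line.split()]
    subpositions = values[:num_positions]
    counts = values[num_positions:]
    if len(counts) > num_events:
        raise ValueError("more counts than event types in the 'events:' line")
    counts += [0] * (num_events - len(counts))
    return subpositions, counts
```
]]></screen></para>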
| |
| <para>Note that regular cost lines always give the self (also called exclusive) |
| cost of code at a given position. If you specify multiple cost lines for the |
| same position, these will be summed up. On the other hand, the example above |
| does not specify how many times function |
| <function>main</function> was actually |
| called: profile data only contains sums.</para> |
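<para>A minimal sketch of this summing, assuming a "positions: line" file without name or subposition compression, is shown below. The function <computeroutput>self_costs</computeroutput> is hypothetical. It skips the cost line that follows a "calls=" line (see <xref linkend="cl-format.overview.associations"/>), because that line holds inclusive rather than self cost:
<screen><![CDATA[
```python
from collections import defaultdict


def self_costs(body_lines, num_events):
    """Sum the self cost per (file, function, line) position.

    Simplified sketch for "positions: line" files: it only understands
    "fl=", "fn=", "calls=" and cost lines with absolute decimal line
    numbers, and ignores every other kind of body line.
    """
    totals = defaultdict(lambda: [0] * num_events)
    fl = fn = None
    skip_next = False
    for raw in body_lines:
        line = raw.strip()
        if skip_next:
            # The line after "calls=" holds inclusive call cost, not self cost.
            skip_next = False
        elif line.startswith("calls="):
            skip_next = True
        elif line.startswith("fl="):
            fl = line[3:]
        elif line.startswith("fn="):
            fn = line[3:]
        elif line[:1].isdigit():
            line_number, *counts = (int(f) for f in line.split())
            position = totals[(fl, fn, line_number)]
            # Repeated cost lines for the same position are summed up.
            for i, count in enumerate(counts):
                position[i] += count
    return dict(totals)
```
]]></screen></para>
<para>The result still contains only sums. As noted above, it cannot tell how often a function was called.</para>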
| |
| </sect2> |
| |
| |
| <sect2 id="cl-format.overview.associations" xreflabel="Associations"> |
| <title>Associations</title> |
| |
| <para>The most important extension to the original format of Cachegrind is the |
| ability to specify call relationships among functions. More generally, you |
| specify associations among positions. For this, the second part of the |
| file can also contain association specifications. These look similar to |
| position specifications, but consist of 2 lines. For calls, the format |
| looks like |
| <screen> |
| calls=(Call Count) (Destination position) |
| (Source position) (Inclusive cost of call) |
| </screen></para> |
| |
| <para>The destination only specifies subpositions such as a line number. Therefore, |
| to be able to specify a call to another function in another source file, you |
| have to precede the above lines with a "cfn=" specification for the name of the |
| called function, and a "cfl=" specification if the function is in another |
| source file. The second line looks like a regular cost line, with the difference |
| that it specifies the inclusive cost spent inside the function call.</para> |
| |
| <para>Other associations are for example (conditional) jumps. See the |
| reference below for details.</para> |
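<para>Reading a call association could look like the following Python sketch. The helper <computeroutput>read_calls</computeroutput> is hypothetical and assumes uncompressed names and absolute line numbers. It records only the calling and called function names, the call count, the source line, and the inclusive cost:
<screen><![CDATA[
```python
def read_calls(body_lines):
    """Collect (caller, callee, call count, source line, inclusive costs).

    Sketch for "positions: line" files without name or subposition
    compression; "cfl=" and the destination position are not recorded.
    """
    calls = []
    fn = cfn = None
    lines = iter(body_lines)
    for raw in lines:
        line = raw.strip()
        if line.startswith("fn="):
            fn = line[3:]
        elif line.startswith("cfn="):
            cfn = line[4:]
        elif line.startswith("calls="):
            count = int(line[len("calls="):].split()[0])
            # The line following "calls=" MUST be a cost line: the source
            # position of the call, then the inclusive cost of the call(s).
            source, *costs = (int(f) for f in next(lines).split())
            calls.append((fn, cfn, count, source, costs))
    return calls
```
]]></screen></para>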
| |
| </sect2> |
| |
| |
| <sect2 id="cl-format.overview.example2" xreflabel="Extended Example"> |
| <title>Extended Example</title> |
| |
| <para>The following example shows 3 functions, <function>main</function>, |
| <function>func1</function>, and <function>func2</function>. Function |
| <function>main</function> calls <function>func1</function> once and |
| <function>func2</function> 3 times. <function>func1</function> calls |
| <function>func2</function> 2 times. |
| <screen>events: Instructions |
| |
| fl=file1.c |
| fn=main |
| 16 20 |
| cfn=func1 |
| calls=1 50 |
| 16 400 |
| cfl=file2.c |
| cfn=func2 |
| calls=3 20 |
| 16 400 |
| |
| fn=func1 |
| 51 100 |
| cfl=file2.c |
| cfn=func2 |
| calls=2 20 |
| 51 300 |
| |
| fl=file2.c |
| fn=func2 |
| 20 700</screen></para> |
| |
| <para>One can see that in <function>main</function> only code from line 16 |
| is executed, which is also where the other functions are called. Inclusive cost of |
| <function>main</function> is 820, which is the sum of self cost 20 and costs |
| spent in the calls: 400 for the single call to <function>func1</function> |
| and 400 as the sum for the three calls to <function>func2</function>.</para> |
| |
| <para>Function <function>func1</function> is located in |
| <filename>file1.c</filename>, the same as <function>main</function>. |
| Therefore, a "cfl=" specification for the call to <function>func1</function> |
| is not needed. The function <function>func1</function> only consists of code |
| at line 51 of <filename>file1.c</filename>, where <function>func2</function> |
| is called.</para> |
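<para>Because the cost line after each "calls=" already holds inclusive cost, the inclusive cost of a function can be computed by plain addition, as the following Python sketch shows for the numbers of this example. The function <computeroutput>inclusive_costs</computeroutput> is hypothetical, and the sketch ignores recursion:
<screen><![CDATA[
```python
def inclusive_costs(self_cost, calls):
    """Inclusive cost per function: its self cost plus the inclusive cost
    recorded on the cost lines after each of its "calls=" lines.

    self_cost: {function name: self cost}
    calls: [(caller, callee, inclusive cost of those calls)]
    """
    inclusive = dict(self_cost)
    for caller, _callee, cost in calls:
        inclusive[caller] = inclusive.get(caller, 0) + cost
    return inclusive


# Numbers taken from the extended example above.
self_cost = {"main": 20, "func1": 100, "func2": 700}
calls = [("main", "func1", 400), ("main", "func2", 400), ("func1", "func2", 300)]
print(inclusive_costs(self_cost, calls))
# prints {'main': 820, 'func1': 400, 'func2': 700}
```
]]></screen></para>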
| |
| </sect2> |
| |
| |
| <sect2 id="cl-format.overview.compression1" xreflabel="Name Compression"> |
| <title>Name Compression</title> |
| |
| <para>With the introduction of association specifications such as calls, the |
| same function or file name often has to be specified multiple times. As |
| absolute filenames or symbol names in C++ can be quite long, it is advantageous |
| to be able to specify integer IDs for position specifications. |
| Here, the term "position" corresponds to a file name (source or object file) |
| or function name.</para> |
| |
| <para>To support name compression, a position specification can take not only |
| the form "spec=name", but also "spec=(ID) name" to specify a mapping of an |
| integer ID to a name, and "spec=(ID)" to reference a previously defined ID |
| mapping. There is a separate ID mapping for each kind of position, |
| i.e. you can use ID 1 for both a file name and a symbol name. A call |
| specification shares the mapping of its counterpart: in the example below, |
| an ID defined with "cfn=" is later referenced with "fn=", and likewise |
| for "cfl=" and "fl=".</para> |
| |
| <para>With name compression, the example from <xref linkend="cl-format.overview.example2"/> looks like this: |
| <screen>events: Instructions |
| |
| fl=(1) file1.c |
| fn=(1) main |
| 16 20 |
| cfn=(2) func1 |
| calls=1 50 |
| 16 400 |
| cfl=(2) file2.c |
| cfn=(3) func2 |
| calls=3 20 |
| 16 400 |
| |
| fn=(2) |
| 51 100 |
| cfl=(2) |
| cfn=(3) |
| calls=2 20 |
| 51 300 |
| |
| fl=(2) |
| fn=(3) |
| 20 700</screen></para> |
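<para>On the writer side, the compressed example above could be produced by handing out IDs on first use. The class <computeroutput>NameCompressor</computeroutput> below is a hypothetical Python sketch, not Callgrind's implementation. It covers only file and function names, and it lets "cfl="/"cfn=" share the mappings of "fl="/"fn=", as the example does:
<screen><![CDATA[
```python
class NameCompressor:
    """Hypothetical writer-side helper for name compression.

    The first use of a name yields "spec=(ID) name", later uses only
    "spec=(ID)". File names and function names get independent IDs;
    "cfl="/"cfn=" share the mapping of "fl="/"fn=" as in the example above.
    """

    MAPPING = {"fl": "file", "cfl": "file", "fn": "function", "cfn": "function"}

    def __init__(self):
        self.ids = {"file": {}, "function": {}}

    def spec(self, key, name):
        table = self.ids[self.MAPPING[key]]
        if name in table:
            return f"{key}=({table[name]})"
        table[name] = len(table) + 1   # IDs are handed out as 1, 2, 3, ...
        return f"{key}=({table[name]}) {name}"
```
]]></screen></para>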
| |
| <para>As position specifications carry no information themselves, but only change |
| the meaning of subsequent cost lines or associations, they can appear |
| anywhere in the file without any negative consequence. In particular, you can |
| define name compression mappings directly after the header, and before any cost |
| lines. Thus, the above example can also be written as |
| <screen>events: Instructions |
| |
| # define file ID mapping |
| fl=(1) file1.c |
| fl=(2) file2.c |
| # define function ID mapping |
| fn=(1) main |
| fn=(2) func1 |
| fn=(3) func2 |
| |
| fl=(1) |
| fn=(1) |
| 16 20 |
| ...</screen></para> |
| |
| </sect2> |
| |
| |
| <sect2 id="cl-format.overview.compression2" xreflabel="Subposition Compression"> |
| <title>Subposition Compression</title> |
| |
| <para>If a Callgrind data file should hold costs for each assembler instruction |
| of a program, you specify subposition "instr" in the "positions:" header line, |
| and each cost line has to include the address of some instruction. Addresses |
| are allowed to have a size of 64 bits to support 64-bit architectures. Thus, |
| repeating similar, long addresses for almost every line in the data file can |
| enlarge the file size quite significantly, which |
| motivates subposition compression: instead of every cost line starting with |
| a 16-character address, one is allowed to specify relative addresses. |
| This relative specification is not only allowed for instruction addresses, but |
| also for line numbers; both addresses and line numbers are called "subpositions".</para> |
| |
| <para>A relative subposition is always based on the corresponding subposition |
| of the last cost line, and starts with a "+" to specify a positive difference, |
| a "-" to specify a negative difference, or consists of "*" to specify the same |
| subposition. Because absolute subpositions are always positive (i.e. never |
| prefixed by "-"), any relative specification is unambiguous; additionally, |
| absolute and relative subposition specifications can be mixed freely. |
| Assume the following example (subpositions can always be specified |
| as hexadecimal numbers, beginning with "0x"): |
| <screen>positions: instr line |
| events: ticks |
| |
| fn=func |
| 0x80001234 90 1 |
| 0x80001237 90 5 |
| 0x80001238 91 6</screen></para> |
| |
| <para>With subposition compression, this looks like |
| <screen>positions: instr line |
| events: ticks |
| |
| fn=func |
| 0x80001234 90 1 |
| +3 * 5 |
| +1 +1 6</screen></para> |
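<para>Decoding relative subpositions can be sketched in Python as follows. The helpers are hypothetical; the sketch considers plain cost lines only and assumes that every subposition is 0 before the first cost line:
<screen><![CDATA[
```python
def parse_number(text):
    """Number := HexNumber | Digit+ (hexadecimal numbers start with "0x")."""
    return int(text, 16) if text.startswith("0x") else int(text, 10)


def decode_subpositions(tokens, previous):
    """Resolve subposition tokens against those of the last cost line."""
    result = []
    for token, last in zip(tokens, previous):
        if token == "*":
            result.append(last)                              # same subposition
        elif token.startswith("+"):
            result.append(last + parse_number(token[1:]))    # positive difference
        elif token.startswith("-"):
            result.append(last - parse_number(token[1:]))    # negative difference
        else:
            result.append(parse_number(token))               # absolute value
    return result


def decompress(cost_lines, num_positions):
    """Turn cost lines into (absolute subpositions, event counts) pairs.

    Assumption: all subpositions are 0 before the first cost line.
    """
    previous = [0] * num_positions
    decoded = []
    for line in cost_lines:
        fields = line.split()
        subpositions = decode_subpositions(fields[:num_positions], previous)
        counts = [parse_number(f) for f in fields[num_positions:]]
        decoded.append((subpositions, counts))
        previous = subpositions
    return decoded
```
]]></screen></para>
<para>Applied to the compressed lines above, <computeroutput>decompress</computeroutput> recovers the absolute addresses 0x80001234, 0x80001237 and 0x80001238 with line numbers 90, 90 and 91.</para>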
| |
| <para>Remark: For assembler annotation to work, instruction addresses have to |
| be corrected to correspond to addresses found in the original binary. For example, |
| for relocatable shared objects, a load offset often has to be subtracted.</para> |
| |
| </sect2> |
| |
| |
| <sect2 id="cl-format.overview.misc" xreflabel="Miscellaneous"> |
| <title>Miscellaneous</title> |
| |
| <sect3 id="cl-format.overview.misc.summary" xreflabel="Cost Summary Information"> |
| <title>Cost Summary Information</title> |
| |
| <para>For the visualization to be able to show cost percentages, the total |
| cost of the full run has to be known. Usually, it is assumed that this is the |
| sum of all cost lines in a file. But sometimes, this is not correct. Thus, you |
| can specify a "summary:" line in the header giving the full cost for the |
| profile run. This has another effect: an import filter can show a progress bar |
| while loading a large data file if it knows the cost sum in advance.</para> |
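<para>A viewer might pick the 100% reference as in the following Python sketch: use the "summary:" values when present, otherwise sum the regular cost lines. The helpers are hypothetical. The caller is assumed to pass only self cost lines, since the inclusive cost lines after "calls=" would otherwise be counted twice:
<screen><![CDATA[
```python
def reference_totals(header, self_cost_lines, num_positions, num_events):
    """Per-event totals that a viewer treats as 100%.

    Uses the "summary:" value if present; otherwise sums the event counts
    of the given self cost lines (absolute decimal numbers only).
    """
    if "summary" in header:
        totals = [int(count) for count in header["summary"].split()]
    else:
        totals = [0] * num_events
        for line in self_cost_lines:
            for i, count in enumerate(line.split()[num_positions:]):
                totals[i] += int(count)
    # As for cost lines, missing trailing counts are taken as zero.
    return totals + [0] * (num_events - len(totals))


def percentage(cost, total):
    """Share of a cost in percent; 0.0 if the total is zero."""
    return 100.0 * cost / total if total else 0.0
```
]]></screen></para>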
| |
| </sect3> |
| |
| <sect3 id="cl-format.overview.misc.events" xreflabel="Long Names for Event Types and inherited Types"> |
| <title>Long Names for Event Types and inherited Types</title> |
| |
| <para>Event types for cost lines are specified in the "events:" line with an |
| abbreviated name. For visualization, it makes sense to be able to specify some |
| longer, more descriptive name. For an event type "Ir" which means "Instruction |
| Fetches", this can be specified with the header line |
| <screen>event: Ir : Instruction Fetches |
| events: Ir Dr</screen></para> |
| |
| <para>In this example, "Dr" itself has no long name associated. The order of |
| "event:" lines and the "events:" line is of no importance. Additionally, |
| inherited event types can be introduced for which no raw data is available, but |
| which are calculated from given types. Continuing the last example, you could add |
| <screen>event: Sum = Ir + Dr</screen> |
| to specify an additional event type "Sum", which is calculated by adding costs |
| for "Ir" and "Dr".</para> |
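<para>The following Python sketch shows one way to read "event:" lines and evaluate inherited event types. The helper names are hypothetical. The sketch follows the InheritedExpr rule of <xref linkend="cl-format.reference.grammar"/>, but accepts only decimal factors:
<screen><![CDATA[
```python
import re

# One InheritedExpr term: optional decimal factor, optional "*", then a Name.
TERM_RE = re.compile(r"\s*(?:(\d+)\s*(?:\*\s*)?)?([A-Za-z][A-Za-z0-9]*)\s*")


def parse_inherited(expr):
    """Turn e.g. "Ir + 2 * Dr" into [(1, "Ir"), (2, "Dr")]."""
    terms = []
    for part in expr.split("+"):
        match = TERM_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"bad inherited term: {part!r}")
        factor, name = match.groups()
        terms.append((int(factor) if factor else 1, name))
    return terms


def parse_event_line(value):
    """Split the value of an "event:" line into (name, terms, long name)."""
    definition, _, long_name = value.partition(":")
    name, _, expr = definition.partition("=")
    terms = parse_inherited(expr) if expr.strip() else None
    return name.strip(), terms, long_name.strip() or None


def inherited_cost(terms, costs):
    """Evaluate the terms for one cost line; costs maps event names to counts."""
    return sum(factor * costs.get(name, 0) for factor, name in terms)
```
]]></screen></para>
<para>With "Ir" = 10 and "Dr" = 4 on some cost line, "Sum" evaluates to 14; a weighted definition such as "event: W = Ir + 2 * Dr" would give 18.</para>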
| |
| </sect3> |
| |
| </sect2> |
| |
| </sect1> |
| |
| <sect1 id="cl-format.reference" xreflabel="Reference"> |
| <title>Reference</title> |
| |
| <sect2 id="cl-format.reference.grammar" xreflabel="Grammar"> |
| <title>Grammar</title> |
| |
| <para> |
| <screen>ProfileDataFile := FormatVersion? Creator? PartData*</screen> |
| <screen>FormatVersion := "version:" Space* Number "\n"</screen> |
| <screen>Creator := "creator:" NoNewLineChar* "\n"</screen> |
| <screen>PartData := (HeaderLine "\n")+ (BodyLine "\n")+</screen> |
| <screen>HeaderLine := (empty line) |
| | ('#' NoNewLineChar*) |
| | PartDetail |
| | Description |
| | EventSpecification |
| | CostLineDef</screen> |
| <screen>PartDetail := TargetCommand | TargetID</screen> |
| <screen>TargetCommand := "cmd:" Space* NoNewLineChar*</screen> |
| <screen>TargetID := ("pid"|"thread"|"part") ":" Space* Number</screen> |
| <screen>Description := "desc:" Space* Name Space* ":" NoNewLineChar*</screen> |
| <screen>EventSpecification := "event:" Space* Name InheritedDef? LongNameDef?</screen> |
| <screen>InheritedDef := "=" InheritedExpr</screen> |
| <screen>InheritedExpr := Name |
| | Number Space* ("*" Space*)? Name |
| | InheritedExpr Space* "+" Space* InheritedExpr</screen> |
| <screen>LongNameDef := ":" NoNewLineChar*</screen> |
| <screen>CostLineDef := "events:" Space* Name (Space+ Name)* |
| | "positions:" Space* "instr"? (Space+ "line")?</screen> |
| <screen>BodyLine := (empty line) |
| | ('#' NoNewLineChar*) |
| | CostLine |
| | PositionSpecification |
| | AssociationSpecification</screen> |
| <screen>CostLine := SubPositionList Costs?</screen> |
| <screen>SubPositionList := (SubPosition Space+)+</screen> |
| <screen>SubPosition := Number | "+" Number | "-" Number | "*"</screen> |
| <screen>Costs := (Number Space+)+</screen> |
| <screen>PositionSpecification := Position "=" Space* PositionName</screen> |
| <screen>Position := CostPosition | CalledPosition</screen> |
| <screen>CostPosition := "ob" | "fl" | "fi" | "fe" | "fn"</screen> |
| <screen>CalledPosition := "cob" | "cfl" | "cfn"</screen> |
| <screen>PositionName := ( "(" Number ")" )? (Space* NoNewLineChar* )?</screen> |
| <screen>AssociationSpecification := CallSpecification |
| | JumpSpecification</screen> |
| <screen>CallSpecification := CallLine "\n" CostLine</screen> |
| <screen>CallLine := "calls=" Space* Number Space+ SubPositionList</screen> |
| <screen>JumpSpecification := ...</screen> |
| <screen>Space := " " | "\t"</screen> |
| <screen>Number := HexNumber | (Digit)+</screen> |
| <screen>Digit := "0" | ... | "9"</screen> |
| <screen>HexNumber := "0x" (Digit | HexChar)+</screen> |
| <screen>HexChar := "a" | ... | "f" | "A" | ... | "F"</screen> |
| <screen>Name := Alpha (Digit | Alpha)*</screen> |
| <screen>Alpha := "a" | ... | "z" | "A" | ... | "Z"</screen> |
| <screen>NoNewLineChar := all characters without "\n"</screen> |
| </para> |
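<para>As a rough aid for implementers, the alternatives of BodyLine can be told apart with a few string checks, as in this simplified Python sketch. The function <computeroutput>classify_body_line</computeroutput> is hypothetical and looser than the grammar in places (it accepts relative subpositions in cost columns). It treats "jump=" and "jcnd=" as associations, following the description of body lines below:
<screen><![CDATA[
```python
import re

POSITION_KEYS = {"ob", "fl", "fi", "fe", "fn", "cob", "cfl", "cfn"}
ASSOCIATION_KEYS = {"calls", "jump", "jcnd"}
# SubPosition := Number | "+" Number | "-" Number | "*"
SUBPOSITION_RE = re.compile(r"[+-]?(?:0x[0-9a-fA-F]+|[0-9]+)|\*")


def classify_body_line(line):
    """Name the BodyLine alternative that a line belongs to (simplified)."""
    stripped = line.strip()
    if not stripped:
        return "empty"
    if stripped.startswith("#"):
        return "comment"
    key, sep, _ = stripped.partition("=")
    if sep and key in POSITION_KEYS:
        return "position"
    if sep and key in ASSOCIATION_KEYS:
        return "association"
    # Cost lines: every field looks like a subposition or number. This is
    # looser than the grammar, which only allows plain Numbers as costs.
    if all(SUBPOSITION_RE.fullmatch(field) for field in stripped.split()):
        return "cost"
    raise ValueError(f"unrecognized body line: {line!r}")
```
]]></screen></para>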
| |
| </sect2> |
| |
| <sect2 id="cl-format.reference.header" xreflabel="Description of Header Lines"> |
| <title>Description of Header Lines</title> |
| |
| <para>The header has an arbitrary number of lines of the format |
| "key: value". Possible <emphasis>key</emphasis> values for the header are:</para> |
| |
| <itemizedlist> |
| |
| <listitem> |
| <para><computeroutput>version: number</computeroutput> [Callgrind]</para> |
| <para>This is used to distinguish future profile data formats. A |
| major version of 0 or 1 is supposed to be upwards compatible with |
| Cachegrind's format. It is optional; if it does not appear, version 1 |
| is assumed. Otherwise, this has to be the first header line.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>pid: process id</computeroutput> [Callgrind]</para> |
| <para>This specifies the process ID of the supervised application |
| for which this profile was generated.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>cmd: program name + args</computeroutput> [Cachegrind]</para> |
| <para>This specifies the full command line of the supervised |
| application for which this profile was generated.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>part: number</computeroutput> [Callgrind]</para> |
| <para>This specifies a sequentially incremented number for each dump |
| generated, starting at 1.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>desc: type: value</computeroutput> [Cachegrind]</para> |
| <para>This specifies various information for this dump. For some |
| types, the semantics are defined, but any description type is allowed. |
| Unknown types should be ignored.</para> |
| <para>The types "I1 cache", "D1 cache" and "LL cache" |
| specify parameters used for the cache simulator. These are the only |
| types originally used by Cachegrind. Additionally, Callgrind uses |
| the following types: "Timerange" gives a rough range of the basic |
| block counter, for which the cost of this dump was collected. |
| Type "Trigger" states the reason why this trace was generated, |
| e.g. program termination or a forced interactive dump.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>positions: [instr] [line]</computeroutput> [Callgrind]</para> |
| <para>For cost lines, this defines the semantics of the first numbers. |
| Any combination of "instr", "bb" and "line" is allowed, but has to be |
| in this order, which corresponds to the position numbers at the start of |
| the cost lines later in the file.</para> |
| <para>If "instr" is specified, the position is the address of an |
| instruction whose execution raised the events given later on the |
| line. This address is relative to the offset of the binary/shared |
| library file, so that no relocation information has to be specified. For "line", |
| the position is the line number of a source file, which is |
| responsible for the events raised. Note that the mapping of "instr" |
| and "line" positions is given by the debugging line information |
| produced by the compiler.</para> |
| <para>This field is optional. If not specified, only "line" is |
| assumed.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>events: event type abbreviations</computeroutput> [Cachegrind]</para> |
| <para>A list of short names of the event types logged in this file. |
| The order is the same as in cost lines. The first event type is the |
| second or third number in a cost line, depending on the value of |
| "positions". Callgrind does not add additional cost types. This line |
| has to be given exactly once.</para> |
| <para>Cost types from original Cachegrind are: |
| <itemizedlist> |
| <listitem> |
| <para><command>Ir</command>: Instruction read access</para> |
| </listitem> |
| <listitem> |
| <para><command>I1mr</command>: Instruction Level 1 read cache miss</para> |
| </listitem> |
| <listitem> |
| <para><command>ILmr</command>: Instruction last-level read cache miss</para> |
| </listitem> |
| <listitem> |
| <para>...</para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>summary: costs</computeroutput> [Callgrind]</para> |
| <para><computeroutput>totals: costs</computeroutput> [Cachegrind]</para> |
| <para>The total number of events covered by this trace |
| file. Both keys have the same meaning, but the "totals:" line |
| appears at the end of the file, while "summary:" appears in |
| the header. This was added to allow postprocessing tools to know |
| the total cost in advance. The two lines always give the same cost |
| counts.</para> |
| </listitem> |
| |
| </itemizedlist> |
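<para>A tool writing this format could emit the header as in the following Python sketch. The function <computeroutput>write_header</computeroutput> and its parameters are hypothetical. It writes "version:" first, as required above, and omits "positions:" when only the default "line" would be given:
<screen><![CDATA[
```python
POSITION_ORDER = ["instr", "bb", "line"]


def write_header(events, positions=("line",), summary=None, creator=None):
    """Build header text for a new profile (hypothetical helper).

    "version:" comes first and "creator:" second, as the grammar requires;
    "positions:" is left out when it would only repeat the default "line".
    """
    positions = list(positions)
    if positions != sorted(positions, key=POSITION_ORDER.index):
        raise ValueError("positions must follow the order instr, bb, line")
    lines = ["version: 1"]
    if creator is not None:
        lines.append(f"creator: {creator}")
    if positions != ["line"]:
        lines.append("positions: " + " ".join(positions))
    lines.append("events: " + " ".join(events))
    if summary is not None:
        lines.append("summary: " + " ".join(str(count) for count in summary))
    return "\n".join(lines) + "\n"
```
]]></screen></para>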
| |
| </sect2> |
| |
| <sect2 id="cl-format.reference.body" xreflabel="Description of Body Lines"> |
| <title>Description of Body Lines</title> |
| |
| <para>Body lines of the form |
| <computeroutput>spec=position</computeroutput> specify positions. The values for position |
| specifications are arbitrary strings. When starting with "(" and a |
| digit, it's a string in compressed format. Otherwise it's the real |
| position string. This allows for file and symbol names as position |
| strings, as these never start with "(" + <emphasis>digit</emphasis>. |
| The compressed format is either "(" <emphasis>number</emphasis> ")" |
| <emphasis>space</emphasis> <emphasis>position</emphasis> or only |
| "(" <emphasis>number</emphasis> ")". The first form relates |
| <emphasis>position</emphasis> to <emphasis>number</emphasis> in the |
| context of the given position specification from this line to the end of |
| the file; it makes (<emphasis>number</emphasis>) an alias for |
| <emphasis>position</emphasis>. The second form refers to the position |
| previously associated with <emphasis>number</emphasis>. Compressed format is always |
| optional.</para> |
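<para>On the reading side, the rule above can be sketched in Python as follows. The class <computeroutput>PositionNames</computeroutput> is hypothetical. The caller chooses which specifications share a table (for example one table for "fl=" and "cfl="), and IDs are assumed to be decimal:
<screen><![CDATA[
```python
import re

# "(" Number ")" followed by an optional name; only decimal IDs are handled.
COMPRESSED_RE = re.compile(r"\((\d+)\)\s*(.*)")


class PositionNames:
    """Resolve possibly compressed position strings (hypothetical helper)."""

    def __init__(self):
        self.tables = {}   # one ID table per mapping chosen by the caller

    def resolve(self, mapping, value):
        match = COMPRESSED_RE.fullmatch(value)
        if match is None:
            return value                    # not "(" + digit: the real name
        table = self.tables.setdefault(mapping, {})
        number, name = int(match.group(1)), match.group(2)
        if name:
            table[number] = name            # "(number) position" defines an alias
            return name
        return table[number]                # "(number)" uses an earlier alias
```
]]></screen></para>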
| |
| <para>Position specifications allowed:</para> |
| <itemizedlist> |
| |
| <listitem> |
| <para><computeroutput>ob=</computeroutput> [Callgrind]</para> |
| <para>The ELF object where the cost of the next cost lines happens.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>fl=</computeroutput> [Cachegrind]</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>fi=</computeroutput> [Cachegrind]</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>fe=</computeroutput> [Cachegrind]</para> |
| <para>The source file including the code which is responsible for |
| the cost of the next cost lines. "fi="/"fe=" is used when the source |
| file changes inside of a function, i.e. for inlined code.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>fn=</computeroutput> [Cachegrind]</para> |
| <para>The name of the function where the cost of the next cost lines |
| happens.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>cob=</computeroutput> [Callgrind]</para> |
| <para>The ELF object of the target of the next call cost lines.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>cfl=</computeroutput> [Callgrind]</para> |
| <para>The source file including the code of the target of the |
| next call cost lines.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>cfn=</computeroutput> [Callgrind]</para> |
| <para>The name of the target function of the next call cost |
| lines.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>calls=</computeroutput> [Callgrind]</para> |
| <para>The number of nonrecursive calls which are responsible for the |
| cost specified by the next call cost line.</para> |
| <para>After "calls=" there MUST be a cost line giving the inclusive cost |
| spent in the called function. Its first number is the source line |
| from which the call happened.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>jump=count target position</computeroutput> [Callgrind]</para> |
| <para>Unconditional jump, executed count times, to the given target |
| position.</para> |
| </listitem> |
| |
| <listitem> |
| <para><computeroutput>jcnd=exe.count jumpcount target position</computeroutput> [Callgrind]</para> |
| <para>Conditional jump, executed exe.count times with jumpcount |
| jumps to the given target position.</para> |
| </listitem> |
| |
| </itemizedlist> |
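<para>The jump lines could be parsed as in the following Python sketch. The function <computeroutput>parse_jump</computeroutput> and its result keys are hypothetical. The target position is kept as raw subposition tokens, and the number of non-taken conditional jumps is derived as exe.count minus jumpcount. Since the description above does not say whether further lines follow a jump line, only the jump line itself is handled:
<screen><![CDATA[
```python
def parse_jump(line):
    """Parse a "jump=" or "jcnd=" line (result keys are hypothetical)."""
    key, _, rest = line.strip().partition("=")
    fields = rest.split()
    if key == "jump":
        # jump=count target position
        return {"kind": "jump", "executed": int(fields[0]), "target": fields[1:]}
    if key == "jcnd":
        # jcnd=exe.count jumpcount target position
        executed, jumped = int(fields[0]), int(fields[1])
        return {"kind": "jcnd", "executed": executed, "jumped": jumped,
                "not_taken": executed - jumped, "target": fields[2:]}
    raise ValueError(f"not a jump line: {line!r}")
```
]]></screen></para>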
| |
| </sect2> |
| |
| </sect1> |
| |
| </chapter> |