| <?xml version="1.0" encoding="ISO-8859-1"?> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> |
| <title>OProfile manual</title> |
| <meta name="generator" content="DocBook XSL Stylesheets V1.69.1" /> |
| </head> |
| <body> |
| <div class="book" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h1 class="title"><a id="oprofile-guide"></a>OProfile manual</h1> |
| </div> |
| <div> |
| <div class="authorgroup"> |
| <div class="author"> |
| <h3 class="author"><span class="firstname">John</span> <span class="surname">Levon</span></h3> |
| <div class="affiliation"> |
| <div class="address"> |
| <p> |
| <code class="email">&lt;<a href="mailto:levon@movementarian.org">levon@movementarian.org</a>&gt;</code> |
| </p> |
| </div> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div> |
| <p class="copyright">Copyright © 2000-2004 Victoria University of Manchester, John Levon and others</p> |
| </div> |
| </div> |
| <hr /> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="chapter"> |
| <a href="#introduction">1. Introduction</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#applications">1. Applications of OProfile</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#requirements">2. System requirements</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#resources">3. Internet resources</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#install">4. Installation</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#uninstall">5. Uninstalling OProfile</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#overview">2. Overview</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#getting-started">1. Getting started</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#tools-overview">2. Tools summary</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#controlling">3. Controlling the profiler</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#controlling-daemon">1. Using <span><strong class="command">opcontrol</strong></span></a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opcontrolexamples">1.1. Examples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#eventspec">1.2. Specifying performance counter events</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#setup-jit">2. Setting up the JIT profiling feature</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#setup-jit-jvm">2.1. JVM instrumentation</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#oprofile-gui">3. Using <span><strong class="command">oprof_start</strong></span></a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#detailed-parameters">4. Configuration details</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#hardware-counters">4.1. Hardware performance counters</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#rtc">4.2. OProfile in RTC mode</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#timer">4.3. OProfile in timer interrupt mode</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#p4">4.4. Pentium 4 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#ia64">4.5. Intel Itanium 2 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#ppc64">4.6. PowerPC64 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#cell-be">4.7. Cell Broadband Engine support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#misuse">4.9. Dangerous counter settings</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#results">4. Obtaining results</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#profile-spec">1. Profile specifications</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#profile-spec-examples">1.1. Examples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#profile-spec-details">1.2. Profile specification parameters</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#no-results">1.4. What to do when you don't get any results</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opreport">2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-merging">2.1. Merging separate profiles</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-comparison">2.2. Side-by-side multiple results</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-callgraph">2.3. Callgraph output</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-diff">2.4. Differential profiles with <span><strong class="command">opreport</strong></span></a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-anon">2.5. Anonymous executable mappings</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-xml">2.6. XML formatted output</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-options">2.7. Options for <span><strong class="command">opreport</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opannotate">3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opannotate-finding-source">3.1. Locating source files</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opannotate-details">3.2. Usage of <span><strong class="command">opannotate</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#getting-jit-reports">4. OProfile results with JIT samples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#opgprof">5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opgprof-details">5.1. Usage of <span><strong class="command">opgprof</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#oparchive">6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#oparchive-details">6.1. Usage of <span><strong class="command">oparchive</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opimport">7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opimport-details">7.1. Usage of <span><strong class="command">opimport</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#interpreting">5. Interpreting profiling results</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#irq-latency">1. Profiling interrupt latency</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#kernel-profiling">2. Kernel profiling</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#irq-masking">2.1. Interrupt masking</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#idle">2.2. Idle time</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#kernel-modules">2.3. Profiling kernel modules</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#debug-info">4. Inaccuracies in annotated source</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#effect-of-optimizations">4.1. Side effects of optimizations</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#prologues">4.2. Prologues and epilogues</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#inlined-function">4.3. Inlined functions</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#symbol-without-debug-info">5. Assembly functions</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#hidden-cost">7. Other discrepancies</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="chapter"> |
| <a href="#ack">6. Acknowledgments</a> |
| </span> |
| </dt> |
| </dl> |
| </div> |
| <div class="chapter" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="introduction"></a>Chapter 1. Introduction</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#applications">1. Applications of OProfile</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#jitsupport">1.1. Support for dynamically compiled (JIT) code</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#requirements">2. System requirements</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#resources">3. Internet resources</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#install">4. Installation</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#uninstall">5. Uninstalling OProfile</a> |
| </span> |
| </dt> |
| </dl> |
| </div> |
| <p> |
| This manual applies to OProfile version 0.9.6. |
| OProfile is a profiling system for Linux 2.2/2.4/2.6 systems on a number of architectures. It is capable of profiling |
| all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries |
| to binaries. It runs transparently in the background, collecting information at a low overhead. These |
| features make it ideal for profiling entire systems to find bottlenecks under real-world workloads. |
| </p> |
| <p> |
| Many CPUs provide "performance counters", hardware registers that can count "events" such as |
| cache misses or CPU cycles. OProfile builds profiles of code from the number of such events that occur: |
| each time a certain (configurable) number of events has occurred, the PC (program counter) value is recorded. |
| These samples are aggregated into profiles for each binary image.</p> |
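| <p> |
| The sampling scheme described above can be sketched as a toy model. This is not OProfile code: the |
| <code class="code">sample_events</code> function, the "PC IMAGE" input format and the image names are all |
| assumptions made for illustration. Every N-th event is recorded as a sample, and the samples are then counted per binary image : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
```shell
# Toy model of event-based sampling (not OProfile code).
# stdin: one line per hardware event, "PC IMAGE" (hypothetical format)
# $1: sampling interval; every $1-th event is recorded as a sample
sample_events() {
    awk -v interval="$1" '
        NR % interval == 0 { samples[$2]++ }   # counter overflow: record this event
        END { for (image in samples) print image, samples[image] }
    ' | sort
}

# Six events with an interval of 2: events 2, 4 and 6 become samples.
printf '%s\n' \
    "0x1000 libc.so" "0x1004 libc.so" "0x2000 vmlinux" \
    "0x1008 libc.so" "0x2004 vmlinux" "0x2008 vmlinux" |
    sample_events 2
```
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| With the sample input, two samples are attributed to libc.so and one to vmlinux. A smaller interval gives |
| a more detailed profile at the cost of recording more samples. |
| </p> |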
| <p> |
| Some hardware setups do not allow OProfile to use performance counters: in these cases, no |
| events are available, and OProfile operates in timer/RTC mode, as described in later chapters. |
| </p> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="applications"></a>1. Applications of OProfile</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| OProfile is useful in a number of situations. You might want to use OProfile when you : |
| </p> |
| <div class="itemizedlist"> |
| <ul type="disc"> |
| <li> |
| <p>need low overhead</p> |
| </li> |
| <li> |
| <p>cannot use highly intrusive profiling methods</p> |
| </li> |
| <li> |
| <p>need to profile interrupt handlers</p> |
| </li> |
| <li> |
| <p>need to profile an application and its shared libraries</p> |
| </li> |
| <li> |
| <p>need to profile dynamically compiled code of supported virtual machines (see <a href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, “Support for dynamically compiled (JIT) code”</a>)</p> |
| </li> |
| <li> |
| <p>need to capture the performance behaviour of an entire system</p> |
| </li> |
| <li> |
| <p>want to examine hardware effects such as cache misses</p> |
| </li> |
| <li> |
| <p>want detailed source annotation</p> |
| </li> |
| <li> |
| <p>want instruction-level profiles</p> |
| </li> |
| <li> |
| <p>want call-graph profiles</p> |
| </li> |
| </ul> |
| </div> |
| <p> |
| OProfile is not a panacea. OProfile might not be a complete solution when you : |
| </p> |
| <div class="itemizedlist"> |
| <ul type="disc"> |
| <li> |
| <p>require call graph profiles on platforms other than 2.6/x86</p> |
| </li> |
| <li> |
| <p>don't have root permissions</p> |
| </li> |
| <li> |
| <p>require 100% instruction-accurate profiles</p> |
| </li> |
| <li> |
| <p>need function call counts or an interstitial profiling API</p> |
| </li> |
| <li> |
| <p>cannot tolerate any disturbance to the system whatsoever</p> |
| </li> |
| <li> |
| <p>need to profile interpreted or dynamically compiled code of non-supported virtual machines</p> |
| </li> |
| </ul> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="jitsupport"></a>1.1. Support for dynamically compiled (JIT) code</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Older versions of OProfile were not capable of attributing samples to symbols from dynamically |
| compiled code, i.e. "just-in-time (JIT) code". Typical JIT compilers load the JIT code into |
| anonymous memory regions. OProfile reported the samples from such code, but the attribution |
| provided was simply: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">"anon: &lt;tgid&gt;&lt;address range&gt;" </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Due to this limitation, it wasn't possible to profile applications executed by virtual machines (VMs) |
| like the Java Virtual Machine. OProfile now contains an infrastructure to support JITed code. |
| A development library is provided to allow developers |
| to add support for any VM that produces dynamically compiled code (see the <span class="emphasis"><em>OProfile JIT agent |
| developer guide</em></span>). |
| In addition, built-in support is included for the following:</p> |
| <div class="itemizedlist"> |
| <ul type="disc"> |
| <li>JVMTI agent library for Java (1.5 and higher)</li> |
| <li>JVMPI agent library for Java (1.5 and lower)</li> |
| </ul> |
| </div> |
| <p> |
| For information on how to use OProfile's JIT support, see <a href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, “Setting up the JIT profiling feature”</a>. |
| </p> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="requirements"></a>2. System requirements</h2> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term">Linux kernel 2.2/2.4/2.6</span> |
| </dt> |
| <dd> |
| <p> |
| OProfile uses a kernel module that can be compiled for |
| 2.2 kernels (2.2.11 or later) and 2.4 kernels. Kernel 2.4.10 or above is required if you use the |
| boot-time kernel option <code class="option">nosmp</code>. 2.6 kernels are supported with the in-kernel |
| OProfile driver. Note that only 32-bit x86 and IA64 are supported on 2.2/2.4 kernels. |
| </p> |
| <p> |
| 2.6 kernels are strongly recommended. Under 2.4, OProfile may cause system crashes if power |
| management is used, or the BIOS does not correctly deal with local APICs. |
| </p> |
| <p> |
| PPC64 processors (Power4/Power5/PPC970, etc.) require a recent (> 2.6.5) kernel with the line |
| <code class="constant">#define PV_970</code> present in <code class="filename">include/asm-ppc64/processor.h</code>. |
| |
| </p> |
| <p> |
| Profiling the Cell Broadband Engine PowerPC Processing Element (PPE) requires a kernel version |
| of 2.6.18 or more recent. |
| Profiling the Cell Broadband Engine Synergistic Processing Element (SPE) requires a kernel version |
| of 2.6.22 or more recent. Additionally, full support of SPE profiling requires a BFD library |
| from binutils code dated January 2007 or later. To ensure the proper BFD support exists, run |
| the <code class="code">configure</code> utility with <code class="code">--with-target=cell-be</code>. |
| |
| Profiling the Cell Broadband Engine using SPU events requires a kernel version of 2.6.29-rc1 |
| or more recent. |
| |
| </p> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Attempting to profile SPEs with kernel versions older than 2.6.22 may cause the |
| system to crash.</div> |
| <p> |
| </p> |
| <p> |
| Instruction-Based Sampling (IBS) profiling on AMD Family 10h processors requires |
| kernel version 2.6.28-rc2 or later. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">modutils 2.4.6 or above</span> |
| </dt> |
| <dd> |
| <p> |
| modutils 2.4.6 or higher should be installed, although earlier versions work well in almost all |
| cases. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Supported architecture</span> |
| </dt> |
| <dd> |
| <p> |
| For Intel IA32, a CPU with either a P6 generation or Pentium 4 core is |
| required. In marketing terms this translates to anything |
| between an Intel Pentium Pro (not Pentium Classics) and |
| a Pentium 4 / Xeon, including all Celerons. The AMD |
| Athlon, Opteron, Phenom, and Turion CPUs are also supported. Other IA32 |
| CPU types only support the RTC mode of OProfile; please |
| see later in this manual for details. Hyper-threaded Pentium 4s |
| are not supported in 2.4. For 2.4 kernels, the Intel |
| IA-64 CPUs are also supported. For 2.6 kernels, there is additionally |
| support for Alpha processors, MIPS, ARM, x86-64, sparc64, ppc64, AVR32, and, |
| in timer mode, PA-RISC and s390. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Uniprocessor or SMP</span> |
| </dt> |
| <dd> |
| <p> |
| SMP machines are fully supported. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Required libraries</span> |
| </dt> |
| <dd> |
| <p> |
| These libraries are required : <code class="filename">popt</code>, <code class="filename">bfd</code>, |
| <code class="filename">liberty</code> (Debian users: libiberty is provided in the binutils-dev package), <code class="filename">dl</code>, |
| plus the standard C++ libraries. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Required user account</span> |
| </dt> |
| <dd> |
| <p> |
| For secure processing of sample data from JIT virtual machines (e.g., Java), |
| the special user account "oprofile" must exist on the system. The 'configure' |
| and 'make install' operations will print warning messages if this |
| account is not found. If you intend to profile JITed code, you must create |
| a group account named 'oprofile' and then create the 'oprofile' user account, |
| setting the default group to 'oprofile'. A runtime error message is printed to |
| the oprofile daemon log when processing JIT samples if this special user |
| account cannot be found. |
| </p> |
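| <p> |
| The manual does not list the commands for creating this account. As a minimal sketch, assuming a system that |
| provides <code class="code">getent</code>, <code class="code">groupadd</code> and <code class="code">useradd</code> |
| (other distributions may use different tools), the hypothetical helper below prints the commands that are still needed |
| rather than running them, so that they can be reviewed and then run as root : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
```shell
# Print (but do not run) the commands needed to create the 'oprofile'
# group and user. The helper name is hypothetical; review the output and
# run the printed commands as root.
oprofile_account_commands() {
    # getent exits non-zero when the requested entry does not exist
    if ! group_entry=$(getent group oprofile); then
        echo "groupadd oprofile"
    fi
    if ! user_entry=$(getent passwd oprofile); then
        # -g sets the default (primary) group of the new account
        echo "useradd -g oprofile oprofile"
    fi
}

oprofile_account_commands
```
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| If both the group and the user already exist, the helper prints nothing. |
| </p> |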
| </dd> |
| <dt> |
| <span class="term">OProfile GUI</span> |
| </dt> |
| <dd> |
| <p> |
| The use of the GUI to start the profiler requires the <code class="filename">Qt 2</code> library. <code class="filename">Qt 3</code> should |
| also work. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <span class="acronym">ELF</span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Probably not too strenuous a requirement, but older <span class="acronym">A.OUT</span> binaries/libraries are not supported. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">K&R coding style</span> |
| </dt> |
| <dd> |
| <p> |
| OK, so it's not really a requirement, but I wish it was... |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="resources"></a>3. Internet resources</h2> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term">Web page</span> |
| </dt> |
| <dd> |
| <p> |
| There is a web page (which you may be reading now) at |
| <a href="http://oprofile.sf.net/">http://oprofile.sf.net/</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Download</span> |
| </dt> |
| <dd> |
| <p> |
| You can download a source tarball or get anonymous CVS at the sourceforge page, |
| <a href="http://sf.net/projects/oprofile/">http://sf.net/projects/oprofile/</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Mailing list</span> |
| </dt> |
| <dd> |
| <p> |
| There is a low-traffic OProfile-specific mailing list, details at |
| <a href="http://sf.net/mail/?group_id=16191">http://sf.net/mail/?group_id=16191</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">Bug tracker</span> |
| </dt> |
| <dd> |
| <p> |
| There is a bug tracker for OProfile at SourceForge, |
| <a href="http://sf.net/tracker/?group_id=16191&amp;atid=116191">http://sf.net/tracker/?group_id=16191&amp;atid=116191</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">IRC channel</span> |
| </dt> |
| <dd> |
| <p> |
| Several OProfile developers and users sometimes hang out on channel <span><strong class="command">#oprofile</strong></span> |
| on the <a href="http://oftc.net">OFTC</a> network. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="install"></a>4. Installation</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| First you need to build OProfile and install it. <span><strong class="command">./configure</strong></span>, <span><strong class="command">make</strong></span>, <span><strong class="command">make install</strong></span> |
| is often all you need, but note these arguments to <span><strong class="command">./configure</strong></span> : |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--with-linux</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Use this option to specify the location of the kernel source tree you wish |
| to compile against. The kernel module is built against this source and |
| will only work with a running kernel built from the same source with |
| exactly the same options, so it is important that you specify this option if you need |
| to. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--with-java</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Use this option if you need to profile Java applications. Also, see |
| <a href="#requirements" title="2. System requirements">Section 2, “System requirements”</a>, "Required user account". This option |
| is used to specify the location of the Java Development Kit (JDK) |
| source tree you wish to use. This is necessary to get the interface description |
| of the JVMPI (or JVMTI) interface to compile the JIT support code successfully. |
| </p> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| The Java Runtime Environment (JRE) does not include the development |
| files that are required to compile the JIT support code, so the full |
| JDK must be installed in order to use this option. |
| </p> |
| </div> |
| <p> |
| By default, the OProfile JIT support libraries will be installed in |
| <code class="filename">&lt;oprof_install_dir&gt;/lib/oprofile</code>. To build |
| and install OProfile and the JIT support libraries as 64-bit, you can |
| do something like the following: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # CFLAGS="-m64" CXXFLAGS="-m64" ./configure \ |
| --with-kernel-support --with-java={my_jdk_installdir} \ |
| --libdir=/usr/local/lib64 |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| </p> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| If you encounter errors building 64-bit, you should |
| install libtool 1.5.26 or later since that release of |
| libtool fixes known problems for certain platforms. |
| If you install libtool into a non-standard location, |
| you'll need to edit the invocation of 'aclocal' in |
| OProfile's autogen.sh as follows (assume an install |
| location of /usr/local): |
| </p> |
| <p> |
| <code class="code">aclocal -I m4 -I /usr/local/share/aclocal</code> |
| </p> |
| </div> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--with-kernel-support</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Use this option with kernels 2.6 and above to indicate that the |
| kernel provides the OProfile device driver. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--with-qt-dir/includes/libraries</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Specify the location of Qt headers and libraries. It defaults to searching in |
| <code class="constant">$QTDIR</code> if these are not specified. |
| </p> |
| </dd> |
| <dt> |
| <a id="disable-werror"></a> |
| <span class="term"> |
| <code class="option">--disable-werror</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Development versions of OProfile build by |
| default with <code class="option">-Werror</code>. This option turns |
| <code class="option">-Werror</code> off. |
| </p> |
| </dd> |
| <dt> |
| <a id="disable-optimization"></a> |
| <span class="term"> |
| <code class="option">--disable-optimization</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Disable the <code class="option">-O2</code> compiler flag |
| (useful if you discover an OProfile bug and want to provide a meaningful |
| backtrace). |
| </p> |
| </dd> |
| </dl> |
| </div> |
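| <p> |
| The choice between <code class="option">--with-kernel-support</code> and <code class="option">--with-linux</code> |
| depends on the kernel version. The sketch below is one way to automate that choice; the function name, the |
| default source path <code class="filename">/usr/src/linux</code> and the <code class="code">--with-linux=DIR</code> |
| spelling (the usual autoconf form) are assumptions, not something the OProfile build system provides : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
```shell
# Suggest the kernel-related ./configure option for a given kernel release.
# $1: kernel release, for example the output of `uname -r`
# $2: kernel source tree for 2.2/2.4 kernels (the default path is an assumption)
configure_kernel_option() {
    release=$1
    source_tree=${2:-/usr/src/linux}
    major=${release%%.*}        # digits before the first dot
    rest=${release#*.}          # everything after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    in_kernel_driver=no
    if [ "$major" -gt 2 ]; then
        in_kernel_driver=yes
    elif [ "$major" -eq 2 ]; then
        if [ "$minor" -ge 6 ]; then
            in_kernel_driver=yes
        fi
    fi
    if [ "$in_kernel_driver" = yes ]; then
        echo "--with-kernel-support"    # 2.6 and above: driver is in the kernel
    else
        echo "--with-linux=$source_tree"   # 2.2/2.4: build the module from source
    fi
}

configure_kernel_option "$(uname -r)"
```
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The printed option can be passed to <span><strong class="command">./configure</strong></span> together with any of the |
| other options listed above. The sketch only inspects the version number; it does not check that the OProfile driver |
| is actually enabled in the running kernel. |
| </p> |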
| <p> |
| To build the module for 2.4 kernels, you need a configured kernel source tree |
| for the running kernel. Because every distribution ships a different kernel, the running kernel is unlikely to match a configured source tree |
| that you installed separately. The safest approach is to compile your own kernel, boot it, and then compile OProfile against it. |
| </p> |
| <p> |
| If you have a uniprocessor machine, it is also recommended that you enable the local APIC / IO_APIC support for |
| your kernel (this is automatically enabled for SMP kernels). With many BIOSes, on uniprocessor (UP) kernels 2.6.9 and later, enabling local APIC support in the kernel is not sufficient: you must also turn it on explicitly at boot time by passing the "lapic" option to the kernel. |
| </p> |
| <p> |
| On machines with power management, such as laptops, power management |
| must be turned off when using OProfile with 2.4 kernels, because the power management software |
| in the BIOS cannot handle the non-maskable interrupts (NMIs) used by |
| OProfile for data collection. If you use the NMI watchdog, be aware that |
| the watchdog is disabled when profiling starts, and not re-enabled until the |
| OProfile module is removed (or, in 2.6, when OProfile is not running). |
| </p> |
| <p> |
| If you compile OProfile for |
| a 2.2 kernel you must be root to compile the module. If you are using |
| 2.6 kernels or higher, you do not need kernel source, as long as the |
| OProfile driver is enabled; additionally, you should not need to disable |
| power management. |
| </p> |
| <p> |
| Please note that you must save or have available the <code class="filename">vmlinux</code> file |
| generated during a kernel compile, as OProfile needs it (you can use |
| <code class="option">--no-vmlinux</code>, but this will prevent kernel profiling). |
| </p> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="uninstall"></a>5. Uninstalling OProfile</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| You must have the source tree available to uninstall OProfile; a <span><strong class="command">make uninstall</strong></span> will |
| remove all installed files except your configuration file in the directory <code class="filename">~/.oprofile</code>. |
| </p> |
| </div> |
| </div> |
| <div class="chapter" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="overview"></a>Chapter 2. Overview</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#getting-started">1. Getting started</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#tools-overview">2. Tools summary</a> |
| </span> |
| </dt> |
| </dl> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="getting-started"></a>1. Getting started</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| Before you can use OProfile, you must set it up. The minimum setup required for this |
| is to tell OProfile where the <code class="filename">vmlinux</code> file corresponding to the |
| running kernel is, for example : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --vmlinux=/boot/vmlinux-`uname -r`</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| If you don't want to profile the kernel itself, |
| you can tell OProfile you don't have a <code class="filename">vmlinux</code> file : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --no-vmlinux</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Now we are ready to start the daemon (<span><strong class="command">oprofiled</strong></span>) which collects |
| the profile data : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --start</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| When you want to stop profiling, you can do so with : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --shutdown</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Note that unlike <span><strong class="command">gprof</strong></span>, no instrumentation (<code class="option">-pg</code> |
| and <code class="option">-a</code> options to <span><strong class="command">gcc</strong></span>) |
| is necessary. |
| </p> |
| <p> |
| Periodically (or on <span><strong class="command">opcontrol --shutdown</strong></span> or <span><strong class="command">opcontrol --dump</strong></span>) |
| the profile data is written out into the $SESSION_DIR/samples directory (by default at <code class="filename">/var/lib/oprofile/samples</code>). |
| These profile files cover shared libraries, applications, the kernel (vmlinux), and kernel modules. |
| You can clear the profile data (at any time) with <span><strong class="command">opcontrol --reset</strong></span>. |
| </p> |
| <p> |
| To place these sample database files in a specific directory instead of the default location (<code class="filename">/var/lib/oprofile</code>), use the <code class="option">--session-dir=dir</code> option. You must pass the same <code class="option">--session-dir</code> option to later commands so that the tools continue to use this directory; it cannot currently be set through an environment variable. For example: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --no-vmlinux --session-dir=/home/me/tmpsession</pre> |
| </td> |
| </tr> |
| </table> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opcontrol --start --session-dir=/home/me/tmpsession</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| You can get summaries of this data in a number of ways at any time. To get a summary of |
| data across the entire system for all of these profiles, run: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opreport [--session-dir=dir]</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| To get a more detailed summary for a particular image, run something like: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opreport -l /boot/vmlinux-`uname -r`</pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| There are also a number of other ways of presenting the data, as described later in this manual. |
| Note that OProfile will choose a default profiling setup for you. However, there are a number |
| of options you can pass to <span><strong class="command">opcontrol</strong></span> if you need to change something, |
| also detailed later. |
| </p> |
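| <p> |
| The steps in this section can be strung together into one short session. The following is a minimal sketch, not an official script: the <code class="filename">run</code> and <code class="filename">profile_session</code> helpers, the <code class="filename">./my_workload</code> program and the <code class="filename">DRY_RUN</code> variable are hypothetical names introduced for illustration. It passes <code class="option">--session-dir</code> to every command, which is a conservative reading of the text above. By default it only prints the commands it would run. |
| </p> |

```shell
#!/bin/sh
# Sketch: one profiling session kept in a private session directory.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# profile_session SESSION_DIR WORKLOAD [ARGS...]
# Configures OProfile without kernel symbols, profiles the workload,
# shuts the daemon down (which writes out the samples) and prints a summary.
profile_session() {
    session_dir="$1"
    shift
    run opcontrol --no-vmlinux --session-dir="$session_dir"
    run opcontrol --start --session-dir="$session_dir"
    run "$@"
    run opcontrol --shutdown --session-dir="$session_dir"
    run opreport --session-dir="$session_dir"
}

# ./my_workload is a placeholder for the program you want to measure.
profile_session /home/me/tmpsession ./my_workload
```

| <p> |
| Keeping the session directory in a single shell variable avoids the easy mistake of collecting samples in one directory and reporting from another. |
| </p> |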
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="tools-overview"></a>2. Tools summary</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| This section gives a brief description of the available OProfile utilities and their purpose. |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="filename">ophelp</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility lists the available events and short descriptions. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opcontrol</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Used for controlling the OProfile data collection, discussed in <a href="#controlling" title="Chapter 3. Controlling the profiler">Chapter 3, <i>Controlling the profiler</i></a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">agent libraries</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Used by virtual machines (like the Java VM) to record information about JITed code being profiled. See <a href="#setup-jit" title="2. Setting up the JIT profiling feature">Section 2, “Setting up the JIT profiling feature”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opreport</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This is the main tool for retrieving useful profile data, described in |
| <a href="#opreport" title="2. Image summaries and symbol summaries (opreport)">Section 2, “Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opannotate</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility can be used to produce annotated source, assembly or mixed source/assembly. |
| Source level annotation is available only if the application was compiled with |
| debugging symbols. See <a href="#opannotate" title="3. Outputting annotated source (opannotate)">Section 3, “Outputting annotated source (<span><strong class="command">opannotate</strong></span>)”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opgprof</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility can output gprof-style data files for a binary, for use with |
| <span><strong class="command">gprof -p</strong></span>. See <a href="#opgprof" title="5. gprof-compatible output (opgprof)">Section 5, “<span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">oparchive</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility can be used to collect executables, debuginfo, |
| and sample files and copy the files into an archive. |
| The archive is self-contained and can be moved to another |
| machine for further analysis. |
| See <a href="#oparchive" title="6. Archiving measurements (oparchive)">Section 6, “Archiving measurements (<span><strong class="command">oparchive</strong></span>)”</a>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="filename">opimport</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| This utility converts sample database files from a foreign binary format (ABI) to |
| the native format. This is useful only when moving sample files between hosts, |
| for analysis on platforms other than the one used for collection. |
| See <a href="#opimport" title="7. Converting sample database files (opimport)">Section 7, “Converting sample database files (<span><strong class="command">opimport</strong></span>)”</a>. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div class="chapter" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="controlling"></a>Chapter 3. Controlling the profiler</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#controlling-daemon">1. Using <span><strong class="command">opcontrol</strong></span></a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opcontrolexamples">1.1. Examples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#eventspec">1.2. Specifying performance counter events</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#setup-jit">2. Setting up the JIT profiling feature</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#setup-jit-jvm">2.1. JVM instrumentation</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#oprofile-gui">3. Using <span><strong class="command">oprof_start</strong></span></a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#detailed-parameters">4. Configuration details</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#hardware-counters">4.1. Hardware performance counters</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#rtc">4.2. OProfile in RTC mode</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#timer">4.3. OProfile in timer interrupt mode</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#p4">4.4. Pentium 4 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#ia64">4.5. Intel Itanium 2 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#ppc64">4.6. PowerPC64 support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#cell-be">4.7. Cell Broadband Engine support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#amd-ibs-support">4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#misuse">4.9. Dangerous counter settings</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| </dl> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="controlling-daemon"></a>1. Using <span><strong class="command">opcontrol</strong></span></h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| In this section we describe the configuration and control of the profiling system |
| with opcontrol in more depth. |
| The <span><strong class="command">opcontrol</strong></span> script has a default setup, but you |
| can alter this with the options given below. In particular, |
| if your hardware supports performance counters, you can configure them. |
| There are a number of counters (for example, counter 0 and counter 1 |
| on the Pentium III). Each of these counters can be programmed with |
| an event to count, such as cache misses or MMX operations. The event |
| chosen for each counter is reflected in the profile data collected |
| by OProfile: functions and binaries at the top of the profiles reflect |
| that most of the chosen events happened within that code. |
| </p> |
| <p> |
| Additionally, each counter has a "count" value, which determines how |
| detailed the profile is: the lower the value, the more frequently profile |
| samples are taken. A counter can be set to sample only kernel code, only user-space code, |
| or both (the default). Finally, some events have a "unit mask", |
| a value that further restricts the types of event that are counted. |
| The event types and unit masks for your CPU are listed by <span><strong class="command">opcontrol |
| --list-events</strong></span>. |
| </p> |
| <p> |
| The <span><strong class="command">opcontrol</strong></span> script provides the following actions: |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--init</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Loads the OProfile module if required and makes the OProfile driver |
| interface available. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--setup</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Followed by a list of arguments for profiling setup. The arguments are |
| saved in <code class="filename">/root/.oprofile/daemonrc</code>. |
| Giving this option is not necessary; you can just directly pass one |
| of the setup options, e.g. <span><strong class="command">opcontrol --no-vmlinux</strong></span>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--status</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show configuration information. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--start-daemon</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Start the oprofile daemon without starting actual profiling. The profiling |
| can then be started using <code class="option">--start</code>. This is useful for avoiding |
| measuring the cost of daemon startup, as <code class="option">--start</code> is a simple |
| write to a file in oprofilefs. Not available in 2.2/2.4 kernels. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--start</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Start data collection with either arguments provided by <code class="option">--setup</code> |
| or information saved in <code class="filename">/root/.oprofile/daemonrc</code>. Specifying |
| the additional <code class="option">--verbose</code> option makes the daemon generate a lot of debug data |
| while it is running. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--dump</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Force a flush of the collected profiling data to the daemon. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--stop</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Stop data collection (this separate step is not possible with 2.2 or 2.4 kernels). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--shutdown</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Stop data collection and kill the daemon. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--reset</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Clear out data from the current session, but leave saved sessions intact. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--save=</code>session_name</span> |
| </dt> |
| <dd> |
| <p> |
| Save data from current session to session_name. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--deinit</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Shut down the daemon, then unload the OProfile module and oprofilefs. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--list-events</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| List event types and unit masks. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--help</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Generate usage messages. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| <p> |
| There are a number of possible settings, of which only |
| <code class="option">--vmlinux</code> (or <code class="option">--no-vmlinux</code>) |
| is required. These settings are stored in <code class="filename">~/.oprofile/daemonrc</code>. |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"><code class="option">--buffer-size=</code>num</span> |
| </dt> |
| <dd> |
| <p> |
| Number of samples in the kernel buffer. When using a 2.6 kernel, |
| the buffer watershed needs to be adjusted when changing this value. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--buffer-watershed=</code>num</span> |
| </dt> |
| <dd> |
| <p> |
| Set the kernel buffer watershed to num samples (2.6 only). When only |
| buffer-size - buffer-watershed free entries remain in the kernel buffer, the data is |
| flushed to the daemon. The most useful values are in the range [0.25 - 0.5] * buffer-size. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--cpu-buffer-size=</code>num</span> |
| </dt> |
| <dd> |
| <p> |
| Number of samples in the kernel per-CPU buffer (2.6 only). If you |
| profile at a high rate, it can help to increase this value if the log |
| file shows an excessive number of samples lost to CPU buffer overflow. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--event=</code>[eventspec]</span> |
| </dt> |
| <dd> |
| <p> |
| Use the given performance counter event to profile. |
| See <a href="#eventspec" title="1.2. Specifying performance counter events">Section 1.2, “Specifying performance counter events”</a> below. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--session-dir=</code>dir_path</span> |
| </dt> |
| <dd> |
| <p> |
| Create and use the sample database in directory <code class="filename">dir_path</code> instead of |
| the default location (<code class="filename">/var/lib/oprofile</code>). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--separate=</code>[none,lib,kernel,thread,cpu,all]</span> |
| </dt> |
| <dd> |
| <p> |
| By default, every profile is stored in a single file. Thus, for example, |
| samples in the C library are all credited to the <code class="filename">/lib/libc.o</code> |
| profile. However, you can choose to create separate sample files by specifying |
| one of the options below. |
| </p> |
| <div class="informaltable"> |
| <table border="1"> |
| <colgroup> |
| <col /> |
| <col /> |
| </colgroup> |
| <tbody> |
| <tr> |
| <td> |
| <code class="option">none</code> |
| </td> |
| <td>No profile separation (default)</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">lib</code> |
| </td> |
| <td>Create per-application profiles for libraries</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">kernel</code> |
| </td> |
| <td>Create per-application profiles for the kernel and kernel modules</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">thread</code> |
| </td> |
| <td>Create profiles for each thread and each task</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">cpu</code> |
| </td> |
| <td>Create profiles for each CPU</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">all</code> |
| </td> |
| <td>All of the above options</td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <p> |
| Note that <code class="option">--separate=kernel</code> also turns on <code class="option">--separate=lib</code>. |
| |
| When using <code class="option">--separate=kernel</code>, samples in hardware interrupts, soft-irqs, or other |
| asynchronous kernel contexts are credited to the task currently running. This means you will see |
| seemingly nonsense profiles such as <code class="filename">/bin/bash</code> showing samples for the PPP modules, |
| etc. |
| </p> |
| <p> |
| On 2.2/2.4 only kernel threads already started when profiling begins are correctly profiled; |
| newly started kernel thread samples are credited to the vmlinux (kernel) profile. |
| </p> |
| <p> |
| Using <code class="option">--separate=thread</code> creates a lot |
| of sample files if you leave OProfile running for a while; it's most |
| useful when used for short sessions, or when using image filtering. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--callgraph=</code>#depth</span> |
| </dt> |
| <dd> |
| <p> |
| Enable call-graph sample collection with a maximum depth. Use 0 to disable |
| callgraph profiling. NOTE: Callgraph support is available on a limited |
| number of platforms at this time; for example: |
| </p> |
| <p> |
| </p> |
| <div class="itemizedlist"> |
| <ul type="disc"> |
| <li> |
| <p>x86 with recent 2.6 kernel</p> |
| </li> |
| <li> |
| <p>ARM with recent 2.6 kernel</p> |
| </li> |
| <li> |
| <p>PowerPC with 2.6.17 kernel</p> |
| </li> |
| </ul> |
| </div> |
| <p> |
| </p> |
| <p> |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--image=</code>image,[images]|"all"</span> |
| </dt> |
| <dd> |
| <p> |
| Image filtering. If you specify one or more absolute |
| paths to binaries, OProfile will only produce profile results for those |
| binary images. This is useful for restricting the sometimes voluminous |
| output you may get otherwise, especially with |
| <code class="option">--separate=thread</code>. Note that if you are using |
| <code class="option">--separate=lib</code> or |
| <code class="option">--separate=kernel</code>, then if you specify an |
| application binary, the shared libraries and kernel code |
| <span class="emphasis"><em>are</em></span> included. Specify the value |
| "all" to profile everything (the default). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--vmlinux=</code>file</span> |
| </dt> |
| <dd> |
| <p> |
| vmlinux kernel image. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--no-vmlinux</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Use this when you don't have a kernel vmlinux file, and you don't want |
| to profile the kernel. This still counts the total number of kernel samples, |
| but can't give symbol-based results for the kernel or any modules. |
| </p> |
| </dd> |
| </dl> |
| </div> |
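| <p> |
| The guidance for <code class="option">--buffer-watershed</code> above (between 0.25 and 0.5 times the buffer size) can be turned into concrete numbers with shell arithmetic. This is only a sketch: the <code class="filename">watershed_range</code> helper is a hypothetical name and the buffer size is an arbitrary illustrative value, not a recommendation. The final line prints a command rather than running it. |
| </p> |

```shell
#!/bin/sh
# watershed_range BUFFER_SIZE
# Prints the low and high ends of the suggested watershed range,
# i.e. buffer-size / 4 and buffer-size / 2 (integer arithmetic).
watershed_range() {
    size="$1"
    echo "$((size / 4)) $((size / 2))"
}

# Hypothetical buffer size, chosen only for illustration.
BUFFER_SIZE=131072

# Split the two numbers into $1 (low end) and $2 (high end).
set -- $(watershed_range "$BUFFER_SIZE")

# Print the command that would apply the low end of the range.
echo "opcontrol --buffer-size=$BUFFER_SIZE --buffer-watershed=$1"
```

| <p> |
| Recomputing the watershed from the buffer size each time keeps the two settings in step, which is the adjustment the <code class="option">--buffer-size</code> entry asks for on 2.6 kernels. |
| </p> |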
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opcontrolexamples"></a>1.1. Examples</h3> |
| </div> |
| </div> |
| </div> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="examplesperfctr"></a>1.1.1. Intel performance counter setup</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| Here, we have a Pentium III running at 800MHz, and we want to look at where data memory |
| references are happening most, and also get results for CPU time. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000 |
| # opcontrol --vmlinux=/boot/2.6.0/vmlinux |
| # opcontrol --start |
| </pre> |
| </td> |
| </tr> |
| </table> |
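| <p> |
| As a rough worked example of what these count values mean: a counter takes one sample each time its event has occurred "count" times, so <code class="constant">CPU_CLK_UNHALTED:400000</code> on an 800MHz processor gives at most 800000000 / 400000 samples per second. The sketch below assumes the CPU is unhalted for the whole second; the helper name is hypothetical. The rate for <code class="constant">DATA_MEM_REFS:10000</code> cannot be computed this way because it depends on how often the workload references memory. |
| </p> |

```shell
#!/bin/sh
# samples_per_second CLOCK_HZ COUNT
# Upper bound on the sampling rate of a cycle-counting event:
# one sample is taken every COUNT cycles.
samples_per_second() {
    echo "$(( $1 / $2 ))"
}

# 800MHz Pentium III with CPU_CLK_UNHALTED:400000
samples_per_second 800000000 400000
```

| <p> |
| Halving the count doubles this rate, which is why lower count values give a more detailed profile at a higher profiling overhead. |
| </p> |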
| </div> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="examplesrtc"></a>1.1.2. RTC mode</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| Here, we have an Intel laptop without support for performance counters, running a 2.4 kernel. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # ophelp -r |
| CPU with RTC device |
| # opcontrol --vmlinux=/boot/2.4.13/vmlinux --event=RTC_INTERRUPTS:1024 |
| # opcontrol --start |
| </pre> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="examplesstartdaemon"></a>1.1.3. Starting the daemon separately</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| If we're running 2.6 kernels, we can use <code class="option">--start-daemon</code> to avoid |
| the profiler startup affecting results. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --vmlinux=/boot/2.6.0/vmlinux |
| # opcontrol --start-daemon |
| # my_favourite_benchmark --init |
| # opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop |
| </pre> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="exampleseparate"></a>1.1.4. Separate profiles for libraries and the kernel</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| Here, we want to see a profile of the OProfile daemon itself, including when |
| it was running inside the kernel driver, and its use of shared libraries. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux |
| # opcontrol --start |
| # my_favourite_stress_test --run |
| # opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled |
| </pre> |
| </td> |
| </tr> |
| </table> |
| </div> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="examplessessions"></a>1.1.5. Profiling sessions</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| It can often be useful to split up profiling data into several different |
| time periods. For example, you may want to collect data on an application's |
| startup separately from the normal runtime data. You can use the simple |
| command <span><strong class="command">opcontrol --save</strong></span> to do this. For example: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opcontrol --save=blah |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| will create a sub-directory in <code class="filename">$SESSION_DIR/samples</code> containing the samples |
| up to that point (the current session's sample files are moved into this |
| directory). You can then pass this session name as a parameter to the post-profiling |
| analysis tools, to only get data up to the point you named the |
| session. If you do not want to save a session, you can do |
| <span><strong class="command">rm -rf $SESSION_DIR/samples/sessionname</strong></span> or, for the |
| current session, <span><strong class="command">opcontrol --reset</strong></span>. |
| </p> |
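| <p> |
| The startup-versus-runtime split described above might be scripted along the following lines. This is a sketch under stated assumptions: the <code class="filename">run</code> and <code class="filename">save_phase</code> helpers, the session names and the <code class="filename">DRY_RUN</code> variable are hypothetical, and the explicit <code class="option">--dump</code> before each <code class="option">--save</code> is a precaution rather than a documented requirement. By default the script only prints the commands. |
| </p> |

```shell
#!/bin/sh
# Sketch: save startup and steady-state samples as separate sessions.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# save_phase SESSION_NAME
# Flush pending samples, then move the current session's sample files
# into a saved session with the given name.
save_phase() {
    run opcontrol --dump
    run opcontrol --save="$1"
}

run opcontrol --start
echo "# ... application startup happens here ..."
save_phase startup
echo "# ... normal runtime happens here ..."
save_phase runtime
run opcontrol --shutdown
```

| <p> |
| Because saving moves the current sample files away, the samples in the second saved session cover only the period after the first save. |
| </p> |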
| </div> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="eventspec"></a>1.2. Specifying performance counter events</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The <code class="option">--event</code> option to <span><strong class="command">opcontrol</strong></span> |
| takes a specification that indicates how each |
| hardware performance counter should be set up. If you want to |
| revert to OProfile's default setting (<code class="option">--event</code> |
| is strictly optional), use <code class="option">--event=default</code>. Use of this |
| option overrides all previous event selections. |
| </p> |
| <p> |
| You can pass multiple event specifications. OProfile will allocate |
| hardware counters as necessary. Note that some combinations are not |
| allowed by the CPU; running <span><strong class="command">opcontrol --list-events</strong></span> gives the details |
| of each event. The event specification is a colon-separated string |
| of the form <code class="option"><span class="emphasis"><em>name</em></span>:<span class="emphasis"><em>count</em></span>:<span class="emphasis"><em>unitmask</em></span>:<span class="emphasis"><em>kernel</em></span>:<span class="emphasis"><em>user</em></span></code> as described in this table: |
| </p> |
| <div class="informaltable"> |
| <table border="1"> |
| <colgroup> |
| <col /> |
| <col /> |
| </colgroup> |
| <tbody> |
| <tr> |
| <td> |
| <code class="option">name</code> |
| </td> |
| <td>The symbolic event name, e.g. <code class="constant">CPU_CLK_UNHALTED</code></td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">count</code> |
| </td> |
| <td>The counter reset value, e.g. 100000</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">unitmask</code> |
| </td> |
| <td>The unit mask, as given in the events list, e.g. 0x0f</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">kernel</code> |
| </td> |
| <td>Whether to profile kernel code</td> |
| </tr> |
| <tr> |
| <td> |
| <code class="option">user</code> |
| </td> |
| <td>Whether to profile userspace code</td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| <p> |
| The last three values are optional; if you omit them (e.g. <code class="option">--event=DATA_MEM_REFS:30000</code>), |
| they will be set to the default values (a unit mask of 0, and profiling both kernel and |
| userspace code). Note that some events require a unit mask. |
| </p> |
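| <p> |
| To make the colon-separated format concrete, the sketch below assembles event specifications from their fields. The <code class="filename">event_spec</code> helper is a hypothetical name, not part of OProfile, and treating 1 as "yes" and 0 as "no" for the kernel and user fields is an assumption based on the default specifications listed in this section. The last line prints an <span><strong class="command">opcontrol</strong></span> command rather than running it. |
| </p> |

```shell
#!/bin/sh
# event_spec NAME COUNT [UNITMASK [KERNEL [USER]]]
# Joins the given fields with colons; omitted trailing fields are left
# out so that OProfile applies its defaults for them.
event_spec() {
    spec="$1:$2"
    shift 2
    for field in "$@"; do
        spec="$spec:$field"
    done
    echo "$spec"
}

# Only name and count: unit mask, kernel and user take their defaults.
event_spec DATA_MEM_REFS 30000

# All five fields, equivalent to the Pentium III default event.
event_spec CPU_CLK_UNHALTED 100000 0 1 1

# Assumed meaning: kernel=0, user=1 profiles user-space code only.
echo "opcontrol --event=$(event_spec CPU_CLK_UNHALTED 100000 0 0 1)"
```

| <p> |
| Because the fields are positional, a unit mask must be given explicitly whenever the kernel or user field is specified. |
| </p> |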
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| For the PowerPC platforms, all events specified must be in the same group; i.e., the group number |
| appended to the event name (e.g. <code class="constant"><<span class="emphasis"><em>some-event-name</em></span>>_GRP9</code>) must be the same. |
| </p> |
| </div> |
| <p> |
| If OProfile is using RTC mode, and you want to alter the default counter value, |
| you can use something like <code class="option">--event=RTC_INTERRUPTS:2048</code>. Note the last |
| three values here are ignored. |
| If OProfile is using timer-interrupt mode, there is no configuration possible. |
| </p> |
| <p> |
| The table below lists the events selected by default |
| (<code class="option">--event=default</code>) for the various computer architectures: |
| </p> |
| <div class="informaltable"> |
| <table border="1"> |
| <colgroup> |
| <col /> |
| <col /> |
| <col /> |
| </colgroup> |
| <tbody> |
| <tr> |
| <td>Processor</td> |
| <td>cpu_type</td> |
| <td>Default event</td> |
| </tr> |
| <tr> |
| <td>Alpha EV4</td> |
| <td>alpha/ev4</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Alpha EV5</td> |
| <td>alpha/ev5</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Alpha PCA56</td> |
| <td>alpha/pca56</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Alpha EV6</td> |
| <td>alpha/ev6</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Alpha EV67</td> |
| <td>alpha/ev67</td> |
| <td>CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>ARM/XScale PMU1</td> |
| <td>arm/xscale1</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>ARM/XScale PMU2</td> |
| <td>arm/xscale2</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>ARM/MPCore</td> |
| <td>arm/mpcore</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>AVR32</td> |
| <td>avr32</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Athlon</td> |
| <td>i386/athlon</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium Pro</td> |
| <td>i386/ppro</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium II</td> |
| <td>i386/pii</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium III</td> |
| <td>i386/piii</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium M (P6 core)</td> |
| <td>i386/p6_mobile</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium 4 (non-HT)</td> |
| <td>i386/p4</td> |
| <td>GLOBAL_POWER_EVENTS:100000:1:1:1</td> |
| </tr> |
| <tr> |
| <td>Pentium 4 (HT)</td> |
| <td>i386/p4-ht</td> |
| <td>GLOBAL_POWER_EVENTS:100000:1:1:1</td> |
| </tr> |
| <tr> |
| <td>Hammer</td> |
| <td>x86-64/hammer</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Family10h</td> |
| <td>x86-64/family10</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Family11h</td> |
| <td>x86-64/family11h</td> |
| <td>CPU_CLK_UNHALTED:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Itanium</td> |
| <td>ia64/itanium</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>Itanium 2</td> |
| <td>ia64/itanium2</td> |
| <td>CPU_CYCLES:100000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>TIMER_INT</td> |
| <td>timer</td> |
| <td>None selectable</td> |
| </tr> |
| <tr> |
| <td>IBM iseries</td> |
| <td>PowerPC 4/5/970</td> |
| <td>CYCLES:10000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>IBM pseries</td> |
| <td>PowerPC 4/5/970/Cell</td> |
| <td>CYCLES:10000:0:1:1</td> |
| </tr> |
| <tr> |
| <td>IBM s390</td> |
| <td>timer</td> |
| <td>None selectable</td> |
| </tr> |
| <tr> |
| <td>IBM s390x</td> |
| <td>timer</td> |
| <td>None selectable</td> |
| </tr> |
| </tbody> |
| </table> |
| </div> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="setup-jit"></a>2. Setting up the JIT profiling feature</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| To gather information about JITed code from a virtual machine, |
| the virtual machine needs to be instrumented with an agent library. We use the |
| agent libraries for Java in the following example. To use the |
| Java profiling feature, you must build OProfile with the "--with-java" option |
| (<a href="#install" title="4. Installation">Section 4, “Installation”</a>). |
| |
| </p> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="setup-jit-jvm"></a>2.1. JVM instrumentation</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Add this to the startup parameters of the JVM (for JVMTI): |
| |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentpath:<libdir>/libjvmti_oprofile.so[=<options>]</code> </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| or |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-agentlib:jvmti_oprofile[=<options>]</code> </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| </p> |
| <p> |
| The JVMPI agent implementation is enabled with the command line option |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"><code xmlns="http://www.w3.org/1999/xhtml" class="option">-Xrunjvmpi_oprofile[:<options>]</code> </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| </p> |
| <p> |
| Currently, there is just one option available -- <code class="option">debug</code>. For JVMPI, |
| the convention for specifying an option is <code class="option">option_name=[yes|no]</code>. |
| For JVMTI, the option specification is simply the option name, implying |
| "yes"; no option specified implies "no". |
| </p> |
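| <p> |
| The two option conventions can be sketched as a small helper. This is only an illustration of the |
| syntax described above: the function name <code class="function">jvm_agent_flag</code> is hypothetical and is not part of OProfile. |
| </p> |

```python
def jvm_agent_flag(interface, debug=False):
    """Build the JVM startup flag for the OProfile agent (hypothetical helper)."""
    if interface == "jvmti":
        # JVMTI: naming the option implies "yes"; omitting it implies "no".
        return "-agentlib:jvmti_oprofile" + ("=debug" if debug else "")
    if interface == "jvmpi":
        # JVMPI: options are written as option_name=[yes|no].
        return "-Xrunjvmpi_oprofile:debug=" + ("yes" if debug else "no")
    raise ValueError("interface must be 'jvmti' or 'jvmpi'")


print(jvm_agent_flag("jvmti", debug=True))
print(jvm_agent_flag("jvmpi", debug=False))
```

| <p> |
| The <code class="option">-agentpath</code> form differs only in taking the full path to the library instead of its short name. |
| </p> |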
| <p> |
| The agent library (installed in <code class="filename"><oprof_install_dir>/lib/oprofile</code>) |
| needs to be in the library search path (e.g. add the library directory |
| to <code class="constant">LD_LIBRARY_PATH</code>). If the command line of |
| the JVM is not directly accessible (for example, because it is buried within shell scripts or a |
| launcher program), it may be possible to set an environment variable to add |
| the instrumentation. |
| For Sun JVMs this is <code class="constant">JAVA_TOOL_OPTIONS</code>. Please check |
| your JVM documentation for |
| further information on the agent startup options. |
| </p> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="oprofile-gui"></a>3. Using <span><strong class="command">oprof_start</strong></span></h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| The <span><strong class="command">oprof_start</strong></span> application provides a convenient way to start the profiler. |
| Note that <span><strong class="command">oprof_start</strong></span> is just a wrapper around the <span><strong class="command">opcontrol</strong></span> script, |
| so it does not provide more services than the script itself. |
| </p> |
| <p> |
| After <span><strong class="command">oprof_start</strong></span> is started you can select the event type for each counter; |
| the sampling rate and other related parameters are explained in <a href="#controlling-daemon" title="1. Using opcontrol">Section 1, “Using <span><strong class="command">opcontrol</strong></span>”</a>. |
| The "Configuration" section allows you to set general parameters such as the buffer size, kernel filename |
| etc. The counter setup interface should be self-explanatory; <a href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, “Hardware performance counters”</a> and related |
| links contain information on using unit masks. |
| </p> |
| <p> |
| A status line shows the current status of the profiler: how long it has been running, the average |
| number of interrupts received per second, and the total number of interrupts, over all processors. |
| Note that quitting <span><strong class="command">oprof_start</strong></span> does not stop the profiler. |
| </p> |
| <p> |
| Your configuration is saved in the same file as <span><strong class="command">opcontrol</strong></span> uses; that is, |
| <code class="filename">~/.oprofile/daemonrc</code>. |
| </p> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="detailed-parameters"></a>4. Configuration details</h2> |
| </div> |
| </div> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="hardware-counters"></a>4.1. Hardware performance counters</h3> |
| </div> |
| </div> |
| </div> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| Your CPU type may not include the requisite support for hardware performance counters, in which case |
| you must use OProfile in RTC mode in 2.4 (see <a href="#rtc" title="4.2. OProfile in RTC mode">Section 4.2, “OProfile in RTC mode”</a>), or timer mode in 2.6 (see <a href="#timer" title="4.3. OProfile in timer interrupt mode">Section 4.3, “OProfile in timer interrupt mode”</a>). |
| You do not really need to read this section unless you are interested in using |
| events other than the default event chosen by OProfile. |
| </p> |
| </div> |
| <p> |
| The Intel hardware performance counters are detailed in the Intel IA-32 Architecture Manual, Volume 3, available |
| from <a href="http://developer.intel.com/">http://developer.intel.com/</a>. |
| The AMD Athlon/Opteron/Phenom/Turion implementation is detailed in <a href="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf"> |
| http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf</a>. |
| For PowerPC64 processors in IBM iSeries, pSeries, and blade server systems, processor documentation |
| is available at <a href="http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC/"> |
| http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC</a>. (For example, the |
| specific publication containing information on the performance monitor unit for the PowerPC970 is |
| "IBM PowerPC 970FX RISC Microprocessor User's Manual.") |
| These processors are capable of delivering an interrupt when a counter overflows. |
| This is the basic mechanism on which OProfile is based. The delivery mode is <span class="acronym">NMI</span>, |
| so blocking interrupts in the kernel does not prevent profiling. When the interrupt handler is called, |
| the current <span class="acronym">PC</span> value and the current task are recorded into the profiling structure. |
| This allows the overflow event to be attached to a specific assembly instruction in a binary image. |
| The daemon receives this data from the kernel, and writes it to the sample files. |
| </p> |
| <p> |
| If we use an event such as <code class="constant">CPU_CLK_UNHALTED</code> or <code class="constant">INST_RETIRED</code> |
| (<code class="constant">GLOBAL_POWER_EVENTS</code> or <code class="constant">INSTR_RETIRED</code>, respectively, on the Pentium 4), we can |
| use the overflow counts as an estimate of actual time spent in each part of code. Alternatively we can profile interesting |
| data such as the cache behaviour of routines with the other available counters. |
| </p> |
| <p> |
| However there are several caveats. First, there are those issues listed in the Intel manual. There is a delay |
| between the counter overflow and the interrupt delivery that can skew results on a small scale - this means |
| you cannot rely on the profiles at the instruction level as being perfectly accurate. |
| If you are using an "event-mode" counter such as the cache counters, a count registered against an instruction doesn't mean |
| that the instruction is responsible for that event. However, it implies that the counter overflowed in the dynamic |
| vicinity of that instruction, to within a few instructions. Further details on this problem can be found in |
| <a href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a> and also in the Digital paper "ProfileMe: A Hardware Performance Counter". |
| </p> |
| <p> |
| Each counter has several configuration parameters. |
| First, there is the unit mask: this simply further specifies what to count. |
| Second, there is the counter value, discussed below. Third, there are flags that select whether events are counted |
| whilst in kernel space, in user space, or both. You can configure these separately for each counter. |
| </p> |
| <p> |
| The counter value sets the sampling interval: after each overflow event, the counter is re-initialized |
| such that another overflow will occur after this many events have been counted. Thus, higher |
| values mean less-detailed profiling, and lower values mean more detail, but higher overhead. |
| Picking a good value for this |
| parameter is, unfortunately, somewhat of a black art. It is of course dependent on the event |
| you have chosen. |
| Specifying too large a value will mean not enough interrupts are generated |
| to give a realistic profile (though this problem can be ameliorated by profiling for <span class="emphasis"><em>longer</em></span>). |
| Specifying too small a value can lead to higher performance overhead. |
| </p> |
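| <p> |
| The trade-off can be estimated with simple arithmetic. The sketch below assumes an event that occurs |
| at a steady rate (for example, one event per clock cycle on a fully busy CPU); the clock frequency and |
| counter values are illustrative numbers, not recommendations. |
| </p> |

```python
def samples_per_second(event_rate_hz, reset_count):
    """Approximate number of overflow interrupts per second on one CPU."""
    if reset_count <= 0:
        raise ValueError("reset count must be positive")
    return event_rate_hz / reset_count


# Assumption: a 2 GHz CPU that is never halted, counting one event per cycle.
for count in (10_000, 100_000, 1_000_000):
    print(count, samples_per_second(2_000_000_000, count))
```

| <p> |
| Under these assumptions, a tenfold increase in the counter value gives a tenfold drop in the number of samples, |
| which is why profiling for longer can compensate for a large value. |
| </p> |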
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="rtc"></a>4.2. OProfile in RTC mode</h3> |
| </div> |
| </div> |
| </div> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| This section applies to 2.2/2.4 kernels only. |
| </p> |
| </div> |
| <p> |
| Some CPU types do not provide the needed hardware support to use the hardware performance counters. This includes |
| some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix). |
| On these machines, OProfile falls |
| back to using the real-time clock interrupt to collect samples. This interrupt is also used by the <span><strong class="command">rtc</strong></span> |
| module, so you cannot load the OProfile module while the rtc module is loaded or while rtc support is compiled into the kernel. |
| </p> |
| <p> |
| RTC mode is less capable than the hardware counters mode; in particular, it is unable to profile sections of |
| the kernel where interrupts are disabled. There is just one available event, "RTC interrupts", and its value |
| corresponds to the number of interrupts generated per second (that is, a higher number means a better profiling |
| resolution, and higher overhead). The current implementation of the real-time clock supports only power-of-two |
| sampling rates from 2 to 4096 per second. Other values within this range are rounded to the nearest power of |
| two. |
| </p> |
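| <p> |
| The rounding can be pictured as follows. This is a sketch only: it assumes "nearest" means the smallest |
| absolute difference and that ties go to the higher rate, neither of which is confirmed by the text above, |
| and the function name is hypothetical. |
| </p> |

```python
# The supported rates: powers of two from 2 to 4096 interrupts per second.
RTC_RATES = [2 ** k for k in range(1, 13)]


def nearest_rtc_rate(requested):
    """Return the supported RTC rate closest to the requested one."""
    if not RTC_RATES[0] <= requested <= RTC_RATES[-1]:
        raise ValueError("rate must lie between 2 and 4096")
    # Assumption: ties are resolved in favour of the higher rate.
    return min(RTC_RATES, key=lambda rate: (abs(rate - requested), -rate))


print(nearest_rtc_rate(300))
print(nearest_rtc_rate(1000))
```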
| <p> |
| You can force use of the RTC interrupt with the <code class="option">force_rtc=1</code> module parameter. |
| </p> |
| <p> |
| Setting the value from the GUI should be straightforward. On the command line, you need to specify the |
| event to <span><strong class="command">opcontrol</strong></span>, e.g. : |
| </p> |
| <p> |
| <span> |
| <strong class="command">opcontrol --event=RTC_INTERRUPTS:256</strong> |
| </span> |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="timer"></a>4.3. OProfile in timer interrupt mode</h3> |
| </div> |
| </div> |
| </div> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| This section applies to 2.6 kernels and above only. |
| </p> |
| </div> |
| <p> |
| In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver |
| falls back to using the timer interrupt for profiling. Like the RTC mode in 2.4 kernels, this is not able to |
| profile code that has interrupts disabled. Note that there are no configuration parameters for |
| setting this, unlike the RTC and hardware performance counter setup. |
| </p> |
| <p> |
| You can force use of the timer interrupt by using the <code class="option">timer=1</code> module |
| parameter (or <code class="option">oprofile.timer=1</code> on the boot command line if OProfile is |
| built-in). |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="p4"></a>4.4. Pentium 4 support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The Pentium 4 / Xeon performance counters are organized around 3 types of model specific registers (MSRs): 45 event |
| selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a |
| particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their |
| operation. Unfortunately the relationship between these registers is quite complex; they cannot all be used with one |
| another at any time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of |
| one another, so OProfile only accesses those registers, treating them as a bank of 8 "normal" counters, similar |
| to those in the P6 or Athlon/Opteron/Phenom/Turion families of CPU. |
| </p> |
| <p> |
| There is currently no support for Precision Event-Based Sampling (PEBS), nor any advanced uses of the Debug Store |
| (DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described |
| above. Performance monitoring hardware on Pentium 4 / Xeon processors with Hyperthreading enabled (multiple logical |
| processors on a single die) is not supported in 2.4 kernels (you can use OProfile if you disable hyper-threading, |
| though). |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="ia64"></a>4.5. Intel Itanium 2 support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The Itanium 2 performance monitoring unit (PMU) organizes the counters as four |
| pairs of performance event monitoring registers. Each pair is composed of a |
| Performance Monitoring Configuration (PMC) register and Performance Monitoring |
| Data (PMD) register. The PMC selects the performance event being monitored and |
| the PMD determines the sampling interval. The IA64 Performance Monitoring Unit |
| (PMU) triggers sampling with maskable interrupts. Thus, samples will not occur |
| in sections of the IA64 kernel where interrupts are disabled. |
| </p> |
| <p> |
| None of the advanced features of the Itanium 2 performance monitoring unit |
| such as opcode matching, address range matching, or precise event sampling are |
| supported by this version of OProfile. The Itanium 2 support only maps OProfile's |
| existing interrupt-based model to the PMU hardware. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="ppc64"></a>4.6. PowerPC64 support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The performance monitoring unit (PMU) for the IBM PowerPC 64-bit processors |
| consists of between 4 and 8 counters (depending on the model), plus three |
| special purpose registers used for programming the counters -- MMCR0, MMCR1, |
| and MMCRA. Advanced features such as instruction matching and thresholding are |
| not supported by this version of OProfile. |
| </p> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>Later versions of the IBM POWER5+ processor (beginning with revision 3.0) |
| run the performance monitor unit in POWER6 mode, effectively removing OProfile's |
| access to counters 5 and 6. These two counters are dedicated to counting |
| instructions completed and cycles, respectively. In POWER6 mode, however, the |
| counters do not generate an interrupt on overflow and so are unusable by |
| OProfile. Kernel versions 2.6.23 and higher will recognize this mode |
| and export "ppc64/power5++" as the cpu_type to the oprofilefs pseudo filesystem. |
| OProfile userspace responds to this cpu_type by removing these counters from |
| the list of potential events to count. Without this kernel support, attempts |
| to profile using an event from one of these counters will yield incorrect |
| results -- typically, zero (or near zero) samples in the generated report. |
| </div> |
| <p> |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="cell-be"></a>4.7. Cell Broadband Engine support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The Cell Broadband Engine (CBE) processor core consists of a PowerPC Processing |
| Element (PPE) and 8 Synergistic Processing Elements (SPE). PPEs and SPEs each |
| consist of a processing unit (PPU and SPU, respectively) and other hardware |
| components, such as memory controllers. |
| </p> |
| <p> |
| A PPU has two hardware threads (aka "virtual CPUs"). The performance monitor |
| unit of the CBE collects event information on one hardware thread at a time. |
| Therefore, when profiling PPE events, |
| OProfile collects the profile based on the selected events by time slicing the |
| performance counter hardware between the two threads. The user must ensure the |
| collection interval is long enough so that the time spent collecting data for |
| each PPU is sufficient to obtain a good profile. |
| </p> |
| <p> |
| To profile an SPU application, the user should specify the SPU_CYCLES event. |
| When starting OProfile with SPU_CYCLES, the opcontrol script enforces certain |
| separation parameters (separate=cpu,lib) to ensure that sufficient information |
| is collected in the sample data in order to generate a complete report. The |
| --merge=cpu option can be used to obtain a more readable report if analyzing |
| the performance of each separate SPU is not necessary. |
| </p> |
| <p> |
| Profiling with an SPU event (events 4100 through 4163) is not compatible with any other |
| event. Furthermore, only one SPU event can be specified at a time. The hardware only |
| supports profiling on one SPU per node at a time. The OProfile kernel code time slices |
| between the eight SPUs to collect data on all SPUs. |
| </p> |
| <p> |
| SPU profile reports have some unique characteristics compared to reports for |
| standard architectures: |
| </p> |
| <div class="itemizedlist"> |
| <ul type="disc"> |
| <li>Typically no "app name" column. This is really standard OProfile behavior |
| when the report contains samples for just a single application, which is |
| commonly the case when profiling SPUs.</li> |
| <li>"CPU" equates to "SPU"</li> |
| <li>Specifying '--long-filenames' on the opreport command does not always result |
| in long filenames. This happens when the SPU application code is embedded in |
| the PPE executable or shared library. The embedded SPU ELF data contains only the |
| short filename (i.e., no path information) for the SPU binary file that was used as |
| the source for embedding. Only the short filename is used because |
| the original SPU binary file may not exist or be accessible at runtime. The performance |
| analyst must have sufficient knowledge of the application to be able to correlate the |
| SPU binary image names found in the report to the application's source files. |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3> |
| Compile the application with -g and generate the OProfile report |
| with -g to facilitate finding the right source file(s) on which to focus. |
| </div></li> |
| </ul> |
| </div> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="amd-ibs-support"></a>4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Instruction-Based Sampling (IBS) is a new performance measurement technique |
| available on AMD Family 10h processors. Traditional performance counter |
| sampling is not precise enough to isolate performance issues to individual |
| instructions. IBS, however, precisely identifies instructions which are not |
| making the best use of the processor pipeline and memory hierarchy. |
| For more information, please refer to the "Instruction-Based Sampling: |
| A New Performance Analysis Technique for AMD Family 10h Processors" ( |
| <a href="http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf"> |
| http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf</a>). |
| There are two IBS profile types, described in the following sections. |
| </p> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="ibs-fetch"></a>4.8.1. IBS Fetch</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| IBS fetch sampling is a statistical sampling method which counts completed |
| fetch operations. When the number of completed fetch operations reaches the |
| maximum fetch count (the sampling period), IBS tags the fetch operation and |
| monitors that operation until it either completes or aborts. When a tagged |
| fetch completes or aborts, a sampling interrupt is generated and an IBS fetch |
| sample is taken. An IBS fetch sample contains a timestamp, the identifier of |
| the interrupted process, the virtual fetch address, and several event flags |
| and values that describe what happened during the fetch operation. |
| </p> |
| </div> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="ibs-op"></a>4.8.2. IBS Op</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64 |
| instructions. Two options are available for selecting ops for sampling: |
| </p> |
| <div class="itemizedlist"> |
| <ul type="disc"> |
| <li> |
| Cycles-based selection counts CPU clock cycles. The op is tagged and monitored |
| when the count reaches a threshold (the sampling period) and a valid op is |
| available. |
| </li> |
| <li> |
| Dispatched op-based selection counts dispatched macro-ops. |
| When the count reaches a threshold, the next valid op is tagged and monitored. |
| </li> |
| </ul> |
| </div> |
| <p> |
| In both cases, an IBS sample is generated only if the tagged op retires. |
| Thus, IBS op event information does not measure speculative execution activity. |
| The execution stages of the pipeline monitor the tagged macro-op. When the |
| tagged macro-op retires, a sampling interrupt is generated and an IBS op |
| sample is taken. An IBS op sample contains a timestamp, the identifier of |
| the interrupted process, the virtual address of the AMD64 instruction from |
| which the op was issued, and several event flags and values that describe |
| what happened when the macro-op executed. |
| </p> |
| </div> |
| <p> |
| IBS profiling is enabled by specifying IBS performance events |
| through the "--event=" option. These events are listed in the output of |
| <code class="function">opcontrol --list-events</code>. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| opcontrol --event=IBS_FETCH_XXX:<count>:<um>:<kernel>:<user> |
| opcontrol --event=IBS_OP_XXX:<count>:<um>:<kernel>:<user> |
| |
| Note: * All IBS fetch events must have the same event count and unitmask, |
|         as must all IBS op events. |
| </pre> |
| </td> |
| </tr> |
| </table> |
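| <p> |
| The constraint in the note can be checked before calling <span><strong class="command">opcontrol</strong></span>. The sketch below |
| parses event specifications of the form shown above and verifies that each IBS group shares one count and unit mask. |
| The event names <code class="constant">IBS_FETCH_A</code>, <code class="constant">IBS_FETCH_B</code> and <code class="constant">IBS_OP_A</code> are placeholders, |
| not real event names, and the helper functions are hypothetical. |
| </p> |

```python
def parse_event(spec):
    """Split NAME:count:um:kernel:user into its five fields."""
    name, count, um, kernel, user = spec.split(":")
    # Assumption: the unit mask is written in decimal or as 0x-prefixed hex.
    mask = int(um, 16) if um.lower().startswith("0x") else int(um)
    return {"name": name, "count": int(count), "unit_mask": mask,
            "kernel": kernel == "1", "user": user == "1"}


def ibs_groups_consistent(specs):
    """True if all IBS fetch events agree on (count, unit mask), and likewise for IBS op."""
    seen = {}
    for event in map(parse_event, specs):
        for prefix in ("IBS_FETCH_", "IBS_OP_"):
            if event["name"].startswith(prefix):
                seen.setdefault(prefix, set()).add((event["count"], event["unit_mask"]))
    return all(len(settings) == 1 for settings in seen.values())


# Placeholder event names; consult opcontrol --list-events for the real ones.
print(ibs_groups_consistent(["IBS_FETCH_A:250000:0:1:1", "IBS_FETCH_B:250000:0:1:1"]))
```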
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="misuse"></a>4.9. Dangerous counter settings</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| OProfile is a low-level profiler which allows continuous profiling with low overhead. |
| If too low a count reset value is set for a counter, the system can become overloaded with counter |
| interrupts, and appear to have frozen. Whilst some validation is done, it |
| is not foolproof. |
| </p> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| This can happen as follows: When the profiler count |
| reaches zero an NMI handler is called which stores the sample values in an internal buffer, then resets the counter |
| to its original value. If the count is very low, a pending NMI can be sent before the NMI handler has |
| completed. Due to the priority of the NMI, the local APIC delivers the pending interrupt immediately after |
| completion of the previous interrupt handler, and control never returns to other parts of the system. |
| In this way the system seems to be frozen. |
| </p> |
| </div> |
| <p>If this happens, it will be impossible to bring the system back to a workable state. |
| There is no way to provide real security against this happening, other than making sure to use a reasonable value |
| for the counter reset. For example, setting the <code class="constant">CPU_CLK_UNHALTED</code> event type with a ridiculously low reset count (e.g. 500) |
| is likely to freeze the system. |
| </p> |
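| <p> |
| The freeze condition described in the note amounts to comparing the reset count with the cost of the NMI handler, |
| measured in the same event. The sketch below makes that comparison for a cycle-counting event; the handler cost |
| of 2000 cycles is a made-up figure for illustration, not a measurement of OProfile. |
| </p> |

```python
def nmi_storm_likely(reset_count, handler_cost_events):
    """True if the counter would overflow again before the handler finishes."""
    return reset_count <= handler_cost_events


# Hypothetical handler cost, expressed in the event being counted (cycles here).
ASSUMED_HANDLER_CYCLES = 2000

print(nmi_storm_likely(500, ASSUMED_HANDLER_CYCLES))
print(nmi_storm_likely(100_000, ASSUMED_HANDLER_CYCLES))
```

| <p> |
| With these assumed numbers, a reset count of 500 would starve the rest of the system, while 100000 leaves ample time between interrupts. |
| </p> |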
| <p> |
| In short : <span><strong class="command">Don't try a foolish sample count value</strong></span>. Unfortunately the definition of a foolish value |
| is really dependent on the event type - if ever in doubt, e-mail </p> |
| <div class="address"> |
| <p><code class="email"><<a href="mailto:oprofile-list@lists.sf.net">oprofile-list@lists.sf.net</a>></code>.</p> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="chapter" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="results"></a>Chapter 4. Obtaining results</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#profile-spec">1. Profile specifications</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#profile-spec-examples">1.1. Examples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#profile-spec-details">1.2. Profile specification parameters</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#locating-and-managing-binary-images">1.3. Locating and managing binary images</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#no-results">1.4. What to do when you don't get any results</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opreport">2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-merging">2.1. Merging separate profiles</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-comparison">2.2. Side-by-side multiple results</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-callgraph">2.3. Callgraph output</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-diff">2.4. Differential profiles with <span><strong class="command">opreport</strong></span></a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-anon">2.5. Anonymous executable mappings</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-xml">2.6. XML formatted output</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opreport-options">2.7. Options for <span><strong class="command">opreport</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opannotate">3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opannotate-finding-source">3.1. Locating source files</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#opannotate-details">3.2. Usage of <span><strong class="command">opannotate</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#getting-jit-reports">4. OProfile results with JIT samples</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#opgprof">5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opgprof-details">5.1. Usage of <span><strong class="command">opgprof</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#oparchive">6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#oparchive-details">6.1. Usage of <span><strong class="command">oparchive</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#opimport">7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#opimport-details">7.1. Usage of <span><strong class="command">opimport</strong></span></a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| </dl> |
| </div> |
| <p> |
| OK, so the profiler has been running, but it's not much use unless we can get some data out. Fairly often, |
| OProfile does a little <span class="emphasis"><em>too</em></span> good a job of keeping overhead low, and no data reaches |
| the profiler. This can happen on lightly-loaded machines. Remember you can force a dump at any time with : |
| </p> |
| <p> |
| <span> |
| <strong class="command">opcontrol --dump</strong> |
| </span> |
| </p> |
| <p>Remember to do this before complaining there is no profiling data ! |
| Now that we've got some data, it has to be processed. That's the job of <span><strong class="command">opreport</strong></span>, |
| <span><strong class="command">opannotate</strong></span>, or <span><strong class="command">opgprof</strong></span>. |
| </p> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="profile-spec"></a>1. Profile specifications</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| All of the analysis tools take a <span class="emphasis"><em>profile specification</em></span>. |
| This is a set of definitions that describe which actual profiles should be |
| examined. The simplest profile specification is empty: this will match all |
| the available profile files for the current session (this is what happens |
| when you do <span><strong class="command">opreport</strong></span>). |
| </p> |
| <p> |
| Specification parameters are of the form <code class="option">name:value[,value]</code>. |
| For example, if I wanted to get a combined symbol summary for |
| <code class="filename">/bin/myprog</code> and <code class="filename">/bin/myprog2</code>, |
| I could do <span><strong class="command">opreport -l image:/bin/myprog,/bin/myprog2</strong></span>. |
| As a special case, you don't actually need to specify the <code class="option">image:</code> |
| part here: anything left on the command line is assumed to be an |
| <code class="option">image:</code> name. Similarly, if no <code class="option">session:</code> |
| is specified, then <code class="option">session:current</code> is assumed ("current" |
| is a special name of the current / last profiling session). |
| </p> |
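| <p> |
| The rules above can be sketched as a small parser. It is an illustration of the syntax only, not OProfile's own |
| implementation: the function name is hypothetical, the set of parameter names is limited to those that appear in this section, |
| and a backslash-escaped comma is assumed to stand for a literal comma in a value. |
| </p> |

```python
import re

# Only the parameter names mentioned in this section; the full list is longer.
KNOWN_NAMES = {"archive", "event", "image", "image-exclude", "session"}


def parse_profile_spec(args):
    """Turn command-line words into a {name: [values]} mapping."""
    spec = {}
    for arg in args:
        name, sep, value = arg.partition(":")
        if not sep or name not in KNOWN_NAMES:
            # Anything without a recognised name is taken as an image name.
            name, value = "image", arg
        # Split on commas that are not preceded by a backslash.
        values = [v.replace("\\,", ",") for v in re.split(r"(?<!\\),", value)]
        spec.setdefault(name, []).extend(values)
    # No session given: the current session is assumed.
    spec.setdefault("session", ["current"])
    return spec


print(parse_profile_spec(["image:/bin/myprog,/bin/myprog2"]))
print(parse_profile_spec(["/bin/myprog"]))
```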
| <p> |
| In addition to the comma-separated list shown above, some of the |
| specification parameters can take <span><strong class="command">glob</strong></span>-style |
| values. For example, if I want to see image summaries for all |
| binaries profiled in <code class="filename">/usr/bin/</code>, I could do |
| <span><strong class="command">opreport image:/usr/bin/\*</strong></span>. Note the necessity |
| to escape the special character from the shell. |
| </p> |
| <p> |
| For <span><strong class="command">opreport</strong></span>, profile specifications can be used to |
| define two profiles, giving differential output. This is done by |
| enclosing each of the two specifications within curly braces, as shown |
| in the examples below. Any specifications outside of curly braces are |
| shared across both. |
| </p> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="profile-spec-examples"></a>1.1. Examples</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Image summaries for all profiles with <code class="constant">DATA_MEM_REFS</code> |
| samples in the saved session called "stresstest" : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport session:stresstest event:DATA_MEM_REFS |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Symbol summary for the application called "test_sym53c8xx,9xx". The comma must be |
| escaped because <code class="option">image:</code> takes a comma-separated list, and the |
| backslash is doubled so that the shell passes a literal backslash on to <span><strong class="command">opreport</strong></span>. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport -l ./test/test_sym53c8xx\\,9xx |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Image summaries for all binaries in the <code class="filename">test</code> directory, |
| excepting <code class="filename">boring-test</code> : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport image:./test/\* image-exclude:./test/boring-test |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Differential profile of a binary stored in two archives : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport -l /bin/bash { archive:./orig } { archive:./new } |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Differential profile of an archived binary with the current session : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opreport -l /bin/bash { archive:./orig } { } |
| </pre> |
| </td> |
| </tr> |
| </table> |
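| <p> |
| Several of the examples above escape characters that are special either to the shell or to |
| the profile specification. The shell removes one level of quoting before |
| <span><strong class="command">opreport</strong></span> sees its arguments, so it can help to print what a |
| given spelling really passes. A minimal sketch; <code class="function">show_args</code> is a |
| hypothetical helper, not an OProfile tool: |
| </p> |

```shell
# Hypothetical helper: print each argument exactly as a command receives it.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

show_args image:./test/\*                 # escaped glob
show_args 'image:./test/*'                # quoted glob, same argument
show_args ./test/test_sym53c8xx\,9xx      # the shell removes this backslash
show_args ./test/test_sym53c8xx\\,9xx     # a literal backslash is passed on
show_args './test/test_sym53c8xx\,9xx'    # same, using single quotes
```

| <p> |
| Single-quoting the whole argument avoids having to count backslashes. |
| </p> |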
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="profile-spec-details"></a>1.2. Profile specification parameters</h3> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">archive:</code> |
| <span class="emphasis"> |
| <em>archivepath</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A path to an archive made with <span><strong class="command">oparchive</strong></span>. |
| Absence of this tag, unlike others, means "the current system", |
| equivalent to specifying "archive:". |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">session:</code> |
| <span class="emphasis"> |
| <em>sessionlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A comma-separated list of session names to resolve in. Absence of this |
| tag, unlike others, means "the current session", equivalent to |
| specifying "session:current". |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">session-exclude:</code> |
| <span class="emphasis"> |
| <em>sessionlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A comma-separated list of sessions to exclude. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">image:</code> |
| <span class="emphasis"> |
| <em>imagelist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A comma-separated list of image names to resolve. Each entry may be a relative |
| path, a <span><strong class="command">glob</strong></span>-style name, or a full path, e.g.</p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen">opreport 'image:/usr/bin/oprofiled,*op*,./opreport'</pre> |
| </td> |
| </tr> |
| </table> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">image-exclude:</code> |
| <span class="emphasis"> |
| <em>imagelist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Same as <code class="option">image:</code>, but the matching images are excluded. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">lib-image:</code> |
| <span class="emphasis"> |
| <em>imagelist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Same as <code class="option">image:</code>, but only for images that are for |
| a particular primary binary image (namely, an application). This only |
| makes sense to use if you're using <code class="option">--separate</code>. |
| This includes kernel modules and the kernel when using |
| <code class="option">--separate=kernel</code>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">lib-image-exclude:</code> |
| <span class="emphasis"> |
| <em>imagelist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Same as <code class="option">lib-image:</code>, but the matching images |
| are excluded. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">event:</code> |
| <span class="emphasis"> |
| <em>eventlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| The symbolic event name to match on, e.g. <code class="option">event:DATA_MEM_REFS</code>. |
| You can pass a list of events for side-by-side comparison with <span><strong class="command">opreport</strong></span>. |
| When using the timer interrupt, the event is always "TIMER". |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">count:</code> |
| <span class="emphasis"> |
| <em>eventcountlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| The event count to match on, e.g. <code class="option">event:DATA_MEM_REFS count:30000</code>. |
| Note that this value refers to the setting used for <span><strong class="command">opcontrol</strong></span> |
| only, and has nothing to do with the sample counts in the profile data |
| itself. |
| You can pass a list of counts for side-by-side comparison with <span><strong class="command">opreport</strong></span>. |
| When using the timer interrupt, the count is always 0 (indicating it cannot be set). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">unit-mask:</code> |
| <span class="emphasis"> |
| <em>masklist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| The unit mask value of the event to match on, e.g. <code class="option">unit-mask:1</code>. |
| You can pass a list of unit masks for side-by-side comparison with <span><strong class="command">opreport</strong></span>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">cpu:</code> |
| <span class="emphasis"> |
| <em>cpulist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only consider profiles for the given numbered CPU (starting from zero). |
| This is only useful when using CPU profile separation. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">tgid:</code> |
| <span class="emphasis"> |
| <em>pidlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only consider profiles for the given task groups. Unless some program |
| is using threads, the task group ID of a process is the same |
| as its process ID. This option corresponds to the POSIX |
| notion of a thread group. |
| This is only useful when using per-process profile separation. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">tid:</code> |
| <span class="emphasis"> |
| <em>tidlist</em> |
| </span> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only consider profiles for the given threads. When using |
| recent thread libraries, all threads in a process share the |
| same task group ID, but have different thread IDs. You can |
| use this option in combination with <code class="option">tgid:</code> to |
| restrict the results to particular threads within a process. |
| This is only useful when using per-process profile separation. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="locating-and-managing-binary-images"></a>1.3. Locating and managing binary images</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Each session's sample files can be found in the $SESSION_DIR/samples/ directory (default: <code class="filename">/var/lib/oprofile/samples/</code>). |
| These are used, along with the binary image files, to produce human-readable data. |
| In some circumstances (kernel modules in an initrd, or modules on 2.6 kernels), OProfile |
| will not be able to find the binary images. All the tools have an <code class="option">--image-path</code> |
| option to which you can pass a comma-separated list of alternate paths to search. For example, |
| I can let OProfile find my 2.6 modules by using <span><strong class="command">--image-path /lib/modules/2.6.0/kernel/</strong></span>. |
| It is your responsibility to ensure that the correct images are found when using this |
| option. |
| </p> |
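| <p> |
| As a sketch of the module case, the path can be built from the kernel release rather than typed by hand. |
| This assumes the profiled kernel is the one currently running and that its modules live under |
| <code class="filename">/lib/modules/</code>; the helper name <code class="function">module_path</code> is hypothetical. |
| </p> |

```shell
# Hypothetical helper: build the module directory for a kernel release.
module_path() {
    printf '/lib/modules/%s/kernel/\n' "$1"
}

# Assumption: the profile was taken on the currently running kernel.
path=$(module_path "$(uname -r)")
echo "image path: $path"

# Only attempt the report where OProfile is installed.
if command -v opreport >/dev/null 2>&1; then
    opreport -l --image-path "$path" || true
fi
```

| <p> |
| If the profile came from a different kernel, pass that release to <code class="function">module_path</code> instead. |
| </p> |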
| <p> |
| Note that if a binary image changes after the sample file was created, you won't be able to get useful |
| symbol-based data out. This situation is detected for you. If you replace a binary, you should |
| make sure to save the old binary if you need to do comparative profiles. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="no-results"></a>1.4. What to do when you don't get any results</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| When attempting to get output, you may see the error : |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| error: no sample files found: profile specification too strict ? |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| What this is saying is that the profile specification you passed in, |
| when matched against the available sample files, resulted in no matches. |
| There are a number of reasons this might happen: |
| </p> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term">spelling</span> |
| </dt> |
| <dd> |
| <p> |
| You specified a binary name, but spelt it wrongly. Check your spelling ! |
| </p> |
| </dd> |
| <dt> |
| <span class="term">profiler wasn't running</span> |
| </dt> |
| <dd> |
| <p> |
| Make very sure that OProfile was actually up and running when you ran |
| the binary. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">binary didn't run long enough</span> |
| </dt> |
| <dd> |
| <p> |
| Remember OProfile is a statistical profiler - you're not guaranteed to |
| get samples for short-running programs. You can help this by using a |
| lower count for the performance counter, so there are a lot more samples |
| taken per second. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">binary spent most of its time in libraries</span> |
| </dt> |
| <dd> |
| <p> |
| Similarly, if the binary spends little time in the main binary image |
| itself, with most of it spent in shared libraries it uses, you might |
| not see any samples for the binary image itself. You can check this |
| by using <span><strong class="command">opcontrol --separate=lib</strong></span> before the |
| profiling session, so <span><strong class="command">opreport</strong></span> and friends show |
| the library profiles on a per-application basis. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">specification was really too strict</span> |
| </dt> |
| <dd> |
| <p> |
| For example, you specified something like <code class="option">tgid:3433</code>, |
| but no task with that group ID ever ran the code. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">binary didn't generate any events</span> |
| </dt> |
| <dd> |
| <p> |
| If you're using a particular event counter, for example counting MMX |
| operations, the code might simply have not generated any events in the |
| first place. Verify the code you're profiling does what you expect it |
| to. |
| </p> |
| </dd> |
| <dt> |
| <span class="term">you didn't specify kernel module name correctly</span> |
| </dt> |
| <dd> |
| <p> |
| If you're using 2.6 kernels, and trying to get reports for a kernel |
| module, make sure to use the <code class="option">-p</code> option, and specify the |
| module name <span class="emphasis"><em>with</em></span> the <code class="filename">.ko</code> |
| extension. Check if the module is one loaded from initrd. |
| </p> |
| </dd> |
| </dl> |
| </div> |
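| <p> |
| To get a feel for the "binary didn't run long enough" case, the sampling rate can be estimated |
| from the clock speed and the counter value. This is a rough upper bound only: it assumes a |
| cycle-counting event such as <code class="constant">CPU_CLK_UNHALTED</code> and a CPU that is never |
| halted. The figures are borrowed from the example reports in this manual, and |
| <code class="function">samples_per_sec</code> is a hypothetical helper. |
| </p> |

```shell
# Rough upper bound on samples per second for a cycle-counting event:
# clock rate in MHz divided by the counter reset value.
samples_per_sec() {
    awk -v mhz="$1" -v count="$2" 'BEGIN { printf "%.0f\n", mhz * 1000000 / count }'
}

samples_per_sec 863.195 23150     # count used in the image summary example
samples_per_sec 863.195 100000    # a higher count gives fewer samples
```

| <p> |
| A program that runs for only a few milliseconds may therefore collect just a handful of samples, |
| or none at all, unless the count is lowered. |
| </p> |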
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="opreport"></a>2. Image summaries and symbol summaries (<span><strong class="command">opreport</strong></span>)</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| The <span><strong class="command">opreport</strong></span> utility is the primary utility you will use for |
| getting formatted data out of OProfile. It produces two types of data: image summaries |
| and symbol summaries. An image summary lists the number of samples for individual |
| binary images such as libraries or applications. Symbol summaries provide per-symbol |
| profile data. In the following example, we're getting an image summary for the whole |
| system: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opreport --long-filenames |
| CPU: PIII, speed 863.195 MHz (estimated) |
| Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 |
| 905898 59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus |
| 214320 14.1338 /boot/2.6.0/vmlinux |
| 103450 6.8222 /lib/i686/libc-2.3.2.so |
| 60160 3.9674 /usr/local/bin/madplay |
| 31769 2.0951 /usr/local/oprofile-pp/bin/oprofiled |
| 26550 1.7509 /usr/lib/libartsflow.so.1.0.0 |
| 23906 1.5765 /usr/bin/as |
| 18770 1.2378 /oprofile |
| 15528 1.0240 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5 |
| 11979 0.7900 /usr/X11R6/bin/XFree86 |
| 11328 0.7471 /bin/bash |
| ... |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| If we had specified <code class="option">--symbols</code> in the previous command, we would have |
| gotten a symbol summary of all the images across the entire system. We can restrict this to only |
| part of the system profile; for example, |
| below is a symbol summary of the OProfile daemon. Note that as we used |
| <span><strong class="command">opcontrol --separate=kernel</strong></span>, symbols from images that <span><strong class="command">oprofiled</strong></span> |
| has used are also shown. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opreport -l `which oprofiled` 2>/dev/null | more |
| CPU: PIII, speed 863.195 MHz (estimated) |
| Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 |
| vma samples % image name symbol name |
| 0804be10 14971 28.1993 oprofiled odb_insert |
| 0804afdc 7144 13.4564 oprofiled pop_buffer_value |
| c01daea0 6113 11.5144 vmlinux __copy_to_user_ll |
| 0804b060 2816 5.3042 oprofiled opd_put_sample |
| 0804b4a0 2147 4.0441 oprofiled opd_process_samples |
| 0804acf4 1855 3.4941 oprofiled opd_put_image_sample |
| 0804ad84 1766 3.3264 oprofiled opd_find_image |
| 0804a5ec 1084 2.0418 oprofiled opd_find_module |
| 0804ba5c 741 1.3957 oprofiled odb_hash_add_node |
| ... |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| These are the two forms of output you are most likely to use regularly, but <span><strong class="command">opreport</strong></span> |
| can do a lot more than that, as described below. |
| </p> |
| <div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="opreport-merging"></a>2.1. Merging separate profiles</h3></div></div></div> |
| |
| <p> |
| If you have used one of the <code class="option">--separate=</code> options |
| whilst profiling, there can be several separate profiles for |
| a single binary image within a session. Normally the output |
| will keep these images separated (so, for example, the image summary |
| output shows library image summaries on a per-application basis, |
| when using <code class="option">--separate=lib</code>). |
| Sometimes it can be useful to merge these results back together |
| before getting results. The <code class="option">--merge</code> option allows |
| you to do that. |
| </p> |
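| <p> |
| Conceptually, merging adds together the per-application counts recorded for the same image. |
| The sketch below illustrates that idea on invented numbers; it does not read OProfile sample |
| files, and <code class="function">merge_by_image</code> is a hypothetical helper. |
| </p> |

```shell
# Hypothetical input, one line per separated profile:
#   <application> <library image> <samples>
merge_by_image() {
    awk '{ total[$2] += $3 } END { for (img in total) print img, total[img] }' | sort
}

printf '%s\n' \
    'bash libc-2.3.2.so 120' \
    'make libc-2.3.2.so 30' \
    'bash libm-2.3.2.so 5' | merge_by_image
```

| <p> |
| The two <code class="filename">libc-2.3.2.so</code> entries collapse into one line carrying the summed count. |
| </p> |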
| </div> |
| <div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="opreport-comparison"></a>2.2. Side-by-side multiple results</h3></div></div></div> |
| <p> |
| If you have used multiple events when profiling, by default you get |
| side-by-side results of each event's sample values from <span><strong class="command">opreport</strong></span>. |
| You can restrict which events are listed by using the |
| <code class="option">event:</code> profile specification. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opreport-callgraph"></a>2.3. Callgraph output</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| This section provides details on how to use the OProfile callgraph feature. |
| </p> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="op-cg1"></a>2.3.1. Callgraph details</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| When using the <code class="option">opcontrol --callgraph</code> option, you can see what |
| functions are calling other functions in the output. Consider the |
| following program: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
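| /* strfry() is a GNU extension; glibc only declares it with _GNU_SOURCE. */ |
| #define _GNU_SOURCE |
| #include <string.h> |
| #include <stdlib.h> |
| #include <stdio.h> |
| |
| #define SIZE 500000 |
| |
| /* qsort() hands us pointers to the array elements, i.e. char ** here. */ |
| static int compare(const void *s1, const void *s2) |
| { |
|     return strcmp(*(char * const *)s1, *(char * const *)s2); |
| } |
| |
| static void repeat(void) |
| { |
|     int i; |
|     char *strings[SIZE]; |
|     char str[] = "abcdefghijklmnopqrstuvwxyz"; |
| |
|     /* Fill the table with shuffled copies of the alphabet. */ |
|     for (i = 0; i < SIZE; ++i) { |
|         strings[i] = strdup(str); |
|         strfry(strings[i]); |
|     } |
| |
|     qsort(strings, SIZE, sizeof(char *), compare); |
| |
|     /* Free the copies so the endless loop in main() does not leak. */ |
|     for (i = 0; i < SIZE; ++i) |
|         free(strings[i]); |
| } |
| |
| int main() |
| { |
|     /* Run until interrupted, so there is plenty of time to sample. */ |
|     while (1) |
|         repeat(); |
| } |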
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| When running with the call-graph option, OProfile will |
| record the function stack every time it takes a sample. |
| <span><strong class="command">opreport --callgraph</strong></span> outputs an entry for each |
| function, where each entry looks similar to: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| samples % image name symbol name |
| 197 0.1548 cg main |
| 127036 99.8452 cg repeat |
| 84590 42.5084 libc-2.3.2.so strfry |
| 84590 66.4838 libc-2.3.2.so strfry [self] |
| 39169 30.7850 libc-2.3.2.so random_r |
| 3475 2.7312 libc-2.3.2.so __i686.get_pc_thunk.bx |
| ------------------------------------------------------------------------------- |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Here the non-indented line is the function we're focussing upon |
| (<code class="function">strfry()</code>). This |
| line is the same as you'd get from a normal <span><strong class="command">opreport</strong></span> |
| output. |
| </p> |
| <p> |
| Above the non-indented line we find the functions that called this |
| function (for example, <code class="function">repeat()</code> calls |
| <code class="function">strfry()</code>). The samples and percentage values here |
| refer to the number of times we took a sample where this call was found |
| in the stack; the percentage is relative to all other callers of the |
| function we're focussing on. Note that these values are |
| <span class="emphasis"><em>not</em></span> call counts; they only reflect the call stack |
| every time a sample is taken; that is, if a call is found in the stack |
| at the time of a sample, it is recorded in this count. |
| </p> |
| <p> |
| Below the line are functions that are called by |
| <code class="function">strfry()</code> (called <span class="emphasis"><em>callees</em></span>). |
| It's clear here that <code class="function">strfry()</code> calls |
| <code class="function">random_r()</code>. We also see a special entry with a |
| "[self]" marker. This records the normal samples for the function, but |
| the percentage becomes relative to all callees. This allows you to |
| compare time spent in the function itself compared to functions it |
| calls. Note that if a function calls itself, then it will appear in the |
| list of callees of itself, but without the "[self]" marker; so recursive |
| calls are still clearly separable. |
| </p> |
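| <p> |
| The percentages in the listing can be checked by hand: caller percentages are relative to the |
| sum of all caller samples, and callee percentages (including "[self]") are relative to the sum |
| of all callee samples. A small sketch using the sample counts from the entry above; |
| <code class="function">pct</code> is a hypothetical helper: |
| </p> |

```shell
# Print part/total as a percentage with four decimals, as opreport does.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.4f\n", part * 100 / total }'
}

# Callers of strfry(): main and repeat.
callers=$((197 + 127036))
pct 197 "$callers"
pct 127036 "$callers"

# Callees: strfry [self], random_r, __i686.get_pc_thunk.bx.
callees=$((84590 + 39169 + 3475))
pct 84590 "$callees"
pct 39169 "$callees"
pct 3475 "$callees"
```

| <p> |
| The five values match the percentage column of the indented lines in the listing. |
| </p> |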
| <p> |
| You may have noticed that the output lists <code class="function">main()</code> |
| as calling <code class="function">strfry()</code>, but it's clear from the source |
| that this doesn't actually happen. See <a href="#interpreting-callgraph" title="3. Interpreting call-graph profiles">Section 3, “Interpreting call-graph profiles”</a> for an explanation. |
| </p> |
| </div> |
| <div class="sect3" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h4 class="title"><a id="cg-with-jitsupport"></a>2.3.2. Callgraph and JIT support</h4> |
| </div> |
| </div> |
| </div> |
| <p> |
| Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading. |
| For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory |
| named <code class="filename">{anon:anon}/<tgid>.<begin_addr>.<end_addr></code>. |
| As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java, |
| OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code. |
| However, when viewing callgraph output, any anonymously mapped code in the callstack |
| will be attributed to <code class="filename">anon (tgid:<tgid> range:<begin_addr>-<end_addr>)</code>, |
| even if a <code class="filename">.jo</code> ELF file had been created for it. See the example below. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| ------------------------------------------------------------------------------- |
| 1 2.2727 libj9ute23.so java.bin traceV |
| 2 4.5455 libj9ute23.so java.bin utsTraceV |
| 4 9.0909 libj9trc23.so java.bin fillInUTInterfaces |
| 37 84.0909 libj9trc23.so java.bin twGetSequenceCounter |
| 8 0.0154 libj9prt23.so java.bin j9time_hires_clock |
| 27 61.3636 anon (tgid:10014 range:0x100000-0x103000) java.bin (no symbols) |
| 9 20.4545 libc-2.4.so java.bin gettimeofday |
| 8 18.1818 libj9prt23.so java.bin j9time_hires_clock [self] |
| ------------------------------------------------------------------------------- |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of |
| <code class="code">j9time_hires_clock</code>, even though the ELF file <code class="filename">10014.jo</code> was |
| created for this profile run. Unfortunately, there is currently no way to correlate |
| that anonymous callgraph entry with its corresponding <code class="filename">.jo</code> file. |
| </p> |
| </div> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opreport-diff"></a>2.4. Differential profiles with <span><strong class="command">opreport</strong></span></h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Often, we'd like to be able to compare two profiles. For example, when |
| analysing the performance of an application, we'd like to make code |
| changes and examine the effect of the change. This is supported in |
| <span><strong class="command">opreport</strong></span> by giving a profile specification that |
| identifies two different profiles. The general form is: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opreport <shared-spec> { <first-profile> } { <second-profile> } |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| We lost our Dragon book down the back of the sofa, so you have to be |
| careful to have spaces around those braces, or things will get |
| hopelessly confused. We can only apologise. |
| </p> |
| </div> |
| <p> |
| For each of the profiles, the shared section is prefixed, and then the |
| specification is analysed. The usual parameters work both within the |
| shared section, and in the sub-specification within the curly braces. |
| </p> |
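| <p> |
| The way the shared section is prefixed to each sub-specification can be sketched as follows. |
| This is an illustration of the rule just described, not the parser |
| <span><strong class="command">opreport</strong></span> uses; <code class="function">split_diff_spec</code> is a |
| hypothetical helper that expects only specification arguments, with each brace arriving as its own |
| argument (hence the need for spaces around the braces). |
| </p> |

```shell
# Hypothetical helper: show the two specifications a differential
# command line expands to.
split_diff_spec() {
    shared='' first='' second=''
    braces=0 inside=no
    for tok in "$@"; do
        case $tok in
            '{') braces=$((braces + 1)); inside=yes ;;
            '}') inside=no ;;
            *)
                if [ "$inside" = no ]; then
                    shared="$shared $tok"       # outside braces: shared
                elif [ "$braces" -eq 1 ]; then
                    first="$first $tok"         # inside the first pair
                else
                    second="$second $tok"       # inside the second pair
                fi ;;
        esac
    done
    one="$shared$first" two="$shared$second"
    printf 'first: %s\n' "${one# }"
    printf 'second: %s\n' "${two# }"
}

split_diff_spec /bin/bash { archive:./orig } { }
```

| <p> |
| The empty second pair of braces leaves only the shared part, so the second profile falls back to the defaults. |
| </p> |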
| <p> |
| A typical way to use this feature is with archives created with |
| <span><strong class="command">oparchive</strong></span>. Let's look at an example: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ ./a |
| $ oparchive -o orig ./a |
| $ opcontrol --reset |
| # edit and recompile a |
| $ ./a |
| # now compare the current profile of a with the archived profile |
| $ opreport -xl ./a { archive:./orig } { } |
| CPU: PIII, speed 863.233 MHz (estimated) |
| Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a |
| unit mask of 0x00 (No unit mask) count 100000 |
| samples % diff % symbol name |
| 92435 48.5366 +0.4999 a |
| 54226 --- --- c |
| 49222 25.8459 +++ d |
| 48787 25.6175 -2.2e-01 b |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Note that we specified an empty second profile in the curly braces, as |
| we wanted to use the current session; alternatively, we could |
| have specified another archive, or a tgid etc. We specified the binary |
| <span><strong class="command">a</strong></span> in the shared section, so we matched that in both |
| the profiles we're diffing. |
| </p> |
| <p> |
| As in the normal output, the results are sorted by the number of |
| samples, and the percentage field represents the relative percentage of |
| the symbol's samples in the second profile. |
| </p> |
| <p> |
| Notice the new column in the output. This value represents the |
| percentage change of the relative percent between the first and the |
| second profile: roughly, "how much more important this symbol is". |
| Looking at the symbol <code class="function">a()</code>, we can see that it took |
| roughly the same amount of the total profile in both the first and the |
| second profile. The function <code class="function">c()</code> was not in the new |
| profile, so has been marked with <code class="function">---</code>. Note that the |
| sample value is the number of samples in the first profile; since we're |
| displaying results for the second profile, we don't list a percentage |
| value for it, as it would be meaningless. <code class="function">d()</code> is |
| new in the second profile, and consequently marked with |
| <code class="function">+++</code>. |
| </p> |
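| <p> |
| One plausible reading of "percentage change of the relative percent" is |
| (second - first) / first * 100, applied to the two percentage values. That formula is an |
| assumption here and has not been checked against the <span><strong class="command">opreport</strong></span> |
| source; the input numbers below are invented, and <code class="function">diff_pct</code> is a hypothetical helper. |
| </p> |

```shell
# Assumed formula: relative change between two percentage values.
# A symbol missing from one profile gets the marker described in the text.
diff_pct() {
    awk -v first="$1" -v second="$2" 'BEGIN {
        if (first == 0)       print "+++"    # only in the second profile
        else if (second == 0) print "---"    # only in the first profile
        else                  printf "%+.4f\n", (second - first) * 100 / first
    }'
}

diff_pct 40 50      # share grew from 40% to 50% of the profile
diff_pct 0 25.8     # new symbol
diff_pct 28.5 0     # symbol no longer present
```

| <p> |
| Under this reading, a symbol going from 40% to 50% of the profile is reported as a 25% increase, |
| not as 10 percentage points. |
| </p> |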
| <p> |
| When comparing profiles between different binaries, it should be clear |
| that functions can change in terms of VMA and size. To avoid this |
| problem, <span><strong class="command">opreport</strong></span> considers a symbol to be the same |
| if the symbol name, image name, and owning application name all match; |
| any other factors are ignored. Note that the check for application name |
| means that trying to compare library profiles between two different |
| applications will not work as you might expect: each symbol will be |
| considered different. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opreport-anon"></a>2.5. Anonymous executable mappings</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Many applications, typically ones involving dynamic compilation into |
| machine code (just-in-time, or "JIT", compilation), have executable mappings that |
| are not backed by an ELF file. <span><strong class="command">opreport</strong></span> has basic support for showing the |
| samples taken in these regions; for example: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opreport /usr/bin/mono -l |
| CPU: ppc64 POWER5, speed 1654.34 MHz (estimated) |
| Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000 |
| samples % image name symbol name |
| 47 58.7500 mono (no symbols) |
| 14 17.5000 anon (tgid:3189 range:0xf72aa000-0xf72fa000) (no symbols) |
| 9 11.2500 anon (tgid:3189 range:0xf6cca000-0xf6dd9000) (no symbols) |
| . . . . |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Note that, since such mappings are dependent upon individual invocations of |
| a binary, these mappings are always listed as a dependent image, |
| even when using <code class="option">--separate=none</code>. |
| Equally, the results are not affected by the <code class="option">--merge</code> |
| option. |
| </p> |
| <p> |
| As shown in the opreport output above, OProfile is unable to attribute the samples to any |
| symbol(s) because there is no ELF file for this code. |
| Enhanced support for JITed code is now available for some virtual machines; |
| e.g., the Java Virtual Machine. For details about OProfile output for |
| JITed code, see <a href="#getting-jit-reports" title="4. OProfile results with JIT samples">Section 4, “OProfile results with JIT samples”</a>. |
| </p> |
| <p>For more information about JIT support in OProfile, see <a href="#jitsupport" title="1.1. Support for dynamically compiled (JIT) code">Section 1.1, “Support for dynamically compiled (JIT) code”</a>. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opreport-xml"></a>2.6. XML formatted output</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The --xml option can be used to generate XML instead of the usual |
| text format. This allows opreport to eliminate some of the constraints |
| dictated by the two dimensional text format. For example, it is possible |
| to separate the sample data across multiple events, cpus and threads. The XML |
| schema implemented by opreport is found in doc/opreport.xsd. It contains |
| more detailed comments about the structure of the XML generated by opreport. |
| </p> |
| <p> |
| Since XML is consumed by a client program rather than a user, its structure |
| is fairly static. In particular, the --sort option is incompatible with the |
| --xml option. Percentages are not displayed in the XML so the options related |
| to percentages will have no effect. Full pathnames are always displayed in |
| the XML so --long-filenames is not necessary. The --details option will cause |
| all of the individual sample data to be included in the XML as well as the |
| instruction byte stream for each symbol (for doing disassembly) and can result |
| in very large XML files. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opreport-options"></a>2.7. Options for <span><strong class="command">opreport</strong></span></h3> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--accumulated / -a</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Accumulate sample and percentage counts in the symbol list. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--callgraph / -c</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show callgraph information. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--debug-info / -g</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show source file and line for each symbol. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--demangle / -D none|normal|smart</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| none: no demangling. normal: use the default demangler (default). smart: use |
| pattern-matching to make C++ symbol demangling more readable. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--details / -d</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show per-instruction details for all selected symbols. Note that, for |
| binaries without symbol information, the VMA values shown are raw file |
| offsets for the image binary. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--exclude-dependent / -x</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Do not include application-specific images for libraries, kernel modules |
| and the kernel. This option only makes sense if the profile session |
| used --separate. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--exclude-symbols / -e [symbols]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Exclude all the symbols in the given comma-separated list. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--global-percent / -%</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Make all percentages relative to the whole profile. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--help / -? / --usage</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show help message. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--image-path / -p [paths]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Comma-separated list of additional paths to search for binaries. |
| This is needed to find modules in kernels 2.6 and upwards. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--root / -R [path]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A path to a filesystem to search for additional binaries. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--include-symbols / -i [symbols]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only include symbols in the given comma-separated list. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--long-filenames / -f</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Output full paths instead of basenames. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--merge / -m [lib,cpu,tid,tgid,unitmask,all]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Merge any profiles separated in a --separate session. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--no-header</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Don't output a header detailing profiling parameters. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--output-file / -o [file]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Output to the given file instead of stdout. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--reverse-sort / -r</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Reverse the sort from the default. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"><code class="option">--session-dir=</code>dir_path</span> |
| </dt> |
| <dd> |
| <p> |
| Use sample database out of directory <code class="filename">dir_path</code> |
| instead of the default location (/var/lib/oprofile). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--show-address / -w</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show the VMA address of each symbol (off by default). |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--sort / -s [vma,sample,symbol,debug,image]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Sort the list of symbols by, respectively, symbol address, |
| number of samples, symbol name, debug filename and line number, |
| binary image filename. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--symbols / -l</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| List per-symbol information instead of a binary image summary. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--threshold / -t [percentage]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only output data for symbols that have more than the given percentage |
| of total samples. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--verbose / -V [options]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Give verbose debugging output. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--version / -v</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show version. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--xml / -X</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Generate XML output. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="opannotate"></a>3. Outputting annotated source (<span><strong class="command">opannotate</strong></span>)</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| The <span><strong class="command">opannotate</strong></span> utility generates annotated source files or assembly listings, optionally |
| mixed with source. |
| If you want to see the source file, the profiled application needs to have debug information, and the source |
| must be available through this debug information. For GCC, you must use the <code class="option">-g</code> option |
| when you are compiling. |
| If the binary doesn't contain sufficient debug information, you can still |
| use <span><strong class="command">opannotate <code class="option">--assembly</code></strong></span> to get annotated assembly. |
| </p> |
| <p> |
| Note that for the reason explained in <a href="#hardware-counters" title="4.1. Hardware performance counters">Section 4.1, “Hardware performance counters”</a> the results can be |
| inaccurate. The debug information itself can add other problems; for example, the line number for a symbol can be |
| incorrect. Assembly instructions can be re-ordered and moved by the compiler, and this can lead to |
| crediting source lines with samples not really "owned" by this line. Also see |
| <a href="#interpreting" title="Chapter 5. Interpreting profiling results">Chapter 5, <i>Interpreting profiling results</i></a>. |
| </p> |
| <p> |
| You can output the annotation to one single file, containing all the source found using the |
| <code class="option">--source</code> option. You can use this in conjunction with <code class="option">--assembly</code> |
| to get combined source/assembly output. |
| </p> |
| <p> |
| You can also output a directory of annotated source files that maintains the structure of |
| the original sources. Each line in the annotated source is prepended with the samples |
| for that line. Additionally, each symbol is annotated giving details for the symbol |
| as a whole. An example: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opannotate --source --output-dir=annotated /usr/local/oprofile-pp/bin/oprofiled |
| $ ls annotated/home/moz/src/oprofile-pp/daemon/ |
| opd_cookie.h opd_image.c opd_kernel.c opd_sample_files.c oprofiled.c |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Line numbers are maintained in the source files, but each file has |
| a footer appended describing the profiling details. The actual annotation |
| looks something like this: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| ... |
| :static uint64_t pop_buffer_value(struct transient * trans) |
| 11510 1.9661 :{ /* pop_buffer_value total: 89901 15.3566 */ |
| : uint64_t val; |
| : |
| 10227 1.7469 : if (!trans->remaining) { |
| : fprintf(stderr, "BUG: popping empty buffer !\n"); |
| : exit(EXIT_FAILURE); |
| : } |
| : |
| : val = get_buffer_value(trans->buffer, 0); |
| 2281 0.3896 : trans->remaining--; |
| 2296 0.3922 : trans->buffer += kernel_pointer_size; |
| : return val; |
| 10454 1.7857 :} |
| ... |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The first number on each line is the number of samples, whilst the second is |
| the relative percentage of total samples. |
| </p> |
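| <p> |
| Annotated files in this layout are easy to post-process. The Python sketch below is an illustration, not an OProfile |
| tool: it assumes each annotated line consists of an optional sample count and percentage followed by a colon and the |
| original source text, as in the excerpt above, and the names <code class="code">parse_annotated</code> and |
| <code class="code">EXCERPT</code> are invented for the example. |
| </p> |

```python
import re

# Optional "<samples> <percent>" prefix, then ':' and the original source text.
LINE = re.compile(r"^\s*(?:(\d+)\s+([0-9.]+(?:e[-+]?\d+)?)%?\s+)?:(.*)$")


def parse_annotated(lines):
    """Yield (samples, percent, source_text) for each annotated line."""
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # not an annotation line, e.g. "..."
        samples = int(match.group(1)) if match.group(1) else 0
        percent = float(match.group(2)) if match.group(2) else 0.0
        yield samples, percent, match.group(3)


# A few lines copied from the excerpt above.
EXCERPT = """\
               :static uint64_t pop_buffer_value(struct transient * trans)
 11510  1.9661 :{ /* pop_buffer_value total:  89901 15.3566 */
               :        uint64_t val;
 10227  1.7469 :        if (!trans->remaining) {
  2281  0.3896 :        trans->remaining--;
 10454  1.7857 :}
"""

rows = list(parse_annotated(EXCERPT.splitlines()))
hottest = max(rows, key=lambda row: row[0])
print(hottest[0], hottest[2].strip())
```

| <p> |
| Lines without samples are reported with a count of zero, so the output keeps one entry per source line and line |
| numbers can still be recovered by position. |
| </p> |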
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opannotate-finding-source"></a>3.1. Locating source files</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Of course, <span><strong class="command">opannotate</strong></span> needs to be able to locate the source files |
| for the binary image(s) in order to produce output. Some binary images have debug |
| information where the given source file paths are relative, not absolute. You can |
| specify search paths to look for these files (similar to <span><strong class="command">gdb</strong></span>'s |
| <code class="option">dir</code> command) with the <code class="option">--search-dirs</code> option. |
| </p> |
| <p> |
| Sometimes you may have a binary image which gives absolute paths for the source files, |
| but you have the actual sources elsewhere (commonly, you've installed an SRPM for |
| a binary on your system and you want annotation from an existing profile). You can |
| use the <code class="option">--base-dirs</code> option to redirect OProfile to look somewhere |
| else for source files. For example, imagine you have a binary generated from a source |
| file that is given in the debug information as <code class="filename">/tmp/build/libfoo/foo.c</code>, |
| and you have the source tree matching that binary installed in <code class="filename">/home/user/libfoo/</code>. |
| You can redirect OProfile to find <code class="filename">foo.c</code> correctly like this: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opannotate --source --base-dirs=/tmp/build/libfoo/ --search-dirs=/home/user/libfoo/ --output-dir=annotated/ /lib/libfoo.so |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| You can specify multiple (comma-separated) paths to both options. |
| </p> |
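| <p> |
| The interaction of the two options can be modelled roughly as follows. This Python sketch only mirrors the behaviour |
| described in this section (strip a matching base-dir prefix, then try each search dir in turn); it is not |
| <span><strong class="command">opannotate</strong></span>'s actual implementation, and the function name |
| <code class="code">locate_source</code> and the pretend filesystem are assumptions made for the example. |
| </p> |

```python
import os
import posixpath


def locate_source(debug_path, base_dirs, search_dirs, exists=os.path.exists):
    """Return the first existing candidate for debug_path, or None."""
    relative = debug_path
    for base in base_dirs:
        prefix = base.rstrip("/") + "/"
        if debug_path.startswith(prefix):
            relative = debug_path[len(prefix):]  # strip the base-dir prefix
            break
    if posixpath.isabs(relative):
        # No prefix matched: the absolute path itself is the only candidate.
        return relative if exists(relative) else None
    for directory in search_dirs:
        candidate = posixpath.join(directory, relative)
        if exists(candidate):
            return candidate
    return None


# Pretend filesystem holding only the relocated source file from the example above.
present = {"/home/user/libfoo/foo.c"}
found = locate_source(
    "/tmp/build/libfoo/foo.c",
    base_dirs=["/tmp/build/libfoo/"],
    search_dirs=["/home/user/libfoo/"],
    exists=present.__contains__,
)
print(found)
```

| <p> |
| Without a matching base dir the absolute path from the debug information is tried as it stands, which is why the |
| prefix has to be given when the build tree no longer exists. |
| </p> |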
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opannotate-details"></a>3.2. Usage of <span><strong class="command">opannotate</strong></span></h3> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--assembly / -a</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Output annotated assembly. If this is combined with --source, then mixed |
| source / assembly annotations are output. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--base-dirs / -b [paths]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Comma-separated list of path prefixes. This can be used to point OProfile to a |
| different location for source files when the debug information specifies an |
| absolute path on your system for the source that does not exist. The prefix |
| is stripped from the debug source file paths, then searched in the search dirs |
| specified by <code class="option">--search-dirs</code>. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--demangle / -D none|normal|smart</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| none: no demangling. normal: use the default demangler (default). smart: use |
| pattern-matching to make C++ symbol demangling more readable. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--exclude-dependent / -x</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Do not include application-specific images for libraries, kernel modules |
| and the kernel. This option only makes sense if the profile session |
| used --separate. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--exclude-file [files]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Exclude all files in the given comma-separated list of glob patterns. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--exclude-symbols / -e [symbols]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Exclude all the symbols in the given comma-separated list. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--help / -? / --usage</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show help message. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--image-path / -p [paths]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Comma-separated list of additional paths to search for binaries. |
| This is needed to find modules in kernels 2.6 and upwards. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--root / -R [path]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A path to a filesystem to search for additional binaries. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--include-file [files]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only include files in the given comma-separated list of glob patterns. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--include-symbols / -i [symbols]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only include symbols in the given comma-separated list. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--objdump-params [params]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Pass the given parameters as extra values when calling objdump. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--output-dir / -o [dir]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Output directory. This makes opannotate output one annotated file for each |
| source file. This option can't be used in conjunction with --assembly. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--search-dirs / -d [paths]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Comma-separated list of paths to search for source files. This is useful to find |
| source files when the debug information only contains relative paths. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--source / -s</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Output annotated source. This requires debugging information to be available |
| for the binaries. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--threshold / -t [percentage]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only output data for symbols that have more than the given percentage |
| of total samples. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--verbose / -V [options]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Give verbose debugging output. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--version / -v</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show version. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="getting-jit-reports"></a>4. OProfile results with JIT samples</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| After profiling a Java (or other supported VM) application, the command |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"><span xmlns="http://www.w3.org/1999/xhtml"><strong class="command">"opcontrol --dump"</strong></span> </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| flushes the sample buffers and creates ELF binaries from the |
| intermediate files that were written by the agent library. |
| The ELF binaries are named <code class="filename"><tgid>.jo</code>. |
| With the symbol information stored in these ELF files, it is |
| possible to map samples to the appropriate symbols. |
| </p> |
| <p> |
| The usual analysis tools (<span><strong class="command">opreport</strong></span> and/or |
| <span><strong class="command">opannotate</strong></span>) can now be used |
| to get symbols and assembly code for the instrumented VM processes. |
| </p> |
| <p> |
| Below is an example of a profile report of a Java application that has been |
| instrumented with the provided agent library. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opreport -l /usr/lib/jvm/jre-1.5.0-ibm/bin/java |
| CPU: Core Solo / Duo, speed 2167 MHz (estimated) |
| Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000 |
| samples % image name symbol name |
| 186020 50.0523 no-vmlinux no-vmlinux (no symbols) |
| 34333 9.2380 7635.jo java void test.f1() |
| 19022 5.1182 libc-2.5.so libc-2.5.so _IO_file_xsputn@@GLIBC_2.1 |
| 18762 5.0483 libc-2.5.so libc-2.5.so vfprintf |
| 16408 4.4149 7635.jo java void test$HelloThread.run() |
| 16250 4.3724 7635.jo java void test$test_1.f2(int) |
| 15303 4.1176 7635.jo java void test.f2(int, int) |
| 13252 3.5657 7635.jo java void test.f2(int) |
| 5165 1.3897 7635.jo java void test.f4() |
| 955 0.2570 7635.jo java void test$HelloThread.run()~ |
| |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"> |
| <h3 class="title">Note</h3> |
| <p> |
| Depending on the JVM that is used, certain options of opreport and opannotate |
| do NOT work since they rely on debug information (e.g. source code line number) |
| that is not always available. The Sun JVM does provide the necessary debug |
| information via the JVMTI[PI] interface, |
| but other JVMs do not. |
| </p> |
| </div> |
| <p> |
| As you can see in the opreport output, the JIT support agent for Java |
| generates symbols that include the class and method signature. |
| A symbol with the suffix ~<n> (e.g. |
| <code class="code">void test$HelloThread.run()~1</code>) means that this is |
| the <n>th occurrence of the identical name. This happens if a method is re-JITed. |
| A symbol with the suffix %<n> means that the address space of this symbol |
| was reused during the sample session (see <a href="#overlapping-symbols" title="6. Overlapping symbols in JITed code">Section 6, “Overlapping symbols in JITed code”</a>). |
| The value <n> is the percentage of time that this symbol/code was present in |
| relation to the total lifetime of all other overlapping symbols. A symbol of the form |
| <code class="code"><return_val> <class_name>$<method_sig></code> denotes an |
| inner class. |
| </p> |
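| <p> |
| A script that post-processes such a report may want to separate these suffixes from the method signature. The Python |
| sketch below is an illustration only: it assumes the occurrence suffix is written with an ASCII tilde, as in the |
| report above, and that at most one suffix is present; the function name <code class="code">split_jit_symbol</code> is |
| invented for the example. |
| </p> |

```python
import re

# Base name, optionally followed by "~<n>" (re-JITed) or "%<n>" (overlapping).
SYMBOL = re.compile(r"^(.*?)(?:([~%])(\d+))?$")


def split_jit_symbol(symbol):
    """Return (base_name, occurrence, lifetime_percent) for a JIT symbol name."""
    name, kind, number = SYMBOL.match(symbol).groups()
    occurrence = int(number) if kind == "~" else None
    lifetime_percent = int(number) if kind == "%" else None
    return name, occurrence, lifetime_percent


def is_inner_class(symbol):
    """True if the symbol name contains '$', which marks an inner class."""
    return "$" in split_jit_symbol(symbol)[0]


print(split_jit_symbol("void test$HelloThread.run()~1"))
print(split_jit_symbol("void test.f1()"))
```

| <p> |
| Grouping samples by the returned base name gives one total per method, regardless of how often it was re-JITed. |
| </p> |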
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="opgprof"></a>5. <span><strong class="command">gprof</strong></span>-compatible output (<span><strong class="command">opgprof</strong></span>)</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| If you're familiar with the output produced by <span><strong class="command">GNU gprof</strong></span>, |
| you may find <span><strong class="command">opgprof</strong></span> useful. It takes a single binary |
| as an argument, and produces a <code class="filename">gmon.out</code> file for use |
| with <span><strong class="command">gprof -p</strong></span>. If call-graph profiling is enabled, |
| then this is also included. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opgprof `which oprofiled` # generates gmon.out file |
| $ gprof -p `which oprofiled` | head |
| Flat profile: |
| |
| Each sample counts as 1 samples. |
| % cumulative self self total |
| time samples samples calls T1/call T1/call name |
| 33.13 206237.00 206237.00 odb_insert |
| 22.67 347386.00 141149.00 pop_buffer_value |
| 9.56 406881.00 59495.00 opd_put_sample |
| 7.34 452599.00 45718.00 opd_find_image |
| 7.19 497327.00 44728.00 opd_process_samples |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opgprof-details"></a>5.1. Usage of <span><strong class="command">opgprof</strong></span></h3> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--help / -? / --usage</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show help message. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--image-path / -p [paths]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Comma-separated list of additional paths to search for binaries. |
| This is needed to find modules in kernels 2.6 and upwards. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--root / -R [path]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A path to a filesystem to search for additional binaries. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--output-filename / -o [file]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Output to the given file instead of the default, gmon.out. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--threshold / -t [percentage]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only output data for symbols that have more than the given percentage |
| of total samples. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--verbose / -V [options]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Give verbose debugging output. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--version / -v</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show version. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="oparchive"></a>6. Archiving measurements (<span><strong class="command">oparchive</strong></span>)</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| The <span><strong class="command">oparchive</strong></span> utility generates a directory populated |
| with executable, debug, and oprofile sample files. This directory can be |
| moved to another machine via <span><strong class="command">tar</strong></span> and analyzed without |
| further use of the data collection machine. |
| </p> |
| <p> |
| The following command would collect the sample files, the executables |
| associated with the sample files, and the debuginfo files associated |
| with the executables and copy them into |
| <code class="filename">/tmp/current_data</code>: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # oparchive -o /tmp/current_data |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="oparchive-details"></a>6.1. Usage of <span><strong class="command">oparchive</strong></span></h3> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--help / -? / --usage</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show help message. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--exclude-dependent / -x</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Do not include application-specific images for libraries, kernel modules |
| and the kernel. This option only makes sense if the profile session |
| used --separate. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--image-path / -p [paths]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Comma-separated list of additional paths to search for binaries. |
| This is needed to find modules in kernels 2.6 and upwards. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--root / -R [path]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| A path to a filesystem to search for additional binaries. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--output-directory / -o [directory]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Output to the given directory. There is no default. This must be specified. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--list-files / -l</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Only list the files that would be archived, don't copy them. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--verbose / -V [options]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Give verbose debugging output. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--version / -v</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show version. |
| </p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="opimport"></a>7. Converting sample database files (<span><strong class="command">opimport</strong></span>)</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| This utility converts sample database files from a foreign binary format (abi) to |
| the native format. This is useful only when moving sample files between hosts, |
| for analysis on platforms other than the one used for collection. The abi format |
| of the file to be imported is described in a text file located in <code class="filename">$SESSION_DIR/abi</code>. |
| </p> |
| <p> |
| The following command would convert the input sample files to the |
| output sample files using the given abi file as a binary description |
| of the input file and the current platform abi as a binary description |
| of the output file. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| # opimport -a /var/lib/oprofile/abi -o /tmp/current/.../GLOBAL_POWER_EVENTS.200000.1.all.all.all /var/lib/.../mprime/GLOBAL_POWER_EVENTS.200000.1.all.all.all |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="opimport-details"></a>7.1. Usage of <span><strong class="command">opimport</strong></span></h3> |
| </div> |
| </div> |
| </div> |
| <div class="variablelist"> |
| <dl> |
| <dt> |
| <span class="term"> |
| <code class="option">--help / -? / --usage</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show help message. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--abi / -a [filename]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Input abi file description location. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--force / -f</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Force conversion even if the input and output abi are identical. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--output / -o [filename]</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Specify the output filename. If the output file already exists, it is |
| not overwritten; the converted data are accumulated into it. Sample filenames carry |
| information used by the post-profile tools and must be kept identical: in other words, the pathname |
| from the first path component containing a '{' onwards must be kept as is in the |
| output filename. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--verbose / -V</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Give verbose debugging output. |
| </p> |
| </dd> |
| <dt> |
| <span class="term"> |
| <code class="option">--version / -v</code> |
| </span> |
| </dt> |
| <dd> |
| <p> |
| Show version. |
| </p> |
| </dd> |
| </dl> |
| </div> |
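| <p> |
| The rule for output filenames given under <code class="option">--output</code> can be expressed as a small helper. |
| This Python sketch is an illustration, not part of OProfile: the function name |
| <code class="code">output_sample_path</code> is invented, and the sample path used below is an assumed layout chosen |
| only to show a path component containing '{'. |
| </p> |

```python
def output_sample_path(input_path, output_root):
    """Re-root a sample file path, keeping everything from the first
    path component that contains '{'."""
    parts = input_path.split("/")
    for index, part in enumerate(parts):
        if "{" in part:
            return "/".join([output_root.rstrip("/")] + parts[index:])
    raise ValueError("no path component contains '{': " + input_path)


# Illustrative path only; the directory layout is an assumption for this example.
source = ("/var/lib/oprofile/samples/current/{root}/usr/bin/mprime/"
          "GLOBAL_POWER_EVENTS.200000.1.all.all.all")
target = output_sample_path(source, "/tmp/current")
print(target)
```

| <p> |
| The returned string is a candidate value for <code class="option">--output</code>; everything before the first |
| '{' component is replaced, and the remainder is left untouched. |
| </p> |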
| </div> |
| </div> |
| </div> |
| <div class="chapter" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="interpreting"></a>Chapter 5. Interpreting profiling results</h2> |
| </div> |
| </div> |
| </div> |
| <div class="toc"> |
| <p> |
| <b>Table of Contents</b> |
| </p> |
| <dl> |
| <dt> |
| <span class="sect1"> |
| <a href="#irq-latency">1. Profiling interrupt latency</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#kernel-profiling">2. Kernel profiling</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#irq-masking">2.1. Interrupt masking</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#idle">2.2. Idle time</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#kernel-modules">2.3. Profiling kernel modules</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#interpreting-callgraph">3. Interpreting call-graph profiles</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#debug-info">4. Inaccuracies in annotated source</a> |
| </span> |
| </dt> |
| <dd> |
| <dl> |
| <dt> |
| <span class="sect2"> |
| <a href="#effect-of-optimizations">4.1. Side effects of optimizations</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#prologues">4.2. Prologues and epilogues</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#inlined-function">4.3. Inlined functions</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect2"> |
| <a href="#wrong-linenr-info">4.4. Inaccuracy in line number information</a> |
| </span> |
| </dt> |
| </dl> |
| </dd> |
| <dt> |
| <span class="sect1"> |
| <a href="#symbol-without-debug-info">5. Assembly functions</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#overlapping-symbols">6. Overlapping symbols in JITed code</a> |
| </span> |
| </dt> |
| <dt> |
| <span class="sect1"> |
| <a href="#hidden-cost">7. Other discrepancies</a> |
| </span> |
| </dt> |
| </dl> |
| </div> |
| <p> |
| The standard caveats of profiling apply in interpreting the results from OProfile: |
| profile realistic situations, profile different scenarios, profile |
| for as long a time as possible, avoid system-specific artifacts, don't trust |
| the profile data too much. Also bear in mind the comments on the performance |
| counters above - you <span class="emphasis"><em>cannot</em></span> rely on totally accurate |
| instruction-level profiling. However, for almost all circumstances the data |
| can be useful. Ideally a utility such as Intel's VTUNE would be available to |
| allow careful instruction-level analysis; go hassle Intel for this, not me ;) |
| </p> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="irq-latency"></a>1. Profiling interrupt latency</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| This is an example of how the latency of delivery of profiling interrupts |
| can impact the reliability of the profiling data. This is pretty much a |
| worst-case-scenario example: these problems are fairly rare. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| double fun(double a, double b, double c) |
| { |
| double result = 0; |
| for (int i = 0 ; i < 10000; ++i) { |
| result += a; |
| result *= b; |
| result /= c; |
| } |
| return result; |
| } |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Here the division, the last arithmetic instruction in the loop body, is very costly, and you would |
| expect the profile to reflect that. However (showing only the instructions inside the loop): |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opannotate -a -t 10 ./a.out |
| |
| 88 15.38% : 8048337: fadd %st(3),%st |
| 48 8.391% : 8048339: fmul %st(2),%st |
| 68 11.88% : 804833b: fdiv %st(1),%st |
| 368 64.33% : 804833d: inc %eax |
| : 804833e: cmp $0x270f,%eax |
| : 8048343: jle 8048337 |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The problem comes from the x86 hardware: when the counter overflows, the IRQ |
| is asserted, but the hardware has features that can delay the NMI. |
| x86 hardware is synchronous (i.e. it cannot interrupt during an instruction); |
| there is also a latency when the IRQ is asserted; and the multiple |
| execution units and the out-of-order model of modern x86 CPUs also cause |
| problems. This is the same function, with source annotation: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| $ opannotate -s -t 10 ./a.out |
| |
| :double fun(double a, double b, double c) |
| :{ /* _Z3funddd total: 572 100.0% */ |
| : double result = 0; |
| 368 64.33% : for (int i = 0 ; i < 10000; ++i) { |
| 88 15.38% : result += a; |
| 48 8.391% : result *= b; |
| 68 11.88% : result /= c; |
| : } |
| : return result; |
| :} |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The conclusion: don't trust samples coming at the end of a loop, |
| particularly if the last instruction generated by the compiler is costly. This |
| case can also occur for branches. Always bear in mind that a sample |
| can be delayed by a few cycles from its real position. That's a hardware |
| problem and OProfile can do nothing about it. |
| </p> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="kernel-profiling"></a>2. Kernel profiling</h2> |
| </div> |
| </div> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="irq-masking"></a>2.1. Interrupt masking</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| OProfile uses non-maskable interrupts (NMI) on the P6 generation, Pentium 4, |
| Athlon, Opteron, Phenom, and Turion processors. These interrupts can occur even in sections of the |
| Linux kernel where interrupts are disabled, allowing collection of samples in virtually |
| all executable code. The RTC, timer interrupt mode, and Itanium 2 collection mechanisms |
| use maskable interrupts. Thus, the RTC and Itanium 2 data collection mechanisms have "sample |
| shadows", or blind spots: regions where no samples will be collected. Typically, the samples |
| will be attributed to the code immediately after the interrupts are re-enabled. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="idle"></a>2.2. Idle time</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Your kernel is likely to support halting the processor when a CPU is idle. As |
| the typical hardware events like <code class="constant">CPU_CLK_UNHALTED</code> do not |
| count when the CPU is halted, the kernel profile will not reflect the actual |
| amount of time spent idle. You can change this behaviour by booting with |
| the <code class="option">idle=poll</code> option, which uses a different idle routine. This |
| will appear as <code class="function">poll_idle()</code> in your kernel profile. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="kernel-modules"></a>2.3. Profiling kernel modules</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| OProfile profiles kernel modules by default. However, there are a couple of problems |
| you may have when trying to get results. First, you may have booted via an initrd; |
| this means that the actual path for the module binaries cannot be determined automatically. |
| To get around this, you can use the <code class="option">-p</code> option to the profiling tools |
| to specify where to look for the kernel modules. |
| </p> |
| <p> |
| In 2.6 kernels, the information on where kernel module binaries are located has been removed. |
| This means OProfile needs guiding with the <code class="option">-p</code> option to find your |
| modules. Normally, you can just use your standard module top-level directory for this. |
| Note that due to this problem, OProfile cannot check that the modification times match; |
| it is your responsibility to make sure you do not modify a binary after a profile |
| has been created. |
| </p> |
| <p> |
| If you have run <span><strong class="command">insmod</strong></span> or <span><strong class="command">modprobe</strong></span> to insert a module |
| in a particular directory, it is important that you specify this directory with the |
| <code class="option">-p</code> option first, so that it over-rides an older module binary that might |
| exist in other directories you've specified with <code class="option">-p</code>. It is up to you |
| to make sure that these values are correct: 2.6 kernels simply do not provide enough |
| information for OProfile to work this out by itself. |
| </p> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="interpreting-callgraph"></a>3. Interpreting call-graph profiles</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| Sometimes the results from call-graph profiles may be different to what |
| you expect to see. The first thing to check is whether the target |
| binaries were compiled with frame pointers enabled (if the binary was |
| compiled using <span><strong class="command">gcc</strong></span>'s |
| <code class="option">-fomit-frame-pointer</code> option, you will not get |
| meaningful results). Note that as of this writing, the GCC developers |
| plan to disable frame pointers by default. The Linux kernel is built |
| without frame pointers by default; there is a configuration option you |
| can use to turn them on under the "Kernel Hacking" menu. |
| </p> |
| <p> |
| Often you may see a caller of a function that does not actually directly |
| call the function you're looking at (e.g. if <code class="function">a()</code> |
| calls <code class="function">b()</code>, which in turn calls |
| <code class="function">c()</code>, you may see an entry for |
| <code class="function">a()->c()</code>). What's actually occurring is that we |
| are taking samples at the very start (or the very end) of |
| <code class="function">c()</code>; at these few instructions, we haven't yet |
| created the new function's frame, so it appears as if |
| <code class="function">a()</code> is calling directly into |
| <code class="function">c()</code>. Be careful not to be misled by these |
| entries. |
| </p> |
| <p> |
| Like the rest of OProfile, call-graph profiling uses a statistical |
| approach; this means that sometimes a backtrace sample is truncated, or |
| even partially wrong. Bear this in mind when examining results. |
| </p> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="debug-info"></a>4. Inaccuracies in annotated source</h2> |
| </div> |
| </div> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="effect-of-optimizations"></a>4.1. Side effects of optimizations</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The compiler can introduce some pitfalls in the annotated source output. |
| The optimizer can move pieces of code in such a manner that two lines of code |
| are interleaved (instruction scheduling). Also, debug info generated by the compiler |
| can show strange behavior. This is especially true for complex expressions e.g. inside |
| an if statement: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| if (a && .. |
| b && .. |
| c &&) |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Here the problem comes from the position of the line number. The available debug |
| info does not give enough detail for the if condition, so all samples are |
| accumulated at the position of the closing parenthesis of the expression. Using |
| <span><strong class="command">opannotate <code class="option">-a</code></strong></span> can help to show the real |
| samples at an assembly level. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="prologues"></a>4.2. Prologues and epilogues</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| The compiler generally needs to generate "glue" code across function calls, dependent |
| on the particular function call conventions used. Additionally other things |
| need to happen, like stack pointer adjustment for the local variables; this |
| code is known as the function prologue. Similar code is needed at function return, |
| and is known as the function epilogue. This will show up in annotations as |
| samples at the very start and end of a function, where there is no apparent |
| executable code in the source. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="inlined-function"></a>4.3. Inlined functions</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| You may see that a function is credited with a certain number of samples, but |
| the listing does not add up to the correct total. To pick a real example: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| :internal_sk_buff_alloc_security(struct sk_buff *skb) |
| 353 2.342% :{ /* internal_sk_buff_alloc_security total: 1882 12.48% */ |
| : |
| : sk_buff_security_t *sksec; |
| 15 0.0995% : int rc = 0; |
| : |
| 10 0.06633% : sksec = skb->lsm_security; |
| 468 3.104% : if (sksec && sksec->magic == DSI_MAGIC) { |
| : goto out; |
| : } |
| : |
| : sksec = (sk_buff_security_t *) get_sk_buff_memory(skb); |
| 3 0.0199% : if (!sksec) { |
| 38 0.2521% : rc = -ENOMEM; |
| : goto out; |
| 10 0.06633% : } |
| : memset(sksec, 0, sizeof (sk_buff_security_t)); |
| 44 0.2919% : sksec->magic = DSI_MAGIC; |
| 32 0.2123% : sksec->skb = skb; |
| 45 0.2985% : sksec->sid = DSI_SID_NORMAL; |
| 31 0.2056% : skb->lsm_security = sksec; |
| : |
| : out: |
| : |
| 146 0.9685% : return rc; |
| : |
| 98 0.6501% :} |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Here, the function is credited with 1,882 samples, but the per-line annotations |
| add up to only 1,293. This is usually because of inline functions - |
| the compiler marks such code with debug entries for the inline function |
| definition, and this is where <span><strong class="command">opannotate</strong></span> annotates |
| such samples. In the case above, <code class="function">memset</code> is the most |
| likely candidate for this problem. Examining the mixed source/assembly |
| output can help identify such results. |
| </p> |
| <p> |
| This problem is more visible when there is no source file available. In the |
| following example, the symbol samples sum to 125, which does not match the |
| 109 samples reported for the file. The difference must be attributed |
| to inline functions. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| /* |
| * Total samples for file : "arch/i386/kernel/process.c" |
| * |
| * 109 2.4616 |
| */ |
| |
| /* default_idle total: 84 1.8970 */ |
| /* cpu_idle total: 21 0.4743 */ |
| /* flush_thread total: 1 0.0226 */ |
| /* prepare_to_copy total: 1 0.0226 */ |
| /* __switch_to total: 18 0.4065 */ |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The missing samples are not lost; they are credited to the source |
| location where the inlined function is defined. Samples for an inlined function |
| from multiple call sites are merged in one place in the annotated |
| source file, so there is no way to see which call site the |
| samples for an inlined function came from. |
| </p> |
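| <p> |
| As an illustration of this merging, here is a minimal C++ sketch. The names |
| <code class="function">mix()</code>, <code class="function">hash_all()</code> and |
| <code class="function">hash_range()</code> are hypothetical and do not come from any profiled program, |
| and whether the helper is really inlined depends on your compiler and optimisation level. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, small enough that an optimizing compiler will
// usually inline it into every caller.
inline std::uint64_t mix(std::uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

// Call site 1: once inlined, the machine code for mix() sits inside
// hash_all(), but its debug line entries still point at mix()'s source lines.
std::uint64_t hash_all(const std::vector<std::uint64_t>& v)
{
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        h ^= mix(v[i]);
    return h;
}

// Call site 2: samples taken in this inlined copy are annotated on the
// same source lines of mix() as the samples from hash_all().
std::uint64_t hash_range(std::uint64_t first, std::uint64_t last)
{
    std::uint64_t h = 0;
    for (std::uint64_t x = first; x < last; ++x)
        h ^= mix(x);
    return h;
}
```
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| If both callers are hot and <code class="function">mix()</code> is inlined, the annotated source would be |
| expected to show the combined samples on the body of <code class="function">mix()</code>, while the |
| symbol totals for the two callers still include them. |
| </p> |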
| <p> |
| When running <span><strong class="command">opannotate</strong></span>, you may get a warning |
| "some functions compiled without debug information may have incorrect source line attributions". |
| In some rare cases, OProfile is not able to verify that the derived source line |
| is correct (when some parts of the binary image are compiled without debugging |
| information). Be wary of results if this warning appears. |
| </p> |
| <p> |
| Furthermore, for some languages the compiler can implicitly generate functions, |
| such as default copy constructors. Such functions are labelled by the compiler |
| as having a line number of 0, which means the source annotation can be confusing. |
| </p> |
| </div> |
| <div class="sect2" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h3 class="title"><a id="wrong-linenr-info"></a>4.4. Inaccuracy in line number information</h3> |
| </div> |
| </div> |
| </div> |
| <p> |
| Depending on your compiler, you can run into the following problem: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| struct big_object { int a[500]; }; |
| |
| int main() |
| { |
| big_object a, b; |
| for (int i = 0 ; i != 1000 * 1000; ++i) |
| b = a; |
| return 0; |
| } |
| |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Compiled with <span><strong class="command">gcc</strong></span> 3.0.4 the annotated source is clearly inaccurate: |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| :int main() |
| :{ /* main total: 7871 100% */ |
| : big_object a, b; |
| : for (int i = 0 ; i != 1000 * 1000; ++i) |
| : b = a; |
| 7871 100% : return 0; |
| :} |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| The problem here is distinct from the IRQ latency problem; the debug line number |
| information is not precise enough; again, looking at output of <span><strong class="command">opannotate -as</strong></span> can help. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| :int main() |
| :{ |
| : big_object a, b; |
| : for (int i = 0 ; i != 1000 * 1000; ++i) |
| : 80484c0: push %ebp |
| : 80484c1: mov %esp,%ebp |
| : 80484c3: sub $0xfac,%esp |
| : 80484c9: push %edi |
| : 80484ca: push %esi |
| : 80484cb: push %ebx |
| : b = a; |
| : 80484cc: lea 0xfffff060(%ebp),%edx |
| : 80484d2: lea 0xfffff830(%ebp),%eax |
| : 80484d8: mov $0xf423f,%ebx |
| : 80484dd: lea 0x0(%esi),%esi |
| : return 0; |
| 3 0.03811% : 80484e0: mov %edx,%edi |
| : 80484e2: mov %eax,%esi |
| 1 0.0127% : 80484e4: cld |
| 8 0.1016% : 80484e5: mov $0x1f4,%ecx |
| 7850 99.73% : 80484ea: repz movsl %ds:(%esi),%es:(%edi) |
| 9 0.1143% : 80484ec: dec %ebx |
| : 80484ed: jns 80484e0 |
| : 80484ef: xor %eax,%eax |
| : 80484f1: pop %ebx |
| : 80484f2: pop %esi |
| : 80484f3: pop %edi |
| : 80484f4: leave |
| : 80484f5: ret |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| So here it's clear that copying is correctly credited with all of the samples, but the |
| line number information is misplaced. <span><strong class="command">objdump -dS</strong></span> exposes the |
| same problem. Note that maintaining accurate debug information is difficult for compilers when optimizing, so this problem is not surprising. |
| The problem of debug information |
| accuracy is also dependent on the binutils version used; some BFD library versions |
| contain a work-around for known problems of <span><strong class="command">gcc</strong></span>, some others do not. This is unfortunate but we must live with that, |
| since profiling is pointless when you disable optimisation (which would give better debugging entries). |
| </p> |
| </div> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="symbol-without-debug-info"></a>5. Assembly functions</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| Often the assembler cannot generate debug information automatically. |
| This means that you cannot get a source report unless |
| you manually define the necessary debug information; read your assembler documentation for how you might |
| do that. The only |
| debugging info needed currently by OProfile is the line-number/filename-VMA association. When profiling assembly |
| without debugging info you can always get a report for symbols, and optionally for VMA, through <span><strong class="command">opreport -l</strong></span> |
| or <span><strong class="command">opreport -d</strong></span>, but this works only for symbols with the right attributes. |
| For <span><strong class="command">gas</strong></span> you can get this with |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| .globl foo |
| .type foo,@function |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| whilst for <span><strong class="command">nasm</strong></span> you must use |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
| GLOBAL foo:function ; [1] |
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| Note that OProfile does not need the global attribute, only the function attribute. |
| </p> |
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="overlapping-symbols"></a>6. Overlapping symbols in JITed code</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| Some virtual machines (e.g., Java) may re-JIT a method, resulting in previously |
| allocated space for a piece of compiled code being reused. This means that, at one distinct |
| code address, multiple symbols/methods may be present during the run time of the application. |
| </p> |
| <p> |
| Since OProfile samples are buffered and don't have timing information, there is no way |
| to correlate samples with the (possibly) varying address ranges in which the code for a symbol |
| may reside. |
| An alternative would be flushing the OProfile sampling buffer when we get an unload event, |
| but this could result in high overhead. |
| </p> |
| <p> |
| To moderate the problem of overlapping symbols, OProfile tries to select the symbol that was |
| present at this address range most of the time. Additionally, other overlapping symbols |
| are truncated in the overlapping area. |
| This gives reasonable results, because in reality, address reuse typically takes place |
| during phase changes of the application -- in particular, during application startup. |
| Thus, for optimum profiling results, start the sampling session after application startup |
| and burn in. |
| </p> |
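| <p> |
| The selection rule described above can be sketched roughly as follows. This is only an illustration of the |
| idea (the longest-lived symbol wins, the others are truncated), not OProfile's actual implementation; |
| the <code class="function">JitSymbol</code> record and <code class="function">resolve_overlaps()</code> |
| are hypothetical names, and the lifetime unit is an assumption. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one JITed symbol: the address range it occupied
// and how long it was resident there (any consistent time unit will do).
struct JitSymbol {
    std::string name;
    std::uint64_t start;     // first address, inclusive
    std::uint64_t end;       // end address, exclusive
    std::uint64_t lifetime;  // time the symbol was present at this range
};

// The longest-lived symbol keeps its full range; shorter-lived symbols keep
// only the parts of their range that no longer-lived symbol has claimed.
std::vector<JitSymbol> resolve_overlaps(std::vector<JitSymbol> syms)
{
    std::stable_sort(syms.begin(), syms.end(),
                     [](const JitSymbol& a, const JitSymbol& b) {
                         return a.lifetime > b.lifetime;
                     });

    std::vector<JitSymbol> out;
    for (const JitSymbol& s : syms) {
        // Pieces of s that are still unclaimed; start with the whole range.
        std::vector<JitSymbol> pieces(1, s);
        for (const JitSymbol& w : out) {
            std::vector<JitSymbol> next;
            for (const JitSymbol& p : pieces) {
                if (w.end <= p.start || w.start >= p.end) {
                    next.push_back(p);        // no overlap with w
                    continue;
                }
                if (p.start < w.start) {      // part below w survives
                    JitSymbol lo = p;
                    lo.end = w.start;
                    next.push_back(lo);
                }
                if (w.end < p.end) {          // part above w survives
                    JitSymbol hi = p;
                    hi.start = w.end;
                    next.push_back(hi);
                }
            }
            pieces = next;
        }
        out.insert(out.end(), pieces.begin(), pieces.end());
    }

    // Return the surviving, non-overlapping ranges in address order.
    std::sort(out.begin(), out.end(),
              [](const JitSymbol& a, const JitSymbol& b) {
                  return a.start < b.start;
              });
    return out;
}
```
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| With two symbols sharing part of an address range, the one with the longer lifetime keeps the shared |
| addresses and the other is cut back to its non-overlapping remainder, so every sample address maps to at most one symbol. |
| </p> |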
| </div> |
| <div class="sect1" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title" style="clear: both"><a id="hidden-cost"></a>7. Other discrepancies</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| Another cause of apparent problems is the hidden cost of instructions. A very |
| common example is two memory reads: one from L1 cache and the other from memory: |
| the second memory read is likely to have more samples. |
| There are many other causes of hidden cost of instructions. A non-exhaustive |
| list: mis-predicted branch, TLB cache miss, partial register stall, |
| partial register dependencies, memory mismatch stall, re-executed µops. If you want to write |
| programs at the assembly level, be sure to take a look at the Intel and |
| AMD documentation at <a href="http://developer.intel.com/">http://developer.intel.com/</a> |
| and <a href="http://developer.amd.com/devguides.jsp/">http://developer.amd.com/devguides.jsp</a>. |
| </p> |
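| <p> |
| To make the memory-read example concrete, here is a small C++ sketch of two loops that perform the same |
| additions but touch memory in a different order. The function names and the default stride of 4096 elements |
| are arbitrary, hypothetical choices; the actual difference in samples depends on your cache sizes, the array |
| size, and what the compiler does with each loop. |
| </p> |
| <table xmlns="" border="0" style="background: #E0E0E0;" width="90%"> |
| <tr> |
| <td> |
| <pre class="screen"> |
```cpp
#include <cstddef>
#include <vector>

// Sum every element, walking memory sequentially. Each cache line is
// fully used before the next one is fetched, so most loads hit in cache.
long sum_sequential(const std::vector<int>& v)
{
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Sum the same elements exactly once each, but with a large stride, so
// consecutive loads tend to touch different cache lines. The stride value
// is an illustrative assumption, not a tuned number.
long sum_strided(const std::vector<int>& v, std::size_t stride = 4096)
{
    long total = 0;
    for (std::size_t start = 0; start < stride; ++start)
        for (std::size_t i = start; i < v.size(); i += stride)
            total += v[i];
    return total;
}
```
| </pre> |
| </td> |
| </tr> |
| </table> |
| <p> |
| On an array much larger than the cache, the load in <code class="function">sum_strided()</code> would be |
| expected to collect more samples than the one in <code class="function">sum_sequential()</code>, even though |
| the source lines look equally cheap. |
| </p> |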
| </div> |
| </div> |
| <div class="chapter" lang="en" xml:lang="en"> |
| <div class="titlepage"> |
| <div> |
| <div> |
| <h2 class="title"><a id="ack"></a>Chapter 6. Acknowledgments</h2> |
| </div> |
| </div> |
| </div> |
| <p> |
| Thanks to (in no particular order) : Arjan van de Ven, Rik van Riel, Juan Quintela, Philippe Elie, |
| Phillipp Rumpf, Tigran Aivazian, Alex Brown, Alisdair Rawsthorne, Bob Montgomery, Ray Bryant, H.J. Lu, |
| Jeff Esper, Will Cohen, Graydon Hoare, Cliff Woolley, Alex Tsariounov, Al Stone, Jason Yeh, |
| Randolph Chung, Anton Blanchard, Richard Henderson, Andries Brouwer, Bryan Rittmeyer, |
| Maynard P. Johnson, |
| Richard Reich (rreich@rdrtech.com), Zwane Mwaikambo, Dave Jones, Charles Filtness; and finally Pulp, for "Intro". |
| </p> |
| </div> |
| </div> |
| </body> |
| </html> |