| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| "http://www.w3.org/TR/html4/strict.dtd"> |
| <html> |
| <head> |
| <title>Checker Developer Manual</title> |
| <link type="text/css" rel="stylesheet" href="menu.css"> |
| <link type="text/css" rel="stylesheet" href="content.css"> |
| <script type="text/javascript" src="scripts/menu.js"></script> |
| </head> |
| <body> |
| |
| <div id="page"> |
| <!--#include virtual="menu.html.incl"--> |
| |
| <div id="content"> |
| |
| <h1 style="color:red">This Page Is Under Construction</h1> |
| |
| <h1>Checker Developer Manual</h1> |
| |
| <p>The static analyzer engine performs symbolic execution of the program and |
| relies on a set of checkers to implement the logic for detecting and |
| constructing bug reports. This page provides hints and guidelines for anyone |
| who is interested in implementing their own checker. The static analyzer is a |
| part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a> |
| and <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a> |
| for general developer guidelines and information. </p> |
| |
| <ul> |
| <li><a href="#start">Getting Started</a></li> |
| <li><a href="#analyzer">Analyzer Overview</a></li> |
| <li><a href="#idea">Idea for a Checker</a></li> |
| <li><a href="#registration">Checker Registration</a></li> |
| <li><a href="#skeleton">Checker Skeleton</a></li> |
| <li><a href="#node">Exploded Node</a></li> |
| <li><a href="#bugs">Bug Reports</a></li> |
| <li><a href="#ast">AST Visitors</a></li> |
| <li><a href="#testing">Testing</a></li> |
| <li><a href="#commands">Useful Commands</a></li> |
| </ul> |
| |
| <h2 id=start>Getting Started</h2> |
| <ul> |
| <li>To check out the source code and build the project, follow steps 1-4 of |
| the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a> |
| page.</li> |
| |
| <li>The analyzer source code is located under the Clang source tree: |
| <br><tt> |
| $ <b>cd llvm/tools/clang</b> |
| </tt> |
| <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>, |
| <tt>test/Analysis</tt>.</li> |
| |
| <li>The analyzer regression tests can be executed from the Clang's build |
| directory: |
| <br><tt> |
| $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b> |
| </tt></li> |
| |
| <li>Analyze a file with the specified checker: |
| <br><tt> |
| $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b> |
| </tt></li> |
| |
| <li>List the available checkers: |
| <br><tt> |
| $ <b>clang -cc1 -analyzer-checker-help</b> |
| </tt></li> |
| |
| <li>See the analyzer help for different output formats, fine tuning, and |
| debug options: |
| <br><tt> |
| $ <b>clang -cc1 -help | grep "analyzer"</b> |
| </tt></li> |
| |
| </ul> |
| |
| <h2 id=analyzer>Static Analyzer Overview</h2> |
| The analyzer core performs symbolic execution of the given program. All the |
| input values are represented with symbolic values; further, the engine deduces |
| the values of all the expressions in the program based on the input symbols |
| and the path. The execution is path sensitive and every possible path through |
| the program is explored. The explored execution traces are represented with |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object. |
| Each node of the graph is |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>, |
| which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>. |
| <p> |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a> |
| represents the corresponding location in the program (or the CFG graph). |
| <tt>ProgramPoint</tt> is also used to record additional information on |
| when/how the state was added. For example, <tt>PostPurgeDeadSymbolsKind</tt> |
| kind means that the state is the result of purging dead symbols - the |
| analyzer's equivalent of garbage collection. |
| <p> |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a> |
| represents abstract state of the program. It consists of: |
| <ul> |
| <li><tt>Environment</tt> - a mapping from source code expressions to symbolic |
| values |
| <li><tt>Store</tt> - a mapping from memory locations to symbolic values |
| <li><tt>GenericDataMap</tt> - constraints on symbolic values |
| </ul> |
| |
| <h3>Interaction with Checkers</h3> |
| Checkers are not merely passive receivers of the analyzer core changes - they |
| actively participate in the <tt>ProgramState</tt> construction through the |
| <tt>GenericDataMap</tt> which can be used to store the checker-defined part |
| of the state. Each time the analyzer engine explores a new statement, it |
| notifies each checker registered to listen for that statement, giving it an |
| opportunity to either report a bug or modify the state. (As a rule of thumb, |
| the checker itself should be stateless.) The checkers are called one after another |
| in the predefined order; thus, calling all the checkers adds a chain to the |
| <tt>ExplodedGraph</tt>. |
| |
| <h3>Representing Values</h3> |
| During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a> |
| objects are used to represent the semantic evaluation of expressions. |
| They can represent things like concrete |
| integers, symbolic values, or memory locations (which are memory regions). |
| They are a discriminated union of "values", symbolic and otherwise. |
| If a value isn't symbolic, usually that means there is no symbolic |
| information to track. For example, if the value was an integer, such as |
| <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>, |
| and the checker doesn't usually need to track any state with the concrete |
| number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be |
| a symbolic value. This happens when the analyzer cannot reason about something |
| (yet). An example is floating point numbers. In such cases, the |
| <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal<a>. |
| This represents a case that is outside the realm of the analyzer's reasoning |
| capabilities. <tt>SVals</tt> are value objects and their values can be viewed |
| using the <tt>.dump()</tt> method. Often they wrap persistent objects such as |
| symbols or regions. |
| <p> |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol) |
| is meant to represent abstract, but named, symbolic value. Symbols represent |
| an actual (immutable) value. We might not know what its specific value is, but |
| we can associate constraints with that value as we analyze a path. For |
| example, we might record that the value of a symbol is greater than |
| <tt>0</tt>, etc. |
| <p> |
| |
| <p> |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol. |
| It is used to provide a lexicon of how to describe abstract memory. Regions can |
| layer on top of other regions, providing a layered approach to representing memory. |
| For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>, |
| but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could |
| be used to represent the memory associated with a specific field of that object. |
| So how do we represent symbolic memory regions? That's what |
| <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a> |
| is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the |
| symbol is unique and has a unique name; that symbol names the region. |
| |
| <P> |
| Let's see how the analyzer processes the expressions in the following example: |
| <p> |
| <pre class="code_example"> |
| int foo(int x) { |
| int y = x * 2; |
| int z = x; |
| ... |
| } |
| </pre> |
| <p> |
| Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated, |
| we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in |
| this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>. |
| Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>, |
| which references the value <b>currently bound</b> to <tt>x</tt>. That value is |
| symbolic; it's whatever <tt>x</tt> was bound to at the start of the function. |
| Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>, |
| and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When |
| we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions, |
| and create a new <tt>SVal</tt> that represents their multiplication (which in |
| this case is a new symbolic expression, which we might call <tt>$1</tt>). When we |
| evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>), |
| and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>) |
| to the <tt>MemRegion</tt> in the symbolic store. |
| <br> |
| The second line is similar. When we evaluate <tt>x</tt> again, we do the same |
| dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note, two <tt>SVals</tt> |
| might reference the same underlying values. |
| |
| <p> |
| To summarize, MemRegions are unique names for blocks of memory. Symbols are |
| unique names for abstract symbolic values. Some MemRegions represents abstract |
| symbolic chunks of memory, and thus are also based on symbols. SVals are just |
| references to values, and can reference either MemRegions, Symbols, or concrete |
| values (e.g., the number 1). |
| |
| <!-- |
| TODO: Add a picture. |
| <br> |
| Symbols<br> |
| FunctionalObjects are used throughout. |
| --> |
| <h2 id=idea>Idea for a Checker</h2> |
| Here are several questions which you should consider when evaluating your |
| checker idea: |
| <ul> |
| <li>Can the check be effectively implemented without path-sensitive |
| analysis? See <a href="#ast">AST Visitors</a>.</li> |
| |
| <li>How high the false positive rate is going to be? Looking at the occurrences |
| of the issue you want to write a checker for in the existing code bases might |
| give you some ideas. </li> |
| |
| <li>How the current limitations of the analysis will effect the false alarm |
| rate? Currently, the analyzer only reasons about one procedure at a time (no |
| inter-procedural analysis). Also, it uses a simple range tracking based |
| solver to model symbolic execution.</li> |
| |
| <li>Consult the <a |
| href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a> |
| to get some ideas for new checkers and consider starting with improving/fixing |
| bugs in the existing checkers.</li> |
| </ul> |
| |
| <h2 id=registration>Checker Registration</h2> |
| All checker implementation files are located in <tt>clang/lib/StaticAnalyzer/Checkers</tt> |
| folder. Follow the steps below to register a new checker with the analyzer. |
| <ol> |
| <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt> |
| <pre class="code_example"> |
| using namespace clang; |
| using namespace ento; |
| |
| namespace { |
| class NewChecker: public Checker< check::PreStmt<CallExpr> > { |
| public: |
| void checkPreStmt(const CallExpr *CE, CheckerContext &Ctx) const {} |
| } |
| } |
| void ento::registerNewChecker(CheckerManager &mgr) { |
| mgr.registerChecker<NewChecker>(); |
| } |
| </pre> |
| |
| <li>Pick the package name for your checker and add the registration code to |
| <tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note, all checkers should |
| first be developed as experimental. Suppose our new checker performs security |
| related checks, then we should add the following lines under |
| <tt>SecurityExperimental</tt> package: |
| <pre class="code_example"> |
| let ParentPackage = SecurityExperimental in { |
| ... |
| def NewChecker : Checker<"NewChecker">, |
| HelpText<"This text should give a short description of the checks performed.">, |
| DescFile<"NewChecker.cpp">; |
| ... |
| } // end "security.experimental" |
| </pre> |
| |
| <li>Make the source code file visible to CMake by adding it to |
| <tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>. |
| |
| <li>Compile and see your checker in the list of available checkers by running:<br> |
| <tt><b>$clang -cc1 -analyzer-checker-help</b></tt> |
| </ol> |
| |
| |
| <h2 id=skeleton>Checker Skeleton</h2> |
| There are two main decisions you need to make: |
| <ul> |
| <li> Which events the checker should be tracking. |
| See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a> |
| for the list of available checker callbacks.</li> |
| <li> What data you want to store as part of the checker-specific program |
| state. Try to minimize the checker state as much as possible. </li> |
| </ul> |
| |
| <h2 id=bugs>Bug Reports</h2> |
| |
| <h2 id=ast>AST Visitors</h2> |
| Some checks might not require path-sensitivity to be effective. Simple AST walk |
| might be sufficient. If that is the case, consider implementing a Clang |
| compiler warning. On the other hand, a check might not be acceptable as a compiler |
| warning; for example, because of a relatively high false positive rate. In this |
| situation, AST callbacks <tt><b>checkASTDecl</b></tt> and |
| <tt><b>checkASTCodeBody</b></tt> are your best friends. |
| |
| <h2 id=testing>Testing</h2> |
| Every patch should be well tested with Clang regression tests. The checker tests |
| live in <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, |
| execute the following from the <tt>clang</tt> build directory: |
| <pre class="code"> |
| $ <b>TESTDIRS=Analysis make test</b> |
| </pre> |
| |
| <h2 id=commands>Useful Commands/Debugging Hints</h2> |
| <ul> |
| <li> |
| While investigating a checker-related issue, instruct the analyzer to only |
| execute a single checker: |
| <br><tt> |
| $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b> |
| </tt> |
| </li> |
| <li> |
| To dump AST: |
| <br><tt> |
| $ <b>clang -cc1 -ast-dump test.c</b> |
| </tt> |
| </li> |
| <li> |
| To view/dump CFG use <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers: |
| <br><tt> |
| $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b> |
| </tt> |
| </li> |
| <li> |
| To see all available debug checkers: |
| <br><tt> |
| $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b> |
| </tt> |
| </li> |
| <li> |
| To see which function is failing while processing a large file use |
| <tt>-analyzer-display-progress</tt> option. |
| </li> |
| <li> |
| While debugging execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt> |
| instead of <tt>clang --analyze</tt>, as the later would call the compiler |
| in a separate process. |
| </li> |
| <li> |
| To view <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while |
| debugging, goto a frame that has <tt>clang::ento::ExprEngine</tt> object and |
| execute: |
| <br><tt> |
| (gdb) <b>p ViewGraph(0)</b> |
| </tt> |
| </li> |
| <li> |
| To see the <tt>ProgramState</tt> while debugging use the following command. |
| <br><tt> |
| (gdb) <b>p State->dump()</b> |
| </tt> |
| </li> |
| <li> |
| To see <tt>clang::Expr</tt> while debugging use the following command. If you |
| pass in a SourceManager object, it will also dump the corresponding line in the |
| source code. |
| <br><tt> |
| (gdb) <b>p E->dump()</b> |
| </tt> |
| </li> |
| <li> |
| To dump AST of a method that the current <tt>ExplodedNode</tt> belongs to: |
| <br><tt> |
| (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b> |
| (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b> |
| </tt> |
| </li> |
| </ul> |
| |
| </div> |
| </div> |
| </body> |
| </html> |