| ======================================= |
| LLVM's Optional Rich Disassembly Output |
| ======================================= |
| |
| .. contents:: |
| :local: |
| |
| Introduction |
| ============ |
| |
| LLVM's default disassembly output is raw text. To give consumers more ability |
| to introspect the instructions' textual representation, or to reformat it for a |
| more user-friendly display, there is an optional rich disassembly output. |
| |
| This optional output is sufficient to reference into individual portions of the |
| instruction text. This is intended for clients like disassemblers, list file |
| generators, and pretty-printers, which need more than the raw instructions and |
| the ability to print them. |
| |
| To provide this functionality the assembly text is marked up with annotations. |
| The markup is simple enough in syntax to be robust even in the case of version |
| mismatches between consumers and producers. That is, the syntax generally does |
| not carry semantics beyond "this text has an annotation," so consumers can |
| simply ignore annotations they do not understand or do not care about. |
| |
| After calling ``LLVMCreateDisasm()`` to create a disassembler context, the |
| optional output is enabled with this call: |
| |
| .. code-block:: c |
| |
| LLVMSetDisasmOptions(DC, LLVMDisassembler_Option_UseMarkup); |
| |
| Subsequent calls to ``LLVMDisasmInstruction()`` will then return output strings |
| that contain the markup annotations. |
| |
| Instruction Annotations |
| ======================= |
| |
| .. _contextual markups: |
| |
| Contextual markups |
| ------------------ |
| |
| Annotated assembly display will supply contextual markup to help clients |
| implement things like pretty-printers more efficiently. Most markup will be |
| target-independent, so clients can provide a good display without any |
| target-specific knowledge. |
| |
| Annotated assembly goes through the normal instruction printer, but optionally |
| includes contextual tags on portions of the instruction string. An annotation |
| is any '<' '>' delimited section of text(1). |
| |
| .. code-block:: bat |
| |
| annotation: '<' tag-name tag-modifier-list ':' annotated-text '>' |
| tag-name: identifier |
| tag-modifier-list: comma delimited identifier list |
| |
| The tag-name is an identifier which gives the type of the annotation. For the |
| first pass, this will be very simple, with memory references, registers, and |
| immediates having the tag names "mem", "reg", and "imm", respectively. |
| |
| The tag-modifier-list is typically additional target-specific context, such as |
| register class. |
| |
| Clients should accept and ignore any tag-names or tag-modifiers they do not |
| understand, allowing the annotations to grow in richness without breaking older |
| clients. |
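
The grammar above is small enough that a client can split an annotation header by hand. The following is a minimal sketch, not part of LLVM: ``struct annotation``, ``parse_annotation_header()``, and ``count_tag()`` are hypothetical names, and the fixed buffer sizes are arbitrary assumptions. It treats the tag-name as ending at the first space or ``:``, keeps the modifier list as an uninterpreted string, and skips any annotation whose tag it was not asked about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one parsed header:
 * '<' tag-name tag-modifier-list ':' */
struct annotation {
    char tag[16];       /* tag-name, e.g. "reg" */
    char modifiers[32]; /* raw comma-delimited modifier list; may be empty */
};

/* Parse the header starting at s[0] == '<'. On success, fill *a and return
 * a pointer to the first character of the annotated text. Return NULL if
 * the header is malformed or does not fit the (assumed) buffer sizes. */
const char *parse_annotation_header(const char *s, struct annotation *a)
{
    assert(s != NULL && a != NULL);
    if (s[0] != '<')
        return NULL;
    const char *name = s + 1;
    const char *colon = strchr(name, ':');
    if (colon == NULL)
        return NULL;

    /* The tag-name ends at the first space or at the ':' itself. */
    size_t name_len = strcspn(name, " :");
    if (name_len == 0 || name_len >= sizeof a->tag)
        return NULL;
    memcpy(a->tag, name, name_len);
    a->tag[name_len] = '\0';

    /* Whatever sits between the tag-name and ':' is the modifier list. */
    const char *mods = name + name_len;
    while (mods < colon && *mods == ' ')
        ++mods;
    size_t mods_len = (size_t)(colon - mods);
    if (mods_len >= sizeof a->modifiers)
        return NULL;
    memcpy(a->modifiers, mods, mods_len);
    a->modifiers[mods_len] = '\0';
    return colon + 1;
}

/* Count annotations in `text` whose tag-name equals `tag`. Annotations
 * with any other tag-name are not an error; they simply do not match. */
size_t count_tag(const char *text, const char *tag)
{
    assert(text != NULL && tag != NULL);
    size_t count = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p != '<')
            continue;
        if (p[1] == '<') { /* doubled '<' is an escaped literal */
            ++p;
            continue;
        }
        struct annotation a;
        const char *body = parse_annotation_header(p, &a);
        if (body == NULL)
            continue; /* malformed header: keep scanning */
        if (strcmp(a.tag, tag) == 0)
            ++count;
        p = body - 1; /* resume at the annotated text (loop adds 1) */
    }
    return count;
}
```

Because the scan resumes inside the annotated text, nested annotations are visited as well. Unknown tag-names and modifiers pass through untouched, which is the forward-compatible behaviour the text asks of clients.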
| |
| For example, an ARM load of a stack-relative location might be annotated as |
| follows: |
| |
| .. code-block:: nasm |
| |
| ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]> |
| |
| |
| 1: For assembly dialects in which '<' and/or '>' are legal tokens, a literal token is escaped by following immediately with a repeat of the character. For example, a literal '<' character is output as '<<' in an annotated assembly string. |
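
A client that only wants the plain instruction text back can drop the markup and undo the escaping in one pass. The sketch below is illustrative only: ``strip_markup()`` is a hypothetical helper, not an LLVM function. It assumes that a doubled ``<`` or ``>`` is always an escaped literal, so two directly adjacent closing delimiters of nested annotations would be misread as one literal ``>``.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, dropping annotation markup and turning the escapes
 * "<<" and ">>" back into single characters. Returns the number of
 * characters written (excluding the terminating NUL), or (size_t)-1 if
 * `out` is too small. */
size_t strip_markup(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return (size_t)-1;

    size_t n = 0;
    for (size_t i = 0; in[i] != '\0'; ++i) {
        char c = in[i];
        if ((c == '<' || c == '>') && in[i + 1] == c) {
            ++i; /* escaped literal: fall through and emit one copy */
        } else if (c == '<') {
            /* Annotation open: skip the header up to and including ':'. */
            const char *colon = strchr(in + i, ':');
            if (colon == NULL)
                break; /* malformed: unterminated header */
            i = (size_t)(colon - in);
            continue;
        } else if (c == '>') {
            continue; /* annotation close: emit nothing */
        }
        if (n + 1 >= out_size)
            return (size_t)-1; /* no room for this character plus NUL */
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

Applied to the ARM example above, this yields ``ldr r0, [sp, #4]``. A stricter client could track nesting depth instead of silently dropping every unescaped ``>``.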
| |
| C API Details |
| ------------- |
| |
| The intended consumers of this information use the C API, so the disassembler's |
| C API provides an option to produce disassembled instructions with |
| annotations: call ``LLVMSetDisasmOptions()`` with the |
| ``LLVMDisassembler_Option_UseMarkup`` option (see above). |