Notes on WST StructuredDocument
-------------------------------

Created: 2010/11/26
References: WST 3.1.x, Eclipse 3.5 Galileo

To manipulate XML documents in refactorings, we sometimes use the WST/SSE
"StructuredDocument" API. There isn't much documentation on this out there,
so this is a short explanation of how it works, based entirely on
_empirical_ evidence. As such, take it with a grain of salt.

Examples of usage can be found in
sdk/eclipse/plugins/com.android.ide.eclipse.adt/src/com/android/ide/eclipse/adt/internal/refactorings/


1- Get a document instance
--------------------------

To get a document from an existing IFile resource:

    // createStructuredDocumentFor() may throw IOException or CoreException.
    IModelManager modelMan = StructuredModelManager.getModelManager();
    IStructuredDocument sdoc = modelMan.createStructuredDocumentFor(file);

Note that IStructuredDocument and all the associated interfaces used below
are located in (or below) the org.eclipse.wst.sse.core.internal.provisional
package, meaning they _might_ change later.

Also note that this parses the content of the file on disk, not that of an
editor buffer with pending unsaved modifications.

There is a counterpart for non-existent resources:

    IModelManager.createNewStructuredDocumentFor(IFile)

However, our goal so far has been to _parse_ existing documents, find
the place that we want to modify, and then generate a TextFileChange
for a refactoring operation. Consequently, this document doesn't say
anything about using this model to modify content directly.


2- Structured Document overview
-------------------------------

The IStructuredDocument is organized in "regions", which are little pieces
of text.

The document contains a list of region collections, each one being
a list of regions. Each region has a type, as well as text.
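To make this two-level shape concrete, here is a minimal, self-contained Java sketch. The class names (ToyRegionModel, RegionCollection, Region) are hypothetical and are not the WST types; the model only mirrors the structure described above: a collection owns some text and a list of typed regions, and a region's text is looked up through its collection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT the WST API: it only mirrors the
// "document -> region collections -> typed regions" structure.
final class ToyRegionModel {

    // An inner region: a type plus a span relative to its collection.
    static final class Region {
        final String type;
        final int start;
        final int length;

        Region(String type, int start, int length) {
            this.type = type;
            this.start = start;
            this.length = length;
        }
    }

    // A region collection: its own type, the text it covers, its regions.
    static final class RegionCollection {
        final String type;
        final String text;
        final List<Region> regions = new ArrayList<>();

        RegionCollection(String type, String text) {
            this.type = type;
            this.text = text;
        }

        // Appends a region covering the next `length` characters.
        RegionCollection add(String regionType, int length) {
            int start = 0;
            if (!regions.isEmpty()) {
                Region last = regions.get(regions.size() - 1);
                start = last.start + last.length;
            }
            regions.add(new Region(regionType, start, length));
            return this;
        }

        // The text of an inner region is obtained through its collection.
        String getText(Region r) {
            return text.substring(r.start, r.start + r.length);
        }
    }

    // The collection for the empty tag <color/>.
    static RegionCollection colorTag() {
        return new RegionCollection("XML_TAG_NAME", "<color/>")
                .add("XML_TAG_OPEN", 1)
                .add("XML_TAG_NAME", 5)
                .add("XML_EMPTY_TAG_CLOSE", 2);
    }

    public static void main(String[] args) {
        RegionCollection tag = colorTag();
        for (Region r : tag.regions) {
            System.out.println(r.type + ":" + tag.getText(r));
        }
    }
}
```

Running it prints the three sub-regions of the tag in the same TYPE:text notation used in the listing of the next section.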

Since we use this to parse XML, let's look at this XML example:

    <?xml version="1.0" encoding="utf-8"?> \n
    <resources> \n
        <color/> \n
        <string name="my_string">Some Value</string>  <!-- comment -->\n
    </resources>


This will result in the following regions and sub-regions
(all the constants below are located in DOMRegionContext):

    XML_PI_OPEN
        XML_PI_OPEN:<?
        XML_TAG_NAME:xml
        XML_TAG_ATTRIBUTE_NAME:version
        XML_TAG_ATTRIBUTE_EQUALS:=
        XML_TAG_ATTRIBUTE_VALUE:"1.0"
        XML_TAG_ATTRIBUTE_NAME:encoding
        XML_TAG_ATTRIBUTE_EQUALS:=
        XML_TAG_ATTRIBUTE_VALUE:"utf-8"
        XML_PI_CLOSE:?>

    XML_CONTENT
        XML_CONTENT:\n

    XML_TAG_NAME
        XML_TAG_OPEN:<
        XML_TAG_NAME:resources
        XML_TAG_CLOSE:>

    XML_CONTENT
        XML_CONTENT:\n + whitespace before color

    XML_TAG_NAME
        XML_TAG_OPEN:<
        XML_TAG_NAME:color
        XML_EMPTY_TAG_CLOSE:/>

    XML_CONTENT
        XML_CONTENT:\n + whitespace before string

    XML_TAG_NAME
        XML_TAG_OPEN:<
        XML_TAG_NAME:string
        XML_TAG_ATTRIBUTE_NAME:name
        XML_TAG_ATTRIBUTE_EQUALS:=
        XML_TAG_ATTRIBUTE_VALUE:"my_string"
        XML_TAG_CLOSE:>

    XML_CONTENT
        XML_CONTENT:Some Value

    XML_TAG_NAME
        XML_END_TAG_OPEN:</
        XML_TAG_NAME:string
        XML_TAG_CLOSE:>

    XML_CONTENT
        XML_CONTENT: (2 spaces before the comment)

    XML_COMMENT_TEXT
        XML_COMMENT_OPEN:<!--
        XML_COMMENT_TEXT: comment
        XML_COMMENT_CLOSE:-->

    XML_CONTENT
        XML_CONTENT:\n after comment

    XML_TAG_NAME
        XML_END_TAG_OPEN:</
        XML_TAG_NAME:resources
        XML_TAG_CLOSE:>

    XML_CONTENT
        XML_CONTENT:
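To illustrate how a single tag is cut into the sub-regions listed above, here is a deliberately simplified scanner. It is a hypothetical sketch (ToyTagScanner is an invented name), not WST's XML tokenizer: it only handles one well-formed start, end or empty tag with double-quoted attribute values, and it simply skips whitespace inside the tag instead of folding it into the previous region's length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified scanner; NOT the WST tokenizer. It handles a single
// well-formed tag (start, end or empty) with double-quoted attribute values.
final class ToyTagScanner {

    // Returns "TYPE:text" entries in the same notation as the listing above.
    static List<String> scan(String tag) {
        List<String> out = new ArrayList<>();
        int i;
        if (tag.startsWith("</")) {
            out.add("XML_END_TAG_OPEN:</");
            i = 2;
        } else if (tag.startsWith("<")) {
            out.add("XML_TAG_OPEN:<");
            i = 1;
        } else {
            throw new IllegalArgumentException("not a tag: " + tag);
        }
        boolean sawTagName = false;
        while (i < tag.length()) {
            char c = tag.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace gets no region of its own inside a tag
            } else if (tag.startsWith("/>", i)) {
                out.add("XML_EMPTY_TAG_CLOSE:/>");
                i += 2;
            } else if (c == '>') {
                out.add("XML_TAG_CLOSE:>");
                i++;
            } else if (c == '=') {
                out.add("XML_TAG_ATTRIBUTE_EQUALS:=");
                i++;
            } else if (c == '"') {
                int close = tag.indexOf('"', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated value at " + i);
                }
                out.add("XML_TAG_ATTRIBUTE_VALUE:" + tag.substring(i, close + 1));
                i = close + 1;
            } else {
                int start = i;
                while (i < tag.length() && isNameChar(tag.charAt(i))) {
                    i++;
                }
                if (i == start) {
                    throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
                }
                // The first name is the tag name; later names are attributes.
                String kind = sawTagName ? "XML_TAG_ATTRIBUTE_NAME:" : "XML_TAG_NAME:";
                out.add(kind + tag.substring(start, i));
                sawTagName = true;
            }
        }
        return out;
    }

    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == ':' || c == '.';
    }

    public static void main(String[] args) {
        for (String region : scan("<string name=\"my_string\">")) {
            System.out.println(region);
        }
    }
}
```

For `<string name="my_string">` it prints the same six sub-regions as the corresponding XML_TAG_NAME entry in the listing; a real tokenizer must also cope with comments, processing instructions, entities and malformed input.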


3- Iterating through regions
----------------------------

To iterate through all regions, we need to process the list of top-level regions and then
iterate over the inner regions of each one:

    for (IStructuredDocumentRegion docRegion : sdoc.getStructuredDocumentRegions()) {
        // Each top-level region is a collection of inner text regions.
        ITextRegionList regions = docRegion.getRegions();
        for (int i = 0; i < regions.size(); i++) {
            ITextRegion region = regions.get(i);
            String type = region.getType();           // a DOMRegionContext constant
            String text = docRegion.getText(region);  // text is queried from the outer region
        }
    }
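The same two-level loop can be tried without Eclipse on plain Java lists. In this hypothetical sketch (ToyRegionWalk and its helpers are invented names), each collection is a list of {type, text} pairs standing in for ITextRegion plus getText(); the loop performs the kind of lookup a refactoring needs, finding the value of a given attribute.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, NOT the WST API: a document is a list of collections,
// and each collection is a list of {type, text} pairs.
final class ToyRegionWalk {

    // Returns the text of the first attribute value that follows an attribute
    // named attrName, or null if there is none.
    static String findAttributeValue(List<List<String[]>> doc, String attrName) {
        for (List<String[]> collection : doc) {      // outer: one entry per tag
            boolean nameMatched = false;
            for (String[] region : collection) {     // inner: the tokens of that tag
                String type = region[0];
                String text = region[1];
                if (type.equals("XML_TAG_ATTRIBUTE_NAME")) {
                    nameMatched = text.equals(attrName);
                } else if (nameMatched && type.equals("XML_TAG_ATTRIBUTE_VALUE")) {
                    return text;
                }
            }
        }
        return null;
    }

    static String[] region(String type, String text) {
        return new String[] {type, text};
    }

    static List<String[]> tag(String[]... regions) {
        return Arrays.asList(regions);
    }

    // The <resources> and <string> start tags of the example document.
    static List<List<String[]>> sampleDoc() {
        return Arrays.asList(
                tag(region("XML_TAG_OPEN", "<"), region("XML_TAG_NAME", "resources"),
                        region("XML_TAG_CLOSE", ">")),
                tag(region("XML_TAG_OPEN", "<"), region("XML_TAG_NAME", "string"),
                        region("XML_TAG_ATTRIBUTE_NAME", "name"),
                        region("XML_TAG_ATTRIBUTE_EQUALS", "="),
                        region("XML_TAG_ATTRIBUTE_VALUE", "\"my_string\""),
                        region("XML_TAG_CLOSE", ">")));
    }

    public static void main(String[] args) {
        // Prints "my_string", quotes included, as the value region keeps them.
        System.out.println(findAttributeValue(sampleDoc(), "name"));
    }
}
```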

Each "region collection" basically matches one XML tag, with sub-regions for all the tokens
inside that tag.

Note that an XML_CONTENT region holds the text between tags, including whitespace-only runs;
this is what the W3C DOM calls a Text node.

Also note that outer and inner regions share the same type constants. For example, an outer
XML_TAG_NAME region collection is a complete XML tag, and it contains an opening token (such as
XML_TAG_OPEN), a closing token (such as XML_TAG_CLOSE) and also an inner XML_TAG_NAME region
that is the tag name itself.

Surprisingly, the inner regions do not offer many accessors we can use, besides their type
and start/length/end. There are two pairs of length and end methods:
- getLength() and getEnd() take any trailing whitespace into account.
- getTextLength() and getTextEnd() exclude some typical trailing whitespace.

Regarding trailing whitespace, empirical evidence shows that in the XML case here, the only
place where it matters is inside a tag such as <string name="my_string">: for the
XML_TAG_NAME region, getLength() is 7 ("string" plus a space) and getTextLength() is 6
("string" alone). Whitespace between XML elements is a region of its own (XML_CONTENT).
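The two length flavors can be reproduced with a couple of hypothetical helpers (RegionLengths is an invented name, not a WST class). Treating "text length" as "length minus trailing whitespace" is an assumption that matches the XML_TAG_NAME observation above; it was not checked against other region types.

```java
// Hypothetical helpers, NOT the WST API. ASSUMPTION: a region's "length"
// covers its trailing whitespace and its "text length" does not.
final class RegionLengths {

    static int length(String rawRegionText) {
        return rawRegionText.length();
    }

    static int textLength(String rawRegionText) {
        int end = rawRegionText.length();
        // Walk back over trailing whitespace only; leading characters are kept.
        while (end > 0 && Character.isWhitespace(rawRegionText.charAt(end - 1))) {
            end--;
        }
        return end;
    }

    public static void main(String[] args) {
        String tagName = "string "; // the XML_TAG_NAME region of <string name=...>
        System.out.println(length(tagName) + " " + textLength(tagName)); // prints "7 6"
    }
}
```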

If you want the text of an inner region, you actually need to query it from the outer region.
The outer IStructuredDocumentRegion (the region collection) has many more useful accessors,
some of which return details about a given inner region:
- getText : without the trailing whitespace.
- getFullText : with the trailing whitespace.
- getStart / getLength / getEnd : type-dependent offsets, including whitespace.
- getStart / getTextLength / getTextEnd : type-dependent offsets, excluding "irrelevant" whitespace.
- getStartOffset / getEndOffset / getTextEndOffset : relative to the document.

Empirical evidence shows no discernible difference between the getStart/getEnd values and
those returned by getStartOffset/getEndOffset. Still, rely on the javadoc rather than on
this observation.
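One reading of the two offset families, stated here as an assumption rather than verified WST behavior, is that an inner region's start is relative to its enclosing collection, and that the document-relative value is the collection's own start plus that relative start. The hypothetical helpers below (RegionOffsets is an invented name) apply that arithmetic to the <string> tag of the example, preceded by "<resources>\n    " so that the tag begins at document offset 16. Inside the tag, the attribute value starts 13 characters in: "<string " is 8 characters, "name" is 4 and "=" is 1.

```java
// Hypothetical helpers, NOT the WST API. ASSUMPTION: an inner region's start
// is relative to its enclosing collection, and a document offset is obtained
// by adding the collection's own start offset.
final class RegionOffsets {

    static int documentStart(int collectionStart, int regionStart) {
        return collectionStart + regionStart;
    }

    static int documentTextEnd(int collectionStart, int regionStart, int textLength) {
        return collectionStart + regionStart + textLength;
    }

    public static void main(String[] args) {
        String doc = "<resources>\n    <string name=\"my_string\">";
        int collectionStart = doc.indexOf("<string"); // 16, zero-based
        int valueStart = 13;                           // relative to "<string"
        int valueTextLength = 11;                      // "my_string" plus its quotes
        int start = documentStart(collectionStart, valueStart);
        int end = documentTextEnd(collectionStart, valueStart, valueTextLength);
        System.out.println(doc.substring(start, end)); // prints "my_string" (quotes included)
    }
}
```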

All offsets are zero-based.

Given a region collection, you can also browse its regions using the getRegions() list,
getFirstRegion()/getLastRegion(), or getRegionAtCharacterOffset(). Iterating the region
list seems the most useful approach. There's no actual iterator provided for inner regions.
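A lookup in the spirit of getRegionAtCharacterOffset() can be sketched as a scan over the region list. This is a hypothetical stand-in (RegionLookup and Span are invented names), not the WST implementation; it assumes regions are non-overlapping half-open [start, end) ranges in document offsets, and it places the <color/> tag at an arbitrary document offset of 100.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a lookup like getRegionAtCharacterOffset();
// NOT the WST implementation. ASSUMPTION: regions are non-overlapping,
// half-open [start, end) ranges expressed in document offsets.
final class RegionLookup {

    static final class Span {
        final String type;
        final int start;
        final int end;

        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Linear scan over the region list.
    static Span regionAt(List<Span> regions, int offset) {
        for (Span s : regions) {
            if (offset >= s.start && offset < s.end) {
                return s;
            }
        }
        return null; // the offset falls outside every region
    }

    // Regions of "<color/>", assuming the tag starts at document offset 100.
    static List<Span> colorTag() {
        return Arrays.asList(
                new Span("XML_TAG_OPEN", 100, 101),
                new Span("XML_TAG_NAME", 101, 106),
                new Span("XML_EMPTY_TAG_CLOSE", 106, 108));
    }

    public static void main(String[] args) {
        System.out.println(regionAt(colorTag(), 103).type); // prints XML_TAG_NAME
    }
}
```

A linear scan keeps the sketch short; since the regions are sorted by offset, a binary search would also work for long lists.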

There are a few other methods available in the region classes; this is not
an exhaustive list.


----