| Compound word hyphenation |
| |
| The Hyphen library supports improved compound word hyphenation, |
| including the special compound hyphenation rules of German and other |
| languages whose compounds may consist of an arbitrary number of words. |
| The new options, COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN, help |
| to set the right style for the hyphenation of compound words. |
| |
| Algorithm |
| |
| The algorithm is an extension of the original pattern-based hyphenation |
| algorithm. It uses two hyphenation pattern sets, defined in the same |
| pattern file and separated by the NEXTLEVEL keyword. The first pattern |
| set is for hyphenation only at compound word boundaries; the second one |
| is for hyphenation within words or word parts. |
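| |
| As a rough illustration of the two-set layout, the following Python |
| sketch splits the text of a pattern file at the NEXTLEVEL keyword. It is |
| not part of Hyphen. The function name split_levels, the sample patterns, |
| the encoding name on the first line, the "%" comment lines and the |
| pooling of all options into one dictionary are assumptions made for the |
| example. |
| |
```python
# Hypothetical helper, not Hyphen's own parser.
OPTION_NAMES = {
    "LEFTHYPHENMIN",
    "RIGHTHYPHENMIN",
    "COMPOUNDLEFTHYPHENMIN",
    "COMPOUNDRIGHTHYPHENMIN",
}


def split_levels(pattern_text):
    """Return (encoding, options, compound_patterns, inner_patterns)."""
    lines = [ln.strip() for ln in pattern_text.splitlines()]
    # Assumption: blank lines and lines starting with '%' carry no patterns.
    lines = [ln for ln in lines if ln and not ln.startswith("%")]
    # Assumption: the first line names the character encoding.
    encoding, body = lines[0], lines[1:]

    options = {}
    levels = [[]]
    for ln in body:
        fields = ln.split()
        if ln == "NEXTLEVEL":
            levels.append([])          # start of the next pattern set
        elif fields[0] in OPTION_NAMES:
            options[fields[0]] = int(fields[1])
        else:
            levels[-1].append(ln)      # an ordinary pattern line

    if len(levels) > 2:
        raise ValueError("this sketch handles at most one NEXTLEVEL keyword")
    if len(levels) == 1:
        # No NEXTLEVEL: an ordinary single-level pattern file.
        return encoding, options, [], levels[0]
    return encoding, options, levels[0], levels[1]


# Invented toy file: one compound-level pattern, one second-level pattern.
SAMPLE = """UTF-8
LEFTHYPHENMIN 2
COMPOUNDLEFTHYPHENMIN 2
% toy compound-level pattern
e1f
NEXTLEVEL
y1p
"""

encoding, options, compound_patterns, inner_patterns = split_levels(SAMPLE)
print(compound_patterns, inner_patterns)
```
| |
| In this sketch a file without NEXTLEVEL yields an empty compound set, so |
| all of its patterns are treated as ordinary within-word patterns. |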
| |
| Recursive compound level hyphenation |
| |
| The algorithm is recursive: every word part produced by a successful |
| first (compound) level hyphenation is rehyphenated |
| with the same (first) pattern set. |
| |
| Finally, when first level hyphenation is not possible, Hyphen uses |
| the second level hyphenation for the word or the word parts. |
| |
| Word endings and word parts |
| |
| Patterns for word endings (patterns containing the boundary dot, ".") |
| match the word parts, too. |
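| |
| The recursion and the matching of boundary patterns against word parts |
| can be sketched in Python as follows. This is a simplified model, not |
| Hyphen's implementation: the function names, the toy patterns and the |
| sample words are invented, a dot is assumed to mark a word boundary as in |
| TeX-style patterns, and the HYPHENMIN options are ignored except that at |
| least one letter is kept on each side of a break. |
| |
```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'e1f' into its letters and gap weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)   # weight of the gap before letters[pos]
        else:
            pos += 1
    return letters, weights


def break_points(word, patterns, left_min=1, right_min=1):
    """Return every index i where a break between word[i-1] and word[i] is allowed."""
    padded = "." + word + "."        # dots mark the boundaries of this word (part)
    scores = [0] * (len(padded) + 1)
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for k, w in enumerate(weights):
                scores[start + k] = max(scores[start + k], w)
            start = padded.find(letters, start + 1)
    # Odd scores allow a break; the gap before word[i] is scores[i + 1].
    return [i for i in range(left_min, len(word) - right_min + 1)
            if scores[i + 1] % 2 == 1]


def hyphenate(word, compound_patterns, inner_patterns):
    """Two-level hyphenation: recurse on the first level, then fall back."""
    points = break_points(word, compound_patterns)
    if points:
        # First level succeeded: rehyphenate each part with the same set.
        parts, prev = [], 0
        for p in points + [len(word)]:
            parts.extend(hyphenate(word[prev:p], compound_patterns, inner_patterns))
            prev = p
        return parts
    # No compound boundary found: use the second level pattern set.
    points = break_points(word, inner_patterns)
    return [word[a:b] for a, b in zip([0] + points, points + [len(word)])]


# '.foot1' cannot match inside 'gamefootball', but it does match once
# 'football' has become a word part of its own.
print(hyphenate("gamefootball", ["e1f", ".foot1"], []))  # ['game', 'foot', 'ball']
print(hyphenate("hyphen", [], ["y1p"]))                  # ['hy', 'phen']
```
| |
| The first call needs the recursion: the pattern anchored with a dot only |
| applies after the first split has turned "football" into a word part. |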
| |
| Options |
| |
| COMPOUNDLEFTHYPHENMIN: minimum hyphenation distance from the left compound word boundary |
| COMPOUNDRIGHTHYPHENMIN: minimum hyphenation distance from the right compound word boundary |
| NEXTLEVEL: marks the start of the second level hyphenation patterns |
| |
| Default hyphenmin values |
| |
| The default values of COMPOUNDLEFTHYPHENMIN and COMPOUNDRIGHTHYPHENMIN |
| are 0, and they are treated as 0 during hyphenation, too. (By contrast, |
| "0" values of LEFTHYPHENMIN and RIGHTHYPHENMIN are treated as the default |
| "2" during hyphenation.) |
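| |
| The rule above can be written down as a small Python sketch. The helper |
| name effective_hyphenmin is hypothetical and only restates the defaults |
| described here. It is not a function of the Hyphen API. |
| |
```python
def effective_hyphenmin(option, value=0):
    """Return the value used during hyphenation for a hyphenmin option."""
    if option in ("LEFTHYPHENMIN", "RIGHTHYPHENMIN") and value == 0:
        return 2      # a "0" here means the default "2"
    return value      # the COMPOUND* options keep 0 as 0


print(effective_hyphenmin("LEFTHYPHENMIN"))          # 2
print(effective_hyphenmin("COMPOUNDLEFTHYPHENMIN"))  # 0
```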
| |
| Examples |
| |
| See tests/compound* test files. |
| |
| Preparation of hyphenation patterns |
| |
| There is no special pattern generator tool for compound hyphenation |
| patterns yet. It is possible to use PATGEN to generate both |
| pattern sets, concatenate them manually and set the requested HYPHENMIN |
| values. (But don't forget to preprocess the patterns with substrings.pl |
| before concatenation.) One disadvantage of this method is that PATGEN |
| does not know about the recursive compound hyphenation of Hyphen. |
| |
| László Németh |
| <nemeth (at) openoffice.org> |