Posted to fop-commits@xmlgraphics.apache.org by Apache Wiki <wi...@apache.org> on 2005/04/01 22:45:20 UTC

[Xmlgraphics-fop Wiki] Update of "FopImplementationNotes/OooHyphenationPatterns" by SimonPepping

The following page has been changed by SimonPepping:
http://wiki.apache.org/xmlgraphics-fop/FopImplementationNotes/OooHyphenationPatterns

New page:
= Failure to use OOo hyphenation patterns in FOP =

I have tried to use the hyphenation pattern files of
OpenOffice.org in FOP. The format of these files seems to be called
'ALTLinux' (see [http://lingucomponent.openoffice.org/hyphenator.html]).
In this format the first line names the encoding of the file. Each
following line contains one pattern, written exactly as in the pattern
element of a FOP XML hyphenation pattern file. ALTLinux files do not
contain classes or exceptions.
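
The file layout described above can be sketched as follows. This is
only an illustration, not the code I used: the class name
AltLinuxPatternFile and its fields are hypothetical and not part of
FOP, and the sketch assumes that the encoding named on the first line
is ASCII-compatible, so that the end of that line can be found before
the encoding is known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of FOP: contents of one ALTLinux-style file. */
final class AltLinuxPatternFile {
    final Charset charset;
    final List<String> patterns;

    private AltLinuxPatternFile(Charset charset, List<String> patterns) {
        this.charset = charset;
        this.patterns = Collections.unmodifiableList(patterns);
    }

    static AltLinuxPatternFile parse(byte[] data) {
        // Assumption: the encoding name is plain ASCII and the encoding is
        // ASCII-compatible, so the first line ends at the first '\n' byte.
        int eol = 0;
        while (eol < data.length && data[eol] != '\n') {
            eol++;
        }
        String name = new String(data, 0, eol, StandardCharsets.US_ASCII).trim();
        // Throws an unchecked exception if the name is malformed or unknown.
        Charset charset = Charset.forName(name);

        // Decode everything after the first line with the declared encoding.
        int start = Math.min(eol + 1, data.length);
        String body = new String(data, start, data.length - start, charset);

        // One pattern per line; blank lines are skipped.
        List<String> patterns = new ArrayList<>();
        for (String line : body.split("\n")) {
            String pattern = line.trim();
            if (!pattern.isEmpty()) {
                patterns.add(pattern);
            }
        }
        return new AltLinuxPatternFile(charset, patterns);
    }
}
```

The resulting pattern strings can then be handed to whatever code
builds the pattern tree; that step is not shown here.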

Parsing the format and building a HyphenationTree object from it was
not difficult. But when I used the result, no words were hyphenated.
This turned out to be due to the absence of classes.

A class is a set of characters that are equivalent with respect to
hyphenation. Almost all classes consist of a lower-case character and
the corresponding upper-case character. Besides equivalence, FOP uses
the classes for a second purpose: all characters listed in a class are
considered letters, and all other characters non-letters. A word that
contains a non-letter is not hyphenated. An ALTLinux hyphenation
pattern file does not define any letters, and therefore no word is
hyphenated.

All Western European languages have a-z and A-Z as letters, but they
differ in which accented characters they treat as letters. Russian
and other languages written in Cyrillic script, of course, deviate
completely from this template. Therefore it does not seem feasible to
hard-code a definition of letters in the program.

The [http://linux.org.mt/projects/jtextcheck/index.html JTextCheck framework],
with the OOo hyphenation plugin, is evidently able to work with the
OOo patterns. It delivers the hyphenation points for the words in a
string of text. That means that it provides a service rather similar
to that of FOP's hyphenation code, and it should be possible to use
this framework instead of or in parallel with FOP's hyphenation
code. But I do not see sufficient need to justify the coding effort
required to make this work.