Posted to fop-dev@xmlgraphics.apache.org by "Andreas L. Delmelle (JIRA)" <ji...@apache.org> on 2015/05/08 17:56:59 UTC

[jira] [Created] (FOP-2466) Improve output for pre-hyphenated text with SHY combined with hyphenation properties

Andreas L. Delmelle created FOP-2466:
----------------------------------------

             Summary: Improve output for pre-hyphenated text with SHY combined with hyphenation properties
                 Key: FOP-2466
                 URL: https://issues.apache.org/jira/browse/FOP-2466
             Project: Fop
          Issue Type: Improvement
          Components: layout/line
    Affects Versions: 1.1
            Reporter: Andreas L. Delmelle
            Priority: Minor


When processing an FO file that contains text pre-hyphenated with soft hyphens (SHY, U+00AD), FOP's own hyphenation does not yield usable results.

From the corresponding thread on fop-users@:

... internally for FOP, [t]he accumulated sequence of characters since the previous break opportunity is taken to be a 'word', which may or may not end in a hyphen. If it does, a specific sequence of elements is glued to the word-box, to prevent a break before the SHY and to make sure that it is rendered properly, i.e. that it only counts if the break occurs right after it.

As hyphenation by FOP itself is applied at a higher level, once all layout elements for a whole paragraph have been collected, that SHY sequence is seen as a word boundary. That is, that part of the algorithm just accumulates the text for 'uninterrupted' sequences of word-boxes, and feeds those pieces to the hyphenator. The real intention is to apply hyphenation across any nested fo:inlines. 'Uninterrupted' means that auxiliary elements, generated for border or padding, are explicitly *not* considered word boundaries. The sequence generated for SHY contains two non-auxiliary elements, as if it were a space, perhaps just to ensure that that position in the layout always leads to a character that is visibly rendered.

In case of pre-hyphenated text, this has the unintended effect of restricting the input for the hyphenator to parts of words, which is basically meaningless (and wasteful).
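
To illustrate the effect described above, here is a minimal sketch in Java. It does not use FOP's actual classes: Kind, Element and collectWords are hypothetical stand-ins for the layout elements and for the text accumulation step, and the whole SHY element sequence is collapsed into a single SHY marker for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShyWordAccumulation {
    /** Simplified stand-ins for layout elements; these are NOT FOP's classes. */
    enum Kind { BOX, AUXILIARY, SHY, SPACE }

    static final class Element {
        final Kind kind;
        final String text; // only meaningful for BOX elements

        Element(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /**
     * Collects the pieces of text that would be handed to the hyphenator.
     * If shyIsBoundary is true, a SHY ends the current piece (the behaviour
     * described in this issue); if false, the word is kept whole.
     */
    static List<String> collectWords(List<Element> elements, boolean shyIsBoundary) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Element e : elements) {
            switch (e.kind) {
                case BOX:
                    current.append(e.text);
                    break;
                case AUXILIARY:
                    // border/padding elements never interrupt a word
                    break;
                case SHY:
                    if (shyIsBoundary) {
                        flush(current, words);
                    }
                    break;
                case SPACE:
                    flush(current, words);
                    break;
            }
        }
        flush(current, words);
        return words;
    }

    private static void flush(StringBuilder current, List<String> words) {
        if (current.length() > 0) {
            words.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<Element> elements = Arrays.asList(
                new Element(Kind.BOX, "hy"),
                new Element(Kind.SHY, null),
                new Element(Kind.BOX, "phen"),
                new Element(Kind.SHY, null),
                new Element(Kind.BOX, "ation"),
                new Element(Kind.SPACE, null),
                new Element(Kind.BOX, "works"));
        System.out.println(collectWords(elements, true));  // [hy, phen, ation, works]
        System.out.println(collectWords(elements, false)); // [hyphenation, works]
    }
}
```

With the SHY treated as a boundary, the hyphenator only ever sees fragments such as "hy" and "phen", for which it cannot compute meaningful break points.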

Among other things, this leads to the "hyphenation-ladder-count" property seemingly having no effect.

Note - At this point, I believe the behaviour is not necessarily incorrect. I also think it would be correct to ignore hyphenation-ladder-count when hyphenate="false".

Initial idea for a fix:
Make sure that the SHY sequence is not treated as a word boundary in LineLM when accumulating text for boxes generated by the TextLMs. Once that is done, we should be able to check, for each hyphenation point that FOP itself calculates, whether an explicit SHY is already present at that same point. In that case, we can just do nothing (= leave the SHY in place).
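
The check for already-present SHY characters could look roughly like the following sketch. It is only an illustration under stated assumptions: ShyMerge and its methods are hypothetical names, and the list of computed hyphenation points is passed in as plain offsets into the SHY-free word rather than obtained from FOP's hyphenator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ShyMerge {
    static final char SHY = '\u00AD';

    /** Removes soft hyphens so that a hyphenator would see the whole word. */
    static String stripShy(String word) {
        return word.replace(String.valueOf(SHY), "");
    }

    /** Offsets into the SHY-free word at which an explicit SHY was present. */
    static Set<Integer> explicitBreaks(String word) {
        Set<Integer> offsets = new TreeSet<>();
        int plainIndex = 0;
        for (int i = 0; i < word.length(); i++) {
            if (word.charAt(i) == SHY) {
                offsets.add(plainIndex);
            } else {
                plainIndex++;
            }
        }
        return offsets;
    }

    /**
     * Returns the computed hyphenation points that still need a break
     * opportunity; points that coincide with an explicit SHY are skipped,
     * i.e. the existing SHY is left in place.
     */
    static List<Integer> pointsToAdd(String word, List<Integer> computedPoints) {
        Set<Integer> explicit = explicitBreaks(word);
        List<Integer> result = new ArrayList<>();
        for (int point : computedPoints) {
            if (!explicit.contains(point)) {
                result.add(point);
            }
        }
        return result;
    }
}
```

For example, for the input "hy" + SHY + "phenation" and assumed computed points 2 and 6 (hy-phen-ation), stripShy yields "hyphenation" and pointsToAdd returns only 6, since the point at offset 2 is already covered by the explicit SHY.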


