Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2003/02/07 11:12:01 UTC

DO NOT REPLY [Bug 16870] New: - Hyphenation bug including bugfix : sporadic mutilation of hyphenated word

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=16870

Hyphenation bug including bugfix : sporadic mutilation of hyphenated word

           Summary: Hyphenation bug including bugfix : sporadic mutilation
                    of hyphenated word
           Product: Fop
           Version: 0.20.4
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: general
        AssignedTo: fop-dev@xml.apache.org
        ReportedBy: wewerka@ThreeDimensions.de


Explanation of bug:
-------------------
Under some circumstances (see below) some hyphenated words are mutilated.

E.g. the German word "Altersvorsorge" was SOMETIMES (but not very often) 
hyphenated as "rsvor-" at the end of one line and "Altesorge" at the start of the next.


Reason:
-------
Xerces uses characters() calls to give FOP a character buffer which is a 'view 
window' on the current document. It can happen that one word (like 
"Altersvorsorge") is split across two calls of characters(); in the given 
example: "Alte" and "rsvorsorge".
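This follows the SAX contract: ContentHandler.characters() may deliver contiguous 
character data in several chunks, so a consumer that needs whole words has to 
buffer text until it knows the run is complete. A minimal sketch using the JDK's 
DefaultHandler; the class name and the choice to flush at endElement() are 
assumptions for illustration, and the two calls are simulated by hand rather 
than produced by Xerces:

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: a SAX handler that does not assume a word arrives in one
// characters() call. Class name and flush-on-endElement policy are
// illustrative assumptions, not FOP code.
class AccumulatingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> blocks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only the slice [start, start + length): the parser owns ch
        // and may overwrite it after this call returns.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element's character data is complete only at its end tag.
        blocks.add(text.toString());
        text.setLength(0);
    }

    List<String> blocks() {
        return blocks;
    }

    public static void main(String[] args) {
        AccumulatingHandler handler = new AccumulatingHandler();
        // Simulate a parser whose buffer window ends in the middle of a word.
        char[] window1 = "Die Alte".toCharArray();
        char[] window2 = "rsvorsorge".toCharArray();
        handler.characters(window1, 0, window1.length);
        handler.characters(window2, 0, window2.length);
        handler.endElement("", "block", "block");
        System.out.println(handler.blocks()); // [Die Altersvorsorge]
    }
}
```

Buffering until the end tag is only one policy; any consumer that hyphenates has 
to see the whole word, not the parser's current window.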

FOP adds the first part of the word to the "pending areas". This happens in 
org\apache\fop\layout\LineArea.java in the method addText(). Xerces delivers 
the rest of the word in its second characters() call, which results in a 
second call to addText().

In this second call (if hyphenation is enabled), the method doHyphenation() 
(also in class LineArea) is called, and it completely ignores the pending areas! 
So only the word fragment "rsvorsorge" is handed over to the hyphenation 
engine, which hyphenates this fragment correctly.

The Hyphenator then picks a break point in this fragment, so "rsvor-" is added to the current line area.

The next call to addText() checks for pending areas ("Alte" in our example), 
prints them on the next line, and continues with the rest of the current 
buffer ("sorge [...]" in the example).
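To make this sequence concrete, here is a toy model of the faulty flow. It is a 
hypothetical simplification (no widths, no real Hyphenator; the break point 5 is 
supplied by hand, and the names are not FOP's) that reproduces the mutilated 
output:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the faulty flow; names are illustrative, not FOP's.
// Widths are ignored and the hyphenation point is given as a parameter.
class BuggyFlowDemo {

    // pending:  fragments queued by an earlier addText() call ("Alte")
    // fragment: text from the current characters() call ("rsvorsorge")
    // breakAt:  hyphenation point chosen within the fragment alone
    static List<String> layout(List<String> pending, String fragment, int breakAt) {
        List<String> lines = new ArrayList<>();
        // Bug: doHyphenation() sees only the fragment, not the pending text.
        lines.add(fragment.substring(0, breakAt) + "-");
        // The next addText() flushes the pending areas onto the new line,
        // then continues with the rest of the current buffer.
        lines.add(String.join("", pending) + fragment.substring(breakAt));
        return lines;
    }

    public static void main(String[] args) {
        for (String line : layout(List.of("Alte"), "rsvorsorge", 5)) {
            System.out.println(line);
        }
        // prints:
        // rsvor-
        // Altesorge
    }
}
```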

This bug occurs only in very few situations because it depends on 
1) how often, and with which buffer size, the XML parser calls the characters() 
method, so I think it definitely also depends on the version of the XML parser 
used; 
2) what the XML document looks like: an additional character or newline 
somewhere BEFORE the mutilated word can change how the text is split across 
characters() calls.
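A rough illustration of point 2): assume, purely for illustration, that the 
parser hands out text in fixed-size buffers of 18 characters (real parsers such 
as Xerces split for other reasons as well). Then a single leading newline 
decides whether the word straddles a buffer boundary. Class and method names 
are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: the parser delivers text in fixed-size chunks.
// This is an assumption for illustration, not how Xerces actually buffers.
class ChunkBoundaryDemo {

    // Splits text into chunks of at most size chars, one per characters() call.
    static List<String> chunks(String text, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i += size) {
            out.add(text.substring(i, Math.min(text.length(), i + size)));
        }
        return out;
    }

    // True if the first occurrence of word spans more than one chunk.
    static boolean isSplit(String text, String word, int size) {
        int start = text.indexOf(word);
        int end = start + word.length() - 1; // index of the word's last char
        return start / size != end / size;
    }

    public static void main(String[] args) {
        String text = "Zur Altersvorsorge";
        System.out.println(isSplit(text, "Altersvorsorge", 18));        // false
        System.out.println(isSplit("\n" + text, "Altersvorsorge", 18)); // true
        System.out.println(chunks("\n" + text, 18).get(1));             // e
    }
}
```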



MY CHANGES
----------
I changed the internals of the method doHyphenation(). It now takes into 
account any pending areas which may contain word fragments. 

New Approach in doHyphenation:
1) Scan the pending areas vector for pending text fragments, and remove them 
from the pending areas vector
2) Concatenate the result of 1) with the current word to be hyphenated in the 
current char buffer
3) Call the Hyphenator
4) Use addWord() to add the pre-hyphen word fragment to the current line area
5) Decide: is the final hyphenation point somewhere in the pending area or in 
the current char buffer?

5a) Hyphenation point is somewhere in the pending area:
--> add the remaining characters of the pending text fragments back to the 
pending areas vector (addText() will print them on the new line, together with 
the rest of the word which is in the current buffer). For this task I used the 
existing addSpacedWord() method with the pending parameter set to true.

5b) Hyphenation point is somewhere in the current char buffer:
--> just return the new position in the current char buffer
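The steps above can be sketched roughly as follows. This is not the actual 
patch: the names are hypothetical, widths and FOP's area classes are left out, 
the pending areas are modelled as a list of strings, and the hyphenation point 
is passed in rather than computed by the Hyphenator. With "Alte" pending and 
"rsvorsorge" in the buffer, a break after "Altersvor" is case 5b and a break 
after "Al" is case 5a:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fixed doHyphenation() flow (steps 1-5). All names are
// hypothetical; widths and FOP's area classes are left out, and the
// hyphenation point is passed in instead of coming from the Hyphenator.
class FixedHyphenationSketch {

    static final class Result {
        final String lineText; // pre-hyphen part added to the current line
        final int resumeAt;    // position in the current buffer to continue from

        Result(String lineText, int resumeAt) {
            this.lineText = lineText;
            this.resumeAt = resumeAt;
        }
    }

    // pending: mutable list of pending text fragments (the "pending areas")
    // buffer:  current char buffer; the word part is [wordStart, wordEnd)
    // breakAt: hyphenation point in the full word, 0 < breakAt < word length
    static Result hyphenate(List<String> pending, char[] buffer,
                            int wordStart, int wordEnd, int breakAt) {
        // 1) Take the pending text fragments out of the pending list.
        String pendingText = String.join("", pending);
        pending.clear();
        // 2) Concatenate them with the word part in the current buffer.
        String word = pendingText
                + new String(buffer, wordStart, wordEnd - wordStart);
        // 3) + 4) breakAt stands in for the Hyphenator's result; the prefix
        //    plus a hyphen goes onto the current line.
        String lineText = word.substring(0, breakAt) + "-";
        // 5) Is the break inside the pending text or the current buffer?
        if (breakAt < pendingText.length()) {
            // 5a) Re-queue the rest of the pending text; addText() prints it on
            //     the new line, followed by the word part still in the buffer.
            pending.add(pendingText.substring(breakAt));
            return new Result(lineText, wordStart);
        }
        // 5b) Just return the new position in the current buffer.
        return new Result(lineText, wordStart + (breakAt - pendingText.length()));
    }

    public static void main(String[] args) {
        char[] buffer = "rsvorsorge ist wichtig".toCharArray();

        List<String> pending = new ArrayList<>(List.of("Alte"));
        Result r = hyphenate(pending, buffer, 0, 10, 9); // Altersvor- | sorge
        System.out.println(r.lineText + " " + r.resumeAt + " " + pending); // Altersvor- 5 []

        pending = new ArrayList<>(List.of("Alte"));
        r = hyphenate(pending, buffer, 0, 10, 2);        // Al- | tersvorsorge
        System.out.println(r.lineText + " " + r.resumeAt + " " + pending); // Al- 0 [te]
    }
}
```

In case 5a nothing from the current buffer has been consumed yet, so the 
returned position is simply the start of the word part.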



I also changed the signature of doHyphenation(): a TextState parameter was 
added, because the addSpacedWord() method (used in 5a) needs the current 
textState.


The call to doHyphenation() in LineArea.addText() was modified: the 
remaining-width argument is no longer reduced by pendingWidth, because 
doHyphenation() now takes the pending areas into account itself:

    ret = this.doHyphenation(dataCopy, i, wordStart,
                             this.getContentWidth()
                             - (finalWidth
                                + spaceWidth
                                /*+ pendingWidth*/), textState);
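A worked example of why pendingWidth is dropped from the argument, assuming a 
hypothetical fixed width of 500 units per character (real widths come from the 
font metrics). Since the pending "Alte" is now part of the word that 
doHyphenation() measures, subtracting its width from the remaining width as 
well would presumably count it twice:

```java
// Worked arithmetic for the changed remaining-width argument. The fixed
// per-character width and the method names are illustrative assumptions.
class RemainingWidthDemo {
    static final int CHAR_WIDTH = 500;

    static int width(String s) {
        return s.length() * CHAR_WIDTH;
    }

    // Old argument: the hyphenator only saw the fragment, so the pending
    // text's width had to be reserved separately.
    static int oldRemaining(int contentWidth, int finalWidth, int spaceWidth, int pendingWidth) {
        return contentWidth - (finalWidth + spaceWidth + pendingWidth);
    }

    // New argument: the pending text is part of the hyphenated word.
    static int newRemaining(int contentWidth, int finalWidth, int spaceWidth) {
        return contentWidth - (finalWidth + spaceWidth);
    }

    public static void main(String[] args) {
        int contentWidth = 20_000;
        int finalWidth = width("Zur");    // text already on the line: 1500
        int spaceWidth = width(" ");      // 500
        int pendingWidth = width("Alte"); // 2000

        System.out.println(oldRemaining(contentWidth, finalWidth, spaceWidth, pendingWidth)); // 16000
        System.out.println(newRemaining(contentWidth, finalWidth, spaceWidth));               // 18000
        // The prefix "Altersvor-" (10 chars = 5000) already includes "Alte",
        // so it is compared against 18000, not 18000 - 2000.
    }
}
```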



I don't think it makes sense to include our XSL-FO documents to reproduce the 
error: we use custom fonts, which would likely lead to a different layout on 
your system, so the error would probably not occur.




Chris Wewerka
wewerka@ThreeDimensions.de
Munich, Germany
