Posted to fop-dev@xmlgraphics.apache.org by Joerg Pietschmann <jo...@zkb.ch> on 2002/07/03 10:57:19 UTC

Line breaking and Hyphenation

Hello all,
I tracked down the bugs 10374, 2106 and 6042. The last
bug was caused by a simple, easy to fix mistake in the
hyphenation framework. The bug 10374 is unfortunately
a duplicate of 2106, not 6042, and a bit more interesting.
It is caused by the parser delivering character references
as a separate character chunk, thereby creating multiple
FOText children of the block (FObjMixed) for consecutive
text. This interferes badly with line breaking and
hyphenation. Take
  e&#x78;tensible
with room up to the "l" on the line.
This is split into three FOText objects
  e &#x78; tensible
The text is delivered separately to the line layout
algorithm. The "e" and "x" do not fill the line, but they
are not words either, so they are appended to the pendingAreas
vector. The "tensible" then overflows the line and is
passed to the hyphenation; let's say it is hyphenated
as "tensi-ble". The "tensi-" is appended without
flushing the pending areas, which then end up at the start
of the next line.
I put a StringBuffer into FObjMixed to accumulate
consecutive addCharacters() events. This fixes the problem
with character references, but not
 e<fo:inline>X</fo:inline>tensible
(also noted somewhere in Bugzilla as a problem).
The second fix is to flush pending areas in addWord(). This
fixes the lost characters problem but *still* does not
correctly hyphenate words split into inline FOs: only
the chunk actually overflowing the line is considered
for hyphenation.
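
To make the first fix concrete, here is a minimal sketch of the idea,
not the actual patch. The class TextAccumulator and its methods
flushText(), addInline() and getChildren() are made up for
illustration; only the name addCharacters() is taken from the
description above. It uses StringBuilder where the original code
would have used StringBuffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a mixed-content FO; not FOP's real FObjMixed. */
final class TextAccumulator {
    private final StringBuilder pending = new StringBuilder();
    private final List<String> children = new ArrayList<>();

    /** Called for every character chunk from the parser; only buffers the text. */
    void addCharacters(char[] data, int start, int length) {
        pending.append(data, start, length);
    }

    /** Turns the buffered text into a single text child, if there is any. */
    void flushText() {
        if (pending.length() > 0) {
            children.add("text:" + pending);
            pending.setLength(0);
        }
    }

    /** A child element ends the current run of consecutive text. */
    void addInline(String name) {
        flushText();
        children.add("inline:" + name);
    }

    List<String> getChildren() {
        flushText();
        return children;
    }

    public static void main(String[] args) {
        TextAccumulator block = new TextAccumulator();
        // The parser delivers "e", the character reference, and "tensible" separately.
        for (String chunk : new String[] {"e", "x", "tensible"}) {
            block.addCharacters(chunk.toCharArray(), 0, chunk.length());
        }
        System.out.println(block.getChildren()); // prints [text:extensible]
    }
}
```

As the addInline() method shows, an inline element still splits the
text into separate children, which is why this approach alone does
not help with the fo:inline case.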

More problems I noted:
- white space is handled inconsistently
- line break detection relies on white space only
- word detection for hyphenation relies on white space
  and wrongly assumes there is a white space before the
  word passed to doHyphenation()
- the LinkSet is not considered for hyphenated word parts
  in addWord, and neither for page-number-citation nor
  fo:character
- same for most of overlining, line-through and vertical
  alignment
- characters are copied to FOText, and then copied *twice*
  in LineArea.layout(), once purely for hyphenation. During
  layout, character data is held in memory at least three
  times, possibly four (counting the parser buffer)

Questions:
- Is it still worth to do major hacks in LineArea.java?
- Should we consider using Unicode break properties for
  line break opportunity detection?
- How should words for hyphenation be detected?
- What happens to line breaks and word detection in case of
  * inline graphics and other definitely non-text inlines
  * inline foreign elements, like formulas
  * inline-containers containing blocks, especially blocks
    with text only
- Are there script or language dependencies to consider for
  line break and word detection?
- At which point should collapse-whitespace, linefeed-treatment
  etc. be considered? Possibilities:
  * while creating FOText
  * while feeding it into the line area
  * during line area layout
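
For the questions on break opportunities and word detection, one
option to experiment with is the JDK's java.text.BreakIterator. The
sketch below is only an illustration: the class BreakOpportunities and
its methods are made up, and the JDK's rules are locale-aware but are
not guaranteed to match the Unicode line breaking algorithm in every
detail. It says nothing about non-text inlines.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; not part of FOP. */
final class BreakOpportunities {

    /** Offsets at which the JDK line-break iterator reports a boundary. */
    static List<Integer> lineBreaks(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    /** Segments from the word iterator that contain at least one letter. */
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Skip segments that are only white space or punctuation.
            if (segment.codePoints().anyMatch(Character::isLetter)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Line breaking and hyphenation";
        System.out.println(lineBreaks(text, Locale.ENGLISH));
        System.out.println(words(text, Locale.ENGLISH));
    }
}
```

The words found this way could serve as candidates to hand to the
hyphenator, without assuming that a white space precedes each word.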

Considering white-space-collapse during FOText creation has some
problems in case of successive spaces in different inline FO.

There are additional issues with consecutive spaces which had
been discussed here already, in particular how
  foo <fo:inline text-decoration="underline"> bar</fo:inline>
should be handled. Will this result in two consecutive spaces,
one of them underlined? Has this issue been resolved meanwhile?
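
To illustrate why collapsing has to look across inline boundaries,
here is a small sketch. The classes WhitespaceCollapse and Run are
hypothetical, and keeping the first space of a sequence is an
assumption made for the example, not a statement about what FOP does
or should do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of white space collapsing across text runs; not FOP code. */
final class WhitespaceCollapse {

    /** A run of text sharing one trait (here: underlining). */
    static final class Run {
        final String text;
        final boolean underlined;

        Run(String text, boolean underlined) {
            this.text = text;
            this.underlined = underlined;
        }
    }

    /** Drops every space that directly follows another space, even in a previous run. */
    static List<Run> collapse(List<Run> runs) {
        List<Run> out = new ArrayList<>();
        boolean previousWasSpace = false; // carried over from run to run
        for (Run run : runs) {
            StringBuilder kept = new StringBuilder();
            for (int i = 0; i < run.text.length(); i++) {
                char c = run.text.charAt(i);
                boolean isSpace = c == ' ';
                if (!(isSpace && previousWasSpace)) {
                    kept.append(c);
                }
                previousWasSpace = isSpace;
            }
            if (kept.length() > 0) {
                out.add(new Run(kept.toString(), run.underlined));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("foo ", false));
        runs.add(new Run(" bar", true));
        for (Run r : collapse(runs)) {
            System.out.println("[" + r.text + "] underlined=" + r.underlined);
        }
        // prints:
        // [foo ] underlined=false
        // [bar] underlined=true
    }
}
```

Collapsing each run on its own would leave both spaces in place,
which is the problem with doing it at FOText creation time. Which of
the two spaces survives is a policy choice; swapping it only requires
deciding at the run boundary rather than inside the loop.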

J.Pietschmann

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


Re: Line breaking and Hyphenation

Posted by Keiron Liddle <ke...@aftexsw.com>.
Hi Peter,

I agree with you in general. I was talking about fixing all bugs in that
code. By all means fix some bugs. In fact there are quite a few bug
fixes that are relevant to the current redesign and should be/have been
put across.

For this case it might help in understanding FO, but it will most likely
not help in understanding the new design at all.

Keiron.

On Fri, 2002-07-05 at 05:49, Peter B. West wrote:
> Keiron,
> 
> Undoubtedly it is better to put effort into getting the new code to 
> work.  However, if, at the moment, the direction of the main line of new 
> code development is only really understood by you and Karen, work on 
> these things by Joerg in the -maint branch may be very useful for a 
> couple of reasons.  The first is actually fixing some bugs.  The second 
> is that it will give him (and others who might want to join in) a much 
> better idea of the problems that the new design is trying to solve.
> 
> As here, you can indicate, before he gets involved, which areas are 
> going to be superseded and which he should therefore be cautious about
> getting too involved in.  The end result may be that Joerg gets up to 
> speed in the redesign as part of that process.
> 
> In any case, it is probably encouraging for our users to see that there 
> is bug-fixing activity going on in addition to redesign.
> 
> What say you?
> 
> Peter





Re: Line breaking and Hyphenation

Posted by "Peter B. West" <pb...@powerup.com.au>.
Keiron,

Undoubtedly it is better to put effort into getting the new code to 
work.  However, if, at the moment, the direction of the main line of new 
code development is only really understood by you and Karen, work on 
these things by Joerg in the -maint branch may be very useful for a 
couple of reasons.  The first is actually fixing some bugs.  The second 
is that it will give him (and others who might want to join in) a much 
better idea of the problems that the new design is trying to solve.

As here, you can indicate, before he gets involved, which areas are 
going to be superseded and which he should therefore be cautious about
getting too involved in.  The end result may be that Joerg gets up to 
speed in the redesign as part of that process.

In any case, it is probably encouraging for our users to see that there 
is bug-fixing activity going on in addition to redesign.

What say you?

Peter

Keiron Liddle wrote:
> These (and other) problems are precisely why certain areas have been
> redesigned.
> Wouldn't it be better to put the effort into getting the new code to
> work?
> 
> On Wed, 2002-07-03 at 10:57, Joerg Pietschmann wrote:
> 
>>I put a StringBuffer into FObjMixed to accumulate
>>consecutive addCharacters() events.
> 
> 
> This is probably a good idea in general. Sometimes the SAX events can
> split text in all sorts of places and it would be easier to handle if all
> consecutive text is joined together.
> 
> 
>>Questions:
>>- Is it still worth to do major hacks in LineArea.java?
> 
> 
> If you want to get rid of all the bugs, I would say no.
> 
> 
>>There are additional issues with consecutive spaces which had
>>been discussed here already, in particular how
>>  foo <fo:inline text-decoration="underline"> bar</fo:inline>
>>should be handled. Will this result in two consecutive spaces,
>>one of them underlined? Has this issue been resolved meanwhile?
> 
> 
> IIRC the space in the inline is "marked" and therefore this space is
> retained while the other space is discarded.

-- 
Peter B. West  pbwest@powerup.com.au  http://powerup.com.au/~pbwest
"Lord, to whom shall we go?"




Re: Line breaking and Hyphenation

Posted by Keiron Liddle <ke...@aftexsw.com>.
These (and other) problems are precisely why certain areas have been
redesigned.
Wouldn't it be better to put the effort into getting the new code to
work?

On Wed, 2002-07-03 at 10:57, Joerg Pietschmann wrote:
> I put a StringBuffer into FObjMixed to accumulate
> consecutive addCharacters() events.

This is probably a good idea in general. Sometimes the SAX events can
split text in all sorts of places and it would be easier to handle if all
consecutive text is joined together.

> Questions:
> - Is it still worth to do major hacks in LineArea.java?

If you want to get rid of all the bugs, I would say no.

> There are additional issues with consecutive spaces which had
> been discussed here already, in particular how
>   foo <fo:inline text-decoration="underline"> bar</fo:inline>
> should be handled. Will this result in two consecutive spaces,
> one of them underlined? Has this issue been resolved meanwhile?

IIRC the space in the inline is "marked" and therefore this space is
retained while the other space is discarded.

> J.Pietschmann


