Posted to fop-users@xmlgraphics.apache.org by Ryan Lortie <de...@desrt.ca> on 2008/09/12 22:14:05 UTC

Undesirable line breaks

Hello

FOP is currently producing undesired line breaks.  I'm having this
problem with FOP 0.95 and svn trunk.

The problem is that FOP thinks that it's appropriate to split "Gtk+"
across two lines.  I can't think of any other text layout engine that
would consider that to be a legitimate place to insert a break.  I've
tried changing the language parameters around but that doesn't have much
effect.

Can anyone think of a quick workaround that I can use to deal with this
problem?

Thanks in advance

Cheers


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: Undesirable line breaks

Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 15, 2008, at 18:43, Andreas Delmelle wrote:

BTW:

> On Sep 15, 2008, at 09:05, Ryan Lortie wrote:
>> <snip />
>> With all of these workarounds it's getting to the point where nearly
>> every part of the output from my stylesheets is littered with millions
>> of <inline> elements :)
>
> Probably better in your case to insert auxiliary codepoints then.

Note that my original alternative uses fo:wrapper rather than  
fo:inline. In cases where you don't need borders, padding or special  
alignment adjustments, fo:wrapper is much more appropriate and  
efficient than fo:inline. (confirmed with FOP; I haven't checked  
other implementations for this, but I assume something similar holds  
there: fo:inlines generate their own area(s), so necessarily imply an  
increase in both processing-time and memory consumption)

I guess this simple fact still needs to find its way to the site
(Memory Usage Hints). A lot of the 3rd-party stylesheets I've seen so
far generate fo:inlines in situations where this is actually a
complete waste. As a result, many XSL-FO novices only consider
fo:block or fo:inline as possible alternatives (since they usually
start off by mimicking the examples they encounter).

Just something that crossed my mind...


Cheers

Andreas



Re: Undesirable line breaks

Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 15, 2008, at 09:05, Ryan Lortie wrote:

> On Sun, 2008-09-14 at 01:49 +0200, Andreas Delmelle wrote:
>> At what point? I assume it's right before the '+', correct?
> Correct.
>
>> If the layout engine uses Unicode TR#14 as reference to determine the
>> line-breaks, then a break between 'k' and '+' would be allowed. '+'
>> belongs to the class of Numeric Prefix characters (PR), and as such
>> allows a break before but not a break after. (see:
>> http://www.unicode.org/reports/tr14/#DescriptionOfProperties)
>
> I was not aware of this standard.  I find that to be a rather odd choice
> to make (in the meantime I've thought of other common cases like "A+"
> and "C++", etc.).  Oh well :)

Indeed, but those are actually not so common. That is: it is more  
common for a '+' to appear in the context of a numerical/mathematical  
expression than following regular alphabetic characters. If the '+'  
appears as an operator in a long mathematical addition which is  
broken, one would most commonly prefer to see it as the first  
character on the next line, I believe...

In the uncommon cases, as I hinted, the most straightforward
workaround (currently) is to have a word-joiner (U+2060) or a
zero-width no-break space (U+FEFF) precede the '+' to steer the
pair-based algorithm in the right direction.

On another note, the Unicode Technical Report does offer room for
exceptions/customizations (as described in:
http://www.unicode.org/reports/tr14/#Customization), but FOP currently
'only' implements the basic algorithm. This 'only' points to a
limitation, but apart from some quirky exceptions, this basic
implementation does already cover a very great deal of line-breaking
rules taken for granted in a lot of different contexts/languages. More
notable exceptions are special line-breaking rules for Japanese and a
variant of Korean. OTOH, the rules for languages like Chinese, Hebrew
and Arabic are covered by TR#14. (that is: only the line-breaking. FOP
still has severe issues with the actual typesetting of Arabic, for
example. Although the line-breaks will be determined correctly, FOP
does not do any glyph-merging for inner-word ligatures... Each
codepoint remains a separate character in the output.)

>> Another alternative would be something like:
>> <fo:wrapper keep-together.within-line="always">Gtk+</fo:wrapper>
>
> With all of these workarounds it's getting to the point where nearly
> every part of the output from my stylesheets is littered with millions
> of <inline> elements :)

Probably better in your case to insert auxiliary codepoints then.

> To be more specific about what I was wondering about: is there any way
> to tell FOP in a general sense "please be less intelligent, and only
> break on ASCII space characters."?

In a way, you could override the behavior for 'AL followed by PR',  
such that this will also lead to an indirect break (i.e. only break  
if there is a space between the letter and the prefix-character)
BUT... for the moment, since the matter of customization of the  
Unicode algorithm has not been addressed completely, it means you'll  
end up with a customized FOP-build.

It is rather easy, but definitely not recommended:
1° download the source distribution (or check out the trunk via SVN)
2° modify the file 'src/codegen/unicode/data/LineBreakPairTable.txt'.
The characters representing the different types of break-opportunity
are available at http://www.unicode.org/reports/tr14/#ExampleTable,
or in the source file
'src/codegen/unicode/java/org/apache/fop/text/linebreak/GenerateLineBreakUtils.java'.
In short: the character in the grid at row AL / column PR would have
to be '%' instead of '_'.
3° after that, run 'ant codegen-unicode'
4° run the standard 'ant package'
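
To illustrate what that one-character change does, here is a minimal
Python sketch of a pair-table lookup. It is not FOP's code: the table
is a tiny hand-written excerpt, the classifier is deliberately crude,
and the names (DEFAULT_PAIRS, CUSTOM_PAIRS, break_positions) are made up
for this example. '_' stands for a direct break and '%' for an indirect
break (only across intervening spaces), as described above.

```python
# Hypothetical excerpt of a (before, after) pair table; the real table
# defines every class pair, this sketch lists only a handful.
DEFAULT_PAIRS = {
    ("AL", "AL"): "%",
    ("AL", "PR"): "_",  # the entry discussed in this thread
    ("PR", "AL"): "%",
    ("PR", "NU"): "%",
}

# Same table with the AL/PR entry changed from '_' to '%'.
CUSTOM_PAIRS = {**DEFAULT_PAIRS, ("AL", "PR"): "%"}


def line_break_class(ch):
    """Very rough classifier, enough for this sketch only."""
    if ch.isdigit():
        return "NU"
    if ch in "+$":
        return "PR"
    return "AL"


def break_positions(word, pairs):
    """Return indices i where a break is allowed between word[i-1] and word[i].

    Assumes `word` contains no spaces, so only '_' entries yield a break.
    Pairs missing from the excerpt are treated as '%' purely to keep the
    sketch small.
    """
    positions = []
    for i in range(1, len(word)):
        key = (line_break_class(word[i - 1]), line_break_class(word[i]))
        if pairs.get(key, "%") == "_":
            positions.append(i)
    return positions


print(break_positions("Gtk+", DEFAULT_PAIRS))  # break allowed before '+'
print(break_positions("Gtk+", CUSTOM_PAIRS))   # no break inside the word
```

With the default excerpt the only break opportunity in "Gtk+" is right
before the '+'; with the modified entry there is none.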

Ideally, we should be looking for an approach where the user has the  
option of adding an overriding pair-table (for all or some of the  
combinations of classes), such that it would no longer be necessary  
to regenerate the class in question.

The downside currently is that there may be side-effects for some  
other cases, where the basic pair-table offered by Unicode does  
generate the expected break-opportunity...



Cheers

Andreas


Re: Undesirable line breaks

Posted by Ryan Lortie <de...@desrt.ca>.
On Sun, 2008-09-14 at 01:49 +0200, Andreas Delmelle wrote:
> At what point? I assume it's right before the '+', correct?
Correct.

> If the layout engine uses Unicode TR#14 as reference to determine the
> line-breaks, then a break between 'k' and '+' would be allowed. '+'
> belongs to the class of Numeric Prefix characters (PR), and as such
> allows a break before but not a break after. (see:
> http://www.unicode.org/reports/tr14/#DescriptionOfProperties)

I was not aware of this standard.  I find that to be a rather odd choice
to make (in the meantime I've thought of other common cases like "A+"
and "C++", etc.).  Oh well :)

> Another alternative would be something like:
> <fo:wrapper keep-together.within-line="always">Gtk+</fo:wrapper>

With all of these workarounds it's getting to the point where nearly
every part of the output from my stylesheets is littered with millions
of <inline> elements :)

To be more specific about what I was wondering about: is there any way
to tell FOP in a general sense "please be less intelligent, and only
break on ASCII space characters."?

In any case, thanks for the insight.

Cheers




RE: Undesirable line breaks

Posted by "Amick, Eric" <Er...@mail.house.gov>.
>> FOP is currently producing undesired line breaks.  I'm having this
>> problem with FOP 0.95 and svn trunk.
>>
>> The problem is that FOP thinks that it's appropriate to split "Gtk+"
>> across two lines.
>>
>> I can't think of any other text layout engine that would consider that
>> to be a legitimate place to insert a break.
>
> At what point? I assume it's right before the '+', correct?
>
> If the layout engine uses Unicode TR#14 as reference to determine the
> line-breaks, then a break between 'k' and '+' would be allowed. '+'
> belongs to the class of Numeric Prefix characters (PR), and as such
> allows a break before but not a break after. (see:
> http://www.unicode.org/reports/tr14/#DescriptionOfProperties)

The following description of the numeric prefix characters from the
Unicode standard suggests it shouldn't be breaking there:

Characters that usually precede a numerical expression may not be
separated from following numeric characters or following opening
characters, even if a space character intervenes. For example, there is
no break opportunity in "$ (100.00)".

Many currency signs can appear on both sides, or even the middle, of a
numeric expression. Therefore the line breaking algorithm, by default,
does not break between PR [numeric prefix] and numbers or *letters* on
either side. [emphasis mine]


Eric Amick
Legislative Computer Systems
Office of the Clerk




Re: Undesirable line breaks

Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 12, 2008, at 22:14, Ryan Lortie wrote:

Hi Ryan

> FOP is currently producing undesired line breaks.  I'm having this
> problem with FOP 0.95 and svn trunk.
>
> The problem is that FOP thinks that it's appropriate to split "Gtk+"
> across two lines.

> I can't think of any other text layout engine that
> would consider that to be a legitimate place to insert a break.

At what point? I assume it's right before the '+', correct?

If the layout engine uses Unicode TR#14 as reference to determine the
line-breaks, then a break between 'k' and '+' would be allowed. '+'
belongs to the class of Numeric Prefix characters (PR), and as such
allows a break before but not a break after. (see:
http://www.unicode.org/reports/tr14/#DescriptionOfProperties)

> I've tried changing the language parameters around but that doesn't
> have much effect.
>
> Can anyone think of a quick workaround that I can use to deal with
> this problem?

Try something like "Gtk&#xFEFF;+" (zero-width no-break space) or  
"Gtk&#x2060;+" (word-joiner). Inserting one of those should suffice  
to prevent FOP from considering breaking the text there.
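
If the text is generated by a script rather than typed by hand, the
same idea could be applied automatically. A minimal Python sketch,
assuming only a '+' directly preceded by an ASCII letter needs
protecting (the function name protect_plus is made up for this
example):

```python
import re

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER

# Zero-width match: the position between an ASCII letter and a '+'.
_LETTER_THEN_PLUS = re.compile(r"(?<=[A-Za-z])(?=\+)")


def protect_plus(text):
    """Insert U+2060 between a letter and an immediately following '+'."""
    return _LETTER_THEN_PLUS.sub(WORD_JOINER, text)


# The joiner is invisible, so show the escaped form of the result.
print(protect_plus("Gtk+ and C++").encode("unicode_escape").decode("ascii"))
```

A '+' surrounded by spaces, as in "1 + 2", is left alone, so breaks in
ordinary arithmetic are unaffected.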

Another alternative would be something like:
<fo:wrapper keep-together.within-line="always">Gtk+</fo:wrapper>
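
If the FO is produced by a script, a small helper could emit that
wrapper for the handful of terms that need it. A rough Python sketch
(keep_on_one_line is a made-up name; the fo prefix is assumed to be
bound to the XSL-FO namespace in the surrounding document):

```python
from xml.sax.saxutils import escape


def keep_on_one_line(term):
    """Return an fo:wrapper fragment asking the formatter not to break inside term."""
    return (
        '<fo:wrapper keep-together.within-line="always">'
        + escape(term)  # escape &, < and > so the fragment stays well-formed
        + "</fo:wrapper>"
    )


print(keep_on_one_line("Gtk+"))
```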


HTH!

Cheers

Andreas
