Posted to fop-users@xmlgraphics.apache.org by Ryan Lortie <de...@desrt.ca> on 2008/09/12 22:14:05 UTC
Undesirable line breaks
Hello
FOP is currently producing undesired line breaks. I'm having this
problem with FOP 0.95 and svn trunk.
The problem is that FOP thinks that it's appropriate to split "Gtk+"
across two lines. I can't think of any other text layout engine that
would consider that to be a legitimate place to insert a break. I've
tried changing the language parameters around but that doesn't have much
effect.
Can anyone think of a quick workaround that I can use to deal with this
problem?
Thanks in advance
Cheers
---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org
Re: Undesirable line breaks
Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 15, 2008, at 18:43, Andreas Delmelle wrote:
BTW:
> On Sep 15, 2008, at 09:05, Ryan Lortie wrote:
>> <snip />
>> With all of these workarounds it's getting to the point where nearly
>> every part of the output from my stylesheets is littered with
>> millions of <inline> elements :)
>
> Probably better in your case to insert auxiliary codepoints then.
Note that my original alternative uses fo:wrapper rather than
fo:inline. In cases where you don't need borders, padding or special
alignment adjustments, fo:wrapper is much more appropriate and
efficient than fo:inline. (confirmed with FOP; I haven't checked
other implementations for this, but I assume something similar holds
there: fo:inlines generate their own area(s), so necessarily imply an
increase in both processing-time and memory consumption)
I guess this simple fact still needs to find its way to the site
(Memory Usage Hints). A lot of 3rd-party stylesheets I've seen so far
generate fo:inlines in situations where this is a complete waste.
As a result, many XSL-FO novices only consider fo:block or fo:inline
as possible alternatives (since they usually start off by mimicking
the examples they encounter).
Just something that crossed my mind...
Cheers
Andreas
Re: Undesirable line breaks
Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 15, 2008, at 09:05, Ryan Lortie wrote:
> On Sun, 2008-09-14 at 01:49 +0200, Andreas Delmelle wrote:
>> At what point? I assume it's right before the '+', correct?
> Correct.
>
>> If the layout engine uses Unicode TR#14 as reference to determine the
>> line-breaks, then a break between 'k' and '+' would be allowed. '+'
>> belongs to the class of Numeric Prefix characters (PR), and as such
>> allows a break before but not a break after. (see:
>> http://www.unicode.org/reports/tr14/#DescriptionOfProperties)
>
> I was not aware of this standard. I find that to be a rather odd
> choice to make (in the meantime I've thought of other common cases
> like "A+" and "C++", etc.). Oh well :)
Indeed, but those are actually not so common. That is: it is more
common for a '+' to appear in the context of a numerical/mathematical
expression than following regular alphabetic characters. If the '+'
appears as an operator in a long mathematical addition which is
broken, one would most commonly prefer to see it as the first
character on the next line, I believe...
In those uncommon cases, as I hinted, the most straightforward
workaround (currently) is to have a word-joiner (U+2060) or a
zero-width no-break space (U+FEFF) precede the '+' to steer the
pair-based algorithm in the right direction.
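If the FO is produced or post-processed by a script, this codepoint workaround can be applied mechanically. The following is a minimal sketch under that assumption; the function name `protect_plus` and the rule "insert a joiner before every '+' that directly follows a non-space character" are hypothetical choices, not anything FOP provides.

```python
import re

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER
ZWNBSP = "\ufeff"       # U+FEFF ZERO WIDTH NO-BREAK SPACE

# A '+' glued to the preceding character (as in "Gtk+", "A+", "C++").
# A '+' preceded by whitespace, as in "1 + 2", is left untouched.
_GLUED_PLUS = re.compile(r"(?<=\S)\+")


def protect_plus(text: str, joiner: str = WORD_JOINER) -> str:
    """Insert `joiner` before each '+' that directly follows a non-space
    character, so a TR#14-style breaker sees no break opportunity there."""
    return _GLUED_PLUS.sub(joiner + "+", text)


if __name__ == "__main__":
    for sample in ["Gtk+", "C++", "1 + 2"]:
        print(repr(protect_plus(sample)))
```

In an XSLT stylesheet the same effect would come from emitting the character reference `&#x2060;` (or `&#xFEFF;`) before the '+'.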
On another note, the Unicode Technical Report does offer room for
exceptions/customizations (as described in:
http://www.unicode.org/reports/tr14/#Customization), but FOP currently
'only' implements the basic algorithm. This 'only' points to a
limitation, but apart from some quirky exceptions, the basic
implementation already covers a great many of the line-breaking rules
taken for granted in a lot of different contexts/languages. The more
notable exceptions are the special line-breaking rules for Japanese and
a variant of Korean. OTOH, the rules for languages like Chinese, Hebrew
and Arabic are covered by TR#14. (That is: only the line-breaking. FOP
still has severe issues with the actual typesetting of Arabic, for
example. Although the line-breaks will be determined correctly, FOP
does not do any glyph merging for word-internal ligatures... Each
codepoint remains a separate character in the output.)
>> Another alternative would be something like:
>> <fo:wrapper keep-together.within-line="always">Gtk+</fo:wrapper>
>
> With all of these workarounds it's getting to the point where nearly
> every part of the output from my stylesheets is littered with millions
> of <inline> elements :)
Probably better in your case to insert auxiliary codepoints then.
> To be more specific about what I was wondering about: is there any way
> to tell FOP in a general sense "please be less intelligent, and only
> break on ASCII space characters."?
In a way, you could override the behavior for 'AL followed by PR',
such that this will also lead to an indirect break (i.e. only break
if there is a space between the letter and the prefix character).
But for the moment, since customization of the Unicode algorithm has
not been fully addressed, this means you'll end up with a customized
FOP build.
It is rather easy, but definitely not recommended:
1° download the source distribution (or check out the trunk via SVN)
2° modify the file 'src/codegen/unicode/data/LineBreakPairTable.txt'.
The characters representing the different types of break-opportunity
are listed at http://www.unicode.org/reports/tr14/#ExampleTable,
or in the source file
'src/codegen/unicode/java/org/apache/fop/text/linebreak/GenerateLineBreakUtils.java'.
In short: the character in the grid at row AL / column PR would have
to be '%' instead of '_'.
3° after that, run 'ant codegen-unicode'
4° run the standard 'ant package'
Ideally, we should be looking for an approach where the user has the
option of adding an overriding pair-table (for all or some of the
combinations of classes), such that it would no longer be necessary
to regenerate the class in question.
The downside currently is that there may be side-effects for some
other cases, where the basic pair-table offered by Unicode does
generate the expected break-opportunity...
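To make the effect of that single grid entry concrete, here is a toy model of a pair-table lookup in the style of the TR#14 example table. It is a sketch under heavy assumptions, not FOP's code: only three classes are modelled (' ' as SP, '+' as PR, everything else as AL), and apart from the AL/PR entry of '_' described above, the table values are illustrative. It also layers a small user-supplied override on top of the base table, roughly what the "overriding pair-table" idea would amount to.

```python
# Toy pair-table line breaker in the style of the Unicode TR#14 example table.
# Actions: '_' direct break, '%' indirect break (only if spaces intervene),
#          '^' prohibited break.
# Only ("AL", "PR") -> "_" reflects the thread; the other values are illustrative.
BASE_TABLE = {
    ("AL", "AL"): "%",
    ("AL", "PR"): "_",  # lets a line break between "Gtk" and "+"
    ("PR", "AL"): "^",  # no break after a prefix character
    ("PR", "PR"): "%",
}

# Hypothetical user override: AL followed by PR becomes an indirect break.
OVERRIDE = {("AL", "PR"): "%"}
PATCHED_TABLE = {**BASE_TABLE, **OVERRIDE}


def classify(ch: str) -> str:
    """Toy classifier: space is SP, '+' is PR, everything else is AL."""
    if ch == " ":
        return "SP"
    if ch == "+":
        return "PR"
    return "AL"


def break_opportunities(text: str, table: dict) -> list:
    """Return the indices i where a line may break before text[i]."""
    result = []
    if not text:
        return result
    prev_cls = classify(text[0])
    saw_space = False
    for i in range(1, len(text)):
        cls = classify(text[i])
        if cls == "SP":
            saw_space = True  # skip spaces, but remember that one occurred
            continue
        action = table.get((prev_cls, cls), "%")  # unknown pairs: indirect
        if action == "_" or (action == "%" and saw_space):
            result.append(i)
        prev_cls, saw_space = cls, False
    return result


if __name__ == "__main__":
    for text in ["Gtk+", "Gtk +", "C++"]:
        print(text, break_opportunities(text, BASE_TABLE),
              break_opportunities(text, PATCHED_TABLE))
```

With the base table, "Gtk+" gets a break opportunity before the '+' (index 3); with the override it only gets one when a space separates the letter from the '+', as in "Gtk +". The override applies to every AL/PR pair, which is exactly where the side effects mentioned above would come from.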
Cheers
Andreas
Re: Undesirable line breaks
Posted by Ryan Lortie <de...@desrt.ca>.
On Sun, 2008-09-14 at 01:49 +0200, Andreas Delmelle wrote:
> At what point? I assume it's right before the '+', correct?
Correct.
> If the layout engine uses Unicode TR#14 as reference to determine the
> line-breaks, then a break between 'k' and '+' would be allowed. '+'
> belongs to the class of Numeric Prefix characters (PR), and as such
> allows a break before but not a break after. (see:
> http://www.unicode.org/reports/tr14/#DescriptionOfProperties)
I was not aware of this standard. I find that to be a rather odd choice
to make (in the meantime I've thought of other common cases like "A+"
and "C++", etc.). Oh well :)
> Another alternative would be something like:
> <fo:wrapper keep-together.within-line="always">Gtk+</fo:wrapper>
With all of these workarounds it's getting to the point where nearly
every part of the output from my stylesheets is littered with millions
of <inline> elements :)
To be more specific about what I was wondering about: is there any way
to tell FOP in a general sense "please be less intelligent, and only
break on ASCII space characters."?
In any case, thanks for the insight.
Cheers
RE: Undesirable line breaks
Posted by "Amick, Eric" <Er...@mail.house.gov>.
>> FOP is currently producing undesired line breaks. I'm having this
>> problem with FOP 0.95 and svn trunk.
>>
>> The problem is that FOP thinks that it's appropriate to split "Gtk+"
>> across two lines.
>> I can't think of any other text layout engine that would consider
>> that to be a legitimate place to insert a break.
>At what point? I assume it's right before the '+', correct?
>If the layout engine uses Unicode TR#14 as reference to determine the
>line-breaks, then a break between 'k' and '+' would be allowed. '+'
>belongs to the class of Numeric Prefix characters (PR), and as such
>allows a break before but not a break after. (see:
>http://www.unicode.org/reports/tr14/#DescriptionOfProperties)
The following description of the numeric prefix characters from the
Unicode standard suggests it shouldn't be breaking there:
Characters that usually precede a numerical expression may not be
separated from following numeric characters or following opening
characters, even if a space character intervenes. For example, there is
no break opportunity in "$ (100.00)".
Many currency signs can appear on both sides, or even the middle, of a
numeric expression. Therefore the line breaking algorithm, by default,
does not break between PR [numeric prefix] and numbers or *letters* on
either side. [emphasis mine]
Eric Amick
Legislative Computer Systems
Office of the Clerk
Re: Undesirable line breaks
Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 12, 2008, at 22:14, Ryan Lortie wrote:
Hi Ryan
> FOP is currently producing undesired line breaks. I'm having this
> problem with FOP 0.95 and svn trunk.
>
> The problem is that FOP thinks that it's appropriate to split "Gtk+"
> across two lines.
> I can't think of any other text layout engine that
> would consider that to be a legitimate place to insert a break.
At what point? I assume it's right before the '+', correct?
>
If the layout engine uses Unicode TR#14 as reference to determine the
line-breaks, then a break between 'k' and '+' would be allowed. '+'
belongs to the class of Numeric Prefix characters (PR), and as such
allows a break before but not a break after. (see:
http://www.unicode.org/reports/tr14/#DescriptionOfProperties)
> I've tried changing the language parameters around but that doesn't
> have much effect.
>
> Can anyone think of a quick workaround that I can use to deal with
> this problem?
Try something like "Gtk&#xFEFF;+" (zero-width no-break space) or
"Gtk&#x2060;+" (word-joiner). Inserting one of those should suffice
to prevent FOP from considering breaking the text there.
Another alternative would be something like:
<fo:wrapper keep-together.within-line="always">Gtk+</fo:wrapper>
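If the set of problem terms is known in advance, that wrapper can also be generated mechanically. A minimal sketch, assuming the text is turned into FO by a script (in an XSLT pipeline the same idea would live in a template); `wrap_terms` and the term list are hypothetical. It emits fo:wrapper rather than fo:inline, which, as Andreas notes elsewhere in this thread, avoids generating extra areas.

```python
import re
from xml.sax.saxutils import escape

# fo:wrapper (not fo:inline) carrying a keep, as suggested in the thread.
WRAPPER = '<fo:wrapper keep-together.within-line="always">{}</fo:wrapper>'


def wrap_terms(text: str, terms: list) -> str:
    """XML-escape `text` and wrap every occurrence of a listed term in an
    fo:wrapper that keeps it on a single line."""
    terms = [t for t in terms if t]  # an empty term would match everywhere
    if not terms:
        return escape(text)
    # Longest terms first, so "C++" wins over a shorter overlapping term.
    pattern = re.compile("|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)))
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))  # plain text before the match
        out.append(WRAPPER.format(escape(m.group())))
        pos = m.end()
    out.append(escape(text[pos:]))  # trailing plain text
    return "".join(out)


if __name__ == "__main__":
    print(wrap_terms("Built with Gtk+ & C++", ["Gtk+", "C++"]))
```

For stylesheets that would end up littered with such elements, Andreas' advice in this thread was to insert the auxiliary codepoints instead.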
HTH!
Cheers
Andreas