Posted to fop-users@xmlgraphics.apache.org by Manuel Mall <ma...@apache.org> on 2007/04/13 13:55:41 UTC

Re: Differences in Chinese output between fop 0.20.5 and fop trunk

On Friday 13 April 2007 18:39, Jeremias Maerki wrote:
> The most probable reason is that the whole line-breaking is different
> from 0.20.5 (the code has been rewritten) and FOP still doesn't
> implement all of UAX#14. I can't even tell if 0.20.5 did it right.
> Anyway, we mostly lack the knowledge of non-western scripts and have
> an incomplete implementation, so effects like this are to be
> expected.
>
> As a work-around you can try inserting zero-width spaces to give FOP
> the chance to add more spacing/break possibilities. From the Unicode
> docs: ZWSP: "this character is intended for line break control: it
> has no width, but its presence between two characters does not
> prevent increased letter spacing in justification."
>
> HTH

Not sure it's a UAX#14 line breaking issue. It seems every Chinese
character is correctly handled as a line break opportunity. My sense is
that it is simply the behaviour of the Knuth algorithm, which decided
that the FOP 0.93 solution is more aesthetically pleasing. If you look
at the text there are very few elastic spaces in there for
justification, and if I remember correctly Knuth doesn't 'like'
(penalises) short last lines in justified paragraphs, so it will give
some preference to the solution with the longer line.
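
To make the point about elastic spaces concrete, here is a minimal
sketch of the adjustment ratio and badness used in Knuth-style line
breaking. It is not FOP's code: the function names, the widths and the
TeX-style 100 * |r|^3 badness formula are assumptions chosen for
illustration. It only shows why a line with few stretchable spaces is
rated much worse than a line with many when both must be stretched by
the same amount.

```python
def adjustment_ratio(natural_width, stretch, shrink, line_width):
    """Return how far the spaces must stretch (>0) or shrink (<0).

    Returns None when the line has no elasticity in the needed direction.
    """
    shortfall = line_width - natural_width
    if shortfall == 0:
        return 0.0
    available = stretch if shortfall > 0 else shrink
    if available <= 0:
        return None
    return shortfall / available


def badness(ratio):
    """TeX-style badness (an assumption here): grows with the cube of the ratio."""
    if ratio is None:
        return float("inf")
    return 100 * abs(ratio) ** 3


# Hypothetical numbers: both lines are 20 units short of a 400-unit measure.
# A Western line with 10 spaces, each able to stretch by 3 units:
western = badness(adjustment_ratio(380, stretch=10 * 3, shrink=10, line_width=400))
# A mostly Chinese line with only 2 such spaces:
chinese = badness(adjustment_ratio(380, stretch=2 * 3, shrink=2, line_width=400))
print(western < chinese)  # the line with fewer elastic spaces is rated far worse
```

With so little elasticity available, an optimising breaker may prefer a
different set of breaks altogether, which would match the behaviour
described above.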

I am not sure that there are any workarounds apart from making the 
paragraph not justified.

Maybe adding some ideographic spaces in the right places would help?
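
For anyone who wants to try either work-around (zero-width spaces or
ideographic spaces), a small pre-processing sketch follows. The helper
names are hypothetical, the ideograph test only covers the basic CJK
Unified Ideographs block (U+4E00 to U+9FFF), and the result would still
have to be written into the FO source by whatever generates it. Whether
this changes FOP's output is not guaranteed; later in this thread Stefan
reports that zero-width spaces did not help in his case.

```python
ZWSP = "\u200b"               # ZERO WIDTH SPACE
IDEOGRAPHIC_SPACE = "\u3000"  # IDEOGRAPHIC SPACE


def is_cjk_ideograph(ch):
    """Rough check: basic CJK Unified Ideographs block only (an assumption)."""
    return "\u4e00" <= ch <= "\u9fff"


def insert_between_ideographs(text, separator=ZWSP):
    """Insert `separator` between each pair of adjacent CJK ideographs."""
    out = []
    # Pair every character with its predecessor; the first one gets a dummy space.
    for prev, ch in zip(" " + text, text):
        if is_cjk_ideograph(prev) and is_cjk_ideograph(ch):
            out.append(separator)
        out.append(ch)
    return "".join(out)


sample = insert_between_ideographs("中文 text 排版")
print([hex(ord(c)) for c in sample])
```

Passing separator=IDEOGRAPHIC_SPACE inserts visible full-width spaces
instead, which is probably only sensible at a few hand-picked positions
rather than between every pair of characters.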

Manuel

>
> On 12.04.2007 11:01:54 Stefan Heuer wrote:
> > Hi,
> >
> > I'm trying to generate a Chinese PDF file. I've found a working
> > font and embedding works.
> >
> > But there are some differences in the output between the old and
> > the current processor. I attached two PDF files and the fo source.
> >
> > The differences in the page layout are OK. But the line breaking
> > and the resulting big gaps between the words look bad.
> > In particular, in the second paragraph on the second page there are
> > big gaps between the words.
> >
> > I'm using fop 0.20.5 and fop trunk from 10 April on Windows XP with
> > Java 1.5.
> >
> > I would prefer using fop 0.93 or later. Is there anybody who can
> > give me a hint on what to adjust (or what else to do) to avoid such
> > big gaps?
> >
> >
> > Thanks
> > Stefan
>
> Jeremias Maerki
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
> For additional commands, e-mail:
> fop-users-help@xmlgraphics.apache.org



Re: Differences in Chinese output between fop 0.20.5 and fop trunk

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Abel Braaksma wrote:
> I don't agree that "rl-tb" is commonly used,

Darn transposition error, of course I meant lr-tb. Next
time I'll spell it out, just to be sure...

J.Pietschmann



Re: Differences in Chinese output between fop 0.20.5 and fop trunk

Posted by Abel Braaksma <ab...@xs4all.nl>.
J.Pietschmann wrote:
> Manuel Mall wrote:
>> I am in no way an expert in Asian scripts but I believe Chinese is 
>> typically written top to bottom.
>
> That's "classical Chinese." Nowadays rl-tb is commonly used.
>
> J.Pietschmann

"Commonly" depends on where you are. In Macau, Hong Kong and Taiwan,
Traditional Chinese set top-to-bottom, right-to-left is still most common
in books, newspapers (declining, often combining l-r and t-b texts in
one paper), comics etc. Additionally, Japanese is invariably written
top-to-bottom on address labels and in comics (see any manga).
Almost all Asian scripts are written top to bottom on spines of books.
Traditional Mongolian (Manchu and the like) is always written
top-to-bottom, left-to-right (but Manchu is rare). Japanese business
cards tend to be both top-to-bottom (Japanese text) and left-to-right
(English version).

I don't agree that "rl-tb" is commonly used, not even for Simplified
Chinese: the majority of Simplified Chinese (the standard script in
mainland China since the 1950s) texts are left-to-right and top-to-bottom.
Right-to-left is used only when combined with Arabic or Hebrew texts,
i.e., it follows the "host" writing direction.

Cheers,
-- Abel Braaksma



Re: Differences in Chinese output between fop 0.20.5 and fop trunk

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Manuel Mall wrote:
> I am in no way an expert in Asian scripts but I believe Chinese is 
> typically written top to bottom.

That's "classical Chinese." Nowadays rl-tb is commonly used.

J.Pietschmann



Re: Differences in Chinese output between fop 0.20.5 and fop trunk

Posted by Manuel Mall <ma...@apache.org>.
On Wednesday 18 April 2007 16:28, Stefan Heuer wrote:
> Hi,
>
> thanks for the answers.
>
> Manuel, your advice works. Inserting zero-width spaces doesn't help,
> but making the paragraphs left-aligned helps for the Chinese
> document.
>
> Anyway, for Chinese text it does not seem necessary to use justified
> paragraphs, because usually there are no spaces between the
> characters. But I'm not Chinese and I don't know anything about
> other Asian languages like Japanese or Korean. So now I need
> different settings for European and Chinese documents.

Yes, but I don't find that surprising at all. After all, an ideographic
script (it's a bit like a fixed-width font with a character per word) is
very different from a Western script. Therefore different typesetting
conventions most likely apply.

I am in no way an expert in Asian scripts but I believe Chinese is 
typically written top to bottom. So your example which mixes Western 
and Chinese appears a bit strange to me.

>
> Is there a way to influence the Knuth algorithm? At least to
> switch it on or off?

No, sorry, there is no way of doing that without changing the code. It
seems what you want is a way to switch from the Knuth Best Fit line
layout to the First Fit line layout used by the older FOP. This is not
supported, nor is the old First Fit algorithm in the first place.
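
For readers unfamiliar with the two strategies, here is a toy
comparison. It is only a sketch and not FOP's implementation: item
widths are plain numbers, every item is assumed to fit on a line by
itself, the cost of a line is assumed to be the square of its leftover
space, and the last line is not penalised at all. The greedy function
plays the role of First Fit; the dynamic-programming function looks at
the paragraph as a whole, in the spirit of the Knuth approach.

```python
def first_fit(widths, line_width, space=1):
    """Greedy: keep adding items to the current line while they still fit."""
    lines, current, used = [], [], 0
    for i, w in enumerate(widths):
        needed = w if not current else used + space + w
        if current and needed > line_width:
            lines.append(current)      # close the line and start a new one
            current, needed = [], w
        current.append(i)
        used = needed
    if current:
        lines.append(current)
    return lines


def total_fit(widths, line_width, space=1):
    """Choose breaks minimising the summed squared leftover space
    over all lines except the last (assumes each item fits on a line)."""
    n = len(widths)
    best = [0.0] + [float("inf")] * n  # best[j]: minimal cost of items[:j]
    prev = [0] * (n + 1)               # prev[j]: start index of the last line
    for j in range(1, n + 1):
        used = -space
        for i in range(j - 1, -1, -1):  # candidate line is items[i:j]
            used += widths[i] + space
            if used > line_width:
                break
            slack = line_width - used
            cost = best[i] + (0 if j == n else slack ** 2)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    lines, j = [], n
    while j > 0:                       # walk the chosen breaks backwards
        lines.append(list(range(prev[j], j)))
        j = prev[j]
    return lines[::-1]


widths = [3, 2, 2, 5]                  # hypothetical item widths
print(first_fit(widths, 6))            # [[0, 1], [2], [3]]
print(total_fit(widths, 6))            # [[0], [1, 2], [3]]
```

The greedy version fills the first line completely and leaves a very
loose second line; the optimising version accepts a looser first line to
get a better overall result. That trade-off is why the two FOP versions
can break the same paragraph differently.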

Cheers

Manuel


Re: Differences in Chinese output between fop 0.20.5 and fop trunk

Posted by Stefan Heuer <sh...@gmx.de>.
Hi,

thanks for the answers.

Manuel, your advice works. Inserting zero-width spaces doesn't help,
but making the paragraphs left-aligned helps for the Chinese
document.

Anyway, for Chinese text it does not seem necessary to use justified
paragraphs, because usually there are no spaces between the
characters. But I'm not Chinese and I don't know anything about
other Asian languages like Japanese or Korean. So now I need
different settings for European and Chinese documents.

Is there a way to influence the Knuth algorithm? At least to
switch it on or off?

Stefan

