Posted to fop-users@xmlgraphics.apache.org by Nicolas Lalevee <ni...@anyware-tech.com> on 2006/03/06 10:48:24 UTC

Chinese hyphenation particularity

Hi everybody,

I have succeeded in producing a Chinese PDF from an XML document via an
XSL transformation.
One last problem remains: a professionally typeset Chinese document
should not leave a single Chinese character alone on a line.
For instance, I have the sentence "AZERTYUIO." to render in PDF. FOP
(the trunk version, with the patch from bug 36977) generates a PDF
with:
AZERTYUI
O.

For Chinese readers, that is not a well-rendered document. The
preferred layout is:
AZERTYUIO. (the characters have to be compressed)
or
AZERTYU (the characters have to be expanded)
IO.

The only way I found to do this is to force the last three characters of
a text node to be "no-wrap". Here is my XSL template:

    <xsl:template match="text()">
        <!-- Insert a zero-width space (U+200B) after each '.', '\' and '/'
             so that a line break is allowed there. -->
        <xsl:variable name="txt">
            <xsl:call-template name="string.subst">
                <xsl:with-param name="string">
                    <xsl:call-template name="string.subst">
                        <xsl:with-param name="string">
                            <xsl:call-template name="string.subst">
                                <xsl:with-param name="string" select="." />
                                <xsl:with-param name="target" select="'.'" />
                                <xsl:with-param name="replacement" select="'.&#x200B;'" />
                            </xsl:call-template>
                        </xsl:with-param>
                        <xsl:with-param name="target" select="'\'" />
                        <xsl:with-param name="replacement" select="'\&#x200B;'" />
                    </xsl:call-template>
                </xsl:with-param>
                <xsl:with-param name="target" select="'/'" />
                <xsl:with-param name="replacement" select="'/&#x200B;'" />
            </xsl:call-template>
        </xsl:variable>
        <xsl:choose>
            <!-- Long enough: keep the last three characters on one line. -->
            <xsl:when test="string-length($txt) > 3">
                <xsl:value-of select="substring($txt, 1, string-length($txt) - 3)" />
                <fo:inline wrap-option="no-wrap" hyphenate="false"
                           keep-together.within-line="always">
                    <xsl:value-of select="substring($txt, string-length($txt) - 2)" />
                </fo:inline>
            </xsl:when>
            <!-- Three characters or fewer: output the text unchanged. -->
            <xsl:otherwise>
                <xsl:value-of select="$txt" />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
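
For readers who want to check the string handling outside an XSLT processor, here is a minimal Python sketch of what the template above appears to do to a single text node. The function names add_break_hints and split_tail are hypothetical, and the sketch assumes that string.subst (not shown in this thread) is a plain replace-all helper.

```python
ZWSP = "\u200b"  # zero-width space: an invisible, optional break point


def add_break_hints(text, targets=(".", "\\", "/")):
    """Insert a zero-width space after each target character."""
    for ch in targets:
        text = text.replace(ch, ch + ZWSP)
    return text


def split_tail(text, tail_len=3):
    """Return (head, tail); tail is the part that would go in the no-wrap inline.

    Mirrors the template's test: only texts longer than tail_len are split.
    """
    if len(text) > tail_len:
        return text[:-tail_len], text[-tail_len:]
    return text, ""


head, tail = split_tail(add_break_hints("AZERTYUIO."))
print(repr(head), repr(tail))
```

One thing this sketch makes visible: the zero-width space inserted after the final "." counts as one of the three characters, so for "AZERTYUIO." the tail holds only "O." plus the invisible space. If the XSLT behaves the same way, a larger tail length may be needed to keep "IO." together.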

That is not a good solution, because some line ends can be missed.
In my XML source document I can have inline formatting, like bold or
italic, that makes a text node shorter than 3 characters even if the
complete sentence contains more than 3 characters.
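
One possible way around this limitation is to count characters over the whole paragraph rather than per text node. The Python sketch below only illustrates that idea and is not something FOP or XSLT provides: it assumes the paragraph has already been flattened into (style, text) runs, and mark_tail is a hypothetical name.

```python
def mark_tail(runs, tail_len=3):
    """Flag the last tail_len characters of a paragraph made of styled runs.

    runs is a list of (style, text) pairs in document order. The result is a
    list of (style, text, keep_together) triples; a run that straddles the
    boundary is split in two. Empty runs are dropped.
    """
    out = []
    remaining = tail_len
    for style, text in reversed(runs):
        if not text:
            continue
        if remaining <= 0:
            out.append((style, text, False))
        elif len(text) <= remaining:
            out.append((style, text, True))
            remaining -= len(text)
        else:
            # Appended tail-first because the list is reversed afterwards.
            out.append((style, text[-remaining:], True))
            out.append((style, text[:-remaining], False))
            remaining = 0
    out.reverse()
    return out


for piece in mark_tail([("normal", "AZERTYUI"), ("bold", "O.")]):
    print(piece)
```

The flagged pieces would then need to be emitted inside one common no-wrap fo:inline, with the bold or italic inlines nested in it. Whether a given formatter honours that nesting is an assumption to verify.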

Is there any other way to handle this hyphenation particularity?

Thanks in advance
Nicolas


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: Chinese hyphenation particularity

Posted by Nicolas Lalevee <ni...@anyware-tech.com>.
Thank you for the links. This is a huge job, you're right!
But I won't be getting involved in this. I still have no answer from XEP
support, but XSL Formatter support sent me another mail pointing out the
attribute that does exactly what I want: axf:avoid-widow-words.
So paying for the XSL Formatter licence will probably be less expensive
than paying for days of coding on FOP.

Thank you for your answers and your support
Bye
Nicolas





Re: Chinese hyphenation particularity

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
UAX#14 is specified here:
http://www.unicode.org/reports/tr14/

If you're serious about diving into this, keep in mind that this is
probably not a small job and will require getting acquainted with FOP
and its innards. It's also recommended that you subscribe to the
fop-dev mailing list and browse its archive, where you will find some
discussions and preparatory work already done (mostly by Joerg
Pietschmann). Search words are "UAX", "UAX#14", "Unicode", "TR14", "i18n",
"line breaking". We were recently talking about using ICU4J
(http://icu.sourceforge.net/). Its use may give us a good head start.
Joerg Pietschmann and Manuel Mall already did some work in the area:
http://mail-archives.apache.org/mod_mbox/xmlgraphics-fop-dev/200510.mbox/%3c200510311525.13246.mm@arcus.com.au%3e

I can't help much here because I haven't had the time to get a closer
look at all this.
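
To give a rough feel for what line-breaking rules of this kind involve, here is a deliberately tiny Python sketch. It is not UAX#14 and not FOP code: it treats every character as an ideograph (assuming, as the placeholder sentence "AZERTYUIO." seems to, that each letter stands for one Chinese character), uses small hand-picked punctuation sets, and applies only two rules that UAX#14 also contains (no break before closing punctuation, no break after opening punctuation). The second function is a layout policy added on top for the single-character case from the original question. All names are hypothetical.

```python
# Tiny illustrative subsets only; UAX#14 assigns a line-break class to every code point.
CLOSING = set("。，、；：！？）」』》.,;:!?)")
OPENING = set("（「『《(")


def break_opportunities(text):
    """Indexes i such that a break is allowed between text[i-1] and text[i]."""
    allowed = []
    for i in range(1, len(text)):
        if text[i] in CLOSING:      # do not start a line with closing punctuation
            continue
        if text[i - 1] in OPENING:  # do not end a line with opening punctuation
            continue
        allowed.append(i)
    return allowed


def drop_widow_breaks(text, allowed, min_chars=2):
    """Remove breaks that would leave fewer than min_chars non-punctuation characters."""
    kept = []
    for i in allowed:
        rest = [ch for ch in text[i:] if ch not in CLOSING]
        if len(rest) >= min_chars:
            kept.append(i)
    return kept


sentence = "AZERTYUIO."
print(drop_widow_breaks(sentence, break_opportunities(sentence)))
```

With these assumptions the latest permitted break in "AZERTYUIO." falls before "IO.", which matches the second preferred layout from the original post. A real implementation would still have to cover the full set of line-break classes and interact with the formatter's justification.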



Jeremias Maerki




Re: Chinese hyphenation particularity

Posted by Nicolas Lalevee <ni...@anyware-tech.com>.
Jeremias Maerki wrote:
> There's not much else you can do other than to try to handle/work-around
> everything in XSLT. FOP does not have special code to handle languages
> like Chinese. We lack the knowledge set in the project team. Every now
> and then we talk about implementing UAX#14 line breaking but so far
> nobody had the resources to dive into this. Any help is welcome.

OK,
In fact, I haven't found any formatter that does that. XSL Formatter
doesn't support it (according to Antenna House support), and there is
no answer yet from XEP support.
So the only way to get this kind of hyphenation is to code it myself in
FOP. So maybe I will get involved in this, depending on the amount of work.
Can I have more info about "UAX#14"?

bye,
Nicolas





Re: Chinese hyphenation particularity

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
There's not much else you can do other than to try to handle or work
around everything in XSLT. FOP does not have special code to handle
languages like Chinese. The project team lacks the necessary knowledge.
Every now and then we talk about implementing UAX#14 line breaking, but
so far nobody has had the resources to dive into this. Any help is welcome.



Jeremias Maerki

