Posted to fop-dev@xmlgraphics.apache.org by Glenn Adams <gl...@skynav.com> on 2011/09/19 03:40:37 UTC

Re: Apache FOP XML to PDF problem with CJK Unified Ideographs Extension B character

I mean FOP itself needs to be modified to properly handle UTF-16 encoded
(i.e., Java string encoded) surrogate pair encodings of non-BMP Unicode code
points.
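
To illustrate the distinction, here is a minimal Java sketch. It is not FOP code, and the class and method names are hypothetical. It contrasts walking a string by UTF-16 code unit with walking it by code point, using only the standard `String.codePointAt` and `Character.charCount` calls. For U+20000 the first approach yields the two surrogates 0xd840 and 0xdc00, which are the values reported in the glyph warnings quoted below, while the second yields the single code point 0x20000.

```java
import java.util.ArrayList;
import java.util.List;

public class SurrogateDemo {

    /** Naive view: one entry per UTF-16 code unit (Java char). */
    static List<Integer> codeUnits(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add((int) s.charAt(i));
        }
        return out;
    }

    /** Code-point-aware view: a surrogate pair is combined into one entry. */
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            // Advance by 2 chars for a non-BMP code point, otherwise by 1.
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // U+20000, the first CJK Unified Ideographs Extension B character.
        String s = new String(Character.toChars(0x20000));
        for (int u : codeUnits(s)) {
            System.out.printf("code unit  0x%04x%n", u);
        }
        for (int cp : codePoints(s)) {
            System.out.printf("code point 0x%x%n", cp);
        }
    }
}
```

Any glyph lookup keyed on individual `char` values will see only the two surrogates, so the lookup has to be done on the combined code point instead.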

I have been doing such modifications in part in my recent work to add
complex script support (in which I did code for handling non-BMP
codepoints), but I have not undertaken to perform all necessary
modifications to support extra-BMP code points in pre-existing code.

I have created a bug report for this at:

https://issues.apache.org/bugzilla/show_bug.cgi?id=51843

I am willing to accept the work of resolving this bug, but I can't give you
a schedule at this time as to when I will work on it (as I have other work
on complex scripts that will take priority). If anyone else wishes to
actively work on it and develop a patch before I do, they are free to do so;
in which case, I would appreciate hearing about such work ahead of time so I
don't create a redundant patch.

Regards,
Glenn

2011/9/19 BRUCE Y L LEE <br...@gmail.com>

> Hi Glenn,
>
> Thanks for your detailed explanation.
>
> May I ask what "upgrade to the current code base" means? Is it related to
> the OS or the JDK, and how do I upgrade to the current code base? My project
> needs to handle CJK Unified Ideographs Extension B characters (characters
> whose scalar value is greater than 65535 (decimal)). Thank you very much.
>
> Regards,
> Bruce
>
>
> 2011/9/17 Glenn Adams <gl...@skynav.com>
>
>> In general, FOP does not support the use of characters outside the Unicode
>> Basic Multilingual Plane (BMP), i.e., characters whose scalar value is
>> greater than 65535 (decimal).
>>
>> Support for extra-BMP Unicode codepoints will require an upgrade to the
>> current code base. As you can see by the error messages below, it is
>> attempting to treat individual UTF-16 surrogate pair codes as distinct code
>> points, and not as a UTF-16 encoding of an extra-BMP code point.
>>
>> Regards,
>> Glenn
>>
>> 2011/9/16 BRUCE Y L LEE <br...@gmail.com>
>>
>>> Hi
>>>
>>> I would like to transform XML to PDF using Apache FOP.
>>> The XML includes CJK Unified Ideographs Extension B characters
>>> (e.g. &#x20000;). I have added the font "Simsun (Founder Extended)" to
>>> Apache FOP, but it cannot render the CJK Unified Ideographs Extension B
>>> characters. Please help.
>>>
>>> CJK_ExtB.xml
>>> [code]
>>> <CJK_ExtB>&#x20000;</CJK_ExtB>
>>> [/code]
>>>
>>> CJK_ExtB_FO.xsl
>>> [code]
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <xsl:stylesheet version="1.0"
>>>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>>     xmlns:fo="http://www.w3.org/1999/XSL/Format">
>>>   <xsl:template match="/">
>>>     <fo:root>
>>>       <fo:layout-master-set>
>>>         <fo:simple-page-master master-name="A4" page-height="29.7cm"
>>>             page-width="21.0cm" margin="2cm">
>>>           <fo:region-body/>
>>>         </fo:simple-page-master>
>>>       </fo:layout-master-set>
>>>       <fo:page-sequence master-reference="A4">
>>>         <fo:flow flow-name="xsl-region-body">
>>>           <fo:block font-family="Simsun (Founder Extended)">測試<xsl:value-of
>>>               select="CJK_ExtB"/>測試</fo:block>
>>>         </fo:flow>
>>>       </fo:page-sequence>
>>>     </fo:root>
>>>   </xsl:template>
>>> </xsl:stylesheet>
>>> [/code]
>>>
>>> fop.xconf
>>> [code]
>>> <renderer mime="application/pdf">
>>>   <filterList>
>>>     <value>flate</value>
>>>   </filterList>
>>>   <fonts>
>>>     <font metrics-url="file:///D:/fop-1.0/Fonts/SURSONG.xml" kerning="yes"
>>>         embed-url="file:///D:/fop-1.0/fonts/SURSONG.ttf">
>>>       <font-triplet name="Simsun (Founder Extended)" style="normal"
>>>           weight="normal"/>
>>>       <font-triplet name="Simsun (Founder Extended)" style="normal"
>>>           weight="bold"/>
>>>       <font-triplet name="Simsun (Founder Extended)" style="italic"
>>>           weight="normal"/>
>>>       <font-triplet name="Simsun (Founder Extended)" style="italic"
>>>           weight="bold"/>
>>>     </font>
>>>   </fonts>
>>> </renderer>
>>> [/code]
>>>
>>> cmd
>>> [code]
>>> D:\fop-1.0>fop -c conf\fop.xconf -xml CJK_ExtB.xml -xsl CJK_ExtB_FO.xsl -pdf CJK_ExtB.pdf
>>> Sep 15, 2011 2:32:20 PM org.apache.fop.apps.FopFactoryConfigurator configure
>>> INFO: Default page-height set to: 11in
>>> Sep 15, 2011 2:32:20 PM org.apache.fop.apps.FopFactoryConfigurator configure
>>> INFO: Default page-width set to: 8.26in
>>> Sep 15, 2011 2:32:23 PM org.apache.fop.events.LoggingEventListener processEvent
>>> WARNING: Glyph "?" (0xd840) not available in font "FZSY--SURROGATE-0".
>>> Sep 15, 2011 2:32:23 PM org.apache.fop.events.LoggingEventListener processEvent
>>> WARNING: Glyph "?" (0xdc00) not available in font "FZSY--SURROGATE-0".
>>> [/code]
>>>
>>>
>>> Regards,
>>> Bruce
>>
>>
>

Re: Apache FOP XML to PDF problem with CJK Unified Ideographs Extension B character

Posted by BRUCE Y L LEE <br...@gmail.com>.
Hi Glenn,

Thanks again for your detailed explanation and for creating the bug report.

Thank you very much.

Regards,
Bruce

