Posted to user@forrest.apache.org by Heather Phipps <hm...@gmail.com> on 2009/04/19 02:24:00 UTC

Japanese PDF rendering

Hello,

I am having trouble getting a Japanese XML document to display
properly in PDF form.

I set up the basic site that comes with Apache Forrest.  The only
changes I made were to add the DocBook DTD (v4.4) and the DocBook
stylesheets from SourceForge and to change the sitemap accordingly.  I
then added a DocBook file with Japanese text to the samples directory.
I followed the directions here:
http://www.terra-intl.com/agel/2005/11/forrest_japanese.html.  Then,
when I run Forrest, the HTML document is generated correctly, but the
PDF still shows ####### instead of Japanese text.  (New English XML
files display fine in both HTML and PDF format.)

Any ideas as to where I'm going wrong, or how I might start to debug?

Other info:
Mac OS 10.5.6
Java 1.5.0_16
Forrest 0.8

-Heather

Re: Japanese PDF rendering

Posted by Heather Phipps <hm...@gmail.com>.
(Sorry, I meant "to 1 GB," not "by 1 GB.")

Also, to clarify: when I do "forrest run", the PDFs are in fact generated,
but they use the default fonts, so the Japanese once again displays as
########.

Re: Japanese PDF rendering

Posted by Heather Phipps <hm...@gmail.com>.
As stated previously in this thread, the out-of-memory issue was resolved by
increasing the memory available to Java by 1 GB.

The outstanding issue is that the font is properly embedded when I use
"forrest site" to pre-generate content, but not when I use "forrest run" and
the PDFs are (apparently) created on the fly as they are requested...

Re: Japanese PDF rendering

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
On 08.05.2009 03:29:20 David Crossley wrote:
> Heather Phipps wrote:
> > Actually, with my little hack, the font is properly embedded in the PDF only
> > when pre-generated, i.e., when you do "forrest site," but not when the PDFs
> > are generated on the fly, i.e., when you do "forrest run."  Not sure exactly
> > what accounts for the difference.  If anyone has a better understanding of
> > the cause of this discrepancy, please let me know...
> > 
> > Otherwise, I may look into this more at some later point and will update
> > this thread if I have any further insight.
> 
> Perhaps the FOP project will be able to help with that
> and with the memory issue:
> http://xmlgraphics.apache.org/fop/

When it is working in one case inside Forrest but not another? I doubt
it. That rather sounds like an integration problem (and I'm no Cocoon
expert). I don't know what the difference is between the two cases and
how the JVM gets the memory settings for them (I assume we're talking
about an OutOfMemoryError). What you'll get from FOP support is
questions like: does it work from the command-line? Generally, FOP can
deal with Japanese characters (left-to-right only).

At any rate, loading a TrueType font currently takes a lot of memory (i.e.
it's programmed rather inefficiently). I'm sure that can be improved but
so far nobody cared enough to do the work.

Sorry, but I don't have any good suggestions other than (somehow) making
sure the JVM has enough memory.

Jeremias Maerki


Re: Japanese PDF rendering

Posted by David Crossley <cr...@apache.org>.
Heather Phipps wrote:
> Actually, with my little hack, the font is properly embedded in the PDF only
> when pre-generated, i.e., when you do "forrest site," but not when the PDFs
> are generated on the fly, i.e., when you do "forrest run."  Not sure exactly
> what accounts for the difference.  If anyone has a better understanding of
> the cause of this discrepancy, please let me know...
> 
> Otherwise, I may look into this more at some later point and will update
> this thread if I have any further insight.

Perhaps the FOP project will be able to help with that
and with the memory issue:
http://xmlgraphics.apache.org/fop/

-David

Re: Japanese PDF rendering

Posted by Heather Phipps <hm...@gmail.com>.
Actually, with my little hack, the font is properly embedded in the PDF only
when pre-generated, i.e., when you do "forrest site," but not when the PDFs
are generated on the fly, i.e., when you do "forrest run."  Not sure exactly
what accounts for the difference.  If anyone has a better understanding of
the cause of this discrepancy, please let me know...

Otherwise, I may look into this more at some later point and will update
this thread if I have any further insight.

Re: Japanese PDF rendering

Posted by Heather Phipps <hm...@gmail.com>.
I increased the memory available to Java by editing the forrest.maxmemory
property in the site's forrest.properties file:

# Max memory to allocate to Java
forrest.maxmemory=1024m

Japanese displays correctly now in PDF from DocBook source!  (Turns out
there was no need to update the FOP jar file.)
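The value uses the JVM's heap-size notation (a number with an optional k, m or g suffix, as in -Xmx1024m). As a small sanity check before a long "forrest site" run, here is a minimal Python sketch that reads forrest.maxmemory out of the text of a forrest.properties file and converts it to bytes. It is only a sketch under stated assumptions: it understands simple key=value lines and # or ! comments, not the full java.util.Properties syntax, and the helper names are hypothetical.

```python
import re

# Multipliers for the size suffixes used by the JVM's -Xmx option
# (k/K, m/M, g/G); a bare number means bytes.
_SUFFIXES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_jvm_size(value):
    """Convert a JVM-style memory size such as '1024m' into bytes."""
    match = re.fullmatch(r"\s*(\d+)([kKmMgG]?)\s*", value)
    if match is None:
        raise ValueError(f"not a JVM memory size: {value!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix.lower()]


def read_maxmemory(properties_text):
    """Return forrest.maxmemory in bytes, or None if it is not set.

    Only simple 'key=value' lines are recognised; lines starting with
    '#' or '!' are treated as comments and skipped.
    """
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "forrest.maxmemory":
            return parse_jvm_size(value)
    return None


if __name__ == "__main__":
    sample = "# Max memory to allocate to Java\nforrest.maxmemory=1024m\n"
    print(read_maxmemory(sample))
```

For the setting above, read_maxmemory returns 1073741824 bytes (1 GiB); a commented-out line such as "# forrest.maxmemory=64m" is ignored.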

Re: Japanese PDF rendering

Posted by Heather Phipps <hm...@gmail.com>.
I have an update and some further questions.

The Japanese font display issue is only occurring with my DocBook XML source
files.  These are not handled by document2fo.xsl, but rather by the DocBook
4.4 stylesheets, which I downloaded and included on my own.  Here is an
excerpt from my project sitemap showing how I did this:

   <map:match pattern="**.pdf">
    <map:act type="sourcetype" src="{properties:content.xdocs}{1}.xml">
     <map:select type="parameter">
      <map:parameter name="parameter-selector-test" value="{sourcetype}"/>
      <map:when test="docbook-v4.4">
       <map:generate src="{properties:content.xdocs}{../1}.xml"/>
       <map:transform
          src="{properties:resources.stylesheets}/docbook-xsl-1.74.3-pre2/fo/docbook.xsl"/>
       <map:serialize type="fo2pdf"/>
      </map:when>
     </map:select>
    </map:act>
   </map:match>
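Whether a given file takes the docbook-v4.4 branch above depends on what the sourcetype action reports for it, so a file that is not recognised never reaches the DocBook stylesheets at all. As a rough, hypothetical stand-in (not Forrest's actual SourceTypeAction, which is driven by its own source-type definitions and, in typical setups, matches on the DOCTYPE public identifier), this Python sketch pulls the public identifier out of a document and maps it to a source-type name. The SOURCE_TYPES table and the function name are assumptions for illustration.

```python
import re

# Captures the root name and public identifier of a DOCTYPE declaration, e.g.
# <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "...">
_DOCTYPE = re.compile(r"<!DOCTYPE\s+([^\s>\[]+)\s+PUBLIC\s+([\"'])(.*?)\2")

# Hypothetical mapping from public identifier to the source-type name
# tested in the sitemap; it only mirrors the one branch shown above.
SOURCE_TYPES = {
    "-//OASIS//DTD DocBook XML V4.4//EN": "docbook-v4.4",
}


def detect_source_type(xml_text):
    """Return the source-type name for a document, or None if unknown."""
    match = _DOCTYPE.search(xml_text)
    if match is None:
        return None  # no PUBLIC DOCTYPE at all
    public_id = match.group(3)
    return SOURCE_TYPES.get(public_id)


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python detect_type.py samples/mydoc.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        print(detect_source_type(f.read()))
```

If this prints None for a DocBook file, the first thing to check is that its DOCTYPE carries exactly the public identifier the sitemap's source-type rule expects.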

I didn't feel like messing around in the DocBook stylesheets just yet,
so I hacked the font config file to see if I could get other (non-Japanese)
fonts to display properly in my PDFs generated from DocBook.  Here is the
hacked font config file:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<fonts>
 <font metrics-file="/Users/Heather/apache-forrest-0.8/main/webapp/resources/fonts/Brush_Script.xml"
       embed-file="/Users/Heather/apache-forrest-0.8/main/webapp/resources/fonts/Brush_Script.ttf"
       kerning="yes">
  <font-triplet name="serif" style="normal" weight="normal"/>
  <font-triplet name="serif" style="normal" weight="bold"/>
  <font-triplet name="serif" style="italic" weight="normal"/>
  <font-triplet name="serif" style="italic" weight="bold"/>
  <font-triplet name="sans-serif" style="normal" weight="normal"/>
  <font-triplet name="sans-serif" style="normal" weight="bold"/>
  <font-triplet name="sans-serif" style="italic" weight="normal"/>
  <font-triplet name="sans-serif" style="italic" weight="bold"/>
  <font-triplet name="sans" style="normal" weight="normal"/>
  <font-triplet name="sans" style="normal" weight="bold"/>
  <font-triplet name="sans" style="italic" weight="normal"/>
  <font-triplet name="sans" style="italic" weight="bold"/>
  <font-triplet name="monospace" style="normal" weight="normal"/>
  <font-triplet name="monospace" style="normal" weight="bold"/>
  <font-triplet name="monospace" style="italic" weight="normal"/>
  <font-triplet name="monospace" style="italic" weight="bold"/>
 </font>
</fonts>
</configuration>

This works...it causes the PDFs created from both sample XML files included
in the demo site and my DocBook XML files to use the "Brush Script" font.
However, when I attempt the same thing with a Japanese font, it works
properly for the sample XML files (even when I add some Japanese text) but
not for my DocBook XML files...I get the following out of memory error:

* [54/14]   [0/0]     9.776s 0b      samples/dotnet.pdf
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:95)
    at org.apache.fop.tools.IOUtil.copyStream(IOUtil.java:77)
    at org.apache.fop.tools.IOUtil.toByteArray(IOUtil.java:95)
    at org.apache.fop.fonts.FontFileReader.init(FontFileReader.java:76)
    at org.apache.fop.fonts.FontFileReader.<init>(FontFileReader.java:105)
    at org.apache.fop.render.pdf.fonts.MultiByteFont.getFontFile(MultiByteFont.java:237)
    at org.apache.fop.render.pdf.fonts.LazyFont.getFontFile(LazyFont.java:218)
    at org.apache.fop.pdf.PDFDocument.makeFontDescriptor(PDFDocument.java:974)
    at org.apache.fop.pdf.PDFDocument.makeFont(PDFDocument.java:875)
    at org.apache.fop.render.pdf.FontSetup.addToResources(FontSetup.java:250)
    at org.apache.fop.render.pdf.PDFRenderer.stopRenderer(PDFRenderer.java:249)
    at org.apache.fop.apps.StreamRenderer.stopRenderer(StreamRenderer.java:204)
    at org.apache.fop.fo.FOTreeBuilder.endDocument(FOTreeBuilder.java:246)
    at org.apache.cocoon.xml.AbstractXMLPipe.endDocument(AbstractXMLPipe.java:55)
    at org.apache.cocoon.xml.AbstractXMLPipe.endDocument(AbstractXMLPipe.java:55)
    at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.cocoon.core.container.handler.PoolableComponentHandler$ProxyHandler.invoke(PoolableComponentHandler.java:147)
    at $Proxy6.endDocument(Unknown Source)
    at org.apache.cocoon.components.sax.XMLTeePipe.endDocument(XMLTeePipe.java:67)
    at org.apache.xml.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
    at org.apache.xalan.transformer.TransformerImpl.transformNode(TransformerImpl.java:1378)
    at org.apache.xalan.transformer.TransformerImpl.run(TransformerImpl.java:3458)
    at org.apache.xalan.transformer.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:406)
    at org.apache.cocoon.xml.AbstractXMLPipe.endDocument(AbstractXMLPipe.java:55)
    at org.apache.cocoon.transformation.TraxTransformer.endDocument(TraxTransformer.java:585)
    at org.apache.xerces.parsers.AbstractSAXParser.endDocument(Unknown Source)
    at org.apache.xerces.impl.dtd.XMLDTDValidator.endDocument(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)


Anyone seen this before?  Thoughts/suggestions?  I'm first going to try to
change the heap memory applied to Forrest, hoping that this will change the
heap memory applied to FOP...and also maybe try updating the FOP jar that is
included with Forrest.
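In the trace above, FontFileReader.init reads the embedded font through IOUtil.toByteArray into a ByteArrayOutputStream, so the whole TrueType file is held in the heap at once, which fits Jeremias's remark in this thread that TrueType loading is memory-hungry. The Python sketch below estimates the transient peak of that read-everything pattern. It is a model, not FOP's code: it assumes the buffer grows like Java 5's ByteArrayOutputStream (to the larger of double its capacity or the size needed, copying the old contents across), that toByteArray() then makes one exact-size copy, and that the stream is read in 4096-byte chunks from a 32-byte initial buffer; the chunk buffer itself and everything FOP allocates afterwards are ignored.

```python
def peak_bytes_for_read(file_size, initial_capacity=32, chunk_size=4096):
    """Rough peak heap use for reading a file into a growing byte buffer.

    Growth rule modeled on Java 5's ByteArrayOutputStream: when a write
    does not fit, allocate max(2 * capacity, needed) and copy across, so
    the old and new arrays are briefly alive together. A final
    toByteArray() call adds one more exact-size copy. Chunk and initial
    sizes are assumptions, not values taken from FOP.
    """
    capacity = initial_capacity
    count = 0
    peak = capacity
    while count < file_size:
        n = min(chunk_size, file_size - count)
        needed = count + n
        if needed > capacity:
            new_capacity = max(capacity * 2, needed)
            # Old and new backing arrays coexist during the copy.
            peak = max(peak, capacity + new_capacity)
            capacity = new_capacity
        count = needed
    # toByteArray(): internal buffer plus an exact-size copy of its contents.
    return max(peak, capacity + count)


if __name__ == "__main__":
    size = 9 * 1024 * 1024  # hypothetical font file size
    print(peak_bytes_for_read(size))
```

For a hypothetical 9 MiB font this model gives 26,214,400 bytes (25 MiB), roughly 2.8 times the file size, just to buffer the file; for a size that is an exact power of two, such as 8 MiB, it gives twice the file size. That is one plausible reason a large Japanese TTF can exhaust a default-sized heap where a small Latin font does not.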

Re: Japanese PDF rendering

Posted by Heather Phipps <hm...@gmail.com>.
Hi David,

Thank you for pointing that out.  I am new to using the plugins and
didn't realize this.  So, I attempted to follow your advice:

heather-phipps-computer:org.apache.forrest.plugin.output.pdf Heather$ /Users/Heather/apache-forrest-0.8/tools/ant/bin/ant local-deploy
Buildfile: build.xml

init-build-compiler:

echo-init:
     [echo]
     [echo]       --------------------------------------------------------------
     [echo]
     [echo]       Using Apache Ant version 1.6.5 compiled on May 16 2006
     [echo]       Build file /Users/Heather/apache-forrest-0.8/plugins/org.apache.forrest.plugin.output.pdf/build.xml
     [echo]       Use 'build.[sh|bat] -projecthelp' to see other options.
     [echo]       Build system home /Users/Heather/apache-forrest-0.8/tools/ant
     [echo]       Build number 0
     [echo]       Project Name Forrest plugin build file
     [echo]       Java Version 1.5
     [echo]       Timestamp 200904241808
     [echo]
     [echo]       --------------------------------------------------------------
     [echo]

init:

compile:

jar:

local-deploy:
     [echo] Locally deploying org.apache.forrest.plugin.output.pdf

build:

BUILD SUCCESSFUL
Total time: 0 seconds

But then I did "forrest" again, and the PDF is still generated with
##### instead of Japanese text, so I guess I am still missing
something else...

-Heather

Re: Japanese PDF rendering

Posted by David Crossley <cr...@apache.org>.
Heather Phipps wrote:
> 
> Unfortunately, even after editing
> $FORREST/apache-forrest-0.8/plugins/org.apache.forrest.plugin.output.pdf/output.xmap,
> I am unable to get the font to display correctly in the PDF file.

Whenever a local plugin's resources (e.g. sitemaps or stylesheets, etc.)
are edited then you need to "local-deploy" the plugin.

See:
http://forrest.apache.org/pluginDocs/plugins_0_80/usingPlugins.html#local-deploy
http://forrest.apache.org/docs_0_80/howto/howto-buildPlugin.html#ant
http://forrest.apache.org/docs_0_80/howto/howto-buildPlugin.html#Testing+During+Development 

-David

Re: Japanese PDF rendering

Posted by "Dr. Bhatia Praveen" <pr...@sumpurn.com>.
Hi Heather,
    Thanks.
    Yes, output.xmap needs to be used because Forrest 0.8 now uses 
plugins for fo2pdf generation.

    Step 5, which you mentioned, is correct: the skins are not ready for 
Japanese, so MS Gothic needs to be inserted.

    I will look into your files, and what I had done, to see if I can help 
you further. Will get back to you later on that.

best wishes
Praveen
----- Original Message ----- 
From: "Heather Phipps" <hm...@gmail.com>
To: <us...@forrest.apache.org>; "Dr. Bhatia Praveen" 
<pr...@sumpurn.com>
Sent: Thursday, April 23, 2009 3:19 PM
Subject: Re: Japanese PDF rendering




Re: Japanese PDF rendering

Posted by Heather Phipps <hm...@gmail.com>.
Hi Praveen,

I very much appreciate your prompt response, as I have been struggling
with this issue for quite some time now.

Unfortunately, even after editing
$FORREST/apache-forrest-0.8/plugins/org.apache.forrest.plugin.output.pdf/output.xmap,
I am unable to get the font to display correctly in the PDF file.

I have a few questions.  First...how did you know that Forrest is
using output.xmap instead of sitemap.xmap for this step?  Second, do
you know if step number 5 in the instructions from
http://www.terra-intl.com/agel/2005/11/forrest_japanese.html is
correct for Forrest 0.8?

Here is an excerpt from my output.xmap file:

...
    <map:serializers default="fo2pdf">
      <map:serializer name="fo2pdf"
          src="org.apache.cocoon.serialization.FOPSerializer"
          mime-type="application/pdf">
        <user-config
            src="/Users/Heather/apache-forrest-0.8/main/webapp/resources/fonts/config.xml"/>
      </map:serializer>
...

(Currently, this is in both my sitemap.xmap file and output.xmap)

I downloaded the .ttf and corresponding font-metric xml files from
http://www.terra-intl.com/agel/2005/11/forrest_japanese.html and
placed them in /Users/Heather/apache-forrest-0.8/main/webapp/resources/fonts/,
alongside config.xml, which contains the following:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<fonts>
 <font metrics-file="/Users/Heather/apache-forrest-0.8/main/webapp/resources/fonts/sazanami-gothic.xml"
       embed-file="/Users/Heather/apache-forrest-0.8/main/webapp/resources/fonts/sazanami-gothic.ttf"
       kerning="yes">
   <font-triplet name="Gothic" style="normal" weight="normal"/>
   <font-triplet name="Gothic" style="normal" weight="bold"/>
   <font-triplet name="Gothic" style="italic" weight="normal"/>
   <font-triplet name="Gothic" style="italic" weight="bold"/>
 </font>
 <font metrics-file="/Users/Heather/apache-forrest-0.8/main/webapp/resources/fonts/sazanami-mincho.xml"
       embed-file="/Users/Heather/apache-forrest-0.8/main/webapp/resources/fonts/sazanami-mincho.ttf"
       kerning="yes">
   <font-triplet name="Mincho" style="normal" weight="normal"/>
   <font-triplet name="Mincho" style="normal" weight="bold"/>
   <font-triplet name="Mincho" style="italic" weight="normal"/>
   <font-triplet name="Mincho" style="italic" weight="bold"/>
 </font>
</fonts>
</configuration>
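Because this config relies on absolute paths, a typo or a moved file would break it. A quick way to rule that out is to check every referenced file before running Forrest. This minimal Python sketch parses a FOP user-config file with xml.etree.ElementTree and reports any font whose metrics-file is missing or does not exist, plus any embed-file that is given but does not exist. It assumes plain filesystem paths (not URLs) and checks relative paths against the current directory, which may not match how FOP resolves them; the script name in the usage comment is hypothetical.

```python
import os
import sys
import xml.etree.ElementTree as ET


def check_font_files(config_xml):
    """Return a list of problems with the font files a FOP user config names.

    config_xml is the config file's content (str or bytes). An empty list
    means every referenced file exists on this machine.
    """
    problems = []
    root = ET.fromstring(config_xml)
    for font in root.iter("font"):
        metrics = font.get("metrics-file")
        if metrics is None:
            problems.append("<font> element without a metrics-file attribute")
        elif not os.path.isfile(metrics):
            problems.append(f"metrics-file not found: {metrics}")
        embed = font.get("embed-file")
        # embed-file may be omitted; only check it when it is present.
        if embed is not None and not os.path.isfile(embed):
            problems.append(f"embed-file not found: {embed}")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_fonts.py config.xml
    with open(sys.argv[1], "rb") as f:
        for problem in check_font_files(f.read()):
            print(problem)
```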

I also tried using different .ttf files and generating the font-metric
files myself, and it appeared that the font-metric files were
generated properly, but the PDF still did not display the Japanese
text correctly.
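One thing worth checking when the PDF shows #### is whether the font families the FO actually requests are ones the user config maps to the Japanese font. The DocBook stylesheets, for instance, request generic families such as serif, sans-serif and monospace by default (through parameters like body.font.family), whereas the config above only registers Gothic and Mincho; a family that is not in the config resolves to one of FOP's built-in base-14 fonts, which have no Japanese glyphs. That would be consistent with the Brush Script experiment in this thread, which mapped the generic families. The Python sketch below compares the two, given an XSL-FO file you have produced separately (e.g. by running fo/docbook.xsl through an XSLT processor); the script name and command line are hypothetical.

```python
import xml.etree.ElementTree as ET


def configured_families(config_xml):
    """Family names declared by <font-triplet> entries in a FOP user config."""
    root = ET.fromstring(config_xml)
    return {t.get("name") for t in root.iter("font-triplet")}


def used_families(fo_xml):
    """Family names requested by font-family attributes in an XSL-FO document.

    A font-family value can be a comma-separated fallback list such as
    "serif,Symbol"; every entry is collected, with surrounding quotes removed.
    """
    root = ET.fromstring(fo_xml)
    families = set()
    for elem in root.iter():
        value = elem.get("font-family")
        if value:
            for name in value.split(","):
                name = name.strip().strip("'\"")
                if name:
                    families.add(name)
    return families


def unconfigured_families(fo_xml, config_xml):
    """Families the FO asks for that the user config does not map to a font file."""
    return used_families(fo_xml) - configured_families(config_xml)


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python check_families.py document.fo config.xml
    with open(sys.argv[1], "rb") as fo, open(sys.argv[2], "rb") as cfg:
        for family in sorted(unconfigured_families(fo.read(), cfg.read())):
            print("not in user config:", family)
```

If serif or sans-serif shows up in the output, either add triplets for those names to the Japanese font entry (as the Brush Script hack did) or change the DocBook font-family parameters to Gothic or Mincho.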

Any further thoughts?

-Heather

Re: Japanese PDF rendering

Posted by "Dr. Bhatia Praveen" <pr...@sumpurn.com>.
Hello Heather,
   I have got Japanese characters working in 0.8. I had problems similar
to yours earlier, but a few months of work got it right.
   In short, the problem lies with the Japanese fonts and the new version
of Forrest.

  Please note that the instructions you are following in the link are
correct for versions of Forrest before 0.8, but need some changes for 0.8.

   The steps that I remember using are:
1) Choose a Japanese font, say MS Gothic.
2) Convert it to .xml metrics, say gothic.xml (instructions are available
on the site that you linked).
3) In userconfig.xml, direct the PDF generator to refer to gothic.xml
for Japanese. (UP TO HERE, THE STEPS ARE THE SAME as yours.)
4) (CHANGE HERE for version 0.8) Put the reference to userconfig.xml
in output.xmap INSTEAD OF in the sitemap; the file is at:
forrest\apache-forrest-0.8\plugins\org.apache.forrest.plugin.output.pdf\output.xmap

That should get the right fonts to work for you in the pdf file.
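Step 4 amounts to adding a <user-config> child to the fo2pdf serializer in the plugin's output.xmap, the same element that appears in the output.xmap excerpt in this thread. For anyone who prefers to script that edit, here is a minimal Python sketch using xml.etree.ElementTree. It assumes the serializers use the Cocoon sitemap namespace http://apache.org/cocoon/sitemap/1.0 under the map: prefix and that <user-config> is unprefixed, as in the excerpt; add_user_config is a hypothetical helper. ElementTree drops XML comments and the XML declaration when it re-serializes, so editing the file by hand may be preferable, and the plugin still has to be local-deployed afterwards.

```python
import xml.etree.ElementTree as ET

MAP_NS = "http://apache.org/cocoon/sitemap/1.0"
# Keep the familiar map: prefix when the sitemap is written back out.
ET.register_namespace("map", MAP_NS)


def add_user_config(xmap_xml, config_path, serializer_name="fo2pdf"):
    """Point the named FOP serializer at a font user-config file.

    Reuses an existing <user-config> element if there is one, so running
    it twice does not create duplicates. Returns the new sitemap as text.
    """
    root = ET.fromstring(xmap_xml)
    for ser in root.iter(f"{{{MAP_NS}}}serializer"):
        if ser.get("name") != serializer_name:
            continue
        cfg = ser.find("user-config")
        if cfg is None:
            cfg = ET.SubElement(ser, "user-config")
        cfg.set("src", config_path)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"no map:serializer named {serializer_name!r}")


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python add_user_config.py output.xmap /path/to/config.xml
    with open(sys.argv[1], "rb") as f:
        print(add_user_config(f.read(), sys.argv[2]))
```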

best wishes
Praveen
