Posted to solr-dev@lucene.apache.org by Steven A Rowe <sa...@syr.edu> on 2008/08/29 23:57:31 UTC

Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]

On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> I suspect the PDF formatter just doesn't play nicely with the
> non-trivial UTF-8 characters.

This is an Apache FOP FAQ; from <http://xmlgraphics.apache.org/fop/faq.html#pdf-characters>:

   6.2. Some characters are not displayed, or displayed
        incorrectly, or displayed as "#".

   This usually means the selected font doesn't have a
   glyph for the character.

   The standard text fonts supplied with Acrobat Reader have
   mostly glyphs for characters from the ISO Latin 1 character
   set. [...]

   If you use your own fonts, the font must have a glyph for the
   desired character. Furthermore the font must be available on
   the machine where the PDF is viewed or it must have been
   embedded in the PDF file. [...]
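
As a rough illustration of the FAQ's point (this is not taken from the FOP or Forrest documentation), the Python sketch below lists the characters in a string that fall outside ISO Latin 1. Those are the ones most likely to come out as "#" when the PDF relies on the standard fonts. The function name non_latin1_chars and the sample strings are made up for this example, and real glyph coverage depends on the font actually used, so Latin-1 encodability is only an approximation here.

```python
def non_latin1_chars(text):
    """Return the characters in text that ISO-8859-1 cannot encode.

    Characters are reported once each, in order of first appearance.
    Assumption: "not encodable as Latin-1" is used as a stand-in for
    "no glyph in the standard PDF fonts"; the real answer depends on the font.
    """
    found = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            if ch not in found:
                found.append(ch)
    return found


# Hypothetical sample: "á" is in Latin-1, "ř" is not.
print(non_latin1_chars("Dvořák"))
```

Running something like this over a page's source text before building the PDF gives a quick list of characters that may need an embedded font.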

There's an open Forrest bug for this problem: <https://issues.apache.org/jira/browse/FOR-132>, and the discussion there includes a link to the Cocoon documentation for embedding fonts in PDF files: <http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts>.

This looks kinda complicated, and AFAICT would require modifications to the Forrest installation wherever the site is built.

I suspect that almost nobody looks at the PDF version of the "Who we are" page (and I sure am sorry now that I brought this up...)

If things are left as-is, Otis's last name would be displayed properly in the HTML, and garbled in the PDF; if the diacritic is removed, then it will be displayed improperly in both places :)

Steve

RE: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]

Posted by Thorsten Scherler <th...@apache.org>.
On Thu, 2008-09-04 at 16:28 -0400, Steven A Rowe wrote:
> Hi Thorsten,
> 

Hi Steven,

> On 09/04/2008 at 3:57 PM, Thorsten Scherler wrote:
> > On Fri, 2008-08-29 at 17:57 -0400, Steven A Rowe wrote:
> > > On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> > > > I suspect the PDF formatter just doesn't play nicely with the
> > > > non-trivial UTF-8 characters.
> > ...
> > > 
> > > There's an open Forrest bug for this problem:
> > > <https://issues.apache.org/jira/browse/FOR-132>, and the discussion
> > > there includes a link to the Cocoon documentation for embedding fonts in
> > > PDF files:
> > > <http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts>.
> > > 
> > > This looks kinda complicated, and AFAICT would require
> > > modifications to the Forrest installation wherever the site is built.
> > 
> > I just saw the thread; I will have a look.
> > 
> > Which version of Forrest is currently recommended? I ask because changes
> > have been made (and some are still underway) to the pdf plugin lately.
> 
> The Solr Website Update HOWTO <http://wiki.apache.org/solr/Website_Update_HOWTO> says to use Forrest 0.8.

The problem is in FOP, but Jeremias commented on the Forrest mailing list
that FOP 0.95 will fix it. The most recent thread about this issue is:
http://marc.info/?t=122095109600018&r=1&w=2

I am sorry that there is no quick fix for the current release, but I will
let you know as soon as the problem is fixed and we can release 0.9.

salu2
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java                      consulting, training and solutions


RE: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Thorsten,

On 09/04/2008 at 3:57 PM, Thorsten Scherler wrote:
> On Fri, 2008-08-29 at 17:57 -0400, Steven A Rowe wrote:
> > On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> > > I suspect the PDF formatter just doesn't play nicely with the
> > > non-trivial UTF-8 characters.
> ...
> > 
> > There's an open Forrest bug for this problem:
> > <https://issues.apache.org/jira/browse/FOR-132>, and the discussion
> > there includes a link to the Cocoon documentation for embedding fonts in
> > PDF files:
> > <http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts>.
> > 
> > This looks kinda complicated, and AFAICT would require
> > modifications to the Forrest installation wherever the site is built.
> 
> I just saw the thread; I will have a look.
> 
> Which version of Forrest is currently recommended? I ask because changes
> have been made (and some are still underway) to the pdf plugin lately.

The Solr Website Update HOWTO <http://wiki.apache.org/solr/Website_Update_HOWTO> says to use Forrest 0.8.

Steve

Re: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]

Posted by Thorsten Scherler <th...@apache.org>.
On Fri, 2008-08-29 at 17:57 -0400, Steven A Rowe wrote:
> On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> > I suspect the PDF formatter just doesn't play nicely with the
> > non-trivial UTF-8 characters.
...
> 
> There's an open Forrest bug for this problem: <https://issues.apache.org/jira/browse/FOR-132>, and the discussion there includes a link to the Cocoon documentation for embedding fonts in PDF files: <http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts>.
> 
> This looks kinda complicated, and AFAICT would require modifications to the Forrest installation wherever the site is built.

I just saw the thread; I will have a look.

Which version of Forrest is currently recommended? I ask because changes
have been made (and some are still underway) to the pdf plugin lately.

Will let you know about my findings.

salu2
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java                      consulting, training and solutions