Posted to fop-dev@xmlgraphics.apache.org by Jeremias Maerki <de...@jeremias-maerki.ch> on 2008/02/12 15:25:21 UTC

Supporting unusual encodings for Type 1 fonts

I've been asked to look into supporting unusual encodings (like
Cyrillic) with Type 1 fonts. Right now we only support
WinAnsiEncoding (plus special handling for Symbol and ZapfDingbats).

I already have an AFM parser. It is the precondition for safely
supporting non-standard encodings, as only the AFM file contains the
glyph list of a font.

I'm now well on the way to supporting non-WinAnsi encodings since I can
now build CodePointMapping instances from an AFM file. I then have to
teach the PDF and PS renderers to make use of these special encodings.
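
To illustrate the AFM step, here is a minimal sketch (not FOP's actual
parser) of how a code-to-glyph-name table could be read from the
character metrics lines of an AFM file. The class and method names are
made up for this example, and FOP's real CodePointMapping class is not
used. It assumes the usual "C <code> ; WX <width> ; N <name> ; ..." line
layout and ignores everything else in the file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class AfmEncodingSketch {

    /** Returns code -> glyph name for every glyph that has a code >= 0. */
    static Map<Integer, String> parseEncoding(List<String> afmLines) {
        Map<Integer, String> encoding = new LinkedHashMap<>();
        for (String line : afmLines) {
            if (!line.startsWith("C ")) {
                continue; // not a character metrics line
            }
            Integer code = null;
            String name = null;
            for (String field : line.split(";")) {
                String[] parts = field.trim().split("\\s+");
                if (parts.length == 2 && parts[0].equals("C")) {
                    code = Integer.parseInt(parts[1]);
                } else if (parts.length == 2 && parts[0].equals("N")) {
                    name = parts[1];
                }
            }
            // A code of -1 means the glyph is not part of the default encoding.
            if (code != null && code >= 0 && name != null) {
                encoding.put(code, name);
            }
        }
        return encoding;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "C 65 ; WX 722 ; N A ; B 15 0 706 674 ;",
                "C -1 ; WX 500 ; N afii10058 ; B 0 0 0 0 ;");
        System.out.println(parseEncoding(sample));
    }
}
```

The sample metrics values are invented; only the line layout matters
here.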

That's step 1, but it will only make the font's native encoding
available in FOP. The number of available glyphs for a Type 1 font will
still remain under 255 (typically under 223 as the first 32 chars are
usually not used). To support all glyphs of a Type 1 font we need more,
and I found two possible approaches:

1. Treat Type 1 fonts as CID fonts.

+ Probably the cleaner approach.
+ All glyphs are supported under one single font (no renderer-level
  font switching required, see below)
- Makes the generated PDF/PS code a little less readable but that's not
  important.

2. Do something like OpenOffice does when handling fonts with more than
255 chars: Create multiple single-byte encodings which map to the same
base font. This will require a 1:n relationship from font to char mapping
which the renderers also have to handle. The first encoding will be
equal to the font's default encoding (PDF calls that the "implicit base
encoding"). The other encoding(s) will be built from the rest of the
available characters. In the renderer it will be necessary to switch
fonts from one character to another (not the same as switching from
Helvetica to Symbol, i.e. not at FO level, but at renderer level).

+ Higher compatibility with PDF viewers which are not yet
  feature-complete.
+ Keeps the generated PDF/PS code more readable (not important)
- Switching between derived fonts (i.e. fonts with a common base font
  but with special encodings) is necessary. SingleByteFont needs to be
  split into two classes.
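
The renderer-level font switching in approach 2 could look roughly like
the following sketch: text is cut into runs so that each run only uses
characters from one of the derived encodings. All names here are
hypothetical and nothing is taken from FOP's renderer code. Characters
missing from the lookup table are assumed to belong to encoding 0 (the
default encoding) purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class EncodingRunSketch {

    /** A stretch of text that can be painted with one derived font. */
    static final class Run {
        final int encodingIndex;
        final String text;

        Run(int encodingIndex, String text) {
            this.encodingIndex = encodingIndex;
            this.text = text;
        }
    }

    /** Splits text wherever the encoding index of the characters changes. */
    static List<Run> splitIntoRuns(String text, Map<Character, Integer> encodingOf) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentIndex = -1;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            Integer mapped = encodingOf.get(ch);
            int index = (mapped != null) ? mapped : 0; // assumption: default encoding
            if (index != currentIndex && current.length() > 0) {
                runs.add(new Run(currentIndex, current.toString()));
                current.setLength(0);
            }
            currentIndex = index;
            current.append(ch);
        }
        if (current.length() > 0) {
            runs.add(new Run(currentIndex, current.toString()));
        }
        return runs;
    }

    public static void main(String[] args) {
        Map<Character, Integer> encodingOf = new HashMap<>();
        encodingOf.put('\u0409', 1); // pretend this character lives in the second encoding
        for (Run run : splitIntoRuns("ab\u0409c", encodingOf)) {
            System.out.println(run.encodingIndex + ": " + run.text.length() + " char(s)");
        }
    }
}
```

Each run would then be written with the derived font that carries the
matching encoding.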

An example: The "Baskerville Cyrillic" font contains 264
characters/glyphs. The default encoding only contains 221 characters. So
43 additional characters can be made available like this.
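
The arithmetic of this example can be sketched as follows. The code
collects every glyph that is not in the default encoding and packs the
remainder into additional single-byte encodings. The class, the method
and the limit of 256 codes per additional encoding are assumptions made
for illustration, as are the placeholder glyph names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class MultiEncodingSketch {

    // Assumption: an additional encoding may use all 256 single-byte codes.
    static final int CODES_PER_ENCODING = 256;

    /**
     * Returns the additional encodings as lists of glyph names; the position
     * of a name in its list is the code it gets in that encoding.
     */
    static List<List<String>> buildAdditionalEncodings(
            List<String> allGlyphs, Set<String> defaultEncoded) {
        List<String> rest = new ArrayList<>();
        for (String glyph : allGlyphs) {
            if (!defaultEncoded.contains(glyph)) {
                rest.add(glyph);
            }
        }
        List<List<String>> encodings = new ArrayList<>();
        for (int start = 0; start < rest.size(); start += CODES_PER_ENCODING) {
            int end = Math.min(start + CODES_PER_ENCODING, rest.size());
            encodings.add(new ArrayList<>(rest.subList(start, end)));
        }
        return encodings;
    }

    public static void main(String[] args) {
        // 264 glyphs in total, 221 of them in the default encoding.
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 264; i++) {
            all.add("glyph" + i);
        }
        Set<String> defaults = new HashSet<>(all.subList(0, 221));
        List<List<String>> extra = buildAdditionalEncodings(all, defaults);
        // prints "1 additional encoding(s), first holds 43 glyphs"
        System.out.println(extra.size() + " additional encoding(s), first holds "
                + extra.get(0).size() + " glyphs");
    }
}
```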

I'm currently leaning towards CID fonts as it is probably the cleaner
approach. Both solutions are probably pretty much the same in terms of
effort. The CID approach will take more work in the PS renderer and the
multi-encoding approach will require changes in FOP's font
library.

If anyone has thoughts on this, I'd appreciate it. I'll finish the
changes for supporting the default encodings and then finish the
processing feedback stuff before I finish this here.

Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by The Web Maestro <th...@gmail.com>.

Yes... If you build it, it will be tested... ;-) I guess I was just
trying to identify the minimum PDF viewers FOP strives to support.

As for 'cleanup', you mentioned the PS & PDF code'll be messier but it
doesn't matter much... If it doesn't matter much (e.g., it's still
valid) then I guess we're good...

Clay




-- 
Sent from Gmail for mobile | mobile.google.com

Regards,

The Web Maestro
-- 
<th...@gmail.com> - <http://homepage.mac.com/webmaestro/>
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

On 14.02.2008 16:24:40 The Web Maestro wrote:
> In addition to Acrobat 6,7,8,+, Apple QuickView & Evince, I would
> think nice to have:
> - Acrobat Reader 5 (last version for Mac OS 9 Classic)
> - Apple Preview 10.4 (probably similar to QuickView)
> - Preview for 10.3 (Panther) would be nice too...

People are allowed to test with all the PDF viewers they want. I have no
Mac so I can't test any of these.

> Is it possible the PDF code for option 1 could be 'cleaned up' in the
> future (or does it matter)?

Sorry, but I don't know what you mean. What's there to clean up?

> Clay
> 
> 
> 
> On 2/13/08, Jeremias Maerki <de...@jeremias-maerki.ch> wrote:
> > Just some details on what each approach will produce:
> >
> > #1 produces a /CIDFontType0 CIDFont [1] and a /Type0 Composite Font
> > referencing the former.
> >
> > #2 produces one or more /Type1 fonts.
> >
> > [1] for TrueType we produce a CIDFontType2 CIDFont and a /Type0
> > Composite font for each TrueType font. OpenOffice produces one or more
> > /TrueType fonts for each TrueType font.
> >
> > #1 would always generate a CID font for simplicity. What you propose is
> > basically a "#2a", i.e. produce a /Type1 font if the document stays
> > within the default encoding of the font. If additional characters are
> > used FOP would switch to CID fonts instead of producing a /Type1 font.
> > So this needs elements from #1 and #2. Possible and probably makes sense
> > if CID fonts work in the first place. I like it.
> >
> > BTW, I just found out that I have to generate a ToUnicode CMap if a
> > Type1 font doesn't use one of the encodings that are predefined in the
> > PDF spec. So a little more work for me there.
> >
> > On 13.02.2008 11:57:34 Vincent Hennebert wrote:
> > > Hi Jeremias,
> > >
> > > With solution #1, if I happen to use only the glyphs from the font that
> > > are available in its default encoding, will the resulting PDF be the
> > > same as in solution #2?
> > > What I mean is, will feature-incomplete PDF readers be able to display
> > > it? In which case this wouldn't be that bad.
> > >
> > > Anyway, solution #1 also looks cleaner to me, so go for it. If that
> > > means that I'll have to create a RFE for my favourite PDF reader, then
> > > I'll do it ;-)
> > >
> > > Vincent
> > >
> > >
> > >
> > >
> > > --
> > > Vincent Hennebert                            Anyware Technologies
> > > http://people.apache.org/~vhennebert         http://www.anyware-tech.com
> > > Apache FOP Committer                         FOP Development/Consulting
> >
> >
> >
> >
> > Jeremias Maerki
> >
> >
> 




Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

Should be fixed now together with a few other details I found while
diving in even deeper. In the end I decided to rip out the old glyph
list which seems to have been modified manually to hack in support for
special characters like NBSP. This should be cleaner now.

There's a little detail concerning the URW Dingbats font you used here:
The font contains different glyphs than are described in the PDF 1.4
spec even though the same character names are used. So if anyone wonders,
this is not a bug in the new code.

On 15.02.2008 13:11:57 Vincent Hennebert wrote:
> Jeremias Maerki wrote:
> > Took me a bit to find the font you were talking about. I can see in the
> > AFM that it states the AdobeStandardEncoding as default encoding, so
> > until I can implement the second part of the changes I was talking about,
> > no Cyrillics for you. ;-)
> 
> Ah, ok. I’ll wait for the next bunch of changes, then.
> 
> However, if I understand the commit message correctly, a font like Zapf 
> Dingbats which uses a non-standard encoding is now supposed to be 
> working?
> 
> Then you might be interested in the error message below.
> Font configuration:
>     <font kerning="yes" embed-url="type1/gsfonts/d050000l.pfb">
>         <font-triplet name="Dingbats" style="normal" weight="normal"/>
>     </font>
> Excerpt from the afm file:
>     FontName Dingbats
>     FullName Dingbats 
>     FamilyName Dingbats
>     EncodingScheme FontSpecific
> 
> Error message:
> <snip/>
> 
> Vincent
> 
> 
> -- 
> Vincent Hennebert                            Anyware Technologies
> http://people.apache.org/~vhennebert         http://www.anyware-tech.com
> Apache FOP Committer                         FOP Development/Consulting




Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

It looks like we don't have character mappings from ZapfDingbats
characters to Unicode in Glyphs.java although we have them in the
glyphlist.xml that Glyphs.java was derived from. Furthermore,
there's a simple "if != null" check missing. I'll look into it.
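
The kind of guard meant here might look like the sketch below. It is not
the actual code from Type1FontLoader or Glyphs.java. The class, the
method and both tables are hypothetical stand-ins: one table maps codes
to glyph names, the other maps glyph names to Unicode characters, and
glyph names without a Unicode mapping are skipped instead of causing a
NullPointerException.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class GlyphLookupSketch {

    /** Builds code -> Unicode character, leaving out glyphs with no known mapping. */
    static Map<Integer, Character> toUnicode(
            Map<Integer, String> codeToGlyph, Map<String, Character> glyphToUnicode) {
        Map<Integer, Character> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : codeToGlyph.entrySet()) {
            Character unicode = glyphToUnicode.get(entry.getValue());
            if (unicode != null) { // the missing check: unknown names return null
                result.put(entry.getKey(), unicode);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Character> glyphToUnicode = new HashMap<>();
        glyphToUnicode.put("A", 'A');

        Map<Integer, String> codeToGlyph = new LinkedHashMap<>();
        codeToGlyph.put(65, "A");
        codeToGlyph.put(33, "a1"); // a Dingbats-style name with no entry in the table

        System.out.println(toUnicode(codeToGlyph, glyphToUnicode));
    }
}
```

Whether unmapped glyphs should be skipped silently or reported with a
warning is a separate design decision that this sketch does not settle.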

On 15.02.2008 13:11:57 Vincent Hennebert wrote:
> Jeremias Maerki wrote:
> > Took me a bit to find the font you were talking about. I can see in the
> > AFM that it states the AdobeStandardEncoding as default encoding, so
> > until I can implement the second part of the changes I was talking about,
> > no Cyrillics for you. ;-)
> 
> Ah, ok. I’ll wait for the next bunch of changes, then.
> 
> However, if I understand the commit message correctly, a font like Zapf 
> Dingbats which uses a non-standard encoding is now supposed to be 
> working?
> 
> Then you might be interested in the error message below.
> Font configuration:
>     <font kerning="yes" embed-url="type1/gsfonts/d050000l.pfb">
>         <font-triplet name="Dingbats" style="normal" weight="normal"/>
>     </font>
> Excerpt from the afm file:
>     FontName Dingbats
>     FullName Dingbats 
>     FamilyName Dingbats
>     EncodingScheme FontSpecific
> 
> Error message:
> <snip/>
> 
> Vincent
> 
> 
> -- 
> Vincent Hennebert                            Anyware Technologies
> http://people.apache.org/~vhennebert         http://www.anyware-tech.com
> Apache FOP Committer                         FOP Development/Consulting




Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Vincent Hennebert <vi...@anyware-tech.com>.

Jeremias Maerki wrote:
> Took me a bit to find the font you were talking about. I can see in the
> AFM that it states the AdobeStandardEncoding as default encoding, so
> until I can implement the second part of the changes I was talking about,
> no Cyrillics for you. ;-)

Ah, ok. I’ll wait for the next bunch of changes, then.

However, if I understand the commit message correctly, a font like Zapf 
Dingbats which uses a non-standard encoding is now supposed to be 
working?

Then you might be interested in the error message below.
Font configuration:
    <font kerning="yes" embed-url="type1/gsfonts/d050000l.pfb">
        <font-triplet name="Dingbats" style="normal" weight="normal"/>
    </font>
Excerpt from the afm file:
    FontName Dingbats
    FullName Dingbats 
    FamilyName Dingbats
    EncodingScheme FontSpecific

Error message:
15-Feb-2008 12:00:27 org.apache.fop.fonts.type1.PFMFile loadExtMetrics
WARNING: Size of extension block was expected to be 52 bytes, but was 0 bytes.
15-Feb-2008 12:00:27 org.apache.fop.cli.Main startFOP
SEVERE: Exception
java.lang.NullPointerException
        at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:189)
        at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:116)
        at org.apache.fop.cli.Main.startFOP(Main.java:166)
        at org.apache.fop.cli.Main.main(Main.java:197)

---------

java.lang.NullPointerException
        at org.apache.fop.fonts.type1.Type1FontLoader.buildCustomEncoding(Type1FontLoader.java:299)
        at org.apache.fop.fonts.type1.Type1FontLoader.buildFont(Type1FontLoader.java:142)
        at org.apache.fop.fonts.type1.Type1FontLoader.read(Type1FontLoader.java:111)
        at org.apache.fop.fonts.FontLoader.getFont(FontLoader.java:164)
        at org.apache.fop.fonts.FontLoader.loadFont(FontLoader.java:113)
        at org.apache.fop.fonts.LazyFont.load(LazyFont.java:126)
        at org.apache.fop.fonts.LazyFont.getAscender(LazyFont.java:233)
        at org.apache.fop.fonts.Font.getAscender(Font.java:96)
        at org.apache.fop.layoutmgr.BlockLayoutManager.initialize(BlockLayoutManager.java:86)
        at org.apache.fop.layoutmgr.AbstractLayoutManager.getChildLM(AbstractLayoutManager.java:118)
        at org.apache.fop.layoutmgr.FlowLayoutManager.getNextKnuthElements(FlowLayoutManager.java:77)
        at org.apache.fop.layoutmgr.PageBreaker.getNextKnuthElements(PageBreaker.java:145)
        at org.apache.fop.layoutmgr.AbstractBreaker.getNextBlockList(AbstractBreaker.java:554)
        at org.apache.fop.layoutmgr.PageBreaker.getNextBlockList(PageBreaker.java:137)
        at org.apache.fop.layoutmgr.AbstractBreaker.doLayout(AbstractBreaker.java:302)
        at org.apache.fop.layoutmgr.AbstractBreaker.doLayout(AbstractBreaker.java:264)
        at org.apache.fop.layoutmgr.PageSequenceLayoutManager.activateLayout(PageSequenceLayoutManager.java:106)
        at org.apache.fop.area.AreaTreeHandler.endPageSequence(AreaTreeHandler.java:234)
        at org.apache.fop.fo.pagination.PageSequence.endOfNode(PageSequence.java:123)
        at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:374)
        at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:196)
        at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1101)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:484)
        at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:186)
        at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:116)
        at org.apache.fop.cli.Main.startFOP(Main.java:166)
        at org.apache.fop.cli.Main.main(Main.java:197)

<snip/>

Vincent


-- 
Vincent Hennebert                            Anyware Technologies
http://people.apache.org/~vhennebert         http://www.anyware-tech.com
Apache FOP Committer                         FOP Development/Consulting

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

Took me a bit to find the font you were talking about. I can see in the
AFM that it states the AdobeStandardEncoding as default encoding, so
until I can implement the second part of the changes I was talking about,
no Cyrillics for you. ;-)

If you open the AFM in a text editor you can see the list of characters.
All characters with "C -1" are currently not available inside FOP. When
I'm done with the next set of changes, these characters can also be used.
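
To make the "C -1" check concrete, here is a minimal Java sketch that
scans AFM character-metric lines and collects the glyph names whose code
is -1. The class and method names are made up for illustration (this is
not FOP's AFM parser), and the metric values in the sample lines are
invented; only the "C <code> ; ... ; N <name> ;" layout is taken from
the AFM format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (hypothetical, not part of FOP). */
class AfmUnencodedGlyphs {

    /** Returns the glyph names of all character-metric lines whose code is -1. */
    static List<String> unencodedGlyphNames(List<String> afmLines) {
        List<String> names = new ArrayList<>();
        for (String line : afmLines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("C ")) {
                continue; // not a decimal character-metrics line
            }
            int code = 0;
            String name = null;
            // Fields are separated by semicolons, e.g. "C -1 ; WX 600 ; N afii10058 ;"
            for (String field : trimmed.split(";")) {
                String[] parts = field.trim().split("\\s+");
                if (parts.length == 2 && parts[0].equals("C")) {
                    code = Integer.parseInt(parts[1]);
                } else if (parts.length == 2 && parts[0].equals("N")) {
                    name = parts[1];
                }
            }
            if (code == -1 && name != null) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample lines with invented metric values.
        List<String> sample = Arrays.asList(
                "C 65 ; WX 667 ; N A ; B 14 0 654 729 ;",
                "C -1 ; WX 1000 ; N afii10058 ; B 0 0 1000 729 ;");
        System.out.println(unencodedGlyphNames(sample)); // prints [afii10058]
    }
}
```

Run against a real AFM (one line per list element), this would list
exactly the glyphs that are in the font but outside its default encoding.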

On 15.02.2008 11:48:30 Vincent Hennebert wrote:
> Hi Jeremias,
> 
> Jeremias Maerki wrote:
> > False alarm. JPedal and kpdf, for example, have no problems
> > reconstructing the correct text based on the embedded Encoding. PDFBox,
> 
> Are non-default encodings supposed to work after your latest commit? 
> Because I would like to test evince but I didn’t manage to produce a PDF 
> with a non-default encoding. When putting cyrillic characters in my FO 
> file I get the following kind of errors:
> WARNING: Glyph 1033 (0x409, afii10058) not available in font NimbusSanL-Regu
> 
> And I still see “/Encoding /WinAnsiEncoding” in the PDF when I open it 
> by hand.
> 
> Have I missed anything?
> 
> Thanks,
> Vincent
> 
> > too, but that one had problems extracting from text written using a
> > TrueType font. But Adobe Acrobat Reader does have a problem: text
> > written in a Cyrillic Type 1 font is extracted incorrectly. Not even
> > adding a ToUnicode CMap helped here. Probably a bug. I have no other
> > idea how I could help Acrobat extract the text correctly. So, if you
> > care about copy/paste from Acrobat, switch to TrueType fonts instead of
> > using fonts with encodings other than AdobeStandardEncoding or
> > WinAnsiEncoding.
> > 
> > On 13.02.2008 12:12:59 Jeremias Maerki wrote:
> > <snip/>
> >> BTW, I just found out that I have to generate a ToUnicode CMap if a
> >> Type1 font doesn't use one of the encodings that are predefined in the
> >> PDF spec. So a little more work for me there.
> > <snip/>
> > 
> > 
> > 
> > Jeremias Maerki
> 
> 
> -- 
> Vincent Hennebert                            Anyware Technologies
> http://people.apache.org/~vhennebert         http://www.anyware-tech.com
> Apache FOP Committer                         FOP Development/Consulting




Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Vincent Hennebert <vi...@anyware-tech.com>.

I do have an afm file that has the same name as the pfb. I configured 
the font by hand and disabled the auto-detection on the corresponding 
directory, to avoid any possible trouble. FWIW the entry in my config 
file is the following:
    <font kerning="yes" embed-url="type1/gsfonts/n019003l.pfb">
        <font-triplet name="NimbusSansL" style="normal" weight="normal"/>
    </font>
Funny name for a font file, isn’t it? But the afm file has the same 
name. The fact that the glyph name is displayed in the error message 
(afii10058) might indicate that the afm file is actually parsed, so 
something might be wrong somewhere else.
I also have a pfm file with the same name in the directory.

If you need any further information, just yell.

Vincent


Jeremias Maerki wrote:
> It only works if you have an AFM file that has the same name as the PFB.
> 
> MyFont.pfb
> MyFont.pfm
> --> doesn't work, no encoding information in the PFM
> 
> MyFont.pfb
> (MyFont.pfm)
> MyFont.afm
> --> should work.
> 
> On 15.02.2008 11:48:30 Vincent Hennebert wrote:
>> Hi Jeremias,
>>
>> Jeremias Maerki wrote:
>>> False alarm. JPedal and kpdf, for example, have no problems
>>> reconstructing the correct text based on the embedded Encoding. PDFBox,
>> Are non-default encodings supposed to work after your latest commit? 
>> Because I would like to test evince but I didn’t manage to produce a PDF 
>> with a non-default encoding. When putting cyrillic characters in my FO 
>> file I get the following kind of errors:
>> WARNING: Glyph 1033 (0x409, afii10058) not available in font NimbusSanL-Regu
>>
>> And I still see “/Encoding /WinAnsiEncoding” in the PDF when I open it 
>> by hand.
>>
>> Have I missed anything?
>>
>> Thanks,
>> Vincent
>>
>>> too, but that one had problems extracting from text written using a
>>> TrueType font. But Adobe Acrobat Reader does have a problem: text
>>> written in a Cyrillic Type 1 font is extracted incorrectly. Not even
>>> adding a ToUnicode CMap helped here. Probably a bug. I have no other
>>> idea how I could help Acrobat extract the text correctly. So, if you
>>> care about copy/paste from Acrobat, switch to TrueType fonts instead of
>>> using fonts with encodings other than AdobeStandardEncoding or
>>> WinAnsiEncoding.
>>>
>>> On 13.02.2008 12:12:59 Jeremias Maerki wrote:
>>> <snip/>
>>>> BTW, I just found out that I have to generate a ToUnicode CMap if a
>>>> Type1 font doesn't use one of the encodings that are predefined in the
>>>> PDF spec. So a little more work for me there.
>>> <snip/>
>>>
>>>
>>>
>>> Jeremias Maerki
>>
>> -- 
>> Vincent Hennebert                            Anyware Technologies
>> http://people.apache.org/~vhennebert         http://www.anyware-tech.com
>> Apache FOP Committer                         FOP Development/Consulting
> 
> 
> 
> 
> Jeremias Maerki
> 

-- 
Vincent Hennebert                            Anyware Technologies
http://people.apache.org/~vhennebert         http://www.anyware-tech.com
Apache FOP Committer                         FOP Development/Consulting

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

It only works if you have an AFM file that has the same name as the PFB.

MyFont.pfb
MyFont.pfm
--> doesn't work, no encoding information in the PFM

MyFont.pfb
(MyFont.pfm)
MyFont.afm
--> should work.
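
The naming rule above can be sketched in a few lines of Java. This is
only an illustration with a hypothetical class and method name, not the
lookup code FOP actually uses, and it assumes a lowercase ".afm"
extension:

```java
import java.io.File;

/** Illustrative helper (hypothetical, not part of FOP). */
class AfmLookup {

    /** Returns the AFM file expected next to the given PFB (same base name). */
    static File siblingAfm(File pfb) {
        String name = pfb.getName();
        int dot = name.lastIndexOf('.');
        String base = dot >= 0 ? name.substring(0, dot) : name;
        // A null parent is fine: File then treats the child as a plain path.
        return new File(pfb.getParentFile(), base + ".afm");
    }

    public static void main(String[] args) {
        File afm = siblingAfm(new File("MyFont.pfb"));
        // Encoding information is only available if this file exists.
        System.out.println(afm.getName() + " exists: " + afm.isFile());
    }
}
```

Finding the AFM is only the first condition; which characters become
available still depends on the encoding that the AFM declares.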

On 15.02.2008 11:48:30 Vincent Hennebert wrote:
> Hi Jeremias,
> 
> Jeremias Maerki wrote:
> > False alarm. JPedal and kpdf, for example, have no problems
> > reconstructing the correct text based on the embedded Encoding. PDFBox,
> 
> Are non-default encodings supposed to work after your latest commit? 
> Because I would like to test evince but I didn’t manage to produce a PDF 
> with a non-default encoding. When putting cyrillic characters in my FO 
> file I get the following kind of errors:
> WARNING: Glyph 1033 (0x409, afii10058) not available in font NimbusSanL-Regu
> 
> And I still see “/Encoding /WinAnsiEncoding” in the PDF when I open it 
> by hand.
> 
> Have I missed anything?
> 
> Thanks,
> Vincent
> 
> > too, but that one had problems extracting from text written using a
> > TrueType font. But Adobe Acrobat Reader does have a problem: text
> > written in a Cyrillic Type 1 font is extracted incorrectly. Not even
> > adding a ToUnicode CMap helped here. Probably a bug. I have no other
> > idea how I could help Acrobat extract the text correctly. So, if you
> > care about copy/paste from Acrobat, switch to TrueType fonts instead of
> > using fonts with encodings other than AdobeStandardEncoding or
> > WinAnsiEncoding.
> > 
> > On 13.02.2008 12:12:59 Jeremias Maerki wrote:
> > <snip/>
> >> BTW, I just found out that I have to generate a ToUnicode CMap if a
> >> Type1 font doesn't use one of the encodings that are predefined in the
> >> PDF spec. So a little more work for me there.
> > <snip/>
> > 
> > 
> > 
> > Jeremias Maerki
> 
> 
> -- 
> Vincent Hennebert                            Anyware Technologies
> http://people.apache.org/~vhennebert         http://www.anyware-tech.com
> Apache FOP Committer                         FOP Development/Consulting




Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Vincent Hennebert <vi...@anyware-tech.com>.

Hi Jeremias,

Jeremias Maerki wrote:
> False alarm. JPedal and kpdf, for example, have no problems
> reconstructing the correct text based on the embedded Encoding. PDFBox,

Are non-default encodings supposed to work after your latest commit? 
Because I would like to test evince but I didn’t manage to produce a PDF 
with a non-default encoding. When putting cyrillic characters in my FO 
file I get the following kind of errors:
WARNING: Glyph 1033 (0x409, afii10058) not available in font NimbusSanL-Regu

And I still see “/Encoding /WinAnsiEncoding” in the PDF when I open it 
by hand.

Have I missed anything?

Thanks,
Vincent

> too, but that one had problems extracting from text written using a
> TrueType font. But Adobe Acrobat Reader does have a problem: text
> written in a Cyrillic Type 1 font is extracted incorrectly. Not even
> adding a ToUnicode CMap helped here. Probably a bug. I have no other
> idea how I could help Acrobat extract the text correctly. So, if you
> care about copy/paste from Acrobat, switch to TrueType fonts instead of
> using fonts with encodings other than AdobeStandardEncoding or
> WinAnsiEncoding.
> 
> On 13.02.2008 12:12:59 Jeremias Maerki wrote:
> <snip/>
>> BTW, I just found out that I have to generate a ToUnicode CMap if a
>> Type1 font doesn't use one of the encodings that are predefined in the
>> PDF spec. So a little more work for me there.
> <snip/>
> 
> 
> 
> Jeremias Maerki


-- 
Vincent Hennebert                            Anyware Technologies
http://people.apache.org/~vhennebert         http://www.anyware-tech.com
Apache FOP Committer                         FOP Development/Consulting

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

False alarm. JPedal and kpdf, for example, have no problems
reconstructing the correct text based on the embedded Encoding. PDFBox,
too, but that one had problems extracting from text written using a
TrueType font. But Adobe Acrobat Reader does have a problem: text
written in a Cyrillic Type 1 font is extracted incorrectly. Not even
adding a ToUnicode CMap helped here. Probably a bug. I have no other
idea how I could help Acrobat extract the text correctly. So, if you
care about copy/paste from Acrobat, switch to TrueType fonts instead of
using fonts with encodings other than AdobeStandardEncoding or
WinAnsiEncoding.

On 13.02.2008 12:12:59 Jeremias Maerki wrote:
<snip/>
> BTW, I just found out that I have to generate a ToUnicode CMap if a
> Type1 font doesn't use one of the encodings that are predefined in the
> PDF spec. So a little more work for me there.
<snip/>

Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by The Web Maestro <th...@gmail.com>.

In addition to Acrobat 6, 7, 8+, Apple QuickView & Evince, I would
think it would be nice to have:
- Acrobat Reader 5 (last version for Mac OS 9 Classic)
- Apple Preview 10.4 (probably similar to QuickView)
- Preview for 10.3 (Panther) would be nice too...

Is it possible the PDF code for option 1 could be 'cleaned up' in the
future (or does it matter)?

Clay



On 2/13/08, Jeremias Maerki <de...@jeremias-maerki.ch> wrote:
> Just some details what each approach will produce:
>
> #1 produces a /CIDFontType0 CIDFont [1] and a /Type0 Composite Font
> referencing the former.
>
> #2 produces one or more /Type1 fonts.
>
> [1] for TrueType we produce a CIDFontType2 CIDFont and a /Type0
> Composite font for each TrueType font. OpenOffice produces one or more
> /TrueType fonts for each TrueType font.
>
> #1 would always generate a CID font for simplicity. What you propose is
> basically a "#2a", i.e. produce a /Type1 font if the document stays
> within the default encoding of the font. If additional characters are
> used FOP would switch to CID fonts instead of producing a /Type1 font.
> So this needs elements from #1 and #2. Possible and probably makes sense
> if CID fonts work in the first place. I like it.
>
> BTW, I just found out that I have to generate a ToUnicode CMap if a
> Type1 font doesn't use one of the encodings that are predefined in the
> PDF spec. So a little more work for me there.
>
> On 13.02.2008 11:57:34 Vincent Hennebert wrote:
> > Hi Jeremias,
> >
> > With solution #1, if I happen to use only the glyphs from the font that
> > are available in its default encoding, will the resulting PDF be the
> > same as in solution #2?
> > What I mean is, will feature-incomplete PDF readers be able to display
> > it? In which case this wouldn't be that bad.
> >
> > Anyway, solution #1 also looks cleaner to me, so go for it. If that
> > means that I'll have to create a RFE for my favourite PDF reader, then
> > I'll do it ;-)
> >
> > Vincent
> >
> >
> > Jeremias Maerki wrote:
> > > I've been asked to look into the possibility to support unusual
> > > encodings (like Cyrillic) with Type 1 fonts. Right now we only support
> > > WinAnsiEncoding (plus special handling for Symbol and ZapfDingbats).
> > >
> > > I already have an AFM parser. The AFM parser is the precondition to
> > > safely support non-standard encodings as only this file contains the
> > > glyph list of a font.
> > >
> > > I'm now on a good way to support non-WinAnsi encodings since I can now
> > > build CodePointMapping instances from an AFM file. I then have to teach
> > > the PDF and PS renderers to make use of these special encodings.
> > >
> > > That's step 1, but it will only make the font's native encoding
> > > available in FOP. The number of available glyphs for a Type 1 font will
> > > still remain under 255 (typically under 223 as the first 32 chars are
> > > usually not used). To support all glyphs of a Type 1 font we need more
> > > and I found two possible ways to pursue:
> > >
> > > 1. Treat Type 1 fonts as CID fonts.
> > >
> > > + Probably the cleaner approach.
> > > + All glyphs are supported under one single font (no font renderer-level
> > >   font switching required, see below)
> > > - Makes the generated PDF/PS code a little less readable but that's not
> > >   important.
> > >
> > > 2. Do something like OpenOffice when handling fonts with more than 255
> > > chars: Create multiple single-byte encodings which map to the same base
> > > font. This will require an 1:n relationship from font to char mapping
> > > which the renderers also have to handle. The first encoding will be
> > > equal to the font's default encoding (PDF calls that the "implicit base
> > > encoding"). The other encoding(s) will be built from the rest of the
> > > available characters. In the renderer it will be necessary to switch
> > > fonts from one character to another (not the same as switching from
> > > Helvetica to Symbol, i.e. not at FO level, but at renderer level).
> > >
> > > + Higher compatibility with PDF viewers which are not yet
> > >   feature-complete.
> > > + Keeps the generated PDF/PS code more readable (not important)
> > > - Switching between derived fonts (i.e. font with a common base font but
> > >   with special encodings) is necessary. SingleByteFont needs to be split
> > >   in two classes.
> > >
> > > An example: The "Baskerville Cyrillic" font contains 264
> > > characters/glyphs. The default encoding only contains 221 characters. So
> > > 43 additional characters can be made available like this.
> > >
> > > I'm currently leaning towards CID fonts as it is probably the cleaner
> > > approach. Both solutions are probably pretty much the same in terms of
> > > effort. The CID approach will take more work in the PS renderer and the
> > > multi-encoding approach will make changes necessary in FOP's font
> > > library.
> > >
> > > If anyone has thoughts on this, I'd appreciate it. I'll finish the
> > > changes for supporting the default encodings and then finish the
> > > processing feedback stuff before I finish this here.
> > >
> > > Jeremias Maerki
> >
> >
> > --
> > Vincent Hennebert                            Anyware Technologies
> > http://people.apache.org/~vhennebert         http://www.anyware-tech.com
> > Apache FOP Committer                         FOP Development/Consulting
>
>
>
>
> Jeremias Maerki
>
>

Regards,

The Web Maestro
-- 
<th...@gmail.com> - <http://homepage.mac.com/webmaestro/>
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

Just some details on what each approach will produce:

#1 produces a /CIDFontType0 CIDFont [1] and a /Type0 Composite Font
referencing the former.

#2 produces one or more /Type1 fonts.

[1] for TrueType we produce a CIDFontType2 CIDFont and a /Type0
Composite font for each TrueType font. OpenOffice produces one or more
/TrueType fonts for each TrueType font.

#1 would always generate a CID font for simplicity. What you propose is
basically a "#2a", i.e. produce a /Type1 font if the document stays
within the default encoding of the font. If additional characters are
used FOP would switch to CID fonts instead of producing a /Type1 font.
So this needs elements from #1 and #2. Possible and probably makes sense
if CID fonts work in the first place. I like it.

BTW, I just found out that I have to generate a ToUnicode CMap if a
Type1 font doesn't use one of the encodings that are predefined in the
PDF spec. So a little more work for me there.
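
As a rough illustration of what such a ToUnicode CMap needs, the Java
sketch below builds only the "bfchar" section, which maps each
single-byte character code to its Unicode value. The class and method
names are hypothetical and this is not FOP's implementation; the
surrounding CMap header and footer are omitted. As far as I know, a
bfchar block may hold at most 100 entries, so the sketch starts a new
block after every 100.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative helper (hypothetical, not part of FOP). */
class ToUnicodeSketch {

    private static final int MAX_ENTRIES_PER_BLOCK = 100;

    /** Builds the bfchar section for a single-byte code to Unicode mapping. */
    static String bfcharSection(Map<Integer, Character> codeToUnicode) {
        // Sort by character code so the output is stable.
        TreeMap<Integer, Character> sorted = new TreeMap<>(codeToUnicode);
        StringBuilder sb = new StringBuilder();
        int remaining = sorted.size();
        int inBlock = 0;
        for (Map.Entry<Integer, Character> e : sorted.entrySet()) {
            if (inBlock == 0) {
                int count = Math.min(remaining, MAX_ENTRIES_PER_BLOCK);
                sb.append(count).append(" beginbfchar\n");
            }
            int unicode = e.getValue().charValue();
            sb.append(String.format("<%02X> <%04X>\n", e.getKey(), unicode));
            inBlock++;
            remaining--;
            if (inBlock == MAX_ENTRIES_PER_BLOCK || remaining == 0) {
                sb.append("endbfchar\n");
                inBlock = 0;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Character> mapping = new HashMap<>();
        mapping.put(0x20, '\u0409'); // code 0x20 -> U+0409 (afii10058)
        mapping.put(0x21, '\u040A'); // code 0x21 -> U+040A
        System.out.print(bfcharSection(mapping));
    }
}
```

For the two-entry mapping in main, this prints a "2 beginbfchar" line,
the entries "<20> <0409>" and "<21> <040A>", and a closing "endbfchar".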

On 13.02.2008 11:57:34 Vincent Hennebert wrote:
> Hi Jeremias,
> 
> With solution #1, if I happen to use only the glyphs from the font that 
> are available in its default encoding, will the resulting PDF be the 
> same as in solution #2?
> What I mean is, will feature-incomplete PDF readers be able to display 
> it? In which case this wouldn’t be that bad.
> 
> Anyway, solution #1 also looks cleaner to me, so go for it. If that 
> means that I’ll have to create a RFE for my favourite PDF reader, then 
> I’ll do it ;-)
> 
> Vincent
> 
> 
> Jeremias Maerki wrote:
> > I've been asked to look into the possibility to support unusual
> > encodings (like Cyrillic) with Type 1 fonts. Right now we only support
> > WinAnsiEncoding (plus special handling for Symbol and ZapfDingbats).
> > 
> > I already have an AFM parser. The AFM parser is the precondition to
> > safely support non-standard encodings as only this file contains the
> > glyph list of a font.
> > 
> > I'm now on a good way to support non-WinAnsi encodings since I can now
> > build CodePointMapping instances from an AFM file. I then have to teach
> > the PDF and PS renderers to make use of these special encodings.
> > 
> > That's step 1, but it will only make the font's native encoding
> > available in FOP. The number of available glyphs for a Type 1 font will
> > still remain under 255 (typically under 223 as the first 32 chars are
> > usually not used). To support all glyphs of a Type 1 font we need more
> > and I found two possible ways to pursue:
> > 
> > 1. Treat Type 1 fonts as CID fonts.
> > 
> > + Probably the cleaner approach.
> > + All glyphs are supported under one single font (no font renderer-level
> >   font switching required, see below)
> > - Makes the generated PDF/PS code a little less readable but that's not
> >   important.
> > 
> > 2. Do something like OpenOffice when handling fonts with more than 255
> > chars: Create multiple single-byte encodings which map to the same base
> > font. This will require an 1:n relationship from font to char mapping
> > which the renderers also have to handle. The first encoding will be
> > equal to the font's default encoding (PDF calls that the "implicit base
> > encoding"). The other encoding(s) will be built from the rest of the
> > available characters. In the renderer it will be necessary to switch
> > fonts from one character to another (not the same as switching from
> > Helvetica to Symbol, i.e. not at FO level, but at renderer level).
> > 
> > + Higher compatibility with PDF viewers which are not yet
> >   feature-complete.
> > + Keeps the generated PDF/PS code more readable (not important)
> > - Switching between derived fonts (i.e. font with a common base font but
> >   with special encodings) is necessary. SingleByteFont needs to be split
> >   in two classes.
> > 
> > An example: The "Baskerville Cyrillic" font contains 264
> > characters/glyphs. The default encoding only contains 221 characters. So
> > 43 additional characters can be made available like this.
> > 
> > I'm currently leaning towards CID fonts as it is probably the cleaner
> > approach. Both solutions are probably pretty much the same in terms of
> > effort. The CID approach will take more work in the PS renderer and the
> > multi-encoding approach will make changes necessary in FOP's font
> > library.
> > 
> > If anyone has thoughts on this, I'd appreciate it. I'll finish the
> > changes for supporting the default encodings and then finish the
> > processing feedback stuff before I finish this here.
> > 
> > Jeremias Maerki
> 
> 
> -- 
> Vincent Hennebert                            Anyware Technologies
> http://people.apache.org/~vhennebert         http://www.anyware-tech.com
> Apache FOP Committer                         FOP Development/Consulting




Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Vincent Hennebert <vi...@anyware-tech.com>.

Hi Jeremias,

With solution #1, if I happen to use only the glyphs from the font that 
are available in its default encoding, will the resulting PDF be the 
same as in solution #2?
What I mean is, will feature-incomplete PDF readers be able to display 
it? In which case this wouldn’t be that bad.

Anyway, solution #1 also looks cleaner to me, so go for it. If that 
means that I’ll have to create a RFE for my favourite PDF reader, then 
I’ll do it ;-)

Vincent


Jeremias Maerki wrote:
> I've been asked to look into the possibility to support unusual
> encodings (like Cyrillic) with Type 1 fonts. Right now we only support
> WinAnsiEncoding (plus special handling for Symbol and ZapfDingbats).
> 
> I already have an AFM parser. The AFM parser is the precondition to
> safely support non-standard encodings as only this file contains the
> glyph list of a font.
> 
> I'm now on a good way to support non-WinAnsi encodings since I can now
> build CodePointMapping instances from an AFM file. I then have to teach
> the PDF and PS renderers to make use of these special encodings.
> 
> That's step 1, but it will only make the font's native encoding
> available in FOP. The number of available glyphs for a Type 1 font will
> still remain under 255 (typically under 223 as the first 32 chars are
> usually not used). To support all glyphs of a Type 1 font we need more
> and I found two possible ways to pursue:
> 
> 1. Treat Type 1 fonts as CID fonts.
> 
> + Probably the cleaner approach.
> + All glyphs are supported under one single font (no font renderer-level
>   font switching required, see below)
> - Makes the generated PDF/PS code a little less readable but that's not
>   important.
> 
> 2. Do something like OpenOffice when handling fonts with more than 255
> chars: Create multiple single-byte encodings which map to the same base
> font. This will require an 1:n relationship from font to char mapping
> which the renderers also have to handle. The first encoding will be
> equal to the font's default encoding (PDF calls that the "implicit base
> encoding"). The other encoding(s) will be built from the rest of the
> available characters. In the renderer it will be necessary to switch
> fonts from one character to another (not the same as switching from
> Helvetica to Symbol, i.e. not at FO level, but at renderer level).
> 
> + Higher compatibility with PDF viewers which are not yet
>   feature-complete.
> + Keeps the generated PDF/PS code more readable (not important)
> - Switching between derived fonts (i.e. font with a common base font but
>   with special encodings) is necessary. SingleByteFont needs to be split
>   in two classes.
> 
> An example: The "Baskerville Cyrillic" font contains 264
> characters/glyphs. The default encoding only contains 221 characters. So
> 43 additional characters can be made available like this.
> 
> I'm currently leaning towards CID fonts as it is probably the cleaner
> approach. Both solutions are probably pretty much the same in terms of
> effort. The CID approach will take more work in the PS renderer and the
> multi-encoding approach will make changes necessary in FOP's font
> library.
> 
> If anyone has thoughts on this, I'd appreciate it. I'll finish the
> changes for supporting the default encodings and then finish the
> processing feedback stuff before I finish this here.
> 
> Jeremias Maerki


-- 
Vincent Hennebert                            Anyware Technologies
http://people.apache.org/~vhennebert         http://www.anyware-tech.com
Apache FOP Committer                         FOP Development/Consulting

Re: Supporting unusual encodings for Type 1 fonts

Posted by The Web Maestro <th...@gmail.com>.

Thanks!

Clay



On 3/2/08, Jeremias Maerki <de...@jeremias-maerki.ch> wrote:
> You should look for error messages from the viewers or obviously wrong
> results. I've just uploaded a PNG which shows the three variants and the
> differences in between. This is the expected output (with explanations).
> Caution: the PNG is >1MB! Anyway, the output from the viewers you tested
> is obviously fine.
>
> http://people.apache.org/~jeremias/fop/type1-demo/changes-explained.png
>
> Another thing that could be tested is if copy/paste of the text into a
> Unicode-capable (!) application is possible. Adobe Acrobat seems to have
> a problem in certain cases but that's more a bug there than in FOP
> because other tools can extract the text just fine.
>
> On 02.03.2008 07:39:41 The Web Maestro wrote:
> > On Fri, Feb 29, 2008 at 7:14 AM, Jeremias Maerki <de...@jeremias-maerki.ch> wrote:
> > > For those, who want to test PDF viewer compatibility I have a demo PDF
> > >  which demonstrates Type 1 "step 2" implemented with solution 2 (multiple
> > >  descendant fonts with dynamic encoding build-up).
> > >
> > >  http://people.apache.org/~jeremias/fop/type1-demo/
> > >  - [1] font-type1-demo-before.pdf (revision 627678, before I added the AFM stuff, i.e. step 1)
> > >  - [2] font-type1-demo-step1.pdf (current FOP Trunk HEAD)
> > >  - [3] font-type1-demo-step2.pdf (my local working copy)
> >
> > I'm not sure what exactly to look for, but I've taken screenshots of
> > the 3 versions open Mac OS X 10.4.x Preview v3.0.9 & Acrobat 8.1.2:
> >
> > http://people.apache.org/~clay/fop/type1-demo/
> >
> > HTH!
> >
> > Regards,
> >
> > The Web Maestro
> > --
> > <th...@gmail.com> - <http://homepage.mac.com/webmaestro/>
> > My religion is simple. My religion is kindness.
> > - HH The 14th Dalai Lama of Tibet
>
>
>
>
> Jeremias Maerki
>
>

Regards,

The Web Maestro
-- 
<th...@gmail.com> - <http://homepage.mac.com/webmaestro/>
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

You should look for error messages from the viewers or obviously wrong
results. I've just uploaded a PNG which shows the three variants and the
differences in between. This is the expected output (with explanations).
Caution: the PNG is >1MB! Anyway, the output from the viewers you tested
is obviously fine.

http://people.apache.org/~jeremias/fop/type1-demo/changes-explained.png

Another thing that could be tested is if copy/paste of the text into a
Unicode-capable (!) application is possible. Adobe Acrobat seems to have
a problem in certain cases but that's more a bug there than in FOP
because other tools can extract the text just fine.

On 02.03.2008 07:39:41 The Web Maestro wrote:
> On Fri, Feb 29, 2008 at 7:14 AM, Jeremias Maerki <de...@jeremias-maerki.ch> wrote:
> > For those, who want to test PDF viewer compatibility I have a demo PDF
> >  which demonstrates Type 1 "step 2" implemented with solution 2 (multiple
> >  descendant fonts with dynamic encoding build-up).
> >
> >  http://people.apache.org/~jeremias/fop/type1-demo/
> >  - [1] font-type1-demo-before.pdf (revision 627678, before I added the AFM stuff, i.e. step 1)
> >  - [2] font-type1-demo-step1.pdf (current FOP Trunk HEAD)
> >  - [3] font-type1-demo-step2.pdf (my local working copy)
> 
> I'm not sure what exactly to look for, but I've taken screenshots of
> the 3 versions open Mac OS X 10.4.x Preview v3.0.9 & Acrobat 8.1.2:
> 
> http://people.apache.org/~clay/fop/type1-demo/
> 
> HTH!
> 
> Regards,
> 
> The Web Maestro
> -- 
> <th...@gmail.com> - <http://homepage.mac.com/webmaestro/>
> My religion is simple. My religion is kindness.
> - HH The 14th Dalai Lama of Tibet

Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by The Web Maestro <th...@gmail.com>.

On Fri, Feb 29, 2008 at 7:14 AM, Jeremias Maerki <de...@jeremias-maerki.ch> wrote:
> For those, who want to test PDF viewer compatibility I have a demo PDF
>  which demonstrates Type 1 "step 2" implemented with solution 2 (multiple
>  descendant fonts with dynamic encoding build-up).
>
>  http://people.apache.org/~jeremias/fop/type1-demo/
>  - [1] font-type1-demo-before.pdf (revision 627678, before I added the AFM stuff, i.e. step 1)
>  - [2] font-type1-demo-step1.pdf (current FOP Trunk HEAD)
>  - [3] font-type1-demo-step2.pdf (my local working copy)

I'm not sure what exactly to look for, but I've taken screenshots of
the 3 versions open in Mac OS X 10.4.x Preview v3.0.9 & Acrobat 8.1.2:

http://people.apache.org/~clay/fop/type1-demo/

HTH!

Regards,

The Web Maestro
-- 
<th...@gmail.com> - <http://homepage.mac.com/webmaestro/>
My religion is simple. My religion is kindness.
- HH The 14th Dalai Lama of Tibet

Re: Supporting unusual encodings for Type 1 fonts

Posted by Max Berger <ma...@berger.name>.

Jeremias,

I just tested on Linux with evince 2.20.2 and Acrobat Reader 7.0. The
results are the same as the ones shown in your explanations. The
improvement is clearly visible. Good work!


Screenshots are available at:

http://max.berger.name/tmp/fop/


On Fri, 2008-02-29 at 16:14 +0100, Jeremias Maerki wrote:
> For those, who want to test PDF viewer compatibility I have a demo PDF
> which demonstrates Type 1 "step 2" implemented with solution 2 (multiple
> descendant fonts with dynamic encoding build-up).
> 
> http://people.apache.org/~jeremias/fop/type1-demo/
> - [1] font-type1-demo-before.pdf (revision 627678, before I added the AFM stuff, i.e. step 1)
> - [2] font-type1-demo-step1.pdf (current FOP Trunk HEAD)
> - [3] font-type1-demo-step2.pdf (my local working copy)
> 

> Jeremias Maerki


Best regards,

Max Berger
e-mail: max@berger.name

-- 
OpenPG ID: E81592BC   Print: F489F8759D4132923EC4 BC7E072AB73AE81592BC
For information about me and my work please see http://max.berger.name

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

For those who want to test PDF viewer compatibility, I have a demo PDF
which demonstrates Type 1 "step 2" implemented with solution 2 (multiple
descendant fonts with dynamic encoding build-up).

http://people.apache.org/~jeremias/fop/type1-demo/
- [1] font-type1-demo-before.pdf (revision 627678, before I added the AFM stuff, i.e. step 1)
- [2] font-type1-demo-step1.pdf (current FOP Trunk HEAD)
- [3] font-type1-demo-step2.pdf (my local working copy)

The PDFs are not compressed so it's easy to see what's happening inside
and what the differences are. The evolution is clearly visible. With [2]
the Baskerville Cyrillic font starts to display Cyrillic characters
from its primary encoding. With [3] you get a few more characters with
Baskerville. The biggest leap forward is with the URW Gothic L font
which contains characters from various languages.

Now I have to make the same adjustments for PostScript as I did for PDF.
Please note that there's no change with TrueType fonts. They are still
treated as CID fonts. At any rate, there's some infrastructure available
now that would make handling TrueType with multiple single byte
encodings easier.
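
For readers who want a picture of the "dynamic encoding build-up", here
is a minimal Java sketch of the idea: characters found in the font's
primary encoding keep their code, and any other character is assigned
the next free slot in an additional single-byte encoding, with a new
encoding opened when the current one is full. The class name, the method
name and the choice to leave codes 0 to 31 unused are assumptions made
for illustration; this is not the code in FOP's font library, and it
does not check whether the font actually contains the glyph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only (hypothetical class, not part of FOP). */
class MultiEncodingSketch {

    private static final int FIRST_CODE = 32;        // assumption: codes 0..31 stay unused
    private static final int SLOTS = 256 - FIRST_CODE;

    /** Character to code mapping of the font's default encoding. */
    private final Map<Character, Integer> primary;
    /** Extra single-byte encodings, filled as new characters show up. */
    private final List<Map<Character, Integer>> additional = new ArrayList<>();

    MultiEncodingSketch(Map<Character, Integer> primary) {
        this.primary = primary;
    }

    /** Returns {encodingIndex, code}; index 0 is the primary encoding. */
    int[] map(char ch) {
        Integer code = primary.get(ch);
        if (code != null) {
            return new int[] {0, code};
        }
        for (int i = 0; i < additional.size(); i++) {
            code = additional.get(i).get(ch);
            if (code != null) {
                return new int[] {i + 1, code};
            }
        }
        // Not seen before: use the last extra encoding, or open a new one if it is full.
        if (additional.isEmpty()
                || additional.get(additional.size() - 1).size() >= SLOTS) {
            additional.add(new HashMap<Character, Integer>());
        }
        Map<Character, Integer> last = additional.get(additional.size() - 1);
        int newCode = FIRST_CODE + last.size();
        last.put(ch, newCode);
        return new int[] {additional.size(), newCode};
    }

    public static void main(String[] args) {
        Map<Character, Integer> primary = new HashMap<>();
        primary.put('A', 65);
        MultiEncodingSketch fonts = new MultiEncodingSketch(primary);
        int[] a = fonts.map('A');       // found in the primary encoding
        int[] b = fonts.map('\u0409');  // first character of the first extra encoding
        System.out.println(a[0] + "/" + a[1] + " " + b[0] + "/" + b[1]); // prints 0/65 1/32
    }
}
```

A renderer would then select the derived font that belongs to the
returned encoding index before writing the single-byte code, which is
the renderer-level font switching mentioned for solution 2.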


Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

It turns out that implementing solution 1 isn't so easy in PDF. Actually,
even my naming was wrong. It's not a CIDFont I wanted to use. I just
wanted to use character codes larger than 255 as input to a CMap which
spits out character names for Type 1 fonts (see illustration in PS Third
Edition, page 367). That's easy in PostScript but not supported in PDF
1.4. Grmbl.

If I could somehow convert a Type 1 font into a CIDFont I could do it.
The key to using a CIDFont is to use CIDs (character IDs) instead of
character names. But I think this conversion is probably more
complicated than implementing solution 2. So I'm off towards solution 2.
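
Solution 2 means the renderer has to switch between derived fonts
inside a piece of text. A minimal Java sketch of that splitting step
follows. It assumes a hypothetical EncodingLookup that reports which of
the font's single-byte encodings covers a given character; none of these
names come from FOP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of FOP.
class FontRunSketch {

    /** Hypothetical lookup: index of the encoding that covers a character. */
    interface EncodingLookup {
        int encodingFor(char c);
    }

    /** A stretch of text that can be shown with one derived font. */
    static final class Run {
        final int encodingIndex;
        final StringBuilder text = new StringBuilder();

        Run(int encodingIndex) {
            this.encodingIndex = encodingIndex;
        }
    }

    /** Starts a new run whenever the covering encoding changes. */
    static List<Run> splitIntoRuns(String text, EncodingLookup lookup) {
        List<Run> runs = new ArrayList<>();
        Run current = null;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int enc = lookup.encodingFor(c);
            if (current == null || current.encodingIndex != enc) {
                current = new Run(enc);
                runs.add(current);
            }
            current.text.append(c);
        }
        return runs;
    }

    public static void main(String[] args) {
        // Toy lookup: characters below U+0100 live in encoding 0,
        // everything else in encoding 1.
        EncodingLookup lookup = c -> c < 0x100 ? 0 : 1;
        for (Run run : splitIntoRuns("ab\u0416\u0417c", lookup)) {
            System.out.println("encoding " + run.encodingIndex
                    + ", " + run.text.length() + " char(s)");
        }
    }
}
```

Each run would then be written with the derived font that matches its
encoding index, which is the renderer-level font switching described for
solution 2.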

On 12.02.2008 15:25:21 Jeremias Maerki wrote:
> I've been asked to look into the possibility to support unusual
> encodings (like Cyrillic) with Type 1 fonts. Right now we only support
> WinAnsiEncoding (plus special handling for Symbol and ZapfDingbats).
> 
> I already have an AFM parser. The AFM parser is the precondition to
> safely support non-standard encodings as only this file contains the
> glyph list of a font.
> 
> I'm now on a good way to support non-WinAnsi encodings since I can now
> build CodePointMapping instances from an AFM file. I then have to teach
> the PDF and PS renderers to make use of these special encodings.
> 
> That's step 1, but it will only make the font's native encoding
> available in FOP. The number of available glyphs for a Type 1 font will
> still remain under 255 (typically under 223 as the first 32 chars are
> usually not used). To support all glyphs of a Type 1 font we need more
> and I found two possible ways to pursue:
> 
> 1. Treat Type 1 fonts as CID fonts.
> 
> + Probably the cleaner approach.
> + All glyphs are supported under one single font (no font renderer-level
>   font switching required, see below)
> - Makes the generated PDF/PS code a little less readable but that's not
>   important.
> 
> 2. Do something like OpenOffice when handling fonts with more than 255
> chars: Create multiple single-byte encodings which map to the same base
> font. This will require an 1:n relationship from font to char mapping
> which the renderers also have to handle. The first encoding will be
> equal to the font's default encoding (PDF calls that the "implicit base
> encoding"). The other encoding(s) will be built from the rest of the
> available characters. In the renderer it will be necessary to switch
> fonts from one character to another (not the same as switching from
> Helvetica to Symbol, i.e. not at FO level, but at renderer level).
> 
> + Higher compatibility with PDF viewers which are not yet
>   feature-complete.
> + Keeps the generated PDF/PS code more readable (not important)
> - Switching between derived fonts (i.e. font with a common base font but
>   with special encodings) is necessary. SingleByteFont needs to be split
>   in two classes.
> 
> An example: The "Baskerville Cyrillic" font contains 264
> characters/glyphs. The default encoding only contains 221 characters. So
> 43 additional characters can be made available like this.
> 
> I'm currently leaning towards CID fonts as it is probably the cleaner
> approach. Both solutions are probably pretty much the same in terms of
> effort. The CID approach will take more work in the PS renderer and the
> multi-encoding approach will make changes necessary in FOP's font
> library.
> 
> If anyone has thoughts on this, I'd appreciate it. I'll finish the
> changes for supporting the default encodings and then finish the
> processing feedback stuff before I finish this here.
> 
> Jeremias Maerki
> 




Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

On 12.02.2008 17:42:24 Max Berger wrote:
> Jeremias,
> 
> 
> 2008/2/12, Jeremias Maerki <de...@jeremias-maerki.ch>:
> > + Higher compatibility with PDF viewers which are not yet
> >   feature-complete.
> 
> would it be possible to create a quick PDF file with these extensions
> so that I can test my favorite PDF viewers?

Hmm, that's difficult. Such a PDF is quite hard to create by hand.
In the end, it basically means I have to implement at least half
the solution just to find out if it works. But as it's only about PDF,
and the CID solution should be relatively easy to do for PDF, we can find
out without a big penalty if it doesn't work. At least I hope I didn't
miss anything when I did my estimates on this.

> As far as I am concerned
> these are
>
> - Adobe Acrobat Reader. This is the main app, and it should be tested
> with 6, 7, and 8. Having "broken" pdf on any of those is a real
> show-stopper.
> - Apple QuickView
> - Evince
> 
> We could set up a Wiki Test site, and I can try and test the pdf with
> different versions of the software to see what exactly the impact is,
> and then make a decision based on that.

Ok, I'll try to make test PDFs available as soon as I can produce them.

> > Jeremias Maerki
> 
> 
> Max

Jeremias Maerki

Re: Supporting unusual encodings for Type 1 fonts

Posted by Max Berger <ma...@berger.name>.

Jeremias,


2008/2/12, Jeremias Maerki <de...@jeremias-maerki.ch>:
> + Higher compatibility with PDF viewers which are not yet
>   feature-complete.

Would it be possible to create a quick PDF file with these extensions
so that I can test my favorite PDF viewers? As far as I am concerned
these are

- Adobe Acrobat Reader. This is the main app, and it should be tested
with 6, 7, and 8. Having a "broken" PDF on any of those is a real
show-stopper.
- Apple QuickView
- Evince

We could set up a Wiki test site, and I can try to test the PDF with
different versions of the software to see what exactly the impact is,
and then make a decision based on that.

> Jeremias Maerki


Max