
Posted to fop-dev@xmlgraphics.apache.org by Clay Atkins <ca...@spcmg.com> on 1999/12/04 19:32:28 UTC

PDF Font Support

Here is the skinny on font support in PDF:

Text can be encoded in Unicode by beginning the string with <FE FF>.
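A minimal Java sketch of that string-level encoding, assuming Java's standard UTF_16BE charset does the byte conversion. The class and method names are hypothetical, not FOP code:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes a Java String as a PDF hex string that starts
// with the UTF-16 byte order mark FE FF, followed by two bytes per UTF-16
// code unit (characters outside the BMP become surrogate pairs).
class PdfUnicodeString {

    static String toHexString(String text) {
        // UTF_16BE does not write a byte order mark, so FE FF is added by hand.
        byte[] body = text.getBytes(StandardCharsets.UTF_16BE);
        StringBuilder sb = new StringBuilder("<FEFF");
        for (byte b : body) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(toHexString("A"));      // <FEFF0041>
        System.out.println(toHexString("\u03B1")); // <FEFF03B1>
    }
}
```

In the PDF specification this FE FF convention applies to text strings such as document information and outline entries; bytes shown with text operators in a content stream are interpreted through the current font's encoding or CMap instead.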

PDF supports Type 1, Type 0, TrueType, and Type 3 fonts. Type 1, Type 3
and TrueType fonts rely on an encoding vector, indexed from 0 to 255, that
maps character codes to glyph names. These fonts will not work for
Unicode.

Type 0 fonts are "composite" fonts that support a more complicated kind of
encoding that allows multi-byte characters and will work fine for
Unicode, including vertical writing, which is an issue to discuss in more
detail.

In PDF, the encoding mechanism for Type 0 fonts is the "CMap". There are
several predefined CMaps, and that list includes maps for Unicode support.
CMaps provide the information for locating glyphs in the base fonts. For
Unicode, the base fonts must be "CIDFonts", a newer type of font that can
handle multi-byte character sets.
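For illustration, here is a hypothetical Java helper that writes the two dictionaries involved: a Type 0 font whose /Encoding names a predefined Unicode CMap, and the CIDFont it points to through /DescendantFonts. The font name HeiseiMin-W3, the CMap UniJIS-UCS2-H, the supplement number and the object numbers are example values only; a real writer would take them from the font and CMap actually in use.

```java
// Hypothetical helper that writes the two dictionaries a Type 0 font needs.
// All names and numbers passed in main are example values only.
class Type0FontSketch {

    // The Type 0 (composite) font: /Encoding names the CMap, and
    // /DescendantFonts refers to exactly one CIDFont object.
    static String type0Dict(String cidFontName, String cmapName, int descendantObj) {
        return "<< /Type /Font /Subtype /Type0\n"
             + "   /BaseFont /" + cidFontName + "-" + cmapName + "\n"
             + "   /Encoding /" + cmapName + "\n"
             + "   /DescendantFonts [" + descendantObj + " 0 R]\n"
             + ">>";
    }

    // The CIDFont: /CIDSystemInfo must describe the same character collection
    // the CMap maps into. CIDFontType0 is for Type 1 style outlines;
    // TrueType-based CIDFonts use CIDFontType2 instead.
    static String cidFontDict(String cidFontName, String registry, String ordering,
                              int supplement, int descriptorObj) {
        return "<< /Type /Font /Subtype /CIDFontType0\n"
             + "   /BaseFont /" + cidFontName + "\n"
             + "   /CIDSystemInfo << /Registry (" + registry + ") /Ordering ("
             + ordering + ") /Supplement " + supplement + " >>\n"
             + "   /FontDescriptor " + descriptorObj + " 0 R\n"
             + ">>";
    }

    public static void main(String[] args) {
        // Placeholder values; the supplement must match the CMap's collection.
        System.out.println(type0Dict("HeiseiMin-W3", "UniJIS-UCS2-H", 5));
        System.out.println(cidFontDict("HeiseiMin-W3", "Adobe", "Japan1", 2, 6));
    }
}
```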

PDF carries font information either as a small font descriptor (I guess you
know that already) or as the entire font description, which can be
large. The manual recommends sending the entire font description for any
font that cannot be represented in the ISOLatin1 character set; that might
not be necessary, though.

Anyway, the particular Unicode-encoded font must exist on the server.
According to the documentation, the font should be embedded in the PDF
file. Of course, if the font is installed on the clients, this is
unnecessary. Maybe this should be some kind of option.

So, there it is. I'd be glad to work on Unicode support, but I don't have
any CID fonts, and building one would be a big job. I'll go looking for some.

Re: PDF Font Support

Posted by James Tauber <jt...@jtauber.com>.

Check out:

http://www.adobe.com/products/acrobat/cjkfontpack.html

I don't know whether this is any use (I haven't downloaded them yet).

James

Re: PDF Font Support

Posted by James Tauber <jt...@jtauber.com>.

> Doesn't the OEM-to-Unicode translation, that occurs when characters are read
> from the input file, take care of necessary translations; indirectly dealing
> with any issues concerning WinAnsi?  Or, are the input characters carried
> through to the render as bytes?

FOP uses Unicode internally. That is what is coming from the XML processors
regardless of what encoding the input file is in.

However, the PDF is in WinAnsi and the font metrics are in WinAnsi. So FOP
has mapping tables from Unicode to WinAnsi.
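A sketch of what such a Unicode-to-WinAnsi table might look like in Java. Only a few of the 0x80-0x9F entries are filled in; the class name is hypothetical and this is not FOP's actual mapping code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, partial Unicode -> WinAnsi mapping. A complete table would
// cover every entry in the 0x80-0x9F range.
class WinAnsiMapping {
    private static final Map<Character, Integer> SPECIAL = new HashMap<>();
    static {
        SPECIAL.put('\u2026', 0x85); // horizontal ellipsis
        SPECIAL.put('\u2018', 0x91); // left single quotation mark
        SPECIAL.put('\u2019', 0x92); // right single quotation mark
        SPECIAL.put('\u201C', 0x93); // left double quotation mark
        SPECIAL.put('\u201D', 0x94); // right double quotation mark
        SPECIAL.put('\u2022', 0x95); // bullet
        SPECIAL.put('\u2013', 0x96); // en dash
        SPECIAL.put('\u2014', 0x97); // em dash
        SPECIAL.put('\u2122', 0x99); // trade mark sign
    }

    // Returns the WinAnsi code for c, or -1 if c has no WinAnsi equivalent.
    static int toWinAnsi(char c) {
        if ((c >= 0x20 && c <= 0x7E) || (c >= 0xA0 && c <= 0xFF)) {
            return c; // printable ASCII and the upper Latin-1 range map to themselves
        }
        Integer code = SPECIAL.get(c);
        return code != null ? code : -1;
    }
}
```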

If we produce Unicode PDF, then we avoid having to translate the characters
because they are internally Unicode in FOP and they can be added to the
stream as Unicode (UCS-2 or UTF-16).

I said "fortunately" about Fotis's putting the code mapping into XML at my
request because it means the metrics for the inbuilt font are available to
us in Unicode, not just WinAnsi like the original AFM files.

> On the length thing, my interpretation of the manual is that only the first
> two characters of each string need the <FE FF> sequence, and that the
> remainder of the string is expected to be two-byte sequences for each
> character.

Right, but the way to encode byte sequences is as hex digits. If you put
spaces between the bytes, you end up with three times the length. E.g., "A"
becomes "41 ". Actually, it will be six times the length, because "A" will
become "00 41 ".
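The arithmetic can be checked with a small Java sketch (hypothetical helper names). PDF ignores white space inside hex strings, so the spaces are optional: dropping them brings the cost for characters in the Basic Multilingual Plane down to four characters per character instead of six.

```java
import java.nio.charset.StandardCharsets;

// Compares a text run's length as plain characters with two UTF-16 hex
// forms: bytes separated by spaces, and bytes run together.
class HexLength {

    static String hexSpaced(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_16BE)) {
            sb.append(String.format("%02X ", b & 0xFF)); // "00 41 " for "A"
        }
        return sb.toString();
    }

    static String hexCompact(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_16BE)) {
            sb.append(String.format("%02X", b & 0xFF));  // "0041" for "A"
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Hello";
        System.out.println(text.length());             // 5
        System.out.println(hexSpaced(text).length());  // 30, six times
        System.out.println(hexCompact(text).length()); // 20, four times
    }
}
```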

James

RE: PDF Font Support

Posted by Clay Atkins <ca...@spcmg.com>.

Doesn't the OEM-to-Unicode translation that occurs when characters are read
from the input file take care of the necessary translations, indirectly dealing
with any issues concerning WinAnsi? Or are the input characters carried
through to the renderer as bytes?

On the length thing, my interpretation of the manual is that only the first
two characters of each string need the <FE FF> sequence, and that the
remainder of the string is expected to be two-byte sequences for each
character.

Re: PDF Font Support

Posted by James Tauber <jt...@jtauber.com>.

> Here is the skinny on font support in PDF:

Great!

> text can be encoded using unicode by beginning the string with <FE FF>.

The UTF-16 BOM; this makes sense. Tied in with the suggestion I forwarded
recently that we should use bytes in PDF streams to enable compression, we
should probably modify PDFStream to:

1. start with FE FF
2. convert all incoming Strings to be added to bytes using the UTF-16
transformation (Java probably provides this in the class library)
3. output the streams as < hex hex hex > rather than WinAnsi characters.

This will triple the length of streams, but that can be fixed with
compression down the line. I don't know whether it would be worth having a
check to see whether any non-WinAnsi characters are used, and only using
UTF-16 if there are.
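Both ideas can be sketched with the Java class library alone. Below, java.util.zip.Deflater produces zlib-format data, which is what a stream marked /FlateDecode expects, and the check uses the windows-1252 charset as a stand-in for WinAnsi. The two are close but not guaranteed identical, so treat that as an assumption; the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.zip.Deflater;

class StreamEncodingChoice {

    // windows-1252 stands in for PDF's WinAnsiEncoding here (an assumption:
    // the two are close but not guaranteed identical).
    private static final Charset WIN_ANSI_APPROX = Charset.forName("windows-1252");

    // True if every character can be written as a single WinAnsi-style byte,
    // in which case the UTF-16 hex form is not needed.
    static boolean fitsWinAnsi(String text) {
        return WIN_ANSI_APPROX.newEncoder().canEncode(text);
    }

    // Compresses stream content into zlib format, as expected by /FlateDecode.
    static byte[] flate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsi("caf\u00E9")); // true
        System.out.println(fitsWinAnsi("\u03B1"));    // false
    }
}
```

Repetitive hex text such as "00 41 00 42 ..." compresses well, which is why the tripled length matters less once streams are compressed.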

Fortunately, because of Fotis's code mapping files, we already have the
mapping between WinAnsi used by the font metrics and Unicode.

James

Re: PDF Font Support

Posted by James Tauber <jt...@jtauber.com>.

> In PDF, the encoding mechanism for type 0 fonts is "CMap".  There are
> several pre-defined CMaps, and in that list are maps for unicode support.
> CMaps provide information for locating glyphs in base fonts.  For unicode,
> the base fonts must be "CIDFonts", which are a new type of font that can
> handle multi-byte character sets.
>
> PDF sends font information as a small font descriptor -- I guess you know
> that already -- or can send the entire font description, which can be
> large.  The manual recommends sending the entire font description for any
> font that cannot be represented in the ISOLatin1 character set; that might
> not be necessary, though.
>
> Anyway, it is necessary for the particular unicode encoded font to exist on
> the server.  According to the documentation, the font should be included in
> the PDF file.  Of course, if the font is installed on the clients, then this
> is unnecessary.  Maybe this should be some type of option.
>
> So, there it is.  I'd be glad to work on unicode support, but I don't have
> in CID fonts and building one would be a big job.  I'll go looking for some.

That would be fantastic. Anyone else on this list have any experience with
CID fonts in PDF?

James

RE: PDF Font Support

Posted by Clay Atkins <ca...@spcmg.com>.

Well, wouldn't it be more user-friendly to create a mechanism where the
user can reference a directory of fonts that are automatically included, as
needed, without any compiling? Maybe have some classes like
"PDFType0", "PDFType1", etc., that can interpret the content of the font file
as needed by the renderer.
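A rough Java sketch of the directory idea, assuming file extensions are enough to tell the formats apart (.afm and .pfb for Type 1, .ttf for TrueType). The enum and class names are hypothetical; a fuller version would hand each file to a class like the "PDFType1" mentioned above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical scanner: classifies font files in a user-supplied directory
// so a matching handler could be chosen at run time without recompiling.
class FontDirectoryScanner {

    enum FontKind { TYPE1, TRUETYPE }

    // Maps each recognised file name in dir to its kind; other files are
    // skipped. A Type 1 font's metrics and outline files are listed separately.
    static Map<String, FontKind> scan(Path dir) {
        Map<String, FontKind> found = new TreeMap<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(p -> {
                String name = p.getFileName().toString();
                String lower = name.toLowerCase(Locale.ROOT);
                if (lower.endsWith(".afm") || lower.endsWith(".pfb")) {
                    found.put(name, FontKind.TYPE1);
                } else if (lower.endsWith(".ttf")) {
                    found.put(name, FontKind.TRUETYPE);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```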

Re: PDF Font Support

Posted by James Tauber <jt...@jtauber.com>.

> Considering that there so many fonts out there wouldn't it be a possibility to provide a
> way for the end user to plug in his/her fonts (metrics etc.) + mapping without the need
> to recompile FOP. The FOP project just provides the plug-in mechanism and collects the
> definition files provided by the users (+ fonts if they are free). If things are good
> documented user which are not into programming can make use of their own fonts and
> share that with others.

But we should make the code mapping and metric classes passable to FOP as a
String indicating the name of the class (just like we now do element
mappings and renderers) so that all that would need to be compiled are the
mapping files, not all of FOP.
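A minimal Java sketch of loading a code mapping by class name, in the same spirit as passing element mappings and renderers by name. The CodeMapping interface, its map method and the Latin1Mapping example are hypothetical names for illustration, not existing FOP classes:

```java
// Hypothetical interface: maps a Unicode character to a font's byte code.
interface CodeMapping {
    // Returns the font's code for the character, or -1 if it is unmapped.
    int map(char unicode);
}

// Example implementation a user might compile separately from FOP.
class Latin1Mapping implements CodeMapping {
    public int map(char unicode) {
        return unicode <= 0xFF ? unicode : -1;
    }
}

class CodeMappingLoader {
    // Instantiates the named class, which must implement CodeMapping and
    // have a no-argument constructor.
    static CodeMapping load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(CodeMapping.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load code mapping " + className, e);
        }
    }
}
```

A user-supplied mapping then only needs to be compiled on its own and placed on the classpath; FOP itself stays untouched.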

Note also that you only need one mapping file per encoding, not one per
font.

For the metrics, we need a utility that will read TrueType fonts and AFMs
and generate the XML metric files, I think.
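A hypothetical Java sketch of the AFM half of such a utility. It reads the character metrics section (lines such as "C 65 ; WX 667 ; N A ; B 14 0 654 718 ;") and writes one XML element per encoded glyph. The element and attribute names are invented for illustration and are not the format of FOP's existing metric files.

```java
import java.util.List;

// Hypothetical AFM-to-XML converter covering only code, name and width.
class AfmToXml {

    static String convert(List<String> afmLines) {
        StringBuilder xml = new StringBuilder("<font-metrics>\n");
        boolean inMetrics = false;
        for (String line : afmLines) {
            String t = line.trim();
            if (t.startsWith("StartCharMetrics")) { inMetrics = true; continue; }
            if (t.startsWith("EndCharMetrics")) { inMetrics = false; continue; }
            if (!inMetrics || t.isEmpty()) continue;
            String code = null, width = null, name = null;
            // Each metrics line is a series of "KEY value" pairs separated by ';'.
            for (String field : t.split(";")) {
                String[] kv = field.trim().split("\\s+", 2);
                if (kv.length < 2) continue;
                switch (kv[0]) {
                    case "C":  code = kv[1]; break;
                    case "WX": width = kv[1]; break;
                    case "N":  name = kv[1]; break;
                    default:   break; // bounding box (B), ligatures (L), etc. ignored
                }
            }
            // Unencoded glyphs (C -1) are skipped.
            if (code != null && width != null && name != null && !"-1".equals(code)) {
                xml.append("  <char code=\"").append(code)
                   .append("\" name=\"").append(name)
                   .append("\" width=\"").append(width).append("\"/>\n");
            }
        }
        return xml.append("</font-metrics>\n").toString();
    }
}
```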

James

Re: PDF Font Support

Posted by Fotis Jannidis <Fo...@lrz.uni-muenchen.de>.

> Ultimately "The Right Way" to do things is for FOP to produce PDF with
> Unicode and wish that all fonts supported Unicode.
> 
> But for support of those fonts that don't use Unicode, we have to have a
> mapping from Unicode to the encoding used by that font. We currently do that
> for the in-built fonts. We'll have to do it for other non-Unicode fonts.

Considering that there are so many fonts out there, wouldn't it be possible to provide a
way for end users to plug in their own fonts (metrics etc.) plus mappings without needing
to recompile FOP? The FOP project would just provide the plug-in mechanism and collect the
definition files contributed by users (plus the fonts, if they are free). If things are well
documented, users who are not into programming can make use of their own fonts and
share them with others.
Fotis

Re: PDF Font Support

Posted by James Tauber <jt...@jtauber.com>.

> Okay, I'm confused.

My previous email might have made it clear, but I'll explain again.

>  What about the encoding declaration, for example:
>
> <?xml version='1.0' encoding='EUC-JP'?>
>
> How is the encoding of the document, given by the encoding declaration,
> reconciled with the use of a particular font?  Isn't it the responsibility
> of the xml producer to encode the document in a way that is compatible with
> the fonts selected in the fo?

Only insofar as the font must contain glyphs for the characters used.

The encoding of the document is otherwise irrelevant. FOP never sees it. The
XML processor (i.e. the parser) converts to Unicode, and that is what FOP gets.
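This is easy to check with the JDK's built-in XML parser. In the sketch below (class name hypothetical), the same text is stored once as ISO-8859-1 and once as UTF-8; the bytes differ, but the parser hands back identical Unicode strings. Those two encodings are used because every JRE supports them; an EUC-JP document should behave the same way wherever the JRE supports that encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParserGivesUnicode {

    // Serializes a tiny document in the given encoding, parses the bytes back,
    // and returns the text the parser delivers (a Java String, i.e. Unicode).
    static String parseText(String text, String encoding) {
        String xml = "<?xml version='1.0' encoding='" + encoding + "'?><p>" + text + "</p>";
        byte[] bytes = xml.getBytes(Charset.forName(encoding));
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse " + encoding + " document", e);
        }
    }

    public static void main(String[] args) {
        String latin1 = parseText("caf\u00E9", "ISO-8859-1");
        String utf8 = parseText("caf\u00E9", "UTF-8");
        System.out.println(latin1.equals(utf8)); // true
    }
}
```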

FOP currently converts that Unicode back to WinAnsi.

Ultimately "The Right Way" to do things is for FOP to produce PDF with
Unicode and wish that all fonts supported Unicode.

But for support of those fonts that don't use Unicode, we have to have a
mapping from Unicode to the encoding used by that font. We currently do that
for the in-built fonts. We'll have to do it for other non-Unicode fonts.

James

RE: PDF Font Support

Posted by Clay Atkins <ca...@spcmg.com>.

Okay, I'm confused. What about the encoding declaration, for example:

<?xml version='1.0' encoding='EUC-JP'?>

How is the encoding of the document, given by the encoding declaration,
reconciled with the use of a particular font? Isn't it the responsibility
of the XML producer to encode the document in a way that is compatible with
the fonts selected in the FO?

I think it will be fairly easy to do Type 0 fonts. Well, that's a relative
statement, of course.

Re: PDF Font Support

Posted by James Tauber <jt...@jtauber.com>.

> PDF supports type 1, type 0, TrueType, and type 3 fonts.  Type 1,  type 3
> and TrueType fonts rely upon a vector, with an indexing range from 0-255,
> for mapping character codes to glyph names.  These fonts will not work for
> unicode.

Yep. What we might be able to do in the first instance, though, is provide
for mappings from Unicode to certain fonts. For example, I have Greek and
Hebrew fonts that, while they use codes 0-255, can be mapped to Unicode.
Likewise fonts for any language with fewer than 256 characters.
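A hypothetical Java sketch of such a mapping, for a single-byte Greek font that places the Greek letters at Latin-letter positions, as the Symbol font does (alpha at 0x61, beta at 0x62, and so on). Only four letters are listed; the space and digits pass through, and anything unmapped becomes '?'. The class name and table are illustrative, not taken from any real font file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Unicode -> single-byte mapping for a legacy Greek font.
class GreekFontMapping {
    private static final Map<Character, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put('\u03B1', 0x61); // alpha
        TABLE.put('\u03B2', 0x62); // beta
        TABLE.put('\u03B3', 0x67); // gamma
        TABLE.put('\u03B4', 0x64); // delta
    }

    // Converts Unicode text to the font's single-byte codes.
    static byte[] encode(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Integer code = TABLE.get(c);
            if (code == null && (c == ' ' || (c >= '0' && c <= '9'))) {
                code = (int) c; // space and digits keep their ASCII codes
            }
            out[i] = (byte) (code != null ? code.intValue() : '?');
        }
        return out;
    }
}
```

Because the table is just data behind one method, this is the kind of class that could be swapped in by name rather than hard-coded.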

In the FOP code, there is a comment (I think in LineArea.java) where I say
that I don't think we should hard-code the code mapping class. The above is
the reason I had in mind for saying this.

> Type 0 fonts are "composite" fonts that support a more complicated type of
> encoding that allows for multi-byte characters and will work fine for
> unicode; including vertical writing, which is an issue to discuss in more
> detail.

Yes. I won't be satisfied until FOP can be used for at least simple CJKV
documents.

James