Posted to fop-dev@xmlgraphics.apache.org by Tore Engvig <te...@infostream.no> on 2001/03/08 14:01:54 UTC

Font subset embedding and bugfix patch

This patch enables embedding of font subsets for TTF CID fonts. It also
fixes some bugs in the TTFReader, as described in the changes.txt file.

An example pdf file is available at
http://vaggen.net/~tengvig/unicode.pdf

(This is a 50 KB version of the former 8 MB example.)


Tore


Re: Font subset embedding and bugfix patch

Posted by Tore Engvig <te...@infostream.no>.

On Thu, 8 Mar 2001, Peter S. Housel wrote:

> > An example pdf file is available at
> > http://vaggen.net/~tengvig/unicode.pdf
> 
> I haven't tried this patch yet, but I notice that cutting and pasting text
> from the subsetted font in the example file yields garbage.  Is there any
> way to subset a font and still preserve the proper encoding for text
> extraction?

Yes, there is, although not in FOP... At least I think there is.
The way CID-keyed fonts work is that the strings in the PDF become indexes
to glyphs in the font, which most probably don't match any character
encoding scheme. When you create a font subset, you start with an empty
font and add glyphs to it as you go (in the patch this starts at glyph
index 2, using the mapChar method in MultiByteFont).
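To make the mechanism concrete, here is a minimal Java sketch assuming a simple per-document allocator. The class `SubsetSketch` and its method `glyphIndexFor` are hypothetical and are not FOP's `MultiByteFont.mapChar`. Each character gets the next free subset glyph index the first time it is seen, starting at 2 as in the patch, and text is written as a PDF hex string of 2-byte indices. Since those indices depend only on the order of first use, they carry no Unicode information, which is why copied text comes out as garbage. The sketch handles only BMP characters (one Java `char`) and does not model copying glyph outlines out of the TrueType file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of subset glyph allocation for a CID-keyed font (not FOP code). */
final class SubsetSketch {
    // First index handed out to a used character, mirroring "starts at glyph index 2".
    private static final int FIRST_SUBSET_INDEX = 2;

    // Unicode character -> glyph index in the subset font, kept in order of first use.
    private final Map<Character, Integer> charToSubsetGlyph = new LinkedHashMap<>();
    private int nextIndex = FIRST_SUBSET_INDEX;

    /** Returns the subset glyph index for c, allocating a new one on first use. */
    int glyphIndexFor(char c) {
        Integer idx = charToSubsetGlyph.get(c);
        if (idx == null) {
            idx = nextIndex++;
            charToSubsetGlyph.put(c, idx);
        }
        return idx;
    }

    /** Encodes text as a PDF hex string of 2-byte subset glyph indices. */
    String encode(String text) {
        StringBuilder sb = new StringBuilder("<");
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("%04X", glyphIndexFor(text.charAt(i))));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        SubsetSketch font = new SubsetSketch();
        // H -> 2, e -> 3, l -> 4 (reused), o -> 5
        System.out.println(font.encode("Hello"));
    }
}
```

Running `main` prints `<00020003000400040005>`: the repeated `l` reuses index 4, and nothing in those codes says which characters they stand for.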

It is possible to create a ToUnicode CMap for the font. Then all the
strings in the PDF could be mapped back to 16-bit Unicode, and cut and
paste from the PDF would probably work.
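A ToUnicode CMap is a small PostScript-style stream that maps each character code used in the PDF back to Unicode. Below is a minimal Java sketch of generating one, assuming the 2-byte subset glyph indices are used directly as codes and each glyph maps to a single BMP character. `ToUnicodeSketch` and `buildToUnicodeCMap` are hypothetical names; this is not part of the patch, and given the Acrobat trouble described here, treat the output as an illustration of the CMap syntax rather than something validated against a viewer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: emit ToUnicode CMap text for 2-byte subset glyph codes. */
final class ToUnicodeSketch {
    // The CMap format allows at most 100 entries per beginbfchar ... endbfchar block.
    private static final int MAX_ENTRIES_PER_BLOCK = 100;

    static String buildToUnicodeCMap(Map<Integer, Character> glyphToChar) {
        StringBuilder sb = new StringBuilder();
        // Conventional header for a ToUnicode CMap.
        sb.append("/CIDInit /ProcSet findresource begin\n")
          .append("12 dict begin\n")
          .append("begincmap\n")
          .append("/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def\n")
          .append("/CMapName /Adobe-Identity-UCS def\n")
          .append("/CMapType 2 def\n")
          // All codes are two bytes wide.
          .append("1 begincodespacerange\n<0000> <FFFF>\nendcodespacerange\n");

        // Sort by glyph code so the output is deterministic.
        List<Map.Entry<Integer, Character>> entries =
                new ArrayList<>(new TreeMap<>(glyphToChar).entrySet());
        for (int start = 0; start < entries.size(); start += MAX_ENTRIES_PER_BLOCK) {
            int end = Math.min(start + MAX_ENTRIES_PER_BLOCK, entries.size());
            sb.append(end - start).append(" beginbfchar\n");
            for (Map.Entry<Integer, Character> e : entries.subList(start, end)) {
                // <glyph code> <UTF-16 code unit of the character>
                sb.append(String.format("<%04X> <%04X>\n", e.getKey(), (int) e.getValue().charValue()));
            }
            sb.append("endbfchar\n");
        }

        sb.append("endcmap\n")
          .append("CMapName currentdict /CMap defineresource pop\n")
          .append("end\n")
          .append("end\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Subset indices as they might be allocated for "Hello", starting at 2.
        Map<Integer, Character> glyphToChar = new HashMap<>();
        glyphToChar.put(2, 'H');
        glyphToChar.put(3, 'e');
        glyphToChar.put(4, 'l');
        glyphToChar.put(5, 'o');
        System.out.print(buildToUnicodeCMap(glyphToChar));
    }
}
```

In the PDF, a stream like this would be referenced from the /ToUnicode entry of the Type0 font dictionary; the content strings themselves stay as glyph codes.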

My initial try at CID font support used a ToUnicode CMap, but that
didn't work very well... It seems my CMaps never worked: Acrobat choked,
and it isn't very verbose in its error messages. More verbose software
like xpdf and Ghostview doesn't support Type0 fonts at all.

Anyway, even if it did work, a ToUnicode CMap would make things a little
more complicated and rule out some size optimizations in the PDF. Also,
if/when support for ligatures is added, ligatures would probably look
garbled when pasted into a document that uses a different font than the
one used to generate the PDF.

Is anyone using cut'n'paste from PDF documents?


Tore




> 
> Cheers,
> -Peter S. Housel-   housel@acm.org   http://members.home.com/housel/
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
> For additional commands, email: fop-dev-help@xml.apache.org
> 
> 




Re: Font subset embedding and bugfix patch

Posted by "Peter S. Housel" <ho...@acm.org>.
> An example pdf file is available at
> http://vaggen.net/~tengvig/unicode.pdf

I haven't tried this patch yet, but I notice that cutting and pasting text
from the subsetted font in the example file yields garbage.  Is there any
way to subset a font and still preserve the proper encoding for text
extraction?

Cheers,
-Peter S. Housel-   housel@acm.org   http://members.home.com/housel/


