
Posted to fop-dev@xmlgraphics.apache.org by Robert <rm...@hotmail.co.uk> on 2014/03/07 12:23:37 UTC

[VOTE] Applying the Type 1 subset patch

Hi All,

About a week ago I posted a patch to add Type 1 subset support to FOP. All referenced Type 1 fonts (unless set to embedding-mode="full") will now be subset by default, much like the existing behaviour for TrueType and OpenType fonts. As this is a big and quite involved feature, I think it is necessary to vote on whether to add it to FOP in its current state. I'm not sure whether anyone has looked at what has gone into the patch or tried it out yet, but it would be worth doing so before making your decision.

I am going to be away for the next week or so but will tally up the votes and post the result once I am back.

Here is a link to the patch and issue:
https://issues.apache.org/jira/browse/FOP-2354

Regards,

Robert Meyer

Re: [VOTE] Applying the Type 1 subset patch

Posted by Vincent Hennebert <vh...@gmail.com>.

On 07/03/14 12:23, Robert wrote:
> Hi All,
>
> About a week ago I posted a patch to add Type 1 subset support to FOP. All referenced Type 1 fonts (unless set to embedding-mode="full") will now be subset by default much like the behaviour exhibited by TrueType and OpenType. As this is a big feature and quite involved I think it is necessary to vote on whether to add this feature in it's current state to FOP. I'm not sure if anyone has taken a look at what has gone into this or tried it out yet, but it might be worth doing so before making your decision.
>
> I am going to be away for the next week or so but will tally up the votes and post the result once I am back.
>
> Here is a link to the patch and issue:
> https://issues.apache.org/jira/browse/FOP-2354
>
> Regards,
>
> Robert Meyer

From the quick look I had at the patch, I must say that some things are
sources of concern to me:
• The PostScript parser seems to be mixing lexical analysis, syntax
   analysis and interpretation. This makes it hard to follow and I could
   not figure out the meanings of the conditions in the various ‘if’
   statements inside the ‘parse’ method. Also, part of the parsing seems
   to be leaking into Type1SubsetFile. I’m concerned about the robustness
   of the thing. For example, there are unguarded calls to
   Integer.parseInt. How tolerant will that be to malformed font files?
• It seems that Type1SubsetFile tries to infer the mapping of character
   codes to glyph names. That essentially re-does what the mapChar method
   has already done earlier, with probable mismatch between the outputs
   of the two methods. In Type1SubsetFile.readEncoding I see references
   to the WinAnsi encoding, which may have nothing to do at all with the
   font’s own encoding. I suspect this is the source of the exception
   thrown when running the FO I attached to the issue.
• There is a lot of memory allocation. First, the font is entirely
   loaded in memory in Type1SubsetFile.createSubset, then again in
   PFBParser, plus data copied around when creating the subset. Surely
   some of this memory allocation can be avoided. Have you profiled the
   code? How much slower is it compared to fully embedding the font?
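
To illustrate the robustness point about unguarded Integer.parseInt
calls, here is a minimal sketch of a guarded alternative. It is not
taken from the patch: the class and method names (GuardedParse,
tryParseInt) are hypothetical, and how a real parser should react to a
bad token (skip it, fall back to a default, or abort with a clear
message) is left open.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of FOP or of the patch.
final class GuardedParse {

    /** Returns the parsed value, or an empty result if the token is not a valid int. */
    static OptionalInt tryParseInt(String token) {
        if (token == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(token.trim()));
        } catch (NumberFormatException e) {
            // A malformed font file ends up here instead of crashing the parser.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("256").isPresent());
        System.out.println(tryParseInt("25x6").isPresent());
    }
}
```

The caller then has to decide explicitly what an empty result means,
which is the point: the failure mode becomes visible in the code.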

Due to the possible regressions and the potential impact on performance,
I must vote -1 against enabling Type 1 subsetting by default. If Type 1
subsetting is left as an option that can be manually configured by the
user, then I vote +0.

Vincent

Re: [VOTE] Applying the Type 1 subset patch

Posted by Clay Leeds <th...@gmail.com>.

+1 from me. Nice work, Robert!

Clay


Re: [VOTE] Applying the Type 1 subset patch

Posted by Pascal Sancho <ps...@gmail.com>.

Hi Robert,

+1 for me, I like it.

I tried your patch with a short XSL-FO file containing a single Lorem
ipsum paragraph, and I got a 20kB file rather than a 35kB one, using a
font that consists of 6 files (light+regular+bold).




-- 
pascal

RE: [VOTE] Applying the Type 1 subset patch

Posted by Robert <rm...@hotmail.co.uk>.

Hi All,

Thanks for your votes and for testing the code. Having read the feedback, I don't think it would be right to simply modify the patch and push it through as a disabled-by-default feature, so I will register Vincent's vote as a -1 and look to address his and Luis's concerns.

Regarding Vincent's point about the PostScript parser, the matter is complicated by the nature of the code being parsed. A traditional approach would be to scan the file for tokens (perhaps using a StringTokenizer) and then pass those to the interpreter. Unfortunately, PostScript Type 1 fonts contain a mixture of regular code and binary data (subroutines and CharString data). If a traditional tokenizer were used, the binary data would inevitably become corrupted. The alternative I chose keeps these sections intact and accessible whilst providing the means to parse tokens and interpret them as part of an expandable solution. There may be other solutions, but any parser would need to work on a byte-by-byte basis rather than taking the whole input and returning a list of tokens.

I am going to leave the current implementation as it is, but I will look to address the BaKoMa font problem Luis found and perform more extensive testing with other Type 1 fonts to try to prevent any further issues.
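
To make the byte-by-byte point concrete, here is a minimal sketch. It is not the code in the patch: the class name ByteTokenScanner and its methods are hypothetical, and it assumes the usual decrypted Type 1 layout in which a length and an RD (or -|) operator, followed by exactly one delimiter byte, precede each run of binary data.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the PostscriptParser from the patch.
final class ByteTokenScanner {

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '\f' || b == 0;
    }

    /** Returns the token as a non-negative int, or -1 if it is not one. */
    private static int parseLength(String token) {
        try {
            int n = Integer.parseInt(token);
            return n >= 0 ? n : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /**
     * Walks the input byte by byte. Text tokens are returned as String;
     * the n raw bytes that follow "n RD " (or "n -| ") are returned as
     * an untouched byte[] rather than being split on whitespace.
     */
    static List<Object> scan(byte[] data) {
        List<Object> out = new ArrayList<>();
        String previous = null;
        int pos = 0;
        while (pos < data.length) {
            while (pos < data.length && isWhitespace(data[pos])) {
                pos++;
            }
            int start = pos;
            while (pos < data.length && !isWhitespace(data[pos])) {
                pos++;
            }
            if (start == pos) {
                break; // only trailing whitespace was left
            }
            String token = new String(data, start, pos - start, StandardCharsets.ISO_8859_1);
            out.add(token);
            if ((token.equals("RD") || token.equals("-|")) && previous != null) {
                int length = parseLength(previous);
                int binStart = pos + 1; // skip the single delimiter byte after the operator
                if (length >= 0 && binStart <= data.length && length <= data.length - binStart) {
                    out.add(Arrays.copyOfRange(data, binStart, binStart + length));
                    pos = binStart + length;
                }
            }
            previous = token;
        }
        return out;
    }

    public static void main(String[] args) {
        // "/a 3 RD " + three binary bytes (two of them whitespace values) + " ND"
        byte[] data = {'/', 'a', ' ', '3', ' ', 'R', 'D', ' ', 0x20, 0x0A, 0x41, ' ', 'N', 'D'};
        for (Object item : scan(data)) {
            System.out.println(item instanceof byte[]
                    ? "binary, " + ((byte[]) item).length + " bytes"
                    : "token " + item);
        }
    }
}
```

A whitespace-splitting tokenizer would treat the 0x20 and 0x0A bytes in that sample as separators and lose them, which is the corruption I am referring to.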

I will look to address the other issues you both raised in the coming weeks.

Thanks for your input.

Robert Meyer

Date: Mon, 17 Mar 2014 00:19:18 +0000
From: lmpmbernardo@gmail.com
To: fop-dev@xmlgraphics.apache.org
Subject: Re: [VOTE] Applying the Type 1 subset patch

> I performed some further tests, still on Mac, but with a couple of
> ghostscript type1 fonts, which are probably the same one finds in
> Linux.
>
> The test was successful in that the output looked good (for the
> record I has some unpredictable output between different runs
> which I could not reliably reproduce so I attribute that to an
> environment issue, maybe the .fop directory).
>
> My example included characters not present in the font. Instead of
> # for the missing glyph I got z (see example attached), which
> probably is not intended (i.e., looks like a bug). I was also
> expecting that Adobe would indicate that the fonts are subset but
> it doesn't but this could be a wrong expectation (the subset file
> is nevertheless considerably smaller -- 64KB versus 219 KB).
>
> Finally I ran a simple performance test. With the patched code
> (that produces subset) the time was 175 msecs. With the current
> trunk 83 msecs.
>
> So I think the suggestion that Vincent put forward to not make
> subset the default for type1 makes sense for now. I think this
> requires a new vote with a new patch.
>
> On 3/12/14, 12:06 AM, Luis Bernardo wrote:
>>
>> Since apparently Macs have no type1 fonts I had to look for some
>> and I tried the first one from http://www.ctan.org/tex-archive/fonts/cm/ps-type1/bakoma
>> (cmb10) which gave a problem:
>>
>> java.util.NoSuchElementException
>>     at java.util.Scanner.throwFor(Scanner.java:907)
>>     at java.util.Scanner.next(Scanner.java:1530)
>>     at java.util.Scanner.nextInt(Scanner.java:2160)
>>     at java.util.Scanner.nextInt(Scanner.java:2119)
>>     at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.addEntry(PostscriptParser.java:379)
>>     at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.parseToken(PostscriptParser.java:329)
>>     .....
>>
>> So it seems this needs to be tested with more fonts. But I will
>> test next in with the default Linux type1 fonts.
>>
>> On 3/7/14, 11:23 AM, Robert wrote:
>>> Hi All,
>>>
>>> About a week ago I posted a patch to add Type 1 subset support
>>> to FOP. All referenced Type 1 fonts (unless set to
>>> embedding-mode="full") will now be subset by default much like
>>> the behaviour exhibited by TrueType and OpenType. As this is a
>>> big feature and quite involved I think it is necessary to vote
>>> on whether to add this feature in it's current state to FOP.
>>> I'm not sure if anyone has taken a look at what has gone into
>>> this or tried it out yet, but it might be worth doing so
>>> before making your decision.
>>>
>>> I am going to be away for the next week or so but will tally
>>> up the votes and post the result once I am back.
>>>
>>> Here is a link to the patch and issue:
>>> https://issues.apache.org/jira/browse/FOP-2354
>>>
>>> Regards,
>>>
>>> Robert Meyer

Re: [VOTE] Applying the Type 1 subset patch

Posted by Vincent Hennebert <vh...@gmail.com>.

On 17/03/14 01:19, Luis Bernardo wrote:
>
> I performed some further tests, still on Mac, but with a couple of ghostscript
> type1 fonts, which are probably the same one finds in Linux.
>
> The test was successful in that the output looked good (for the record I has
> some unpredictable output between different runs which I could not reliably
> reproduce so I attribute that to an environment issue, maybe the .fop directory).
>
> My example included characters not present in the font. Instead of # for the
> missing glyph I got z (see example attached), which probably is not intended
> (i.e., looks like a bug). I was also expecting that Adobe would indicate that
> the fonts are subset but it doesn't but this could be a wrong expectation (the

This is probably because the font’s PostScript name doesn’t start with
a subset tag (6 uppercase letters followed by a +), as it should. Also,
it may be necessary to add a CharSet entry to the font descriptor.
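
As a rough sketch of both points (not code from the patch or from FOP;
the class SubsetNaming and its methods are hypothetical): the tag is six
uppercase letters plus a +, prepended to the PostScript name, and the
CharSet value, as far as I read the PDF specification, is a string of
glyph names each preceded by a slash, with .notdef left out. Please
check the exact rules against the specification before relying on this.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only.
final class SubsetNaming {

    /** Builds a subset tag: six uppercase letters followed by '+'. */
    static String newTag(Random random) {
        StringBuilder sb = new StringBuilder(7);
        for (int i = 0; i < 6; i++) {
            sb.append((char) ('A' + random.nextInt(26)));
        }
        return sb.append('+').toString();
    }

    /** Prefixes the font's PostScript name with the given tag. */
    static String subsetName(String tag, String postScriptName) {
        return tag + postScriptName;
    }

    /** True if the name starts with six uppercase letters and a '+'. */
    static boolean hasSubsetTag(String fontName) {
        return fontName.matches("[A-Z]{6}\\+.+");
    }

    /** Builds the body of a CharSet string: "/name" per glyph, .notdef omitted. */
    static String charSet(List<String> glyphNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : glyphNames) {
            if (!".notdef".equals(name)) {
                sb.append('/').append(name);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = subsetName(newTag(new Random()), "CMB10");
        System.out.println(name + " tagged: " + hasSubsetTag(name));
        System.out.println(charSet(Arrays.asList(".notdef", "a", "space")));
    }
}
```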


> subset file is nevertheless considerably smaller -- 64KB versus 219 KB).
>
> Finally I ran a simple performance test. With the patched code (that produces
> subset) the time was 175 msecs. With the current trunk 83 msecs.
>
> So I think the suggestion that Vincent put forward to not make subset the
> default for type1 makes sense for now. I think this requires a new vote with a
> new patch.

Vincent


> On 3/12/14, 12:06 AM, Luis Bernardo wrote:
>>
>> Since apparently Macs have no type1 fonts I had to look for some and I tried
>> the first one from http://www.ctan.org/tex-archive/fonts/cm/ps-type1/bakoma
>> (cmb10) which gave a problem:
>>
>> java.util.NoSuchElementException
>>     at java.util.Scanner.throwFor(Scanner.java:907)
>>     at java.util.Scanner.next(Scanner.java:1530)
>>     at java.util.Scanner.nextInt(Scanner.java:2160)
>>     at java.util.Scanner.nextInt(Scanner.java:2119)
>>     at
>> org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.addEntry(PostscriptParser.java:379)
>>
>>     at
>> org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.parseToken(PostscriptParser.java:329)
>>
>>     .....
>>
>> So it seems this needs to be tested with more fonts. But I will test next in
>> with the default Linux type1 fonts.
>>
>> On 3/7/14, 11:23 AM, Robert wrote:
>>> Hi All,
>>>
>>> About a week ago I posted a patch to add Type 1 subset support to FOP. All
>>> referenced Type 1 fonts (unless set to embedding-mode="full") will now be
>>> subset by default much like the behaviour exhibited by TrueType and
>>> OpenType. As this is a big feature and quite involved I think it is
>>> necessary to vote on whether to add this feature in it's current state to
>>> FOP. I'm not sure if anyone has taken a look at what has gone into this or
>>> tried it out yet, but it might be worth doing so before making your decision.
>>>
>>> I am going to be away for the next week or so but will tally up the votes
>>> and post the result once I am back.
>>>
>>> Here is a link to the patch and issue:
>>> https://issues.apache.org/jira/browse/FOP-2354
>>>
>>> Regards,
>>>
>>> Robert Meyer
>>
>
>

Re: [VOTE] Applying the Type 1 subset patch

Posted by Luis Bernardo <lm...@gmail.com>.

I performed some further tests, still on Mac, but with a couple of
Ghostscript Type 1 fonts, which are probably the same ones found on Linux.

The test was successful in that the output looked good. (For the record, I
had some unpredictable output between different runs which I could not
reliably reproduce, so I attribute that to an environment issue, maybe
the .fop directory.)

My example included characters not present in the font. Instead of # for
the missing glyph I got z (see example attached), which is probably not
intended (i.e., it looks like a bug). I was also expecting that Adobe would
indicate that the fonts are subset, but it doesn't. This could be a
wrong expectation, though (the subset file is nevertheless considerably
smaller: 64 KB versus 219 KB).

Finally, I ran a simple performance test. With the patched code (which
produces the subset) the time was 175 ms. With the current trunk it was
83 ms.
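
For anyone who wants to repeat this kind of comparison, here is a
minimal timing sketch. It is not the harness I used: the class
RunTimer and its methods are hypothetical, and the Runnable stands in
for a complete FOP rendering run, which is not shown. The ratio helper
just puts the two figures above into proportion.

```java
// Hypothetical timing helper for illustration only.
final class RunTimer {

    /** Average wall-clock time per run in milliseconds, after some warm-up runs. */
    static double averageMillis(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // let class loading and JIT compilation settle
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1e6 / runs;
    }

    /** How many times larger the first figure is than the second. */
    static double ratio(double first, double second) {
        return first / second;
    }

    public static void main(String[] args) {
        // Figures reported above: 175 ms patched versus 83 ms trunk,
        // and 64 KB subset versus 219 KB full embedding.
        System.out.printf("time ratio: %.2f%n", ratio(175, 83));
        System.out.printf("size ratio: %.2f%n", ratio(64, 219));
        System.out.printf("empty task: %.3f ms%n", averageMillis(() -> { }, 3, 10));
    }
}
```

A single measurement like mine is noisy, so several runs after a
warm-up would give a fairer picture of the overhead.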

So I think Vincent's suggestion not to make subsetting the default for
Type 1 makes sense for now. I think this requires a new vote with a new
patch.

On 3/12/14, 12:06 AM, Luis Bernardo wrote:
>
> Since apparently Macs have no type1 fonts I had to look for some and I 
> tried the first one from 
> http://www.ctan.org/tex-archive/fonts/cm/ps-type1/bakoma (cmb10) which 
> gave a problem:
>
> java.util.NoSuchElementException
>     at java.util.Scanner.throwFor(Scanner.java:907)
>     at java.util.Scanner.next(Scanner.java:1530)
>     at java.util.Scanner.nextInt(Scanner.java:2160)
>     at java.util.Scanner.nextInt(Scanner.java:2119)
>     at 
> org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.addEntry(PostscriptParser.java:379)
>     at 
> org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.parseToken(PostscriptParser.java:329)
>     .....
>
> So it seems this needs to be tested with more fonts. But I will test 
> next in with the default Linux type1 fonts.
>
> On 3/7/14, 11:23 AM, Robert wrote:
>> Hi All,
>>
>> About a week ago I posted a patch to add Type 1 subset support to 
>> FOP. All referenced Type 1 fonts (unless set to 
>> embedding-mode="full") will now be subset by default much like the 
>> behaviour exhibited by TrueType and OpenType. As this is a big 
>> feature and quite involved I think it is necessary to vote on whether 
>> to add this feature in it's current state to FOP. I'm not sure if 
>> anyone has taken a look at what has gone into this or tried it out 
>> yet, but it might be worth doing so before making your decision.
>>
>> I am going to be away for the next week or so but will tally up the 
>> votes and post the result once I am back.
>>
>> Here is a link to the patch and issue:
>> https://issues.apache.org/jira/browse/FOP-2354
>>
>> Regards,
>>
>> Robert Meyer
>

Re: [VOTE] Applying the Type 1 subset patch

Posted by Luis Bernardo <lm...@gmail.com>.

Since Macs apparently have no Type 1 fonts, I had to look for some. I
tried the first one from
http://www.ctan.org/tex-archive/fonts/cm/ps-type1/bakoma (cmb10), which
gave a problem:

java.util.NoSuchElementException
     at java.util.Scanner.throwFor(Scanner.java:907)
     at java.util.Scanner.next(Scanner.java:1530)
     at java.util.Scanner.nextInt(Scanner.java:2160)
     at java.util.Scanner.nextInt(Scanner.java:2119)
     at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.addEntry(PostscriptParser.java:379)
     at org.apache.fop.fonts.type1.PostscriptParser$PSFixedArray.parseToken(PostscriptParser.java:329)
     .....

So it seems this needs to be tested with more fonts. I will test next
with the default Linux Type 1 fonts.
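
The trace shows Scanner.nextInt being called when no further token is
available. I have not checked what addEntry expects from the font, so
the following is only a sketch of the general guard, with a
hypothetical class (GuardedEntry) and an assumed entry shape of
"dup <index> <value> put":

```java
import java.util.OptionalInt;
import java.util.Scanner;

// Hypothetical sketch, not the PSFixedArray code from the patch.
final class GuardedEntry {

    /**
     * Reads the integer index that follows the first token of an entry.
     * Returns an empty result instead of throwing when it is missing.
     */
    static OptionalInt entryIndex(String entry) {
        try (Scanner scanner = new Scanner(entry)) {
            if (scanner.hasNext()) {
                scanner.next(); // skip the leading operator, e.g. "dup"
            }
            // hasNextInt() is false both at end of input and on a non-numeric token
            return scanner.hasNextInt()
                    ? OptionalInt.of(scanner.nextInt())
                    : OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(entryIndex("dup 32 /space put"));
        System.out.println(entryIndex("dup"));
    }
}
```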

On 3/7/14, 11:23 AM, Robert wrote:
> Hi All,
>
> About a week ago I posted a patch to add Type 1 subset support to FOP. 
> All referenced Type 1 fonts (unless set to embedding-mode="full") will 
> now be subset by default much like the behaviour exhibited by TrueType 
> and OpenType. As this is a big feature and quite involved I think it 
> is necessary to vote on whether to add this feature in it's current 
> state to FOP. I'm not sure if anyone has taken a look at what has gone 
> into this or tried it out yet, but it might be worth doing so before 
> making your decision.
>
> I am going to be away for the next week or so but will tally up the 
> votes and post the result once I am back.
>
> Here is a link to the patch and issue:
> https://issues.apache.org/jira/browse/FOP-2354
>
> Regards,
>
> Robert Meyer

Re: [VOTE] Applying the Type 1 subset patch

Posted by Chris Bowditch <bo...@hotmail.com>.

Hi Rob,

+1 from me. Good work.

Thanks,

Chris


RE: [VOTE] Applying the Type 1 subset patch

Posted by Robert <rm...@hotmail.co.uk>.

The (optional) FontBox library dependency was added for the OpenType font / subset support, which is already in trunk. This patch for subsetting Type 1 fonts adds no new dependencies and does not use FontBox.

From: glenn@skynav.com
Date: Fri, 7 Mar 2014 10:23:18 -0700
Subject: Re: [VOTE] Applying the Type 1 subset patch
To: fop-dev@xmlgraphics.apache.org

> On Fri, Mar 7, 2014 at 4:23 AM, Robert <rm...@hotmail.co.uk> wrote:
>
>> Hi All,
>>
>> About a week ago I posted a patch to add Type 1 subset support to FOP. All referenced Type 1 fonts (unless set to embedding-mode="full") will now be subset by default much like the behaviour exhibited by TrueType and OpenType. As this is a big feature and quite involved I think it is necessary to vote on whether to add this feature in it's current state to FOP. I'm not sure if anyone has taken a look at what has gone into this or tried it out yet, but it might be worth doing so before making your decision.
>>
>> I am going to be away for the next week or so but will tally up the votes and post the result once I am back.
>>
>> Here is a link to the patch and issue:
>> https://issues.apache.org/jira/browse/FOP-2354
>
> Just to remind me, what new (external) library dependencies does this entail? FontBox?
>
>> Regards,
>>
>> Robert Meyer

Re: [VOTE] Applying the Type 1 subset patch

Posted by Glenn Adams <gl...@skynav.com>.

On Fri, Mar 7, 2014 at 4:23 AM, Robert <rm...@hotmail.co.uk> wrote:

> Hi All,
>
> About a week ago I posted a patch to add Type 1 subset support to FOP. All
> referenced Type 1 fonts (unless set to embedding-mode="full") will now be
> subset by default much like the behaviour exhibited by TrueType and
> OpenType. As this is a big feature and quite involved I think it is
> necessary to vote on whether to add this feature in it's current state to
> FOP. I'm not sure if anyone has taken a look at what has gone into this or
> tried it out yet, but it might be worth doing so before making your
> decision.
>
> I am going to be away for the next week or so but will tally up the votes
> and post the result once I am back.
>
> Here is a link to the patch and issue:
> https://issues.apache.org/jira/browse/FOP-2354
>

Just to remind me, what new (external) library dependencies does this
entail? FontBox?


>
>
> Regards,
>
> Robert Meyer
>

Re: [VOTE] Applying the Type 1 subset patch

Posted by Glenn Adams <gl...@skynav.com>.

+1

