Posted to dev@pdfbox.apache.org by "Øyvind Berg (JIRA)" <ji...@apache.org> on 2011/05/04 18:56:03 UTC

[jira] [Created] (PDFBOX-1007) Maven performs textual filtering of binary resources [patch]

Maven performs textual filtering of binary resources [patch]
------------------------------------------------------------

                 Key: PDFBOX-1007
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1007
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.6.0
         Environment: Mac OS X 10.6.7, Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326), Apache Maven 3.0.2 (r1056850; 2011-01-09 01:58:10+0100)
            Reporter: Øyvind Berg
             Fix For: 1.6.0


This applies to current svn, r1099514.

This bit me when many of my files failed with the following stack trace:

Error while processing PDF:
Caused by: java.io.IOException: head is mandatory
     at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:107)
     at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:61)
     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:90)
     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:26)
     at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:66)
     at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:26)
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:204)
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:188)
     at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:114)
     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:116)
     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
     at org.elacin.pdfextract.datasource.pdfbox.PDFBoxIntegration.processPage(PDFBoxIntegration.java:797)
     at org.elacin.pdfextract.datasource.pdfbox.PDFBoxIntegration.processDocument(PDFBoxIntegration.java:502)
     at org.elacin.pdfextract.datasource.pdfbox.PDFBoxSource.readPages(PDFBoxSource.java:74)
     ... 3 more


The cause was that binary files (in this case resources/ttf/ArialMT.ttf) were subject to Maven resource filtering, which inserted Unicode "unknown character" symbols into them and corrupted the font data. Please consider fixing this by turning filtering off in trunk/pdfextract/pom.xml as follows:

<resource>
    <directory>src/main/resources</directory>
    <filtering>false</filtering>
</resource>
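
For illustration only, here is a minimal Java sketch of why treating a binary file as text can damage it. It does not reproduce Maven's filtering code; it only assumes that the bytes are decoded to characters and encoded again somewhere along the way, and UTF-8 is an assumed charset. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of PDFBox or Maven.
class FilteringCorruptionDemo {

    // Simulates a text-oriented copy: decode the bytes as UTF-8 (assumed
    // charset), then encode the resulting string back to bytes.
    static byte[] roundTripAsText(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The sfnt version bytes that start a TrueType file (0x00010000),
        // followed by two arbitrary bytes that are not valid UTF-8.
        byte[] original = {0x00, 0x01, 0x00, 0x00, (byte) 0xFF, (byte) 0xFE};
        byte[] copied = roundTripAsText(original);

        System.out.println("original: " + Arrays.toString(original));
        System.out.println("copied:   " + Arrays.toString(copied));
        // Invalid byte sequences are replaced by U+FFFD during decoding,
        // so the copy no longer matches the original.
        System.out.println("identical: " + Arrays.equals(original, copied));
    }
}
```

Because every undecodable byte becomes a multi-byte replacement character, the table offsets in a font copied this way no longer point at the right data, which would be consistent with a parser failing to find the mandatory "head" table.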


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PDFBOX-1007) Maven performs textual filtering of binary resources [patch]

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-1007:
---------------------------------------

    Fix Version/s:     (was: 1.6.0)
