Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2008/11/11 23:29:45 UTC
[jira] Updated: (PDFBOX-318) Error getting pdf version

     [ https://issues.apache.org/jira/browse/PDFBOX-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-318:
--------------------------------------

    Attachment: patch_pdfbox-318.diff

Obviously, some files have a malformed header. I've made a patch that makes the extraction of the version of a PDF document more robust by adding a new method to the class org.apache.pdfbox.pdfparser.PDFParser. With this patch, the document mentioned in the issue description is parsed without problems.
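
As a rough illustration of the idea only (this is not the attached patch), a tolerant version lookup for a single header line could be sketched as below. The class name HeaderVersionSketch and the method parseVersion are hypothetical and not part of PDFBox; returning -1 for an unusable line is likewise just an assumption made for the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
class HeaderVersionSketch
{
    // "%PDF-" followed by <digits>.<digits>, allowed anywhere in the line
    // so that leading garbage bytes are skipped.
    private static final Pattern HEADER = Pattern.compile( "%PDF-(\\d+\\.\\d+)" );

    /**
     * Returns the version found in the line, or -1 if the line
     * contains no usable header (assumed convention for this sketch).
     */
    static float parseVersion( String line )
    {
        Matcher m = HEADER.matcher( line );
        if( !m.find() )
        {
            return -1f;
        }
        // The regex guarantees a well-formed number, so parseFloat cannot
        // fail with a NumberFormatException here.
        return Float.parseFloat( m.group( 1 ) );
    }

    public static void main( String[] args )
    {
        // Header preceded by garbage characters.
        System.out.println( parseVersion( "\u0000\u001fgarbage%PDF-1.3" ) );
        // Malformed version, similar to the "-" input from the stack trace.
        System.out.println( parseVersion( "%PDF--" ) );
    }
}
```

Matching the digits explicitly means an input such as "-" is reported as "no header" instead of surfacing as a NumberFormatException.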



> Error getting pdf version
> -------------------------
>
>                 Key: PDFBOX-318
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-318
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>         Attachments: patch_pdfbox-318.diff
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1822452
> Originally submitted by nobody on 2007-10-29 17:37.
> java.io.IOException: Error getting pdf version:java.lang.NumberFormatException: For input string: "-"
>  at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:166)
>  at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:707)
>  at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:691)
>  at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:633)
>  at test.pdfbox.pdfparser.TestPDFParser.test_exception_version1(TestPDFParser.java:112)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>  at java.lang.reflect.Method.invoke(Unknown Source)
>  at junit.framework.TestCase.runTest(TestCase.java:154)
>  at junit.framework.TestCase.runBare(TestCase.java:127)
>  at junit.framework.TestResult$1.protect(TestResult.java:106)
>  at junit.framework.TestResult.runProtected(TestResult.java:124)
>  at junit.framework.TestResult.run(TestResult.java:109)
>  at junit.framework.TestCase.run(TestCase.java:118)
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1822452&file_id=251894
> exception_version1.pdf (application/pdf), 196864 bytes
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> Someone can put in a better, more thoughtful fix.
> Here is what I did to fix it.
> PDFParser.java:
>     public void parse() throws IOException
>     {
>         try
>         {
>             if ( raf == null )
>             {
>                 checktmpDir();
>                 document = new COSDocument( tempDirectory );
>             }
>             else
>             {
>                 document = new COSDocument( raf );
>             }
>             setDocument( document );
>             findVersion();   // New method see below.
>             // Code to find version moved to method findVersion();
>             skipHeaderFillBytes();
>             Object nextObject;
>             [...]
> ----
>     /**
>      * Attempt to find version in the following form %PDF-<number><0a|0d>%
>      * @throws IOException
>      */
>     private void findVersion() throws IOException
>     {
>         String header = null;
>         // try 5 lines to get PDF Version.
>         for ( int i = 0; i < 5; i++) {
>             header = readLine();
>             
>             // Sometimes there are garbage bytes before the header
>             // actually starts, so let's try to find the header first.
>             int headerStart = header.indexOf( PDF_HEADER );
>             if( headerStart < 0 )
>             {
>                 continue;  // Header not found, go look at the next line.
>             }
>             // Trim off any leading characters; when headerStart is zero
>             // there is nothing to trim.
>             if( headerStart > 0 )
>             {
>                 header = header.substring( headerStart );
>             }
>             
>             document.setHeaderString( header );  
>             try
>             {
>                 float pdfVersion = Float.parseFloat( 
>                     header.substring( PDF_HEADER.length(), Math.min( header.length(), PDF_HEADER.length()+3) ) );
>                 document.setVersion( pdfVersion );
>                 return;  // Express return.
>             }
>             catch( NumberFormatException e )
>             {
>                 throw new IOException( "Error getting pdf version: " + header + "\n" + e );
>             }            
>         }
>         throw new IOException( "Unable to find version");            
>     }
> ----
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> Debugged it with a hex dump on the submitted file 
> ---
> It appears that the version header starts at offset 0x80 instead of on the first line.
> Adobe Reader 7.x appears to skip ahead to the version header and display the rest properly.
> So I think something needs to be done about the version checking in PDFParser::parse().
> 00000000: 001f 3339 3339 202d 2057 4648 202d 2050  ..3939 - WFH - P
> 00000010: 7265 7020 666f 2331 3533 3245 332e 7064  rep fo#1532E3.pd
> 00000020: 6600 0000 0000 0000 0000 0000 0000 0000  f...............
> 00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 00000050: 0000 0000 0300 2100 0000 00c2 550d 05c2  ......!.....U...
> 00000060: 550d 0500 0000 0000 0000 0000 0000 0000  U...............
> 00000070: 0000 0000 0000 0000 0000 8181 af49 0000  .............I..
> 00000080: 2550 4446 2d31 2e33 0a25 c4e5 f2e5 eba7  %PDF-1.3.%......
> 00000090: f3a0 d0c4 c60a 3220 3020 6f62 6a0a 3c3c  ......2 0 obj.<<
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> Tested on 0.7.2, 0.7.3, latest 0.7.4-2007-10-22
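
The hex dump quoted above shows the header starting at offset 0x80 behind a block of unrelated bytes. A minimal sketch of locating it in the raw bytes, instead of line by line, might look like this. HeaderOffsetSketch and findHeaderOffset are hypothetical names, not PDFBox API, and the search limit passed in (for example 1024 bytes) is an assumption, not something taken from the issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
class HeaderOffsetSketch
{
    private static final byte[] MARKER = "%PDF-".getBytes( StandardCharsets.US_ASCII );

    /**
     * Returns the offset of the first "%PDF-" marker that lies completely
     * within the first 'limit' bytes of 'data', or -1 if there is none.
     */
    static int findHeaderOffset( byte[] data, int limit )
    {
        // Last index at which the whole marker still fits.
        int end = Math.min( data.length, limit ) - MARKER.length;
        for( int i = 0; i <= end; i++ )
        {
            boolean match = true;
            for( int j = 0; j < MARKER.length && match; j++ )
            {
                match = data[i + j] == MARKER[j];
            }
            if( match )
            {
                return i;
            }
        }
        return -1;
    }
}
```

For a buffer laid out like the dump, with 0x80 filler bytes followed by "%PDF-1.3", findHeaderOffset( data, 1024 ) would return 0x80, and parsing could continue from there.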

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.