Posted to users@pdfbox.apache.org by Jack Bush <ne...@yahoo.com.au> on 2011/08/11 10:30:08 UTC
An exception occured in parsing the PDF Document.

Hi All,
 
I am getting the following exception when trying to convert many PDF files to text files in a loop:
 
ABC.pdf
PDF to Text conversion of ABC.txt has succeeded
An exception occured in parsing the PDF Document.
java.io.IOException: Error: Header doesn't contain versioninfo
XYZ.pdf
            at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:312)
            at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169)
            at PDF2Text.PDFTextParser.pdftoText(PDFTextParser.java:39)
            at hpg.ImportHPGData.main(ImportHPGData.java:43)
 
Below is the PDFBox example, with the pdftoText() and writeTexttoFile() methods merged into one:
 
    public boolean pdftoText(String pdfSource, String txtTarget) {
 
        try {
            parser = new PDFParser(new FileInputStream(new File(pdfSource)));
        } catch (Exception e) {
            System.out.println("Unable to open " + pdfSource);
        }
        
        try 
        {
            parser.parse();
            cosDoc = parser.getDocument();
            pdfStripper = new PDFTextStripper();
            pdDoc = new PDDocument(cosDoc);
            parsedText = pdfStripper.getText(pdDoc);
            if (parsedText == null) {
                System.out.println("File " + pdfSource + " has failed PDF to Text Conversion.");
                return false;
            }
            else 
            {
                BufferedWriter txtTargetBW = new BufferedWriter(new FileWriter(txtTarget));
                txtTargetBW.write(parsedText);
                txtTargetBW.close();
                try {
                    if (cosDoc != null) cosDoc.close();
                    if (pdDoc != null) pdDoc.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        } 
        catch (Exception e) 
        {
            System.out.println("An exception occured in parsing the PDF Document.");
            e.printStackTrace();
        }
        return true;
    }
 
Any idea why this is occurring? I had no problem converting an individual file, or using the original example in which the two methods were separate.
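
In case it helps narrow this down, here is a rough sketch of a check that could be run over the same files to see whether each one begins with the "%PDF-" marker, which is presumably what parseHeader() is looking for. The class name PdfHeaderCheck and the method looksLikePdf() are hypothetical names made up for this sketch; they are not part of PDFBox, and the check assumes Java 7 or later. It is deliberately strict (the marker must be the very first bytes), so a file it rejects may still open in a more lenient reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for diagnosis only; not part of PDFBox.
public class PdfHeaderCheck {

    // A PDF file normally starts with this marker, followed by a version, e.g. "%PDF-1.4".
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true only if the first bytes of the file are exactly "%PDF-".
    public static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[MARKER.length];
            int total = 0;
            // read() may return fewer bytes than requested, so loop until the buffer is full.
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    return false; // file is shorter than the marker
                }
                total += n;
            }
            return Arrays.equals(head, MARKER);
        }
    }

    // Prints one line per file name given on the command line.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean ok = looksLikePdf(Paths.get(name));
            System.out.println(name + ": " + (ok ? "starts with %PDF-" : "no %PDF- header at offset 0"));
        }
    }
}
```

Running it as "java PdfHeaderCheck ABC.pdf XYZ.pdf" before the conversion loop would show whether the failing file is really a PDF, or something else (an empty file, an HTML error page, a text file picked up by the loop) that happens to carry a .pdf name.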
 
Thanks in advance,
 
Jack