Posted to users@pdfbox.apache.org by Jack Bush <ne...@yahoo.com.au> on 2011/08/11 10:30:08 UTC
An exception occurred in parsing the PDF Document.
Hi All,
I am getting the following exception when converting many PDF files to text files in a loop:
ABC.pdf
PDF to Text conversion of ABC.txt has succeeded
An exception occured in parsing the PDF Document.
java.io.IOException: Error: Header doesn't contain versioninfo
XYZ.pdf
	at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:312)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169)
	at PDF2Text.PDFTextParser.pdftoText(PDFTextParser.java:39)
	at hpg.ImportHPGData.main(ImportHPGData.java:43)
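The message comes from PDFParser.parseHeader(), which did not find the version header. A PDF file is expected to start with a header such as %PDF-1.4, so a file that is not really a PDF (for example an HTML page saved with a .pdf extension) fails at exactly this point. A quick way to rule that out is to look at the first bytes of XYZ.pdf. The helper below is a hypothetical sketch, not part of PDFBox; the 1024-byte scan window is an assumption, since readers differ in how much leading junk they tolerate:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of PDFBox.
class PdfHeaderCheck {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumed scan window; how much leading junk a reader tolerates varies.
    private static final int SCAN_LIMIT = 1024;

    /** Returns true if "%PDF-" occurs entirely within the first SCAN_LIMIT bytes of the file. */
    static boolean looksLikePdf(String path) throws IOException {
        byte[] buf = new byte[SCAN_LIMIT];
        int n = 0;
        try (InputStream in = new FileInputStream(path)) {
            // read() may return fewer bytes than requested, so keep reading until full or EOF.
            int r;
            while (n < buf.length && (r = in.read(buf, n, buf.length - n)) != -1) {
                n += r;
            }
        }
        // Naive byte-by-byte search for the marker in the bytes actually read.
        for (int i = 0; i + MAGIC.length <= n; i++) {
            int j = 0;
            while (j < MAGIC.length && buf[i + j] == MAGIC[j]) {
                j++;
            }
            if (j == MAGIC.length) {
                return true;
            }
        }
        return false;
    }
}
```

Calling PdfHeaderCheck.looksLikePdf("XYZ.pdf") before pdftoText() would separate files that are not PDFs at all from genuine parsing problems.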
Below is the PDFBox example code in which the pdftoText() and writeTexttoFile() methods have been merged:
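import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.InputStream;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.x package

public boolean pdftoText(String pdfSource, String txtTarget) {
    // Locals instead of fields, so nothing from the previous file survives into this call.
    InputStream in;
    try {
        in = new FileInputStream(new File(pdfSource));
    } catch (FileNotFoundException e) {
        System.out.println("Unable to open " + pdfSource);
        return false; // do not go on to parse() without a fresh parser
    }
    COSDocument cosDoc = null;
    PDDocument pdDoc = null;
    try {
        PDFParser parser = new PDFParser(in);
        parser.parse();
        cosDoc = parser.getDocument();
        pdDoc = new PDDocument(cosDoc);
        String parsedText = new PDFTextStripper().getText(pdDoc);
        if (parsedText == null) {
            System.out.println("File " + pdfSource + " has failed PDF to Text Conversion.");
            return false;
        }
        BufferedWriter txtTargetBW = new BufferedWriter(new FileWriter(txtTarget));
        try {
            txtTargetBW.write(parsedText);
        } finally {
            txtTargetBW.close(); // flushes and closes even if write() throws
        }
        return true;
    } catch (Exception e) {
        System.out.println("An exception occurred in parsing the PDF Document.");
        e.printStackTrace();
        return false; // report the failure to the caller
    } finally {
        // Release the document and the input stream on every path, including failures.
        try {
            if (pdDoc != null) pdDoc.close(); // also closes the underlying COSDocument
            else if (cosDoc != null) cosDoc.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}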
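Because the failure shows up inside a loop over many files, it also helps if one bad file cannot silently pass as a success or abort the rest of the run. The sketch below is a hypothetical driver (the real ImportHPGData.main is not shown): it derives each .txt target from the .pdf name, calls a converter for every file, and collects the files that failed. The converter is a BiPredicate<String, String> so that a pdftoText(String, String) method can be plugged in; this relies on pdftoText() returning false whenever conversion fails, since a version that catches an exception and then falls through to return true would have its failures counted as successes. Java 8 or later is assumed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical batch driver; the real ImportHPGData.main is not shown.
class BatchConvert {

    /** Maps "name.pdf" (any case) to "name.txt"; other names just get ".txt" appended. */
    static String txtTargetFor(String pdfSource) {
        // regionMatches returns false when the name is shorter than 4 characters.
        if (pdfSource.regionMatches(true, pdfSource.length() - 4, ".pdf", 0, 4)) {
            return pdfSource.substring(0, pdfSource.length() - 4) + ".txt";
        }
        return pdfSource + ".txt";
    }

    /**
     * Calls converter.test(pdfSource, txtTarget) for every file and returns the files that
     * failed, either by returning false or by throwing a RuntimeException.
     */
    static List<String> convertAll(List<String> pdfSources, BiPredicate<String, String> converter) {
        List<String> failed = new ArrayList<>();
        for (String pdfSource : pdfSources) {
            boolean ok;
            try {
                ok = converter.test(pdfSource, txtTargetFor(pdfSource));
            } catch (RuntimeException e) {
                // One bad file should not stop the remaining conversions.
                ok = false;
            }
            if (!ok) {
                System.out.println("Conversion failed: " + pdfSource);
                failed.add(pdfSource);
            }
        }
        return failed;
    }
}
```

For example, assuming PDFTextParser has a no-argument constructor, BatchConvert.convertAll(files, (src, dst) -> new PDFTextParser().pdftoText(src, dst)) uses a fresh PDFTextParser for every file, so no instance fields carry over from one file to the next.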
Does anyone know why this is happening? I had no problem converting individual files with the original example, in which the two methods were separate.
Thanks in advance,
Jack