Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/10/28 18:58:38 UTC
[jira] [Updated] (PDFBOX-1716) PDDocument.getNumberOfPages() return 0 for certain PDF document
[ https://issues.apache.org/jira/browse/PDFBOX-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-1716:
---------------------------------------
Fix Version/s: (was: 1.8.2)
> PDDocument.getNumberOfPages() return 0 for certain PDF document
> ---------------------------------------------------------------
>
> Key: PDFBOX-1716
> URL: https://issues.apache.org/jira/browse/PDFBOX-1716
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.2
> Reporter: Tom
>
> A sample document (https://issues.apache.org/jira/secure/attachment/12430914/FormI-9-English.pdf) is attached to https://issues.apache.org/jira/browse/PDFBOX-578. It looks like the NPE fix made for that issue is a workaround.
> When I try to extract the text content from FormI-9-English.pdf, PDDocument.getNumberOfPages() returns 0, which makes extracting the text content impossible:
> InputStream in = <PDF InputStream>;
> PDFParser parser = new PDFParser(in);
> PDFTextStripper pdfStripper = null;
> String parsedText = null;
> parser.parse();
> COSDocument cosDoc = parser.getDocument();
> pdfStripper = new PDFTextStripper();
> PDDocument pdDoc = new PDDocument(cosDoc);
>
> for (int i = 1; i <= pdDoc.getNumberOfPages(); i++) { // pdDoc.getNumberOfPages() returns 0, which is incorrect
>
> }
> Note:
> 1. This problem was found in the latest PDFBox version, 1.8.2.
> 2. I didn't know which component to file this defect under, so please assign it to the correct component if needed. Thanks.