Posted to dev@pdfbox.apache.org by "Jignesh Sh (JIRA)" <ji...@apache.org> on 2009/11/16 13:07:39 UTC
[jira] Closed: (PDFBOX-547) problem in extracting text using PDFBox
[ https://issues.apache.org/jira/browse/PDFBOX-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jignesh Sh closed PDFBOX-547.
-----------------------------
Closing this issue
> problem in extracting text using PDFBox
> ---------------------------------------
>
> Key: PDFBOX-547
> URL: https://issues.apache.org/jira/browse/PDFBOX-547
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.7.0
> Reporter: Jignesh Sh
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> Hi All,
> I am having a problem extracting text using PDFBox.
> The program hangs at the line pdfText = stripper.getText(pdDoc); and returns nothing.
> I am using PDFBox version PDFBox-0.6.7a.jar.
> Here is my code:
> public String getPDFContent(ZipEntry pdfEntry)
> {
>     boolean status = false;
>     String pdfText = null;
>     ZipIssueFactory issueFactory = null;
>     logger.debug("Processing : " + pdfEntry.getName());
>     COSDocument cosDoc = null;
>     PDDocument pdDoc = null;
>     try
>     {
>         cosDoc = parseDocument(zipFile.getInputStream(pdfEntry)); // Load InputStream into memory
>
>         // skipping the PDF document, if it is encrypted
>         if (cosDoc.isEncrypted()) {
>             logger.warn("Can not decrypt PDF document w/o password, skipping:" + pdfEntry.getName());
>             return pdfText;
>         }
>         // extract PDF document's textual content
>         pdDoc = new PDDocument(cosDoc);
>         PDFTextStripper stripper = new PDFTextStripper();
>         pdfText = stripper.getText(pdDoc);
>     }
>     catch (IOException e) {
>         pdfText = null;
>         logger.error("IOException in parsing PDF document: " + e);
>     }
>     finally {
>         closeCOSDocument(cosDoc);
>         closePDDocument(pdDoc);
>     }
>     return pdfText;
> }
> private static COSDocument parseDocument(InputStream is) throws IOException {
>     PDFParser parser = new PDFParser(is);
>     parser.parse();
>     return parser.getDocument();
> }
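Since the report describes a call that never returns, one generic way to keep a caller from blocking forever is to run the call on a worker thread and wait with a timeout. The sketch below uses only java.util.concurrent and assumes Java 8 or later; the class name TimedExtraction and the method runWithTimeout are hypothetical names chosen for illustration and are not part of PDFBox. It does not fix the underlying hang, it only bounds how long the caller waits.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of PDFBox: bounds the time spent waiting on a task.
public class TimedExtraction {

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * Returns the task's result, or null if the task timed out or threw.
     */
    public static String runWithTimeout(Callable<String> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "text-extraction");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requests interruption; code stuck in a non-interruptible loop may ignore it.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            return null; // the task itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the real extraction call, so the sketch runs without PDFBox.
        String fast = runWithTimeout(() -> "extracted text", 1000);
        System.out.println("fast task: " + fast);

        String slow = runWithTimeout(() -> {
            Thread.sleep(5000); // simulates a call that takes far too long
            return "too late";
        }, 200);
        System.out.println("slow task: " + slow);
    }
}
```

In the method quoted above, the task passed in would presumably be the stripper.getText(pdDoc) call. When the timeout fires, a thread dump of the worker should show where the extraction is stuck, which is likely more useful for diagnosing the issue than the timeout itself.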