Posted to dev@pdfbox.apache.org by "Hong-Thai Nguyen (JIRA)" <ji...@apache.org> on 2013/12/02 15:02:42 UTC

[jira] [Comment Edited] (PDFBOX-1787) pdfbox hangs on a corrupt PDF file

    [ https://issues.apache.org/jira/browse/PDFBOX-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836523#comment-13836523 ] 

Hong-Thai Nguyen edited comment on PDFBOX-1787 at 12/2/13 2:01 PM:
-------------------------------------------------------------------

I agree that we can't do anything to extract the text content, but what we expect is that pdfbox stops and reports the problem properly when it encounters this kind of file.
Is NonSequentialPDFParser the newer parser, with more robust handling of PDF files? Is its text extraction result the same as with the current PDFParser? I'm reading the code from PDFBOX-1104, and it seems that this parser improves extraction performance by starting extraction from an arbitrary page.
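
To make the "stop and report" behaviour concrete, here is a minimal sketch of a caller-side guard. It is not PDFBox code: BoundedExtraction and extractOrReport are hypothetical names, and the Callable stands in for whatever extraction call is used. It only assumes that the extraction can run on a worker thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; it is not part of PDFBox.
final class BoundedExtraction {

    private BoundedExtraction() {
    }

    // Runs the extraction on a worker thread and returns either its result
    // or an error message if it fails or exceeds the timeout.
    static String extractOrReport(Callable<String> extraction, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-extraction");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(extraction);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // requests interruption; it cannot force a busy loop to stop
            return "ERROR: extraction timed out after " + timeoutMillis + " ms";
        } catch (ExecutionException e) {
            return "ERROR: extraction failed: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return "ERROR: interrupted while waiting for extraction";
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With PDFBox 1.8.x the Callable could, for example, load the document and call PDFTextStripper.getText on it. The guard only stops the caller from waiting forever: if the parser loops without checking for interruption, the daemon thread keeps running in the background, so a real fix still has to be made in the parser itself.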

Thanks


was (Author: thaichat04):
I agree that we can't do anything to extract the text content, but what we expect is that pdfbox stops and reports the problem properly when it encounters this kind of file.
Is NonSequentialPDFParser the newer parser, with more robust handling of PDF files? Is its text extraction result the same as with the current PDFParser?

Thanks

> pdfbox hangs on a corrupt PDF file
> ----------------------------------
>
>                 Key: PDFBOX-1787
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1787
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.3
>         Environment: windows
>            Reporter: Hong-Thai Nguyen
>         Attachments: corrupt_file.pdf
>
>
> pdfbox hangs when run from the command line on the attached file.



--
This message was sent by Atlassian JIRA
(v6.1#6144)