
Posted to dev@pdfbox.apache.org by "thomas menzel (JIRA)" <ji...@apache.org> on 2009/10/21 21:26:59 UTC

[jira] Created: (PDFBOX-546) [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException

[parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException
-------------------------------------------------------------------------------

                 Key: PDFBOX-546
                 URL: https://issues.apache.org/jira/browse/PDFBOX-546
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing, Text extraction
    Affects Versions: 0.8.0-incubator
            Reporter: thomas menzel


SYMPTOM
This is the full stack trace I'm observing with the PDF file @

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:860)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:825)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:750)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:173)
Caused by: java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(Unknown Source)
        at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
        at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
        ... 4 more
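A minimal, self-contained sketch of how this exception type can arise. This is an illustration, not PDFBox code, and the class and method names are hypothetical. It assumes, as the `objIter.next()` call that Jean Philippe Alexis's patch guards suggests, that the parser expands the xref stream's /Index array (pairs of first object number and count; the PDF specification's default is [0 Size]) into a list of object numbers and then calls next() once per decoded row. If the stream holds more rows than /Index declares, the iterator runs dry. The iterator class named in the trace frame may differ between JDK versions; the exception is the same.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class XrefIndexDemo {
    // Expand an /Index array [first1 count1 first2 count2 ...] into the
    // object numbers covered by the xref stream's subsections.
    static List<Integer> expandIndex(int[] index) {
        List<Integer> objNums = new ArrayList<>();
        for (int i = 0; i + 1 < index.length; i += 2) {
            for (int j = 0; j < index[i + 1]; j++) {
                objNums.add(index[i] + j);
            }
        }
        return objNums;
    }

    // Unguarded loop: one next() per decoded row, no hasNext() check.
    // Returns how many rows were paired with an object number.
    static int pairRowsUnguarded(int rowCount, int[] index) {
        Iterator<Integer> objIter = expandIndex(index).iterator();
        int paired = 0;
        for (int row = 0; row < rowCount; row++) {
            objIter.next(); // throws NoSuchElementException once the object numbers run out
            paired++;
        }
        return paired;
    }

    public static void main(String[] args) {
        int[] index = {0, 3}; // /Index [0 3] covers objects 0, 1 and 2
        System.out.println(pairRowsUnguarded(3, index)); // prints 3
        try {
            pairRowsUnguarded(4, index); // a fourth row has no object number
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException");
        }
    }
}
```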

STEPS
cmdline: org.apache.pdfbox.ExtractText on the file

I also found this exception in PDFBOX-533 (https://issues.apache.org/jira/browse/PDFBOX-533?focusedCommentId=12756825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12756825), but I'm not sure whether it is the same case.

See also https://issues.apache.org/jira/browse/PDFBOX-186 for how I came to create this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (PDFBOX-546) [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-546.
---------------------------------------

    Resolution: Duplicate

This issue has been solved since svn revision 821928. See PDFBOX-536 for further details.
I've attached the text extraction result for the PDF mentioned in the description.

[jira] Commented: (PDFBOX-546) [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException

Posted by "Jean Philippe Alexis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770100#action_12770100 ] 

Jean Philippe Alexis commented on PDFBOX-546:
---------------------------------------------

The code change below gets the job done for me, assuming it's OK to have a null objID when type == 1.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Compare: (<)L:\temp\PDFXrefStreamParser.java (5487 bytes)
   with: (>)C:\Documents and Settings\J Philippe Alexis\workspace\PdfBox\src\org\apache\pdfbox\pdfparser\PDFXrefStreamParser.java (5749 bytes)

115c115,119
<                 Integer objID = (Integer)objIter.next();
---
>                 Integer objID = null;
>                 if(objIter.hasNext())
>                 {
>                 	objID = (Integer)objIter.next();
>                 }
137,138c141,145
<                         COSObjectKey objKey = new COSObjectKey(objID.intValue(), genNum);
<                         document.setXRef(objKey, offset);
---
>                         if(null != objID)
>                         {
>                         	COSObjectKey objKey = new COSObjectKey(objID.intValue(), genNum);
>                         	document.setXRef(objKey, offset);
>                         }
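The same guard can be sketched in isolation. This hypothetical decoder is not the PDFBox source: it reads rows laid out by the /W field widths, treats a zero first width as type 1 (as the PDF specification prescribes), and, like the patch, pairs a row with an object number only while the iterator still has one. Type-1 rows left without an object number are skipped instead of crashing. The "objNum genNum" string keys are a stand-in assumed here for COSObjectKey.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GuardedXrefDecoder {
    // Read a big-endian unsigned field of 'width' bytes starting at 'pos'.
    // A zero width yields 0, which is the default generation number for type 1.
    static long readField(byte[] data, int pos, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[pos + i] & 0xFF);
        }
        return value;
    }

    // Decode rows laid out per /W [w0 w1 w2] and map "objNum genNum" to the byte
    // offset of every type-1 (in use, uncompressed) entry.
    static Map<String, Long> decodeType1(byte[] data, int[] w, List<Integer> objNums) {
        Map<String, Long> xref = new LinkedHashMap<>();
        Iterator<Integer> objIter = objNums.iterator();
        int rowLen = w[0] + w[1] + w[2];
        for (int pos = 0; pos + rowLen <= data.length; pos += rowLen) {
            // The guard from the patch: null instead of NoSuchElementException.
            Integer objID = objIter.hasNext() ? objIter.next() : null;
            long type = (w[0] == 0) ? 1 : readField(data, pos, w[0]);
            long offset = readField(data, pos + w[0], w[1]);
            long genNum = readField(data, pos + w[0] + w[1], w[2]);
            if (type == 1 && objID != null) {
                xref.put(objID + " " + genNum, offset);
            }
        }
        return xref;
    }

    public static void main(String[] args) {
        int[] w = {1, 2, 1}; // /W [1 2 1]
        byte[] rows = {
            0, 0x00, 0x00, (byte) 0xFF, // type 0: free entry
            1, 0x00, 0x0F, 0,           // type 1: offset 15, gen 0
            1, 0x01, 0x00, 0,           // type 1: offset 256, gen 0
            1, 0x02, 0x00, 0            // extra row not covered by /Index [0 3]
        };
        List<Integer> objNums = Arrays.asList(0, 1, 2);
        System.out.println(decodeType1(rows, w, objNums)); // prints {1 0=15, 2 0=256}
    }
}
```

Skipping is what makes the patched parser tolerant, but it also silently drops the extra entries; whether that is acceptable is exactly the assumption stated in the comment.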


[jira] Updated: (PDFBOX-546) [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException

Posted by "thomas menzel (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

thomas menzel updated PDFBOX-546:
---------------------------------

    Description: 
SYMPTOM
This is the full stack trace I'm observing with the PDF file I attached @ https://issues.apache.org/jira/secure/attachment/12422836/PwC-Tech-Forecast-Spring-2009.pdf

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:860)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:825)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:750)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:173)
Caused by: java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(Unknown Source)
        at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
        at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
        ... 4 more

STEPS
cmdline: org.apache.pdfbox.ExtractText on the file

I also found this exception in PDFBOX-533 (https://issues.apache.org/jira/browse/PDFBOX-533?focusedCommentId=12756825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12756825), but I'm not sure whether it is the same case, as this file is a lot smaller and I have so little clue about the internal structure of PDF that I can't even follow any of the comments. Sorry.

See also https://issues.apache.org/jira/browse/PDFBOX-186 for how I came to create this issue.

  was:
SYMPTOM
This is the full stack trace I'm observing with the PDF file @

Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:860)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:825)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:750)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:173)
Caused by: java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(Unknown Source)
        at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
        at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
        ... 4 more

STEPS
cmdline: org.apache.pdfbox.ExtractText on the file

I also found this exception in PDFBOX-533 (https://issues.apache.org/jira/browse/PDFBOX-533?focusedCommentId=12756825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12756825), but I'm not sure whether it is the same case.

See also https://issues.apache.org/jira/browse/PDFBOX-186 for how I came to create this issue.


[jira] Updated: (PDFBOX-546) [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException

Posted by "thomas menzel (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

thomas menzel updated PDFBOX-546:
---------------------------------

    Attachment: screenshot-1.jpg

screenshot-1 shows the properties of the PDF; there appears to be no dodgy PDF creator that messed it up.

[jira] Updated: (PDFBOX-546) [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-546:
--------------------------------------

    Attachment: PDFBOX546-PwC-Tech-Forecast-Spring-2009.txt

[jira] Commented: (PDFBOX-546) [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException

Posted by "Alexander Veit (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770468#action_12770468 ] 

Alexander Veit commented on PDFBOX-546:
---------------------------------------

> This issue is already solved since svn version 821928

Great! Do you recommend building a pdfbox.jar from trunk, or waiting until an official bugfix release is available?

BTW, is there already a schedule for the next bugfix release?

[jira] Commented: (PDFBOX-546) [parser] .PDFXrefStreamParser.parse fails with java.util.NoSuchElementException

Posted by "Alexander Veit (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770097#action_12770097 ] 

Alexander Veit commented on PDFBOX-546:
---------------------------------------

We have customers who stumble on http://issues.apache.org/jira/browse/PDFBOX-361, so we upgraded to 0.8.0, and now we get this error. Is there a workaround for this bug? If not, can you please give this issue a higher priority?
