Posted to dev@pdfbox.apache.org by "Samuli Saarinen (JIRA)" <ji...@apache.org> on 2012/05/23 10:45:40 UTC

[jira] [Created] (PDFBOX-1320) NPE in extractEmbeddedDocuments

Samuli Saarinen created PDFBOX-1320:
---------------------------------------

             Summary: NPE in extractEmbeddedDocuments
                 Key: PDFBOX-1320
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1320
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.7.0
         Environment: pdfbox 1.7.0 (current trunk)
            Reporter: Samuli Saarinen


While parsing a pdf document the following exception is thrown:
java.lang.NullPointerException
	at org.apache.pdfbox.tika.PDFParser.extractEmbeddedDocuments(PDFParser.java:155)
	at org.apache.pdfbox.tika.PDFParser.parse(PDFParser.java:133)
	at test.TikaParse.main(TikaParse.java:27)

The document I'm trying to parse is probably confidential, so I cannot attach it until (or if) I get clearance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (PDFBOX-1320) NPE in extractEmbeddedDocuments

Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme resolved PDFBOX-1320.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.0

fixed as proposed in rev. 1342242
                


        

[jira] [Assigned] (PDFBOX-1320) NPE in extractEmbeddedDocuments

Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme reassigned PDFBOX-1320:
-----------------------------------

    Assignee: Timo Boehme
    


        

[jira] [Commented] (PDFBOX-1320) NPE in extractEmbeddedDocuments

Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281682#comment-13281682 ] 

Timo Boehme commented on PDFBOX-1320:
-------------------------------------

Returning an empty collection instead of null would break PDNameTreeNode.getValue, which tests for a null value. This could be changed, but then we would not be able to tell whether we simply had an empty name array or no name array at all. Since getValue is implemented to look for kids only if no name array exists, I vote against returning an empty collection and instead for documenting that null may be returned and that other code using it has to test for null.

If there are no objections, I will make this change (document the null return value in the JavaDoc) and fix ExtractText, which is the only other user of this method (besides getValue).
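
For illustration only, the null-versus-empty distinction described above could look roughly like the sketch below. SketchNameTreeNode and its members are hypothetical stand-ins, not the actual PDNameTreeNode code; the only behavior taken from the comment is that getValue looks at the kids only when no name array exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a name tree node; this is NOT the PDFBox class.
class SketchNameTreeNode {
    // null means "this node has no name array"; an empty map means
    // "this node has a name array, but it contains no entries".
    private final Map<String, String> names;
    private final List<SketchNameTreeNode> kids = new ArrayList<>();

    SketchNameTreeNode(Map<String, String> names) {
        this.names = names;
    }

    void addKid(SketchNameTreeNode kid) {
        kids.add(kid);
    }

    /** Returns the names of this node, or null if the node has no name array. */
    Map<String, String> getNames() {
        return names;
    }

    /** Looks in the kids only when this node has no name array at all. */
    String getValue(String name) {
        Map<String, String> own = getNames();
        if (own != null) {
            // A name array exists (possibly empty): do not descend into kids.
            return own.get(name);
        }
        for (SketchNameTreeNode kid : kids) {
            String value = kid.getValue(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

If getNames() returned an empty map instead of null, the own != null branch would always be taken and the kids would never be searched, which is the breakage the comment warns about.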
                


        

[jira] [Updated] (PDFBOX-1320) NPE in extractEmbeddedDocuments

Posted by "Samuli Saarinen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samuli Saarinen updated PDFBOX-1320:
------------------------------------

    Attachment: PDNameTreeNode.java.patch

Patch that seems to fix the NPE
                


        

[jira] [Updated] (PDFBOX-1320) NPE in extractEmbeddedDocuments

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated PDFBOX-1320:
---------------------------------------

    Attachment: PDFBOX-1320.patch

I committed the fix to Tika's PDFParser.

Here's a patch to also null check in PDFBox's ExtractText tool....
                


        

[jira] [Commented] (PDFBOX-1320) NPE in extractEmbeddedDocuments

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281500#comment-13281500 ] 

Michael McCandless commented on PDFBOX-1320:
--------------------------------------------

Good catch, Samuli!  We can also null-check the return from the getNames() method.

Tika's PDFParser has moved back to Tika sources (thanks Jukka!) ... I'll fix this there.

But, separately, we should also fix ExtractText to null check the call to embeddedFiles.getNames()....
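
As a rough sketch of the kind of caller-side null check meant here (not the committed patch): the class and method names below are hypothetical, and the map parameter merely stands in for whatever embeddedFiles.getNames() returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical caller-side helper; it does not reproduce Tika's or PDFBox's code.
class EmbeddedNamesSketch {
    /**
     * Collects the embedded file names from a names map that may be null,
     * as would be the case when the node has no name array.
     */
    static List<String> listEmbeddedNames(Map<String, String> namesOrNull) {
        List<String> result = new ArrayList<>();
        if (namesOrNull == null) {
            // No name array: nothing to extract, and no NullPointerException.
            return result;
        }
        result.addAll(namesOrNull.keySet());
        return result;
    }
}
```

Without the null check, calling keySet() on a null map would throw the NullPointerException shown in the report's stack trace style.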
                
