Posted to dev@pdfbox.apache.org by "Samuli Saarinen (JIRA)" <ji...@apache.org> on 2012/05/23 10:45:40 UTC
[jira] [Created] (PDFBOX-1320) NPE in extractEmbeddedDocuments
Samuli Saarinen created PDFBOX-1320:
---------------------------------------
Summary: NPE in extractEmbeddedDocuments
Key: PDFBOX-1320
URL: https://issues.apache.org/jira/browse/PDFBOX-1320
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.7.0
Environment: pdfbox 1.7.0 (current trunk)
Reporter: Samuli Saarinen
While parsing a pdf document the following exception is thrown:
java.lang.NullPointerException
    at org.apache.pdfbox.tika.PDFParser.extractEmbeddedDocuments(PDFParser.java:155)
    at org.apache.pdfbox.tika.PDFParser.parse(PDFParser.java:133)
    at test.TikaParse.main(TikaParse.java:27)
The document I'm trying to parse is probably confidential, so I cannot attach it until (or unless) I get clearance.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PDFBOX-1320) NPE in extractEmbeddedDocuments
Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timo Boehme resolved PDFBOX-1320.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.7.0
Fixed as proposed in rev. 1342242.
[jira] [Assigned] (PDFBOX-1320) NPE in extractEmbeddedDocuments
Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timo Boehme reassigned PDFBOX-1320:
-----------------------------------
Assignee: Timo Boehme
[jira] [Commented] (PDFBOX-1320) NPE in extractEmbeddedDocuments
Posted by "Timo Boehme (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281682#comment-13281682 ]
Timo Boehme commented on PDFBOX-1320:
-------------------------------------
Returning an empty collection instead of null breaks PDNameTreeNode.getValue, which tests for a null value. This could be changed; however, we would then be unable to tell whether we simply had an empty name array or no name array at all. Since getValue is implemented to look in the kids only if no name array exists, I vote against returning an empty collection and in favor of documenting that null may be returned and that other code using the method has to test for null.
If there are no objections, I will make this change (document the null return value in the JavaDoc) and fix ExtractText, which is the only code using this method (besides getValue).
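To make the trade-off in this comment concrete, here is a minimal Java sketch of the null-versus-empty distinction. It is an illustration under stated assumptions, not PDFBox code: NameNodeSketch, its fields, and the String-to-String map are hypothetical stand-ins for PDNameTreeNode and its contents, and the method under discussion is assumed to be getNames(), judging by the other comments in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a name tree node; NOT the real PDNameTreeNode.
final class NameNodeSketch {
    // null means "this node has no name array at all";
    // an empty map means "a name array exists but has no entries".
    private final Map<String, String> names;
    private final List<NameNodeSketch> kids = new ArrayList<>();

    NameNodeSketch(Map<String, String> names) {
        this.names = names;
    }

    NameNodeSketch addKid(NameNodeSketch kid) {
        kids.add(kid);
        return this;
    }

    /** Returns this node's names, or null if the node has no name array. */
    Map<String, String> getNames() {
        return names;
    }

    /** Searches the kids only when this node has no name array. */
    String getValue(String name) {
        Map<String, String> own = getNames();
        if (own != null) {
            // A name array exists (possibly empty): do not descend into kids.
            return own.get(name);
        }
        for (NameNodeSketch kid : kids) {
            String value = kid.getValue(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> leafNames = new LinkedHashMap<>();
        leafNames.put("report.txt", "embedded file spec");
        NameNodeSketch leaf = new NameNodeSketch(leafNames);

        // Intermediate node without a name array: getNames() returns null,
        // and getValue falls through to the kid.
        NameNodeSketch root = new NameNodeSketch(null).addKid(leaf);
        System.out.println(root.getNames());
        System.out.println(root.getValue("report.txt"));
    }
}
```

In this sketch, if getNames() returned an empty map for the root instead of null, getValue would stop at the root and never reach the kid, which is the breakage described above.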
[jira] [Updated] (PDFBOX-1320) NPE in extractEmbeddedDocuments
Posted by "Samuli Saarinen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Samuli Saarinen updated PDFBOX-1320:
------------------------------------
Attachment: PDNameTreeNode.java.patch
Patch that seems to fix the NPE
[jira] [Updated] (PDFBOX-1320) NPE in extractEmbeddedDocuments
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated PDFBOX-1320:
---------------------------------------
Attachment: PDFBOX-1320.patch
I committed the fix to Tika's PDFParser.
Here's a patch that also adds the null check to PDFBox's ExtractText tool.
[jira] [Commented] (PDFBOX-1320) NPE in extractEmbeddedDocuments
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281500#comment-13281500 ]
Michael McCandless commented on PDFBOX-1320:
--------------------------------------------
Good catch, Samuli! We can also null-check the return value of the getNames() method.
Tika's PDFParser has moved back to Tika sources (thanks Jukka!), so I'll fix this there.
Separately, we should also fix ExtractText to null-check the result of the call to embeddedFiles.getNames().
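For readers following along, the caller-side null check proposed here could look roughly like the sketch below. It is only an illustration: EmbeddedFilesSketch, the NameTree interface, and listEmbeddedFileNames are hypothetical names invented for this example, and the real code in Tika's PDFParser and PDFBox's ExtractText works with PDFBox types rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a caller that tolerates a null getNames() result.
final class EmbeddedFilesSketch {

    // Hypothetical stand-in for the embedded-files name tree.
    interface NameTree {
        /** May return null when the node has no name array. */
        Map<String, String> getNames();
    }

    /** Collects embedded file names, treating a null result as "nothing here". */
    static List<String> listEmbeddedFileNames(NameTree embeddedFiles) {
        List<String> result = new ArrayList<>();
        if (embeddedFiles == null) {
            return result; // document has no embedded-files tree
        }
        Map<String, String> names = embeddedFiles.getNames();
        if (names == null) {
            return result; // the null check this comment asks for
        }
        result.addAll(names.keySet());
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> names = new TreeMap<>();
        names.put("a.txt", "spec-a");
        names.put("b.txt", "spec-b");
        System.out.println(listEmbeddedFileNames(() -> names)); // [a.txt, b.txt]
        System.out.println(listEmbeddedFileNames(() -> null));  // []
    }
}
```

The sketch only reads the current node's names; a complete extractor would presumably also have to visit the kids when getNames() returns null.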