Posted to dev@pdfbox.apache.org by reinhard schwab <re...@aon.at> on 2010/08/21 20:47:04 UTC
NPE in PDPageNode
I get a NullPointerException when parsing a PDF with Tika.
http://www.awsg.at/portal/media/4218.pdf
java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.PDPageNode.getCount(PDPageNode.java:109)
    at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:943)
    at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:105)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:86)
regards
reinhard
Re: NPE in PDPageNode
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 24.08.2010 02:41, Adam@swmc.com wrote:
> Reinhard,
>
> If you can get a copy of the unencrypted version, that'd be very helpful.
> If not, we'll just do the best we can with the PDF you have provided. I
> tried removing the password with a program I have, but it seems to have
> run into the same issue with parsing the PDF as PDFBox did... so no luck
> there either.
The problem is the unsupported encryption algorithm. See my comment on [1] (I
don't know why there wasn't any notification on dev@)
BR
Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX-797
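Whether a file declares encryption at all can be checked before blaming the page tree. The following is a minimal sketch, not PDFBox code: the class and method names (PdfEncryptionHint, declaresEncryption) are made up for illustration. It relies only on the PDF convention that an encrypted file carries an /Encrypt entry in its trailer (or cross-reference stream dictionary), given either as an indirect reference or as an inline dictionary. It does not tell which algorithm is used or whether PDFBox supports it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

final class PdfEncryptionHint {

    // "/Encrypt" followed by an indirect reference ("7 0 R") or an inline dictionary ("<<").
    private static final Pattern ENCRYPT_ENTRY =
            Pattern.compile("/Encrypt\\s*(?:\\d+\\s+\\d+\\s+R\\b|<<)");

    /** Returns true if the raw PDF text contains something that looks like an /Encrypt entry. */
    static boolean declaresEncryption(String pdfText) {
        return ENCRYPT_ENTRY.matcher(pdfText).find();
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding.
        String text = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.ISO_8859_1);
        System.out.println("declares encryption: " + declaresEncryption(text));
    }
}
```

This is a heuristic on the raw bytes, so treat a positive result as a hint to look at the encryption dictionary, not as proof.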
Re: NPE in PDPageNode
Posted by Ad...@swmc.com.
Reinhard,
If you can get a copy of the unencrypted version, that'd be very helpful.
If not, we'll just do the best we can with the PDF you have provided. I
tried removing the password with a program I have, but it seems to have
run into the same issue with parsing the PDF as PDFBox did... so no luck
there either.
----
Thanks,
Adam
This email and any content within or attached hereto from Sun West Mortgage Company, Inc. is confidential and/or legally privileged. The information is intended only for the use of the individual or entity named on this email. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or the taking of any action in reliance on the contents of this email information is strictly prohibited, and that the documents should be returned to this office immediately by email. Receipt by anyone other than the intended recipient is not a waiver of any privilege. Please do not include your social security number, account number, or any other personal or financial information in the content of the email. Should you have any questions, please call (800) 453 7884.
Re: NPE in PDPageNode
Posted by reinhard schwab <re...@aon.at>.
Adam,
I'm sorry. I know neither what program was used nor the password, nor
how to remove the encryption.
I can only ask some other people about this.
I will open a JIRA issue and attach the file.
best regards
reinhard
Re: NPE in PDPageNode
Posted by Ad...@swmc.com.
Reinhard,
The root element in your PDF references object 1554 as the object which
informs us of the pages within this document. This object does not seem
to exist in the PDF, which is a violation of the PDF spec and why PDFBox
is unable to parse it. You can open the PDF in a decent text editor and
search for 1554 and you'll see the Pages section which references this
object, but that's the only place it's found; there's no object
definition.
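The text-editor check described above can also be scripted. The following is a minimal sketch, not PDFBox code: the class and method names (PdfObjectScan, countDefinitions, countReferences, readRaw) are made up for illustration, and it assumes the object is stored as plain text in the file. In PDF 1.5 and later, objects can also live inside compressed object streams, where neither a text editor nor this scan will see them, so a definition count of zero is only a hint.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PdfObjectScan {

    /** Counts plain-text definitions ("<num> <gen> obj") of the given object number. */
    static int countDefinitions(String pdfText, int objectNumber) {
        return count(Pattern.compile("(?<![0-9])" + objectNumber + "\\s+\\d+\\s+obj\\b"), pdfText);
    }

    /** Counts indirect references ("<num> <gen> R") to the given object number. */
    static int countReferences(String pdfText, int objectNumber) {
        return count(Pattern.compile("(?<![0-9])" + objectNumber + "\\s+\\d+\\s+R\\b"), pdfText);
    }

    private static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Reads the file as ISO-8859-1 so that every byte maps to exactly one char. */
    static String readRaw(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String text = readRaw(args[0]);                 // path to the PDF
        int objectNumber = Integer.parseInt(args[1]);   // e.g. 1554
        System.out.println("definitions: " + countDefinitions(text, objectNumber));
        System.out.println("references:  " + countReferences(text, objectNumber));
    }
}
```

Run with the PDF path and the object number as arguments. References without any definition match the situation described above.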
Now, having said that, if we can find a reliable way to parse files like
these, we can update the code. Do you know what program was used to
create this PDF? Would it be possible for you to remove the encryption on
this file and try it again? If it still crashes without the encryption (it
might not), that would make it much easier to debug.
I also encourage you to create an issue in JIRA and upload this file there
(in case the link dies in the future). https://issues.apache.org/jira
----
Thanks,
Adam