Posted to dev@nutch.apache.org by Richard Braman <rb...@bramantax.com> on 2006/03/02 10:42:40 UTC
Permission to extract text/Embedded documents
Permission to extract text:
I get the error
060302 034106 fetch okay, but can't parse
http://www.dor.state.nc.us/downloads/fillin/E585.pdf, reason:
failed(2,0): Can't be handled as pdf document. java.io.IOException: You
do not have permission to extract text
This is a cryptography exception in stripper.gettext.
I can open the file no problem in IE, but when I go to plain old Acrobat,
it displays a message that says that "[it] is a secure document that has
been embedded in this document", whatever that means?
http://www.dor.state.nc.us/downloads/fillin/E585.pdf
Nutch PDF parsing in CVS:
http://svn.apache.org/viewcvs.cgi/lucene/nutch/tags/release-0.7.1/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf/PdfParser.java?rev=293015&view=log
Richard Braman
mailto:rbraman@taxcodesoftware.org
561.748.4002 (voice)
http://www.taxcodesoftware.org
Free Open Source Tax Software
Re: Permission to extract text/Embedded documents
Posted by Leonard Rosenthol <le...@pdfsages.com>.
At 04:42 AM 3/2/2006, Richard Braman wrote:
>Permission to extract text:
>
>I get the error
>060302 034106 fetch okay, but can't parse
>http://www.dor.state.nc.us/downloads/fillin/E585.pdf,
>reason: failed(2,0): Can't be handled as pdf document.
>java.io.IOException: You do not have permission to extract text
Correct.
If you open the document in Acrobat, you will see the little
lock icon in the bottom left-hand corner signifying that the document
is encrypted. Clicking on it displays the specifics of the digital
rights that have been applied - in this case, that text extraction
(copying) has been DISABLED.
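For anyone wondering what "DISABLED" means at the file level: with the standard PDF security handler, the encryption dictionary carries an integer /P entry whose bits encode the user permissions, and bit 5 (counting from 1 at the low-order bit) is the one for copying or extracting text and graphics. The following is only a minimal sketch of decoding that integer, not the code Nutch or PDFBox actually runs. The class name PdfPermissionFlags and the sample /P values are made up for illustration; I have not read the real /P value out of E585.pdf.

```java
// Hypothetical helper, not part of Nutch or PDFBox: decodes the /P permission
// integer used by the standard PDF security handler.
public class PdfPermissionFlags {
    // Bit positions are 1-based from the low-order bit, as in the PDF Reference.
    public static final int PRINT = 1 << 2;    // bit 3: print the document
    public static final int MODIFY = 1 << 3;   // bit 4: modify the contents
    public static final int EXTRACT = 1 << 4;  // bit 5: copy or extract text and graphics
    public static final int ANNOTATE = 1 << 5; // bit 6: add or modify annotations

    // True when the given /P value has the permission bit set.
    public static boolean allows(int p, int flag) {
        return (p & flag) != 0;
    }

    public static boolean canExtractText(int p) {
        return allows(p, EXTRACT);
    }

    public static boolean canPrint(int p) {
        return allows(p, PRINT);
    }

    public static void main(String[] args) {
        // Illustrative /P values only (assumptions, not taken from E585.pdf):
        // -60 has bit 3 set and bit 5 clear, -44 has both set.
        int[] samples = {-60, -44};
        for (int p : samples) {
            System.out.println("/P=" + p
                    + " print=" + canPrint(p)
                    + " extract=" + canExtractText(p));
        }
    }
}
```

A parser that honors these permissions would refuse to hand back text whenever the extract bit is clear, which matches the "You do not have permission to extract text" message in the log above.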
Leonard
---------------------------------------------------------------------------
Leonard Rosenthol <ma...@pdfsages.com>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc. 215-938-7080 (voice)
215-938-0880 (fax)