Posted to dev@nutch.apache.org by Richard Braman <rb...@bramantax.com> on 2006/03/02 10:42:40 UTC

Permission to extract text/Embedded documents

Permission to extract text:
 
I get the error
060302 034106 fetch okay, but can't parse
http://www.dor.state.nc.us/downloads/fillin/E585.pdf, reason:
failed(2,0): Can't be handled as pdf document. java.io.IOException: You
do not have permission to extract text
 
This is a cryptography exception thrown in stripper.getText.
 
I can open the file with no problem in IE, but when I open it in plain old
Acrobat, it displays a message that says "[it] is a secure document that has
been embedded in this document", whatever that means:
http://www.dor.state.nc.us/downloads/fillin/E585.pdf
 
Nutch PDF parsing in CVS:
http://svn.apache.org/viewcvs.cgi/lucene/nutch/tags/release-0.7.1/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf/PdfParser.java?rev=293015&view=log
 

Richard Braman
mailto:rbraman@taxcodesoftware.org
561.748.4002 (voice) 

http://www.taxcodesoftware.org
Free Open Source Tax Software

 

Re: Permission to extract text/Embedded documents

Posted by Leonard Rosenthol <le...@pdfsages.com>.
At 04:42 AM 3/2/2006, Richard Braman wrote:
>Permission to extract text:
>
>I get the error
>060302 034106 fetch okay, but can't parse
>http://www.dor.state.nc.us/downloads/fillin/E585.pdf,
>reason: failed(2,0): Can't be handled as pdf document.
>java.io.IOException: You do not have permission to extract text

         Correct.

         If you open the document in Acrobat, you will see the little
lock icon in the bottom left-hand corner, signifying that the document
is encrypted.   Clicking on it displays the specifics of the digital
rights that have been applied. In this case, text extraction
(copying) has been DISABLED.
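
         As a rough illustration of what sits behind that lock icon: with
the PDF standard security handler, the encryption dictionary carries an
integer /P entry whose bits encode the permissions, and bit 5 (value 16)
governs copying or extracting text. The sketch below only decodes such a
value. The class name PdfPermissionBits, its method names, and the sample
value -60 are made up for illustration; they are not taken from E585.pdf,
from PDFBox, or from Nutch.

```java
// Hypothetical helper (not part of PDFBox or Nutch): decodes the /P
// permission integer found in a PDF encryption dictionary.
final class PdfPermissionBits {
    // Bit numbers follow the PDF Reference, which counts from 1 (lowest bit).
    static final int PRINT = 1 << 2;    // bit 3: print the document
    static final int MODIFY = 1 << 3;   // bit 4: modify the contents
    static final int EXTRACT = 1 << 4;  // bit 5: copy or extract text and graphics
    static final int ANNOTATE = 1 << 5; // bit 6: add or modify annotations

    private PdfPermissionBits() {}

    // A permission is granted when its bit is set in /P.
    static boolean isAllowed(int p, int flag) {
        return (p & flag) != 0;
    }

    static boolean canExtractText(int p) {
        return isAllowed(p, EXTRACT);
    }

    public static void main(String[] args) {
        // Assumed sample value: printing allowed, extraction not allowed.
        int p = -60;
        System.out.println("print allowed:    " + isAllowed(p, PRINT));
        System.out.println("modify allowed:   " + isAllowed(p, MODIFY));
        System.out.println("extract allowed:  " + canExtractText(p));
        System.out.println("annotate allowed: " + isAllowed(p, ANNOTATE));
    }
}
```

         A parser that honours these flags would check the extract bit
before calling its text stripper and skip the document when the bit is
clear, which is the situation the IOException above reports.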


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <ma...@pdfsages.com>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                              215-938-0880 (fax)