Posted to dev@tika.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/11/07 23:23:35 UTC

[jira] [Comment Edited] (TIKA-1467) pdf:encrypted:false with encrypted pdf

    [ https://issues.apache.org/jira/browse/TIKA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202456#comment-14202456 ] 

Tilman Hausherr edited comment on TIKA-1467 at 11/7/14 10:22 PM:
-----------------------------------------------------------------

The old and the new parser take different approaches to decryption. With the old one, you have to decrypt the document yourself with openProtection(). With the new one, you pass the password (no password = empty password) to loadNonSeq() and decryption is done immediately, so the document is no longer encrypted when loadNonSeq() returns. I don't know how to find out afterwards whether it was encrypted. [~lehmi] any idea?
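
One parser-independent way to answer the "was it encrypted?" question is to look at the raw file before handing it to any parser. An encrypted PDF carries an /Encrypt entry in its trailer (or cross-reference stream) dictionary, so a plain byte scan for that name gives a rough answer. The sketch below is only a heuristic under that assumption: the class and method names are made up for illustration, it uses no PDFBox or Tika API, it needs Java 7 or later, and it can report a false positive if the string "/Encrypt" happens to occur elsewhere in the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox or Tika.
class EncryptMarkerScan {

    /**
     * Heuristic: returns true if the raw bytes contain the PDF name "/Encrypt".
     * This does not parse the file, so it may misfire on unusual documents.
     */
    public static boolean hasEncryptEntry(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is not mangled.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Encrypt");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java EncryptMarkerScan <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("pdf:encrypted (heuristic) = " + hasEncryptEntry(bytes));
    }
}
```

A real fix would more likely read the flag from the parsed trailer inside the parser itself; the scan above is only meant as a quick cross-check on files such as encryption_noprinting.pdf.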



> pdf:encrypted:false with encrypted pdf
> --------------------------------------
>
>                 Key: TIKA-1467
>                 URL: https://issues.apache.org/jira/browse/TIKA-1467
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6
>         Environment: $java -version
> java version "1.6.0_25"
> Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
> Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
>            Reporter: Thomas Ledoux
>
> When extracting metadata from the encryption_noprinting.pdf file found in the pdfCabinetOfHorrors (https://github.com/openplanets/format-corpus/tree/master/pdfCabinetOfHorrors)
> $java -jar tika-app-1.7-20141105.092424-471.jar -j encryption_noprinting.pdf
> We get
> INFO - Document is encrypted
> but the resulting JSON has: "pdf:encrypted":"false"
> Looking at the PDFParser, it seems that the first piece of information (the INFO message) is produced while the PDF is being read, but by the time the metadata is retrieved the PDF is no longer encrypted. The fact that the document was encrypted should be retained so that it can be added to the metadata.
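
The ordering problem described in the report can be illustrated with a small self-contained model. This is only a sketch of the idea "remember the flag before decrypting": FakeDocument and extractMetadata are hypothetical stand-ins, not PDFBox or Tika classes, and the only name taken from the report is the "pdf:encrypted" metadata key. With a loader that decrypts while loading, the flag would have to be captured inside that loader instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the ordering issue; no PDFBox or Tika types are used.
class EncryptionFlagSketch {

    /** Stand-in for a parsed PDF whose encrypted state changes once it is decrypted. */
    public static final class FakeDocument {
        private boolean encrypted;

        public FakeDocument(boolean encrypted) {
            this.encrypted = encrypted;
        }

        public boolean isEncrypted() {
            return encrypted;
        }

        /** Models in-place decryption: afterwards the document reports "not encrypted". */
        public void decrypt() {
            encrypted = false;
        }
    }

    public static Map<String, String> extractMetadata(FakeDocument doc) {
        Map<String, String> metadata = new LinkedHashMap<String, String>();
        // Capture the state BEFORE decryption; asking afterwards would always yield false.
        boolean wasEncrypted = doc.isEncrypted();
        if (wasEncrypted) {
            doc.decrypt();
        }
        metadata.put("pdf:encrypted", Boolean.toString(wasEncrypted));
        return metadata;
    }

    public static void main(String[] args) {
        // prints {pdf:encrypted=true}
        System.out.println(extractMetadata(new FakeDocument(true)));
    }
}
```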


