Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/04/20 17:26:00 UTC

[jira] [Resolved] (TIKA-3666) Detect and indicate file encrypted with Rights Management Service RMS/IRM

     [ https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-3666.
-------------------------------
    Fix Version/s: 2.4.0
       Resolution: Fixed

I think we've done what we could.  If anyone can share examples of PDF or non-OLE2 packages, e.g. *.ptxt, please reopen or open another issue.

Thanks to all who helped with this!

> Detect and indicate file encrypted with Rights Management Service RMS/IRM
> -------------------------------------------------------------------------
>
>                 Key: TIKA-3666
>                 URL: https://issues.apache.org/jira/browse/TIKA-3666
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: August Valera
>            Priority: Major
>             Fix For: 2.4.0
>
>         Attachments: poifsviewer.txt, sam-poifsviewer.txt
>
>
> Rights Management Service (RMS), implemented in MS Office as Information Rights Management (IRM), allows organizations to set file permissions that are stored within the file. In most cases, this will result in the file getting a new extension (with a prefix p, such as {{.txt}} becoming {{.ptxt}}), but in the case of MS Office and PDF files, which support this natively, the implementation results in the file contents being encrypted without any extension change.
> h4. Current behavior
> Running such files through Tika produces the same results as running an empty file through {{DefaultParser}} and {{OfficeParser}}.
> h4. Expected behavior
> Extract more metadata about the permissions needed to view the file (if possible), and throw {{EncryptedDocumentException}}, as is already the case with Office files encrypted in the more traditional manner.
> Reference: [https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-file-types#supported-file-types-for-classification-and-protection]
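
The extension convention described in the issue (a "p" prefix, so {{.txt}} becomes {{.ptxt}}) can be sketched as a small filename heuristic. This is only an illustration, not Tika's detection logic: the class name, method names, and the list of base extensions below are all hypothetical, and the check cannot catch MS Office or PDF files, which the issue says keep their original extension and would need content inspection instead.

```java
import java.util.Locale;
import java.util.Set;

public class RmsExtensionHint {
    // Hypothetical, illustrative list of base extensions. It is an assumption
    // made for this sketch, not taken from Tika or from Microsoft's documentation.
    private static final Set<String> BASE_EXTENSIONS = Set.of("txt", "xml", "jpg", "png");

    /** Applies the "prefix p" convention to a file name, e.g. notes.txt -> notes.ptxt. */
    public static String protectedName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return fileName; // no extension to rewrite
        }
        return fileName.substring(0, dot + 1) + "p" + fileName.substring(dot + 1);
    }

    /** Heuristic: true when the extension is "p" followed by a known base extension. */
    public static boolean looksProtected(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ext.length() > 1
                && ext.charAt(0) == 'p'
                && BASE_EXTENSIONS.contains(ext.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(protectedName("notes.txt"));
        System.out.println(looksProtected("notes.ptxt"));
        System.out.println(looksProtected("slides.pptx"));
    }
}
```

Matching against a list of known base extensions, rather than accepting any extension that starts with "p", keeps ordinary names such as {{slides.pptx}} or {{image.png}} from being flagged.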



--
This message was sent by Atlassian Jira
(v8.20.7#820007)