Posted to dev@tika.apache.org by "Roberto Benedetti (JIRA)" <ji...@apache.org> on 2019/01/18 22:46:00 UTC

[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746737#comment-16746737 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 10:45 PM:
-------------------------------------------------------------------

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax (CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for the "pkcs7-signedData" OID at the beginning of the file and, if it is found, returns "application/pkcs7-signature".
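
To illustrate the kind of check described above, here is a minimal sketch that searches the first bytes of a DER stream for the encoded pkcs7-signedData OID (1.2.840.113549.1.7.2). This is not Tika's actual implementation; the class name, method name, and window size are hypothetical.

```java
import java.util.Arrays;

final class SignedDataOidSniffer {
    // DER encoding of OID 1.2.840.113549.1.7.2 (pkcs7-signedData):
    // tag 0x06, length 0x09, then the nine value bytes.
    static final byte[] SIGNED_DATA_OID = {
        0x06, 0x09, 0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07, 0x02
    };

    /** Returns true if the encoded OID occurs entirely within the first {@code window} bytes. */
    static boolean startsWithSignedData(byte[] data, int window) {
        int limit = Math.min(data.length, window) - SIGNED_DATA_OID.length;
        for (int i = 0; i <= limit; i++) {
            byte[] slice = Arrays.copyOfRange(data, i, i + SIGNED_DATA_OID.length);
            if (Arrays.equals(slice, SIGNED_DATA_OID)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Outer SEQUENCE with indefinite length (0x30 0x80) followed by the OID.
        byte[] sample = {
            0x30, (byte) 0x80, 0x06, 0x09, 0x2A, (byte) 0x86, 0x48,
            (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07, 0x02, (byte) 0xA0, (byte) 0x80
        };
        System.out.println(startsWithSignedData(sample, 16));
    }
}
```

A match like this only says that the stream is a SignedData structure; it cannot say which of the media types listed below applies.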

There are, however, three media types with that OID at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c" (".p7b" not mentioned but can be found too), when there are only certificates and (optionally) CRLs
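
The three cases above could be told apart once the SignedData structure has been parsed. The following is only a sketch of the decision: it assumes that a parser (not shown) has already determined whether the encapsulated content is present and whether there are any signers, and it assumes, following RFC 5751, that a certs-only message has no content and an empty signerInfos field. The class and parameter names are hypothetical.

```java
final class SignedDataKind {
    /**
     * Picks a media type for a stream that starts with the pkcs7-signedData OID.
     *
     * @param hasContent true if encapContentInfo carries the signed content (eContent)
     * @param hasSigners true if the signerInfos set is non-empty
     */
    static String mediaType(boolean hasContent, boolean hasSigners) {
        if (hasContent) {
            // Signed content is embedded in the file.
            return "application/pkcs7-mime; smime-type=signed-data";
        }
        if (hasSigners) {
            // Signature only; the content lives elsewhere (detached signature).
            return "application/pkcs7-signature";
        }
        // No content and no signers: only certificates and, optionally, CRLs.
        return "application/pkcs7-mime; smime-type=certs-only";
    }

    public static void main(String[] args) {
        System.out.println(mediaType(true, true));
        System.out.println(mediaType(false, true));
        System.out.println(mediaType(false, false));
    }
}
```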

Extension ".p7b" is registered in Tika with media type "application/x-pkcs7-certificates", but I think such files have the same content as ".p7c" files.

Extension ".p7m" is also used when the OID at the beginning is "pkcs7-envelopedData" and the media type is "application/pkcs7-mime; smime-type=enveloped-data".

Extension ".p7z" is used when the OID at the beginning is "id-smime-ct-compressedData" and the media type is "application/pkcs7-mime; smime-type=compressed-data".
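
As a rough sketch of how the leading OID could select the "smime-type" parameter, the code below decodes the value bytes of a DER OID into dotted notation and looks it up in a small table. The class and method names are hypothetical, and the decoder is simplified: it assumes a non-empty value whose first arc is 0 or 1, which holds for the three OIDs discussed here.

```java
import java.util.Map;

final class CmsContentTypes {
    // Leading content-type OID -> smime-type parameter.
    static final Map<String, String> SMIME_TYPE_BY_OID = Map.of(
        "1.2.840.113549.1.7.2", "signed-data",       // ambiguous, needs further inspection
        "1.2.840.113549.1.7.3", "enveloped-data",
        "1.2.840.113549.1.9.16.1.9", "compressed-data");

    /** Decodes the value bytes of a DER OID (without tag and length) into dotted notation. */
    static String decodeOid(byte[] value) {
        StringBuilder sb = new StringBuilder();
        int first = value[0] & 0xFF;
        // The first byte packs the first two arcs as 40 * arc1 + arc2.
        sb.append(first / 40).append('.').append(first % 40);
        long arc = 0;
        for (int i = 1; i < value.length; i++) {
            int b = value[i] & 0xFF;
            // Each byte contributes 7 bits; the high bit marks continuation.
            arc = (arc << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                sb.append('.').append(arc);
                arc = 0;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] enveloped = {0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07, 0x03};
        String oid = decodeOid(enveloped);
        System.out.println(oid + " -> " + SMIME_TYPE_BY_OID.get(oid));
    }
}
```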

Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file begins with "-----BEGIN PKCS7"), whichever of these media types applies.
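
Because the textual label is the same in every case, a detector would have to decode the Base64 body and look at the DER bytes inside. A minimal sketch, assuming the whole file is available as a string and contains one block delimited by "-----BEGIN PKCS7-----" and "-----END PKCS7-----" (the class and method names are hypothetical):

```java
import java.util.Base64;

final class Pkcs7Pem {
    static final String BEGIN = "-----BEGIN PKCS7-----";
    static final String END = "-----END PKCS7-----";

    /** Cheap check on the start of the text, before any decoding. */
    static boolean looksLikePkcs7Pem(String text) {
        return text.startsWith("-----BEGIN PKCS7");
    }

    /** Returns the DER bytes between the BEGIN and END lines. */
    static byte[] decode(String pem) {
        int start = pem.indexOf(BEGIN);
        int end = pem.indexOf(END);
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("no PKCS7 block found");
        }
        String body = pem.substring(start + BEGIN.length(), end);
        // The MIME decoder ignores line breaks inside the Base64 body.
        return Base64.getMimeDecoder().decode(body);
    }

    public static void main(String[] args) {
        String pem = BEGIN + "\nMIAG\n" + END + "\n";
        byte[] der = decode(pem);
        System.out.printf("first byte: 0x%02X%n", der[0] & 0xFF);
    }
}
```

The decoded bytes can then go through the same OID inspection as a binary (DER) file.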

I can provide examples, built using openssl, but to support those media types Tika would need to:
 * return parameters in the media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7" or the pkcs7-signedData OID is found (as it does for XML streams)
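
The second point could look roughly like the sketch below, which picks an extension from the "smime-type" parameter using the pairs listed earlier in this comment. It works on plain strings and does not use Tika's own media type classes; the class and method names are hypothetical.

```java
import java.util.Map;

final class Pkcs7Extensions {
    // smime-type parameter -> file extension, as listed in this comment.
    static final Map<String, String> BY_SMIME_TYPE = Map.of(
        "signed-data", ".p7m",
        "enveloped-data", ".p7m",
        "certs-only", ".p7c",
        "compressed-data", ".p7z");

    /** Extracts the smime-type parameter, or null if it is absent. */
    static String smimeType(String mediaType) {
        for (String part : mediaType.split(";")) {
            String p = part.trim();
            if (p.startsWith("smime-type=")) {
                return p.substring("smime-type=".length());
            }
        }
        return null;
    }

    /** Returns the extension for a media type string, or null if unknown. */
    static String extensionFor(String mediaType) {
        String base = mediaType.split(";")[0].trim();
        if (base.equals("application/pkcs7-signature")) {
            return ".p7s";
        }
        if (base.equals("application/pkcs7-mime")) {
            String type = smimeType(mediaType);
            return type == null ? null : BY_SMIME_TYPE.get(type);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extensionFor("application/pkcs7-mime; smime-type=certs-only"));
    }
}
```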

 



> Problem in Tika().detect for xml file signed in CADES
> -----------------------------------------------------
>
>                 Key: TIKA-1997
>                 URL: https://issues.apache.org/jira/browse/TIKA-1997
>             Project: Tika
>          Issue Type: Sub-task
>          Components: detector
>    Affects Versions: 1.13
>         Environment: JDK 1.7
>            Reporter: Michele Andreano
>            Priority: Blocker
>         Attachments: test.xml.p7m
>
>
> When I submit to Tika an XML file signed in P7M format, I expect Tika to return the MIME type application/pkcs7-mime, but it returns application/pkcs7-signature instead.
> How is it possible?


