Posted to dev@tika.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/10/11 23:40:00 UTC

[jira] [Commented] (TIKA-1788) message/rfc822 parser doesn't identify attachment filenames from Content-Disposition header

    [ https://issues.apache.org/jira/browse/TIKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201199#comment-16201199 ] 

ASF GitHub Bot commented on TIKA-1788:
--------------------------------------

AarjavP opened a new pull request #211: [TIKA-1788] RFC822Parser: provide email attachment filenames when available
URL: https://github.com/apache/tika/pull/211
 
 
   https://issues.apache.org/jira/browse/TIKA-1788 is an old issue, but it already contained the solution. I am not sure why it never made it into the codebase.
   
   Other information is available in `MaximalBodyDescriptor`. However, I am not sure how it would map to fields in `Metadata`, or whether most files would even carry that information.
   
   One piece of information I would like to expose is either the whole `Content-Disposition` line, which I am not sure how to obtain, or flags for `attachment`/`inline`, which are almost always present alongside the filename. Looking at the source code, the value is parsed but hidden behind a `private` field. 😢
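
As a rough illustration of the `attachment`/`inline` flags mentioned above, the following standalone sketch splits a raw `Content-Disposition` header value into its disposition type and parameters. It is not the mime4j or Tika implementation: the class `DispositionSketch` and its members are hypothetical names, and the naive split on `;` assumes that no quoted parameter value contains a semicolon.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Tika or mime4j. */
final class DispositionSketch {
    final String type;                 // e.g. "attachment" or "inline", lower-cased
    final Map<String, String> params;  // parameter names lower-cased, values unquoted

    private DispositionSketch(String type, Map<String, String> params) {
        this.type = type;
        this.params = params;
    }

    /** Parses a header value such as: attachment; filename="report.pdf" */
    static DispositionSketch parse(String headerValue) {
        // Assumption: no ';' appears inside a quoted parameter value.
        String[] parts = headerValue.split(";");
        String type = parts[0].trim().toLowerCase(Locale.ROOT);
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // skip fragments that are not name=value pairs
            }
            String name = parts[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = parts[i].substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            params.put(name, value);
        }
        return new DispositionSketch(type, params);
    }

    boolean isAttachment() { return "attachment".equals(type); }

    boolean isInline() { return "inline".equals(type); }

    public static void main(String[] args) {
        DispositionSketch d = parse("attachment;\r\n\tfilename=\"report.pdf\"");
        System.out.println(d.type + " -> " + d.params.get("filename"));
    }
}
```

How such flags would map onto `Metadata` fields is left open here, as it is in the text above.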
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> message/rfc822 parser doesn't identify attachment filenames from Content-Disposition header
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1788
>                 URL: https://issues.apache.org/jira/browse/TIKA-1788
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.11
>            Reporter: Sergey Tsalkov
>         Attachments: grep_content_disposition.zip
>
>
> rfc822 email files can contain attachments as subparts, and they'll
> generally specify the filename of the attachment in a manner like
> this:
> Content-Disposition: attachment;
>         filename*=utf-8''image001.jpg
> Tika doesn't seem to be grabbing that information at all!
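
The `filename*=utf-8''image001.jpg` form in the report above is the extended parameter syntax of RFC 2231: a charset, an optional language tag, and a percent-encoded value, separated by single quotes. The sketch below shows one way to decode such a value using only the Java standard library. The class and method names are hypothetical, the code is not taken from Tika or mime4j, and it assumes the parameter is not split into numbered continuations (`filename*0*`, `filename*1*`, ...).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of Tika or mime4j. */
final class ExtendedFilenameSketch {

    /** Decodes a value of the form charset'language'percent-encoded-text. */
    static String decode(String extValue) {
        int first = extValue.indexOf('\'');
        int second = first < 0 ? -1 : extValue.indexOf('\'', first + 1);
        if (second < 0) {
            return extValue; // not in extended form; return it unchanged
        }
        String charsetName = extValue.substring(0, first);
        String encoded = extValue.substring(second + 1);
        // Assumption: fall back to US-ASCII when the charset is omitted.
        // Charset.forName throws an unchecked exception for unknown names.
        Charset charset = charsetName.isEmpty()
                ? StandardCharsets.US_ASCII
                : Charset.forName(charsetName);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes.write((hi << 4) | lo); // one decoded octet
                    i += 2;
                    continue;
                }
            }
            // Literal character; extended values are expected to be ASCII here.
            bytes.write(c);
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("utf-8''image001.jpg")); // prints image001.jpg
    }
}
```

`java.net.URLDecoder` is deliberately avoided because it also turns `+` into a space, which is not part of this syntax.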



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)