Posted to dev@tika.apache.org by "Tim Barrett (JIRA)" <ji...@apache.org> on 2015/06/26 10:49:04 UTC

[jira] [Updated] (TIKA-1665) Incorrect handling of eml files with type message/x-emlx embedded in msg files

     [ https://issues.apache.org/jira/browse/TIKA-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Barrett updated TIKA-1665:
------------------------------
    Attachment: Anonymised Small EML attached message.msg

This is a msg file with an embedded eml message that exhibits this behaviour.

> Incorrect handling of eml files with type message/x-emlx embedded in msg files
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-1665
>                 URL: https://issues.apache.org/jira/browse/TIKA-1665
>             Project: Tika
>          Issue Type: Bug
>          Components: mime, parser
>    Affects Versions: 1.7, 1.8, 1.9
>         Environment: all (Linux, OS X, Windows)
>            Reporter: Tim Barrett
>         Attachments: Anonymised Small EML attached message.msg
>
>
> Our software uses Tika to parse large and diverse sets of customer files. Amongst these files are eml files which are embedded within msg files. These eml files have a media type of message/x-emlx, as detected by Media Detector.
> From Tika 1.7 onwards, the binary MIME attachment data of the file within the parent msg file is parsed as text. This did not happen with Tika 1.6 or prior versions. As a result, huge volumes of meaningless characters are passed through to our content handler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)