Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2012/07/07 18:05:34 UTC

[jira] [Resolved] (TIKA-561) Support EMLX file detection

     [ https://issues.apache.org/jira/browse/TIKA-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-561.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2
         Assignee: Jukka Zitting

Sorry for the long delay on this...

I committed a slightly modified version in revision 1358596, using a hand-crafted EMLX example file based on the existing RFC822 test files we have.
                
> Support EMLX file detection
> ---------------------------
>
>                 Key: TIKA-561
>                 URL: https://issues.apache.org/jira/browse/TIKA-561
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Antoni Mylka
>            Assignee: Jukka Zitting
>             Fix For: 1.2
>
>         Attachments: tika-561.patch
>
>
> Apple Mail generates email files in .emlx format. They roughly resemble standard rfc822 .eml files but are different.
> On the first line they have the content length in bytes,
> then on the second line, normal rfc822 content starts
> and afterwards there is some XML metadata.
> I suggest adding support for .emlx files to tika-mimetypes.xml. Just copy the message/rfc822 definitions and state that they should appear at offsets 3:10; this should be enough to accommodate the content length on the first line. Any reasonable email should be longer than 9 bytes. In this case the first line would have two bytes, then the line break, and normal rfc822 headers can start at offset 4. This will work for emails up to 99 MB (99 999 999 bytes).
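
For illustration, the layout and offset check described above can be sketched in a few lines of Python. This is only a rough sketch, not the detection code committed to Tika: the names make_emlx, looks_like_emlx and HEADER_STARTS are hypothetical, the header list is an arbitrary subset, and offsets are counted from zero (so a two-digit length plus the line break puts the first header at offset 3).

```python
# Illustrative sketch only; not the detection logic that ships with Tika.

# Hypothetical subset of header names that may open an RFC 822 message.
# Matching is case-sensitive here to keep the sketch short.
HEADER_STARTS = (b"From:", b"Received:", b"Return-Path:", b"Date:",
                 b"Message-ID:", b"Subject:", b"To:", b"MIME-Version:")


def make_emlx(message: bytes, plist: bytes = b"") -> bytes:
    """Build an EMLX-style sample: length line, RFC 822 message, XML metadata."""
    return str(len(message)).encode("ascii") + b"\n" + message + plist


def looks_like_emlx(data: bytes) -> bool:
    """Return True if data opens with a length line followed by a known header."""
    # Look for the first line break within the first 10 bytes only, so the
    # headers can start no later than zero-based offset 10.
    newline = data.find(b"\n", 0, 10)
    if newline == -1:
        return False
    header_start = newline + 1
    # Offsets 3..10, as proposed in the issue: at least two digits of length.
    if not 3 <= header_start <= 10:
        return False
    # The first line must be a decimal byte count (surrounding whitespace is
    # tolerated here as an assumption).
    if not data[:newline].strip().isdigit():
        return False
    # An RFC 822 header name must begin right after the length line.
    return data.startswith(HEADER_STARTS, header_start)
```

Wrapping an existing RFC 822 test message with make_emlx gives a hand-crafted sample that looks_like_emlx accepts, while the bare message is rejected because its first line is not a byte count.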

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira