Posted to user@tika.apache.org by François Cassistat <f...@maya-systems.com> on 2010/02/09 19:25:03 UTC

Detecting rfc822 (email) messages

Hi,

I need to index and sort files, many of which are in .eml format
(message/rfc822). Apache Tika detects them as text/plain.

I was thinking of hacking my application to add a check for when Tika
reports a file as text/plain, but maybe I should try to write my own
parser (this could be a great way to add support for indexing email
attachments). Any pointers?
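
For illustration, the application-side check could look roughly like the sketch below. It uses only the Java standard library and assumes the detected type arrives as a string and that the first few lines of the file are available as text. The class name, method names, and list of header names are hypothetical; none of this is Tika API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class Rfc822Sniffer {
    // Assumed list of header names that commonly open an email message.
    private static final Set<String> KNOWN_HEADERS = new HashSet<>(Arrays.asList(
            "from", "to", "subject", "date", "received", "return-path",
            "message-id", "mime-version", "delivered-to"));

    /** Upgrade text/plain to message/rfc822 when the text starts like an email. */
    public static String refineType(String detectedType, String head) {
        if ("text/plain".equals(detectedType) && looksLikeRfc822(head)) {
            return "message/rfc822";
        }
        return detectedType;
    }

    /** Heuristic: the leading block is all header lines and names at least two known headers. */
    public static boolean looksLikeRfc822(String head) {
        int known = 0;
        for (String line : head.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            if (line.startsWith(" ") || line.startsWith("\t")) {
                continue; // folded continuation of the previous header
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                return false; // not a "Name: value" line, so not a header block
            }
            String name = line.substring(0, colon).toLowerCase(Locale.ROOT);
            if (KNOWN_HEADERS.contains(name)) {
                known++;
            }
        }
        return known >= 2;
    }
}
```

The threshold of two known headers is an arbitrary choice to reduce false positives on ordinary text that happens to contain a colon on its first line.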


F


Re: Detecting rfc822 (email) messages

Posted by François Cassistat <f...@maya-systems.com>.
Great, I'll give it a look as soon as I've got some time for this.


F


On 2010-02-10 at 9:50 AM, Jukka Zitting wrote:

> Hi,
> 
> 2010/2/9 François Cassistat <f...@maya-systems.com>:
>> I was thinking of hacking my application to add a check for when Tika
>> reports a file as text/plain, but maybe I should try to write my own
>> parser (this could be a great way to add support for indexing email
>> attachments). Any pointers?
> 
> See the o.a.tika.parser.mbox.MboxParser class for our current email
> parsing functionality, and http://james.apache.org/mime4j/ for a more
> complete email parser library. It would be great if we could integrate
> mime4j to Tika!
> 
> BR,
> 
> Jukka Zitting


Re: Detecting rfc822 (email) messages

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

2010/2/9 François Cassistat <f...@maya-systems.com>:
> I was thinking of hacking my application to add a check for when Tika
> reports a file as text/plain, but maybe I should try to write my own
> parser (this could be a great way to add support for indexing email
> attachments). Any pointers?

See the o.a.tika.parser.mbox.MboxParser class for our current email
parsing functionality, and http://james.apache.org/mime4j/ for a more
complete email parser library. It would be great if we could integrate
mime4j to Tika!

BR,

Jukka Zitting

Re: Detecting rfc822 (email) messages

Posted by François Cassistat <f...@maya-systems.com>.
Thanks


On 2010-02-10 at 7:03 AM, Nick Burch wrote:

> On Tue, 9 Feb 2010, François Cassistat wrote:
>> I need to index and sort files, many of which are in .eml format (message/rfc822). Apache Tika detects them as text/plain.
> 
> Currently it looks like Tika only handles such files with a .mbox extension, not .eml. Did you try editing tika-mimetypes.xml and adding .eml too?
> 
> Nick


Re: Detecting rfc822 (email) messages

Posted by Nick Burch <ni...@alfresco.com>.
On Tue, 9 Feb 2010, François Cassistat wrote:
> I need to index and sort files, many of which are in .eml format
> (message/rfc822). Apache Tika detects them as text/plain.

Currently it looks like Tika only handles such files with a .mbox
extension, not .eml. Did you try editing tika-mimetypes.xml and adding
.eml too?

Nick