Posted to user@tika.apache.org by François Cassistat <f...@maya-systems.com> on 2010/02/09 19:25:03 UTC
Detecting rfc822 (email) messages
Hi,
I have to index and sort files, many of which are in .eml format
(message/rfc822). Apache Tika detects them as text/plain.
I was thinking of hacking my application to add a check whenever Tika
reports a file as text/plain, but maybe I should try to write my own
parser instead (this could also be a good way to add support for
indexing email attachments). Any pointers?
F
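The application-side check described above could be sketched roughly as follows. This is only an illustration using the plain JDK, not Tika's detection API: the class name Rfc822Sniffer, its methods, the list of header names and the threshold of two known headers are all assumptions made for the example. It looks at the start of a file that was reported as text/plain and guesses whether it begins with RFC 822 style header fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Tika): guesses whether text starts like an RFC 822 message. */
final class Rfc822Sniffer {

    // Illustrative list of header names that commonly open a mail message.
    private static final Set<String> COMMON_HEADERS = new HashSet<>(Arrays.asList(
            "from", "to", "subject", "date", "received", "message-id",
            "mime-version", "return-path", "delivered-to", "content-type"));

    // Field name: printable ASCII except space and colon, followed by a colon.
    private static final Pattern FIELD = Pattern.compile("([\\x21-\\x39\\x3B-\\x7E]+):.*");

    // Arbitrary cap so that large text files are not scanned in full.
    private static final int MAX_HEADER_LINES = 50;

    /** Returns true if the text begins with a header block containing at least two common mail headers. */
    static boolean looksLikeRfc822(String text) {
        int known = 0;
        int seen = 0;
        boolean sawField = false;
        for (String line : text.split("\\r?\\n")) {
            // A blank line ends the header block; also stop at the line cap.
            if (line.isEmpty() || ++seen > MAX_HEADER_LINES) {
                break;
            }
            // Folded (continuation) lines are only valid after a header field.
            if (line.charAt(0) == ' ' || line.charAt(0) == '\t') {
                if (!sawField) {
                    return false;
                }
                continue;
            }
            Matcher m = FIELD.matcher(line);
            if (!m.matches()) {
                return false; // not a header line, so treat the file as ordinary text
            }
            sawField = true;
            if (COMMON_HEADERS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                known++;
            }
        }
        return known >= 2;
    }

    /** Upgrades text/plain to message/rfc822 when the content looks like a mail message. */
    static String refine(String detectedType, String head) {
        if ("text/plain".equals(detectedType) && looksLikeRfc822(head)) {
            return "message/rfc822";
        }
        return detectedType;
    }
}
```

An application would pass the type string reported by Tika and the first few kilobytes of the file to refine(...). Files that start with an mbox "From " separator line are deliberately not matched by this sketch.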
Re: Detecting rfc822 (email) messages
Posted by François Cassistat <f...@maya-systems.com>.
Great, I'll give it a look as soon as I have some time for this.
F
On 2010-02-10 at 9:50 AM, Jukka Zitting wrote:
> Hi,
>
> 2010/2/9 François Cassistat <f...@maya-systems.com>:
>> I was thinking of hacking my application to add a check when I receive the
>> text/plain mimetype from Tika, but maybe I should try to write my own
>> parser (this could be great to add support for indexing email attachments).
>> Any pointers?
>
> See the o.a.tika.parser.mbox.MboxParser class for our current email
> parsing functionality, and http://james.apache.org/mime4j/ for a more
> complete email parser library. It would be great if we could integrate
> mime4j into Tika!
>
> BR,
>
> Jukka Zitting
Re: Detecting rfc822 (email) messages
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
2010/2/9 François Cassistat <f...@maya-systems.com>:
> I was thinking of hacking my application to add a check when I receive the
> text/plain mimetype from Tika, but maybe I should try to write my own
> parser (this could be great to add support for indexing email attachments).
> Any pointers?
See the o.a.tika.parser.mbox.MboxParser class for our current email
parsing functionality, and http://james.apache.org/mime4j/ for a more
complete email parser library. It would be great if we could integrate
mime4j into Tika!
BR,
Jukka Zitting
Re: Detecting rfc822 (email) messages
Posted by François Cassistat <f...@maya-systems.com>.
Thanks
On 2010-02-10 at 7:03 AM, Nick Burch wrote:
> On Tue, 9 Feb 2010, François Cassistat wrote:
>> I've got to index and sort files where many are in .eml format (message/rfc822). Apache Tika detects them as text/plain.
>
> Currently it looks like Tika only handles such files with a .mbox extension, not .eml. Did you try editing tika-mimetypes.xml and adding .eml there too?
>
> Nick
Re: Detecting rfc822 (email) messages
Posted by Nick Burch <ni...@alfresco.com>.
On Tue, 9 Feb 2010, François Cassistat wrote:
> I've got to index and sort files where many are in .eml format
> (message/rfc822). Apache Tika detects them as text/plain.
Currently it looks like Tika only handles such files with a .mbox
extension, not .eml. Did you try editing tika-mimetypes.xml and adding
.eml there too?
Nick
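Where editing tika-mimetypes.xml is not an option, a name-based fallback in the application can stand in for the extension mapping suggested above. The sketch below is not Tika configuration and does not call Tika at all: the class EmlExtensionFallback and its method are hypothetical, and it simply overrides a text/plain result when the file name ends in .eml.

```java
import java.util.Locale;

/** Hypothetical application-side fallback; a stand-in for an extension entry in the MIME type config. */
final class EmlExtensionFallback {

    /** Returns message/rfc822 for *.eml files reported as text/plain, otherwise the detected type. */
    static String refineByName(String detectedType, String fileName) {
        // Only second-guess the generic text/plain result; trust anything more specific.
        if (fileName == null || !"text/plain".equals(detectedType)) {
            return detectedType;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".eml") ? "message/rfc822" : detectedType;
    }
}
```

The comparison is case-insensitive, so names such as Mail.EML are covered as well. Unlike a change to the MIME type configuration, this only affects the calling application's view of the result.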