Posted to user@uima.apache.org by Radhouane Aniba <ar...@gmail.com> on 2011/02/14 04:49:06 UTC

Analysis Engines for mbox like data

Hello everyone,

This is a somewhat unusual request for this list: I am wondering whether there
is an analysis engine that can mine MBOX-like formats, such as the well-known
Mailman mailing list archives, and structure that kind of data into messages
and their replies.

If anyone has already worked on this topic, I would be very interested in
discussing it further.

Regards

Radhouane

--

RE: Analysis Engines for mbox like data

Posted by jo...@thomsonreuters.com.
Radhouane,

Not an exact answer to your question, but Perl's CPAN has an MBOX format parser:
http://search.cpan.org/dist/Mail-Mbox-MessageParser/

If you write a 3-liner to convert from MBOX to XML, the UIMA analyzer might be
easier to write, unless you want everything implemented in one language.
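
A minimal sketch of such a converter, written in Python with the standard
library's mailbox module rather than the Perl parser linked above. The function
name mbox_to_xml, the element names and the choice of headers are assumptions
made for illustration; nothing here is prescribed by UIMA.

```python
import mailbox
import xml.etree.ElementTree as ET

# Headers copied into the XML; this selection is an assumption, extend as needed.
HEADERS = ("Message-ID", "In-Reply-To", "From", "Date", "Subject")


def mbox_to_xml(mbox_path):
    """Return an XML string with one <message> element per mail in the mbox file."""
    root = ET.Element("mailbox")
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            node = ET.SubElement(root, "message")
            for header in HEADERS:
                value = msg.get(header)
                if value is not None:
                    ET.SubElement(node, header.lower()).text = str(value)
            # Only single-part bodies are copied here; multipart mails
            # would need msg.walk() to pick out the text parts.
            if not msg.is_multipart():
                ET.SubElement(node, "body").text = msg.get_payload()
    finally:
        box.close()
    return ET.tostring(root, encoding="unicode")
```

The resulting XML could then be handed to a UIMA collection reader, which keeps
the mbox parsing outside the analysis engine itself.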

Regards
Jochen


Re: Analysis Engines for mbox like data

Posted by Thilo Götz <tw...@gmx.de>.
Not sure where you want to go with this, but one
approach might be to preprocess your data into some
better structured format, and only start your UIMA
analysis after you've done that.

I have used a project called mstor (hosted on SourceForge) to
process mbox files, and then you can use JavaMail
to get at the thread IDs and whatever else you need.
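
The thread information mentioned above is presumably what the standard
Message-ID, In-Reply-To and References headers carry. A minimal sketch of
rebuilding the message/reply structure from those headers, using Python's
standard library as a stand-in for mstor and JavaMail. The names parent_id and
build_threads are hypothetical helpers, not part of any of those projects.

```python
import re
from collections import defaultdict

# Matches one angle-bracketed identifier such as <abc@example.org>.
MSG_ID = re.compile(r"<[^<>\s]+>")


def parent_id(msg):
    """Return the Message-ID this mail replies to, or None if it names none."""
    # In-Reply-To normally names the direct parent; otherwise fall back
    # to the last entry of References.
    in_reply_to = MSG_ID.findall(str(msg.get("In-Reply-To", "")))
    if in_reply_to:
        return in_reply_to[0]
    references = MSG_ID.findall(str(msg.get("References", "")))
    return references[-1] if references else None


def build_threads(messages):
    """Return (roots, replies): thread-starting IDs and a parent -> children map."""
    by_id = {}
    for msg in messages:
        ids = MSG_ID.findall(str(msg.get("Message-ID", "")))
        if ids:
            by_id[ids[0]] = msg
    roots, replies = [], defaultdict(list)
    for mid, msg in by_id.items():
        parent = parent_id(msg)
        if parent in by_id:
            replies[parent].append(mid)
        else:
            # Parent missing from this archive (or none given): start a thread.
            roots.append(mid)
    return roots, dict(replies)
```

Passing mailbox.mbox("archive.mbox") as the messages argument (the file name is
a placeholder) would thread a whole archive. Mails without a Message-ID are
skipped in this sketch, and subject-based fallbacks are deliberately left out.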

--Thilo


Re: Analysis Engines for mbox like data

Posted by Tommaso Teofili <to...@gmail.com>.
I agree with Jörn; I think that's the faster way.
Tommaso

2011/2/14 Jörn Kottmann <ko...@gmail.com>

> We have a tika integration, and tika has support for mbox.
> Maybe that is good enough to do the extraction.
>
> Jörn
>

Re: Analysis Engines for mbox like data

Posted by Jörn Kottmann <ko...@gmail.com>.
We have a Tika integration, and Tika has support for mbox.
Maybe that is good enough to do the extraction.

Jörn