Posted to user@tika.apache.org by Mark Kerzner <ma...@gmail.com> on 2009/08/04 23:17:43 UTC

Extraction of text from emails

Hi,
Tika is supposed to process *.msg files (email extracted out of Outlook). My
first attempt did not work; what should I look at? Incidentally, how do
I get *.msg files out of Outlook?

Thank you,
Mark

Re: Extraction of text from emails

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Aug 5, 2009 at 1:56 PM, Mark Kerzner <ma...@gmail.com> wrote:
> To get Tika 0.4, I need to build it from source; there is no jar yet,
> correct?

Yes, as of now we don't provide pre-built binaries except for the
component jars in the central Maven repository.

BR,

Jukka Zitting

Re: Extraction of text from emails

Posted by Mark Kerzner <ma...@gmail.com>.
Thank you, Jukka,
To get Tika 0.4, I need to build it from source; there is no jar yet,
correct?

Mark

On Wed, Aug 5, 2009 at 6:20 AM, Jukka Zitting <ju...@gmail.com> wrote:

> Hi,
>
> On Wed, Aug 5, 2009 at 1:17 PM, Mark Kerzner <ma...@gmail.com> wrote:
> > But in my tests Tika does not process .msg (even though it is promised).
>
> It could be that Tika is unable to detect the file type. Earlier Tika
> versions needed the file name as input metadata to correctly detect
> some Microsoft file formats. Tika 0.4 should work fine even without
> any input metadata.
>
> If this doesn't help, can you file a bug report with an example .msg file?
>
> BR,
>
> Jukka Zitting
>

Re: Extraction of text from emails

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Aug 5, 2009 at 1:17 PM, Mark Kerzner <ma...@gmail.com> wrote:
> But in my tests Tika does not process .msg (even though it is promised).

It could be that Tika is unable to detect the file type. Earlier Tika
versions needed the file name as input metadata to correctly detect
some Microsoft file formats. Tika 0.4 should work fine even without
any input metadata.

If this doesn't help, can you file a bug report with an example .msg file?

BR,

Jukka Zitting
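
To illustrate why a file name can matter for detection, here is a minimal
Java sketch. It is not Tika's code and does not use the Tika API; the class
name MsgTypeHint, both method names, and the fallback label
"application/x-ole-storage" are made up for this example. It relies on one
fact: Outlook .msg files are stored in the OLE2 compound file container, the
same container used by legacy .doc and .xls files, so the leading bytes alone
cannot tell these formats apart. A file name, when available, can break the tie.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Tika.
class MsgTypeHint {
    // Signature found at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given leading bytes start with the OLE2 signature.
    static boolean looksLikeOle2(byte[] header) {
        return header != null
            && header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Guess a media type from the leading bytes plus an optional file name.
    static String guessType(byte[] header, String fileName) {
        if (!looksLikeOle2(header)) {
            return "application/octet-stream";
        }
        String name = fileName == null ? "" : fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".msg")) return "application/vnd.ms-outlook";
        if (name.endsWith(".doc")) return "application/msword";
        if (name.endsWith(".xls")) return "application/vnd.ms-excel";
        // Placeholder label (an assumption): some OLE2 file, exact type unknown.
        return "application/x-ole-storage";
    }

    public static void main(String[] args) {
        byte[] header = Arrays.copyOf(OLE2_MAGIC, 16);
        System.out.println(guessType(header, "mail.msg")); // with a name hint
        System.out.println(guessType(header, null));       // magic bytes only
    }
}
```

Without the name, the sketch can only report a generic OLE2 container. A
detector that needs no file name has to look inside the container instead,
which this example does not attempt.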

Re: Extraction of text from emails

Posted by Mark Kerzner <ma...@gmail.com>.
But in my tests Tika does not process .msg (even though it is promised).
That is why I will extract to something like HTML.

On Wed, Aug 5, 2009 at 12:01 AM, Brindha karuppiah <kr...@gmail.com> wrote:

> Hi,
> By extracting .PST files you can get the *.msg files.
>
>
> On Wed, Aug 5, 2009 at 2:47 AM, Mark Kerzner <ma...@gmail.com> wrote:
>
>> Hi,
>> Tika is supposed to process *.msg files (email extracted out of Outlook).
>> My first attempt did not work; what should I look at? Incidentally, how
>> do I get *.msg files out of Outlook?
>>
>> Thank you,
>> Mark
>>
>
>
>
> --
> Regards,
> Brindha.KR
>
>

Re: Extraction of text from emails

Posted by Brindha karuppiah <kr...@gmail.com>.
Hi,
By extracting .PST files you can get the *.msg files.

On Wed, Aug 5, 2009 at 2:47 AM, Mark Kerzner <ma...@gmail.com> wrote:

> Hi,
> Tika is supposed to process *.msg files (email extracted out of Outlook). My
> first attempt did not work; what should I look at? Incidentally, how do
> I get *.msg files out of Outlook?
>
> Thank you,
> Mark
>



-- 
Regards,
Brindha.KR

Re: Extraction of text from emails

Posted by Mark Kerzner <ma...@gmail.com>.
Or maybe, if one is using a converter anyway, one can ask the converter to
output HTML, for example, and avoid all *.msg problems?

On Tue, Aug 4, 2009 at 4:39 PM, Mark Kerzner <ma...@gmail.com> wrote:

> Aaron,
> that would work. I was looking for something in Linux, but that's not
> so important. After all, such conversions can't be run on a Linux grid
> anyway.
>
> However, I still can't get *.msg processed by Tika. I ran Tika in GUI
> mode; it did not crash and even recognized the msg format and
> a couple of metadata fields, but it extracted no text.
>
> Thank you,
> Mark
>
>
> On Tue, Aug 4, 2009 at 4:34 PM, Aaron Fulton <aa...@softhome.net> wrote:
>
>> To get msg files out of Outlook, you can drag and drop messages from
>> Outlook to a folder on your desktop or use a third-party application such as
>> this one: http://www.outlookconversion.com/pst-to-msg.html
>>
>> Mark Kerzner wrote:
>>
>> Hi,
>> Tika is supposed to process *.msg files (email extracted out of Outlook).
>> My first attempt did not work; what should I look at? Incidentally, how
>> do I get *.msg files out of Outlook?
>>
>>  Thank you,
>> Mark
>>
>

Re: Extraction of text from emails

Posted by Mark Kerzner <ma...@gmail.com>.
Aaron,
that would work. I was looking for something in Linux, but that's not
so important. After all, such conversions can't be run on a Linux grid
anyway.

However, I still can't get *.msg processed by Tika. I ran Tika in GUI
mode; it did not crash and even recognized the msg format and
a couple of metadata fields, but it extracted no text.

Thank you,
Mark

On Tue, Aug 4, 2009 at 4:34 PM, Aaron Fulton <aa...@softhome.net> wrote:

> To get msg files out of Outlook, you can drag and drop messages from
> Outlook to a folder on your desktop or use a third-party application such as
> this one: http://www.outlookconversion.com/pst-to-msg.html
>
> Mark Kerzner wrote:
>
> Hi,
> Tika is supposed to process *.msg files (email extracted out of Outlook).
> My first attempt did not work; what should I look at? Incidentally, how
> do I get *.msg files out of Outlook?
>
>  Thank you,
> Mark
>

Re: Extraction of text from emails

Posted by Aaron Fulton <aa...@softhome.net>.
To get msg files out of Outlook, you can drag and drop messages from
Outlook to a folder on your desktop or use a third-party application
such as this one: http://www.outlookconversion.com/pst-to-msg.html

Mark Kerzner wrote:
> Hi,
>
> Tika is supposed to process *.msg files (email extracted out of
> Outlook). My first attempt did not work; what should I look at?
> Incidentally, how do I get *.msg files out of Outlook?
>
> Thank you,
> Mark