Posted to dev@tika.apache.org by "Dave Meikle (JIRA)" <ji...@apache.org> on 2010/03/27 14:22:27 UTC

[jira] Created: (TIKA-396) Parser Attachements from Outlook Messages

Parser Attachements from Outlook Messages
-----------------------------------------

                 Key: TIKA-396
                 URL: https://issues.apache.org/jira/browse/TIKA-396
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 0.6
         Environment: All environments.
            Reporter: Dave Meikle
            Assignee: Dave Meikle


As raised by Albert Jensen on the tika-user mailing list[1], it would be good for the Outlook Parser to iterate through the mails attachements and then extract there content.

[1]http://mail-archives.apache.org/mod_mbox/lucene-tika-user/201003.mbox/%3C002701cacccf$16108b40$4231a1c0$@mail.dk%3E


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (TIKA-396) Parser Attachements from Outlook Messages

Posted by Dave Meikle <lo...@gmail.com>.
Hi,

On 11 April 2010 06:59, For Apache Tika <ol...@gmail.com> wrote:

> Please find attached zip file with msgs.
> Change zip_0 to zip ;-)
>

Thanks for the mail, Oleg, but attachments do not come through on the
mailing list.  Feel free to send the ZIP file to me directly if you still
have it.

Thanks,
Dave

Re: [jira] Commented: (TIKA-396) Parser Attachements from Outlook Messages

Posted by For Apache Tika <ol...@gmail.com>.
Please find attached a zip file with the msg files.
Rename the extension from zip_0 to zip ;-)


Best regards,
Oleg.

On Fri, Apr 9, 2010 at 9:26 AM, Oleg Tikhonov <ol...@gmail.com>wrote:

> I'll send you on Sunday.
>
> Just wondering, what about Lotus Notes? Do we have something?
>
> -Oleg
>
>
> On Thu, Apr 8, 2010 at 11:13 PM, Dave Meikle <lo...@gmail.com> wrote:
>
>> Hi Oleg,
>>
>> On 8 April 2010 14:56, Oleg Tikhonov <ol...@gmail.com> wrote:
>>
>> > Hi Dave. Which format of Outlook mail do you need? msg?
>> >
>>
>> Yes a msg file, from either Outlook Express or Outlook.
>>
>> Thanks,
>> Dave
>>
>
>
>
> --
> Best regards, Oleg.
>



-- 
Best regards, Oleg.

Re: [jira] Commented: (TIKA-396) Parser Attachements from Outlook Messages

Posted by Oleg Tikhonov <ol...@gmail.com>.
I'll send it to you on Sunday.

Just wondering, what about Lotus Notes? Do we have something?

-Oleg

On Thu, Apr 8, 2010 at 11:13 PM, Dave Meikle <lo...@gmail.com> wrote:

> Hi Oleg,
>
> On 8 April 2010 14:56, Oleg Tikhonov <ol...@gmail.com> wrote:
>
> > Hi Dave. Which format of Outlook mail do you need? msg?
> >
>
> Yes a msg file, from either Outlook Express or Outlook.
>
> Thanks,
> Dave
>



-- 
Best regards, Oleg.

Re: [jira] Commented: (TIKA-396) Parser Attachements from Outlook Messages

Posted by Dave Meikle <lo...@gmail.com>.
Hi Oleg,

On 8 April 2010 14:56, Oleg Tikhonov <ol...@gmail.com> wrote:

> Hi Dave. Which format of Outlook mail do you need? msg?
>

Yes, a msg file, from either Outlook Express or Outlook.

Thanks,
Dave

Re: [jira] Commented: (TIKA-396) Parser Attachements from Outlook Messages

Posted by Oleg Tikhonov <ol...@gmail.com>.
Hi Dave. Which format of Outlook mail do you need? msg?



On Thu, Apr 8, 2010 at 4:33 PM, Dave Meikle (JIRA) <ji...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/TIKA-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854933#action_12854933]
>
> Dave Meikle commented on TIKA-396:
> ----------------------------------
>
> Looking to add a test file but everything I have contains an attachment
> with private information.  Does anyone have anything suitable available?  Or
> do we just need to mock one up?
>
> > Parser Attachements from Outlook Messages
> > -----------------------------------------
> >
> >                 Key: TIKA-396
> >                 URL: https://issues.apache.org/jira/browse/TIKA-396
> >             Project: Tika
> >          Issue Type: Improvement
> >          Components: parser
> >    Affects Versions: 0.6
> >         Environment: All environments.
> >            Reporter: Dave Meikle
> >            Assignee: Dave Meikle
> >
> > As raised by Albert Jensen on the tika-user mailing list[1], it would be
> good for the Outlook Parser to iterate through the mails attachments and
> then extract their content.
> > [1]
> http://mail-archives.apache.org/mod_mbox/lucene-tika-user/201003.mbox/%3C002701cacccf$16108b40$4231a1c0$@mail.dk%3E
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>


-- 
Best regards, Oleg.

[jira] Commented: (TIKA-396) Parser Attachements from Outlook Messages

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854933#action_12854933 ] 

Dave Meikle commented on TIKA-396:
----------------------------------

Looking to add a test file but everything I have contains an attachment with private information.  Does anyone have anything suitable available?  Or do we just need to mock one up?




[jira] Commented: (TIKA-396) Parser Attachements from Outlook Messages

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856826#action_12856826 ] 

Jukka Zitting commented on TIKA-396:
------------------------------------

In revision 933903 I modified the OutlookExtractor to use the parser instance in the ParseContext instead of a hardcoded AutoDetectParser when parsing the attachments. This is similar to what the PackageParser does, and allows better client-level control of the parsing process.

Note that an extra "Invalid attachment id" line is now printed to standard output when the tika-parsers test suite runs. I guess this comes from POI.
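
To illustrate the delegation idea described in this comment, here is a minimal, self-contained sketch. It does not use Tika's or POI's actual classes: AttachmentParser, SketchContext, SketchExtractor and DelegationSketch are all hypothetical names invented for this example, and the real OutlookExtractor code in revision 933903 may look quite different. The sketch only shows the general pattern of looking up a parser in a caller-supplied context, with a fallback, rather than hardcoding one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a parser: turns attachment bytes into text. */
interface AttachmentParser {
    String parse(String name, byte[] data);
}

/** Hypothetical, simplified type-keyed context that a client fills in. */
final class SketchContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value != null ? key.cast(value) : defaultValue;
    }
}

/** Hypothetical extractor that delegates each attachment to the context's parser. */
final class SketchExtractor {
    // Fallback used only when the caller did not put a parser in the context.
    private static final AttachmentParser DEFAULT =
            (name, data) -> name + ": " + data.length + " bytes";

    static List<String> extract(Map<String, byte[]> attachments, SketchContext context) {
        // The parser is looked up at parse time instead of being hardcoded here.
        AttachmentParser parser = context.get(AttachmentParser.class, DEFAULT);
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, byte[]> attachment : attachments.entrySet()) {
            results.add(parser.parse(attachment.getKey(), attachment.getValue()));
        }
        return results;
    }
}

public class DelegationSketch {
    public static void main(String[] args) {
        Map<String, byte[]> attachments = new LinkedHashMap<>();
        attachments.put("notes.txt", "hello".getBytes(StandardCharsets.UTF_8));

        // No parser supplied: the extractor falls back to its default.
        SketchContext context = new SketchContext();
        System.out.println(SketchExtractor.extract(attachments, context));

        // Client-level control: the caller decides how attachments are handled.
        AttachmentParser textParser =
                (name, data) -> name + " -> " + new String(data, StandardCharsets.UTF_8);
        context.set(AttachmentParser.class, textParser);
        System.out.println(SketchExtractor.extract(attachments, context));
    }
}
```

The point of the pattern is that a client who wants to skip attachments, limit recursion, or use a different parser only has to change what it puts into the context; the extractor itself stays untouched.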



[jira] Updated: (TIKA-396) Parser Attachements from Outlook Messages

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle updated TIKA-396:
-----------------------------

    Description: 
As raised by Albert Jensen on the tika-user mailing list[1], it would be good for the Outlook Parser to iterate through the mails attachments and then extract their content.

[1]http://mail-archives.apache.org/mod_mbox/lucene-tika-user/201003.mbox/%3C002701cacccf$16108b40$4231a1c0$@mail.dk%3E


  was:
As raised by Albert Jensen on the tika-user mailing list[1], it would be good for the Outlook Parser to iterate through the mails attachements and then extract there content.

[1]http://mail-archives.apache.org/mod_mbox/lucene-tika-user/201003.mbox/%3C002701cacccf$16108b40$4231a1c0$@mail.dk%3E



Looks like basic English is escaping me this morning ;-)

