Posted to dev@tika.apache.org by "Dave Meikle (JIRA)" <ji...@apache.org> on 2010/03/27 14:00:28 UTC

[jira] Created: (TIKA-395) Tika fails to extract Messages from Outlook 2007

Tika fails to extract Messages from Outlook 2007 
-------------------------------------------------

                 Key: TIKA-395
                 URL: https://issues.apache.org/jira/browse/TIKA-395
             Project: Tika
          Issue Type: Bug
    Affects Versions: 0.6
         Environment: Windows 7, Outlook 2007
            Reporter: Dave Meikle
            Assignee: Dave Meikle


When parsing an Outlook 2007 message file, no content is extracted.  However, parsing the included test message file from Outlook Express works correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TIKA-395) Tika fails to extract Messages from Outlook 2007

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle updated TIKA-395:
-----------------------------

    Fix Version/s: 0.7




[jira] Commented: (TIKA-395) Tika fails to extract Messages from Outlook 2007

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850510#action_12850510 ] 

Dave Meikle commented on TIKA-395:
----------------------------------

Just tested with Outlook 2003 and this does not work either.  I think I have identified the issue and will attach a patch so that everyone else can confirm it works before I commit, as I am conscious that no one has raised this before.




[jira] Updated: (TIKA-395) Tika fails to extract Messages from Outlook 2007

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle updated TIKA-395:
-----------------------------

    Component/s: parser




[jira] Resolved: (TIKA-395) Tika fails to extract Messages from Outlook 2007

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle resolved TIKA-395.
------------------------------

    Resolution: Fixed

I managed to test this on a variety of machines and formats myself, so the fix is committed in revision 928505.




[jira] Updated: (TIKA-395) Tika fails to extract Messages from Outlook 2007

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle updated TIKA-395:
-----------------------------

    Attachment: TIKA-395_Outlook_2007_Parser_Issue.diff

Patch to get the chunk list based on the chunks identified in the file.  This should detect whether the file uses new or old type chunks, rather than always assuming the default old string type.
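
The attached diff is not reproduced in this thread, so the following is only a rough sketch of the detection idea, not the committed code.  It assumes the usual MSG layout, where each property stream is named "__substg1.0_" followed by a four-digit property id and a four-digit type in hex, with type 001E marking old-style 8-bit strings and 001F marking new-style Unicode strings.  The class, enum, and method names are hypothetical and are not part of Tika or POI.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: decide which string chunk type a .msg file uses
// by looking at the names of the entries actually present in the file.
public class ChunkTypeSketch {

    // Assumed naming scheme for MSG property streams.
    static final String PREFIX = "__substg1.0_";

    enum StringType { OLD_8BIT, NEW_UNICODE, UNKNOWN }

    static StringType detect(List<String> entryNames) {
        boolean sawOld = false;
        for (String name : entryNames) {
            // Skip entries that are not property streams (attachments, recipients, ...).
            if (!name.startsWith(PREFIX) || name.length() < PREFIX.length() + 8) {
                continue;
            }
            // The last four of the eight hex digits carry the property type.
            String type = name
                    .substring(PREFIX.length() + 4, PREFIX.length() + 8)
                    .toUpperCase(Locale.ROOT);
            if (type.equals("001F")) {
                return StringType.NEW_UNICODE; // one Unicode string chunk is enough
            }
            if (type.equals("001E")) {
                sawOld = true;
            }
        }
        return sawOld ? StringType.OLD_8BIT : StringType.UNKNOWN;
    }

    public static void main(String[] args) {
        // Illustrative entry names only; 0037 is the subject property id.
        List<String> newStyle = Arrays.asList(
                "__properties_version1.0", "__substg1.0_0037001F");
        List<String> oldStyle = Arrays.asList(
                "__properties_version1.0", "__substg1.0_0037001E");
        System.out.println(detect(newStyle));
        System.out.println(detect(oldStyle));
    }
}
```

A caller would then pick the chunk list that matches the detected type instead of falling back to the old string type unconditionally.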




[jira] Commented: (TIKA-395) Tika fails to extract Messages from Outlook 2007

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850512#action_12850512 ] 

Dave Meikle commented on TIKA-395:
----------------------------------

This patch should fix it; can people give it a whirl to make sure it works on their systems?  I will create a msg file from Outlook 2007 for inclusion in the test suite when I commit the fix.

