Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2009/07/18 01:06:14 UTC

[jira] Resolved: (TIKA-262) ParsingReader does not parse metadata for larger MS Office documents

     [ https://issues.apache.org/jira/browse/TIKA-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-262.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.5
         Assignee: Jukka Zitting

Good stuff, thanks!

I committed a slightly modified version of the patch (smaller methods inlined, indented with spaces) in revision 795266.

> ParsingReader does not parse metadata for larger MS Office documents
> --------------------------------------------------------------------
>
>                 Key: TIKA-262
>                 URL: https://issues.apache.org/jira/browse/TIKA-262
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.3
>            Reporter: Daan de Wit
>            Assignee: Jukka Zitting
>             Fix For: 0.5
>
>         Attachments: lipsum.doc, OfficeParser.java.patch, OfficeParser.java.patch, OfficeParser.java.patch, tika-0.3_large-ms-office-metadata.patch
>
>
> The ParsingReader should cause the metadata to be extracted before anything is read from the reader. This does not happen for certain MS Office files; the problem seems to be related to the size of the document.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.