Posted to dev@tika.apache.org by "Rida Benjelloun (JIRA)" <ji...@apache.org> on 2007/10/01 18:44:50 UTC
[jira] Closed: (TIKA-35) Extract MsOffice properties
[ https://issues.apache.org/jira/browse/TIKA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rida Benjelloun closed TIKA-35.
-------------------------------
> Extract MsOffice properties
> ---------------------------
>
> Key: TIKA-35
> URL: https://issues.apache.org/jira/browse/TIKA-35
> Project: Tika
> Issue Type: Improvement
> Affects Versions: 0.1-incubator
> Reporter: Rida Benjelloun
> Assignee: Rida Benjelloun
> Fix For: 0.1-incubator
>
> Attachments: tika35.patch, tika35.patch
>
>
> Hi,
> I have developed a patch that extracts MsOffice properties. I wasn't able to extract both the MsOffice properties and the full text from a single InputStream; I always get this error:
> java.io.IOException: Unable to read entire header; -1 bytes read; expected 512 bytes.
> I don't know how Nutch makes this work (any ideas?).
> To get it working, I added a "filePath" variable to the parser class and populate it from the ParseUtils class. I then create a second InputStream from the filePath or URL to extract the properties, and use the default InputStream to extract the full text.
> I haven't committed this modification; I would like to hear your opinions first.
> Regards.
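
The "-1 bytes read" in the error above suggests the second read hit end-of-stream, which is what happens when one pass has already consumed the InputStream. As a hedged sketch of an alternative to the filePath workaround (not the approach taken in the attached patch), the stream could be buffered in memory once and then reopened for each pass. The class name ReusableStream and its methods are hypothetical and are not part of Tika or POI:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: copies a one-shot InputStream into memory so that
// several independent readers can each start from the first byte.
public class ReusableStream {
    private final byte[] data;

    // Drains the given stream completely; the caller still owns (and closes) it.
    public ReusableStream(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        this.data = out.toByteArray();
    }

    // Returns a fresh stream positioned at the start of the buffered content.
    public InputStream open() {
        return new ByteArrayInputStream(data);
    }

    // Number of buffered bytes.
    public int size() {
        return data.length;
    }
}
```

With such a helper, one call to open() could feed the properties extraction and a second call could feed the full-text extraction. The trade-off is that the whole document is held in memory, which may be unsuitable for very large files.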
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.