Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/12/21 04:27:31 UTC

[jira] [Commented] (TIKA-823) Detect StarOffice files

    [ https://issues.apache.org/jira/browse/TIKA-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173820#comment-13173820 ] 

Nick Burch commented on TIKA-823:
---------------------------------

Note that the strings appear to be prefixed with a 4-byte length field and are null-terminated. The first one may always start at the same offset in the file; if so, you can probably skip forward to that offset and then use the POI utils to read the string from the DocumentInputStream.
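
A rough sketch of that read, using only java.io so it stays independent of any particular POI version. The class and method names are made up for illustration. It assumes the length field is little-endian (as OLE2 data generally is), that the length counts the terminating null, and that the text is single-byte; none of that has been verified against the attached StarOffice files, and the offset to skip to is still unknown.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI or Tika.
public class LengthPrefixedString {

    // Arbitrary sanity limit so a corrupt length field cannot trigger a huge allocation.
    private static final int MAX_LENGTH = 1 << 16;

    /** Skips exactly n bytes, or throws EOFException if the stream ends first. */
    public static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF, so fall back to read().
                if (in.read() < 0) {
                    throw new EOFException("Stream ended before the expected offset");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /** Reads a 4-byte little-endian length, then that many bytes, dropping one trailing null. */
    public static String readString(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] lenBytes = new byte[4];
        data.readFully(lenBytes);
        int length = (lenBytes[0] & 0xFF)
                | (lenBytes[1] & 0xFF) << 8
                | (lenBytes[2] & 0xFF) << 16
                | (lenBytes[3] & 0xFF) << 24;
        if (length < 0 || length > MAX_LENGTH) {
            throw new IOException("Implausible string length: " + length);
        }
        byte[] raw = new byte[length];
        data.readFully(raw);
        int end = length;
        if (end > 0 && raw[end - 1] == 0) {
            end--; // assumed: the terminator is included in the length
        }
        // Assumed single-byte encoding; ISO-8859-1 maps every byte to one char.
        return new String(raw, 0, end, StandardCharsets.ISO_8859_1);
    }
}
```

With POI, the InputStream here would be the DocumentInputStream opened on the CompObj entry, and the hand-rolled length decoding could presumably be replaced by POI's own little-endian helpers.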
                
> Detect StarOffice files
> -----------------------
>
>                 Key: TIKA-823
>                 URL: https://issues.apache.org/jira/browse/TIKA-823
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 1.1
>            Reporter: Antoni Mylka
>         Attachments: testStarOffice-5.2-calc.sdc, testStarOffice-5.2-draw.sda, testStarOffice-5.2-impress.sdd, testStarOffice-5.2-write.sdw
>
>
> I would like both MimeTypes and the POIFSContainerDetector to be able to detect files created with Star Office Draw, Impress, Writer and Calc.
> I started working on this, but stumbled upon a POI issue, which I posted to poi-user. 
> http://thread.gmane.org/gmane.comp.jakarta.poi.user/17857
> Nick? Yegor? I know you're on the Tika list as well. Could you take a look? How do I get the raw content of the CompObj entry?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira