Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2015/06/23 17:11:00 UTC

[jira] [Updated] (TIKA-1651) Add mime detection (and parsing?) for Microsoft Chart object

     [ https://issues.apache.org/jira/browse/TIKA-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1651:
------------------------------
    Summary: Add mime detection (and parsing?) for Microsoft Chart object  (was: Add mime (and parsing?) for Microsoft Chart object)

> Add mime detection (and parsing?) for Microsoft Chart object
> ------------------------------------------------------------
>
>                 Key: TIKA-1651
>                 URL: https://issues.apache.org/jira/browse/TIKA-1651
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>         Attachments: 11.xls, 428996.ppt, embedded_xls_stack_traces.csv
>
>
> With the recently modified tika eval dev code that captures exceptions from embedded documents, there are ~30k exceptions in govdocs1 for what we currently identify as xls files embedded in ppt and xls files.
> It turns out that these are Microsoft Chart files/objects, not xls files.  Let's add mime detection for these embedded objects and see if we can use POI to parse the contents of embedded tables where they exist.
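
One possible shape for the detection step, as a rough sketch only and not the change proposed in the issue: if the bytes of the embedded object's OLE2 CompObj stream are already in hand, they can be scanned for a chart ProgID before falling back to the current xls identification. Everything named here is an assumption for illustration: the class ChartSniffer, the marker string "MSGraph.Chart", and the media type "application/vnd.ms-graph" do not come from the issue, and reading the stream out of the container (for example with POI) is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika or POI.
class ChartSniffer {
    // Assumption: Microsoft Graph chart objects carry a ProgID starting with
    // "MSGraph.Chart" somewhere in their CompObj stream.
    private static final byte[] MARKER =
            "MSGraph.Chart".getBytes(StandardCharsets.US_ASCII);

    // Assumption: illustrative media type name for chart objects.
    static final String CHART_TYPE = "application/vnd.ms-graph";

    // Returns the index of the first occurrence of needle in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Returns CHART_TYPE when the marker is present; otherwise returns the
    // caller's fallback (e.g. the type the object is identified as today).
    static String detect(byte[] compObjBytes, String fallback) {
        if (compObjBytes != null && indexOf(compObjBytes, MARKER) >= 0) {
            return CHART_TYPE;
        }
        return fallback;
    }
}
```

A caller would pass the stream bytes together with whatever type it would otherwise report, e.g. ChartSniffer.detect(bytes, "application/vnd.ms-excel"). Whether the marker is reliable across the attached 11.xls and 428996.ppt samples would need to be checked against those files.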


