Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/05/06 14:57:00 UTC

[jira] [Comment Edited] (TIKA-3164) Upgrade to POI 5.0.0 when available

    [ https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340249#comment-17340249 ] 

Tim Allison edited comment on TIKA-3164 at 5/6/21, 2:56 PM:
------------------------------------------------------------

Reports are here: https://corpora.tika.apache.org/base/reports/poi-5.0.1-snapshot-reports.tgz

These compare the latest POI 4.x against 5.0.1-SNAPSHOT.  There's a new NullPointerException (NPE) in WMF parsing, and it looks like we're missing a bunch of attachments.

I also need to look into why there's less content coming out of application/vnd.openxmlformats-officedocument.spreadsheetml.sheet files.

Parse times seem to be slower for OOXML than in 4.x, but that could be an artifact of the mood of the VM at the time of the run.

The missing attachments and the reduced spreadsheetml content could be Tika issues rather than POI issues. I need to take a look.



was (Author: tallison@mitre.org):
Reports are here: https://corpora.tika.apache.org/base/reports/poi-5.0.1-snapshot-reports.tgz

These compare the latest 4.x vs. 5.0.1-snapshot.  There's a new NPE in WMF parsing, and it looks like we're missing a bunch of attachments.

I also need to look into why there's less content coming out of application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ... this could be a Tika item, not POI...

Parse times seem to be slower for ooxml than in 4.x, but that could be an artifact of the mood of the vm at the time of running...

> Upgrade to POI 5.0.0 when available
> -----------------------------------
>
>                 Key: TIKA-3164
>                 URL: https://issues.apache.org/jira/browse/TIKA-3164
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>



