Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/11/22 02:38:00 UTC

[jira] [Assigned] (TIKA-2507) xlsx takes more than 5 mins to parse in 1.16

     [ https://issues.apache.org/jira/browse/TIKA-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison reassigned TIKA-2507:
---------------------------------

    Assignee: Tim Allison

> xlsx takes more than 5 mins to parse in 1.16
> --------------------------------------------
>
>                 Key: TIKA-2507
>                 URL: https://issues.apache.org/jira/browse/TIKA-2507
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.16
>         Environment: started server with 
> {noformat}
> java -jar tika-server-1.16.jar
> {noformat}
>            Reporter: José Borges Ferreira
>            Assignee: Tim Allison
>         Attachments: Tika.1.16-killer.xlsx
>
>
> When sending an xlsx file with a lot of charts, the Tika server takes more than 5 minutes to process it on my 2.2 GHz MacBook Pro.
> In version 1.15 this takes less than a second. Looking at the changelog, I'm guessing this may be related to some features introduced in 1.16, namely:
> # Extract text from charts in .docx, .pptx, .xlsx and .xlsb (TIKA-2254).
> # Extract text from diagrams in .docx, .pptx, .xlsx and .xlsb (TIKA-1945).
> I'm attaching the file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)