Posted to dev@tika.apache.org by "Manjunath Dhongadi (Jira)" <ji...@apache.org> on 2022/03/02 16:44:00 UTC

[jira] [Commented] (TIKA-3668) High CPU utilization in Tika 2.2.0

    [ https://issues.apache.org/jira/browse/TIKA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500277#comment-17500277 ] 

Manjunath Dhongadi commented on TIKA-3668:
------------------------------------------

We observed this during performance testing, when scanning around 100 GB of data.
It is not limited to specific file formats; it occurs across all of them.
We do not use any custom parser settings.

> High CPU utilization in Tika 2.2.0
> ----------------------------------
>
>                 Key: TIKA-3668
>                 URL: https://issues.apache.org/jira/browse/TIKA-3668
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Manjunath Dhongadi
>            Priority: Major
>
> We recently upgraded Tika from 1.26 to 2.2.0.
> CPU utilization has increased drastically (6 to 8 times higher), both with Tesseract enabled and with Tesseract disabled.
> We are using tika-parsers-standard-package 2.2.0.
> Is this normal behavior for Tika 2.2.0?
> Are there any tuning parameters available for this?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)