Posted to dev@tika.apache.org by Chris Mattmann <ma...@apache.org> on 2020/04/16 14:11:35 UTC

Re: [EXTERNAL] Re: Issue with > 200% CPU after bulk usage

Yes, some of us have been developing an elastic scaling stack for Tika Server that does just that with AWS. We don’t have it ready to push upstream yet.


Cheers,

Chris

From: Eric Pugh <ep...@opensourceconnections.com>
Reply-To: "dev@tika.apache.org" <de...@tika.apache.org>
Date: Thursday, April 16, 2020 at 7:09 AM
To: "dev@tika.apache.org" <de...@tika.apache.org>
Subject: [EXTERNAL] Re: Issue with > 200% CPU after bulk usage

Does anyone have a good example of combining Tika with some sort of pool of Docker containers? I think a lot of folks treat their Tika server like a pet, not like cattle: https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
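One way to treat Tika servers as cattle on the client side is to spread requests across several interchangeable instances and take any misbehaving one out of rotation. The sketch below is a hypothetical helper (`TikaPool` is not part of Tika). It assumes each endpoint is a tika-server, for example a Docker container listening on tika-server's default port 9998, and that plain text is extracted by PUTting the raw document bytes to `/tika` with `Accept: text/plain`. The hostnames are placeholders.

```python
import itertools
import urllib.request


class TikaPool:
    """Round-robin over several tika-server endpoints (hypothetical helper)."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self.endpoints = list(endpoints)
        self.unhealthy = set()
        self._cycle = itertools.cycle(self.endpoints)

    def mark_unhealthy(self, endpoint):
        # Take an instance out of rotation, e.g. while it is being recycled.
        self.unhealthy.add(endpoint)

    def mark_healthy(self, endpoint):
        # Put a fresh instance back into rotation.
        self.unhealthy.discard(endpoint)

    def next_endpoint(self):
        # Visit each endpoint at most once per call, skipping unhealthy ones.
        for _ in range(len(self.endpoints)):
            endpoint = next(self._cycle)
            if endpoint not in self.unhealthy:
                return endpoint
        raise RuntimeError("no healthy Tika endpoints available")

    def extract_text(self, data, timeout=60):
        # PUT the raw document bytes to the chosen server's /tika resource
        # and ask for plain text back.
        endpoint = self.next_endpoint()
        req = urllib.request.Request(
            endpoint.rstrip("/") + "/tika",
            data=data,
            method="PUT",
            headers={"Accept": "text/plain"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
```

For example, `TikaPool(["http://tika-1:9998", "http://tika-2:9998"]).extract_text(open("report.pdf", "rb").read())`. Marking an instance unhealthy before killing it keeps new requests away from it while a replacement starts.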

I wonder if we could ship some “recipes” that describe how to deploy a pool of Tika servers. For example: if a Tika instance runs at over 200% CPU for an hour, kill it and start the next one.
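The “over 200% for an hour, kill it” rule can be written as a small state machine fed with periodic CPU samples. This is a hypothetical sketch: `CpuWatchdog` is not a Tika API, the samples are assumed to come from whatever monitor you already run (for example `psutil.Process(pid).cpu_percent()` or `docker stats`, where 100% means one full core), and actually killing and restarting the instance is left to the orchestrator.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CpuWatchdog:
    """Decide when to recycle a Tika instance (hypothetical helper).

    threshold_pct: CPU percentage (100 = one full core) treated as "stuck".
    max_seconds:   how long usage must stay above the threshold before recycling.
    """
    threshold_pct: float = 200.0
    max_seconds: float = 3600.0
    _above_since: Optional[float] = field(default=None, init=False, repr=False)

    def observe(self, timestamp: float, cpu_pct: float) -> bool:
        """Record one sample; return True if the instance should be recycled."""
        if cpu_pct <= self.threshold_pct:
            # Any sample at or below the threshold resets the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            # First sample of a new high-CPU streak.
            self._above_since = timestamp
        return timestamp - self._above_since >= self.max_seconds

    def reset(self) -> None:
        # Call after the instance has been killed and a fresh one started.
        self._above_since = None
```

Because a single low sample resets the timer, short bursts while parsing a large file do not trigger a restart; only sustained load does.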

On Apr 16, 2020, at 9:40 AM, Nick Burch <ap...@gagravarr.org> wrote:

On Wed, 15 Apr 2020, hans.meijer@avident-it.se wrote:

I have encountered an issue with Tika running locally on a box: after a bulk load of more than 3 million documents over a couple of days, the Java runtime goes up to over 200% CPU.

Can you do a thread dump to show what the JVM is doing?

https://access.redhat.com/solutions/18178

Nick
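Once you have a dump (for example from `jstack <pid>` or `jcmd <pid> Thread.print`, or by sending `kill -3 <pid>`, which makes the JVM print it to its own stdout), the useful part is usually which threads are RUNNABLE and where they are executing. The sketch below is a hypothetical helper that assumes the usual HotSpot text layout: a quoted thread name on the header line, then a `java.lang.Thread.State:` line, then `at ...` stack frames. It reports the top frame of each RUNNABLE thread.

```python
import re
from collections import Counter

_HEADER = re.compile(r'^"(?P<name>[^"]*)"')
_STATE = re.compile(r'java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')
_FRAME = re.compile(r'^\s+at (?P<frame>.+?)\s*$')


def parse_thread_dump(text):
    """Return a list of {"name", "state", "top_frame"} dicts from a jstack-style dump."""
    threads = []
    current = None
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            # A quoted name at the start of a line begins a new thread entry.
            current = {"name": header.group("name"), "state": None, "top_frame": None}
            threads.append(current)
            continue
        if current is None:
            continue  # preamble such as "Full thread dump ..."
        state = _STATE.search(line)
        if state and current["state"] is None:
            current["state"] = state.group("state")
            continue
        frame = _FRAME.match(line)
        if frame and current["top_frame"] is None:
            # The first "at ..." line is the frame currently executing.
            current["top_frame"] = frame.group("frame")
    return threads


def summarize(threads):
    """Count threads per state and list where the RUNNABLE ones are executing."""
    states = Counter(t["state"] for t in threads if t["state"])
    runnable = [(t["name"], t["top_frame"]) for t in threads if t["state"] == "RUNNABLE"]
    return states, runnable
```

Taking two or three dumps a few seconds apart and seeing the same RUNNABLE frames each time is a good hint that a parser is spinning on a particular document.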

_______________________

Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  

Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>       

This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.


Re: [EXTERNAL] Re: Issue with > 200% CPU after bulk usage

Posted by Tim Allison <ta...@apache.org>.
I very much like Eric's ideas of recipes and possibly code because of the
differences in capabilities available via the various cloud providers.

Replying to Chris Mattmann <ma...@apache.org>, Thu, Apr 16, 2020 at 10:11 AM (message above).