Posted to user@tika.apache.org by Nate Findley <na...@zenlok.com> on 2013/12/21 19:02:10 UTC
Tika Server (JAXRS)
I am running Tika Server to process files via curl requests. The
servers start running at 100% CPU after a day or so. Is there any
information about how to debug this situation? The wiki is
pretty thin on information.
Regards,
Nate
Re: Tika Server (JAXRS)
Posted by Rian J Stockbower <rs...@gmail.com>.
I've tested how well the Tika JAXRS server performs under heavy load by
making 4-8 simultaneous requests as fast as the web service would respond,
using a variety of test documents. (Some of these document types were
supported by Tika, some weren't.) I have a large text extraction job coming
up, involving millions of docs, and I needed to determine what kind of
resources I would need. During this testing, I found that CPU usage was
highest when Tika was unwinding exceptions. This CPU usage would persist
long after my ~10GB of documents had been processed.
The stack traces from these exceptions appeared to pile up: documents would
continue to be processed as requests were made, and Tika would
opportunistically print a stack trace when it wasn't busy responding to
other work. The traces would scroll by, often for several minutes, after I
had finished making requests. I didn't dig into the cause, because when I
began filtering the document types I was sending, the number of exceptions
thrown dropped dramatically and performance improved. As you might expect,
this also brought CPU (and memory!) usage down dramatically.
With that in mind:
- Have you captured any console output?
- How busy is your web service?
- Are you filtering the document types before they're processed?
- Can you reproduce the problem in a test environment?
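As a minimal sketch of the filtering question above, assuming the client can
pre-filter by file extension before making any request: the allow-list and
the function name below are hypothetical, not part of Tika, and should be
adjusted to the formats you actually expect the server to parse.

```python
from pathlib import Path

# Hypothetical allow-list (an assumption, not taken from Tika itself).
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".html"}


def partition_by_extension(paths, allowed=ALLOWED_EXTENSIONS):
    """Split paths into (to_send, skipped) by lower-cased file extension."""
    to_send, skipped = [], []
    for p in paths:
        # Compare case-insensitively so "REPORT.PDF" is treated like "report.pdf".
        if Path(p).suffix.lower() in allowed:
            to_send.append(p)
        else:
            skipped.append(p)
    return to_send, skipped
```

Only the paths in `to_send` would then be submitted to the server. Logging
the `skipped` list makes it easier to see which file types were being sent
before the filter was in place.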
-Rian