Posted to user@tika.apache.org by Clemens Wyss DEV <cl...@mysign.ch> on 2013/03/13 14:22:45 UTC

Considering usage of tika-server(s)

We have several Tomcat instances running, each hosting several WARs. At the moment we use Tika "in memory", i.e. extraction is performed within the Tomcat processes/threads.

Does a tika-server queue requests, or are they executed in parallel?
Is there a default Java implementation for communicating with a tika-server (i.e. issuing extraction requests)? Can the ForkParser be used for this?
How can I calculate the amount of memory to assign to the tika-server(s)?
Last but not least:
Does it make sense to have one (or several) tika-server(s) running as the "workhorse" for all Tomcats/WARs? Topics such as single point of failure come to mind...

Thanks for your advice
Clemens

Re: Considering usage of tika-server(s)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Clemens,

Thanks for your questions. Answers below:

On 3/13/13 6:22 AM, "Clemens Wyss DEV" <cl...@mysign.ch> wrote:

>We have several tomcats (each with several war's) running. At the moment
>we use tika "in memory", i.e. extraction is being performed within the
>tomcat processes/threads.
>
>Does a tika-server queue the requests or are they being executed in
>parallel?

tika-server doesn't handle this explicitly. However, the container
(Tomcat) should be able to take care of load balancing and of how
requests are queued or run in parallel.

>Is there a default Java implementation for communicating (i.e. issuing
>extraction requests) with a tika-server?

Not explicitly, though Apache CXF can generate Java client APIs using a
WADL descriptor that tika-server provides.
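
If generating a client from the WADL is more than you need, a hand-written
client can be quite small. The following is only a rough sketch, not an
official Tika client: it assumes tika-server accepts a document via HTTP PUT
on a /tika resource and answers with the extracted plain text. The class name
TikaServerClient and the base URL you pass in are placeholders of mine; check
the resource paths against the WADL of your tika-server version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: sends one document to a tika-server
// and returns whatever text the server sends back.
class TikaServerClient {
    private final String baseUrl;

    TikaServerClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    String extractText(byte[] document) throws IOException {
        // Assumption: the extraction resource is exposed at <baseUrl>/tika.
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "/tika").toURL().openConnection();
        try {
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("Accept", "text/plain");
            conn.setFixedLengthStreamingMode(document.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(document);
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("tika-server returned HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(readAll(in), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    // Reads the stream to its end and returns the bytes.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

Usage would be along the lines of
new TikaServerClient("http://localhost:9998").extractText(bytes), where the
host and port are whatever your server actually listens on.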

>(Can the ForkParser be used for this?)

I believe so, though I think we use AutoDetectParser at the moment.

>How can I calculate the amount of memory to be assigned for a/the
>tika-server(s)?

You can probably figure out how many REST apps you want to stand up (e.g.,
/ts1 /ts2 ... /tsN), load balance horizontally between them, and then look
at the largest types of documents you will be sending and partition
accordingly. Alternatively, you could run multiple physical Tomcat
instances on different ports and assign different -Xms/-Xmx values to
each one.
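
To illustrate the horizontal load balancing idea, here is a minimal
client-side sketch that hands out the deployed endpoints in round-robin
order. It is only one possible approach (a load balancer in front of Tomcat
would do the same job outside your code); the class name RoundRobinEndpoints
and the example URLs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: cycles through a fixed list of tika-server base URLs.
final class RoundRobinEndpoints {
    private final List<String> endpoints;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinEndpoints(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.endpoints = Collections.unmodifiableList(new ArrayList<>(endpoints));
    }

    // Thread-safe; floorMod keeps the index valid even after the counter overflows.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), endpoints.size());
        return endpoints.get(index);
    }
}
```

Each caller asks next() for a base URL (for example http://host:8080/ts1,
then /ts2, and so on) before sending a document, so requests spread evenly
over the instances.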

>Last but not least:
>Does it make sense to have one (several) tika-server(s) running as the
>"workhorse" for all tomcats/war's? Topics such as
>single-point-of-failure come to my mind...

That depends on how fine-grained you want your control over all of the
above to be.

HTH.

Cheers,
Chris

>
>Thanks for your advice
>Clemens