Posted to user@tika.apache.org by Milos <mi...@grf.bg.ac.rs> on 2013/03/09 22:24:37 UTC

Tika-server stability

Hello,
I plan to use the Tika server application, tika-server.jar, for my intranet web
site. I experimented with the Tika parsers v1.1 before, but there were some
problems when parsing huge document collections. Sometimes Tika would block the
whole JVM, so I had to write my own parsing server, which started a new parsing
process in a different JVM. If a submitted file was not parsed within a certain
timeout, the parsing server would kill the external JVM and start another one.
That solved my problem, but I always felt it was not the right way to deal with
it. What are the experiences with the new tika-server-1.3.jar? Could a
problematic document block the whole server or not?
Best regards, Milos
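
For readers who want to try the workaround described above (one external JVM per
document, killed when a timeout expires), here is a minimal Java sketch. It is
not the poster's actual server. The class name IsolatedRunner, its Result type,
and the choice to pass the child's command line and an output file as parameters
are all assumptions made for illustration. The child command itself (for
example, a java invocation that runs a Tika jar on one file) is left to the
caller.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs one command in a child process with a hard timeout. */
public class IsolatedRunner {

    /** Outcome of one child-process run. */
    public static final class Result {
        public final boolean timedOut;
        public final int exitCode; // meaningful only when timedOut is false

        Result(boolean timedOut, int exitCode) {
            this.timedOut = timedOut;
            this.exitCode = exitCode;
        }
    }

    /**
     * Starts the command, waits at most timeoutMillis, and kills the child
     * if it has not exited by then. The child's stdout and stderr go to the
     * given file so that it can never stall on a full pipe buffer.
     */
    public static Result run(List<String> command, File output, long timeoutMillis)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);
        builder.redirectOutput(output);

        Process child = builder.start();
        if (child.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // The child finished on its own within the time limit.
            return new Result(false, child.exitValue());
        }

        // The child is still running: kill it and wait until it is gone.
        child.destroyForcibly();
        child.waitFor();
        return new Result(true, -1);
    }
}
```

A caller would treat timedOut == true as "skip this document and carry on". The
parent JVM keeps running, which is the property asked about in this thread.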


Re: Tika-server stability

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Milos,

On 3/10/13 4:16 AM, "Milos" <mi...@grf.bg.ac.rs> wrote:

>Mattmann, Chris A (388J <ch...@...> writes:
>
>> 
>> Hey Milos,
>> 
>> Tika server is the JAX-RS server.
>> 
>> Good differentiation here on stack overflow:
>> 
>> 
>> http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode
>
>
>Ok I understand now.
>
>How I submit the file for parsing is not so important to me.

Got it.

>So I could use either version. But the main question remains: does anyone
>know whether these servers are immune to errors, in the sense that the JVM
>does not need to be restarted if something goes wrong with parsing?

It's really hard to tell you yes or no here without a specific example. I can
tell you that both applications are based on the same underlying core. So
unless tika-app or tika-server contains some magic capability of its own to
handle this, the protection is only there if it is in tika-core.


>Also, what are the performance implications of using one version or the
>other?

The REST one (tika-server) is HTTP, and the other (tika-app) is simply Tika
over a socket. I would imagine tika-app to be faster, but I have not
benchmarked it. I would be very happy to see a comparison if you have the time.

Thanks!

Cheers,
Chris
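
Since the REST variant is plain HTTP, a client can at least protect itself from
a parse that never returns by setting connect and read timeouts. The Java
sketch below illustrates that idea only. The class name TimedHttpPut is
hypothetical, and the endpoint URL is left to the caller because this thread
does not state the server's port or resource paths. It also assumes that the
server accepts the document as the body of an HTTP PUT. A client-side timeout
frees the caller but does nothing to unblock a server thread that is stuck on a
bad file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: HTTP PUT with connect and read timeouts. */
public class TimedHttpPut {

    /**
     * PUTs the document bytes to the endpoint and returns the response body
     * decoded as UTF-8. Throws java.net.SocketTimeoutException (an IOException)
     * if the server does not connect or answer within the given limits.
     */
    public static String put(URL endpoint, byte[] document,
                             int connectTimeoutMillis, int readTimeoutMillis)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setConnectTimeout(connectTimeoutMillis);
        conn.setReadTimeout(readTimeoutMillis);
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setFixedLengthStreamingMode(document.length);
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write(document);
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Timing repeated calls to a helper like this against each variant would be one
simple way to start the comparison suggested above.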


Re: Tika-server stability

Posted by Milos <mi...@grf.bg.ac.rs>.
Mattmann, Chris A (388J <ch...@...> writes:

> 
> Hey Milos,
> 
> Tika server is the JAX-RS server.
> 
> Good differentiation here on stack overflow:
> 
> http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode


Ok I understand now.

How I submit the file for parsing is not so important to me, so I could use
either version. But the main question remains: does anyone know whether these
servers are immune to errors, in the sense that the JVM does not need to be
restarted if something goes wrong with parsing? Also, what are the performance
implications of using one version or the other?

Cheers, Milos


Re: Tika-server stability

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Milos,

Tika server is the JAX-RS server.

Good differentiation here on stack overflow:

http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode


HTH!

Cheers,
Chris


On 3/9/13 3:43 PM, "Milos" <mi...@grf.bg.ac.rs> wrote:

>
>Mattmann, Chris A (388J <ch...@...> writes:
>
>> 
>> Hi Milos,
>> 
>> Are you talking about the Tika JAXRS server, or the Network protocol
>> (lower layer/sockets) one?
>
>Hi Chris,
>I am not sure. Is tika-server-1.3.jar the JAX-RS one or the lower-layer one?
>Could you please tell me which jars correspond to each variant?
>Anyway, I need the one that is immune to bad files and does not block the
>JVM.
>Regards, Milos
>
>
>


Re: Tika-server stability

Posted by Milos <mi...@grf.bg.ac.rs>.
Mattmann, Chris A (388J <ch...@...> writes:

> 
> Hi Milos,
> 
> Are you talking about the Tika JAXRS server, or the Network protocol
> (lower layer/sockets) one?

Hi Chris,
I am not sure. Is tika-server-1.3.jar the JAX-RS one or the lower-layer one?
Could you please tell me which jars correspond to each variant?
Anyway, I need the one that is immune to bad files and does not block the JVM.
Regards, Milos




Re: Tika-server stability

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Milos,

Are you talking about the Tika JAXRS server, or the Network protocol
(lower layer/sockets) one?

Cheers,
Chris


On 3/9/13 1:24 PM, "Milos" <mi...@grf.bg.ac.rs> wrote:

>Hello,
>I plan to use the Tika server application, tika-server.jar, for my intranet
>web site. I experimented with the Tika parsers v1.1 before, but there were
>some problems when parsing huge document collections. Sometimes Tika would
>block the whole JVM, so I had to write my own parsing server, which started
>a new parsing process in a different JVM. If a submitted file was not parsed
>within a certain timeout, the parsing server would kill the external JVM and
>start another one. That solved my problem, but I always felt it was not the
>right way to deal with it. What are the experiences with the new
>tika-server-1.3.jar? Could a problematic document block the whole server or
>not?
>Best regards, Milos
>