Posted to user@tika.apache.org by Milos <mi...@grf.bg.ac.rs> on 2013/03/09 22:24:37 UTC
Tika-server stability
Hello,
I plan to use the Tika server application tika-server.jar for my intranet web site.
I experimented with the Tika parsers v1.1 before, but there were some problems when
parsing huge document collections. Sometimes Tika would block the whole JVM, so I
needed to make my own parsing server, which started a new parsing process in a
different JVM. If the submitted file was not parsed within a certain timeout, the
parsing server would kill the external JVM and start another one. That was a
solution to my problem, but I always felt it was not the right way to deal with
it. What are the experiences with the new tika-server-1.3.jar? Could
a problematic document block the whole server or not?
Best regards, Milos
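The workaround described above (parse in a separate JVM, kill it when a timeout expires, start a fresh one for the next file) could be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the actual parsing server from this message: the class `ParseWatchdog`, the `Outcome` enum, and both method names are hypothetical, the jar path is a placeholder, and `java` is assumed to be on the PATH. The only Tika-specific detail used is the tika-app `--text` option, which prints the extracted plain text.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ParseWatchdog {

    /** Result of one supervised run of the child process. */
    enum Outcome { COMPLETED, FAILED, TIMED_OUT }

    /**
     * Builds a command line that runs tika-app in its own JVM.
     * Assumption: "java" is on the PATH; jarPath is a placeholder for
     * wherever the tika-app jar is installed.
     */
    static List<String> tikaCommand(String jarPath, String documentPath) {
        return List.of("java", "-jar", jarPath, "--text", documentPath);
    }

    /**
     * Runs the command in a child process and kills the child if it has not
     * exited within timeoutMillis. The child's stdout and stderr go to the
     * given file, so the child cannot stall on a full pipe buffer.
     */
    static Outcome runWithTimeout(List<String> command, File output, long timeoutMillis)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);
        builder.redirectOutput(output);
        Process child = builder.start();

        if (!child.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // The parse is taking too long: terminate the external JVM.
            child.destroyForcibly();
            child.waitFor(); // wait until the process is really gone
            return Outcome.TIMED_OUT;
        }
        return child.exitValue() == 0 ? Outcome.COMPLETED : Outcome.FAILED;
    }
}
```

A caller would loop over the document collection, call `runWithTimeout(tikaCommand(jar, doc), outFile, timeout)` for each file, and set aside any document that comes back as `TIMED_OUT`. Starting one JVM per document trades throughput for isolation, which may be why this approach feels heavy even though it does keep one bad file from blocking everything else.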
Re: Tika-server stability
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Milos,
On 3/10/13 4:16 AM, "Milos" <mi...@grf.bg.ac.rs> wrote:
>Mattmann, Chris A (388J <ch...@...> writes:
>
>>
>> Hey Milos,
>>
>> Tika server is the JAX-RS server.
>>
>> Good differentiation here on stack overflow:
>>
>>
>>http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode
>
>
>Ok I understand now.
>
>How I submit the file for parsing is not so important to me.
Got it.
>So I could use either version. But the main question remains: does anyone know
>whether these servers are immune to errors, so that the JVM does not need to be
>restarted if something goes wrong with parsing?
It's really hard to tell you yes or no here without a specific example. I can
tell you that both applications are based on the same underlying core, so unless
one of them contains some magic capability to handle this that is specific to
tika-app or tika-server, the capability is only there if it is in tika-core.
>Also, what are the performance implications of using one version or the other?
The REST one (tika-server) is HTTP, and the other (tika-app) is simply Tika
over a socket. I would imagine tika-app to be faster, but I have not benchmarked
it. I would be very happy to see a comparison if you have the time.
Thanks!
Cheers,
Chris
Re: Tika-server stability
Posted by Milos <mi...@grf.bg.ac.rs>.
Mattmann, Chris A (388J <ch...@...> writes:
>
> Hey Milos,
>
> Tika server is the JAX-RS server.
>
> Good differentiation here on stack overflow:
>
> http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode
Ok I understand now.
How I submit the file for parsing is not so important to me. So I could
use either version. But the main question remains: does anyone know whether these
servers are immune to errors, so that the JVM does not need to be restarted if
something goes wrong with parsing? Also, what are the performance implications of
using one version or the other?
Cheers, Milos
Re: Tika-server stability
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Milos,
Tika server is the JAX-RS server.
Good differentiation here on stack overflow:
http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode
HTH!
Cheers,
Chris
On 3/9/13 3:43 PM, "Milos" <mi...@grf.bg.ac.rs> wrote:
>
>Mattmann, Chris A (388J <ch...@...> writes:
>
>>
>> Hi Milos,
>>
>> Are you talking about the Tika JAXRS server, or the Network protocol
>> (lower layer/sockets) one?
>
>Hi Chris,
>I am not sure. Is tika-server-1.3.jar the JAX-RS one or the lower-layer one?
>Could you please tell me which jars correspond to each variant?
>In any case, I need the one that is immune to bad files and does not block the JVM.
>Regards, Milos
>
>
>
Re: Tika-server stability
Posted by Milos <mi...@grf.bg.ac.rs>.
Mattmann, Chris A (388J <ch...@...> writes:
>
> Hi Milos,
>
> Are you talking about the Tika JAXRS server, or the Network protocol
> (lower layer/sockets) one?
Hi Chris,
I am not sure. Is tika-server-1.3.jar the JAX-RS one or the lower-layer one?
Could you please tell me which jars correspond to each variant?
In any case, I need the one that is immune to bad files and does not block the JVM.
Regards, Milos
Re: Tika-server stability
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Milos,
Are you talking about the Tika JAXRS server, or the Network protocol
(lower layer/sockets) one?
Cheers,
Chris
On 3/9/13 1:24 PM, "Milos" <mi...@grf.bg.ac.rs> wrote:
>Hello,
>I plan to use the Tika server application tika-server.jar for my intranet web site.
>I experimented with the Tika parsers v1.1 before, but there were some problems when
>parsing huge document collections. Sometimes Tika would block the whole JVM, so I
>needed to make my own parsing server, which started a new parsing process in a
>different JVM. If the submitted file was not parsed within a certain timeout, the
>parsing server would kill the external JVM and start another one. That was a
>solution to my problem, but I always felt it was not the right way to deal with
>it. What are the experiences with the new tika-server-1.3.jar? Could
>a problematic document block the whole server or not?
>Best regards, Milos
>