Posted to solr-user@lucene.apache.org by Frederik Van Hoyweghen <fr...@chapoo.com> on 2018/02/08 11:47:46 UTC
Opinions on ExtractingRequestHandler
Hey everyone,
What are your experiences with using Solr's
ExtractingRequestHandler in production?
I've been reading some mixed remarks so I was wondering what your actual
experiences with it are.
Personally, I feel like setting up a separate service which is solely
responsible for parsing file contents (to be indexed by Solr later on in
the process) using Tika is a safer approach, so we can use whatever Tika
version we want along with other things we might want to add.
Looking forward to your response!
Kind regards,
Frederik
Re: Opinions on ExtractingRequestHandler
Posted by "Sreenivas.T" <sr...@gmail.com>.
Frederik,
We have also used a separate service, which runs Tika and then uses SolrJ to
index the content.
The main reason we went for this approach is the flexibility to
manipulate/transform the data over and above what Tika does.
My understanding is that, if no other transformation is needed,
ExtractingRequestHandler should be fine in production too.
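For illustration only, here is a minimal sketch of that extract, transform, index flow. It is not the setup described above: it talks to a standalone Tika Server and to Solr's JSON update endpoint over plain HTTP instead of using SolrJ. The host names, ports, the core name "documents", the field name "content_txt" and all function names are assumptions made up for the example.

```python
import json
import urllib.request

# Assumed endpoints: a Tika Server on its usual port and a Solr core
# named "documents" (both hypothetical, adjust to your environment).
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/documents/update?commitWithin=10000"


def extract_text(file_bytes: bytes, timeout: float = 60.0) -> str:
    """Send raw file bytes to Tika Server and return the extracted plain text."""
    request = urllib.request.Request(
        TIKA_URL,
        data=file_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def build_solr_doc(doc_id: str, text: str, max_chars: int = 100_000) -> dict:
    """Custom transformation step: collapse whitespace and cap the body size.

    "content_txt" is a hypothetical field name; use whatever your schema defines.
    """
    normalized = " ".join(text.split())
    return {"id": doc_id, "content_txt": normalized[:max_chars]}


def index_docs(docs: list, timeout: float = 30.0) -> int:
    """POST a list of documents to Solr's JSON update endpoint; return the HTTP status."""
    body = json.dumps(docs).encode("utf-8")
    request = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The transformation lives in build_solr_doc, which is the part that ExtractingRequestHandler would not give you the same control over. A caller would chain the three steps, e.g. index_docs([build_solr_doc("report-1", extract_text(data))]).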
Regards,
Sreenivas
Re: Opinions on ExtractingRequestHandler
Posted by Charlie Hull <ch...@flax.co.uk>.
On 08/02/2018 11:47, Frederik Van Hoyweghen wrote:
> Personally, I feel like setting up a separate service which is solely
> responsible for parsing file contents (to be indexed by Solr later on in
> the process) using Tika is a safer approach, so we can use whatever Tika
> version we want along with other things we might want to add.
Yes, do this. It's entirely possible to bring down Tika with a nasty
PDF, or end up consuming lots of resources in the extraction step and
have these impact your Solr server. Run it separately and you can
monitor it/kill it if necessary.
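As a rough sketch of the "monitor it/kill it" idea, extraction can be run in a child process with a hard timeout, so a document that hangs the parser only costs you that one process. This is an illustration under assumptions, not how the Flax tooling works: run_extractor is a hypothetical helper, and the command you pass it would be whatever starts your extractor.

```python
import subprocess
import sys
from typing import List, Optional


def run_extractor(command: List[str], timeout_seconds: float) -> Optional[str]:
    """Run an extractor command in a child process.

    Returns the child's stdout as text, or None if it timed out or
    exited with a non-zero status.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            timeout=timeout_seconds,  # on expiry the child is killed and TimeoutExpired is raised
            check=True,               # non-zero exit status raises CalledProcessError
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return None
    return completed.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without Tika installed.
    ok = run_extractor([sys.executable, "-c", "print('extracted text')"], 10)
    hung = run_extractor([sys.executable, "-c", "import time; time.sleep(30)"], 0.5)
    print(repr(ok), repr(hung))
```

A None result is the point where you would log the offending file and move on, rather than letting it tie up the indexing pipeline or the Solr JVM.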
You might like my colleague Matt Pearce's Dropwizard wrapper for Tika:
https://github.com/mattflax/dropwizard-tika-server
Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk