
Posted to solr-user@lucene.apache.org by Rahul Chhiber <ra...@cumulus-systems.com> on 2018/01/23 09:53:47 UTC

Using lucene to post-process Solr query results

Hi All,

For a business requirement, once our Solr client (Java) receives the results of a search query from the Solr server, we need to search further across, and also within, the content of the returned documents. To accomplish this, I am attempting to create an in-memory Lucene index (RAMDirectory) on the client side, convert the SolrDocument objects into smaller Lucene Document objects, add them to the index, and then search within it.
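
The flow described above (split each returned document into smaller units, index those units in memory, then search them) can be sketched without any dependencies. This is only an illustration under stated assumptions: a plain HashMap inverted index stands in for the Lucene RAMDirectory index, splitting on line breaks stands in for whatever "smaller document" boundary is actually needed, and the names ClientSidePostSearch, Fragment, addParent and search are hypothetical. In a real client, Lucene's IndexWriter and IndexSearcher would presumably take the place of the map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Dependency-free stand-in for the client-side index; every name here is hypothetical. */
class ClientSidePostSearch {

    /** One fine-grained unit carved out of a larger result document. */
    static final class Fragment {
        final String parentId;
        final int position;
        final String text;

        Fragment(String parentId, int position, String text) {
            this.parentId = parentId;
            this.position = position;
            this.text = text;
        }

        @Override
        public String toString() {
            return parentId + "#" + position + ": " + text;
        }
    }

    private final List<Fragment> fragments = new ArrayList<>();
    // term -> slots in `fragments`; this map plays the role of the in-memory index
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits one returned document's content into line fragments and indexes each one. */
    void addParent(String parentId, String content) {
        String[] lines = content.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) continue;
            int slot = fragments.size();
            fragments.add(new Fragment(parentId, i, lines[i]));
            for (String term : tokenize(lines[i])) {
                postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(slot);
            }
        }
    }

    /** Returns the fragments that contain every query term (AND semantics). */
    List<Fragment> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> hits = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) result = new LinkedHashSet<>(hits);
            else result.retainAll(hits);
        }
        List<Fragment> out = new ArrayList<>();
        if (result != null) {
            for (int slot : result) out.add(fragments.get(slot));
        }
        return out;
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    public static void main(String[] args) {
        ClientSidePostSearch index = new ClientSidePostSearch();
        // In the real client these strings would come from SolrDocument field values.
        index.addParent("doc1", "disk latency high on node A\ncpu usage normal");
        index.addParent("doc2", "Disk usage normal\nnetwork latency high");
        for (Fragment f : index.search("latency high")) {
            System.out.println(f);
        }
    }
}
```

The index only ever contains the documents returned for one query, so it stays small and can be thrown away after each request.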

Has something like this been attempted before? And does it sound like a workable idea?

P.S. - The reason for this approach is that we need to search the data at a fine granularity, but we don't want to index it at that granularity for indexing performance reasons, i.e. we need to keep the total number of documents small.

Appreciate any help.

Regards,
Rahul Chhiber

RE: Using lucene to post-process Solr query results

Posted by Rahul Chhiber <ra...@cumulus-systems.com>.

Hi Atita,

I haven't tried anything else. I considered writing a plugin, such as a custom SearchComponent, but being fairly unfamiliar with the Solr internals I thought I would try this approach first and, if it works, perhaps move the processing inside a plugin.

I will take a look at streaming expressions; they look interesting.

Regards,
Rahul Chhiber

-----Original Message-----
From: Atita Arora [mailto:atitaarora@gmail.com] 
Sent: Tuesday, January 23, 2018 3:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Using lucene to post-process Solr query results

Hi Rahul,
Looks like Streaming expressions can probably help you.

Is there something else you have tried for this?

Atita




Re: Using lucene to post-process Solr query results

Posted by Atita Arora <at...@gmail.com>.

Hi Rahul,
Looks like Streaming expressions can probably help you.

Is there something else you have tried for this?

Atita




RE: Using lucene to post-process Solr query results

Posted by "alessandro.benedetti" <a....@sease.io>.

I have never been a big fan of "getting N results from Solr and then filtering
them client side".
I get your point about the document modelling, so I will assume you have properly
tested it and that having the small documents on the Solr side is really not
sustainable.

I also appreciate the fact that you ultimately want to return just the children
documents.

A possible flaw in getting N results and then filtering down to K client side is
that you may end up with 0 results even if there are actual matches.
For example:
you have a total of 1000 results from Solr,
you get the top 10,
you split this top 10, creating 100 children docs, but none of them matches
the query anymore.
In the remaining 990 results there could be valid children documents that
are not returned.
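
The scenario above can be reproduced with a small, dependency-free simulation. It is a sketch on made-up data, not a measurement: the class and method names (TopNFilterSketch, rankedParents, matchingChildren) are hypothetical, the list order stands in for Solr's ranking, and the fine-grained query is assumed to require both terms in the same child. Every synthetic parent contains both "disk" and "error" somewhere, so all 1000 match at the parent level, but only every hundredth parent has a child containing both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simulates "fetch top N parents, then filter children client side" on synthetic data. */
class TopNFilterSketch {

    /** Builds 1000 synthetic parents, already in the order the server would rank them. */
    static List<List<String>> rankedParents() {
        List<List<String>> parents = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            List<String> children = new ArrayList<>();
            children.add("disk ok");   // the parent matches "disk" here ...
            children.add("error cpu"); // ... and "error" here, but in a different child
            for (int c = 2; c < 10; c++) children.add("all ok");
            // Only parents 99, 199, ..., 999 have a child that matches both terms.
            if (i % 100 == 99) children.set(5, "disk error");
            parents.add(children);
        }
        return parents;
    }

    /** Counts children containing both terms, looking only at the first topN parents. */
    static int matchingChildren(List<List<String>> ranked, int topN, String a, String b) {
        int hits = 0;
        for (List<String> children : ranked.subList(0, Math.min(topN, ranked.size()))) {
            for (String child : children) {
                List<String> terms = Arrays.asList(child.split(" "));
                if (terms.contains(a) && terms.contains(b)) hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<List<String>> ranked = rankedParents();
        // The top 10 parents yield 100 children, none of which match.
        System.out.println("top 10:   " + matchingChildren(ranked, 10, "disk", "error"));
        // Scanning all 1000 parents finds the children the top-10 cut missed.
        System.out.println("all 1000: " + matchingChildren(ranked, 1000, "disk", "error"));
    }
}
```

With this data the top-10 pass finds no matching children, while a pass over all 1000 parents finds 10, which is exactly the gap described above.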

Have you tried nested documents as well, by any chance? (Keep in mind that a
child document is still a Solr document, so it may not be a good fit for
you.)




-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html