Posted to solr-user@lucene.apache.org by solr_noob <di...@gmail.com> on 2012/01/06 06:55:11 UTC

Re: Filtered search for subset of ids

Hello,

I'm new to Solr and am facing the same problem. The idea is
to search for key phrase(s) within a set of documents. I understand the
query syntax somewhat. What if the list of document ids to search grows to,
say, about 10000 documents? What is the best way to craft the query?

In a relational DB, it would be something like:

    SELECT * FROM documents WHERE query = 'search term' AND document_id IN
    (.............);

Thanks :)



--
View this message in context: http://lucene.472066.n3.nabble.com/Filtered-search-for-subset-of-ids-tp502245p3637150.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Filtered search for subset of ids

Posted by solr_noob <di...@gmail.com>.
Hello Mikhail,

I like your idea of indexing a field such as category-id. I am going to try
that approach for now. Off the top of my head, the
idea of fq=id(1,2,3,4......) does not seem scalable. Thank you so much for
your suggestion and for pointing out the performance ramifications. :)

=HEnry=

--
View this message in context: http://lucene.472066.n3.nabble.com/Filtered-search-for-subset-of-ids-tp502245p3639323.html

Re: Filtered search for subset of ids

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello,

It seems you are talking about a huge disjunctive filter: fq=id:(1 2 3 4
5....). I have two suggestions.

I. Don't do that. It has the following drawbacks:
* It takes too long to parse such a long query. As a result, the search will
cost O(query-len) instead of O(numFound) (without scoring/sorting).
* It hits the BooleanQuery.TooManyClauses exception.
* This huge BooleanQuery is used as a key in the Solr filter cache, which is
bad because even a cache hit costs you O(n^2) due to the
straightforward equals().
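
To put rough numbers on the first two drawbacks, here is a minimal Python
sketch that builds the disjunctive filter for 10000 ids and compares the
clause count with a limit. The helper name build_id_filter is hypothetical,
and the limit of 1024 assumes the stock maxBooleanClauses setting in
solrconfig.xml; your configuration may differ.

```python
# Assumed stock value of maxBooleanClauses in solrconfig.xml; check your own config.
ASSUMED_MAX_BOOLEAN_CLAUSES = 1024


def build_id_filter(ids):
    """Build the long disjunctive filter string, e.g. id:(1 2 3)."""
    return "id:(" + " ".join(str(i) for i in ids) + ")"


ids = range(1, 10001)          # 10000 document ids, as in the question
fq = build_id_filter(ids)
clause_count = len(ids)        # one boolean clause per id

print("filter length in characters:", len(fq))
print("clauses:", clause_count, "assumed limit:", ASSUMED_MAX_BOOLEAN_CLAUSES)
print("exceeds assumed limit:", clause_count > ASSUMED_MAX_BOOLEAN_CLAUSES)
```

The whole string has to be sent, parsed, and compared on every request, and
all of that work grows with the number of ids rather than with the number of
hits.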

The proper solution is to bring your first search stage, the one that gives
you the list of ids, into Solr. I assume you have some kind of external
index that maps a short key, e.g. category-id, to a set of ids. You
need to index that category field in Solr and send a short filter query,
fq=catId:666, instead of the huge one.
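
As a rough illustration of the short filter, the following Python sketch
builds such a request URL. The field name catId comes from the example
above; the endpoint URL and the helper name build_category_query are
assumptions for illustration only, and the sketch presumes the category
field has already been indexed in Solr.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute your own host and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_category_query(phrase, category_id):
    """Return a select URL: a phrase query restricted by a short category filter.

    The phrase is wrapped in double quotes as-is; quotes inside the phrase
    are not escaped in this sketch.
    """
    params = [
        ("q", '"%s"' % phrase),              # the key phrase to search for
        ("fq", "catId:%s" % category_id),    # short filter instead of id:(1 2 3 ...)
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_category_query("search term", 666)
print(url)
```

The filter stays the same size no matter how many documents belong to the
category, so it is cheap to parse and cheap to use as a filter cache key.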

II. Some time ago I dealt with this challenge, though not the query-parsing
part. The proper approach is to implement your own
org.apache.lucene.search.MultiTermQuery and back it with a list of
sorted ids encoded as vints. That gives you a fast equals(). Then you'll
need to implement your own query parser, which will decode that vint vector,
and your app must form a properly encoded filter query. But the length
is still limited by the URL length; see approach I.
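
For the application side of approach II, here is a minimal Python sketch of
packing a sorted id list into vints and unpacking it again. It covers only
the encoding; it does not implement the MultiTermQuery or the query parser.
The vint layout (7 data bits per byte, low-order group first, high bit set
on every byte but the last) follows Lucene's VInt convention. Storing the
gaps between consecutive ids and wrapping the bytes in URL-safe base64 are
assumptions of this sketch, not something stated above, and all function
names are hypothetical.

```python
import base64


def encode_vint(n):
    """Encode a non-negative int as a vint: 7 bits per byte, low-order first."""
    if n < 0:
        raise ValueError("vint encoding needs a non-negative integer")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)   # high bit set: more bytes follow
        n >>= 7
    out.append(n)                        # last byte has the high bit clear
    return bytes(out)


def encode_ids(ids):
    """Sort and deduplicate ids, store gaps as vints, return URL-safe base64 text."""
    buf = bytearray()
    prev = 0
    for doc_id in sorted(set(ids)):
        buf += encode_vint(doc_id - prev)   # assumption: delta-encode the sorted ids
        prev = doc_id
    return base64.urlsafe_b64encode(bytes(buf)).decode("ascii")


def decode_ids(token):
    """Inverse of encode_ids: rebuild the sorted id list from the token."""
    data = base64.urlsafe_b64decode(token.encode("ascii"))
    ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7                   # continuation byte, keep accumulating
        else:
            prev += value                # gap complete: add it to the running id
            ids.append(prev)
            value, shift = 0, 0
    return ids


token = encode_ids([5, 1, 300, 128])
print(token, decode_ids(token))
```

Because the ids are sorted and deduplicated before encoding, two requests
for the same set of ids produce the same bytes, which is what makes a cheap
equals() possible on the server side.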

Regards





-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

Re: Filtered search for subset of ids

Posted by Lance Norskog <go...@gmail.com>.
If you want the Nth result in a result set, that would be:
start=N&rows=1

A document 'id' is a field containing a unique value for a document. It
is not normally used for relevance scoring. You would instead search
for
id:value
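
A small Python sketch of both requests as query strings follows. The helper
names are hypothetical, and the sketch treats start as Solr's zero-based
offset into the result set, so start=0 is the first hit.

```python
from urllib.parse import urlencode


def nth_result_params(query, n):
    """Query string asking for exactly one document at zero-based offset n."""
    return urlencode([("q", query), ("start", n), ("rows", 1)])


def id_lookup_params(doc_id):
    """Query string that looks a document up by its unique id field."""
    return urlencode([("q", "id:%s" % doc_id)])


print(nth_result_params("solr", 10))
print(id_lookup_params(42))
```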




-- 
Lance Norskog
goksron@gmail.com