Posted to solr-user@lucene.apache.org by Andrea D'Ippolito <ad...@gmail.com> on 2009/10/15 10:00:34 UTC

Filtered search for subset of ids

Hi everybody,
I'm new here, and this is my last chance to find a solution to my problem.

I'm using acts_as_solr for Ruby on Rails.

I need to run a query against a subset of documents whose ids belong to an
array of ids that I want to pass as a parameter.

For instance, something like:

find_by_solr(query, id:[1,2,3,40,51,56])

or actually I'd just need an option that filters like a kind of SQL IN
instead of a RANGE.

I guess I need to override some methods, but first of all I want to know
whether you consider this possible, and whether you have any hints about how
to achieve it.

I'm working on an Articles repository, indexing title and content only (but
the document id is synchronized with the document id in the MySQL database).

Thanks

(I hope this is not a duplicate. I sent it once before confirming my
subscription :S )

Andrea

Re: Filtered search for subset of ids

Posted by solr_noob <di...@gmail.com>.
Hello Mikhail

I like your idea of creating an index on e.g. category-id. I am going to try
to approach the problem that way for now. Yeah, off the top of my head, the
idea of fq=id:(1 2 3 4......) does not seem scalable. Thank you so much for
your suggestion and for pointing out the performance ramifications. :)

=HEnry=

--
View this message in context: http://lucene.472066.n3.nabble.com/Filtered-search-for-subset-of-ids-tp502245p3639323.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Filtered search for subset of ids

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello,

It seems you are talking about a huge disjunctive filter: fq=id:(1 2 3 4
5....). I have two suggestions.

I. Don't do that. It has the following drawbacks:
* It takes too long to parse such a long query. As a result, the search will
cost O(query-len) instead of O(numFound) (without scoring/sorting).
* It hits the BooleanQuery.TooManyClauses exception.
* This huge BooleanQuery is used as a key in the Solr filter cache, which is
bad: even a cache hit costs you O(n^2) because of the straightforward
equals().

The proper solution is to bring your first search stage, which gives
you the list of ids, into Solr. Suppose you have some kind of external
index that maps a short key, e.g. category-id, to a set of ids. You
need to index that category field in Solr and send the short filter query
fq=catId:666 instead of the huge one.
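
Approach I can be sketched roughly as follows. This is only an illustration
in plain Python with no Solr client involved: the helper names
(attach_category_ids, category_filter) and the choice to store catId as a
multi-valued field are assumptions, not something stated in this thread.

```python
def attach_category_ids(docs, category_to_ids):
    """Return copies of docs with a catId list added before indexing.

    category_to_ids is the (assumed) external mapping from a short
    category key to the set of document ids that belong to it.
    """
    # Invert the mapping: document id -> list of category ids.
    ids_to_categories = {}
    for cat_id, doc_ids in category_to_ids.items():
        for doc_id in doc_ids:
            ids_to_categories.setdefault(doc_id, []).append(cat_id)

    # Copy each document and attach its (possibly empty) category list.
    return [
        dict(doc, catId=sorted(ids_to_categories.get(doc["id"], [])))
        for doc in docs
    ]


def category_filter(cat_id):
    """Build the short filter query that replaces the long id list."""
    return f"catId:{cat_id}"


docs = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}]
indexed = attach_category_ids(docs, {666: [1, 2], 7: [2]})
short_fq = category_filter(666)
```

The filter query stays the same length no matter how many documents fall
into the category, which is the point of moving the id list into the index.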

II. Some time ago I dealt with this challenge, though not with the query
parsing part. The proper approach is to implement your own
org.apache.lucene.search.MultiTermQuery and back it with a list of
sorted ids encoded as vints. That gives you a fast equals(). Then you'll
need to implement your own query parser that decodes that vint vector,
and your app has to build a properly encoded filter query. But the length
is still limited by the URL length; see approach I.
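
The client-side half of approach II, encoding a sorted id list as vints,
might look like the sketch below. Several things here are assumptions and
not part of the message above: the ids are non-negative integers, the
sorted ids are delta-encoded before the vint step, and the bytes are made
URL-friendly with URL-safe base64. The custom MultiTermQuery and query
parser on the Solr side are not shown.

```python
import base64


def encode_vint(value):
    """Encode a non-negative int as a vint: 7 bits per byte, low-order
    bits first, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("vint encoding needs a non-negative integer")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_ids(ids):
    """Sort and de-duplicate ids, delta-encode them (assumption), and
    return a URL-safe base64 string of the concatenated vints."""
    out = bytearray()
    previous = 0
    for doc_id in sorted(set(ids)):
        out += encode_vint(doc_id - previous)
        previous = doc_id
    return base64.urlsafe_b64encode(bytes(out)).decode("ascii")


def decode_ids(token):
    """Inverse of encode_ids: what a custom query parser would do."""
    data = base64.urlsafe_b64decode(token.encode("ascii"))
    ids, current, value, shift = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # more bytes follow for this number
        else:
            current += value    # undo the delta encoding
            ids.append(current)
            value, shift = 0, 0
    return ids


token = encode_ids([56, 1, 3, 2, 40, 51])
roundtrip = decode_ids(token)
```

Because the ids are sorted before encoding, two requests for the same set
of ids produce the same token, so comparing filters reduces to comparing
two short byte strings.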

Regards


On Fri, Jan 6, 2012 at 9:55 AM, solr_noob <di...@gmail.com> wrote:
> Hello,
>
> I'm new to Solr. I am facing the same problem. The idea is to search for
> key phrase(s) within a set of documents. I understand the query syntax
> somewhat. What if the list of document ids to search grows to, say, about
> 10000 documents? What is the best way to craft the query?
>
> In a relational DB it would be:
>
>    SELECT * FROM documents WHERE query = 'search term' AND document_id IN
> (.............);
> Thanks :)



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

Re: Filtered search for subset of ids

Posted by Lance Norskog <go...@gmail.com>.
If you want the Nth result in a result set, that would be:
start=N&rows=1

A document 'id' is a field containing a unique value for a document. It
is not normally used for relevance scoring. You would instead search
for
id:value

On Thu, Jan 5, 2012 at 9:55 PM, solr_noob <di...@gmail.com> wrote:
> Hello,
>
> I'm new to Solr. I am facing the same problem. The idea is to search for
> key phrase(s) within a set of documents. I understand the query syntax
> somewhat. What if the list of document ids to search grows to, say, about
> 10000 documents? What is the best way to craft the query?
>
> In a relational DB it would be:
>
>    SELECT * FROM documents WHERE query = 'search term' AND document_id IN
> (.............);
> Thanks :)



-- 
Lance Norskog
goksron@gmail.com

Re: Filtered search for subset of ids

Posted by solr_noob <di...@gmail.com>.
Hello,

I'm new to Solr. I am facing the same problem. The idea is to search for
key phrase(s) within a set of documents. I understand the query syntax
somewhat. What if the list of document ids to search grows to, say, about
10000 documents? What is the best way to craft the query?

In a relational DB it would be:

    SELECT * FROM documents WHERE query = 'search term' AND document_id IN
(.............);

Thanks :)



--
View this message in context: http://lucene.472066.n3.nabble.com/Filtered-search-for-subset-of-ids-tp502245p3637150.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Filtered search for subset of ids

Posted by Chris Hostetter <ho...@fucit.org>.
: >    /select?q=your+main+query&fq=id:(1+2+3+40+51+56)

: OK, that's good to know. I'll figure out how to force the API to send that;
: at the moment it accepts RANGE and OR as filter queries, but I'm not sure
: how it processes them.
: I'll check the methods (and maybe the OR is converted to + like you
: said...)

"OR" is what you want if that's part of the API ... the "+" in my example 
are the URL-escaped space characters.
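
A quick way to see the escaping at work, using Python's standard
urllib.parse purely as an illustration (the thread itself is about
acts_as_solr, whose API is not shown here):

```python
from urllib.parse import quote_plus, unquote_plus

# What Solr sees after URL decoding: each "+" becomes a space.
decoded = unquote_plus("id:(1+2+3+40+51+56)")
# decoded == "id:(1 2 3 40 51 56)"

# Going the other way with an explicit OR between the ids.
encoded = quote_plus("id:(1 OR 2 OR 3)")
# encoded == "id%3A%281+OR+2+OR+3%29"
```

So the "+" never reaches the query parser as an operator; the parser only
sees space-separated ids, or explicit OR clauses if the client writes them.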


-Hoss


Re: Filtered search for subset of ids

Posted by Andrea D'Ippolito <ad...@gmail.com>.
2009/10/22 Chris Hostetter <ho...@fucit.org>

>
> : I need to run a query against a subset of documents whose ids belong to an
> : array of ids that I want to pass as a parameter.
> :
> : For instance, something like:
> :
> : find_by_solr(query, id:[1,2,3,40,51,56])
>
> I don't know anything about the acts_as_solr API, but you should be able
> to do this using a "filter query", which is specified in the HTTP request
> using the "fq" param...
>
>    /select?q=your+main+query&fq=id:(1+2+3+40+51+56)
>
> ....hopefully that gives you the tips you need to find how to specify this
> type of query in acts_as_solr.
>
OK, that's good to know. I'll figure out how to force the API to send that;
at the moment it accepts RANGE and OR as filter queries, but I'm not sure
how it processes them.
I'll check the methods (and maybe the OR is converted to + like you
said...)

thanks

andrea


>
>
> -Hoss
>
>

Re: Filtered search for subset of ids

Posted by Chris Hostetter <ho...@fucit.org>.
: I need to run a query against a subset of documents whose ids belong to an
: array of ids that I want to pass as a parameter.
: 
: For instance, something like:
: 
: find_by_solr(query, id:[1,2,3,40,51,56])

I don't know anything about the acts_as_solr API, but you should be able
to do this using a "filter query", which is specified in the HTTP request
using the "fq" param...

    /select?q=your+main+query&fq=id:(1+2+3+40+51+56)

....hopefully that gives you the tips you need to find how to specify this 
type of query in acts_as_solr.
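
For reference, a request like the one above can be assembled with a few
lines of Python. This is a minimal sketch, not acts_as_solr code: the
function name build_select_url and the base URL are made up for
illustration, and the ids are joined with an explicit OR so the result
does not depend on the query parser's default operator.

```python
from urllib.parse import urlencode


def build_select_url(base_url, main_query, ids):
    """Build a /select URL with the main query in q and the id list in fq."""
    # Explicit OR between ids, e.g. "id:(1 OR 2 OR 3)".
    id_filter = "id:(" + " OR ".join(str(i) for i in ids) + ")"
    # urlencode escapes spaces as "+" and percent-encodes ":" and parentheses.
    params = urlencode({"q": main_query, "fq": id_filter})
    return f"{base_url}/select?{params}"


url = build_select_url(
    "http://localhost:8983/solr",  # hypothetical Solr base URL
    "your main query",
    [1, 2, 3],
)
```

Whatever client library is used, the thing to check is that the id list
ends up in fq rather than in q, so it filters the results without
affecting relevance scoring.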


-Hoss