Posted to solr-user@lucene.apache.org by Jim Murphy <ji...@pobox.com> on 2008/07/30 22:28:29 UTC

Understanding Filters

I'm still trying to filter my search results with external data.  I have ~100
million documents in the index.  I want to use the power of Lucene to knock
that index down to 10-100 hits with keyword searching and a few other regular
query terms.  With that smaller subset I'd like to apply a filter based on
making calls to an external system to further reduce that set to 5-20.

Looking at filter queries and Lucene search filters, it seems that they
iterate over the entire index to create a bitset of documents to be included
in the query.  This is the inverse of what I need: I can't make ~100 million
external calls to build a filter - I want Lucene to handle that heavy lifting.

I'm trying to figure out the right place to hook to let paging and caching
in Solr work as normal but drop out result documents based on that expensive
external call.
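
To make the shape of what I'm after concrete, here is a minimal sketch (not
Solr code; the doc ids and the `externalCheck` predicate are hypothetical
stand-ins for Lucene's ranked results and for the external system): run the
cheap query first, then spend the expensive external call only on the small
candidate list, stopping once a page is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class PostSearchFilter {
    // Walk the ranked candidate ids (already reduced by the cheap query)
    // and keep only those that pass the expensive external check, stopping
    // as soon as one page of results is filled.
    static List<Integer> filterPage(List<Integer> rankedDocIds,
                                    IntPredicate externalCheck,
                                    int pageSize) {
        List<Integer> page = new ArrayList<>();
        for (int docId : rankedDocIds) {
            if (page.size() >= pageSize) break;
            if (externalCheck.test(docId)) {  // one external call per candidate
                page.add(docId);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        // Hypothetical: keyword search already cut ~100M docs to these ten.
        List<Integer> candidates = List.of(3, 7, 8, 12, 15, 21, 22, 30, 31, 40);
        // Stand-in for the external system: accept even ids only.
        List<Integer> page = filterPage(candidates, id -> id % 2 == 0, 3);
        System.out.println(page);  // [8, 12, 22]
    }
}
```

The open question is where in Solr this loop belongs so paging and caching
still behave sensibly.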

Thanks, and sorry for the repeat requests. 

Jim
-- 
View this message in context: http://www.nabble.com/Understanding-Filters-tp18742220p18742220.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Understanding Filters

Posted by k_richard <ke...@msn.com>.
Jim -

I'm interested in doing something similar and was wondering if you ever got
a response.  I've looked at the solr source code and have some ideas about
where to apply additional filtering but they're not that elegant.

From what I can see, if you add an additional search component after the
initial query, you would need to update all of the saved doc sets/lists with
your filtered data so that downstream search components continue to work
correctly.  However, there are warnings strewn throughout the code against
modifying these.
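
One way to respect those warnings might be to never mutate the cached sets at
all: copy, intersect, and hand the copy downstream.  A rough sketch of that
idea (plain `java.util.BitSet` standing in for Solr's DocSet; this is not
Solr's actual API):

```java
import java.util.BitSet;

public class ComponentFilterSketch {
    // Combine the cached match set with the docs that survived the
    // expensive external check, without touching the cached set itself.
    static BitSet applyFilter(BitSet matches, BitSet survivors) {
        BitSet out = (BitSet) matches.clone();  // leave the cached set intact
        out.and(survivors);                     // keep only surviving docs
        return out;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(2); matches.set(5); matches.set(9);
        BitSet survivors = new BitSet();
        survivors.set(5); survivors.set(9); survivors.set(11);
        System.out.println(applyFilter(matches, survivors));  // {5, 9}
        System.out.println(matches.get(2));  // true: cache unchanged
    }
}
```

Whether downstream components would actually accept a fresh set in place of
the cached one is exactly the part I'm unsure about.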

Any thoughts?

Thanks.

- Kevin


Jim Murphy wrote:
> 
> I'm still trying to filter my search results with external data.  I have
> ~100 million documents in the index.  I want to use the power of Lucene to
> knock that index down to 10-100 hits with keyword searching and a few
> other regular query terms.  With that smaller subset I'd like to apply a
> filter based on making calls to an external system to further reduce that
> set to 5-20.
> 
> Looking at filter queries and Lucene search filters, it seems that they
> iterate over the entire index to create a bitset of documents to be
> included in the query.  This is the inverse of what I need: I can't make
> ~100 million external calls to build a filter - I want Lucene to handle
> that heavy lifting.
> 
> I'm trying to figure out the right place to hook to let paging and caching
> in Solr work as normal but drop out result documents based on that
> expensive external call.
> 
> Thanks, and sorry for the repeat requests. 
> 
> Jim
> 

-- 
View this message in context: http://www.nabble.com/Understanding-Filters-tp18742220p21221020.html
Sent from the Solr - User mailing list archive at Nabble.com.