Posted to solr-user@lucene.apache.org by Fabio Confalonieri <fa...@zero.it> on 2006/05/12 15:06:46 UTC
Leveraging filter cache in queries
Hello,
I've just found Lucene and Solr and I'm thinking of using them in our current
project, essentially an ads portal (something very similar to www.oodle.com).
I see our needs have already surfaced in the mailing list: it's the
refine-search problem you have sometimes called faceted browsing, which is the
basis of CNET's browsing architecture. We have ads in different categories with
different attributes ("fields" in Lucene language); say, the motors-car
category has make, model, price, color, and the real-estate-houses category
has bathroom ranges, bedroom ranges, etc.
I understand you developed Solr in part to have a filter cache storing bitsets
of search results, giving a fast way to intersect those bitsets to count
sub-queries and present the counts for refinement searches (I have read the
CNET announcement, the Nines thread, and some other related threads).
We had actually thought of storing, for every category, the possible sub-query
attributes with their possible values/ranges in a MySQL database (which we use
for all other non-search tasks), similarly to how you and CNET store the
possible sub-queries of a query in a Lucene document.
What I haven't understood is whether the Solr StandardRequestHandler
automatically creates and caches filters from normal queries submitted to the
select servlet, possibly with some syntax clue.
I tried a query like "+field:value^0", which returns a great number of hits
(out of a test set of 100,000 documents), but I see only the query cache
growing while the filter cache stays empty. Is this normal? I've checked all
the cache configuration but I can't tell whether filters are auto-generated
from normal queries.
A more general question: is all the CNET logic of intersecting bitsets
available through the servlet, or do I have to write some Java code to be
plugged into Solr?
In that case, what is the correct level at which to do this? Perhaps a new
RequestHandler understanding some new query syntax to exploit filters.
We only need a sort on a single, precalculated rank field stored as a range
field, so we don't need relevance and consequently don't need scores (which is
a prerequisite for using BitSets, if I understand correctly).
Thank you; I hope I have explained my doubts well.
Fabio
PS: I think Solr and Lucene are really great work! I'll be happy to add our
project (a major press group here in Italy) to the public websites list in the
Solr wiki when we have finished.
--
View this message in context: http://www.nabble.com/Leveraging-filter-chache-in-queries-t1607377.html#a4357730
Sent from the Solr - User forum at Nabble.com.
Re: Leveraging filter cache in queries
Posted by Yonik Seeley <ys...@gmail.com>.
On 5/12/06, Fabio Confalonieri <fa...@zero.it> wrote:
> I tried a query like "+field:value^0", which returns a great number of hits
> (out of a test set of 100,000 documents), but I see only the query cache
> growing while the filter cache stays empty. Is this normal? I've checked all
> the cache configuration but I can't tell whether filters are auto-generated
> from normal queries.
There is currently no syntax in the standard request handler that understands
filters.
Converting certain "heavy" term queries to filters when they have a zero boost
was something Doug pointed me at, and I borrowed it directly from Nutch very
early on, before Solr had its own caching.
The optimization code is still sort-of in Solr, but:
- it's not called by default anymore... people needing faceted browsing
currently need their own plugin anyway, and they can then specify filters
directly.
- its caching is not integrated into Solr's caching.
Filters *can* be generated and used to satisfy whole queries when the
following optimization is turned on in solrconfig.xml:
<!-- An optimization that attempts to use a filter to satisfy a search.
     If the requested sort does not include score, then the filterCache
     will be checked for a filter matching the query. If found, the filter
     will be used as the source of document ids, and then the sort will be
     applied to that. -->
<useFilterForSortedQuery>true</useFilterForSortedQuery>
> A more general question: is all the CNET logic of intersecting bitsets
> available through the servlet, or do I have to write some Java code to be
> plugged into Solr?
The nitty-gritty of getting intersection counts is in Solr, but you still need
to ask Solr for each facet count individually, and you still need to know
which counts to ask for. That's the part you currently still need a custom
request handler for.
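Yonik's point about intersection counts reduces to ANDing a cached filter bitset with the bitset of the base query's results and taking the cardinality. A minimal standalone sketch with java.util.BitSet (the field name, document count, and set bits are invented for illustration; Solr's own DocSet plays this role internally):

```java
import java.util.BitSet;

public class FacetCountSketch {
    public static void main(String[] args) {
        int numDocs = 8;

        // Bitset of documents matching the base query (docs 0, 1, 2, 5).
        BitSet queryDocs = new BitSet(numDocs);
        queryDocs.set(0); queryDocs.set(1); queryDocs.set(2); queryDocs.set(5);

        // Cached filter bitset for a facet value, e.g. color:red (docs 1, 2, 3, 6).
        BitSet colorRed = new BitSet(numDocs);
        colorRed.set(1); colorRed.set(2); colorRed.set(3); colorRed.set(6);

        // Facet count = size of the intersection. Clone first so the
        // cached filter bitset is not destroyed by the in-place and().
        BitSet intersection = (BitSet) queryDocs.clone();
        intersection.and(colorRed);

        System.out.println("color:red facet count = " + intersection.cardinality());
        // prints "color:red facet count = 2" (docs 1 and 2)
    }
}
```

A handler doing faceted browsing repeats this once per candidate facet value, which is why you still need to know which counts to ask for.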
> In that case, what is the correct level at which to do this? Perhaps a new
> RequestHandler understanding some new query syntax to exploit filters.
Yes, a new RequestHandler... from there, the easiest way is to pass extra
parameters (rather than changing the query syntax passed as "q").
> We only need a sort on a single, precalculated rank field stored as a range
> field, so we don't need relevance and consequently don't need scores (which
> is a prerequisite for using BitSets, if I understand correctly).
You can do relevancy scoring *and* do facets at the same time... there
is no incompatibility there.
-Yonik
Re: Leveraging filter cache in queries
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On May 12, 2006, at 9:06 AM, Fabio Confalonieri wrote:
> I see our needs have already surfaced in the mailing list: it's the
> refine-search problem you have sometimes called faceted browsing, which is
> the basis of CNET's browsing architecture. We have ads in different
> categories with different attributes ("fields" in Lucene language); say, the
> motors-car category has make, model, price, color, and the
> real-estate-houses category has bathroom ranges, bedroom ranges, etc.
>
> I understand you developed Solr in part to have a filter cache storing
> bitsets of search results, giving a fast way to intersect those bitsets to
> count sub-queries and present the counts for refinement searches (I have
> read the CNET announcement, the Nines thread, and some other related
> threads).
As Yonik has pointed out, Solr provides some nice facilities to build upon,
but the actual implementation is still custom for this sort of thing. For
example, here's the (pseudo)code for how my intersecting BitSet (soon to
become DocSet) processing works:
private Query createConstraintMask(final Map facetCache, String[] constraints,
                                   BitSet constraintMask, IndexReader reader)
        throws ParseException, IOException {
    // BooleanQuery used for all full-text expression constraints, but not for facets
    Query query = new BooleanQuery();
    constraintMask.set(0, constraintMask.size()); // light up all documents initially
    if (constraints != null) {
        // Loop over all constraints, ANDing all cached bit sets with the constraint mask
        for (String constraint : constraints) {
            if (constraint == null || constraint.length() == 0) continue;
            // constraint looks like this: [-]field:value
            int colonPosition = constraint.indexOf(':');
            if (colonPosition <= 0) continue;
            String field = constraint.substring(0, colonPosition);
            boolean invert = false;
            if (field.startsWith("-")) {
                invert = true;
                field = field.substring(1);
            }
            String value = constraint.substring(colonPosition + 1);
            BitSet valueMask;
            if (!field.equals("?")) {
                // facetCache is from a custom Solr cache currently
                Map fieldMap = (Map) facetCache.get(field);
                if (fieldMap == null) continue; // field name doesn't correspond to predefined facets
                valueMask = (BitSet) fieldMap.get(value);
                if (valueMask == null) {
                    valueMask = new BitSet(constraintMask.size());
                    System.out.println("invalid value requested for field "
                        + field + ": " + value);
                }
            } else {
                Query clause = ...; // some query from parsing "value"
                ((BooleanQuery) query).add(clause, true, false); // collect the full-text clause
                // this should change to get the DocSet from Solr's facilities :)
                QueryFilter filter = new QueryFilter(clause);
                valueMask = filter.bits(reader);
            }
            if (!invert) {
                constraintMask.and(valueMask);
            } else {
                // This is what would be nice for DocSets to be capable of
                constraintMask.andNot(valueMask);
            }
        }
    }
    if (((BooleanQuery) query).getClauses().length == 0) {
        query = new MatchAllDocsQuery();
    }
    return query;
}
And then basically it gets called like this in my custom handler:
BitSet constraintMask = new BitSet(reader.numDocs());
Query query = createConstraintMask(facetCache, req.getParams("constraint"),
                                   constraintMask, reader);
DocList results = req.getSearcher().getDocList(query,
    new BitDocSet(constraintMask), sort, req.getStart(), req.getLimit());
[critique of this code more than welcome!]
My client (Ruby on Rails) is POSTing a parameter that looks like this:
constraint=#{invert}#{field}:#{constraint[:value]}
It works really well even before my refactoring to use Solr's DocSet and
caching capabilities, and I'm sure it'll do even better leveraging those
provided capabilities. Really nice stuff!
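The "[-]field:value" wire format Erik describes is easy to build and pick apart on either end. A standalone sketch (the field and value below are made-up examples, not fields from his schema):

```java
public class ConstraintParamSketch {
    public static void main(String[] args) {
        // Build a constraint the way the client does: [-]field:value
        String invert = "-";      // "-" to negate the facet, "" otherwise
        String field = "author";  // hypothetical facet field
        String value = "yonik";   // hypothetical facet value
        String constraint = invert + field + ":" + value;
        System.out.println(constraint); // prints "-author:yonik"

        // Pick it apart the way the custom handler does.
        boolean negated = constraint.startsWith("-");
        String rest = negated ? constraint.substring(1) : constraint;
        int colon = rest.indexOf(':');
        System.out.println(negated + " " + rest.substring(0, colon)
                + " " + rest.substring(colon + 1));
        // prints "true author yonik"
    }
}
```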
> A more general question: Is all the CNet logic of intersecting bitsets
> available through the servlet or have I to write some java code to be
> plugged in Solr?
Currently you have to piece it together. The goal is to build these facilities
more into the core, but we should do so based on folks implementing it
themselves and contributing it, so that we can compare the needs others have
and come up with some great groundwork in the faceted-browsing area, just as
Solr itself has built above raw Lucene.
So, let's all flesh this stuff out, compare/contrast real-world working
implementations, and factor the result on top.
As an example of another facility I've just added on top: the ability to
return all terms that match a client-provided prefix. This is to enable
Google Suggest-like convenience, so that when someone types "Yo" and pauses,
an Ajaxified UI will hit my Rails app, which in turn will ping Solr with the
prefix, and a custom request handler will respond with the terms that match
("Yonik", for example) for a specified field. Not only that, but my
implementation returns the number of documents that match each term,
constrained by the same types of constraints above, including full-text
queries. This allows our users to pick people by typing a name rather than us
having to populate a drop-down (we'll still have some kind of browse
interface too, I'm sure), but only with names of folks involved in the
document set they are currently constraining their view to.
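The prefix lookup Erik describes can be sketched standalone with a sorted in-memory set standing in for Lucene's sorted term dictionary (the terms and names here are invented; a real handler would seek the index's term enumeration instead):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixSuggestSketch {
    // Return all terms starting with the given prefix, in sorted order.
    static List<String> matchPrefix(TreeSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<String>();
        // tailSet jumps to the first term >= prefix (like seeking a term
        // enumeration); scan forward until terms stop sharing the prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break;
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<String>();
        terms.add("erik"); terms.add("fabio");
        terms.add("yolanda"); terms.add("yonik");
        System.out.println(matchPrefix(terms, "yo"));
        // prints "[yolanda, yonik]"
    }
}
```

Constraining the per-term document counts is then the same bitset intersection used for the facet counts.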
I've been thinking about this in a general sense: if Solr were driven by a
slick servlet filter rather than servlets, then these types of handlers could
be plugged in a lot more easily, including automatic URL handling, rather
than having to twiddle web.xml. I realize that the handler configuration
allows this with the qt parameter, and I'm leveraging that myself, but I
think some HiveMind mojo would allow true "plugins" to drop right into the
classpath and be immediately available (perhaps even hot-deployed with some
containers, though I personally would rebuild a WAR and stop/deploy/restart).
> In that case, what is the correct level at which to do this? Perhaps a new
> RequestHandler understanding some new query syntax to exploit filters.
Back to your specific case currently, yes, a new request handler is
needed to go above and beyond what the built-in standard one
provides. I expect a flood of cool handlers on top of Solr :) and
that is why I am thinking more along the lines of a true plugin
architecture.
> We only need a sort on a single, precalculated rank field stored as a range
> field, so we don't need relevance and consequently don't need scores (which
> is a prerequisite for using BitSets, if I understand correctly).
You're pretty much right on!
> PS: I think Solr and Lucene are really great work! I'll be happy to add our
> project (a major press group here in Italy) to the public websites list in
> the Solr wiki when we have finished.
I'm looking forward to your work on top of Solr! I'm personally
quite thrilled with it and really believe it'll go far. If only I
had more time to play with it myself rather than just contemplating
it :)
Erik