Posted to solr-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2006/03/29 20:28:42 UTC
faceted browsing
I saw that Yonik mentioned faceted browsing as something coming in the
future of Solr, but I had thought it was one of the initial features
from seeing this announcement ages ago:
<http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html#a748420>
If facets are part of the current Solr codebase, how are they
configured and returned in the response?
If they aren't currently possible with Solr, what would it take to
implement it?
I'm still, obviously, just scratching the surface of Solr as I
evaluate it for replacing my custom XML-RPC based search server which
does rudimentary facets using Filters and BitSet operations.
By faceted browsing I mean that a Query is used to search and Hits are
returned, but in addition, for a subset of the fields (indexed,
untokenized fields), the number of matching documents for each value of
these "facet" fields is returned as well, to show counts by each facet.
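A minimal Java sketch of the Filter/BitSet style of facet counting described here, assuming each facet value's matching documents are already available as a java.util.BitSet over internal document ids (the valueFilters map is a hypothetical input, not a Solr or Lucene API). The count for a value is the size of its filter intersected with the current hits.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Facet counting by BitSet intersection. valueFilters maps each facet value
// (e.g. "monitor") to the documents carrying it across the whole index.
class BitSetFacets {
    // Returns value -> number of current hits that also carry that value.
    static Map<String, Integer> facetCounts(BitSet hits, Map<String, BitSet> valueFilters) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by value for stable output
        for (Map.Entry<String, BitSet> e : valueFilters.entrySet()) {
            BitSet overlap = (BitSet) e.getValue().clone(); // keep the cached filter intact
            overlap.and(hits);                              // intersect with the current hits
            int n = overlap.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);                  // omit values with no matching hits
            }
        }
        return counts;
    }
}
```

With hits {0, 2, 3} and filters monitor={0, 1, 2}, laptop={3}, tablet={1}, this returns {laptop=1, monitor=2}.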
Thanks,
Erik
Re: faceted browsing
Posted by Yonik Seeley <ys...@gmail.com>.
On 3/30/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> Now I need to investigate the flexibility of the solrconfig.xml - can
> custom parameters be set there, such that a custom SolrRequestHandler
> could read them? For example, I'd want to list the field names that
> are the "facets", such that counts for each of those are returned
> with each query.
Yes, here is a fragment from the example solrconfig.xml:
<!-- example of a request handler with custom parameters passed to its init()
<requestHandler name="example" class="myorg.mypkg.MyRequestHandler" >
<int name="myparam">1000</int>
<float name="ratio">1.4142135</float>
<arr name="myarr"><int>1</int><int>2</int></arr>
<str>foo</str>
</requestHandler>
-->
The XML format is the same as the one used for general data in responses.
In addition to the data types above, there is also <lst>, which is the
same as <arr> except that its elements are named.
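To illustrate how these types map onto Java values, here is a small sketch using a hypothetical NamedArgs class (it is not Solr's own NamedList, which is what a handler's init() actually receives): <arr> becomes a List, <lst> becomes a nested named list, and an entry such as <str>foo</str> simply has no name. The facetFields parameter name is also an assumption, chosen to match the use case of listing facet fields in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ordered name/value list a handler's init()
// receives. Names may repeat or be null (as for <str>foo</str>).
class NamedArgs {
    private final List<String> names = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    NamedArgs add(String name, Object value) {
        names.add(name);
        values.add(value);
        return this; // allow chaining
    }

    // First value registered under name, or null if there is none.
    Object get(String name) {
        for (int i = 0; i < names.size(); i++) {
            if (name.equals(names.get(i))) {
                return values.get(i);
            }
        }
        return null;
    }
}

class FacetHandlerConfig {
    // Reads <arr name="facetFields"><str>category</str>...</arr>;
    // "facetFields" is an illustrative parameter name, not a Solr convention.
    static List<String> facetFields(NamedArgs initArgs) {
        List<String> fields = new ArrayList<>();
        Object raw = initArgs.get("facetFields");
        if (raw instanceof List<?>) {
            for (Object o : (List<?>) raw) {
                fields.add(String.valueOf(o));
            }
        }
        return fields;
    }
}
```

In this model, <int name="myparam">1000</int> corresponds to add("myparam", 1000), and a <lst> corresponds to adding a nested NamedArgs.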
-Yonik
Re: faceted browsing
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Yonik,
Thanks for the recommendations. It's reassuring to know I was on the
right track in realizing that a custom SolrRequestHandler was needed
to accomplish this.
Now I need to investigate the flexibility of the solrconfig.xml - can
custom parameters be set there, such that a custom SolrRequestHandler
could read them? For example, I'd want to list the field names that
are the "facets", such that counts for each of those are returned
with each query.
Thanks,
Erik
On Mar 29, 2006, at 1:46 PM, Yonik Seeley wrote:
> Solr has a lot of support to do faceted browsing, but one must
> currently write a custom query handler to implement the faceting
> logic.
>
> The support includes:
> - custom query handlers:
> - the ability to return more data than just a list of documents
> - a filter cache with autowarming, for fast access to the filter for
> each facet
> - more memory efficient and faster intersecting filter
> representations
>
> The part I want in the future is simple faceted browsing without
> having to write any plugins or Java code, so we need to come up with
> a syntax to represent the desired faceting operations, and then
> implement that syntax in the standard request handler.
>
> To implement a custom query handler, you need to implement
> SolrRequestHandler
> http://incubator.apache.org/solr/docs/api/org/apache/solr/request/SolrRequestHandler.html
> and register it in solrconfig.xml
>
> -Yonik
Re: faceted browsing
Posted by Chris Hostetter <ho...@fucit.org>.
: What I'd really like to see is an XML query language so I can toss all
: the hackish URL query arguments and really move much of the query plugin
: logic out into the query itself instead of in the Java code.
: I do intend to revamp our faceting engine in our next major release to
: customers. We'll introduce dynamic attribute bucketing. Rather than
: produce a list of counts of all values for an attribute and have "at
: least" or "at most" options, users will be given ranged lists based on
: the actual distribution of the facets. I haven't really worked out
: the details since I haven't actually begun the design but I'm probably
: going to see if I can't just look at it like it's on a bell curve and
: start picking evenly sized buckets. Monitors <= 15" (10), 15 -> 17
: (10), 17 -> 21 (10), 21-> 25 (10), > 25 (10). Now obviously I can't
: force it into a nice distribution like that but I'll figure out
: something. In any case, the bucket ranges will need to be based on the
: actual distribution (easy to maintain, hard to implement) in the current
: result set and not some pre-manufactured bucket categories (easy to
: implement, hard to maintain) as those get obsoleted fairly quickly.
Those are really the $64,000 questions ... dynamic bucketing works great
in some cases -- but not all. People like to see price ranges like $0-10,
$10-20, $20-30, $30-infinity ... if you try to make buckets based on
statistical distribution you get things like $0-11.75, $11.75-25.03,
$25.03-70.29, $70.29-infinity.
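One possible compromise, sketched in Java under the assumption that raw boundaries have already been computed from the data: snap each boundary to one significant digit, so $11.75, $25.03 and $70.29 become $10, $30 and $70. The class and method names are hypothetical. The cost is that documents shift between buckets, so the counts are no longer evenly distributed.

```java
import java.util.ArrayList;
import java.util.List;

// Snaps data-driven bucket boundaries to "round" numbers (one significant digit).
class NiceBoundaries {
    static double roundToOneSignificantDigit(double x) {
        if (x <= 0) {
            return 0; // treat the open lower end as $0
        }
        double scale = Math.pow(10, Math.floor(Math.log10(x))); // e.g. 10 for 11.75
        return Math.round(x / scale) * scale;
    }

    // rawBoundaries must be ascending; boundaries that snap to the same
    // round number are merged so no zero-width bucket is produced.
    static List<Double> snap(List<Double> rawBoundaries) {
        List<Double> out = new ArrayList<>();
        for (double b : rawBoundaries) {
            double nice = roundToOneSignificantDigit(b);
            if (out.isEmpty() || nice > out.get(out.size() - 1)) {
                out.add(nice);
            }
        }
        return out;
    }
}
```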
As for where the logic should live -- having a really robust way to
specify the rules you want to be used for determining which fields to
facet on, which type of faceting to do, what buckets to use, etc.
as query-time params to the plugin works great when you've got one client
app that wants to drive the bus -- but when you've got lots of apps hitting
your Solr index, you want that data on the server -- either in "metadata
docs" that the plugin knows how to parse, or in the solrconfig.xml.
solrconfig.xml is easier to maintain, but harder to change on the fly --
and metadata docs have the advantage that there can be an arbitrary number
of them, each with a different identifier, so that the client can say "use
rule set 'desktops'" and you search for the metadata doc with id
"desktops", which can tell you everything you want to know -- but for a
different query the client can say "use rule set 'cameras'" and get facets
that make more sense for those types of products.
I think the ideal robust solution is to come up with a good object
representation for facet rules that can work in all of these cases and
can be expressed in "solr xml format", and then write a plugin that can
read that info from its init params, or as a query param, or get a query
param that tells it how to search for a metadata doc which it can expect
to find in that format ... then apply those rules (with good caching, of
course).
It's a lot harder to implement, but it serves every use case I can think of.
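A rough Java sketch of that idea, under stated assumptions: FacetRuleSet is a hypothetical object representation of a rule set such as 'desktops' or 'cameras', the defaults stand in for rules read from the plugin's init params, and the lookup function stands in for the search for a metadata doc by id. Successful lookups are cached, and unknown ids fall back to the defaults.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical object form of one facet rule set (e.g. "desktops").
class FacetRuleSet {
    final String id;
    final List<String> fieldFacets; // fields whose values should be counted
    final List<String> queryFacets; // explicit facet queries, e.g. price ranges

    FacetRuleSet(String id, List<String> fieldFacets, List<String> queryFacets) {
        this.id = id;
        this.fieldFacets = List.copyOf(fieldFacets);
        this.queryFacets = List.copyOf(queryFacets);
    }
}

class FacetRuleResolver {
    private final FacetRuleSet defaults;                 // e.g. built from init params
    private final Function<String, FacetRuleSet> lookup; // e.g. metadata doc search; null if absent
    private final Map<String, FacetRuleSet> cache = new HashMap<>();

    FacetRuleResolver(FacetRuleSet defaults, Function<String, FacetRuleSet> lookup) {
        this.defaults = defaults;
        this.lookup = lookup;
    }

    // requestedId comes from a query param such as ruleset=cameras (hypothetical name).
    FacetRuleSet resolve(String requestedId) {
        if (requestedId == null) {
            return defaults;
        }
        // computeIfAbsent caches found rule sets; a null result is not cached
        FacetRuleSet found = cache.computeIfAbsent(requestedId, lookup);
        return found != null ? found : defaults;
    }
}
```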
-Hoss
Re: faceted browsing
Posted by Trey Hyde <rh...@hydenetworks.com>.
Chris Hostetter wrote:
>
>: My (our) query plugin uses specialized SolrCache's in lieu of the meta
>: data records. For each new searcher installed each fields possible
>: values will be determined and stored in a cache (off the top of my head,
>
>Are you determining the field values based on all indexed values for those
>fields, or do you have application specific logic in the plugin that knows
>certain fields (like "price") should be ranges, while other fields should
>be discrete?
>
>
Yes. We filter on facets in 2 major ways.
For unranged attributes we simply specify the normalized value that
should appear in the field (duh).
For ranged attributes we specify the field, an operator and the
normalized value we are comparing against. Here is an example of a
passed parameter:
&atr_A00053=K02147U00054||>4194304
That tells the system to give me only computers with more than 4MB of
RAM (wasn't that obvious?). In this case the K...U... number (which
translates to "4MB") isn't actually used; only the field (A00053) and
the normalized field value are (4194304 ... that nonsense value that
currently means 4MB, 4096KB, etc.).
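A Java sketch of parsing a parameter in this format. The layout (label code, "||", operator, normalized number) is inferred from the single example above; the operators other than ">" and the class name are assumptions, and the parameter is assumed to be URL-decoded already. Note that 4194304 is 4 × 1024 × 1024, i.e. 4MB expressed in bytes.

```java
// Parses parameters like atr_A00053=K02147U00054||>4194304 (already URL-decoded).
class RangedAttributeFilter {
    final String field;     // e.g. "A00053"
    final String labelCode; // e.g. "K02147U00054"; display only, not used for matching
    final String op;        // ">", ">=", "<" or "<=" (only ">" is confirmed by the example)
    final long threshold;   // normalized value, e.g. 4194304

    private RangedAttributeFilter(String field, String labelCode, String op, long threshold) {
        this.field = field;
        this.labelCode = labelCode;
        this.op = op;
        this.threshold = threshold;
    }

    static RangedAttributeFilter parse(String name, String value) {
        if (!name.startsWith("atr_")) {
            throw new IllegalArgumentException("not an attribute parameter: " + name);
        }
        int sep = value.indexOf("||");
        if (sep < 0) {
            throw new IllegalArgumentException("expected label||<op><value>: " + value);
        }
        String condition = value.substring(sep + 2);
        // try two-character operators first so ">=" is not read as ">"
        for (String op : new String[] {">=", "<=", ">", "<"}) {
            if (condition.startsWith(op)) {
                long n = Long.parseLong(condition.substring(op.length()).trim());
                return new RangedAttributeFilter(name.substring(4), value.substring(0, sep), op, n);
            }
        }
        throw new IllegalArgumentException("unknown operator in: " + condition);
    }

    boolean matches(long docValue) {
        switch (op) {
            case ">":  return docValue > threshold;
            case ">=": return docValue >= threshold;
            case "<":  return docValue < threshold;
            default:   return docValue <= threshold; // "<="
        }
    }
}
```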
This system only exists to maintain compatibility with systems
previously used to manage our AltaVista based search engine. It's not
pretty but it works well given our current functionality requirements.
It also doesn't do bounded searches like 4MB to 8MB.
>That's the reason why I used special metadata docs -- actually that's only
>part of the reason: I needed the facets to be data driven to allow our
>site staff to manage them, and I needed to support vastly different facets
>based on category (hence: one metadata doc per category).
>
>
>
Right, it's all about customer requirements. As above, the data gets
pulled from a live DB by the web front end to produce the query strings
offered as options to the user, and the logic is embedded in the query string.
What I'd really like to see is an XML query language so I can toss all
the hackish URL query arguments and really move much of the query plugin
logic out into the query itself instead of in the Java code.
I do intend to revamp our faceting engine in our next major release to
customers. We'll introduce dynamic attribute bucketing. Rather than
produce a list of counts of all values for an attribute and have "at
least" or "at most" options, users will be given ranged lists based on
the actual distribution of the facets. I haven't really worked out
the details since I haven't actually begun the design but I'm probably
going to see if I can't just look at it like it's on a bell curve and
start picking evenly sized buckets. Monitors <= 15" (10), 15 -> 17
(10), 17 -> 21 (10), 21-> 25 (10), > 25 (10). Now obviously I can't
force it into a nice distribution like that but I'll figure out
something. In any case, the bucket ranges will need to be based on the
actual distribution (easy to maintain, hard to implement) in the current
result set and not some pre-manufactured bucket categories (easy to
implement, hard to maintain) as those get obsoleted fairly quickly.
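Picking "evenly sized buckets" from the current result set amounts to equal-frequency (quantile) bucketing. A minimal Java sketch, assuming the facet values of the current hits are available as a double[]: sort them, take every n/k-th value as a cut point, then count per bucket using the same "<= 15, 15 -> 17, ..." convention (upper bound inclusive). Repeated values can make the buckets uneven, as noted above.

```java
import java.util.Arrays;

// Equal-frequency bucketing over the facet values of the current hits.
class QuantileBuckets {
    // Returns k-1 cut points; bucket 0 is <= cuts[0], bucket i is
    // (cuts[i-1], cuts[i]], and the last bucket is > cuts[k-2].
    static double[] cutPoints(double[] values, int k) {
        if (k < 1 || values.length < k) {
            throw new IllegalArgumentException("need k >= 1 and at least k values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] cuts = new double[k - 1];
        for (int i = 1; i < k; i++) {
            // last value that falls in the i-th group of about n/k values
            cuts[i - 1] = sorted[(int) ((long) i * sorted.length / k) - 1];
        }
        return cuts;
    }

    static int[] counts(double[] values, double[] cuts) {
        int[] counts = new int[cuts.length + 1];
        for (double v : values) {
            int b = 0;
            while (b < cuts.length && v > cuts[b]) {
                b++; // move right until v is within this bucket's upper bound
            }
            counts[b]++;
        }
        return counts;
    }
}
```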
Re: faceted browsing
Posted by Chris Hostetter <ho...@fucit.org>.
Hey everybody ... sorry I'm getting into the discussion so late ... I was
in a cloud forest in Costa Rica for 8 days -- plenty of Internet cafes,
but I avoided all of them.
First off, I wanted to point out that this discussion came up about a
month ago, and a lot of ideas about how to define complex filters were
discussed...
http://www.nabble.com/metadata-about-result-sets--t1243321.html
http://wiki.apache.org/solr/ComplexFacetingBrainstorming
Second...
: From: Richard "Trey" Hyde <rh...@hydenetworks.com>
:
: My (our) query plugin uses specialized SolrCache's in lieu of the meta
: data records. For each new searcher installed each fields possible
How cool is that!
I was worried that lots of people were just "playing" with Solr but not
really "using" it. But now I find out someone whose name I hadn't heard
of until today isn't just Using Solr, he's writing his own plugins.
Kick Ass!
: My (our) query plugin uses specialized SolrCache's in lieu of the meta
: data records. For each new searcher installed each fields possible
: values will be determined and stored in a cache (off the top of my head,
Are you determining the field values based on all indexed values for those
fields, or do you have application specific logic in the plugin that knows
certain fields (like "price") should be ranges, while other fields should
be discrete?
That's the reason why I used special metadata docs -- actually that's only
part of the reason: I needed the facets to be data driven to allow our
site staff to manage them, and I needed to support vastly different facets
based on category (hence: one metadata doc per category).
-Hoss
Re: faceted browsing
Posted by Richard "Trey" Hyde <rh...@hydenetworks.com>.
My (our) query plugin uses specialized SolrCaches in lieu of the metadata
records. For each new searcher installed, each field's possible values
are determined and stored in a cache (off the top of my head, some
fields have a cardinality of well over 500k). Each time a query that
requests facets is run, the plugin goes to the cache to get the DocSets
representing all the possible facet values for the current search. If
none are found, it goes to the previously mentioned field value cache,
iterating through those values and getting the documents (and so the
document counts) for each possible value (for the entire index, not just
the current search). These DocSets are then put into a cache as well.
Those DocSets are then intersected with the current result set, and then
you have all your facet counts. With a bit of autowarming it's all
quite performant. All told we have 6 to 8 specialized caches; I believe
most of them are dedicated to faceting.
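A Java sketch of that lazy, two-level caching scheme, with HashMaps standing in for SolrCaches and java.util.BitSet standing in for DocSet. The two loader functions are hypothetical hooks into the index, not Solr APIs; creating a new CachedFacetCounter per searcher plays the role of rebuilding (or autowarming) the caches when a new searcher is installed.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

class CachedFacetCounter {
    private final Function<String, List<String>> loadFieldValues;  // field -> its indexed values
    private final BiFunction<String, String, BitSet> loadValueDocs; // (field, value) -> docs in the whole index
    private final Map<String, List<String>> fieldValueCache = new HashMap<>();
    private final Map<String, Map<String, BitSet>> valueDocsCache = new HashMap<>();

    CachedFacetCounter(Function<String, List<String>> loadFieldValues,
                       BiFunction<String, String, BitSet> loadValueDocs) {
        this.loadFieldValues = loadFieldValues;
        this.loadValueDocs = loadValueDocs;
    }

    Map<String, Integer> counts(String field, BitSet hits) {
        // Per-value DocSets are built once per field, from the field value cache.
        Map<String, BitSet> perValue = valueDocsCache.computeIfAbsent(field, f -> {
            Map<String, BitSet> m = new LinkedHashMap<>();
            for (String v : fieldValueCache.computeIfAbsent(f, loadFieldValues)) {
                m.put(v, loadValueDocs.apply(f, v));
            }
            return m;
        });
        // Intersect each cached DocSet with the current result set.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : perValue.entrySet()) {
            BitSet overlap = (BitSet) e.getValue().clone();
            overlap.and(hits);
            counts.put(e.getKey(), overlap.cardinality());
        }
        return counts;
    }
}
```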
Re: faceted browsing
Posted by Yonik Seeley <ys...@gmail.com>.
On 3/29/06, Clay Webster <we...@gmail.com> wrote:
> How could faceted browsing be accomplished without [Chris's] metadata
> documents?
The most basic form:
Consider a field called "category" that exists on each document.
You could then ask for the counts of the top 10 values in the category
field for all of the documents matching a query.
Possible syntax: my user query; groupByField(category,10)
Another form would require the user to enumerate the facets... this
would work well for things like price ranges:
Possible syntax: my user query; groupByQueries(price:[0 TO 10},
price:[10 TO 100}, price:[100 TO 1000})
And of course, one would want to be able to specify them all in a single query:
my user query; groupByField(category,10), groupByField(author,20),
groupByQueries(price:[0 TO 10}, price:[10 TO 100}, price:[100 TO
1000})
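A Java sketch of how the proposed syntax could be parsed. This syntax is only a proposal in this thread, not an existing Solr feature, and the class and record names are hypothetical. Ranges such as [0 TO 10} appear to pair an inclusive lower bound with an exclusive upper bound; the parser only tracks bracket nesting, so that commas inside a function's argument list are not treated as facet separators.

```java
import java.util.ArrayList;
import java.util.List;

class FacetSyntax {
    record Facet(String function, List<String> args) {}
    record Parsed(String userQuery, List<Facet> facets) {}

    static Parsed parse(String input) {
        int semi = input.indexOf(';'); // assumes the user query itself has no ';'
        String userQuery = (semi < 0 ? input : input.substring(0, semi)).trim();
        List<Facet> facets = new ArrayList<>();
        if (semi >= 0) {
            for (String call : splitTopLevel(input.substring(semi + 1))) {
                int open = call.indexOf('(');
                int close = call.lastIndexOf(')');
                if (open < 0 || close < open) {
                    throw new IllegalArgumentException("malformed facet: " + call);
                }
                String function = call.substring(0, open).trim();
                List<String> args = splitTopLevel(call.substring(open + 1, close));
                if (function.equals("groupByField")) {
                    if (args.size() != 2) {
                        throw new IllegalArgumentException("groupByField(field,limit): " + call);
                    }
                    Integer.parseInt(args.get(1)); // the limit must be a number
                } else if (!function.equals("groupByQueries")) {
                    throw new IllegalArgumentException("unknown facet function: " + function);
                }
                facets.add(new Facet(function, args));
            }
        }
        return new Parsed(userQuery, facets);
    }

    // Splits on commas not nested inside (), [] or {}. Any opening bracket
    // raises the depth and any closing one lowers it, so [0 TO 10} balances.
    static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(' || c == '[' || c == '{') {
                depth++;
            } else if (c == ')' || c == ']' || c == '}') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(s.substring(start, i).trim());
                start = i + 1;
            }
        }
        String last = s.substring(start).trim();
        if (!last.isEmpty()) {
            parts.add(last);
        }
        return parts;
    }
}
```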
The thing that Chris' metadata documents also did was tell you *what*
facets to do, but that logic could also be kept in the client.
Standardizing that is probably currently beyond the scope of what we
could put in the standard request handler.
-Yonik
Re: faceted browsing
Posted by Clay Webster <we...@gmail.com>.
How could faceted browsing be accomplished without [Chris's] metadata
documents?
--cw
On 3/29/06, Yonik Seeley <ys...@gmail.com> wrote:
>
> Solr has a lot of support to do faceted browsing, but one must
> currently write a custom query handler to implement the faceting
> logic.
>
> The support includes:
> - custom query handlers:
> - the ability to return more data than just a list of documents
> - a filter cache with autowarming, for fast access to the filter for
> each facet
> - more memory efficient and faster intersecting filter representations
>
> The part I want in the future is simple faceted browsing without
> having to write any plugins or Java code, so we need to come up with
> a syntax to represent the desired faceting operations, and then
> implement that syntax in the standard request handler.
>
> To implement a custom query handler, you need to implement
> SolrRequestHandler
>
> http://incubator.apache.org/solr/docs/api/org/apache/solr/request/SolrRequestHandler.html
> and register it in solrconfig.xml
>
> -Yonik
Re: faceted browsing
Posted by Yonik Seeley <ys...@gmail.com>.
Solr has a lot of support to do faceted browsing, but one must
currently write a custom query handler to implement the faceting
logic.
The support includes:
- custom query handlers:
- the ability to return more data than just a list of documents
- a filter cache with autowarming, for fast access to the filter for
each facet
- more memory efficient and faster intersecting filter representations
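To illustrate why the filter representation matters, here is a Java sketch (these are not Solr's DocSet classes): a sparse filter stored as a sorted int[] of document ids costs 32 bits per match and can be intersected by probing only its own ids, while a dense filter is cheaper as a BitSet at one bit per document in the index. The preferSortedArray threshold is a hypothetical rule of thumb derived from those two sizes.

```java
import java.util.BitSet;

class DocSetIntersections {
    // Sorted-array filter vs. BitSet result set: O(size of the small set).
    static int intersectionSize(int[] sortedIds, BitSet other) {
        int n = 0;
        for (int id : sortedIds) {
            if (other.get(id)) {
                n++;
            }
        }
        return n;
    }

    // Two sorted arrays: linear merge, O(a.length + b.length).
    static int intersectionSize(int[] a, int[] b) {
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                n++;
                i++;
                j++;
            }
        }
        return n;
    }

    // Two BitSets: word-at-a-time AND over the whole index.
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet tmp = (BitSet) a.clone();
        tmp.and(b);
        return tmp.cardinality();
    }

    // An int[] is smaller than a BitSet when matches * 32 bits < maxDoc bits.
    static boolean preferSortedArray(int matches, int maxDoc) {
        return (long) matches * 32 < maxDoc;
    }
}
```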
The part I want in the future is simple faceted browsing without
having to write any plugins or Java code, so we need to come up with
a syntax to represent the desired faceting operations, and then
implement that syntax in the standard request handler.
To implement a custom query handler, you need to implement SolrRequestHandler
http://incubator.apache.org/solr/docs/api/org/apache/solr/request/SolrRequestHandler.html
and register it in solrconfig.xml
-Yonik