Posted to solr-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2006/03/29 20:28:42 UTC

faceted browsing

I saw Yonik mentioned faceted browsing as something coming in the
future of Solr, but I had thought it was one of the initial features
from seeing this announcement ages ago:

	<http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html#a748420>

If facets are part of the current Solr codebase, how are they
configured and returned in the response?

If they aren't currently possible with Solr, what would it take to
implement it?

I'm still, obviously, just scratching the surface of Solr as I
evaluate it for replacing my custom XML-RPC based search server which
does rudimentary facets using Filters and BitSet operations.

By faceted browsing I mean that a Query is used to search and Hits are
returned, but in addition, for a chosen subset of the fields (indexed,
untokenized fields), the number of matching documents for each value of
those "facet" fields is returned as well, so counts can be shown by facet.
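
To make that definition concrete, here is a minimal sketch in Python rather than Solr or Lucene code. The in-memory DOCS table, the toy search() function and the field names are all made up for illustration; only the idea of returning per-value counts alongside the hits comes from the description above.

```python
from collections import Counter

# Hypothetical in-memory "index": doc id -> field values (illustration only).
DOCS = {
    1: {"category": "monitor", "brand": "acme", "title": "15 inch lcd"},
    2: {"category": "monitor", "brand": "zenith", "title": "17 inch lcd"},
    3: {"category": "camera", "brand": "acme", "title": "digital camera"},
    4: {"category": "monitor", "brand": "acme", "title": "21 inch crt"},
}


def search(term):
    """Toy query: ids of docs whose title contains the term as a whole word."""
    return [doc_id for doc_id, doc in sorted(DOCS.items())
            if term in doc["title"].split()]


def facet_counts(hits, facet_fields):
    """For each facet field, count how many of the hits carry each value."""
    counts = {field: Counter() for field in facet_fields}
    for doc_id in hits:
        for field in facet_fields:
            counts[field][DOCS[doc_id][field]] += 1
    return counts


hits = search("lcd")
facets = facet_counts(hits, ["category", "brand"])
```

The response to the client would then carry both `hits` and `facets`, which is the "more data than just a list of documents" part.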

Thanks,
	Erik

Re: faceted browsing

Posted by Yonik Seeley <ys...@gmail.com>.
On 3/30/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> Now I need to investigate the flexibility of the solrconfig.xml - can
> custom parameters be set there, such that a custom SolrRequestHandler
> could read them?   For example, I'd want to list the field names that
> are the "facets", such that counts for each of those are returned
> with each query.

Yes, here is a fragment from the example solrconfig.xml

  <!-- example of a request handler with custom parameters passed to its init()
  <requestHandler name="example" class="myorg.mypkg.MyRequestHandler" >
    <int name="myparam">1000</int>
    <float name="ratio">1.4142135</float>
    <arr name="myarr"><int>1</int><int>2</int></arr>
    <str>foo</str>
  </requestHandler>
  -->

The XML format is the same as what is used in the response for general data.
In addition to the data types above, there is also <lst>, which is the
same as <arr> except that the elements are named.
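
As a rough illustration of that format, the sketch below parses a fragment like the one above into named values with Python's standard xml.etree.ElementTree. It is not Solr's own parser. The <lst name="facets"> element and its contents are hypothetical additions showing how a list of facet fields might be passed, and read_init_params is an invented helper name.

```python
import xml.etree.ElementTree as ET

# The first three params and the bare <str> come from the example fragment;
# the <lst name="facets"> entry is a made-up addition for illustration.
FRAGMENT = """
<requestHandler name="example" class="myorg.mypkg.MyRequestHandler">
  <int name="myparam">1000</int>
  <float name="ratio">1.4142135</float>
  <arr name="myarr"><int>1</int><int>2</int></arr>
  <lst name="facets"><str name="field">category</str><int name="limit">10</int></lst>
  <str>foo</str>
</requestHandler>
"""


def to_value(elem):
    """Convert one typed element into a plain Python value."""
    if elem.tag == "int":
        return int(elem.text)
    if elem.tag == "float":
        return float(elem.text)
    if elem.tag == "str":
        return elem.text or ""
    if elem.tag == "arr":   # unnamed elements -> list of values
        return [to_value(child) for child in elem]
    if elem.tag == "lst":   # named elements -> ordered (name, value) pairs
        return [(child.get("name"), to_value(child)) for child in elem]
    raise ValueError("unsupported element: " + elem.tag)


def read_init_params(xml_text):
    """Return the handler's parameters as ordered (name, value) pairs."""
    root = ET.fromstring(xml_text.strip())
    return [(child.get("name"), to_value(child)) for child in root]


params = read_init_params(FRAGMENT)
```

Pairs are kept in document order rather than put in a dict because names may repeat or be missing, as with the bare <str>foo</str> element.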

-Yonik

Re: faceted browsing

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Yonik,

Thanks for the recommendations.  It's reassuring to know I was on the  
right track in realizing that a custom SolrRequestHandler was needed  
to accomplish this.

Now I need to investigate the flexibility of the solrconfig.xml - can  
custom parameters be set there, such that a custom SolrRequestHandler  
could read them?   For example, I'd want to list the field names that  
are the "facets", such that counts for each of those are returned  
with each query.

Thanks,
	Erik


On Mar 29, 2006, at 1:46 PM, Yonik Seeley wrote:

> Solr has a lot of support to do faceted browsing, but one must
> currently write a custom query handler to implement the faceting
> logic.
>
> The support includes:
>   - custom query handlers:
>   - the ability to return more data than just a list of documents
>   - a filter cache with autowarming, for fast access to the filter for
> each facet
>   - more memory efficient and faster intersecting filter  
> representations
>
> The part I want in the future is simple faceted browsing without
> having to write any plugins or Java code, so we need to come up with
> a syntax to represent the desired faceting operations, and then
> implement that syntax in the standard request handler.
>
> To implement a custom query handler, you need to implement SolrRequestHandler
> http://incubator.apache.org/solr/docs/api/org/apache/solr/request/SolrRequestHandler.html
> and register it in solrconfig.xml
>
> -Yonik


Re: faceted browsing

Posted by Chris Hostetter <ho...@fucit.org>.
: What I'd really like to see is an XML query language so I can toss all
: the hackish URL query arguments and really move much of the query plugin
: logic out into the query itself instead of in the Java code.

: customers.   We'll introduce dynamic attribute bucketing.  Rather than
: produce a list of counts of all values for an attribute and have "at
: least" or "at most" options, users will be given ranged lists based on
: the actual distribution of the facets.    I haven't really worked out
: the details since I haven't actually begun the design but I'm probably
: going to see if I can't just look at it like it's on a bell curve and
: start picking evenly sized buckets.   Monitors <= 15" (10),  15 -> 17
: (10), 17 -> 21 (10), 21-> 25 (10), > 25 (10).   Now obviously I can't
: force it into a nice distribution like that but I'll figure out
: something.   In any case, the bucket ranges will need to be based on the
: actual distribution (easy to maintain, hard to implement) in the current
: result set and not some pre-manufactured bucket categories (easy to
: implement, hard to maintain) as those get obsoleted fairly quickly.

Those are really the $64,000 questions ... dynamic bucketing works great
in some cases -- but not all.  people like to see price ranges like $0-10,
$10-20, $20-30, $30-infinity ... if you try to make buckets based on
statistical distribution you get things like $0-11.75, $11.75-25.03,
$25.03-70.29, $70.29-infinity.
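
The trade-off can be sketched in a few lines of Python. The price list below is invented sample data, chosen so that the quantile edges land on the awkward values mentioned above, and quantile_edges and count_buckets are hypothetical helper names. This is one simple way to pick distribution-based edges, not a description of any existing plugin.

```python
def quantile_edges(values, buckets):
    """Edges that put roughly the same number of values in each bucket."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // buckets] for i in range(1, buckets)]


def count_buckets(values, edges):
    """Count values per half-open bucket [lo, hi); the last bucket is open-ended."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # Index of the bucket = number of edges the value has reached.
        counts[sum(1 for e in edges if v >= e)] += 1
    return counts


# Made-up prices for illustration only.
prices = [4.99, 7.50, 9.25, 11.75, 14.00, 19.99,
          25.03, 32.10, 48.00, 70.29, 120.00, 349.99]

dynamic_edges = quantile_edges(prices, 4)   # follows the data
fixed_edges = [10, 20, 30]                  # what people like to read

dynamic_counts = count_buckets(prices, dynamic_edges)
fixed_counts = count_buckets(prices, fixed_edges)
```

With this sample the dynamic edges give evenly filled buckets with ugly boundaries, while the fixed edges read nicely but leave the buckets lopsided.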

As for where the logic should live -- having a really robust way to
specify the rules you want to be used for determining which fields to
facet on, and which type of faceting to do and what buckets to use, etc...
as query time params to the plugin works great when you've got one client
app that wants to drive the bus -- but when you've got lots of apps hitting
your Solr index, you want that data on the server -- either in "metadata
docs" that the plugin knows how to parse, or in the solrconfig.xml.
solrconfig.xml is easier to maintain, but harder to change on the fly --
and metadata docs have the advantage that there can be an arbitrary number
of them, each with a different identifier so that the client can say "use
rule set 'desktops'" and you search for the metadata doc with id
"desktops"  which can tell you everything you want to know -- but for a
different query the client can say "use rule set 'cameras'" and get facets
that make more sense for those types of products.


i think the ideal robust solution is to come up with a good object
representation for facet rules that can work in all of these cases, and
can be expressed in "solr xml format" and then write a plugin that can
read that info from its init params, or as a query param, or get a query
param that tells it how to search for a metadata doc which it can expect
to find in that format ... then apply those rules (with good caching of
course)
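
A minimal sketch of that lookup order, in Python rather than as a Solr plugin. Everything named here is hypothetical: the parameter names facet_rules and rule_set, the shape of a rule set, and the order of precedence (inline rules first, then a named metadata doc, then the configured default) are assumptions made for illustration.

```python
# Hypothetical stand-ins: in Solr these would come from the handler's init
# params in solrconfig.xml and from metadata documents stored in the index.
DEFAULT_RULES = {"fields": ["category"]}
METADATA_DOCS = {
    "desktops": {"fields": ["cpu", "ram"]},
    "cameras": {"fields": ["megapixels", "zoom"]},
}

_rule_cache = {}
lookups = []  # records metadata-doc searches so the caching is observable


def find_metadata_doc(rule_set_id):
    """Pretend index search for the metadata doc with the given id."""
    lookups.append(rule_set_id)
    return METADATA_DOCS.get(rule_set_id)


def resolve_rules(params):
    """Pick facet rules: inline param, then named metadata doc, then default."""
    if "facet_rules" in params:
        return params["facet_rules"]
    rule_set_id = params.get("rule_set")
    if rule_set_id is None:
        return DEFAULT_RULES
    if rule_set_id not in _rule_cache:
        # Unknown ids fall back to the configured default (an assumption).
        _rule_cache[rule_set_id] = find_metadata_doc(rule_set_id) or DEFAULT_RULES
    return _rule_cache[rule_set_id]


camera_rules = resolve_rules({"rule_set": "cameras"})
```

Because all three sources yield the same kind of object, the code that applies the rules does not need to know where they came from.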


a lot harder to implement, but it serves every use case i can think of.



-Hoss


Re: faceted browsing

Posted by Trey Hyde <rh...@hydenetworks.com>.
Chris Hostetter wrote:

>
>: My (our) query plugin uses specialized SolrCaches in lieu of the meta
>: data records.   For each new searcher installed each field's possible
>: values will be determined and stored in a cache (off the top of my head,
>
>Are you determining the field values based on all indexed values for those
>fields, or do you have application specific logic in the plugin that knows
>certain fields (like "price") should be ranges, while other fields should
>be discrete?
>  
>
Yes.  We filter on facets in 2 major ways.
For unranged attributes we simply specify the normalized value that
should appear in the field (duh).
For ranged attributes we specify the field, an operator and the
normalized value we are comparing against.  Here is an example of a
passed parameter.

&atr_A00053=K02147U00054||>4194304

That tells the system to give me only computers with more than 4MB of
RAM (wasn't that obvious?).  In this case the K...U... number isn't
actually used (translated, it means "4MB"); only the field (A00053) and
the normalized field value are (4194304 ... the nonsense value that
currently means 4MB, 4096KB, etc).
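
For readers who want to see the shape of such a parameter, here is a hedged Python sketch that splits it apart and applies it to a document. The regular expression is inferred from the single example above. Only the ">" operator appears in this thread, so the other operators, along with the names parse_attribute_filter and matches, are assumptions for illustration.

```python
import operator
import re

# Only ">" is shown in the example; the other operators are assumed.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}

# atr_<field>=<unused K...U... code>||<operator><normalized value>
PARAM_RE = re.compile(
    r"^atr_(?P<field>\w+)=(?P<code>[^|]*)\|\|(?P<op>>=|<=|>|<)(?P<value>\d+)$"
)


def parse_attribute_filter(param):
    """Split one ranged-attribute parameter into (field, code, op, value)."""
    match = PARAM_RE.match(param.lstrip("&"))
    if match is None:
        raise ValueError("not a ranged attribute filter: " + param)
    return match["field"], match["code"], match["op"], int(match["value"])


def matches(doc, parsed):
    """True if the doc's normalized field value satisfies the filter."""
    field, _code, op, value = parsed
    return field in doc and OPERATORS[op](doc[field], value)


parsed = parse_attribute_filter("&atr_A00053=K02147U00054||>4194304")
```

Here `parsed` holds the field A00053, the ignored code, the operator and the normalized value 4194304, so a document with 8388608 in A00053 would pass and one with exactly 4194304 would not.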

This system only exists to maintain compatibility with systems 
previously used to manage our AltaVista based search engine.  It's not 
pretty but it works well given our current functionality requirements.  
It also doesn't do bounded searches like 4MB to 8MB.

>that's the reason why I used special metadata docs -- actually that's only
>part of the reason, i needed the facets to be data driven to allow our
>site staff to manage them, and i needed to support vastly different facets
>based on category (hence: one metadata doc per category).
>
>  
>
Right, it's all about customer requirements.   As above, the data gets
pulled from a live DB by the web front end to produce the query strings
offered as options to the user, and the logic is embedded in the query string.
What I'd really like to see is an XML query language so I can toss all 
the hackish URL query arguments and really move much of the query plugin 
logic out into the query itself instead of in the Java code.

I do intend to revamp our faceting engine in our next major release to 
customers.   We'll introduce dynamic attribute bucketing.  Rather than 
produce a list of counts of all values for an attribute and have "at 
least" or "at most" options, users will be given ranged lists based on 
the actual distribution of the facets.    I haven't really worked out 
the details since I haven't actually begun the design but I'm probably
going to see if I can't just look at it like it's on a bell curve and 
start picking evenly sized buckets.   Monitors <= 15" (10),  15 -> 17 
(10), 17 -> 21 (10), 21-> 25 (10), > 25 (10).   Now obviously I can't 
force it into a nice distribution like that but I'll figure out 
something.   In any case, the bucket ranges will need to be based on the 
actual distribution (easy to maintain, hard to implement) in the current 
result set and not some pre-manufactured bucket categories (easy to 
implement, hard to maintain) as those get obsoleted fairly quickly.


Re: faceted browsing

Posted by Chris Hostetter <ho...@fucit.org>.
Hey everybody ... sorry i'm getting into the discussion so late ... i was
in a Cloud Forest in Costa Rica for 8 days -- plenty of Internet Cafes,
but i avoided all of them.


First off, i wanted to point out that this discussion came up about a
month ago, and a lot of ideas about how to define complex filters were
discussed...
 http://www.nabble.com/metadata-about-result-sets--t1243321.html
 http://wiki.apache.org/solr/ComplexFacetingBrainstorming

Second...

: From: Richard "Trey" Hyde <rh...@hydenetworks.com>
:
: My (our) query plugin uses specialized SolrCaches in lieu of the meta
: data records.   For each new searcher installed each field's possible

How cool is that!

I was worried that lots of people were just "playing" with Solr but not
really "using" it.  But now i find out someone whose name i hadn't heard
of until today isn't just Using Solr, he's writing his own plugins.

Kick Ass!

: My (our) query plugin uses specialized SolrCaches in lieu of the meta
: data records.   For each new searcher installed each field's possible
: values will be determined and stored in a cache (off the top of my head,

Are you determining the field values based on all indexed values for those
fields, or do you have application specific logic in the plugin that knows
certain fields (like "price") should be ranges, while other fields should
be discrete?

that's the reason why I used special metadata docs -- actually that's only
part of the reason, i needed the facets to be data driven to allow our
site staff to manage them, and i needed to support vastly different facets
based on category (hence: one metadata doc per category).



-Hoss


Re: faceted browsing

Posted by "Richard \"Trey\" Hyde" <rh...@hydenetworks.com>.
My (our) query plugin uses specialized SolrCaches in lieu of the meta
data records.   For each new searcher installed, each field's possible
values are determined and stored in a cache (off the top of my head,
some fields have a cardinality of well over 500k). Each time a query is
run that requests facets, the plugin goes to the cache to get the
DocSet that represents all the possible facets for the current search.
If none are found, it goes to the previously mentioned field value
cache, iterating through those values and getting the
document counts for each possible value (for the entire index, not just
the current search).   These values are then again thrown into a cache.
Those DocSets are then intersected with the current result set and then
you have all your facet counts.  With a bit of auto warming it's all
quite performant.  All told we have 6 to 8 specialized caches; I believe
most of them are dedicated to the faceting.
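
The flow described above can be sketched roughly as follows. This is Python with plain sets standing in for Solr's DocSet and dicts standing in for SolrCache, so it shows the shape of the idea and not the actual plugin. FacetCounter, its method names and the tiny index are all invented for illustration.

```python
class FacetCounter:
    """Caches per-value doc sets for the whole index, then intersects them
    with each query's result set to get facet counts."""

    def __init__(self, index):
        self.index = index        # doc id -> {field: value}
        self.value_cache = {}     # field -> sorted distinct values
        self.docset_cache = {}    # (field, value) -> frozenset of doc ids
        self.docset_builds = 0    # how often a doc set had to be computed

    def field_values(self, field):
        """All values of a field across the index (computed once)."""
        if field not in self.value_cache:
            self.value_cache[field] = sorted(
                {doc[field] for doc in self.index.values() if field in doc})
        return self.value_cache[field]

    def docset(self, field, value):
        """Docs in the entire index carrying this value (computed once)."""
        key = (field, value)
        if key not in self.docset_cache:
            self.docset_builds += 1
            self.docset_cache[key] = frozenset(
                d for d, doc in self.index.items() if doc.get(field) == value)
        return self.docset_cache[key]

    def counts(self, field, result_set):
        """Facet counts for one field, restricted to the current results."""
        out = {}
        for value in self.field_values(field):
            n = len(self.docset(field, value) & result_set)
            if n:
                out[value] = n
        return out


# Made-up index for illustration.
INDEX = {
    1: {"brand": "acme"}, 2: {"brand": "zenith"},
    3: {"brand": "acme"}, 4: {"brand": "acme"},
}
counter = FacetCounter(INDEX)
first = counter.counts("brand", {1, 2})
second = counter.counts("brand", {3, 4})
```

The second call reuses the cached doc sets, which is where the speed comes from once the caches are warm.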

Yonik Seeley wrote:
> On 3/29/06, Clay Webster <we...@gmail.com> wrote:
>   
>> How could faceted browsing be accomplished without [Chris's] metadata
>> documents?
>>     
>
> The most basic form:
>
> consider if a field called "category" existed on each document.
> You could then ask for the counts of the top 10 values in category
> field for all of the documents matching a query.
>
> Possible syntax:   my user query; groupByField(category,10)
>
> Another form would require the user to enumerate the facets... this
> would work well for things like price ranges:
>
> Possible syntax:   my user query; groupByQueries(price:[0 TO 10},
> price:[10 TO 100}, price:[100 TO 1000})
>
> And of course, one would want to be able to specify them all in a single query:
>
> my user query; groupByField(category,10), groupByField(author,20),
> groupByQueries(price:[0 TO 10}, price:[10 TO 100}, price:[100 TO
> 1000})
>
>
> The thing that Chris' metadata documents also did was tell you *what*
> facets to do, but that logic could also be kept in the client. 
> Standardizing that is probably currently beyond the scope of what we
> could put in the standard request handler.
>
> -Yonik
>   


Re: faceted browsing

Posted by Yonik Seeley <ys...@gmail.com>.
On 3/29/06, Clay Webster <we...@gmail.com> wrote:
> How could faceted browsing be accomplished without [Chris's] metadata
> documents?

The most basic form:

consider if a field called "category" existed on each document.
You could then ask for the counts of the top 10 values in category
field for all of the documents matching a query.

Possible syntax:   my user query; groupByField(category,10)

Another form would require the user to enumerate the facets... this
would work well for things like price ranges:

Possible syntax:   my user query; groupByQueries(price:[0 TO 10},
price:[10 TO 100}, price:[100 TO 1000})

And of course, one would want to be able to specify them all in a single query:

my user query; groupByField(category,10), groupByField(author,20),
groupByQueries(price:[0 TO 10}, price:[10 TO 100}, price:[100 TO
1000})
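
To show what those two proposed operations would compute, here is a small Python sketch over an in-memory list of matching documents. The syntax above is only a proposal, so this does not parse it. The function names group_by_field and group_by_ranges and the sample documents are invented for illustration. The ranges are treated as half-open, matching the "[lo TO hi}" notation.

```python
from collections import Counter


def group_by_field(docs, field, limit):
    """Top `limit` values of a field among the matching docs, with counts."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(limit)


def group_by_ranges(docs, field, ranges):
    """Count docs per half-open range [lo, hi), labelled like price:[0 TO 10}."""
    return {
        f"{field}:[{lo} TO {hi}}}": sum(
            1 for doc in docs if field in doc and lo <= doc[field] < hi)
        for lo, hi in ranges
    }


# Made-up documents standing in for the results of "my user query".
matching_docs = [
    {"category": "monitor", "price": 5},
    {"category": "monitor", "price": 10},
    {"category": "camera", "price": 99.99},
    {"category": "monitor", "price": 250},
]

top_categories = group_by_field(matching_docs, "category", 10)
price_facets = group_by_ranges(
    matching_docs, "price", [(0, 10), (10, 100), (100, 1000)])
```

Note that the document priced at exactly 10 is counted in the second range only, which is the point of making the upper bound exclusive.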


The thing that Chris' metadata documents also did was tell you *what*
facets to do, but that logic could also be kept in the client. 
Standardizing that is probably currently beyond the scope of what we
could put in the standard request handler.

-Yonik

Re: faceted browsing

Posted by Clay Webster <we...@gmail.com>.
How could faceted browsing be accomplished without [Chris's] metadata
documents?

--cw

On 3/29/06, Yonik Seeley <ys...@gmail.com> wrote:
>
> Solr has a lot of support to do faceted browsing, but one must
> currently write a custom query handler to implement the faceting
> logic.
>
> The support includes:
>   - custom query handlers:
>   - the ability to return more data than just a list of documents
>   - a filter cache with autowarming, for fast access to the filter for
> each facet
>   - more memory efficient and faster intersecting filter representations
>
> The part I want in the future is simple faceted browsing without
> having to write any plugins or Java code, so we need to come up with
> a syntax to represent the desired faceting operations, and then
> implement that syntax in the standard request handler.
>
> To implement a custom query handler, you need to implement
> SolrRequestHandler
>
> http://incubator.apache.org/solr/docs/api/org/apache/solr/request/SolrRequestHandler.html
> and register it in solrconfig.xml
>
> -Yonik

Re: faceted browsing

Posted by Yonik Seeley <ys...@gmail.com>.
Solr has a lot of support to do faceted browsing, but one must
currently write a custom query handler to implement the faceting
logic.

The support includes:
  - custom query handlers:
  - the ability to return more data than just a list of documents
  - a filter cache with autowarming, for fast access to the filter for
each facet
  - more memory efficient and faster intersecting filter representations
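
As a rough picture of the filter cache item, the Python sketch below shows a cache that is warmed from its predecessor when a new view of the index is opened: the keys of the old cache are recomputed against the new index so the first queries do not pay for it. FilterCache, warm_from and make_filter are hypothetical names, and plain sets stand in for Solr's filter representations.

```python
class FilterCache:
    """Maps a filter key to the set of matching doc ids for one index view."""

    def __init__(self, compute):
        self.compute = compute   # key -> doc id set, for this index view
        self.entries = {}
        self.misses = 0

    def get(self, key):
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = self.compute(key)
        return self.entries[key]

    def warm_from(self, old_cache, limit):
        """Recompute up to `limit` of the old cache's newest keys here."""
        keys = list(old_cache.entries)
        for key in keys[max(0, len(keys) - limit):]:
            self.entries[key] = self.compute(key)


def make_filter(index):
    """Build a compute function: (field, value) -> matching doc ids."""
    return lambda key: frozenset(
        d for d, doc in index.items() if doc.get(key[0]) == key[1])


# Made-up index views: the new one has an extra document.
old_index = {1: {"category": "monitor"}, 2: {"category": "camera"}}
new_index = {**old_index, 3: {"category": "monitor"}}

old_cache = FilterCache(make_filter(old_index))
old_result = old_cache.get(("category", "monitor"))

new_cache = FilterCache(make_filter(new_index))
new_cache.warm_from(old_cache, limit=10)
new_result = new_cache.get(("category", "monitor"))
```

After warming, the lookup on the new cache is a hit and already reflects the added document.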

The part I want in the future is simple faceted browsing without
having to write any plugins or Java code, so we need to come up with
a syntax to represent the desired faceting operations, and then
implement that syntax in the standard request handler.

To implement a custom query handler, you need to implement SolrRequestHandler
http://incubator.apache.org/solr/docs/api/org/apache/solr/request/SolrRequestHandler.html
and register it in solrconfig.xml

-Yonik

On 3/29/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> I saw Yonik mentioned faceted browsing as something coming in the
> future of Solr, but I had thought it was one of the initial features
> from seeing this announcement ages ago:
>
>         <http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-
> Product-Category-Listings-t266441.html#a748420>
>
> If facets are part of the current Solr codebase, how are they
> configured and returned in the response?
>
> If they aren't currently possible with Solr, what would it take to
> implement it?
>
> I'm still, obviously, just scratching the surface of Solr as I
> evaluate it for replacing my custom XML-RPC based search server which
> does rudimentary facets using Filters and BitSet operations.
>
> By faceted browsing, a Query is used to search, Hits are returned,
> but also based on a subset of the fields (indexed, untokenized
> fields) the number of documents in each of these "facet" fields is
> returned as well to show counts by each facet.
>
> Thanks,
>         Erik
>