Posted to solr-dev@lucene.apache.org by Doug Daniels <dd...@rooftophq.com> on 2007/11/16 20:43:09 UTC

grouped facets

Hi,

I'm looking to implement a slightly different style of faceting than what
the SimpleFacets provides right now.  Particularly, I'd like to implement
something more like kayak.com or farechase.com, where facets are grouped
together into buckets (e.g. airline, destination airport), but selection of
an item in the group doesn't remove information about the other items in the
group.  For instance, when I select "America West", I should still see the
number of results I would add to my search if I selected "United".  However,
if I select an item from a different group like destination airport, it
should change the counts on the airports group.

Has anyone else built something like this yet?

I've been modeling it in the DisMax syntax, having a user query U and
several facet groups A, B, and C.  Each facet group would have an associated
filter query containing the user's current selection in the group, as well
as facet fields and queries for the group.

To get the facet counts for a group, I believe you'd want a DocSet
intersecting the user's query and the filter queries for all other groups. 
So for instance, to get the facet counts for group A, I'd want a DocSet
intersecting U, B, and C.  I'd then run the facet queries and fields against
that DocSet to produce the counts for group A.
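The per-group intersection above can be sketched with java.util.BitSet standing in for a Solr DocSet (bit i set means document i matches). This is a hypothetical illustration. The class name, the method names and the toy data are invented. In Solr itself the sets would come from the searcher and its filterCache rather than being built by hand.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: BitSets stand in for Solr DocSets (bit i = doc i matches).
class GroupedFacets {

    // Intersect the user query with the filters of every group except `group`,
    // so a group's own selection does not shrink that group's own counts.
    static BitSet docSetForGroup(BitSet userQuery, Map<String, BitSet> groupFilters, String group) {
        BitSet result = (BitSet) userQuery.clone();
        for (Map.Entry<String, BitSet> e : groupFilters.entrySet()) {
            if (!e.getKey().equals(group)) {
                result.and(e.getValue());
            }
        }
        return result;
    }

    // Count how many docs in `docs` match each facet value of the group.
    static Map<String, Integer> facetCounts(BitSet docs, Map<String, BitSet> facetValues) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, BitSet> e : facetValues.entrySet()) {
            BitSet hits = (BitSet) docs.clone();
            hits.and(e.getValue());
            counts.put(e.getKey(), hits.cardinality());
        }
        return counts;
    }

    // Helper to build a doc set from explicit doc ids.
    static BitSet docs(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet u = docs(0, 1, 2, 3, 4, 5);          // user query U
        Map<String, BitSet> filters = new TreeMap<>();
        filters.put("A", docs(0, 1));                // airline selection
        filters.put("B", docs(0, 2, 4));             // destination selection
        filters.put("C", docs(0, 1, 2, 3, 4));       // stops selection

        Map<String, BitSet> airlines = new TreeMap<>();
        airlines.put("America West", docs(0, 1));
        airlines.put("United", docs(2, 3, 4, 5));

        // Counts for group A come from U and B and C, ignoring A's own filter.
        BitSet forA = docSetForGroup(u, filters, "A");
        System.out.println(facetCounts(forA, airlines)); // {America West=1, United=2}
    }
}
```

If group A's own filter were included in the intersection, United would show 0. Leaving it out is what keeps the other airlines' counts visible.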

I'm thinking to do this by using getDocListAndSet to grab the DocSet for
each distinct group, and then running the SimpleFacets code against that
DocSet.  I think if I keep score retrieval and sorting out of the flags, the
cache should kick in for every DocSet after the first one, including
calculation of the main query.

Does this seem like a reasonable approach?  If anyone else is looking for
this functionality, I could also try to package it into something for the
solr repository.  I'm doing it in a custom request handler right now, so it
might be a bit more difficult to specify the relationships between filter
queries and facets in the URL syntax for general use.

Thanks,
-d
-- 
View this message in context: http://www.nabble.com/grouped-facets-tf4823454.html#a13800156
Sent from the Solr - Dev mailing list archive at Nabble.com.


Re: grouped facets

Posted by Doug Daniels <dd...@rooftophq.com>.


hossman wrote:
> 
> : the SimpleFacets provides right now.  Particularly, I'd like to implement
> : something more like kayak.com or farechase.com, where facets are grouped
> : together into buckets (e.g. airline, destination airport), but selection of
> : an item in the group doesn't remove information about the other items in the
> : group.  For instance, when I select "America West", I should still see the
> : number of results I would add to my search if I selected "United".  However,
> : if I select an item from a different group like destination airport, it
> : should change the counts on the airports group.
> 
> this seems like a fairly trivial UI issue that wouldn't require any custom 
> solr code.  you do the first search, and get back all the facet counts to 
> display next to each checkbox.  as boxes are checked, you do subsequent 
> requests to update the main result area of the page, but you don't change 
> the counts next to the checkboxes (unless they are in a different group)
> 
> 

Hmm, I've been using the word "filter", but I think that's actually
misleading.  The behavior that I'm modeling is more additive: you click some
checkboxes and add results into the search.  The count next to each checkbox
tells you the number of results you could add to your search, rather than
the number that you'll filter down to if you hit the checkbox.

I think doing this requires two different data sets.  The main results data
set is generated from the user's search and the checkbox options they've
chosen, as usual.  However, the checkbox data set (options and counts)
removes the filters, showing every potential option with the count of
results it would add if selected.  If we used the main results data set to
produce this, we could still get all the options from facets, but the
unselected options in any group that already has a selection would show 0
counts.

As long as all checkboxes start out checked, the main data set and the
checkbox data set are the same, and I could use the strategy you recommend.
But if some options are deselected from the start, then I'd need separate
data sets for the main query and the checkbox query.

I had previously been considering a more complex scenario than this, where
the counts in each checkbox set depend on the selected options in the other
checkbox sets.  For example, the counts next to each airline option would be
built from a data set including the user's query and the other checkbox
sets: destination airport, number of stops, etc.  However, these would be
additive options, so you'd still see the number of results you could add by
selecting United, even if only America West was selected as an airline.

I think doing that would require a data set per checkbox group (with
different filter queries applied for each) and then one for the main result
query with all filters applied.  However, after playing around with it a bit
more I find the UI confusing, so I'm going to abandon that idea and go with
the simpler version above.

I see two ways that I could do the simpler version:

  1. Make two requests to solr, one for the checkbox data set and one for
the main data set.
  2. Add a way to ignore some/all filter queries when computing facets,
essentially building additive, checkbox-style facets.
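Option 1 could look roughly like the following hypothetical sketch. It builds the two request parameter strings from the same user query. q, fq, rows, facet and facet.field are standard Solr request parameters. The field names ("airline", "dest") are invented, and so is the choice to OR the checked values within a group into a single fq, which is one way to get the additive behaviour.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of option 1: two Solr requests built from one user query.
class TwoRequestSketch {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Checkbox data set: the user query with no fq, faceting on, and rows=0
    // because only the facet counts are needed from this request.
    static String checkboxRequest(String userQuery, List<String> facetFields) {
        StringBuilder sb = new StringBuilder("q=").append(enc(userQuery))
                .append("&rows=0&facet=true");
        for (String field : facetFields) {
            sb.append("&facet.field=").append(enc(field));
        }
        return sb.toString();
    }

    // Main data set: one fq per group, ORing the checked values so that checking
    // more boxes in a group adds results instead of removing them. Groups with
    // nothing checked are left unfiltered (an assumption); values are assumed
    // not to contain double quotes.
    static String mainRequest(String userQuery, Map<String, List<String>> checked) {
        StringBuilder sb = new StringBuilder("q=").append(enc(userQuery));
        for (Map.Entry<String, List<String>> group : checked.entrySet()) {
            if (group.getValue().isEmpty()) {
                continue;
            }
            StringJoiner values = new StringJoiner(" OR ", group.getKey() + ":(", ")");
            for (String v : group.getValue()) {
                values.add("\"" + v + "\"");
            }
            sb.append("&fq=").append(enc(values.toString()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> checked = new LinkedHashMap<>();
        checked.put("airline", List.of("America West", "United"));
        System.out.println(checkboxRequest("boston", List.of("airline", "dest")));
        System.out.println(mainRequest("boston", checked));
    }
}
```

The extra cost mentioned below shows up here as the second round trip. Setting rows=0 on the checkbox request at least avoids fetching documents that would go unused.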

I'm leaning toward the former, as it requires no additional development
(though it does waste a bit of processing time with an extra request and
by creating DocLists that aren't used).

	...


hossman wrote:
> 
> i wasn't really following the first part of your description of what you 
> are doing, but i don't see any reason to use getDocListAndSet if all you 
> need is a DocSet for SimpleFacets ... you can just use getDocSet and the 
> filterCache should take care of everything as well.
> 

I wanted to use getDocSet instead of getDocListAndSet, but there isn't a
public method that takes a List of filter queries.  I also couldn't find a
non-protected way to turn a List of filter queries into a DocSet, which is
what the public getDocSet method takes as its filter.  It seems like it'd be
worthwhile to make getDocSet(List<Query>) public.

Anyhow, if I just make two separate requests I won't need this now.


-- 
View this message in context: http://www.nabble.com/grouped-facets-tf4823454.html#a13954705


Re: grouped facets

Posted by Chris Hostetter <ho...@fucit.org>.
: the SimpleFacets provides right now.  Particularly, I'd like to implement
: something more like kayak.com or farechase.com, where facets are grouped
: together into buckets (e.g. airline, destination airport), but selection of
: an item in the group doesn't remove information about the other items in the
: group.  For instance, when I select "America West", I should still see the
: number of results I would add to my search if I selected "United".  However,
: if I select an item from a different group like destination airport, it
: should change the counts on the airports group.

this seems like a fairly trivial UI issue that wouldn't require any custom 
solr code.  you do the first search, and get back all the facet counts to 
display next to each checkbox.  as boxes are checked, you do subsequent 
requests to update the main result area of the page, but you don't change 
the counts next to the checkboxes (unless they are in a different group)
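The client-side bookkeeping for this suggestion can be sketched as follows. This is a hypothetical illustration, not Solr code. The class and method names are invented, and the counts in main() are made-up numbers. Counts shown next to the checkboxes come from the first response. After a box in one group is toggled, a new response replaces only the counts for the other groups.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical UI-side cache of the counts displayed next to each checkbox.
class CheckboxCountCache {
    // group name -> (facet value -> count shown next to its checkbox)
    private final Map<String, Map<String, Integer>> shown = new HashMap<>();

    // First search: display every group's counts as returned.
    void initialize(Map<String, Map<String, Integer>> firstResponse) {
        shown.clear();
        firstResponse.forEach((group, counts) -> shown.put(group, new HashMap<>(counts)));
    }

    // After a box in toggledGroup changes, keep that group's counts unchanged
    // and take the fresh counts for every other group.
    void onToggle(String toggledGroup, Map<String, Map<String, Integer>> freshResponse) {
        freshResponse.forEach((group, counts) -> {
            if (!group.equals(toggledGroup)) {
                shown.put(group, new HashMap<>(counts));
            }
        });
    }

    int countFor(String group, String value) {
        return shown.getOrDefault(group, Map.of()).getOrDefault(value, 0);
    }

    public static void main(String[] args) {
        CheckboxCountCache cache = new CheckboxCountCache();
        cache.initialize(Map.of(
                "airline", Map.of("America West", 40, "United", 60),
                "dest", Map.of("LAX", 70, "SFO", 30)));
        // User checks "America West"; the new response narrows the dest counts.
        cache.onToggle("airline", Map.of(
                "airline", Map.of("America West", 40, "United", 0),
                "dest", Map.of("LAX", 25, "SFO", 15)));
        System.out.println(cache.countFor("airline", "United")); // 60, kept from first search
        System.out.println(cache.countFor("dest", "LAX"));       // 25, refreshed
    }
}
```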

	...
: I'm thinking to do this by using getDocListAndSet to grab the DocSet for
: each distinct group, and then running the SimpleFacets code against that
: DocSet.  I think if I keep score retrieval and sorting out of the flags, the
: cache should kick in for every DocSet after the first one, including
: calculation of the main query.

i wasn't really following the first part of your description of what you 
are doing, but i don't see any reason to use getDocListAndSet if all you 
need is a DocSet for SimpleFacets ... you can just use getDocSet and the 
filterCache should take care of everything as well.




-Hoss