Posted to solr-user@lucene.apache.org by Russell B <ru...@gmail.com> on 2011/06/30 11:38:11 UTC

Taxonomy faceting

I have a hierarchical taxonomy of documents that I would like users to be
able to search either through search or "drill-down" faceting.  The
documents may appear at multiple points in the hierarchy.  I've got a
solution working as follows: a multivalued field labelled category which for
each document defines where in the tree it should appear.  For example: doc1
has the category field set to "0/topics", "1/topics/computing",
"2/topics/computing/systems".

I then facet on the 'category' field, filter the results with fq={!raw
f=category}1/topics/computing to get everything below that point on the
tree, and use f.category.facet.prefix to restrict the facet fields to the
current level.

Full query something like:

http://localhost:8080/solr/select/?q=something&facet=true&facet.field=category&fq={!raw%20f=category}1/topics/computing&f.category.facet.prefix=2/topics/computing
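
A minimal Python sketch of the scheme described above, in case it helps to see it spelled out. It assumes "/" as the path separator and the field name category from the post; the helper names category_tokens, drilldown_params and drilldown_url are hypothetical and not part of Solr. It also assumes a trailing "/" on the facet prefix, which differs slightly from the query above, so that a sibling such as "computing-science" would not match.

```python
from urllib.parse import urlencode


def category_tokens(path):
    """Return the depth-prefixed tokens to index for one taxonomy path.

    "topics/computing/systems" yields one token per level, each prefixed
    with its zero-based depth.
    """
    parts = path.split("/")
    return ["%d/%s" % (depth, "/".join(parts[:depth + 1]))
            for depth in range(len(parts))]


def drilldown_params(query, selected_path):
    """Build request parameters for drilling into selected_path.

    The filter matches the selected node's own token; the facet prefix
    limits facet values to tokens one level deeper under that node.
    """
    depth = selected_path.count("/")
    return [
        ("q", query),
        ("facet", "true"),
        ("facet.field", "category"),
        ("fq", "{!raw f=category}%d/%s" % (depth, selected_path)),
        # Trailing "/" is an assumption of this sketch, see the lead-in.
        ("f.category.facet.prefix", "%d/%s/" % (depth + 1, selected_path)),
    ]


def drilldown_url(base, query, selected_path):
    """Return a URL-encoded request URL for the given base address."""
    return base + "?" + urlencode(drilldown_params(query, selected_path))
```

Calling drilldown_url("http://localhost:8080/solr/select/", "something", "topics/computing") gives a properly escaped version of the query above, so the space and braces in the {!raw f=category} local params survive the trip.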


Playing around with the results, it seems to work ok, but despite reading
lots about faceting I can't help feeling there might be a better solution.  Are
there better ways to achieve this?  Any comments/suggestions are welcome.

(Any suggestions as to what interface I can put on top of this are also
gratefully received!).


Thanks,

Russell

Re: Taxonomy faceting

Posted by Chris Hostetter <ho...@fucit.org>.
: Lucid Imagination did a webcast on this, as far as I remember?

that was me ... the webcast was a pre-run of my apachecon talk...

http://www.lucidimagination.com/why-lucid/webinars/mastering-power-faceted-search
http://people.apache.org/~hossman/apachecon2010/facets/

...taxonomy stuff comes up ~slide 30

: The '1/topics/computing'-solution works at a single level, so if you are
: interested in a multi-level result like

if you want to show the whole tree when faceting you can just leave the 
"depth" number prefix out of the terms, that should work fine (but i haven't 
thought about it hard)
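
A rough Python sketch of what that could look like on the client side, assuming the terms are indexed without the depth prefix (e.g. "topics", "topics/computing") and that all facet counts have already been fetched into a dict. The names build_tree and render are hypothetical helpers, not Solr APIs.

```python
def build_tree(counts):
    """Nest path-like facet terms into a tree of count/children nodes.

    counts maps a term such as "topics/computing" to its facet count.
    """
    root = {"count": 0, "children": {}}
    for term, count in counts.items():
        node = root
        for part in term.split("/"):
            # Create intermediate nodes on demand with a count of 0.
            node = node["children"].setdefault(
                part, {"count": 0, "children": {}})
        node["count"] = count
    return root


def render(node, indent=0):
    """Return the tree as indented text lines, children sorted by label."""
    lines = []
    for label in sorted(node["children"]):
        child = node["children"][label]
        lines.append("%s- %s (%d)" % (" " * indent, label, child["count"]))
        lines.extend(render(child, indent + 1))
    return lines
```

Because every ancestor path is indexed as its own term, one facet request with no prefix returns enough to draw the whole tree.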

: > Are there better ways to achieve this?
: 
: Taxonomy faceting is a bit of a mess right now, but it is also an area
: where a lot is happening. For SOLR, there is

right, some of which i haven't been able to keep up on and can't comment 
on -- but in my experience if you are serious about organizing your data in a 
taxonomy then you probably already have some data structure in your 
application layer that models the whole thing in memory, and maps nodeIds 
to nodeLabels and what not.  What usually works fine is to just index the 
nodeIds for the entire ancestry of the category each Document is in.  That 
handles the filtering (ie: fq=cat:1234), and to generate the facet 
presentation you do a simple facet.field=ancestorCategories&facet.limit=-1 
to get all the counts in a big hashmap and then use that to annotate your 
own category tree data structure that you use to generate the 
presentation.
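
A minimal Python sketch of that idea, under a few assumptions: the application keeps a parent map and a children map keyed by nodeId, and the facet counts arrive as an alternating term/count list (which, as far as I know, is how the JSON response writer lays them out by default). The function names ancestor_ids, facet_counts and annotate are hypothetical.

```python
def ancestor_ids(node_id, parent):
    """Return node_id followed by all of its ancestors, nearest first.

    parent maps a nodeId to its parent nodeId; root nodes are absent.
    These are the values one would index for a document in node_id.
    """
    ids = []
    while node_id is not None:
        ids.append(node_id)
        node_id = parent.get(node_id)
    return ids


def facet_counts(flat):
    """Turn an alternating [term, count, term, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def annotate(node_id, children, labels, counts):
    """Build a display tree for node_id with labels and facet counts.

    Nodes missing from counts get a count of 0.
    """
    return {
        "id": node_id,
        "label": labels[node_id],
        "count": counts.get(node_id, 0),
        "children": [annotate(child, children, labels, counts)
                     for child in children.get(node_id, [])],
    }
```

The filtering side then needs nothing more than a plain term filter on the chosen nodeId, since every document carries the ids of all its ancestors.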



-Hoss

Re: Taxonomy faceting

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2011-06-30 at 11:38 +0200, Russell B wrote:
> a multivalued field labelled category which for each document defines
> where in the tree it should appear.  For example: doc1 has the
> category field set to "0/topics", "1/topics/computing",
> "2/topics/computing/systems".
> 
> I then facet on the 'category' field, filter the results with fq={!raw
> f=category}1/topics/computing to get everything below that point on the
> tree, and use f.category.facet.prefix to restrict the facet fields to the
> current level.

Lucid Imagination did a webcast on this, as far as I remember?

> Playing around with the results, it seems to work ok but despite reading
> lots about faceting I can't help feel there might be a better solution.

The '1/topics/computing'-solution works at a single level, so if you are
interested in a multi-level result like
- topic
 - computing
  - hardware
  - software
 - biology
  - plants
  - animals
you have to do more requests.

> Are there better ways to achieve this?

Taxonomy faceting is a bit of a mess right now, but it is also an area
where a lot is happening. For SOLR, there is

https://issues.apache.org/jira/browse/SOLR-64
(single path/document hierarchical faceting)

https://issues.apache.org/jira/browse/SOLR-792
(pivot faceting, now part of trunk AFAIR)

https://issues.apache.org/jira/browse/SOLR-2412
(multi path/document hierarchical faceting, very experimental)

Just yesterday, another multi path/document hierarchical faceting
solution was added to the Lucene 3.x branch and Lucene trunk. It has
been used by IBM for some time and appears to be mature and stable.
https://issues.apache.org/jira/browse/LUCENE-3079
However, this solution requires a sidecar index for the taxonomy and I
am a bit worried about how this fits into the Solr index workflow.


Re: Taxonomy faceting

Posted by da...@ontrenet.com.
That's a good way. How does it perform?

Another way would be to store the "parent" topics in a field.
Whenever a parent node is drilled into, simply search for all documents
with that parent. Perhaps not as elegant as your approach though.

I'd be interested in the performance comparison between the two approaches.
