Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/07/19 23:25:46 UTC
[Solr Wiki] Update of "HierarchicalFaceting" by ErikHatcher
The following page has been changed by ErikHatcher:
http://wiki.apache.org/solr/HierarchicalFaceting
The comment on the change is:
New page to compare/contrast hierarchical faceting approaches
New page:
= Overview =
There are many cases where documents represent objects associated with hierarchical structures. For example, if documents represent restaurants, one might want to facet on geographical hierarchies such as "US/California/Cupertino". There are various indexing techniques for faceting on hierarchical structures. Initially, this wiki page describes the various approaches, comparing and contrasting them across a range of real-world use cases. As approaches become codified and committed to Solr proper, this page will evolve into a HOW-TO.
There won't be a single best approach to faceting on hierarchical fields, as different field semantics and usages will lend themselves to being indexed in varying ways.
= Comparing some approaches =
There are currently two similar, non-competing approaches to generating tree/hierarchical facets from Solr: SOLR-64 and SOLR-792. Both can be tried out easily using a single set of sample data and the Solr example application (this assumes the current trunk codebase and the latest patches posted to the respective issues).
{{{
svn co http://svn.apache.org/repos/asf/lucene/solr/trunk/ hiersolr
cd hiersolr
patch -p0 < SOLR-64.patch
patch -p1 < SOLR-792.patch # note: -p1 here, unlike -p0 on the previous line
ant run-example
# <new shell>
ruby hiergen.rb > hierfacets.csv # hiergen.rb pasted below
curl "http://localhost:8983/solr/update/csv?commit=true&optimize=true" --data-binary @hierfacets.csv -H 'Content-type:text/plain; charset=utf-8'
}}}
The hiergen.rb script outputs CSV with this format:
{{{
id,levels_h,level1_s,level2_s
0,A/1,A,1
1,A/2,A,2
...
259998,Z/9999,Z,9999
259999,Z/10000,Z,10000
}}}
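The hiergen.rb script itself is not reproduced in this section. As a hypothetical stand-in (not the original script), the following POSIX shell function uses awk to emit the same CSV layout: top-level values A-Z, each paired with second-level values 1..N, where N defaults to 10000 for 260,000 rows. The function name gen_hier_csv is illustrative only.

```shell
#!/bin/sh
# Hypothetical stand-in for hiergen.rb; this is NOT the original script.
# Prints a header plus one row per (top-level, second-level) pair:
#   id,levels_h,level1_s,level2_s
# Optional argument: number of second-level values per letter (default 10000).
gen_hier_csv() {
  awk -v n="${1:-10000}" 'BEGIN {
    print "id,levels_h,level1_s,level2_s"
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    id = 0
    for (i = 1; i <= 26; i++) {
      top = substr(letters, i, 1)            # top-level value: A..Z
      for (j = 1; j <= n; j++) {             # second-level value: 1..n
        # levels_h holds the full path (for SOLR-64);
        # level1_s and level2_s hold each level separately (for SOLR-792).
        printf "%d,%s/%d,%s,%d\n", id, top, j, top, j
        id++
      }
    }
  }'
}
```

Running gen_hier_csv > hierfacets.csv produces a file whose first and last rows match the sample above, and which can be loaded with the same /update/csv request.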
An initial set of two-level hierarchical facet data was generated: values A-Z for the top level and values 1-10000 for the second level, for a total of 26 × 10,000 = 260,000 documents. The levels_h field is used for trying out SOLR-64. The level1_s and level2_s fields are for trying out SOLR-792.
Details of how each implementation generates the entire facet hierarchy across all documents follow (the request stats shown are from the second or a later identical request, so the filter caches were already warmed):
SOLR-64:
{{{
$ time curl "http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=levels_h" | wc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14.5M    0 14.5M    0     0  2745k      0 --:--:--  0:00:05 --:--:-- 3359k
4 520101 15256346
real 0m5.431s
user 0m0.143s
sys 0m0.073s
}}}
Solr logged this:
{{{
[java] INFO: [] webapp=/solr path=/select params={facet.field=levels_h&rows=0&q=*:*&facet=on} hits=260000 status=0 QTime=907
}}}
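The hits and QTime figures quoted in these summaries come from Solr's request log line. Assuming the log format shown here (space-separated key=value pairs such as hits=260000 and QTime=907), a small sed sketch can pull a single field out of such a line; the helper name log_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a space-separated key=value field
# (e.g. QTime, hits, status) from a Solr request log line like the one above.
# Usage: log_field FIELD < logfile    or    printf '%s\n' "$line" | log_field FIELD
log_field() {
  # Match " FIELD=" followed by non-space characters; print only the value.
  # Keys inside params={...} are preceded by & or {, so they are not matched.
  sed -n "s/.* $1=\([^ ]*\).*/\1/p"
}
```

Note that QTime covers only Solr's own processing; it excludes the time spent writing and transferring the response, which is why it can differ so much from the wall-clock time reported by time(1).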
Summary of key SOLR-64 stats:
* filter cache entries created: 260027
* solr response time: 907ms
* time to receive response: 5.4s !!!
* response size: 14.5M !!!
SOLR-792:
{{{
$ time curl "http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.tree=level1_s,level2_s&facet.field=level1_s" | wc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 63816    0 63816    0     0  1213k      0 --:--:-- --:--:-- --:--:-- 5665k
4 2677 63816
real 0m0.056s
user 0m0.002s
sys 0m0.006s
}}}
Solr logged this:
{{{
[java] INFO: [] webapp=/solr path=/select params={facet.field=level1_s&facet.tree=level1_s,level2_s&rows=0&q=*:*&facet=on} hits=260000 status=0 QTime=29
}}}
Summary of key SOLR-792 stats:
* filter cache entries created: 27
* solr response time: 29ms
* time to receive response: 56ms
* response size: 63K
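The size and timing figures above were read from curl's progress meter and time(1). A sketch for repeating the comparison that records only the response size and total time per request, using curl's --write-out variables size_download and time_total, might look like the following. It assumes the example server is running at localhost:8983; the helper names select_url and measure are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for repeating the SOLR-64 vs SOLR-792 comparison.
SOLR_SELECT="http://localhost:8983/solr/select"

# Build a select URL from a query string (parameters passed exactly as above).
select_url() {
  printf '%s?%s\n' "$SOLR_SELECT" "$1"
}

# Fetch a URL, discard the body, and print "<bytes> <seconds>".
# Run each request more than once so the numbers reflect warmed caches.
measure() {
  curl -s -o /dev/null -w '%{size_download} %{time_total}\n' "$1"
}

# Example (requires the running example server):
#   measure "$(select_url 'q=*:*&rows=0&facet=on&facet.field=levels_h')"
#   measure "$(select_url 'q=*:*&rows=0&facet=on&facet.tree=level1_s,level2_s&facet.field=level1_s')"
```

Discarding the body with -o /dev/null also avoids the cost of piping a 14.5M response through wc, so the reported time reflects the request itself.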
= Basic hierarchical facet use cases =
== Facets across all documents for only top-level of hierarchy ==
In general, there is no need for tree/hierarchical faceting in this use case; index the first level as a separate facet field and use Solr's existing faceting capabilities for this common case.
* SOLR-64: http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=levels_h&facet.depth=1
* SOLR-792: http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=level1_s
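As a sanity check on what these top-level requests should return for the generated data, the same counts can be computed offline from hierfacets.csv. This awk sketch assumes the four-column CSV layout shown earlier (level1_s in column 3) with no embedded commas in values; the function name top_level_counts is hypothetical, and its output is sorted by value rather than by count.

```shell
#!/bin/sh
# Hypothetical offline equivalent of facet.field=level1_s: count documents per
# top-level value in a CSV with header id,levels_h,level1_s,level2_s.
# Reads the named files, or standard input when no file is given.
top_level_counts() {
  awk -F, 'FNR > 1 { count[$3]++ }
           END     { for (v in count) print v, count[v] }' "$@" | LC_ALL=C sort
}
```

For the generated 260,000-row file, top_level_counts hierfacets.csv should print 26 lines, A 10000 through Z 10000.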
== Facet across second level of hierarchy given single top-level constraint ==
* SOLR-64: http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=levels_h&fq=levels_h:A*&facet.mincount=1
* SOLR-792: http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=level2_s&fq=level1_s:A&facet.mincount=1 (this uses Solr's existing built-in faceting and filtering; the SOLR-792 patch is not involved in this request)
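The two requests select documents differently: the SOLR-64 request filters levels_h with a prefix query (A*) and facets on levels_h, while the SOLR-792 request filters level1_s on the exact value A and facets on level2_s. The awk sketch below mimics both offline, assuming the four-column CSV layout shown earlier; the function name and its prefix/exact modes are hypothetical. For this data the prefix A* matches only A/... paths because every top-level value is a single letter; with multi-character top-level values, a bare prefix would also match longer values that begin with the same characters (for example AB/1).

```shell
#!/bin/sh
# Hypothetical helper: facet counts under a top-level constraint.
#   prefix mode: like fq=levels_h:VALUE* with facet.field=levels_h
#   exact mode:  like fq=level1_s:VALUE  with facet.field=level2_s
# Assumes header id,levels_h,level1_s,level2_s. Only values with a count of
# at least 1 are printed, matching facet.mincount=1.
# Usage: second_level_counts prefix|exact VALUE [FILE...]
second_level_counts() {
  mode=$1
  value=$2
  shift 2
  awk -F, -v mode="$mode" -v v="$value" '
    FNR == 1 { next }                                                # skip header
    mode == "prefix" { if (index($2, v) == 1) count[$2]++; next }    # prefix on levels_h
    mode == "exact"  { if ($3 == v) count[$4]++; next }              # exact on level1_s
    END { for (k in count) print k, count[k] }' "$@" | LC_ALL=C sort
}
```

On the generated file, second_level_counts exact A hierfacets.csv should list each second-level value 1 through 10000 once, each with a count of 1.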