Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2013/11/07 20:05:18 UTC
[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices

    [ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816282#comment-13816282 ] 

Shai Erera commented on LUCENE-5333:
------------------------------------

A simple way to achieve that is to write an AllFacetsAccumulator which:

* Wraps a FacetsAccumulator and takes no FacetRequests
* Creates FacetRequests for each child of ROOT (by asking for the children of ROOT)
* Delegates .accumulate() to the wrapped FacetsAccumulator
* Finally, it goes over the List<FacetResult> and removes any FacetResult which has no children (meaning no interesting facets were returned).
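
The steps above could be sketched roughly as follows. This is only an illustration of the wrap / request-all / delegate / prune flow: DimResult, DimSource, DimAccumulator and AllDimsAccumulator are hypothetical stand-ins invented for this sketch, not the facet module's FacetsAccumulator, FacetRequest or FacetResult classes, and the real signatures would differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the Lucene facet classes.

/** Result for one dimension: child label -> count. */
final class DimResult {
    final String dim;
    final Map<String, Integer> children;

    DimResult(String dim, Map<String, Integer> children) {
        this.dim = dim;
        this.children = children;
    }
}

/** Something that can list the children of ROOT, i.e. all dimensions. */
interface DimSource {
    List<String> allDims();
}

/** The wrapped accumulator: returns one result per requested dimension. */
interface DimAccumulator {
    List<DimResult> accumulate(List<String> requestedDims);
}

/** Takes no requests itself; asks for every dimension and prunes empty results. */
final class AllDimsAccumulator {
    private final DimSource dims;
    private final DimAccumulator delegate;

    AllDimsAccumulator(DimSource dims, DimAccumulator delegate) {
        this.dims = dims;
        this.delegate = delegate;
    }

    List<DimResult> accumulate() {
        // One request per child of ROOT.
        List<String> requests = new ArrayList<>(dims.allDims());
        // Delegate the actual counting to the wrapped accumulator.
        List<DimResult> results = new ArrayList<>(delegate.accumulate(requests));
        // Drop dimensions that came back with no children.
        results.removeIf(r -> r.children.isEmpty());
        return results;
    }
}
```

The pruning happens only after the delegate has done its work for every dimension, so the sketch does nothing to reduce the cost of requesting dimensions that turn out to be empty.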

Some drawbacks to this:

* It's currently not clear how you can get the children of ROOT with SortedSetDocValues since it doesn't implement a TaxonomyReader. Maybe SSDVReaderState could have a getAllDims() method?
* SortedSetDVAccumulator only does counting, but if you pass a TaxonomyFA, you might want to do other aggregations. I guess we could at first only do counting and think about other aggregation methods later on, but that means it needs to create CountingFR explicitly.
* It may not be very efficient if, e.g., you have tens or hundreds of dimensions with a huge total number of categories, because the method traverses the children of every dimension: it cannot tell up front whether or not a dimension has any children. We could resolve that later.
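
One possible direction for the last drawback, following the idea in the issue description of counting whatever the hits actually saw: tally per dimension while walking the hits' ordinals, then look only at dimensions with a non-zero tally. The sketch below is self-contained and uses made-up inputs; SparseDimCounter, the ordToDim array and the int[][] representation of per-document ordinals are assumptions for illustration, not how the facet module stores or exposes ordinals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Lucene facet module.
final class SparseDimCounter {

    /**
     * hitOrds[i] holds the facet ordinals of the i-th matching document.
     * ordToDim[ord] is the index of the dimension that ordinal belongs to (assumed to be known).
     * Returns, per dimension, how many label occurrences the hits contained.
     */
    static int[] countPerDim(int[][] hitOrds, int[] ordToDim, int numDims) {
        int[] dimCounts = new int[numDims];
        for (int[] docOrds : hitOrds) {
            for (int ord : docOrds) {
                dimCounts[ordToDim[ord]]++;
            }
        }
        return dimCounts;
    }

    /** Indices of the dimensions that at least one hit touched. */
    static List<Integer> dimsWithHits(int[] dimCounts) {
        List<Integer> dims = new ArrayList<>();
        for (int dim = 0; dim < dimCounts.length; dim++) {
            if (dimCounts[dim] > 0) {
                dims.add(dim);
            }
        }
        return dims;
    }
}
```

Only the dimensions returned by dimsWithHits would then need their children traversed. Whether such a per-dimension tally is cheap to maintain in practice is an open question here.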

What do you think?

> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
>                 Key: LUCENE-5333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5333
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Michael McCandless
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...


