Posted to java-user@lucene.apache.org by Rob Audenaerde <ro...@gmail.com> on 2015/01/12 13:38:23 UTC

fill 'empty' facet-values, sampling, taxoreader

Hi all,

I'm building an application in which users can add arbitrary documents, and
all fields will be added as facets as well. This allows users to browse
their documents by their own defined facets easily.

However, when the number of documents gets very large, I switch to
random-sampled facets to make sure the application stays responsive. By the
nature of sampling, documents (and thus facet-values) will be missed.

I let the user select the number of facet-values they want to see for each
facet. For example, the default is 10. If a facet contains values 1 to 20,
the user will always see 10 values if all documents are returned in the
search and no sampling is done.

If sampling is done, and the values are non-uniformly distributed, the user
might end up with only 5 values instead of 10. I want to 'fill' the 5 empty
facet-value-slots with existing facet-values and an unknown facet-count
(?). The reason is that such a value might still occur in the resultset
even though the sample missed it. For interaction purposes, it is very nice
if the value can be selected and added to the query, to quickly find out
whether there are documents that also contain this facet value.

It is even more useful if these facet values are not sorted by count, but
by label. The user can then quickly see whether there are documents that
contain a certain value.
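
To make the intended behaviour concrete, here is a minimal sketch in plain
Java with no Lucene calls. The class FacetSlotFiller and its fill method
are hypothetical names. It assumes the sampled counts are already available
as a map and that a list of known labels for the facet can be obtained
somehow (for example from the taxonomy); a null count stands for the
unknown count (?).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene.
public class FacetSlotFiller {

    /**
     * Returns at most topN facet values, sorted by label.
     * Values seen in the sample keep their count; the remaining slots are
     * filled with known labels whose count is null (meaning "unknown").
     */
    public static SortedMap<String, Integer> fill(Map<String, Integer> sampledCounts,
                                                  Collection<String> knownLabels,
                                                  int topN) {
        // Prefer sampled values with the highest counts (ties broken by label).
        List<Map.Entry<String, Integer>> observed = new ArrayList<>(sampledCounts.entrySet());
        observed.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });

        // TreeMap keeps the final slots ordered by label.
        SortedMap<String, Integer> slots = new TreeMap<>();
        for (Map.Entry<String, Integer> e : observed) {
            if (slots.size() >= topN) break;
            slots.put(e.getKey(), e.getValue());
        }

        // Fill the empty slots with known labels the sample did not observe.
        for (String label : new TreeSet<>(knownLabels)) {
            if (slots.size() >= topN) break;
            if (!slots.containsKey(label)) {
                slots.put(label, null); // null = unknown count, shown as "?"
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        Map<String, Integer> sampled = new HashMap<>();
        sampled.put("b", 40);
        sampled.put("e", 12);
        List<String> known = List.of("a", "b", "c", "d", "e", "f");

        for (Map.Entry<String, Integer> e : fill(sampled, known, 4).entrySet()) {
            String count = e.getValue() == null ? "?" : e.getValue().toString();
            System.out.println(e.getKey() + " (" + count + ")");
        }
    }
}
```

With sampled counts b=40 and e=12, known labels a to f, and 4 slots, this
gives a (?), b (40), c (?), e (12). The sketch does not check whether a
filler label is still used by any document.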

I can iterate over the ordinals via the TaxonomyReader and TaxonomyFacets
(by leveraging the 'children'), but some of these ordinals might no longer
be used by any document.

What would be a good approach to tackle this issue?