Posted to issues@lucene.apache.org by "Marc D'Mello (Jira)" <ji...@apache.org> on 2021/11/23 23:24:00 UTC

[jira] [Comment Edited] (LUCENE-10250) Add hierarchical labels to SSDV facets

    [ https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448303#comment-17448303 ] 

Marc D'Mello edited comment on LUCENE-10250 at 11/23/21, 11:23 PM:
-------------------------------------------------------------------

I used random words as labels here because, from my understanding of [this discussion|https://github.com/mikemccand/luceneutil/pull/144#discussion_r727974361], we cannot generate new wiki line file docs, so the only source I had access to was the info already in the {{enwiki-20120502-lines-1k.txt}} file. I agree with your point, though: the hierarchical categories already in Wikipedia would be a good way to test this change.


was (Author: mdmarshmallow):
I used random words as labels here because from my understanding of [this discussion|https://github.com/mikemccand/luceneutil/pull/144#discussion_r727974361], it seems that we cannot generate new wiki line file docs, so I only had access to the info already in the {{enwiki-20120502-lines-1k.txt}} file. Though I agree with your point, the hierarchical categories already in wikipedia would be a good way to test this change.

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}} that counts facets on a random word chosen from each document, which gives us a much higher-cardinality faceting benchmark than the ones we already had. After it was merged, [~mikemccand] pointed out some [interesting results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html] in the nightly benchmarks, where the {{BrowseRandomLabelSSDVFacets}} task was much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use case at Amazon Product Search could potentially increase QPS and decrease index size. The issue is that we use hierarchical labels and, as I understand it, SSDV faceting only supports a 2-level hierarchy as of today. This leads to my question: why is there a limitation like this on SSDV facets? Are hierarchical labels just a feature that hasn't been implemented in SSDV facets yet, or is there some more complex reason that we can't add hierarchical labels to SSDV facets?
> Thanks!
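
To make the question above concrete, here is a minimal sketch in plain Python (not Lucene code) of one way hierarchical labels could in principle be flattened into single strings, which is the kind of value a sorted-set field holds. Everything in it is an assumption made for illustration: the separator {{SEP}}, the helper names {{path_prefixes}} and {{count_children}}, and the sample data are hypothetical, and nothing here is meant to describe how SSDV facets are or will be implemented.

```python
from collections import Counter

# Hypothetical separator, assumed never to occur inside a label.
SEP = "\x1f"


def path_prefixes(dim, path):
    """Return every prefix of dim/path, each joined into one flat string."""
    parts = [dim, *path]
    return [SEP.join(parts[:i]) for i in range(1, len(parts) + 1)]


def count_children(docs, dim, parent=()):
    """Count, per document, the immediate children of dim/parent.

    docs is a list of documents; each document is a list of label paths
    (tuples of strings) that all belong to the dimension dim.
    """
    prefix = SEP.join([dim, *parent]) + SEP
    # A direct child has exactly one more component than dim/parent,
    # so its flat string contains len(parent) + 1 separators.
    wanted_seps = len(parent) + 1
    counts = Counter()
    for doc_paths in docs:
        seen = set()  # count each child at most once per document
        for path in doc_paths:
            for flat in path_prefixes(dim, path):
                if flat.startswith(prefix) and flat.count(SEP) == wanted_seps:
                    seen.add(flat.split(SEP)[-1])
        counts.update(seen)
    return counts


# Made-up sample data: three documents in a "category" dimension.
docs = [
    [("Electronics", "Phones")],
    [("Electronics", "Laptops")],
    [("Books",)],
]
top_level = count_children(docs, "category")
under_electronics = count_children(docs, "category", ("Electronics",))
```

With this sample data, {{top_level}} counts Electronics twice and Books once, and {{under_electronics}} counts Phones and Laptops once each. The sketch favors clarity over speed: it rescans every prefix per query, whereas a real index would presumably store the flattened strings once and count by ordinal.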


