Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2012/12/12 15:11:21 UTC

[jira] [Commented] (LUCENE-4622) TopKFacetsResultHandler should tie break sort by label not ord?

    [ https://issues.apache.org/jira/browse/LUCENE-4622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529965#comment-13529965 ] 

Shai Erera commented on LUCENE-4622:
------------------------------------

I think there are two separate issues here, Mike: which facets get into the top-K list, and how that list is sorted when it is presented to the user.

Currently, the facets that make it into the top-K list are the ones with the highest weight (that's the default; the order can be reversed). On ties, the facets with the lower ordinals win. That's very similar to a regular Lucene Sort, which breaks ties on doc IDs, and it can easily be explained to users as "if facets have the same count, the top K is determined by which facet was added to the index first" ... really like doc IDs.
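To make the selection rule concrete, here is a minimal sketch of top-K selection that prefers higher counts and, on ties, lower ordinals. It is not the actual TopKFacetsResultHandler code: the class TopKByOrdinalSketch, the topK method, and the plain int[] of counts indexed by ordinal are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; names and data layout are assumptions, not the facet module's API.
class TopKByOrdinalSketch {

    /** Returns up to k ordinals: higher count first, lower ordinal first on ties. */
    static List<Integer> topK(int[] counts, int k) {
        // The "worst" candidate sorts first: lowest count, and among equal
        // counts the highest ordinal (the facet added to the index last).
        Comparator<Integer> worstFirst = Comparator
                .<Integer>comparingInt(ord -> counts[ord])
                .thenComparing(Comparator.<Integer>reverseOrder());

        // Bounded min-heap: the head is always the candidate to evict next.
        PriorityQueue<Integer> heap = new PriorityQueue<>(worstFirst);
        for (int ord = 0; ord < counts.length; ord++) {
            heap.add(ord);
            if (heap.size() > k) {
                heap.poll(); // drop the current worst candidate
            }
        }

        // Emit best-first; no label lookup is needed at any point.
        List<Integer> result = new ArrayList<>(heap);
        result.sort(worstFirst.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Ordinals 0..3 stand for Lisa, Frank, Susan, Bob in indexing order.
        int[] counts = {2, 1, 1, 1};
        System.out.println(topK(counts, 3));
    }
}
```

Note that the comparison only touches counts and ordinals, which is why this kind of tie-break is cheap compared to resolving a label for every candidate.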

The second issue is how the top-K categories are sorted on ties. Currently they are not re-sorted by label. All of the applications that I've seen sort the categories lexicographically in the UI (as a secondary sort). And some applications that use facets to compute a tag cloud sort the cloud by label (as the primary sort) and apply UI techniques to emphasize a category by its weight. We figured that doing this sort ourselves is just a waste of CPU time, since anyone can very easily do it with the FacetResults they get back.

So in my opinion:
* We should still break ties on ordinal for entering the top-K. It'd be very costly otherwise, and I'm not sure how critical it is.
* Sorting by weight + label worries me. Why should an app pay for that if e.g. it's going to sort in the UI anyway, say by the user's Locale? Or if it doesn't care about the sort?
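As an illustration of the app-side alternative, here is a rough sketch of sorting an already-computed top-K list by count and then by label using a Locale-aware Collator. The Entry class and the sortForDisplay method are hypothetical stand-ins, not part of the facet module; a real app would read labels and counts from whatever result objects it gets back.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; Entry is a stand-in for one facet result entry.
class LocaleTieBreakSketch {

    static final class Entry {
        final String label;
        final int count;

        Entry(String label, int count) {
            this.label = label;
            this.count = count;
        }

        @Override
        public String toString() {
            return label + " (" + count + ")";
        }
    }

    /** Returns a copy sorted by count (descending), then by label per the given Locale. */
    static List<Entry> sortForDisplay(List<Entry> topK, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<Entry> sorted = new ArrayList<>(topK);
        sorted.sort(Comparator
                .comparingInt((Entry e) -> e.count).reversed()
                .thenComparing(e -> e.label, collator));
        return sorted;
    }

    public static void main(String[] args) {
        List<Entry> topK = Arrays.asList(
                new Entry("Lisa", 2), new Entry("Frank", 1),
                new Entry("Susan", 1), new Entry("Bob", 1));
        System.out.println(sortForDisplay(topK, Locale.ENGLISH));
    }
}
```

The sort only runs over the K entries that were returned, and the app picks the Locale, which is the argument for leaving this step out of the default collection path.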

I would rather that we don't change the default, but maybe add an API for doing that, like a utility class or a method on FacetResult. Whoever cares about sorting can just call it.

What worries me is that apps that don't want the sort, but get sorted results anyway, may not realize that extra work is being done behind the scenes. Apps that get unsorted results and care about the order, on the other hand, can see clearly that they need to do something, and for that we'll have an API, or they can implement it themselves.

Sorting when depth>1 by label is going to be even trickier ...

Maybe as a compromise we can make FacetResultNode Comparable and break ties on label there? Then you could traverse your tree of results and call Collections.sort(resultNodes) at each level.
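A minimal sketch of that compromise, assuming a simplified node type: the Node class below is a hypothetical stand-in for FacetResultNode (its fields, the add helper, and sortTree are all invented for illustration), and it compares by value first and by label on ties.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; Node is a simplified stand-in, not the real FacetResultNode.
class ComparableNodeSketch {

    static final class Node implements Comparable<Node> {
        final String label;
        final double value;
        final List<Node> children = new ArrayList<>();

        Node(String label, double value) {
            this.label = label;
            this.value = value;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        @Override
        public int compareTo(Node other) {
            // Higher value sorts first; equal values fall back to the label.
            int byValue = Double.compare(other.value, value);
            return byValue != 0 ? byValue : label.compareTo(other.label);
        }
    }

    /** Sorts the children of every node in the tree, one level at a time. */
    static void sortTree(Node node) {
        Collections.sort(node.children);
        for (Node child : node.children) {
            sortTree(child);
        }
    }

    public static void main(String[] args) {
        Node author = new Node("Author", 5)
                .add(new Node("Lisa", 2)).add(new Node("Frank", 1))
                .add(new Node("Susan", 1)).add(new Node("Bob", 1));
        sortTree(author);
        for (Node child : author.children) {
            System.out.println(child.label + " (" + (int) child.value + ")");
        }
    }
}
```

Only apps that call sortTree pay for the label comparisons, and the same call handles results with depth>1 because it recurses into each level.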
                
> TopKFacetsResultHandler should tie break sort by label not ord?
> ---------------------------------------------------------------
>
>                 Key: LUCENE-4622
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4622
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Michael McCandless
>
> EG I now get these facets:
> {noformat}
> Author (5)
>  Lisa (2)
>  Frank (1)
>  Susan (1)
>  Bob (1)
> {noformat}
> The primary sort is by count, but secondary is by ord (= order in which they were indexed), which is not really understandable/transparent to the end user.  I think it'd be best if we could do tie-break sort by label ...
> But talking to Shai, this seems hard/costly to fix, because when visiting the facet ords to collect the top K, we don't currently resolve to label, and in the worst case (say my example had a million labels with count 1) that's a lot of extra label lookups ...
