You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2010/03/25 17:27:29 UTC

[jira] Commented: (SOLR-1672) RFE: facet reverse sort count

    [ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849755#action_12849755 ] 

Hoss Man commented on SOLR-1672:
--------------------------------

Some old notes on this patch that i just found on my laptop (presumably from the last time i was on a plane) ...

* The existing patch is in a weird format that i coulnd't apply
* re-reading the patch, and comparing to the SimpleFacets and UnInvertedField source, i'm noticing some that several code paths for facet counts aren't being accounted for
* I think what we should do conceptially is refactor all of the code that looks at the existing FacetParams.FACET_SORT param (or any of the constant values for it) into a helper function thta parses the new legal values we want to support and returns a Comparator, and then start passing that comparator arround to the various strategies (termenum, fieldcache, uninverted) for collecting facet constraints, instead of just passing arround the "sort" string value...
** "true","count","count desc" => a comparator that does descending count sort
** "count asc" => a comparator that does ascending count sort
** "false","index","index asc" => null (by returning a null comparator we would be signalling that no sorting or bounded collection is needed, terms can be processed in order)
** "index desc" => a comparater that does descendeing term sort (not requested in this Jira, but recently asked about on the mailing list)
* The problem with that conceptual solution is that UnInvertedField doesn't maintain a BoundedTreeSet of CountPairs like all of hte other code paths, it uses a single Long to encode both the count and the index of the term, so it would need some special logic.
** Side question: I wonder if that Long encoded format would work for hte field cache based faceting as well?


> RFE: facet reverse sort count
> -----------------------------
>
>                 Key: SOLR-1672
>                 URL: https://issues.apache.org/jira/browse/SOLR-1672
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.4
>         Environment: Java, Solrj, http
>            Reporter: Peter Sturge
>            Priority: Minor
>         Attachments: SOLR-1672.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> As suggested by Chris Hosstetter, I have added an optional Comparator to the BoundedTreeSet<Long> in the UnInvertedField class.
> This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' 
> (e.g. &f.<facetname>.facet.sortorder=dsc for per field, or &facet.sortorder=dsc for all facets).
> Note that this parameter has no effect if facet.method=enum.
> Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour.
>  
> This change affects 2 source files:
> > UnInvertedField.java
>     [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list.
>  
> DIFF UnInvertedField.java:
> - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
> + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {
>     [line 556] The getCounts() method is modified to create an overridden BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.
> DIFF UnInvertedField.java:
> - final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
> + final BoundedTreeSet<Long> queue = (sort.equals("count") || sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new BoundedTreeSet<Long>(maxsize, new Comparator()
> { @Override
> public int compare(Object o1, Object o2)
> {
>   if (o1 == null || o2 == null)
>     return 0;
>   int result = ((Long) o1).compareTo((Long) o2);
>   return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort
> }}) : new BoundedTreeSet<Long>(maxsize)) : null;
> > SimpleFacets.java
>     [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to retrieve the new parameter, if present. 'asc' used as a default value.
> DIFF SimpleFacets.java:
> + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");
>  
>     [line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.
> DIFF SimpleFacets.java:
> - counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix);
> + counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix, facetSortOrder);
> Implementation Notes:
> I have noted in testing that I was not able to retrieve any '0' counts as I had expected.
> I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default)
> as a performance enhancement.
> I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify.
> In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right).
>  
> There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of'). I was envisioning the facet.mincount field
> being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.