Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2014/08/06 20:20:12 UTC

[jira] [Commented] (SOLR-6329) facet.pivot.mincount=0 doesn't work well in distributed pivot faceting

    [ https://issues.apache.org/jira/browse/SOLR-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087984#comment-14087984 ] 

Hoss Man commented on SOLR-6329:
--------------------------------

Notes from SOLR-2894 about the root of the issue...

{panel}

From what I can tell, the gist of the issue is that when dealing with sub-fields of the pivot, the coordination code doesn't know about some of the "0" values if no shard that has the value for the parent field even knows the term exists.

The simplest example of this discrepancy (compared to single node pivots) is to consider an index with only 2 docs...

{noformat}
[{"id":1,"top_s":"foo","sub_s":"bar"},
 {"id":2,"top_s":"xxx","sub_s":"yyy"}]
{noformat}

If those two docs exist in a single node index, and you pivot on {{top_s,sub_s}} using mincount=0 you get a response like this...

{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true'
{
  "response":{"numFound":2,"start":0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_pivot":{
      "top_s,sub_s":[{
          "field":"top_s",
          "value":"foo",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"bar",
              "count":1},
            {
              "field":"sub_s",
              "value":"yyy",
              "count":0}]},
        {
          "field":"top_s",
          "value":"xxx",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"yyy",
              "count":1},
            {
              "field":"sub_s",
              "value":"bar",
              "count":0}]}]}}}
{noformat}

If, however, you index each of those docs on a separate shard, the response comes back like this...

{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_pivot":{
      "top_s,sub_s":[{
          "field":"top_s",
          "value":"foo",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"bar",
              "count":1}]},
        {
          "field":"top_s",
          "value":"xxx",
          "count":1,
          "pivot":[{
              "field":"sub_s",
              "value":"yyy",
              "count":1}]}]}}}
{noformat}
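
To make the difference between the two responses concrete, here is a minimal Python sketch of a toy model, not Solr's actual code. It assumes each index, under mincount=0, lists every sub-field term it knows about beneath every top-level value, and that the coordinator simply sums whatever the shards report. The function names {{local_pivot}} and {{naive_merge}} are hypothetical, and real distributed refinement is more involved.

```python
def local_pivot(docs, top_field, sub_field):
    """Toy single-index pivot with mincount=0: every sub-term known to this
    index appears under every top-level value, even when its count is 0."""
    known_subs = sorted({d[sub_field] for d in docs})
    pivot = {}
    for d in docs:
        counts = pivot.setdefault(d[top_field], {s: 0 for s in known_subs})
        counts[d[sub_field]] += 1
    return pivot


def naive_merge(shard_pivots):
    """Toy coordinator: sum per-shard counts. A sub-term shows up under a
    top-level value only if some shard reported it there."""
    merged = {}
    for pivot in shard_pivots:
        for top, subs in pivot.items():
            target = merged.setdefault(top, {})
            for sub, count in subs.items():
                target[sub] = target.get(sub, 0) + count
    return merged


docs = [
    {"id": 1, "top_s": "foo", "sub_s": "bar"},
    {"id": 2, "top_s": "xxx", "sub_s": "yyy"},
]

# Both docs in one index: each top-level value sees both sub-terms.
single_node = local_pivot(docs, "top_s", "sub_s")

# One doc per shard: neither shard knows the other shard's sub-term.
distributed = naive_merge([local_pivot([doc], "top_s", "sub_s") for doc in docs])
```

In this model {{single_node}} carries the zero-count entries and {{distributed}} does not, because the shard holding "foo" has never seen the term "yyy" (and vice versa).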

The only solution I can think of would be an extra stage of logic (special to mincount=0), run after each PivotFacetField is refined, that would:
* iterate over all the values of the current pivot
* build up a Set of all the known values for the child-pivots of those values
* iterate over all the values again, merging in a "0"-count child value for every value in the set

...ie: "At least one shard knows about value 'v_x' in field 'sub_field', so add a count of '0' for 'v_x' in every 'sub_field' collection nested under the 'top_field' in our 'top_field,sub_field' pivot"

I haven't thought this idea through enough to be confident it would work, or that it's worth doing ... I'm certainly not convinced that mincount=0 makes enough sense in a facet.pivot use case to think getting this test working should hold up getting this committed -- probably something that should just be committed as is, with an open Jira noting that it's a known bug.
{panel}
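
The zero-fill stage proposed in the quoted notes could look roughly like the following Python sketch. It is only an illustration over plain dictionaries: the function name {{zero_fill_children}} and the data layout (top-level value mapped to a dict of child value to count) are assumptions made here, not the PivotFacetField API, and it handles a single level of nesting only.

```python
def zero_fill_children(pivot):
    """Given merged pivot counts (top value -> {child value -> count}),
    add a 0-count entry for every child value known anywhere in the pivot."""
    # Pass 1: collect every child value that at least one shard reported.
    known_children = set()
    for children in pivot.values():
        known_children.update(children)

    # Pass 2: merge a "0" count into each child collection that lacks it.
    for children in pivot.values():
        for value in known_children:
            children.setdefault(value, 0)
    return pivot


# Merged distributed result for the two-doc example (one doc per shard).
merged = {"foo": {"bar": 1}, "xxx": {"yyy": 1}}
filled = zero_fill_children(merged)
```

Applied to the two-shard example, {{filled}} gains "yyy" with count 0 under "foo" and "bar" with count 0 under "xxx", matching the single node response. Whether this extra pass is cheap enough, or even correct once limits and offsets are in play, is exactly the open question raised in the notes.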

SOLR-2894 includes a commented out test case related to using mincount=0 in distributed pivot faceting in DistributedFacetPivotLargeTest (annotated with "SOLR-6329")

> facet.pivot.mincount=0 doesn't work well in distributed pivot faceting
> ----------------------------------------------------------------------
>
>                 Key: SOLR-6329
>                 URL: https://issues.apache.org/jira/browse/SOLR-6329
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Priority: Minor
>
> Using facet.pivot.mincount=0 in conjunction with the distributed pivot faceting support being added in SOLR-2894 doesn't work as folks would expect if they are used to using facet.pivot.mincount=0 in a single node setup.
> Filing this issue to track this as a known defect, because it may not have a viable solution.


