Posted to solr-user@lucene.apache.org by Alfonso Muñoz-Pomer Fuentes <am...@ebi.ac.uk> on 2018/08/27 20:38:28 UTC

unique() in the JSON facets doesn’t count all the different values in a field

Hi all,

We’re running a SolrCloud 7.1 instance in our service and I’ve come across a disagreement when trying to find out how many different values a field has:

Using the JSON facets API with unique():
3385

Using the JSON facets API with terms:
3388

Using the stats component:
countDistinct	3388
cardinality	3356
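
For reference, the three counts above could come from requests along the following lines. This is only a sketch: the post does not show the actual queries, so the host, the collection name `my_collection`, the field name `my_field` and the helper functions are all placeholders made up for illustration. The request parameters themselves (`json.facet` with `unique()` and with a `terms` facet, and `stats.field` with the `countDistinct` and `cardinality` local params) are standard Solr parameters.

```python
import json
from urllib.parse import urlencode

# Placeholders: the original post names neither the collection nor the field.
SOLR_SELECT_URL = "http://localhost:8983/solr/my_collection/select"
FIELD = "my_field"


def unique_facet_params(field):
    """JSON Facet API: unique() aggregation over the matching documents."""
    facet = {"distinct_values": f"unique({field})"}
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def terms_facet_params(field):
    """JSON Facet API: terms facet with no bucket limit.

    The number of distinct values is then the number of buckets returned.
    """
    facet = {"values": {"type": "terms", "field": field, "limit": -1}}
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def stats_params(field):
    """Stats component: request both countDistinct and cardinality."""
    local_params = "{!countDistinct=true cardinality=true}"
    return {"q": "*:*", "rows": 0, "stats": "true",
            "stats.field": local_params + field}


def request_url(params):
    """Build the full select URL for a set of query parameters."""
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for build in (unique_facet_params, terms_facet_params, stats_params):
        print(request_url(build(FIELD)))
```

The script only prints the request URLs; sending them and reading the counts out of the responses is left out, since the response handling depends on the client in use.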

My biggest surprise is that the unique function in the JSON facets doesn’t return the correct value. Is this to be expected, in the same way that cardinality is an approximation?

If this is a bug (unless there’s something very basic I’m missing, I think it is), how should I report it? It’s the first time I’ve seen a disagreement between unique and terms, and I don’t know how to reproduce it other than with our specific collection.

Many thanks in advance.

--
Alfonso Muñoz-Pomer Fuentes
Senior Lead Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel:+ 44 (0) 1223 49 2633
Skype: amunozpomer


Re: unique() in the JSON facets doesn’t count all the different values in a field

Posted by Alfonso Muñoz-Pomer Fuentes <am...@ebi.ac.uk>.
I just found out, reading the Solr ref. guide for 7.1, that:
> • JSON Facet API now uses hyper-log-log for numBuckets cardinality calculation and calculates cardinality

Unfortunately, the HTML ref. guide for version 7.1 didn’t contain any documentation on the JSON facet API. I’ve checked the guide for later versions, which says that unique returns approximate values for cardinalities higher than 100 (https://lucene.apache.org/solr/guide/7_2/json-facet-api.html#AggregationFunctions).
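
To put the figures from the first message in perspective, the size of each discrepancy can be worked out as below. This is a rough sketch that assumes 3388 is the exact count, on the grounds that the terms facet and countDistinct agree on it; the thread does not confirm that independently.

```python
# Counts reported in the first message of this thread.
# Assumption: 3388 is exact, since terms and countDistinct both return it.
EXACT = 3388
ESTIMATES = {
    "unique() in JSON facets": 3385,
    "stats cardinality": 3356,
}


def relative_error(estimate, exact):
    """Absolute difference from the exact count, as a fraction of it."""
    return abs(estimate - exact) / exact


for name, estimate in ESTIMATES.items():
    print(f"{name}: off by {EXACT - estimate} ({relative_error(estimate, EXACT):.2%})")
# unique() in JSON facets: off by 3 (0.09%)
# stats cardinality: off by 32 (0.94%)
```

Both results are within 1% of 3388, which fits the documented behaviour of an approximation rather than a counting bug.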

Please dismiss the previous email!


