Posted to solr-user@lucene.apache.org by Jack Krupansky <ja...@gmail.com> on 2015/01/14 18:01:09 UTC

Distributed mode for stats component?

Does anybody know for sure whether the stats component fully supports
distributed mode? It is listed in the doc as supporting distributed mode
(at least for old, non-SolrCloud distrib mode), but... I don't see any code
that actually does that. Nor any tests, unless they are hidden somewhere I
didn't look.

In particular, I am interested in the "countdistinct" parameter which would
need to retrieve all distinct values from all other shards to detect
whether any of the distinct values overlap between shards.
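
To make the overlap problem concrete, here is a minimal Java sketch of why a coordinator needs each shard's actual distinct values rather than per-shard counts. The class and method names are hypothetical and are not Solr's code. Summing per-shard counts double-counts values that appear on more than one shard, while a set union collapses them.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctMergeSketch {
    // Each shard reports its full set of distinct values, not just a count,
    // because a bare count cannot tell the coordinator which values overlap.
    static int mergedDistinctCount(List<Set<String>> perShardValues) {
        Set<String> union = new HashSet<>();
        for (Set<String> shard : perShardValues) {
            union.addAll(shard); // values seen on several shards collapse here
        }
        return union.size();
    }

    // The naive alternative: summing per-shard counts double-counts overlaps.
    static int summedCounts(List<Set<String>> perShardValues) {
        int total = 0;
        for (Set<String> shard : perShardValues) {
            total += shard.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Set<String>> shards = List.of(
            Set.of("red", "green", "blue"),  // shard 1
            Set.of("blue", "yellow"));       // shard 2 ("blue" overlaps)
        System.out.println("summed: " + summedCounts(shards));
        System.out.println("merged: " + mergedDistinctCount(shards));
    }
}
```

With the two example shards the summed count is 5 but the true distinct count is 4. The union is exact, but its memory and network cost grow with the field's cardinality.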

If this is supported, where exactly is the code to do it?

I know the new analytics component doesn't support distributed mode, but my
question is about the old "stats" component.

-- Jack Krupansky

Re: Distributed mode for stats component?

Posted by Jack Krupansky <ja...@gmail.com>.
Thanks, Chris. I just needed to stare at the code I already knew about more
intently to see what was really going on. It's super convoluted and super
confusing. The keys were the handleResponses method in the main component
class and the AbstractStatsValues class that is hidden in the
StatsValuesFactory source file. Oddly, the StatsValues source file doesn't
contain the classes that implement that interface - they're in the
"factory" source file!
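
For the other stats, merging is much cheaper. The sketch below is a hypothetical illustration of the general idea, not Solr's AbstractStatsValues code. If each shard returns count, min, max, sum and sum of squares, a coordinator can combine those summaries field by field and derive mean and standard deviation afterwards, without ever seeing raw values. The class and field names are assumptions.

```java
public class ShardStatsMergeSketch {
    // Hypothetical per-shard summary; field names are illustrative only.
    static final class ShardStats {
        long count;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        double sumOfSquares;

        void add(double v) {
            count++;
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            sumOfSquares += v * v;
        }

        // Every field merges with a simple associative operation,
        // so the coordinator never needs the shards' raw values.
        void merge(ShardStats other) {
            count += other.count;
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
            sum += other.sum;
            sumOfSquares += other.sumOfSquares;
        }

        double mean() {
            return sum / count;
        }

        // Sample standard deviation derived from the merged sums.
        double stddev() {
            if (count < 2) return 0.0;
            double numerator = count * sumOfSquares - sum * sum;
            return Math.sqrt(numerator / (count * (double) (count - 1)));
        }
    }

    public static void main(String[] args) {
        ShardStats shard1 = new ShardStats();
        for (double v : new double[] {1, 2, 3}) shard1.add(v);
        ShardStats shard2 = new ShardStats();
        for (double v : new double[] {4, 5}) shard2.add(v);

        shard1.merge(shard2); // what a coordinator would do per shard response
        System.out.printf("count=%d min=%.1f max=%.1f mean=%.1f stddev=%.4f%n",
            shard1.count, shard1.min, shard1.max, shard1.mean(), shard1.stddev());
    }
}
```

Merging shards holding {1, 2, 3} and {4, 5} gives count 5, min 1, max 5, mean 3 and a sample standard deviation of sqrt(2.5), the same as computing over all five values at once. Exact distinct counting is the one stat that does not reduce to a fixed-size summary like this.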

BTW, we should have some doc notes on the limitations and performance
implications of the stats component. Although, admittedly, it's moot if
stats is eventually to be superseded by the analytics component.

-- Jack Krupansky

On Wed, Jan 14, 2015 at 12:26 PM, Chris Hostetter <ho...@fucit.org>
wrote:

>
> : Does anybody know for sure whether the stats component fully supports
> : distributed mode? It is listed in the doc as supporting distributed mode
>
> it's been supported for as long as I can remember -- since Day 1 of the
> StatsComponent, I believe.
>
> : (at least for old, non-SolrCloud distrib mode), but... I don't see any code
> : that actually does that. Nor any tests, unless they are hidden somewhere I
> : didn't look.
>
> just like any other SearchComponent: look at StatsComponent.prepare,
> StatsComponent.process, ...distributedProcess, ...modifyRequest,
> ...handleResponses, ...finishStage, etc...
>
>
> : In particular, I am interested in the "countdistinct" parameter which would
> : need to retrieve all distinct values from all other shards to detect
> : whether any of the distinct values overlap between shards.
>
> yep -- that's exactly what it does ... totally naive and not a good idea
> at all for fields with non-trivial cardinality, which is why you have to
> explicitly turn it on with "calcDistinct" and why I want to replace it
> with HyperLogLog approximations...
>
> https://issues.apache.org/jira/browse/SOLR-6968
>
> -Hoss
> http://www.lucidworks.com/
>

Re: Distributed mode for stats component?

Posted by Chris Hostetter <ho...@fucit.org>.
: Does anybody know for sure whether the stats component fully supports
: distributed mode? It is listed in the doc as supporting distributed mode

it's been supported for as long as I can remember -- since Day 1 of the
StatsComponent, I believe.

: (at least for old, non-SolrCloud distrib mode), but... I don't see any code
: that actually does that. Nor any tests, unless they are hidden somewhere I
: didn't look.

just like any other SearchComponent: look at StatsComponent.prepare, 
StatsComponent.process, ...distributedProcess, ...modifyRequest,
...handleResponses, ...finishStage, etc...


: In particular, I am interested in the "countdistinct" parameter which would
: need to retrieve all distinct values from all other shards to detect
: whether any of the distinct values overlap between shards.

yep -- that's exactly what it does ... totally naive and not a good idea 
at all for fields with non-trivial cardinality, which is why you have to 
explicitly turn it on with "calcDistinct" and why I want to replace it
with HyperLogLog approximations...

https://issues.apache.org/jira/browse/SOLR-6968

-Hoss
http://www.lucidworks.com/
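
For reference, a minimal sketch of the HyperLogLog idea mentioned above. It assumes nothing about the eventual SOLR-6968 implementation: the class name, the 1024-register size, and the FNV-1a plus SplitMix64 hash are all illustrative choices. The property that matters for distributed stats is that two sketches merge by taking the register-wise maximum, so each shard ships a fixed-size array instead of every distinct value it holds.

```java
import java.nio.charset.StandardCharsets;

public class HyperLogLogSketch {
    private static final int P = 10;             // 2^10 = 1024 registers
    private static final int M = 1 << P;
    private final byte[] registers = new byte[M];

    // 64-bit FNV-1a followed by the SplitMix64 finalizer to spread bits.
    static long hash64(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 30;
        h *= 0xbf58476d1ce4e5b9L;
        h ^= h >>> 27;
        h *= 0x94d049bb133111ebL;
        h ^= h >>> 31;
        return h;
    }

    void add(String value) {
        long h = hash64(value);
        int index = (int) (h >>> (64 - P));       // top P bits pick a register
        long rest = h << P;                       // remaining bits
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - P + 1);
        if (rank > registers[index]) registers[index] = (byte) rank;
    }

    // Merging is a register-wise max, so the merged sketch equals the one
    // a single node would have built from the union of all values.
    void merge(HyperLogLogSketch other) {
        for (int i = 0; i < M; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    double estimate() {
        double alpha = 0.7213 / (1 + 1.079 / M);
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double e = alpha * M * M / sum;
        if (e <= 2.5 * M && zeros > 0) {
            e = M * Math.log((double) M / zeros); // linear counting for small sets
        }
        return e;
    }

    public static void main(String[] args) {
        HyperLogLogSketch shard1 = new HyperLogLogSketch();
        HyperLogLogSketch shard2 = new HyperLogLogSketch();
        for (int i = 0; i < 60_000; i++) shard1.add("v" + i);
        for (int i = 40_000; i < 100_000; i++) shard2.add("v" + i); // 20k overlap
        shard1.merge(shard2);
        System.out.printf("estimated distinct: %.0f (exact: 100000)%n", shard1.estimate());
    }
}
```

The merged estimate is identical to the one from a single sketch fed all 100,000 values, because the register-wise max is exactly what that single sketch would have recorded. The estimate itself is approximate, with a typical relative error of roughly 1.04/sqrt(1024), about 3%, in exchange for a fixed 1 KB per shard.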