You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2014/09/23 01:57:34 UTC

[jira] [Commented] (SOLR-6541) Enhancement for SOLR-6452 StatsComponent "missing" stat won't work with docValues=true and indexed=false

    [ https://issues.apache.org/jira/browse/SOLR-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144067#comment-14144067 ] 

Hoss Man commented on SOLR-6541:
--------------------------------


Setting aside point #2 & #3 for a moment (since they seem to be purely about API visibility)...

bq. 1. Accumulate methods should not return stats specific numbers (it is generic). ...

I'm not sure that i understand this sentence -- it seems to be refering to the DocValuesStats.accumSingle and DocValuesStats.accumMulti, and the fact that they return the number of docs "missing" values.  but i don't understand:

a) in what way are you saying these methods are "generic" ?  ... they aren't part of any API contract, they're just package private static methods.

b) how is the missing count a "stats specific number" ? ... at first glance, i thought you ment it was being accumulated in the AbstractStatsValues base class (or code tightly coupled with it) when really it should be associated with code in a specific subclass (similar to how subclasses like "NumericStatsValues" adds stats like "sum" and "mean" that don't make sense for string & enum fields) -- but even if that's hwat you ment, that doens't make sense given that "missing" is part of the AbstractStatsValues contract.

bq. 4. We don't need intermediate maps to accumulate missing counts. ...

At a glance this seems like a good idea to me -- but it seems to straight forward it makes me suspicious of why it wasn't done that way in the first place?

Looking at the comments in SOLR-6452, this seems to have been very deliberate...

bq. As an optimization, why don't you make the missingStats a Map<Integer, Integer> and use the ords as keys instead of the terms. That way you don't need to do the lookupOrd for all docs, and you do it only once per term in the accumulateMissing() method. 


> Enhancement for SOLR-6452 StatsComponent "missing" stat won't work with docValues=true and indexed=false
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6541
>                 URL: https://issues.apache.org/jira/browse/SOLR-6541
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.10, Trunk
>            Reporter: Vitaliy Zhovtyuk
>            Priority: Minor
>             Fix For: 5.0, Trunk
>
>         Attachments: SOLR-6541.patch
>
>
> This issue is refactoring of solution provided in SOLR-6452 StatsComponent "missing" stat won't work with docValues=true and indexed=false.
> I think the following points need to be addressed:
> 1. Accumulate methods should not return stats specific numbers (it is generic). Attached solution with container class. Also made them private scoped.
> Returning just missing fields from accumulate methods does not allow you to extend it with additional counts field, therefore i propose to leave void.
> 2. Reduced visibility of fields in FieldFacetStats.
> 3. Methods FieldFacetStats#accumulateMissing and FieldFacetStats#accumulateTermNum does not throw any IO exception
> 4. We don't need intermediate maps to accumulate missing counts. Method  org.apache.solr.handler.component.FieldFacetStats#facetMissingNum 
> can be changed to work directly on StatsValues structure and removed org.apache.solr.handler.component.FieldFacetStats#accumulateMissing. 
> We don't need to have it in 2 phases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org