Posted to commits@druid.apache.org by "aharbunou-branch (via GitHub)" <gi...@apache.org> on 2023/03/28 01:35:08 UTC

[GitHub] [druid] aharbunou-branch opened a new issue, #13987: groupBy HLL merge produces inconsistent results

aharbunou-branch opened a new issue, #13987:
URL: https://github.com/apache/druid/issues/13987

   ### Affected Version
   
   0.19.+
   
   ### Description
   
   I'm upgrading Druid from 0.18.1 to 25.0.0.
   We use HLL sketches from the druid-datasketches extension.
   0.18.1 always produces a stable result for HLLSketchMerge, whereas starting with the next version (i.e. 0.19.+) the result differs from call to call.
   
   For this test I did the following:
   - I deployed a fresh 25.0.0 Druid cluster with just one historical node.
   - I ingested about 10 hourly segments of test data with a single dimension whose value is always `false` and an HLL metric with lgK=16. (I tried different lgK values and all of them started to deviate at some point. I tested with lgK=16 because it is commonly used right now.)
   - I ran groupBy queries directly against the historical, bypassing the broker (I get the same results through the broker as well).
   - I tried multiple combinations of different dimensions/metrics and saw the same behavior.
   
   The result is always consistent with just one segment ingested. However, at some point it starts to deviate, i.e. every time I run a groupBy query against this datasource I see different values for the HLL metrics:
   Request
   ```
   {
      "aggregations":[
         {
            "fieldName":"sketch_unique_count_16",
            "name":"unique_count_16",
            "type":"HLLSketchMerge",
            "lgK":16,
            "tgtHllType":"HLL_8"
         }
      ],
      "dataSource":"test_datasource",
      "dimensions":[
         "test_flag"
      ],
      "queryType":"groupBy",
      "intervals":"2023-03-25T21:00:00Z/2023-04-01T23:59:59Z",
      "granularity":"all"
   }
   ```
   Responses were
   `15621.84397004316`, `15662.87505956021`, `15600.366289040447`, `15635.015943397264`, and so on.
   
   Another observation: once I downgraded Druid back to 0.18.1, groupBy returned a consistent result for the datasource ingested by the newer version.
   
   Is this new HLL behavior expected starting from 0.19.+?
   If not, could you please help find out what could contribute to this inconsistency?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [I] groupBy HLL merge produces inconsistent results (druid)

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on issue #13987:
URL: https://github.com/apache/druid/issues/13987#issuecomment-1964800645

   Closing since this behavior is expected.


Re: [I] groupBy HLL merge produces inconsistent results (druid)

Posted by "VKrishna-Branch (via GitHub)" <gi...@apache.org>.
VKrishna-Branch commented on issue #13987:
URL: https://github.com/apache/druid/issues/13987#issuecomment-1942730628

   Commenting to keep this issue open.


Re: [I] groupBy HLL merge produces inconsistent results (druid)

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm closed issue #13987: groupBy HLL merge produces inconsistent results
URL: https://github.com/apache/druid/issues/13987


Re: [I] groupBy HLL merge produces inconsistent results (druid)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #13987:
URL: https://github.com/apache/druid/issues/13987#issuecomment-1936762848

   This issue has been marked as stale due to 280 days of inactivity.
   It will be closed in 4 weeks if no further activity occurs. If this issue is still
   relevant, please simply write any comment. Even if closed, you can still revive the
   issue at any time or discuss it on the dev@druid.apache.org list.
   Thank you for your contributions.


Re: [I] groupBy HLL merge produces inconsistent results (druid)

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on issue #13987:
URL: https://github.com/apache/druid/issues/13987#issuecomment-1964800158

   In general it isn't an expectation that the HLL operations produce exactly the same estimate every time.
   
   I did ask the datasketches folks about this once; you can see the response if you have an account on ASF Slack: https://the-asf.slack.com/archives/CP0930GKG/p1682452090261029. The TL;DR is that repeatable, order-insensitive results are not something the datasketches team is going for with most of their sketches, so in Druid land we inherit this.
   
   A couple of relevant comments in the discussion, from a datasketches developer:
   
   > Sketches, by their design are approximate and/or probabilistic, and in general, the results from a sketch should be viewed as such.  This means that the user should not expect either exact results, because sketches are approximate, nor expect exact repeatability of results even with identical inputs, because sketches should be treated as probabilistic.
   
   & 
   
   > As soon as the user starts demanding exactness in terms of either accuracy or repeatability of results, it requires specific details of the sketch algorithm, and may not be achievable without compromising other properties of the sketch as Jon mentions.
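
To put the reported numbers in perspective, here is a rough Python sketch that compares the run-to-run spread of the four estimates quoted in the original report with the textbook HyperLogLog relative standard error, 1.04 / sqrt(2^lgK). The helper names are hypothetical, and the textbook bound is used here only as an assumption: the exact error bounds of the DataSketches HLL implementation are not stated in this thread and may differ. Four samples are also far too few for a rigorous statistical comparison.

```python
import math
import statistics

# Estimates quoted in the original report (HLLSketchMerge, lgK = 16).
estimates = [
    15621.84397004316,
    15662.87505956021,
    15600.366289040447,
    15635.015943397264,
]

LG_K = 16


def relative_spread(values):
    """Return (max - min) / mean for a list of estimates."""
    return (max(values) - min(values)) / statistics.mean(values)


def classic_hll_rse(lg_k):
    """Textbook HyperLogLog relative standard error: 1.04 / sqrt(2**lg_k).

    Assumption: this is the classic HyperLogLog bound, not necessarily
    the bound that the DataSketches implementation documents.
    """
    return 1.04 / math.sqrt(2 ** lg_k)


spread = relative_spread(estimates)
rse = classic_hll_rse(LG_K)

print(f"observed run-to-run spread: {spread:.4%}")
print(f"textbook HLL RSE (lgK={LG_K}): {rse:.4%}")
```

Both values come out at roughly 0.4%, so under these assumptions the variation between calls is about the same size as the approximation error one would expect from a sketch with lgK=16 anyway.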

