Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/04/12 21:25:25 UTC

[GitHub] [incubator-druid] Dylan1312 edited a comment on issue #6814: [Discuss] Replacing hyperUnique as 'default' distinct count sketch

URL: https://github.com/apache/incubator-druid/issues/6814#issuecomment-482727132
 
 
   Hi Lee,
   
   Thanks for the response!
   
   - I've tried various configurations: each of HLL4, HLL6 and HLL8, with lgK values of 6 and 12 for each.
   
   - I'm comparing the time to complete timeseries queries over a small number of segments.
   
   One query uses a hyperUnique (Druid's native HLL) aggregator on a column of hyperUnique HLL sketches; the others each use a single HLLSketchMerge aggregator against a column ingested with HLLSketchBuild.
   
   I noticed that the aggregator spends a significant portion of its time passing a ByteBuffer to HLLSketch::wrap, so deserialization may be the wrong term :).
   
   - With ~18.5M sketches and a single-core historical I see:
           - An HLL8 sketch queried with lgK 6 takes around 2.5 seconds to complete.
           - A hyperUnique sketch takes around 1.6 seconds to complete.
   
   - I collected the data I'm using in a fairly ad hoc way from a stream, so I'm not sure. I expect the distribution to be fairly even, but this is something I can investigate further.
   
   - I haven't specifically looked at accuracy, but I'm seeing both sketches give answers within 0.4% of each other.
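
   To put the timings reported above on a per-sketch basis, here is a rough back-of-the-envelope sketch in Python. It assumes the whole query time is spent processing sketches (fixed query overhead is ignored), and it uses the approximate figures quoted above; the constant and function names are only illustrative.

```python
# Figures taken from the message above; all are approximate ("~", "around").
NUM_SKETCHES = 18_500_000
HLL8_LGK6_SECONDS = 2.5      # HLLSketchMerge over HLL8 sketches, lgK 6
HYPERUNIQUE_SECONDS = 1.6    # Druid's native hyperUnique aggregator


def ns_per_sketch(total_seconds: float, num_sketches: int) -> float:
    """Average time per sketch in nanoseconds.

    Assumption: the entire query time is attributed to per-sketch work.
    """
    return total_seconds / num_sketches * 1e9


hll_ns = ns_per_sketch(HLL8_LGK6_SECONDS, NUM_SKETCHES)
hu_ns = ns_per_sketch(HYPERUNIQUE_SECONDS, NUM_SKETCHES)
slowdown = HLL8_LGK6_SECONDS / HYPERUNIQUE_SECONDS

print(f"HLL8 lgK=6:  ~{hll_ns:.0f} ns per sketch")
print(f"hyperUnique: ~{hu_ns:.0f} ns per sketch")
print(f"ratio:       ~{slowdown:.2f}x")
```

   Under those assumptions this works out to roughly 135 ns versus 86 ns per sketch, so the HLL8 query is about 1.6 times slower on this data set.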
   
   Best regards,
   Dylan
