Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/10 17:16:49 UTC

[GitHub] [druid] gianm commented on pull request #11201: Add "stringEncoding" parameter to DataSketches HLL.

gianm commented on pull request #11201:
URL: https://github.com/apache/druid/pull/11201#issuecomment-836992606


   > This is not correct, at least for the HLL in datasketches-java (I'm not sure what the Druid adaptor does). Strings are encoded using UTF-8 and have been for as long as I can remember. If you wish to use UTF-16, you just convert your string to char[] and the HLL sketch will accept that as well.
   
   @leerho Understood, but it is true as far as Druid is concerned: the HllSketch-based aggregator implementation in Druid calls `update(s.toCharArray())`, not `update(s)`. See https://github.com/apache/druid/blob/8296123d895db7d06bc4517db5e767afb7862b83/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchBuildAggregator.java#L103
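
To illustrate why the two calls are not interchangeable, here is a minimal stdlib-only sketch (it does not use the DataSketches or Druid classes, and the class and method names are made up for this example). It assumes the sketch hashes the raw bytes of whatever it is given: UTF-8 bytes for a `String`, and two bytes per UTF-16 code unit for a `char[]`. `UTF_16LE` is used only to make that two-byte layout visible; the exact byte order the hash sees is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringEncodingDemo {
    // Bytes that a UTF-8 based update(String) would feed to the hash.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for hashing s.toCharArray(): two bytes per UTF-16 code unit.
    // (Byte order chosen for illustration only.)
    static byte[] utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String s = "druid";
        System.out.println(utf8Bytes(s).length);   // prints 5
        System.out.println(utf16Bytes(s).length);  // prints 10
        // Different input bytes, so the same value hashes differently.
        System.out.println(Arrays.equals(utf8Bytes(s), utf16Bytes(s))); // prints false
    }
}
```

Because the hashed bytes differ, a sketch built with one encoding and a sketch built with the other would treat the same string as two distinct items when merged.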
   
   >  Nonetheless, whatever you decide, you will always need to stick with your choice.
   
   Yep, that's why this must be an option and the choice needs to be made in a consistent way.
   
   > I have some comments about PR 353 but I want to make these in the actual PR.
   
   Thanks!

