You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/28 08:52:25 UTC

[GitHub] [druid] sascha-coenen edited a comment on issue #9419: SQL: top-level sketch functions do not return sketches

sascha-coenen edited a comment on issue #9419: SQL: top-level sketch functions do not return sketches
URL: https://github.com/apache/druid/issues/9419#issuecomment-592413853
 
 
   > @vikramsinghchandel You can set the same context options in the JDBC connection properties: https://druid.apache.org/docs/latest/querying/sql.html#connection-context
   
   hmm, Vikram already pointed out that he is aware of this. His need is to be able to express such things per query: 
   > Now both of these systems **don't have per query level context settings** **they are set at JDBC driver level.**
   
   --
   
   > the complex aggs like sketches are intended to be "finalized" (give the estimate) unless the user explicitly sets finalize=false in the context, so the bug would be that it's returning the base64 encodings with the additional projections
   
   I respectfully disagree:
   
   There are three dimensions to what makes a good user experience
   a) sensible defaults
   b) consistency
   b) control
   
   wrt **sensible defaults**, I agree with you that the common case is to want to materialize a sketch rather than to look at its raw form and therefore finalizing a sketch would be the most sensible default. 
   
   wrt **consistency** however, for all other sketches the current set of Druid SQL UDFs are designed around separating the responsibilities of creating a sketch and then evaluating it. It is also the common case to have to do this for the simple reason that - by default - there is no 1:1 relationship between aggregations and post aggregations. (i.e. should a doubleSum() be post-aggregated with a LOG10 or a SQRT ?  should a quantile sketch be post-aggregated with a union/intersect/not ?)
   So all sketches have UDFs that 
   a) generate a sketch 
   e.g. BLOOM_FILTER, TDIGEST_GENERATE_SKETCH, DS_QUANTILES_SKETCH, DS_HLL, DS_THETA
   b) evalutate a sketch
   e.g. BLOOM_FILTER_TEST, TDIGEST_QUANTILE  
   
   wrt **control**, having a sensible default is okay as long as this doesn't take away control from the author in terms of being able to express his intention and having it carried out. If it is my intention to pull unfinalized sketches in order to forklift them from Druid and feed them into another system, then I need to be able to formulate this. Setting a query context is not letting me do this because this is not usually possible on a per-query case and it is also not explicit within the query that other users need to be able to read and understand. I cannot send a query to another user and be sure that it does what I want because some parts of what the query result would look like are formulated outside of the query (namely in the query context). This is a huge usability problem and I'm looking forward to writing a proposal for expressing such query hints within the SQL queries.
   
   Whether sensible defaults trump consistency is a difficult question to answer. Luckily, this tie has already been broken in this context because the set of evaluating UDFs all work both on sketches and on regular fields, so that they act as shortcuts: If all I want is to create a sketch and then to finalize it, then I can directly use the above set of evaluating sketch functions.
   
   Putting all of this together, it seems that one just needs to follow the same concept for the given HLL/THETA sketches, which means that their generative UDFs aught to always return the sketches and the evaluating UDFs should always returned finalized results and can act as a shortcut to short-circuit the generative UDFS.
   
   The DS_HLL and DS_THETA functions just deviate from the common approach taken for the other sketches which also is reflected in their documentation which in my opinion correctly formulates the intention that these generative functions should yield a sketch and not a finalized result.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org