Posted to issues@calcite.apache.org by "Zain Humayun (JIRA)" <ji...@apache.org> on 2017/05/16 18:29:04 UTC

[jira] [Comment Edited] (CALCITE-1787) thetaSketch Support for Druid Adapter

    [ https://issues.apache.org/jira/browse/CALCITE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012728#comment-16012728 ] 

Zain Humayun edited comment on CALCITE-1787 at 5/16/17 6:28 PM:
----------------------------------------------------------------

Apologies if I didn't fully understand your comment, but I have a few questions:
1. If we were to create a new expression, how would Calcite know that the DB we are connecting to supports partial aggregates before the expression is built? For example, not all Druid instances support thetaSketches (also, not all metrics are of type thetaSketch), and the Druid adapter only gets this information in {{DruidConnectionImpl#metadata}} - after (I believe) the expressions have been derived.

2. Can you give me a sample query where the HISTOGRAM_AGG is used?

Thanks!




was (Author: zhumayun):
Apologies if I didn't fully understand your comment, but I have a few questions:
1. If we were to create a new expression, how would calcite know that the DB we are connecting to supports partial aggregates before the expression is built? For example, not all Druid instances support thetaSketches, and the Druid adapter only gets this information in {{DruidConnectionImpl#metadata}} - after (I believe) the expressions have been derived. 

2. Can you give me a sample query where the HISTOGRAM_AGG is used?

Thanks!



> thetaSketch Support for Druid Adapter
> -------------------------------------
>
>                 Key: CALCITE-1787
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1787
>             Project: Calcite
>          Issue Type: New Feature
>          Components: druid
>    Affects Versions: 1.12.0
>            Reporter: Zain Humayun
>            Assignee: Julian Hyde
>            Priority: Minor
>             Fix For: 1.12.0
>
>
> Currently, the Druid adapter does not support the [thetaSketch|http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html] aggregate type, which is used to measure the cardinality of a column quickly. Many Druid instances support theta sketches, so I think it would be a nice feature to have.
> I've been looking at the Druid adapter, and propose we add a new DruidType called {{thetaSketch}} and then add logic in the {{getJsonAggregation}} method in class {{DruidQuery}} to generate the {{thetaSketch}} aggregate. This will require accessing information about the columns (what data type they are) so that the thetaSketch aggregate is only produced if the column's type is {{thetaSketch}}. 
> Also, I've noticed that a {{hyperUnique}} DruidType is currently defined, but a {{hyperUnique}} aggregate is never produced. Since both are approximate aggregators, I could also couple in the logic for {{hyperUnique}}.
> I'd love to hear your thoughts on my approach, and any suggestions you have for this feature.
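
A minimal sketch of the approach proposed in the description, for illustration only. It is not the adapter's actual code: the class {{ThetaSketchAggregationSketch}}, the enum {{ColumnType}} and the method {{approxDistinctAggregation}} are hypothetical names, and the column-type map stands in for whatever the adapter learns from Druid's metadata. The only part taken from Druid itself is the shape of the aggregator JSON ({{type}}, {{name}}, {{fieldName}}). The sketch emits a {{thetaSketch}} or {{hyperUnique}} aggregation only when the column is known to have that type, and returns nothing otherwise so the caller can fall back to another plan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only; all names here are hypothetical, not Calcite's API. */
final class ThetaSketchAggregationSketch {

  /** Hypothetical stand-in for the adapter's notion of a Druid column type. */
  enum ColumnType { LONG, FLOAT, STRING, HYPER_UNIQUE, THETA_SKETCH }

  private ThetaSketchAggregationSketch() {}

  /**
   * Builds the JSON for an approximate distinct-count aggregation, but only
   * if the column is known to be a thetaSketch or hyperUnique metric.
   * Returns empty when the column is unknown or has any other type.
   */
  static Optional<String> approxDistinctAggregation(
      String name, String fieldName, Map<String, ColumnType> columnTypes) {
    ColumnType type = columnTypes.get(fieldName);
    String druidAggType;
    if (type == ColumnType.THETA_SKETCH) {
      druidAggType = "thetaSketch";
    } else if (type == ColumnType.HYPER_UNIQUE) {
      druidAggType = "hyperUnique";
    } else {
      // Unknown column or a type with no approximate aggregator.
      return Optional.empty();
    }
    return Optional.of("{\"type\":\"" + druidAggType + "\","
        + "\"name\":\"" + name + "\","
        + "\"fieldName\":\"" + fieldName + "\"}");
  }

  public static void main(String[] args) {
    // Assumed column types; in the adapter these would come from metadata.
    Map<String, ColumnType> types = new HashMap<>();
    types.put("user_sketch", ColumnType.THETA_SKETCH);
    types.put("page", ColumnType.STRING);

    System.out.println(approxDistinctAggregation("users", "user_sketch", types));
    System.out.println(approxDistinctAggregation("pages", "page", types));
  }
}
```

Returning an empty {{Optional}} rather than throwing keeps the decision with the caller, which matters if the column types are only available after the expressions have been derived.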



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)