You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Zain Humayun <zh...@yahoo-inc.com.INVALID> on 2017/07/21 23:12:56 UTC

[DISCUSS] Some thoughts about the structure of the Druid Adapter

Hello Calcite Community,

Over the past few months I’ve gotten to familiarize myself with the Druid adapter in Calcite, and wanted to have a discussion on the general structure of the Druid adapter. For those unfamiliar, Druid supports a hand full of queries[1]. Each query has a specialized use case, different trade offs, different fields, etc. At the moment, DruidQuery.java acts as the representation for all the different kinds of queries (Select, TopN, GroupBy, Timeseries). This works well if the queries are simple, but when we try to push more and more into Druid, we start to notice the drawbacks to this approach.

For one, when it comes time to actually issue the Druid query, a great deal of logic is required to determine which kind of Druid query to generate (see DruidQuery#getQuery). Another issue arises in the rules to push RelNodes into Druid. Certain rules will need to check whether pushing a RelNode results in a DruidQuery where the query type is valid. A good example of this DruidSortRule. Again, there is more logic to determine which kind of DruidQuery will be produced. This becomes a problem when one needs to add on to a rule, or needs to do the same check for their own rule. Lastly, I think that the current adapter structure makes it harder for newcomers to understand how the adapter works, and harder to add features to it. All of these problems get worse when we decide to support another (existing or future) Druid query type later on.

With all that said, it would then seem natural to represent each query type as it’s own RelNode (DruidSelectQuery, DruidTopNQuery, etc). DruidQuery can serve as an abstract base class that contains a rule to transform itself into the different kinds of druid query RelNodes. Each query type will contain it’s own set specialized rules to push RelNodes into them, a tailored cost function, and logic. The VolcanoPlanner takes care of the rest. Whichever druid query can achieve the lowest cost by pushing in the most RelNodes will be chosen and executed.

This, of course would be a very large refactor, but I think it would be beneficial in the long run. There’s at least one open JIRA ticket (CALCITE-1206) that would benefit from this change.

Anyways, i’d be interested to hear from the community on what their thoughts on this kind of change are.

Thanks!

Zain

[1] http://druid.io/docs/0.10.0/querying/querying.html

Re: [DISCUSS] Some thoughts about the structure of the Druid Adapter

Posted by Julian Hyde <jh...@apache.org>.

Suppose we were to calculate the DruidQuery.QuerySpec or its enum
QueryType at the same time as the creation of a DruidQuery. So the
DruidQuery.querySpec field would final and mandatory, set in the
constructor.(Currently we compute it on demand a little while later.)

Then I think we'd end up with very similar code to what you are
suggesting, but it would live in sub-classes QueryType, or perhaps
switch statements in QuerySpec or DruidQuery, rather than sub-classes
of DruidQuery.

Would that have a similar effect?

I am leary about sub-classing DruidQuery because you only get to slice
it one way. Also, when you have done the slicing, you have to move
code into the slices, and that makes it into a "big" refactoring just
by virtue of the amount of code that is copy-pasted from one place to
another.

QuerySpec already seems to do its job pretty well, it's just that
DruidQuery can't rely on it being fully there.

Julian


On Fri, Jul 21, 2017 at 4:12 PM, Zain Humayun
<zh...@yahoo-inc.com.invalid> wrote:
>
> Hello Calcite Community,
>
>
>
> Over the past few months I’ve gotten to familiarize myself with the Druid adapter in Calcite, and wanted to have a discussion on the general structure of the Druid adapter. For those unfamiliar, Druid supports a hand full of queries[1]. Each query has a specialized use case, different trade offs, different fields, etc. At the moment, DruidQuery.java acts as the representation for all the different kinds of queries (Select, TopN, GroupBy, Timeseries). This works well if the queries are simple, but when we try to push more and more into Druid, we start to notice the drawbacks to this approach.
>
>
>
> For one, when it comes time to actually issue the Druid query, a great deal of logic is required to determine which kind of Druid query to generate (see DruidQuery#getQuery). Another issue arises in the rules to push RelNodes into Druid. Certain rules will need to check whether pushing a RelNode results in a DruidQuery where the query type is valid. A good example of this DruidSortRule. Again, there is more logic to determine which kind of DruidQuery will be produced. This becomes a problem when one needs to add on to a rule, or needs to do the same check for their own rule. Lastly, I think that the current adapter structure makes it harder for newcomers to understand how the adapter works, and harder to add features to it. All of these problems get worse when we decide to support another (existing or future) Druid query type later on.
>
>
>
> With all that said, it would then seem natural to represent each query type as it’s own RelNode (DruidSelectQuery, DruidTopNQuery, etc). DruidQuery can serve as an abstract base class that contains a rule to transform itself into the different kinds of druid query RelNodes. Each query type will contain it’s own set specialized rules to push RelNodes into them, a tailored cost function, and logic. The VolcanoPlanner takes care of the rest. Whichever druid query can achieve the lowest cost by pushing in the most RelNodes will be chosen and executed.
>
>
>
> This, of course would be a very large refactor, but I think it would be beneficial in the long run. There’s at least one open JIRA ticket (CALCITE-1206) that would benefit from this change.
>
>
>
> Anyways, i’d be interested to hear from the community on what their thoughts on this kind of change are.
>
>
>
> Thanks!
>
>
> Zain
>
>
> [1] http://druid.io/docs/0.10.0/querying/querying.html
>
>
>
>
>
>