Posted to dev@calcite.apache.org by Clint Wylie <cl...@imply.io> on 2019/01/05 01:59:33 UTC

Dynamic Parameters

Hi all,

I've been investigating/experimenting with adding support for dynamic
parameters in Druid (
https://github.com/apache/incubator-druid/tree/master/sql) and I've come up
with a few questions. I was also wondering if there are any recommendations
or best practices on this matter.

My first question, I guess, is "how should this be done?". From my digging
so far, it appears there are basically two parts necessary.

The first is how to handle binding the values of the dynamic parameters.
Druid's sql planner has 2 planning paths. Both paths parse the sql into an
SqlNode, validate it, and turn it into an initial rel. From there, the
planner tries to "plan with druid convention", transforming the rel with
druid rules that build query fragments for the eventual transformation into
one or more native druid queries for pushdown; a fallback path uses
bindable convention to support non-druid queries against the information
schema or system tables (or druid queries that can't plan into native
queries, via a generic scan query I guess). The bindable convention path is
a non-issue and in my experiments can just use 'DataContext' and the 'bind'
function. But the `DruidRel` transformation runs into issues because
everything is currently blissfully unaware of the existence of
'RexDynamicParam', and so the transformation expects the nodes to be
'RexLiteral' so that the values can be extracted and the native filters,
aggregators, and such constructed. Is the most legitimate way to handle
this to make all of these parts able to handle dynamic parameters?
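
To illustrate what I mean by the bindable path being a non-issue, here is a
minimal sketch of a DataContext that hands bound values to 'bind' (this
assumes Calcite's convention of exposing dynamic parameters to the
DataContext under the names "?0", "?1", and so on):

    import java.util.List;

    import org.apache.calcite.DataContext;
    import org.apache.calcite.adapter.java.JavaTypeFactory;
    import org.apache.calcite.linq4j.QueryProvider;
    import org.apache.calcite.schema.SchemaPlus;

    // Sketch: supplies bound parameter values to Bindable.bind() at
    // execution time; assumes the "?0", "?1", ... naming convention.
    class ParameterDataContext implements DataContext {
      private final SchemaPlus rootSchema;
      private final JavaTypeFactory typeFactory;
      private final List<Object> parameterValues; // values in parameter order

      ParameterDataContext(SchemaPlus rootSchema, JavaTypeFactory typeFactory,
          List<Object> parameterValues) {
        this.rootSchema = rootSchema;
        this.typeFactory = typeFactory;
        this.parameterValues = parameterValues;
      }

      @Override public SchemaPlus getRootSchema() { return rootSchema; }
      @Override public JavaTypeFactory getTypeFactory() { return typeFactory; }
      @Override public QueryProvider getQueryProvider() { return null; }

      @Override public Object get(String name) {
        if (name.startsWith("?")) {
          return parameterValues.get(Integer.parseInt(name.substring(1)));
        }
        return null;
      }
    }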

Alternative/sneaky ways I can think of to handle binding values for druid
expressions seem to be (assuming the parameters all have values to bind at
planning time) to either use an SqlVisitor to walk the sql tree looking for
SqlDynamicParam and construct an SqlLiteral of each value, or a RelVisitor
to walk the rel tree and replace each RexDynamicParam with a RexLiteral of
its value. Both of these appear to work, with some caveats, since the
dynamic parameters are simply eliminated before the transformation, which
then happens as if everything was a literal in the first place.

The SqlVisitor approach is tricky because constructing an SqlLiteral seems
tough: SqlTypeName's createSqlLiteral doesn't appear to support the full
range of JDBC types that might appear, so this logic would need to be
fairly sophisticated.
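
To illustrate the kind of per-type dispatch I mean, here is a sketch that
covers only a few types (a real version would have to handle the full JDBC
range: dates, timestamps, decimals, binary, intervals, and so on):

    import java.sql.Types;

    import org.apache.calcite.sql.SqlLiteral;
    import org.apache.calcite.sql.parser.SqlParserPos;

    // Sketch: turn a bound JDBC value into an SqlLiteral. Only a few of
    // the many possible JDBC types are handled here.
    class SqlLiterals {
      static SqlLiteral toSqlLiteral(Object value, int jdbcType) {
        switch (jdbcType) {
        case Types.CHAR:
        case Types.VARCHAR:
          return SqlLiteral.createCharString((String) value, SqlParserPos.ZERO);
        case Types.INTEGER:
        case Types.BIGINT:
          return SqlLiteral.createExactNumeric(String.valueOf(value),
              SqlParserPos.ZERO);
        case Types.BOOLEAN:
          return SqlLiteral.createBoolean((Boolean) value, SqlParserPos.ZERO);
        default:
          // dates, timestamps, decimals, binary, intervals, etc. all need
          // their own handling
          throw new UnsupportedOperationException("jdbc type: " + jdbcType);
        }
      }
    }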

The RelVisitor approach involves a more complicated tree to walk, but the
actual replacement is easier, since the RexBuilder literal creation
function doesn't have the type issue that sql literals do. Walking at the
rel level also appears to miss out on some rel collapses that happen when
creating the rel from the validated sql, so I would need a mechanism to
simplify the rel tree after replacing the dynamic params with literals, so
that expressions that become equivalent to other relational expressions
can be recognized as the same expression. For example, given:

SELECT
  dim1,
  COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*) as ratio
FROM druid.foo
GROUP BY dim1
HAVING COUNT(*) FILTER(WHERE dim2 <> ?)/COUNT(*) = 1

if param 1 = 'a' and param 2 = 'a', the two occurrences of `COUNT(*)
FILTER(WHERE dim2 <> ?)/COUNT(*)` are not treated as the same expression,
as they would have been had they been literals in the first place.
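
For reference, the replacement itself is small; here is a sketch using a
RexShuttle (plus RelHomogeneousShuttle to hit every node), where
'parameterValues' is a hypothetical lookup of bound values by parameter
index, and the simplification problem above is left unsolved:

    import java.util.List;

    import org.apache.calcite.rel.RelHomogeneousShuttle;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rex.RexBuilder;
    import org.apache.calcite.rex.RexDynamicParam;
    import org.apache.calcite.rex.RexNode;
    import org.apache.calcite.rex.RexShuttle;

    // Sketch: rewrite every RexDynamicParam in the tree into a RexLiteral
    // of its bound value, keeping the parameter's inferred type.
    class ParamBinder {
      static RelNode bindParams(RelNode root, RexBuilder rexBuilder,
          List<Object> parameterValues) {
        final RexShuttle binder = new RexShuttle() {
          @Override public RexNode visitDynamicParam(RexDynamicParam param) {
            Object value = parameterValues.get(param.getIndex());
            // allowCast=true keeps the parameter's declared type on the literal
            return rexBuilder.makeLiteral(value, param.getType(), true);
          }
        };
        return root.accept(new RelHomogeneousShuttle() {
          @Override public RelNode visit(RelNode other) {
            // rewrite children first, then this node's own expressions
            return super.visit(other).accept(binder);
          }
        });
      }
    }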

I think a similar simplification would also have to happen if I added
support for RexDynamicParam to the DruidRel transformation, because at
that point the values would effectively be late-bound after all of the
transformations have happened, at least afaict.

Are the visitor-based approaches hacky or missing something important? They
seem much less disruptive from my point of view, since all of the druid
conversion wouldn't have to know how to handle a dynamic parameter, but I'm
mostly curious whether there are problems with doing this.


The 2nd major part, I think, is adding support for an actual 'prepare' that
collects the types of the dynamic parameters encountered in the sql, to
support PreparedStatement on the jdbc side of things. Currently the druid
statements just run 'plan' in the 'prepare' method, which doesn't collect
these types and maybe does more work than it should. In my sketches so far
I've modified 'DruidPlanner' to have a 'prepare' function that resembles
some of the functionality in 'CalcitePrepareImpl', but without doing a full
implementation of the 'CalcitePrepare' interface. This appears sufficient
to construct the list of AvaticaParameters, and functions as expected in my
tests so far. Is this an adequate way to do this, or is it necessary to
implement 'CalcitePrepare' or somehow use 'CalcitePrepareImpl' instead?
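
For context, the parameter collection I have in mind is roughly the
following, loosely modeled on what 'CalcitePrepareImpl' does with the
validator's parameter row type (precision/scale and class-name handling
are simplified here):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.calcite.avatica.AvaticaParameter;
    import org.apache.calcite.rel.type.RelDataType;
    import org.apache.calcite.rel.type.RelDataTypeField;

    // Sketch: build the AvaticaParameter list from the parameter row type,
    // e.g. validator.getParameterRowType(validatedSqlNode).
    class Parameters {
      static List<AvaticaParameter> fromRowType(RelDataType parameterRowType) {
        final List<AvaticaParameter> parameters = new ArrayList<>();
        for (RelDataTypeField field : parameterRowType.getFieldList()) {
          final RelDataType type = field.getType();
          parameters.add(
              new AvaticaParameter(
                  false,                                  // signed
                  type.getPrecision(),
                  type.getScale(),
                  type.getSqlTypeName().getJdbcOrdinal(), // java.sql.Types value
                  type.getSqlTypeName().toString(),
                  Object.class.getName(),                 // simplified
                  field.getName()));
        }
        return parameters;
      }
    }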


I've also noticed in a lot of my experiments that I run into SqlValidator
exceptions that appear to be related to being unable to infer the type of a
dynamic parameter, usually either something like:

    SELECT SUBSTRING(dim2, ?, ?) FROM druid.foo LIMIT ?
    => Illegal use of dynamic parameter
or
    SELECT
      CASE 'foo'
      WHEN ? THEN SUM(cnt) / ?
      WHEN ? THEN SUM(m1) / ?
      END
    FROM foo
    => Cannot apply '/' to arguments of type '<BIGINT> / <UNKNOWN>'

which can be mitigated by doing something like `CAST(? as INT)` to give the
operand an explicit type, but that seems like an unfortunate workaround. Is
there a way to handle this?
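
(For reference, this is the shape of the workaround applied to the CASE
example above:)

    SELECT
      CASE 'foo'
      WHEN ? THEN SUM(cnt) / CAST(? AS INT)
      WHEN ? THEN SUM(m1) / CAST(? AS INT)
      END
    FROM foo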

Any pointers would be appreciated, thanks!

Cheers,
Clint

Re: Dynamic Parameters

Posted by Clint Wylie <cl...@imply.io>.
Hi Stamatis,

Just now getting back to this, but thank you very much for the detailed
response! At least it seems like I was on the right track, and you've given
me some things to consider. To my knowledge, the druid-sql implementation
was partially based on/inspired by the calcite Druid adapter, but with
subsequent refactorings it has started to drift. The biggest difference is
that it produces native java objects rather than a json request, but there
are still a few recognizable bits.

If I have the chance, I'll see if I can work out a solution to some of the
validator errors while I work on this implementation, and contribute back
upstream.

Thanks again!

Clint


Re: Dynamic Parameters

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Clint,

Thanks for opening this discussion.

Probably you are already aware of this, but the Calcite project already has
an adapter for Druid [1] from which you can possibly take some ideas.
Having said that, I am not sure if dynamic parameters are handled somehow
by DruidQuery, since I couldn't find anything relevant.

From what I can see, DruidRel also follows the Bindable convention, which
means that you have access to the DataContext. During planning and the
transformation to the Druid convention you could introduce some
placeholders for the dynamic parameters that get populated from the
DataContext during the execution of the query (so very similar to how
parameters are handled in the Bindable convention).
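
For example, something like the following (a purely illustrative sketch;
'ParameterizedSelectorFilter' is a made-up class, not real Druid code, and
it assumes the "?0", "?1", ... DataContext naming):

    import org.apache.calcite.DataContext;

    // Illustrative sketch only: a native-query fragment that holds the
    // dynamic parameter's index instead of a value, and resolves it from
    // the DataContext at execution time.
    class ParameterizedSelectorFilter {
      private final String dimension;
      private final int parameterIndex;

      ParameterizedSelectorFilter(String dimension, int parameterIndex) {
        this.dimension = dimension;
        this.parameterIndex = parameterIndex;
      }

      /** Looks up the bound value when the query actually runs. */
      Object resolveValue(DataContext dataContext) {
        return dataContext.get("?" + parameterIndex);
      }
    }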

Other than that, the visitor approach also seems reasonable to me, although
it means that you will somehow copy the whole RelNode tree for each
execution of the query with different parameters.

Indeed, after binding the parameters there might be simplifications that
could be performed on the RelNode tree, but it may not be a very good idea
to do them. Using parameters is beneficial for always having the same
execution plan, so as not to have to optimize and compile the same query
multiple times.

I don't think it is required to implement CalcitePrepare if you don't need
to. It depends very much on what your needs are.

The validator exceptions you observed are indeed annoying. It should be
possible to improve this behavior by working on the way the validator
infers unknown types [2].

Best,
Stamatis

[1] https://github.com/apache/calcite/tree/master/druid
[2]
https://github.com/apache/calcite/blob/e804521beeec30eccb5ae3a29620f818d0de8427/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L1725
