Posted to dev@calcite.apache.org by Francesco Gini <fr...@gmail.com> on 2021/04/28 17:55:25 UTC

Questions on JDBC adapter

Hi all,
I have been using Calcite to run queries against a JDBC database and noticed the following things. Some might be bugs and some might be my misunderstanding:
- JdbcJoinRule's canJoinOnCondition method fails to push down joins whose conditions are always true or always false.
- JdbcJoinRule does not convert SEMI and ANTI joins, but JdbcImplementor has code to support them. I wonder whether the two need to be aligned?
- JdbcToEnumerableConverter creates a ResultSet from the query sent to the data source. It would be nice to be able to configure the fetch size on the result set, and more generally to configure the JDBC objects. For instance, the PostgreSQL driver respects the fetch size only if autocommit is disabled on the connection: https://jdbc.postgresql.org/documentation/head/query.html#fetchsize-example
- JdbcCalc does not implement Calc; is that by design? I noticed this because I tried to convert a logical plan into a JDBC plan via the HepPlanner. The final plan contained a JdbcCalc that JdbcImplementor is not able to implement because it is not a Calc.
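
To make the fetch-size point concrete, here is a minimal plain-JDBC sketch of the behaviour I would like to be able to configure. It uses only java.sql and does not touch Calcite; the class name FetchSizeSketch and the method countRows are made up for illustration, and this is not how JdbcToEnumerableConverter is currently written.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Illustrative only: hypothetical helper, not part of Calcite. */
public final class FetchSizeSketch {
  private FetchSizeSketch() {}

  /** Counts the rows of a query while asking the driver to fetch them in batches. */
  public static long countRows(Connection connection, String sql, int fetchSize)
      throws SQLException {
    boolean previousAutoCommit = connection.getAutoCommit();
    // With the PostgreSQL driver the fetch size is only honoured
    // when autocommit is off, so switch it off before querying.
    connection.setAutoCommit(false);
    try (Statement statement = connection.createStatement()) {
      // A hint to the driver: fetch this many rows per round trip.
      statement.setFetchSize(fetchSize);
      long rows = 0;
      try (ResultSet resultSet = statement.executeQuery(sql)) {
        while (resultSet.next()) {
          rows++;
        }
      }
      // End the read-only transaction opened by the query.
      connection.commit();
      return rows;
    } finally {
      // Restore whatever autocommit mode the caller had.
      connection.setAutoCommit(previousAutoCommit);
    }
  }
}
```

The relevant part is the ordering: autocommit is disabled before the statement is created, and the fetch size is set before the query is executed. Those are the two knobs that cannot be reached from the adapter today.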

If those are bugs, I can try to contribute fixes for some of them. Let me know what the process should be and where to record the problems.

A final question: I have tried to use a HepPlanner instead of a VolcanoPlanner, with the aim of skipping any optimisation rules and applying only the rules that convert a logical plan into a JDBC-only plan (the assumption is that the entire plan is pushed down to the JDBC database). I found that a HepPlanner can't really be used as a replacement for a VolcanoPlanner, because methods like isRegistered, ensureRegistered, registered, addRelTraitDef and getRelTraitDef are not really implemented. The JDBC rules are registered by registering the JdbcConvention, which seems to happen when a node is registered with the planner; I think this is how it happens in the VolcanoPlanner. Even if I manually register the JDBC rules with the planner, the conversion from logical plan to JDBC nodes does not happen. The convention to be set on the JDBC node is derived from the RelOptCluster emptyTraitSet field. However, emptyTraitSet is populated from the planner, which in the case of HepPlanner is always an empty set, so calls like RelOptCluster.traitSetOf fail to replace the convention.
Given those differences between HepPlanner and VolcanoPlanner, is the intention to always use a VolcanoPlanner within a cluster, and potentially use the HepPlanner just as a Program/phase?

Apologies for the long message,
cheers

Re: Questions on JDBC adapter

Posted by Alessandro Solimando <al...@gmail.com>.

Hi Francesco,
regarding the "where to record the issues" bit, we use JIRA:
https://issues.apache.org/jira/projects/CALCITE/summary

You might have already seen this, but that's the contributor's guideline:
https://calcite.apache.org/develop/#contributing

Best regards,
Alessandro

On Thu, 29 Apr 2021 at 02:50, Julian Hyde <jh...@gmail.com> wrote:

> Thanks for your email. It is really helpful if people ask before beginning
> work (or logging a lot of bugs on an area that they perceive to be broken).
>
> I think those are all bugs/missing features. Feel free to log bugs, create
> PRs, see if anything breaks. (If things break that might be an indication
> that the feature is wrong, but then again it might not.)
>
> Recently we changed how we optimize cartesian joins (see
> https://issues.apache.org/jira/browse/CALCITE-4515); you should see
> whether the arguments made in that case apply to canJoinOnCondition.
>
> I hadn’t realized that there were those problems for JdbcConvention in
> HepPlanner, but it makes sense. JdbcConvention is unusual in that it has
> multiple instances (each representing a separate database). Applying JDBC
> rules doesn’t inherently require Volcano’s dynamic programming approach
> (embodied by isRegistered, etc.) so I feel there could be a way to apply
> JDBC rules inside a HepPlanner. So, please log a bug for that too. I’d like
> to see a simple test case where we try to invoke a JDBC rule in a
> HepPlanner and it fails. I think there might be a simple hacky workaround
> (e.g. getting the JdbcConvention from a ThreadLocal) and then we can
> iterate and find a better solution.
>
> Julian

Re: Questions on JDBC adapter

Posted by Julian Hyde <jh...@gmail.com>.

Thanks for your email. It is really helpful if people ask before beginning work (or logging a lot of bugs on an area that they perceive to be broken).

I think those are all bugs/missing features. Feel free to log bugs, create PRs, see if anything breaks. (If things break that might be an indication that the feature is wrong, but then again it might not.)

Recently we changed how we optimize cartesian joins (see https://issues.apache.org/jira/browse/CALCITE-4515); you should see whether the arguments made in that case apply to canJoinOnCondition.

I hadn’t realized that there were those problems for JdbcConvention in HepPlanner, but it makes sense. JdbcConvention is unusual in that it has multiple instances (each representing a separate database). Applying JDBC rules doesn’t inherently require Volcano’s dynamic programming approach (embodied by isRegistered, etc.) so I feel there could be a way to apply JDBC rules inside a HepPlanner. So, please log a bug for that too. I’d like to see a simple test case where we try to invoke a JDBC rule in a HepPlanner and it fails. I think there might be a simple hacky workaround (e.g. getting the JdbcConvention from a ThreadLocal) and then we can iterate and find a better solution.
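
To illustrate the kind of hacky ThreadLocal workaround I have in mind, here is a rough sketch in plain Java. Everything in it is hypothetical: FakeConvention is a stand-in for JdbcConvention, and ConventionHolderSketch, withConvention and current are names invented for this example, not existing Calcite classes or methods.

```java
import java.util.function.Supplier;

/** Illustrative only: a thread-scoped holder, with no Calcite types involved. */
public final class ConventionHolderSketch {
  private ConventionHolderSketch() {}

  /** Hypothetical stand-in for JdbcConvention; one instance per target database. */
  public static final class FakeConvention {
    public final String name;

    public FakeConvention(String name) {
      this.name = name;
    }
  }

  private static final ThreadLocal<FakeConvention> CURRENT = new ThreadLocal<>();

  /** Runs the action with the given convention bound to the current thread. */
  public static <T> T withConvention(FakeConvention convention, Supplier<T> action) {
    FakeConvention previous = CURRENT.get();
    CURRENT.set(convention);
    try {
      return action.get();
    } finally {
      // Restore the previous binding so nested scopes behave sensibly.
      if (previous == null) {
        CURRENT.remove();
      } else {
        CURRENT.set(previous);
      }
    }
  }

  /** What a rule could call instead of reading the convention from a trait set. */
  public static FakeConvention current() {
    FakeConvention convention = CURRENT.get();
    if (convention == null) {
      throw new IllegalStateException("no convention bound on this thread");
    }
    return convention;
  }
}
```

The planner run would be wrapped in withConvention, and a rule that cannot obtain the convention from the trait set would fall back to current(). It only works while planning stays on a single thread, which is one reason it is a stopgap rather than a real solution.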

Julian
