You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Yanjing Wang <zh...@gmail.com> on 2022/04/13 07:33:14 UTC

[DISCUSS] Parser with involved tables

Hi Commnunity,

In real scenarios, many developers want to get what tables a query
statement contains.

such as 'WITH t as (SELECT * FROM t1) SELECT * FROM t, t2', it only
contains table t1 and t2.

I tried to visit the SqlNode and found this solution is very complicated
and has very many checks, otherwise easy to get in parser.
such as add one line 'involvedTables.add(nameList);'  in
'CompoundTableIdentifier' production rule.

I know that this isn't the duty of parser, but I think it's the easiest way
to get the tables, I want to add this functionality to parser, how do you
think?

Re: [DISCUSS] Parser with involved tables

Posted by Yanjing Wang <zh...@gmail.com>.
Thanks Jacques, It's so elaborate that I get it.

Jacques Nadeau <ja...@apache.org> 于2022年4月14日周四 03:01写道:

> You need to complete validation in order to confirm that all tables are
> real. For example, validation includes clarifying ambiguities (since users
> can rely on implicit resolution but rel trees cannot).
>
> For #2: this is a really hard thing to do with Calcite. We did this in
> Apache Drill by basically watching for field and table name requests to the
> catalog interface and automatically adding them but it's relatively
> brittle. You can see the entry to this around the code here [1] (and the
> type holder this class uses).
>
> For #1: If you really want to do this, walking the AST is the best option.
> That being said, I don't think it makes sense to add this as a public
> interface to the parser. There are too many places where it won't make
> sense (or work inconsistently, confusing consumers). The first place I
> would do it would be once you have a validated tree since the validated
> tree has identified problems and clarified ambiguities. This would be one
> step before converting to rel (which seems useful in several cases). That
> being said, you'd also get some weird externalities even at that step.
> Views are not converted into plans until SqlToRel which means you'd only
> get direct tables referenced, not tables referenced through a view. You
> have to expand all the views to get the complete set (thus the need for a
> rel tree for the utility method).
>
>
> [1]
>
> https://github.com/apache/drill/blob/fd836a2a6e933c294a53271f7e9b6ba3fea7cc8d/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DynamicDrillTable.java
>
> On Wed, Apr 13, 2022 at 1:11 AM Yanjing Wang <zh...@gmail.com>
> wrote:
>
> > Hi Alessandro,
> >
> > Thanks for your feedback, the RelOptUtil#findAllTables requires its input
> > is a relNode, But many times we need extract all tables before
> validation,
> > this has the following advantages.
> >
> > 1. no metadata need and fast.
> > 2. we can compose a on-demand catalog, rather get all schemas, tables to
> > match.
> >
> > Alessandro Solimando <al...@gmail.com> 于2022年4月13日周三
> > 16:04写道:
> >
> > > Hi Yanjing,
> > >
> > > I found a discussion in the ML suggesting to use
> RelOptUtil#findAllTables
> > > to get all tables involved in a query:
> > > https://lists.apache.org/thread/wcfv2bhdytwsoyc3qlsnxcd7nmnnlb06
> > >
> > > Can you check if that's what you want?
> > >
> > > Best regards,
> > > Alessandro
> > >
> > > On Wed, 13 Apr 2022 at 09:33, Yanjing Wang <zh...@gmail.com>
> > > wrote:
> > >
> > > > Hi Commnunity,
> > > >
> > > > In real scenarios, many developers want to get what tables a query
> > > > statement contains.
> > > >
> > > > such as 'WITH t as (SELECT * FROM t1) SELECT * FROM t, t2', it only
> > > > contains table t1 and t2.
> > > >
> > > > I tried to visit the SqlNode and found this solution is very
> > complicated
> > > > and has very many checks, otherwise easy to get in parser.
> > > > such as add one line 'involvedTables.add(nameList);'  in
> > > > 'CompoundTableIdentifier' production rule.
> > > >
> > > > I know that this isn't the duty of parser, but I think it's the
> easiest
> > > way
> > > > to get the tables, I want to add this functionality to parser, how do
> > you
> > > > think?
> > > >
> > >
> >
>

Re: [DISCUSS] Parser with involved tables

Posted by Jacques Nadeau <ja...@apache.org>.
You need to complete validation in order to confirm that all tables are
real. For example, validation includes clarifying ambiguities (since users
can rely on implicit resolution but rel trees cannot).

For #2: this is a really hard thing to do with Calcite. We did this in
Apache Drill by basically watching for field and table name requests to the
catalog interface and automatically adding them but it's relatively
brittle. You can see the entry to this around the code here [1] (and the
type holder this class uses).

For #1: If you really want to do this, walking the AST is the best option.
That being said, I don't think it makes sense to add this as a public
interface to the parser. There are too many places where it won't make
sense (or work inconsistently, confusing consumers). The first place I
would do it would be once you have a validated tree since the validated
tree has identified problems and clarified ambiguities. This would be one
step before converting to rel (which seems useful in several cases). That
being said, you'd also get some weird externalities even at that step.
Views are not converted into plans until SqlToRel which means you'd only
get direct tables referenced, not tables referenced through a view. You
have to expand all the views to get the complete set (thus the need for a
rel tree for the utility method).


[1]
https://github.com/apache/drill/blob/fd836a2a6e933c294a53271f7e9b6ba3fea7cc8d/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DynamicDrillTable.java

On Wed, Apr 13, 2022 at 1:11 AM Yanjing Wang <zh...@gmail.com>
wrote:

> Hi Alessandro,
>
> Thanks for your feedback, the RelOptUtil#findAllTables requires its input
> is a relNode, But many times we need extract all tables before validation,
> this has the following advantages.
>
> 1. no metadata need and fast.
> 2. we can compose a on-demand catalog, rather get all schemas, tables to
> match.
>
> Alessandro Solimando <al...@gmail.com> 于2022年4月13日周三
> 16:04写道:
>
> > Hi Yanjing,
> >
> > I found a discussion in the ML suggesting to use RelOptUtil#findAllTables
> > to get all tables involved in a query:
> > https://lists.apache.org/thread/wcfv2bhdytwsoyc3qlsnxcd7nmnnlb06
> >
> > Can you check if that's what you want?
> >
> > Best regards,
> > Alessandro
> >
> > On Wed, 13 Apr 2022 at 09:33, Yanjing Wang <zh...@gmail.com>
> > wrote:
> >
> > > Hi Commnunity,
> > >
> > > In real scenarios, many developers want to get what tables a query
> > > statement contains.
> > >
> > > such as 'WITH t as (SELECT * FROM t1) SELECT * FROM t, t2', it only
> > > contains table t1 and t2.
> > >
> > > I tried to visit the SqlNode and found this solution is very
> complicated
> > > and has very many checks, otherwise easy to get in parser.
> > > such as add one line 'involvedTables.add(nameList);'  in
> > > 'CompoundTableIdentifier' production rule.
> > >
> > > I know that this isn't the duty of parser, but I think it's the easiest
> > way
> > > to get the tables, I want to add this functionality to parser, how do
> you
> > > think?
> > >
> >
>

Re: [DISCUSS] Parser with involved tables

Posted by Yanjing Wang <zh...@gmail.com>.
Hi Alessandro,

Thanks for your feedback, the RelOptUtil#findAllTables requires its input
is a relNode, But many times we need extract all tables before validation,
this has the following advantages.

1. no metadata need and fast.
2. we can compose a on-demand catalog, rather get all schemas, tables to
match.

Alessandro Solimando <al...@gmail.com> 于2022年4月13日周三 16:04写道:

> Hi Yanjing,
>
> I found a discussion in the ML suggesting to use RelOptUtil#findAllTables
> to get all tables involved in a query:
> https://lists.apache.org/thread/wcfv2bhdytwsoyc3qlsnxcd7nmnnlb06
>
> Can you check if that's what you want?
>
> Best regards,
> Alessandro
>
> On Wed, 13 Apr 2022 at 09:33, Yanjing Wang <zh...@gmail.com>
> wrote:
>
> > Hi Commnunity,
> >
> > In real scenarios, many developers want to get what tables a query
> > statement contains.
> >
> > such as 'WITH t as (SELECT * FROM t1) SELECT * FROM t, t2', it only
> > contains table t1 and t2.
> >
> > I tried to visit the SqlNode and found this solution is very complicated
> > and has very many checks, otherwise easy to get in parser.
> > such as add one line 'involvedTables.add(nameList);'  in
> > 'CompoundTableIdentifier' production rule.
> >
> > I know that this isn't the duty of parser, but I think it's the easiest
> way
> > to get the tables, I want to add this functionality to parser, how do you
> > think?
> >
>

Re: [DISCUSS] Parser with involved tables

Posted by Alessandro Solimando <al...@gmail.com>.
Hi Yanjing,

I found a discussion in the ML suggesting to use RelOptUtil#findAllTables
to get all tables involved in a query:
https://lists.apache.org/thread/wcfv2bhdytwsoyc3qlsnxcd7nmnnlb06

Can you check if that's what you want?

Best regards,
Alessandro

On Wed, 13 Apr 2022 at 09:33, Yanjing Wang <zh...@gmail.com>
wrote:

> Hi Commnunity,
>
> In real scenarios, many developers want to get what tables a query
> statement contains.
>
> such as 'WITH t as (SELECT * FROM t1) SELECT * FROM t, t2', it only
> contains table t1 and t2.
>
> I tried to visit the SqlNode and found this solution is very complicated
> and has very many checks, otherwise easy to get in parser.
> such as add one line 'involvedTables.add(nameList);'  in
> 'CompoundTableIdentifier' production rule.
>
> I know that this isn't the duty of parser, but I think it's the easiest way
> to get the tables, I want to add this functionality to parser, how do you
> think?
>