Posted to dev@calcite.apache.org by Juan Pan <pa...@apache.org> on 2020/12/15 11:12:11 UTC

Re: [Question] How to leverage Calcite adaptor for federated SQL query without using Calcite parser

Hi all,


Replying to my own question.


After digging into the Calcite project, I figured out how to do SQL federation with an external SQL parser.


Federated SQL [1] is a demo that joins data from different database instances and shows the basic process of running a SQL statement inside Calcite: parse(), validate(), optimize(), and execute().


The demo project gives users hints and pointers for implementing SQL federation and for learning Calcite (as you know, it is difficult to understand such a large project from scratch). If you have any ideas, issues and PRs are welcome.


Besides, we are still discussing and learning about Calcite in issue [2], where you may find some useful information. I am also considering publishing a blog post or a summary of [2] soon, to help others follow its context (SQL federation and SQL optimization).


[1] https://github.com/tristaZero/federatedSQL
[2] https://github.com/apache/shardingsphere/issues/8284


Best,
Trista




---------------------------------------------------------------- 
Juan Pan (Trista)
                         
Senior DBA & PMC of Apache ShardingSphere
E-mail: panjuan@apache.org




On 11/24/2020 18:20, Juan Pan <pa...@apache.org> wrote:
Hi Rui,


Your summary is precisely what I care about.


1. Reuse Calcite adaptors and combine that with your parser to parse queries.


First, that is what @Michael described, isn't it?
The coding work to solve it makes sense to me. Thanks.


2. Convert results from 1. to RelNode and let Calcite optimize, then execute based on Enumerable implementations.


Second, that is my concern now.
Yes, we need a trigger or entry point to make Calcite use the custom adapter with the externally parsed result
(precisely speaking, the RelNode converted from the third-party parser's output).
I guess the Calcite driver or the Calcite connection is that `trigger`.
However, it cannot use the output from point 1.


write code to execute the enumerable tree (this part of code is inside
Calcite connection, but Calcite connection won't let you use your own
parser)


In that case, maybe we need to rewrite a `Calcite connection` to execute the enumerable tree?
How about implementing `SqlAbstractParserImpl` and configuring it in the JDBC properties?


Thanks for your time.


Best wishes,
Trista






On 11/24/2020 06:16, Rui Wang <am...@apache.org> wrote:
I think Michael has already made the points I would make. I am just trying
to understand your problem after reading the existing threads:

It sounds like you need to do two things:
1. Reuse Calcite adaptors and combine that with your parser to parse
queries.
2. Convert results from 1. to RelNode and let Calcite optimize, then
execute based on Enumerable implementations.

If my understanding so far is correct, I think Calcite does not have
a simple API that lets you do 2.

My understanding is that you will need to build something yourself:
a. write code to convert results of 1. to RelNode, make sure to set up
Enumerable conventions to produce Enumerable backed nodes.
b. write code to execute the enumerable tree (this part of code is inside
Calcite connection, but Calcite connection won't let you use your own
parser)
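
As a rough illustration of steps a and b, here is a toy model written
against the JDK only. It is not Calcite code: `FederationSketch`, `Plan`,
`scan`, `filter`, `join`, the two in-memory "databases", and the table
names are all hypothetical stand-ins. `Plan` plays the role that an
Enumerable-convention RelNode plays in Calcite, and the plan is built by
hand, the way a translator from an external parser's AST might build it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model only: none of these types are Calcite classes. */
final class FederationSketch {

    /** A plan node that can produce rows (stand-in for an executable RelNode). */
    interface Plan {
        List<Object[]> execute();
    }

    /** Leaf node: reads one table from one particular data source. */
    static Plan scan(Map<String, List<Object[]>> source, String table) {
        return () -> source.get(table);
    }

    /** Keeps only the rows of the input that satisfy the condition. */
    static Plan filter(Plan input, Predicate<Object[]> condition) {
        return () -> {
            List<Object[]> out = new ArrayList<>();
            for (Object[] row : input.execute()) {
                if (condition.test(row)) {
                    out.add(row);
                }
            }
            return out;
        };
    }

    /** Hash join on left[leftKey] = right[rightKey]; output is left columns then right columns. */
    static Plan join(Plan left, Plan right, int leftKey, int rightKey) {
        return () -> {
            Map<Object, List<Object[]>> index = new HashMap<>();
            for (Object[] r : right.execute()) {
                index.computeIfAbsent(r[rightKey], k -> new ArrayList<>()).add(r);
            }
            List<Object[]> out = new ArrayList<>();
            for (Object[] l : left.execute()) {
                List<Object[]> matches = index.get(l[leftKey]);
                if (matches == null) {
                    continue;
                }
                for (Object[] r : matches) {
                    Object[] row = Arrays.copyOf(l, l.length + r.length);
                    System.arraycopy(r, 0, row, l.length, r.length);
                    out.add(row);
                }
            }
            return out;
        };
    }

    /** Step a: build the plan without any SQL parser. Step b: execute it. */
    static List<Object[]> demo() {
        // Two separate "database instances" (hypothetical data).
        Map<String, List<Object[]>> db1 = new HashMap<>();
        db1.put("t_order", Arrays.asList(
            new Object[] {1, 10}, new Object[] {2, 20}, new Object[] {3, 10}));
        Map<String, List<Object[]>> db2 = new HashMap<>();
        db2.put("t_user", Arrays.asList(
            new Object[] {10, "ann"}, new Object[] {20, "bob"}));

        // Roughly: SELECT * FROM t_order o JOIN t_user u
        //          ON o.user_id = u.id WHERE u.name = 'ann'
        Plan plan = join(
            scan(db1, "t_order"),
            filter(scan(db2, "t_user"), row -> "ann".equals(row[1])),
            1, 0);
        return plan.execute();
    }

    public static void main(String[] args) {
        for (Object[] row : demo()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running `main` prints `[1, 10, 10, ann]` and `[3, 10, 10, ann]`. In a real
setup the plan would be a RelNode tree and the optimizer would choose the
physical operators; the sketch only shows that neither building nor
executing a plan requires a SQL parser.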

-Rui

On Mon, Nov 23, 2020 at 4:10 AM Michael Mior <mm...@apache.org> wrote:

There is nothing stopping you from using adapters with SQL queries you
have parsed yourself. You simply need to assign the appropriate
convention to each table scan in the RelNode tree you pass into the
optimizer. However, if the reason for using your own parser is to be
able to have as broad support for different SQL queries as possible, I
suggest you look at Calcite's Babel parser. It extends the default
parser to add broader support for other dialects of SQL.
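
A hedged sketch of how the Babel parser might be selected over JDBC. It
assumes the driver's `parserFactory` connection property and the static
`FACTORY` field of `SqlBabelParserImpl` in the calcite-babel module;
please check both against the Calcite version in use. The names
`BabelConnectionProps` and `babelInfo` are made up for illustration, and
the code only builds the properties; it does not open a connection.

```java
import java.util.Properties;

final class BabelConnectionProps {

    /** Connection properties asking Calcite's JDBC driver to use the Babel parser. */
    static Properties babelInfo() {
        Properties info = new Properties();
        // Assumption: "parserFactory" names a SqlParserImplFactory, and the
        // "#FACTORY" suffix points at a static field of the named class.
        info.setProperty("parserFactory",
            "org.apache.calcite.sql.parser.babel.SqlBabelParserImpl#FACTORY");
        return info;
    }

    public static void main(String[] args) {
        // With calcite-core and calcite-babel on the classpath, these properties
        // would be passed to DriverManager.getConnection("jdbc:calcite:", babelInfo()).
        System.out.println(babelInfo());
    }
}
```

This keeps Calcite's own parsing step but widens the dialects it accepts,
so it is an alternative to bypassing the parser rather than a way of
plugging in a third-party one.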

--
Michael Mior
mmior@apache.org

On Mon, Nov 23, 2020 at 01:54, Juan Pan <pa...@apache.org> wrote:

Hi JiaTao,


I really appreciate you sharing this.


Actually, what I am confused about is how to make a custom Calcite adaptor
work with other parsers.
For example, I use a non-Calcite parser to get the parsed result and
transform it into a RelNode, to tell Calcite: please use this RelNode
for the rest of the processing.
But I would still implement a custom adaptor and want Calcite to use it.
If I call Calcite JDBC, like `Driver.getConnection(Calcite_Conn)`, that
brings in the Calcite parser to parse the SQL instead of my own.  : (
Is there any approach to make Calcite call both the custom adapter and a
third-party parser?


Best wishes,
Trista




On 11/23/2020 14:38, JiaTao Tao <ta...@gmail.com> wrote:
Hi Juan Pan,

As I said, you can achieve this as follows: "If you have to do this, you
can either generate SqlNode with Antlr OR transform your own AST tree to
RelNode; you can take a look at
org.apache.calcite.sql2rel.SqlToRelConverter." In fact, Hive does the same
thing: it uses its own AST tree to generate a RelNode tree, so you can
take a look at it.

Regards!

Aron Tao


Juan Pan <pa...@apache.org> wrote on Mon, Nov 23, 2020 at 1:04 PM:

Hi JiaTao,


The reason we want to bypass Calcite parsing mainly comes down to two points.
First, as you said, we want better query efficiency by parsing the SQL only
once. But from what you said, that does not look like a big deal.


Second, I am a bit concerned about how much SQL Calcite supports.
[1] shows all the supported SQL. Is that in line with SQL92 or MySQL 5.x?
Currently, the ShardingSphere parser has almost complete support for MySQL
8.0 and PostgreSQL, and basic support for SQLServer, Oracle, and SQL92 [2]
(as a distributed database middleware ecosystem, we have to do so).
Therefore, if we use the Calcite parser, we may not be able to handle some
of the users' SQL (unsure).


Could you give me some hints on bypassing Calcite's parsing? Or can we
perhaps not reach that goal?
I would much appreciate any pointers or replies. : )


Regards,
Trista


[1] https://calcite.apache.org/docs/reference.html
[2] https://shardingsphere.apache.org/document/current/en/features/sharding/principle/parse



On 11/22/2020 16:17, JiaTao Tao <ta...@gmail.com> wrote:
In fact, the impact of parsing twice is small. In Apache Kylin, every time
we transform the SQL, we re-parse it.
What really takes time is validation (which uses metadata, for example
fetched from HMS) and optimization.

Regards!

Aron Tao


Juan Pan <pa...@apache.org> wrote on Sun, Nov 22, 2020 at 2:32 PM:

Hi community,

Thanks for your attention. : )

Currently, the Apache ShardingSphere community plans to leverage Apache
Calcite to implement federated SQL queries, i.e., queries across different
database instances [1].

The draft approach is to use a custom adaptor together with ShardingSphere's
own SQL parser (Antlr involved), and to transform the parsed result into
Calcite's algebra. Lastly, Calcite will execute the SQL by means of the
custom adaptor.

Currently, I know the entry point for calling the custom adaptor is
`DriverManager.getConnection(CalciteUrl)`, which gets Calcite's SQL parsing
involved. But we want to avoid parsing the SQL twice, which means we wish
to skip Calcite's SQL parsing.

My question is how we can leverage a Calcite adaptor without using the
Calcite parser. Could you give me some hints?

I would very much appreciate any help and replies.

Regards,
Trista

[1] https://github.com/apache/shardingsphere/issues/8284


