Posted to dev@flink.apache.org by Israel Ekpo <is...@gmail.com> on 2021/06/28 00:27:58 UTC

Re: [Question] Why doesn't Flink use Calcite adapter?

Maybe this question would be better addressed to the DEV list.

On Fri, Jun 25, 2021 at 11:57 PM guangyuan wang <wa...@apache.org>
wrote:

>
> I have read the design doc of the Flink planner recently. I've found that
> Flink only uses Calcite as an SQL optimizer: it translates an optimized
> Calcite RelNode into a Flink (or Blink) RelNode, and then translates that
> into the physical plan. Why doesn't Flink implement Calcite adapters?
> Wouldn't that be an easier way to use Calcite?
>
>
> The link to the Calcite adapter docs: calcite.apache.org/docs/adapter.html
>

Re: [Question] Why doesn't Flink use Calcite adapter?

Posted by JING ZHANG <be...@gmail.com>.
Hi guangyuan,
This question is an interesting and broad topic. I'll try to give my opinion
based on my limited knowledge.

Flink introduces dynamic sources to read from external systems[1]. The Flink
connector modules are completely decoupled from Calcite, which has two
benefits:
(1) Users who develop a custom, user-defined connector need no background
knowledge of Calcite (see the factory sketch right after this list).
(2) It keeps unnecessary external dependencies out of the Flink connector
modules.
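
To make (1) concrete, here is a minimal, hypothetical sketch of such a
factory. The "my-source" identifier, the HOSTNAME option, and the MySource
class are made up for illustration; the real API is documented in [1]. Note
that only Flink interfaces appear, no Calcite classes:

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.factories.DynamicTableSourceFactory;
import org.apache.flink.table.factories.FactoryUtil;

public class MySourceFactory implements DynamicTableSourceFactory {

    // Hypothetical connector option: 'hostname' = '...' in the WITH clause.
    public static final ConfigOption<String> HOSTNAME =
            ConfigOptions.key("hostname").stringType().noDefaultValue();

    @Override
    public String factoryIdentifier() {
        // Matches 'connector' = 'my-source' in CREATE TABLE ... WITH (...).
        return "my-source";
    }

    @Override
    public Set<ConfigOption<?>> requiredOptions() {
        final Set<ConfigOption<?>> options = new HashSet<>();
        options.add(HOSTNAME);
        return options;
    }

    @Override
    public Set<ConfigOption<?>> optionalOptions() {
        return Collections.emptySet();
    }

    @Override
    public DynamicTableSource createDynamicTableSource(Context context) {
        final FactoryUtil.TableFactoryHelper helper =
                FactoryUtil.createTableFactoryHelper(this, context);
        helper.validate(); // verifies required/unknown options
        return new MySource(helper.getOptions().get(HOSTNAME));
    }
}

The factory is discovered through Java SPI (a services entry for the Factory
interface in the connector jar), so the planner, and therefore Calcite, stays
completely out of sight for connector authors.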

Besides, since Flink is a distributed engine for stateful computations over
*unbounded and bounded* data streams, there are more things to take into
consideration when connecting to an external system: for example, how to
read data with multiple parallel instances, and how to store metadata in
state so the source can recover after failover.
I list a few of these issues below (a source sketch follows the list); they
are strongly tied to the Flink engine and are not covered by Calcite's
built-in adapters.
(1) Required: define how to read from the external storage system
     1.1 scan all rows, or look up rows by one or more keys
     1.2 in scan mode, define how to split the source and how to store
metadata in state so the splits can be restored after recovery from failover
(2) Required: map the data types of the external system to the Flink data
type system
(3) Optional: for planner optimizations, implement optional
ability interfaces, e.g. SupportsProjectionPushDown/SupportsFilterPushDown
and so on
(4) Optional: define encoding/decoding formats
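
Here is the promised sketch, matching the (equally hypothetical) factory
above: a scan source with one optional ability interface. A lookup source
(point 1.1) would implement LookupTableSource instead, and the split/state
handling from point 1.2 would live inside the actual reader, elided here as
a made-up MySourceFunction:

import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.ScanTableSource;
import org.apache.flink.table.connector.source.SourceFunctionProvider;
import org.apache.flink.table.connector.source.abilities.SupportsProjectionPushDown;
import org.apache.flink.table.data.RowData;

public class MySource implements ScanTableSource, SupportsProjectionPushDown {

    private final String hostname;

    public MySource(String hostname) {
        this.hostname = hostname;
    }

    @Override
    public ChangelogMode getChangelogMode() {
        return ChangelogMode.insertOnly(); // this source only emits inserts
    }

    @Override
    public ScanRuntimeProvider getScanRuntimeProvider(ScanContext context) {
        // The actual reading logic lives in a SourceFunction<RowData>;
        // 'false' marks the source as unbounded (a stream, not a batch).
        SourceFunction<RowData> reader = new MySourceFunction(hostname);
        return SourceFunctionProvider.of(reader, false);
    }

    // --- (3) Optional ability interface: projection push-down ---

    @Override
    public boolean supportsNestedProjection() {
        return false;
    }

    @Override
    public void applyProjection(int[][] projectedFields) {
        // The planner calls this so the source can skip unused columns.
    }

    @Override
    public DynamicTableSource copy() {
        return new MySource(hostname);
    }

    @Override
    public String asSummaryString() {
        return "MySource";
    }
}

Nothing in either sketch imports a Calcite class, which is exactly the
decoupling described above.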

Hope it helps. Please correct me if I'm wrong.

Best regards,
JING ZHANG

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/sourcessinks/
