You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Ran Tao <ch...@gmail.com> on 2023/02/08 12:46:45 UTC

Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Currently we use SupportsProjectionPushDown to push down physical columns,
and SupportsReadingMetadata is used to read metadata columns.
There is no problem when implementing one of the interfaces alone. If two
interfaces are implemented at the same time, there will be confusing
semantics.

For example, if we update the schema or producedDataType in
SupportsProjectionPushDown#applyProjection and
SupportsReadingMetadata#applyReadableMetadata at the same time, the former
is actually invalid, because the former is called first, and then the
latter will overwrite it.

There are some similar usage notes in the interface's documentation. But
this is very confusing. In this case, you only need to implement
SupportsReadingMetadata#applyReadableMetadata (only implement
SupportsProjectionPushDown, the override method is empty), and the rule
match logic of SupportsReadingMetadata will push down the physical column
and metadata columns to generate producedDataType and return it.

At this point SupportsProjectionPushDown is more like a marker interface.
In addition, if some member variables are relied on in the implementation
of SupportsReadingMetadata, and the member variables are also updated in
SupportsProjectionPushDown, unexpected problems may occur. Developers
should clearly read the implementation of these two interfaces and
understand that these overlapping functions will cause a certain
development cost to the developer of the connector (normally, the two
interfaces should be isolated functions, developers see the meaning of the
name ).

I wonder if the community has considered making the responsibilities of
these two interfaces more independent and clear in subsequent updates.
Maybe my understanding is not very sufficient, looking forward to your
opinions.

-- 
Best Regards,
Ran Tao
https://github.com/chucheng92

Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Posted by Ran Tao <ch...@gmail.com>.
hi. yuxia, Shengkai. thanks for your explanation.

I originally wanted to say whether we can let SupportsReadingMetadata
handle the metadata column alone (may provide an ability to merge 2
produced types),
so that the 2 interfaces don't matter.

But through your explanation on the planner side just now, it's maybe not a
easy job and involved too many places.

I agree that we have no other better way to avoid it currently.


yuxia <lu...@alumni.sjtu.edu.cn> 于2023年2月13日周一 10:16写道:

> Sorry for the misktake, correct it.
> "but I'm afraid that we have no other better way to avoid it.
>
> Best regards,
> Yuxia
>
> ----- 原始邮件 -----
> 发件人: "yuxia" <lu...@alumni.sjtu.edu.cn>
> 收件人: "dev" <de...@flink.apache.org>
> 发送时间: 星期一, 2023年 2 月 13日 上午 10:11:53
> 主题: Re: Confusion about some overlapping functionality of
> SupportsProjectionPushDown and SupportsReadingMetadata
>
> Hi, Ran Tao.
> I agree to Shengkai that physical column pushdown and metadata processing
> are different things and should be processed separately.
> To me, it'll be more weird that SupportsReadingMetadata do the both.
>
> I agree you that that the developer may have unexpected behaviors due to
> the sequence problem, but I don't think we have no other better way to
> avoid it.
>
> Best regards,
> Yuxia
>
> ----- 原始邮件 -----
> 发件人: "fskmine" <fs...@gmail.com>
> 收件人: "dev" <de...@flink.apache.org>
> 发送时间: 星期五, 2023年 2 月 10日 下午 12:02:36
> 主题: Re: Confusion about some overlapping functionality of
> SupportsProjectionPushDown and SupportsReadingMetadata
>
> Hi, Ran.
>
> I think it's a little difficult to split the rule into two parts. Because
> the ProjectDown and ReadingMetadata both need to reorder the fields. The
> ReadingMetadata requires the metadata columns to be at the last and the
> ProjectPushDown now is responsible to reorder the columns as the user
> specified.
>
> One more concern, the push-down optimization occurs in the logical phase
> that uses the volcano planner. I am not sure whether the rule will be
> applied at last because metadata push-down doesn't change the cost.
>
> Best,
> Shengkai
>


-- 
Best Regards,
Ran Tao
https://github.com/chucheng92

Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Sorry for the misktake, correct it.
"but I'm afraid that we have no other better way to avoid it.

Best regards,
Yuxia

----- 原始邮件 -----
发件人: "yuxia" <lu...@alumni.sjtu.edu.cn>
收件人: "dev" <de...@flink.apache.org>
发送时间: 星期一, 2023年 2 月 13日 上午 10:11:53
主题: Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Hi, Ran Tao.
I agree to Shengkai that physical column pushdown and metadata processing are different things and should be processed separately.
To me, it'll be more weird that SupportsReadingMetadata do the both. 

I agree you that that the developer may have unexpected behaviors due to the sequence problem, but I don't think we have no other better way to 
avoid it.

Best regards,
Yuxia

----- 原始邮件 -----
发件人: "fskmine" <fs...@gmail.com>
收件人: "dev" <de...@flink.apache.org>
发送时间: 星期五, 2023年 2 月 10日 下午 12:02:36
主题: Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Hi, Ran.

I think it's a little difficult to split the rule into two parts. Because
the ProjectDown and ReadingMetadata both need to reorder the fields. The
ReadingMetadata requires the metadata columns to be at the last and the
ProjectPushDown now is responsible to reorder the columns as the user
specified.

One more concern, the push-down optimization occurs in the logical phase
that uses the volcano planner. I am not sure whether the rule will be
applied at last because metadata push-down doesn't change the cost.

Best,
Shengkai

Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Hi, Ran Tao.
I agree to Shengkai that physical column pushdown and metadata processing are different things and should be processed separately.
To me, it'll be more weird that SupportsReadingMetadata do the both. 

I agree you that that the developer may have unexpected behaviors due to the sequence problem, but I don't think we have no other better way to 
avoid it.

Best regards,
Yuxia

----- 原始邮件 -----
发件人: "fskmine" <fs...@gmail.com>
收件人: "dev" <de...@flink.apache.org>
发送时间: 星期五, 2023年 2 月 10日 下午 12:02:36
主题: Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Hi, Ran.

I think it's a little difficult to split the rule into two parts. Because
the ProjectDown and ReadingMetadata both need to reorder the fields. The
ReadingMetadata requires the metadata columns to be at the last and the
ProjectPushDown now is responsible to reorder the columns as the user
specified.

One more concern, the push-down optimization occurs in the logical phase
that uses the volcano planner. I am not sure whether the rule will be
applied at last because metadata push-down doesn't change the cost.

Best,
Shengkai

Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, Ran.

I think it's a little difficult to split the rule into two parts. Because
the ProjectDown and ReadingMetadata both need to reorder the fields. The
ReadingMetadata requires the metadata columns to be at the last and the
ProjectPushDown now is responsible to reorder the columns as the user
specified.

One more concern, the push-down optimization occurs in the logical phase
that uses the volcano planner. I am not sure whether the rule will be
applied at last because metadata push-down doesn't change the cost.

Best,
Shengkai

Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Posted by Ran Tao <ch...@gmail.com>.
hi, yuxia, thanks for your feedback.

The point I am talking about here is that when implementing two interfaces
at the same time, the SupportsProjectionPushDown method can actually be
left blank, just use SupportsProjectionPushDown to mark it,
and then SupportsReadingMetadata can complete the physical column pushdown
and metadata processing at the same time.

Another point is that if the methods of two interfaces are implemented at
the same time,
the developer may have unexpected behaviors due to the sequence problem. Of
course I agree to update the java doc notes.

yuxia <lu...@alumni.sjtu.edu.cn> 于2023年2月9日周四 16:09写道:

> Hi, Ran Tao.
> Thanks for bring it up.
> TBH, to me, it's not as so confusing.
> Is that the fact that the applyReadableMetadata and applyProjection all
> will pass producedDataType and the source conneector developer will
> need to choose which one as the finnal output type?
>
> As the Java doc of applyReadableMetadata said, use the producedDataType in
> this method instead of applyProjection.
>
> For you question, I think the responsibilities of these two interfaces are
> quite independent. What kind of independence are you expecting?
>
> Btw, to aovid confusing, i think we may need to specific it that the
> applyProjection will be called before method applyReadableMetadata
> in the java doc of applyReadableMetadata.
>
>
> Best regards,
> Yuxia
>
> ----- 原始邮件 -----
> 发件人: "Ran Tao" <ch...@gmail.com>
> 收件人: "dev" <de...@flink.apache.org>
> 发送时间: 星期三, 2023年 2 月 08日 下午 8:46:45
> 主题: Confusion about some overlapping functionality of
> SupportsProjectionPushDown and SupportsReadingMetadata
>
> Currently we use SupportsProjectionPushDown to push down physical columns,
> and SupportsReadingMetadata is used to read metadata columns.
> There is no problem when implementing one of the interfaces alone. If two
> interfaces are implemented at the same time, there will be confusing
> semantics.
>
> For example, if we update the schema or producedDataType in
> SupportsProjectionPushDown#applyProjection and
> SupportsReadingMetadata#applyReadableMetadata at the same time, the former
> is actually invalid, because the former is called first, and then the
> latter will overwrite it.
>
> There are some similar usage notes in the interface's documentation. But
> this is very confusing. In this case, you only need to implement
> SupportsReadingMetadata#applyReadableMetadata (only implement
> SupportsProjectionPushDown, the override method is empty), and the rule
> match logic of SupportsReadingMetadata will push down the physical column
> and metadata columns to generate producedDataType and return it.
>
> At this point SupportsProjectionPushDown is more like a marker interface.
> In addition, if some member variables are relied on in the implementation
> of SupportsReadingMetadata, and the member variables are also updated in
> SupportsProjectionPushDown, unexpected problems may occur. Developers
> should clearly read the implementation of these two interfaces and
> understand that these overlapping functions will cause a certain
> development cost to the developer of the connector (normally, the two
> interfaces should be isolated functions, developers see the meaning of the
> name ).
>
> I wonder if the community has considered making the responsibilities of
> these two interfaces more independent and clear in subsequent updates.
> Maybe my understanding is not very sufficient, looking forward to your
> opinions.
>
> --
> Best Regards,
> Ran Tao
> https://github.com/chucheng92
>


-- 
Best Regards,
Ran Tao
https://github.com/chucheng92

Re: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Hi, Ran Tao.
Thanks for bring it up. 
TBH, to me, it's not as so confusing. 
Is that the fact that the applyReadableMetadata and applyProjection all will pass producedDataType and the source conneector developer will
need to choose which one as the finnal output type? 

As the Java doc of applyReadableMetadata said, use the producedDataType in this method instead of applyProjection.

For you question, I think the responsibilities of these two interfaces are quite independent. What kind of independence are you expecting?

Btw, to aovid confusing, i think we may need to specific it that the applyProjection will be called before method applyReadableMetadata
in the java doc of applyReadableMetadata.


Best regards,
Yuxia

----- 原始邮件 -----
发件人: "Ran Tao" <ch...@gmail.com>
收件人: "dev" <de...@flink.apache.org>
发送时间: 星期三, 2023年 2 月 08日 下午 8:46:45
主题: Confusion about some overlapping functionality of SupportsProjectionPushDown and SupportsReadingMetadata

Currently we use SupportsProjectionPushDown to push down physical columns,
and SupportsReadingMetadata is used to read metadata columns.
There is no problem when implementing one of the interfaces alone. If two
interfaces are implemented at the same time, there will be confusing
semantics.

For example, if we update the schema or producedDataType in
SupportsProjectionPushDown#applyProjection and
SupportsReadingMetadata#applyReadableMetadata at the same time, the former
is actually invalid, because the former is called first, and then the
latter will overwrite it.

There are some similar usage notes in the interface's documentation. But
this is very confusing. In this case, you only need to implement
SupportsReadingMetadata#applyReadableMetadata (only implement
SupportsProjectionPushDown, the override method is empty), and the rule
match logic of SupportsReadingMetadata will push down the physical column
and metadata columns to generate producedDataType and return it.

At this point SupportsProjectionPushDown is more like a marker interface.
In addition, if some member variables are relied on in the implementation
of SupportsReadingMetadata, and the member variables are also updated in
SupportsProjectionPushDown, unexpected problems may occur. Developers
should clearly read the implementation of these two interfaces and
understand that these overlapping functions will cause a certain
development cost to the developer of the connector (normally, the two
interfaces should be isolated functions, developers see the meaning of the
name ).

I wonder if the community has considered making the responsibilities of
these two interfaces more independent and clear in subsequent updates.
Maybe my understanding is not very sufficient, looking forward to your
opinions.

-- 
Best Regards,
Ran Tao
https://github.com/chucheng92