You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by yuxia <lu...@alumni.sjtu.edu.cn> on 2023/02/08 04:28:52 UTC
Re: [Discussion] externalize Hive connector
Hi, Chen.
Really appreciated for you efforts. Nowadays, I'm trying to picking up the work of decoupling again for I find some time finally. I think may be you can first read the FLIP-216[1] to understand how we plan to decouple and
make sure the pr you open will be on the right way.
I think FLINK-30659[2] you created may be can as a godd start which is an import part of decoupling. But please remember that as i commented, the flink-sql-parser-hive is no longer used, it can be dropped directly.
[1] [ https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A++Introduce+pluggable+dialect+and+plan+for+migrating+Hive+dialect | https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A++Introduce+pluggable+dialect+and+plan+for+migrating+Hive+dialect ]
[2] [ https://issues.apache.org/jira/browse/FLINK-30659 | https://issues.apache.org/jira/browse/FLINK-30659 ]
Best regards,
Yuxia
发件人: "Chen Qin" <qi...@gmail.com>
收件人: luoyuxia@alumni.sjtu.edu.cn
抄送: "dev" <de...@flink.apache.org>
发送时间: 星期五, 2023年 1 月 20日 上午 7:35:43
主题: Re: [Discussion] externalize Hive connector
Hi Yuxia,
[ https://issues.apache.org/jira/browse/FLINK-30667 | FLINK-30667 ] aims to address hive-connector dependency on table-planner @internal class . Here is a bit of rationale.
ParserImpl in table-planner and HiveParser in hive connector should be able to evolve separately after future externalization; Parser interface on other hand should keep consistent when community plan for other dialect support. This would help decouple one of the internal class dependencies you mentioned in [ https://issues.apache.org/jira/browse/FLINK-26603 | FLINK-26603 ] .
QueryOperation is a simple operator class shared between table-planner as well as hive-connector. Keep a copy instead of announcing public in table-planner could decouple hive connectors from table-planner change or future proof when considering more customized operations.
PlannerContext acts as a utility class providing reference to a group of components. we could make it publicEvolving.
Let me know what you think.
Chen
On Tue, Jan 10, 2023 at 8:34 PM yuxia < [ mailto:luoyuxia@alumni.sjtu.edu.cn | luoyuxia@alumni.sjtu.edu.cn ] > wrote:
Hi, Chen.
Cool! Of course not. Much appreciated that you can help continue this work.
If you have any question about it, please let me know.
Best regards,
Yuxia
----- 原始邮件 -----
发件人: "Chen Qin" < [ mailto:qinnchen@gmail.com | qinnchen@gmail.com ] >
收件人: "dev" < [ mailto:dev@flink.apache.org | dev@flink.apache.org ] >
发送时间: 星期三, 2023年 1 月 11日 上午 11:24:48
主题: Re: [Discussion] externalize Hive connector
Yuxia,
thanks for details context, I see with your point on maintainability. If
your don’t mind, I would like to help continue the work you mentioned.
Chen
On Tue, Jan 10, 2023 at 03:47 yuxia < [ mailto:luoyuxia@alumni.sjtu.edu.cn | luoyuxia@alumni.sjtu.edu.cn ] > wrote:
> Hi, Chen.
> Appreciated your efforts.
> I would like to shared my thonghts about it.
>
> I do think it's not a good timing to externalize Hive connector right now.
> As you know, the Hive connector relies some internal methods in
> flink-table-planner. The Flink devs can
> modify them with more freedom as they are internal methods with providing
> any compatibility gurantee.
> If externalize it now, it may well happen that some methods relied by Hive
> connector have been removed/renamed from flink-table-planner, which will
> then cause the Hive connector fail.
> It'll bring much burden to the connector maintainer, the maintainer will
> need to take much care of these changes and do the adaption. As I see,
> it'll always end with only some users report this issue, did the maintainer
> notice that. It'll be definitely a bad user experience.
>
> It's a histrical technical debt that Hive connector relies the internal
> methods in flink-table-planner, and we have created FLINK-26603[1] to
> resolve it.
> I'm intended to finish it in Flink 1.17, but may well fail to catch up
> with Flink 1.17. Feel sorry for that for I'm busy with other stuff. If you
> have interest about it, appreciated that you can also take part in this
> jira.
>
> Of course you can just create a seperate Hive connector now, and then
> cherry pick changes from the Flink repo before the Hive connector can be
> removed from Flink repo. But I'm afraid that it may still take much
> efforts. And IIUC, the external Hive connector can't be released as offical
> Hive connector unless we resolve the problem I refered above.
> So, why not wait any thing is ready before externalize the Hive connector?
>
> Best regards,
> Yuxia
>
> ----- 原始邮件 -----
> 发件人: "Chen Qin" < [ mailto:qinnchen@gmail.com | qinnchen@gmail.com ] >
> 收件人: "dev" < [ mailto:dev@flink.apache.org | dev@flink.apache.org ] >
> 发送时间: 星期二, 2023年 1 月 10日 上午 11:47:02
> 主题: Re: [Discussion] externalize Hive connector
>
> Hi Martijin,
>
> Thank you for sharing your thoughts. In my opinion, FLINK-26603 is no
> longer blocking FLINK-30064.
> That being said, we could consider keeping FLINK-26603 as follow-up work
> and breaking it down into smaller tasks as we proceed.
>
> [PR] [ https://github.com/apache/flink-connector-hive/pull/3 | https://github.com/apache/flink-connector-hive/pull/3 ]
>
> Chen
>
> On Mon, Jan 9, 2023 at 7:45 AM Martijn Visser < [ mailto:martijnvisser@apache.org | martijnvisser@apache.org ] >
> wrote:
>
> > Hi Chen,
> >
> > Thanks for bringing this up! I think it would be great if the Hive
> > connector is externalized. We've already previously established [1] that
> it
> > should be externalized. I believe the only reason why this hasn't been
> done
> > yet is because it's blocked by
> > [ https://issues.apache.org/jira/browse/FLINK-26603 | https://issues.apache.org/jira/browse/FLINK-26603 ] . Is that still the
> case?
> >
> > Best regards,
> >
> > Martijn
> >
> > [1] [ https://lists.apache.org/thread/bk9f91o6wk66zdh353j1n7sfshh262tr | https://lists.apache.org/thread/bk9f91o6wk66zdh353j1n7sfshh262tr ]
> >
> > On Mon, Jan 9, 2023 at 4:22 PM Chen Qin < [ mailto:qinnchen@gmail.com | qinnchen@gmail.com ] > wrote:
> >
> > > Hi there,
> > >
> > > Following community guidance Externalized+Connector+development
> > > <
> >
> [ https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development | https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development ]
> >,
> > We
> > > would like to initiate discussion on moving connector/hive to
> > > apache/flink-connector-hive
> > > < [ https://github.com/apache/flink-connector-hive | https://github.com/apache/flink-connector-hive ] >.
> > >
> > > Currently proposed changes includes
> > >
> > > - cleanup dependencies introduced from hive/yarn dependencies with
> > > latest package version stated in properties section in POM file
> > > - add FlinkPlannerCalciteShim to handle PlannerCalcite API signature
> > > changes from 1.16 v.s 1.17-SNAPSHOT
> > > - add PackageITTests and ProductionArchitectureTests
> > > - [bonus] adding docker e2e tests with list of supported Hive/HMS
> > > versions
> > >
> > > Risk associated with this change includes not being able to release
> until
> > > 1.17 release, so we would have to keep cherry-pick changes from
> > > flink/connectors/hive for a period of time.
> > >
> > > Looking forward to hearing community feedback.
> > >
> > > Chen
> > >
> >
>