You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by yuxia <lu...@alumni.sjtu.edu.cn> on 2023/02/08 04:28:52 UTC

Re: [Discussion] externalize Hive connector

Hi, Chen. 

Really appreciated for you efforts. Nowadays, I'm trying to picking up the work of decoupling again for I find some time finally. I think may be you can first read the FLIP-216[1] to understand how we plan to decouple and 
make sure the pr you open will be on the right way. 

I think FLINK-30659[2] you created may be can as a godd start which is an import part of decoupling. But please remember that as i commented, the flink-sql-parser-hive is no longer used, it can be dropped directly. 


[1] [ https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A++Introduce+pluggable+dialect+and+plan+for+migrating+Hive+dialect | https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A++Introduce+pluggable+dialect+and+plan+for+migrating+Hive+dialect ] 
[2] [ https://issues.apache.org/jira/browse/FLINK-30659 | https://issues.apache.org/jira/browse/FLINK-30659 ] 

Best regards, 
Yuxia 


发件人: "Chen Qin" <qi...@gmail.com> 
收件人: luoyuxia@alumni.sjtu.edu.cn 
抄送: "dev" <de...@flink.apache.org> 
发送时间: 星期五, 2023年 1 月 20日 上午 7:35:43 
主题: Re: [Discussion] externalize Hive connector 

Hi Yuxia, 

[ https://issues.apache.org/jira/browse/FLINK-30667 | FLINK-30667 ] aims to address hive-connector dependency on table-planner @internal class . Here is a bit of rationale. 

ParserImpl in table-planner and HiveParser in hive connector should be able to evolve separately after future externalization; Parser interface on other hand should keep consistent when community plan for other dialect support. This would help decouple one of the internal class dependencies you mentioned in [ https://issues.apache.org/jira/browse/FLINK-26603 | FLINK-26603 ] . 

QueryOperation is a simple operator class shared between table-planner as well as hive-connector. Keep a copy instead of announcing public in table-planner could decouple hive connectors from table-planner change or future proof when considering more customized operations. 

PlannerContext acts as a utility class providing reference to a group of components. we could make it publicEvolving. 

Let me know what you think. 

Chen 



On Tue, Jan 10, 2023 at 8:34 PM yuxia < [ mailto:luoyuxia@alumni.sjtu.edu.cn | luoyuxia@alumni.sjtu.edu.cn ] > wrote: 


Hi, Chen. 
Cool! Of course not. Much appreciated that you can help continue this work. 
If you have any question about it, please let me know. 

Best regards, 
Yuxia 

----- 原始邮件 ----- 
发件人: "Chen Qin" < [ mailto:qinnchen@gmail.com | qinnchen@gmail.com ] > 
收件人: "dev" < [ mailto:dev@flink.apache.org | dev@flink.apache.org ] > 
发送时间: 星期三, 2023年 1 月 11日 上午 11:24:48 
主题: Re: [Discussion] externalize Hive connector 

Yuxia, 

thanks for details context, I see with your point on maintainability. If 
your don’t mind, I would like to help continue the work you mentioned. 

Chen 

On Tue, Jan 10, 2023 at 03:47 yuxia < [ mailto:luoyuxia@alumni.sjtu.edu.cn | luoyuxia@alumni.sjtu.edu.cn ] > wrote: 

> Hi, Chen. 
> Appreciated your efforts. 
> I would like to shared my thonghts about it. 
> 
> I do think it's not a good timing to externalize Hive connector right now. 
> As you know, the Hive connector relies some internal methods in 
> flink-table-planner. The Flink devs can 
> modify them with more freedom as they are internal methods with providing 
> any compatibility gurantee. 
> If externalize it now, it may well happen that some methods relied by Hive 
> connector have been removed/renamed from flink-table-planner, which will 
> then cause the Hive connector fail. 
> It'll bring much burden to the connector maintainer, the maintainer will 
> need to take much care of these changes and do the adaption. As I see, 
> it'll always end with only some users report this issue, did the maintainer 
> notice that. It'll be definitely a bad user experience. 
> 
> It's a histrical technical debt that Hive connector relies the internal 
> methods in flink-table-planner, and we have created FLINK-26603[1] to 
> resolve it. 
> I'm intended to finish it in Flink 1.17, but may well fail to catch up 
> with Flink 1.17. Feel sorry for that for I'm busy with other stuff. If you 
> have interest about it, appreciated that you can also take part in this 
> jira. 
> 
> Of course you can just create a seperate Hive connector now, and then 
> cherry pick changes from the Flink repo before the Hive connector can be 
> removed from Flink repo. But I'm afraid that it may still take much 
> efforts. And IIUC, the external Hive connector can't be released as offical 
> Hive connector unless we resolve the problem I refered above. 
> So, why not wait any thing is ready before externalize the Hive connector? 
> 
> Best regards, 
> Yuxia 
> 
> ----- 原始邮件 ----- 
> 发件人: "Chen Qin" < [ mailto:qinnchen@gmail.com | qinnchen@gmail.com ] > 
> 收件人: "dev" < [ mailto:dev@flink.apache.org | dev@flink.apache.org ] > 
> 发送时间: 星期二, 2023年 1 月 10日 上午 11:47:02 
> 主题: Re: [Discussion] externalize Hive connector 
> 
> Hi Martijin, 
> 
> Thank you for sharing your thoughts. In my opinion, FLINK-26603 is no 
> longer blocking FLINK-30064. 
> That being said, we could consider keeping FLINK-26603 as follow-up work 
> and breaking it down into smaller tasks as we proceed. 
> 
> [PR] [ https://github.com/apache/flink-connector-hive/pull/3 | https://github.com/apache/flink-connector-hive/pull/3 ] 
> 
> Chen 
> 
> On Mon, Jan 9, 2023 at 7:45 AM Martijn Visser < [ mailto:martijnvisser@apache.org | martijnvisser@apache.org ] > 
> wrote: 
> 
> > Hi Chen, 
> > 
> > Thanks for bringing this up! I think it would be great if the Hive 
> > connector is externalized. We've already previously established [1] that 
> it 
> > should be externalized. I believe the only reason why this hasn't been 
> done 
> > yet is because it's blocked by 
> > [ https://issues.apache.org/jira/browse/FLINK-26603 | https://issues.apache.org/jira/browse/FLINK-26603 ] . Is that still the 
> case? 
> > 
> > Best regards, 
> > 
> > Martijn 
> > 
> > [1] [ https://lists.apache.org/thread/bk9f91o6wk66zdh353j1n7sfshh262tr | https://lists.apache.org/thread/bk9f91o6wk66zdh353j1n7sfshh262tr ] 
> > 
> > On Mon, Jan 9, 2023 at 4:22 PM Chen Qin < [ mailto:qinnchen@gmail.com | qinnchen@gmail.com ] > wrote: 
> > 
> > > Hi there, 
> > > 
> > > Following community guidance Externalized+Connector+development 
> > > < 
> > 
> [ https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development | https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development ] 
> >, 
> > We 
> > > would like to initiate discussion on moving connector/hive to 
> > > apache/flink-connector-hive 
> > > < [ https://github.com/apache/flink-connector-hive | https://github.com/apache/flink-connector-hive ] >. 
> > > 
> > > Currently proposed changes includes 
> > > 
> > > - cleanup dependencies introduced from hive/yarn dependencies with 
> > > latest package version stated in properties section in POM file 
> > > - add FlinkPlannerCalciteShim to handle PlannerCalcite API signature 
> > > changes from 1.16 v.s 1.17-SNAPSHOT 
> > > - add PackageITTests and ProductionArchitectureTests 
> > > - [bonus] adding docker e2e tests with list of supported Hive/HMS 
> > > versions 
> > > 
> > > Risk associated with this change includes not being able to release 
> until 
> > > 1.17 release, so we would have to keep cherry-pick changes from 
> > > flink/connectors/hive for a period of time. 
> > > 
> > > Looking forward to hearing community feedback. 
> > > 
> > > Chen 
> > > 
> > 
>