You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@flink.apache.org by Chen Qin <qi...@gmail.com> on 2022/05/29 17:04:29 UTC

Table API thrift support

Hi there,

We would like to discuss and potentially upstream our thrift support
patches to flink.

For some context, we have been internally patched flink-1.11.2 to support
FlinkSQL jobs read/write to thrift encoded kafka source/sink. Over the
course of last 12 months, those patches supports a few features not
available in open source master, including

   - allow user defined inference thrift stub class name in table DDL,
   Thrift binary <-> Row
   - dynamic overwrite schema type information loaded from HiveCatalog
   (Table only)
   - forward compatible when kafka topic encode with new schema (adding new
   field)
   - backward compatible when job with new schema handles input or state
   with old schema

With more FlinkSQL jobs in production, we expect maintenance of divergent
feature sets to increase in the next 6-12 months. Specifically challenges
around

   - lack of systematic way to support inference based table/view ddl
   (parity with hiveql serde
   <https://cwiki.apache.org/confluence/display/hive/serde#:~:text=SerDe%20Overview,-SerDe%20is%20short&text=Hive%20uses%20the%20SerDe%20interface,HDFS%20in%20any%20custom%20format.>
   )
   - lack of robust mapping from thrift field to row field
   - dynamic update set of table with same inference class when performing
   schema change (e.g adding new field)
   - minor lack of handle UNSET case, use NULL

Please kindly provide pointers around the challenges section.

Thanks,
Chen, Pinterest.

Re: Table API thrift support

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.

Thanks for raising it. 
But I wonder what do you mean by saying "dynamic overwrite schema type information loaded from HiveCatalog".
For such case, what does the HiveCatalog store?

Best regards,
Yuxia

----- 原始邮件 -----
发件人: "Chen Qin" <qi...@gmail.com>
收件人: "dev" <de...@flink.apache.org>
发送时间: 星期一, 2022年 5 月 30日 上午 1:04:29
主题: Table API thrift support

Hi there,

We would like to discuss and potentially upstream our thrift support
patches to flink.

For some context, we have been internally patched flink-1.11.2 to support
FlinkSQL jobs read/write to thrift encoded kafka source/sink. Over the
course of last 12 months, those patches supports a few features not
available in open source master, including

   - allow user defined inference thrift stub class name in table DDL,
   Thrift binary <-> Row
   - dynamic overwrite schema type information loaded from HiveCatalog
   (Table only)
   - forward compatible when kafka topic encode with new schema (adding new
   field)
   - backward compatible when job with new schema handles input or state
   with old schema

With more FlinkSQL jobs in production, we expect maintenance of divergent
feature sets to increase in the next 6-12 months. Specifically challenges
around

   - lack of systematic way to support inference based table/view ddl
   (parity with hiveql serde
   <https://cwiki.apache.org/confluence/display/hive/serde#:~:text=SerDe%20Overview,-SerDe%20is%20short&text=Hive%20uses%20the%20SerDe%20interface,HDFS%20in%20any%20custom%20format.>
   )
   - lack of robust mapping from thrift field to row field
   - dynamic update set of table with same inference class when performing
   schema change (e.g adding new field)
   - minor lack of handle UNSET case, use NULL

Please kindly provide pointers around the challenges section.

Thanks,
Chen, Pinterest.