Posted to user@flink.apache.org by graceking lau <gr...@gmail.com> on 2023/02/06 01:24:31 UTC

Design decisions around flink table store

Hi there,

Recently I had the chance to get to know the flink-table-store project, and
I was attracted by the idea behind it at first glance.

After reading the docs, one question has been on my mind for a while. It's
about the design of the file storage.

It looks like it could be implemented on top of other popular open-source
libraries rather than creating a totally new (LSM-tree-based) component.
Hudi or Iceberg looks like a good choice, since they both support saving
and querying change logs.
If we did it that way, there would be no need to create connectors for the
other related computation engines (Spark, Hive, or Trino), since they are
already supported by Hudi and Iceberg. That seems like a better solution to
me than reinventing the wheel.

So, here are my questions: is there any issue with writing data in the Hudi
or Iceberg format? Why weren't they chosen in the initial design?

Looking forward to your answer!

(I'm not sure whether this is the right place to ask questions, but I
haven't found another way yet. If it's not OK to ask on this list, could
someone please point me in the right direction?)

Best regards,
Bright.

Re: Design decisions around flink table store

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Hi, Bright. 

Thanks for reaching out. That's a really good question. 
Briefly speaking, the reason is that neither Hudi nor Iceberg is efficient for updates. 
The FLIP for flink-table-store also explains why Hudi wasn't used directly [1]: 

" 
Why doesn't FileStore use Hudi directly? 

1: Hudi aims to support the update of upsert, so it needs to forcibly define the primary key and time column. It is not easy to support all changelog types 
2: The update of Hudi is based on the index (currently there are BloomFilter and HBase). The data in the bucket is out of order. Every merge needs to be reread and rewritten, which is expensive. We need fast update storage, LSM is more suitable. 
" 

Also, I have added Jingsong Li to the thread. He is the creator/maintainer of flink-table-store; maybe he can provide more details. 

[1] https://cwiki.apache.org/confluence/display/Flink/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage#FLIP188:IntroduceBuiltinDynamicTableStorage-UsingHudi 


Best regards, 
Yuxia 

