Posted to dev@flink.apache.org by Alexander Sorokoumov <as...@confluent.io.INVALID> on 2022/11/07 14:59:01 UTC

Questions about Flink Table Store

I’m Alexander from Confluent. I am new to Flink and its community. I would
like to contribute to the Flink Table Store, but am missing certain
details. Can someone please clarify the points mentioned below to me?

   - Given that there is always a single writer to a stream, in what
   situations can concurrent writes ever happen to Flink Table Store? FLIP-188
   mentions reprocessing and snapshot generation, but I do not understand how
   these cases can lead to more than a single writer.
   - If there are concurrent INSERTs into a table backed by Flink Table
   Store, how and by what component are they serialized?
   - Is Flink Table Store going to support ACID transactions?
   - Do Flink Table Store snapshots correspond 1:1 to Flink checkpoints?
   - Does Flink Table Store (plan to) support secondary indexes?
   - Is there an open roadmap for this Flink Table Store?

Thank you,
Alexander

Re: Questions about Flink Table Store

Posted by Caizhi Weng <ts...@gmail.com>.
Hi Alexander!

Thanks for your interest in Flink Table Store and I'm glad to share my
thoughts with you.

> Given that there is always a single writer to a stream, in what situations
> can concurrent writes ever happen to Flink Table Store?


I'm not the author of the FLIP, so I'm not sure what "writer" refers to
there.

Currently, each parallel instance of the Flink Table Store sink writes to its
own bucket, so no concurrent writes occur at the bucket level within a single
job. However, a table can consist of multiple buckets (records from all
buckets are merged when reading the table, aka the merge-on-read strategy), so
within one Flink job, multiple sink instances can write into their own
buckets concurrently.
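
Conceptually you can think of bucket assignment as hashing the bucket key into
a fixed number of buckets, so that all records with the same key always land
in the same bucket. A minimal sketch of that idea (illustration only, with
made-up class and method names, not the actual Table Store code):

    // Toy illustration of hash-based bucket assignment; names are hypothetical.
    public final class BucketAssigner {

        private final int numBuckets;

        public BucketAssigner(int numBuckets) {
            this.numBuckets = numBuckets;
        }

        /** Records with the same key always map to the same bucket. */
        public int bucketFor(Object key) {
            return Math.floorMod(key.hashCode(), numBuckets);
        }
    }

Because each sink task owns a disjoint set of buckets, two tasks of the same
job never write to the same data files.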

Flink Table Store also supports multiple jobs writing to the same table, even
to the same bucket. Conflicts are resolved by the commit process [1], which
uses optimistic concurrency control.
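
The commit follows the usual optimistic pattern: look up the latest snapshot,
check that nothing conflicting has been committed since the snapshot your
writes are based on, then try to atomically publish the next snapshot id,
retrying if another writer wins the race. A rough sketch of that pattern
(hypothetical interfaces, not the real FileStoreCommitImpl API):

    import java.util.List;

    // Hypothetical interfaces, only to illustrate optimistic concurrency control.
    interface SnapshotStore {
        long latestSnapshotId();
        List<FileChange> changesBetween(long fromExclusive, long toInclusive);
        // Atomically creates the snapshot file; fails if the id is already taken.
        boolean tryPublish(long newSnapshotId, List<FileChange> changes);
    }

    final class FileChange { /* a new or deleted data file */ }

    final class OptimisticCommit {
        void commit(List<FileChange> ours, long baseSnapshot, SnapshotStore store) {
            long checkedUpTo = baseSnapshot;
            while (true) {
                long latest = store.latestSnapshotId();
                // Abort if snapshots committed since our base touch the same files.
                if (conflictsWith(ours, store.changesBetween(checkedUpTo, latest))) {
                    throw new IllegalStateException("Conflicting concurrent commit");
                }
                if (store.tryPublish(latest + 1, ours)) {
                    return; // committed successfully
                }
                checkedUpTo = latest; // lost the race: re-check newer snapshots and retry
            }
        }

        private boolean conflictsWith(List<FileChange> ours, List<FileChange> theirs) {
            return false; // real logic compares the buckets/files touched by both sides
        }
    }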

> If there are concurrent INSERTs into a table backed by Flink Table Store,
> how and by what component are they serialized?


Do you mean how the order of INSERTs is determined?

The order of INSERTs is determined by their sequence numbers. When a writer
is created, it reads the current sequence number from disk. As stated above,
no concurrent writes occur within a single job, but multiple jobs running at
the same time may create INSERTs with the same sequence numbers, in which
case the final result may be a mix of records from both jobs. This is
acceptable for now because Flink Table Store only provides snapshot
isolation, not serializability.
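
To make the sequence-number part concrete: when several entries exist for the
same key, readers keep the one with the highest sequence number. A toy sketch
of that merge rule (illustration only, not the actual implementation):

    import java.util.HashMap;
    import java.util.Map;

    // Toy illustration of merge-on-read by sequence number.
    final class KeyedEntry {
        final String key;
        final long sequenceNumber;
        final String value;

        KeyedEntry(String key, long sequenceNumber, String value) {
            this.key = key;
            this.sequenceNumber = sequenceNumber;
            this.value = value;
        }
    }

    final class MergeOnRead {
        /** For each key, keep the entry with the highest sequence number. */
        static Map<String, KeyedEntry> merge(Iterable<KeyedEntry> entries) {
            Map<String, KeyedEntry> result = new HashMap<>();
            for (KeyedEntry e : entries) {
                result.merge(e.key, e,
                        (a, b) -> a.sequenceNumber >= b.sequenceNumber ? a : b);
            }
            return result;
        }
    }

Because two jobs can produce overlapping sequence numbers, the merged table
can contain a mix of both jobs' writes, which is the snapshot-isolation
behavior described above.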

> Is Flink Table Store going to support ACID transactions?


It's not on our road map for the short term.

> Do Flink Table Store snapshots correspond 1:1 to Flink checkpoints?


Snapshots are committed when a checkpoint is notified as complete. However,
Flink's checkpoint-complete notification is best-effort, meaning that the
notification for some checkpoints may be skipped, so not every checkpoint
produces a snapshot. Also, each commit currently consists of up to two
snapshots: one for all the new level-0 files and another for all compaction
changes.
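
On the Flink side this hooks into the regular checkpoint lifecycle: the sink
stages new files while the job runs and only publishes them to the table
store once Flink confirms the checkpoint. A simplified sketch of that shape
(not the actual Table Store sink, which is more involved):

    import org.apache.flink.api.common.state.CheckpointListener;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    // Simplified "commit on checkpoint complete" shape; illustration only.
    public class CommitOnCheckpointSink extends RichSinkFunction<String>
            implements CheckpointListener {

        @Override
        public void invoke(String record, Context context) {
            // Stage the record into new, not-yet-committed data files.
        }

        @Override
        public void notifyCheckpointComplete(long checkpointId) {
            // Best-effort callback from Flink: it can be skipped, which is why
            // one table-store commit may cover the staged files of several
            // checkpoints. Publish the staged files as a new snapshot here.
        }
    }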

> Does Flink Table Store (plan to) support secondary indexes?


Yes, we have such a plan, but we haven't decided when to implement it.

> Is there an open roadmap for this Flink Table Store?


As far as I know, we currently don't have one. Maybe Jingsong Lee knows more
about this.

[1]
https://github.com/apache/flink-table-store/blob/master/flink-table-store-core/src/main/java/org/apache/flink/table/store/file/operation/FileStoreCommitImpl.java


Leonard Xu <xb...@gmail.com> wrote on Tue, Nov 8, 2022 at 22:47:

> Hi, Alexander
>
> Happy to hear that you're willing to contribute to Flink. Answering these
> specific tech design/functionality questions may require digging into the
> code. I'm not familiar with the FTS code base, but I've CC'd my colleagues,
> who are the core maintainers of FTS, and maybe they can give you some
> insights.
>
>
> Best,
> Leonard
>
> > On Nov 7, 2022, at 10:59 PM, Alexander Sorokoumov
> > <as...@confluent.io.INVALID> wrote:
> >
> > I’m Alexander from Confluent. I am new to Flink and its community. I
> would
> > like to contribute to the Flink Table Store, but am missing certain
> > details. Can someone please clarify the points mentioned below to me?
> >
> >   - Given that there is always a single writer to a stream, in what
> >   situations can concurrent writes ever happen to Flink Table Store?
> FLIP-188
> >   mentions reprocessing and snapshot generation, but I do not understand
> how
> >   these cases can lead to more than a single writer.
> >   - If there are concurrent INSERTs into a table backed by Flink Table
> >   Store, how and by what component are they serialized?
> >   - Is Flink Table Store going to support ACID transactions?
> >   - Do Flink Table Store snapshots correspond 1:1 to Flink checkpoints?
> >   - Does Flink Table Store (plan to) support secondary indexes?
> >   - Is there an open roadmap for this Flink Table Store?
> >
> > Thank you,
> > Alexander
>
>

Re: Questions about Flink Table Store

Posted by Leonard Xu <xb...@gmail.com>.
Hi, Alexander

Happy to hear that you're willing to contribute to Flink. Answering these specific tech design/functionality questions may require digging into the code.
I'm not familiar with the FTS code base, but I've CC'd my colleagues, who are the core maintainers of FTS, and maybe they can give you some insights.


Best,
Leonard

> On Nov 7, 2022, at 10:59 PM, Alexander Sorokoumov <as...@confluent.io.INVALID> wrote:
> 
> I’m Alexander from Confluent. I am new to Flink and its community. I would
> like to contribute to the Flink Table Store, but am missing certain
> details. Can someone please clarify the points mentioned below to me?
> 
>   - Given that there is always a single writer to a stream, in what
>   situations can concurrent writes ever happen to Flink Table Store? FLIP-188
>   mentions reprocessing and snapshot generation, but I do not understand how
>   these cases can lead to more than a single writer.
>   - If there are concurrent INSERTs into a table backed by Flink Table
>   Store, how and by what component are they serialized?
>   - Is Flink Table Store going to support ACID transactions?
>   - Do Flink Table Store snapshots correspond 1:1 to Flink checkpoints?
>   - Does Flink Table Store (plan to) support secondary indexes?
>   - Is there an open roadmap for this Flink Table Store?
> 
> Thank you,
> Alexander