Posted to dev@flink.apache.org by David Morávek <dm...@apache.org> on 2021/12/06 11:27:53 UTC

[DISCUSS] Strong read-after-write consistency of Flink FileSystems

Hi Everyone,

as outlined in FLIP-194 discussion [1], for the future directions of Flink
HA services, I'd like to verify my thoughts around guarantees of the
distributed filesystems used with Flink.

Currently, some of the services (*JobGraphStore*, *CompletedCheckpointStore*)
are implemented using a combination of strongly consistent metadata storage
(ZooKeeper, K8s CM) and the actual FileSystem. The reasoning behind this dates
back to the days when S3 was an eventually consistent FileSystem and we needed
a strongly consistent view of the data.
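
To make the current setup a bit more concrete, here is a rough sketch of the
pattern (class and interface names are made up for illustration; this is not
the actual JobGraphStore / CompletedCheckpointStore code):

import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

/** Pointer store backed by ZooKeeper or a K8s ConfigMap (details omitted). */
interface MetadataStore {
    void put(String key, String pointer);
    String get(String key);
}

/** Two-store pattern: payload on the FileSystem, pointer in the metadata store. */
final class TwoStoreExample {
    private final MetadataStore metadata;
    private final FileSystem fs;
    private final Path baseDir;

    TwoStoreExample(MetadataStore metadata, FileSystem fs, Path baseDir) {
        this.metadata = metadata;
        this.fs = fs;
        this.baseDir = baseDir;
    }

    void write(String key, byte[] payload) throws java.io.IOException {
        // 1) write the payload under a unique name on the FileSystem
        Path path = new Path(baseDir, key + "-" + java.util.UUID.randomUUID());
        try (FSDataOutputStream out = fs.create(path, FileSystem.WriteMode.NO_OVERWRITE)) {
            out.write(payload);
        }
        // 2) make it visible by publishing the pointer via the strongly consistent store
        metadata.put(key, path.toString());
    }

    byte[] read(String key) throws java.io.IOException {
        // readers resolve the pointer first, so they never depend on FileSystem
        // listing / read-after-write guarantees for visibility of the latest write
        Path path = new Path(metadata.get(key));
        try (FSDataInputStream in = fs.open(path)) {
            return in.readAllBytes();
        }
    }
}

The important bit is that readers never rely on the FileSystem for visibility
of the latest write; they always resolve the pointer through ZooKeeper / K8s CM
first.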

I did some research, and my feeling is that all the major FileSystems that
Flink supports already provide strong read-after-write consistency, which
would be sufficient to decrease the complexity of the current HA
implementations.
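
To be explicit about what I mean by that guarantee, a probe along the
following lines has to pass without any retries on a strongly consistent
store (just a sketch; it assumes the respective filesystem plugin is on the
classpath and the target path is writable):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

public final class ReadAfterWriteProbe {
    public static void main(String[] args) throws Exception {
        // e.g. s3://my-bucket/ha-probe/object (hypothetical path)
        Path path = new Path(args[0]);
        FileSystem fs = path.getFileSystem();

        byte[] payload = "probe".getBytes(StandardCharsets.UTF_8);
        try (FSDataOutputStream out = fs.create(path, FileSystem.WriteMode.OVERWRITE)) {
            out.write(payload);
        }

        // On a strongly consistent store, the object written above must be
        // immediately visible and return exactly the bytes just written.
        try (FSDataInputStream in = fs.open(path)) {
            if (!Arrays.equals(payload, in.readAllBytes())) {
                throw new IllegalStateException("stale read after write");
            }
        }
        if (fs.getFileStatus(path).getLen() != payload.length) {
            throw new IllegalStateException("stale metadata after write");
        }
        System.out.println("read-after-write OK for " + path);
    }
}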

FileSystems that I've checked and that seem to support strong
read-after-write consistency:
- S3
- GCS
- Azure Blob Storage
- Aliyun OSS
- HDFS
- Minio

Are you aware of other FileSystems that are used with Flink? Do they
support the consistency that is required for starting new initiatives
towards simpler / less error-prone HA services? Are you aware of any
problems with the above-mentioned FileSystems that I might have missed?

I'm also bringing this up to user@f.a.o, to make sure we don't miss any
FileSystems.

[1] https://lists.apache.org/thread/wlzv02jqtq221kb8dnm82v4xj8tomd94

Best,
D.

Re: [DISCUSS] Strong read-after-write consistency of Flink FileSystems

Posted by David Morávek <dm...@apache.org>.
Any other thoughts on the topic? If there are no concerns, I'd continue
with creating a FLIP for changing the "written" contract of the Flink
FileSystems to reflect this.
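
Purely as an illustration of what such a written contract could say (the
wording is mine, not proposal text), something along these lines on
org.apache.flink.core.fs.FileSystem:

/**
 * Illustrative consistency note (hypothetical wording, not the actual javadoc):
 *
 * <p>File systems used by Flink's HA services are expected to provide strong
 * read-after-write consistency: once {@code create(...)} has returned and the
 * output stream has been closed successfully, any subsequent {@code open(...)},
 * {@code exists(...)} or {@code getFileStatus(...)} call, from any client, must
 * observe the newly written data. Eventually consistent stores do not satisfy
 * this contract.
 */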

Best,
D.


On Wed, Dec 8, 2021 at 5:53 PM David Morávek <dm...@apache.org> wrote:

> Hi Martijn,
>
> I simply wasn't aware of that one :) It seems to provide the
> guarantees that we need [1].
>
>> Of course, Azure Storage is built on a platform grounded in strong
>> consistency guaranteeing that writes are made durable before acknowledging
>> success to the client. This is critically important for big data workloads
>> where the output from one task is often the input to the next job. This
>> greatly simplifies development of big data applications since they do not
>> have to work around issues that surface with weaker consistency models such
>> as eventual consistency.
>>
>
> I'm not able to find the guarantees for MapR FS, but since it was designed
> as an HDFS replacement back in the day, I'd expect it to provide the same
> guarantees, as MapReduce heavily relies on this. I've seen that you've
> already started a deprecation thread.
>
> [1]
> https://azure.microsoft.com/en-us/blog/a-closer-look-at-azure-data-lake-storage-gen2/
>
> D.
>
> On Wed, Dec 8, 2021 at 4:34 PM Martijn Visser <ma...@ververica.com>
> wrote:
>
>> Hi David,
>>
>> Just to be sure: you've already included Azure Blob Storage, but did you
>> deliberately skip Azure Data Lake Storage Gen2? That's currently supported
>> and also used by Flink users [1]. There's also MapR FS, but I doubt that
>> is still used.
>>
>> Best regards,
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/
>>
>> On Mon, 6 Dec 2021 at 12:28, David Morávek <dm...@apache.org> wrote:
>>
>>> Hi Everyone,
>>>
>>> as outlined in FLIP-194 discussion [1], for the future directions of
>>> Flink HA services, I'd like to verify my thoughts around guarantees of the
>>> distributed filesystems used with Flink.
>>>
>>> Currently, some of the services (*JobGraphStore*,
>>> *CompletedCheckpointStore*) are implemented using a combination of
>>> strongly consistent metadata storage (ZooKeeper, K8s CM) and the actual
>>> FileSystem. The reasoning behind this dates back to the days when S3 was
>>> an eventually consistent FileSystem and we needed a strongly consistent
>>> view of the data.
>>>
>>> I did some research, and my feeling is that all the major FileSystems
>>> that Flink supports already provide strong read-after-write consistency,
>>> which would be sufficient to decrease the complexity of the current HA
>>> implementations.
>>>
>>> FileSystems that I've checked and that seem to support strong
>>> read-after-write consistency:
>>> - S3
>>> - GCS
>>> - Azure Blob Storage
>>> - Aliyun OSS
>>> - HDFS
>>> - Minio
>>>
>>> Are you aware of other FileSystems that are used with Flink? Do they
>>> support the consistency that is required for starting new initiatives
>>> towards simpler / less error-prone HA services? Are you aware of any
>>> problems with the above-mentioned FileSystems that I might have missed?
>>>
>>> I'm also bringing this up to user@f.a.o, to make sure we don't miss any
>>> FileSystems.
>>>
>>> [1] https://lists.apache.org/thread/wlzv02jqtq221kb8dnm82v4xj8tomd94
>>>
>>> Best,
>>> D.
>>>
>>

Re: [DISCUSS] Strong read-after-write consistency of Flink FileSystems

Posted by David Morávek <dm...@apache.org>.
Hi Martijn,

I simply wasn't aware of that one :) It seems to provide the guarantees
that we need [1].

> Of course, Azure Storage is built on a platform grounded in strong
> consistency guaranteeing that writes are made durable before acknowledging
> success to the client. This is critically important for big data workloads
> where the output from one task is often the input to the next job. This
> greatly simplifies development of big data applications since they do not
> have to work around issues that surface with weaker consistency models such
> as eventual consistency.
>

I'm not able to find the guarantees for MapR FS, but since it was designed
as an HDFS replacement back in the day, I'd expect it to provide the same
guarantees, as MapReduce heavily relies on this. I've seen that you've
already started a deprecation thread.

[1]
https://azure.microsoft.com/en-us/blog/a-closer-look-at-azure-data-lake-storage-gen2/

D.

On Wed, Dec 8, 2021 at 4:34 PM Martijn Visser <ma...@ververica.com> wrote:

> Hi David,
>
> Just to be sure: you've already included Azure Blob Storage, but did you
> deliberately skip Azure Data Lake Storage Gen2? That's currently supported
> and also used by Flink users [1]. There's also MapR FS, but I doubt that
> is still used.
>
> Best regards,
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/
>
> On Mon, 6 Dec 2021 at 12:28, David Morávek <dm...@apache.org> wrote:
>
>> Hi Everyone,
>>
>> as outlined in FLIP-194 discussion [1], for the future directions of
>> Flink HA services, I'd like to verify my thoughts around guarantees of the
>> distributed filesystems used with Flink.
>>
>> Currently, some of the services (*JobGraphStore*,
>> *CompletedCheckpointStore*) are implemented using a combination of
>> strongly consistent metadata storage (ZooKeeper, K8s CM) and the actual
>> FileSystem. The reasoning behind this dates back to the days when S3 was
>> an eventually consistent FileSystem and we needed a strongly consistent
>> view of the data.
>>
>> I did some research, and my feeling is that all the major FileSystems
>> that Flink supports already provide strong read-after-write consistency,
>> which would be sufficient to decrease the complexity of the current HA
>> implementations.
>>
>> FileSystems that I've checked and that seem to support strong
>> read-after-write consistency:
>> - S3
>> - GCS
>> - Azure Blob Storage
>> - Aliyun OSS
>> - HDFS
>> - Minio
>>
>> Are you aware of other FileSystems that are used with Flink? Do they
>> support the consistency that is required for starting new initiatives
>> towards simpler / less error-prone HA services? Are you aware of any
>> problems with the above-mentioned FileSystems that I might have missed?
>>
>> I'm also bringing this up to user@f.a.o, to make sure we don't miss any
>> FileSystems.
>>
>> [1] https://lists.apache.org/thread/wlzv02jqtq221kb8dnm82v4xj8tomd94
>>
>> Best,
>> D.
>>
>

Re: [DISCUSS] Strong read-after-write consistency of Flink FileSystems

Posted by Martijn Visser <ma...@ververica.com>.
Hi David,

Just to be sure: you've already included Azure Blob Storage, but did you
deliberately skip Azure Data Lake Storage Gen2? That's currently supported
and also used by Flink users [1]. There's also MapR FS, but I doubt that is
still used.

Best regards,

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/

On Mon, 6 Dec 2021 at 12:28, David Morávek <dm...@apache.org> wrote:

> Hi Everyone,
>
> as outlined in FLIP-194 discussion [1], for the future directions of Flink
> HA services, I'd like to verify my thoughts around guarantees of the
> distributed filesystems used with Flink.
>
> Currently, some of the services (*JobGraphStore*,
> *CompletedCheckpointStore*) are implemented using a combination of
> strongly consistent metadata storage (ZooKeeper, K8s CM) and the actual
> FileSystem. The reasoning behind this dates back to the days when S3 was
> an eventually consistent FileSystem and we needed a strongly consistent
> view of the data.
>
> I did some research, and my feeling is that all the major FileSystems that
> Flink supports already provide strong read-after-write consistency, which
> would be sufficient to decrease the complexity of the current HA
> implementations.
>
> FileSystems that I've checked and that seem to support strong
> read-after-write consistency:
> - S3
> - GCS
> - Azure Blob Storage
> - Aliyun OSS
> - HDFS
> - Minio
>
> Are you aware of other FileSystems that are used with Flink? Do they
> support the consistency that is required for starting new initiatives
> towards simpler / less error-prone HA services? Are you aware of any
> problems with the above-mentioned FileSystems that I might have missed?
>
> I'm also bringing this up to user@f.a.o, to make sure we don't miss any
> FileSystems.
>
> [1] https://lists.apache.org/thread/wlzv02jqtq221kb8dnm82v4xj8tomd94
>
> Best,
> D.
>
