Posted to user@flink.apache.org by Alexis Sarda-Espinosa <al...@microfocus.com> on 2021/03/12 14:37:07 UTC

DataStream in batch mode - handling (un)ordered bounded data

Hello,

Regarding the new BATCH mode of the DataStream API, I see that the documentation states that some operators will process all data for a given key before moving on to the next one. However, I don't see how Flink is supposed to know whether the input will provide all data for a given key sequentially. In the DataSet API, an (undocumented?) feature is using SplitDataProperties (https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/api/java/io/SplitDataProperties.html) to specify different grouping/partitioning/sorting properties, so if the data is pre-sorted (e.g. when reading from a database), some operations can be optimized. Will the DataStream API get something similar?

Regards,
Alexis.


Re: DataStream in batch mode - handling (un)ordered bounded data

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hi Alexis,

As of now there is no such feature in the DataStream API. The BATCH mode
in the DataStream API is a new feature, and we would be interested to hear
about the use cases people want to use it for, to identify potential
areas to improve. What you are suggesting generally makes sense, so I
think it would be nice if you could create a Jira ticket for it.

Best,

Dawid
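
For readers following the thread, here is a minimal plain-Java sketch of why knowing the input order matters. It does not use any Flink API; the class `PreGroupedSum` and its methods are hypothetical names made up for this illustration. `sumGrouped` aggregates in a single pass and is only correct when all records for a key arrive next to each other, which is the kind of guarantee SplitDataProperties lets a DataSet source declare. `sumUnordered` assumes nothing about the input and therefore sorts by key first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Flink.
class PreGroupedSum {

    // Single pass, constant state per key. Correct ONLY if all records
    // for a given key are adjacent in the input (pre-grouped or pre-sorted).
    static List<Map.Entry<String, Integer>> sumGrouped(List<Map.Entry<String, Integer>> records) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String currentKey = null;
        int sum = 0;
        for (Map.Entry<String, Integer> r : records) {
            // A key change means the previous key is complete, so emit it.
            if (currentKey != null && !currentKey.equals(r.getKey())) {
                out.add(Map.entry(currentKey, sum));
                sum = 0;
            }
            currentKey = r.getKey();
            sum += r.getValue();
        }
        if (currentKey != null) {
            out.add(Map.entry(currentKey, sum));
        }
        return out;
    }

    // No assumption about input order: pay for a sort by key first,
    // then reuse the single-pass aggregation.
    static List<Map.Entry<String, Integer>> sumUnordered(List<Map.Entry<String, Integer>> records) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(records);
        sorted.sort(Map.Entry.comparingByKey());
        return sumGrouped(sorted);
    }
}
```

If the source already delivers records grouped by key, the sort in `sumUnordered` is wasted work; if it does not, `sumGrouped` alone emits several partial results per key. Without a way to declare the input's properties, an engine has to take the conservative path.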
