Posted to dev@flink.apache.org by "Andreas Hailu (Jira)" <ji...@apache.org> on 2022/05/28 15:12:00 UTC

[jira] [Created] (FLINK-27827) StreamExecutionEnvironment method supporting explicit Boundedness

Andreas Hailu created FLINK-27827:
-------------------------------------

             Summary: StreamExecutionEnvironment method supporting explicit Boundedness
                 Key: FLINK-27827
                 URL: https://issues.apache.org/jira/browse/FLINK-27827
             Project: Flink
          Issue Type: Improvement
          Components: API / DataStream
            Reporter: Andreas Hailu


When creating a {{DataStreamSource}}, an explicitly bounded input is only returned if the provided {{InputFormat}} is a {{FileInputFormat}}. This results in runtime exceptions when trying to run applications in Batch execution mode with inputs that are not {{FileInputFormat}}s, e.g. Apache Iceberg [1], the inputs of Flink's Hadoop MapReduce compatibility APIs [2], etc.

I understand there is a {{DataSource}} API [3] that supports specifying the boundedness of an input, but using it would require all connectors to revise their APIs, which would take some time.

My organization is in the middle of migrating from the {{DataSet}} API to the {{DataStream}} API, and we've found this to be a challenge: nearly all of our inputs have been determined to be unbounded, because we use {{InputFormat}}s that are not {{FileInputFormat}}s. Our work-around was to provide a local patch in {{StreamExecutionEnvironment}} with a method supporting explicitly bounded inputs.

As this helped us implement a Batch {{DataStream}} solution, perhaps this is something that may be helpful for others?
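
For illustration only, the behaviour and the work-around described above can be modelled with a small self-contained Java sketch. It does not use Flink's classes and is not the local patch itself: {{InputFormat}}, {{FileInputFormat}}, {{Source}} and both {{createInput}} methods below are simplified, hypothetical stand-ins, and {{CustomInputFormat}} is an invented placeholder for any non-file input. The enum merely reuses the constant names of Flink's {{Boundedness}}.

```java
public class BoundednessSketch {

    /** Stand-in enum; reuses the constant names of Flink's Boundedness. */
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    /** Hypothetical, heavily simplified stand-ins for the real input format types. */
    interface InputFormat { }
    static class FileInputFormat implements InputFormat { }
    /** Invented placeholder for any input format that is not file based. */
    static class CustomInputFormat implements InputFormat { }

    /** Hypothetical stand-in for a DataStreamSource; it only records the boundedness. */
    static final class Source {
        final InputFormat format;
        final Boundedness boundedness;

        Source(InputFormat format, Boundedness boundedness) {
            this.format = format;
            this.boundedness = boundedness;
        }
    }

    /** Models the behaviour described in this issue: only file formats come back bounded. */
    static Source createInput(InputFormat format) {
        Boundedness boundedness = (format instanceof FileInputFormat)
                ? Boundedness.BOUNDED
                : Boundedness.CONTINUOUS_UNBOUNDED;
        return new Source(format, boundedness);
    }

    /** Models the idea of the work-around: the caller states the boundedness explicitly. */
    static Source createInput(InputFormat format, Boundedness boundedness) {
        return new Source(format, boundedness);
    }

    public static void main(String[] args) {
        InputFormat file = new FileInputFormat();
        InputFormat custom = new CustomInputFormat();

        System.out.println("file, inferred:   " + createInput(file).boundedness);
        System.out.println("custom, inferred: " + createInput(custom).boundedness);
        System.out.println("custom, explicit: "
                + createInput(custom, Boundedness.BOUNDED).boundedness);
    }
}
```

In this model the second overload is the only way for a non-file input to be reported as bounded, which is the gap the requested method is meant to close.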

[1] [https://iceberg.apache.org/docs/latest/flink/#reading-with-datastream]

[2] [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/dataset/hadoop_map_reduce/] 

[3] [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/sources/#the-data-source-api] 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)