Posted to user@flink.apache.org by Hector He <he...@qq.com> on 2020/11/17 03:30:29 UTC

Re: [DISCUSS] Removing deprecated methods from DataStream API

May I ask about the deprecation of readFileStream(...): is there an
alternative to this method? The source code led me to use readFile instead, but
it does not behave like readFileStream. readFileStream reads file content
incrementally, whereas readFile with the FileProcessingMode.PROCESS_CONTINUOUSLY
argument re-reads the entire file every time its content changes. So why
is Flink deprecating readFileStream without providing a better
alternative?
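To make the behavioral difference concrete, here is a minimal plain-Java sketch. It has no Flink dependencies. The class IncrementalFileReader and its methods readAll, readNew and append are hypothetical and not part of any Flink API. readAll re-reads the whole file on every call, which is what re-processing a modified file amounts to. readNew remembers a byte offset per file and returns only the appended tail. The sketch assumes that files are only ever appended to and that each appended chunk ends on a UTF-8 character boundary.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Flink API): whole-file re-read vs. offset-based tailing. */
final class IncrementalFileReader {
    // Hypothetical state: last byte offset read, per file.
    private final Map<Path, Long> offsets = new HashMap<>();

    /** Whole-file read: every call returns the complete content again. */
    static String readAll(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns only the bytes appended since the previous call for this file. */
    String readNew(Path file) {
        long start = offsets.getOrDefault(file, 0L);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size < start) {
                start = 0L; // file shrank (truncated or replaced): re-read from the top
            }
            // Assumes the new tail fits in memory and ends on a UTF-8 character boundary.
            ByteBuffer buf = ByteBuffer.allocate((int) (size - start));
            ch.position(start);
            while (buf.hasRemaining()) {
                if (ch.read(buf) < 0) {
                    break; // hit EOF before filling the buffer
                }
            }
            offsets.put(file, start + buf.position());
            return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: appends text, creating the file if it does not exist yet. */
    static void append(Path file, String text) {
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("tail-demo", ".log");
        IncrementalFileReader reader = new IncrementalFileReader();
        append(log, "first\n");
        System.out.print(reader.readNew(log)); // prints: first
        append(log, "second\n");
        System.out.print(reader.readNew(log)); // prints only: second
        System.out.print(readAll(log));        // prints first and second again
        Files.delete(log);
    }
}
```

The offsets map is the kind of per-file state a tailing source would have to hold. To give exactly-once guarantees, it would also have to checkpoint that state.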

According to the official documentation at the link below,
FileProcessingMode.PROCESS_CONTINUOUSLY can break “exactly-once”
semantics.

https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/datastream_api.html#data-sources



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Posted by Kostas Kloudas <kk...@gmail.com>.
Hi Hector,

The main reasons for deprecating readFileStream() were that:
1) it could only parse Strings, and only in a rather limited way,
as one could not even specify the encoding;
2) it was not fault-tolerant, so it did not address your exactly-once
concerns either.

One concern I have about keeping the last read offset for
every file seen so far is that this state grows with every new file
and would simply blow up the memory.
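A small sketch can illustrate the memory argument. It is plain Java, not Flink code. OffsetState, advance, snapshot and trackedFiles are invented names, and the file paths are made up. The sketch assumes that any previously seen file may still receive appends. Under that assumption, each distinct file adds an entry that can never be safely dropped. As a result, both the in-memory map and any checkpointed copy of it grow linearly with the number of files seen.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-file offset state for a tailing file source (not Flink code).
 * Entries are never evicted, because any previously seen file might still grow.
 */
final class OffsetState {
    private final Map<String, Long> lastReadOffset = new HashMap<>();

    /** Records that `path` has been read up to `offset`; offsets only move forward. */
    void advance(String path, long offset) {
        lastReadOffset.merge(path, offset, Math::max);
    }

    /** Copy of the whole state, as a fault-tolerant source would have to checkpoint it. */
    Map<String, Long> snapshot() {
        return new HashMap<>(lastReadOffset);
    }

    int trackedFiles() {
        return lastReadOffset.size();
    }

    public static void main(String[] args) {
        OffsetState state = new OffsetState();
        // Simulate a directory that keeps receiving new (made-up) files, each read once.
        for (int i = 0; i < 100_000; i++) {
            state.advance("/data/in/part-" + i + ".log", 1_024L);
        }
        // Reading further into an existing file moves its offset but adds no entry.
        state.advance("/data/in/part-0.log", 2_048L);
        System.out.println(state.trackedFiles());    // 100000
        System.out.println(state.snapshot().size()); // 100000: the checkpoint grows too
    }
}
```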

Two other things I would like to mention:
1) the method was deprecated a long time ago;
2) a new FileSource is coming with 1.12 that may be interesting
for you [1].

Cheers,
Kostas

[1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSource.java
