You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Dawid Wysakowicz <dw...@apache.org> on 2020/08/17 08:53:01 UTC

[DISCUSS] Removing deprecated methods from DataStream API

Hi devs and users,

I wanted to ask you what do you think about removing some of the
deprecated APIs around the DataStream API.

The APIs I have in mind are:

  * RuntimeContext#getAllAccumulators (deprecated in 0.10)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)
  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5)
  * DataStream#split (deprecated in 1.8)
  * Methods in (Connected)DataStream that specify keys as either indices
    or field names such as DataStream#keyBy, DataStream#partitionCustom,
    ConnectedStream#keyBy, .... (deprecated in 1.11)

I think the first three should be straightforward. They are long
deprecated. The getAccumulators method is not used very often in my
opinion. The same applies to the DataStream#fold which additionally is
not very performant. Lastly the setStateBackend has an alternative with
a class from the AbstractStateBackend hierarchy, therefore it will be
still code compatible. Moreover if we remove the
#setStateBackend(AbstractStateBackend) we will get rid off warnings
users have right now when setting a statebackend as the correct method
cannot be used without an explicit casting.

As for the DataStream#split I know there were some objections against
removing the #split method in the past. I still believe the output tags
can replace the split method already.

The only problem in the last set of methods I propose to remove is that
they were deprecated only in the last release and those method were only
partially deprecated. Moreover some of the methods were not deprecated
in ConnectedStreams. Nevertheless I'd still be inclined to remove the
methods in this release.

Let me know what do you think about it.

Best,

Dawid


Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Kostas Kloudas <kk...@gmail.com>.
Hi Hector,

The main reasons for deprecating the readFileStream() was that:
1) it was only capable of parsing Strings and in a rather limited way
as one could not even specify the encoding
2) it was not fault-tolerant, so your concerns about exactly-once were
not covered

One concern that I can find about keeping the last read index for
every file that we have seen so far,
is that this would simply blow up the memory.

Two things I would like to also mention are that:
1) the method has been deprecated a long time ago.
2) there is a new FileSource coming with 1.12 that may be interesting
for you [1].

Cheers,
Kostas

 [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSource.java

On Tue, Nov 17, 2020 at 4:30 AM Hector He <he...@qq.com> wrote:
>
> May I have a ask about deprecating readFileStream(...), is there a
> alternative to this method? Source code lead me to use readFile instead, but
> it does not perform as readFileStream, readFileStream can reads file content
> incrementally, but readFile with FileProcessingMode.PROCESS_CONTINUOUSLY
> argument reads all file conent every time when the content changes. So why
> will Flink make readFileStream to be deprecated but without a better
> alternative?
>
> From the description of official document below link,
> FileProcessingMode.PROCESS_CONTINUOUSLY will break the “exactly-once”
> semantics.
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/datastream_api.html#data-sources
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Hector He <he...@qq.com>.
May I have a ask about deprecating readFileStream(...), is there a
alternative to this method? Source code lead me to use readFile instead, but
it does not perform as readFileStream, readFileStream can reads file content
incrementally, but readFile with FileProcessingMode.PROCESS_CONTINUOUSLY
argument reads all file conent every time when the content changes. So why
will Flink make readFileStream to be deprecated but without a better
alternative?

From the description of official document below link,
FileProcessingMode.PROCESS_CONTINUOUSLY will break the “exactly-once”
semantics.

https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/datastream_api.html#data-sources



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Dawid Wysakowicz <dw...@apache.org>.
@Yun Yes, I created the ticket with target version 2.0.0, which was
agreed by deprecating the method ;) I will soon start a vote for
actually removing it in 1.12 and if we agree on that I will change the
target version.

On 28/08/2020 09:33, Yun Tang wrote:
> I noticed that the ticket to remove deprecated DataStream#fold() [1] has been created but not yet reach an agreement or assigned.
>
> Actually fold related function and state descriptions has been deprecated for long than 3 years, and I think it's okay to remove such kind of state now.
>
> [1] https://issues.apache.org/jira/browse/FLINK-19035
>
> Best,
> Yun Tang
> ________________________________
> From: Aljoscha Krettek <al...@apache.org>
> Sent: Thursday, August 27, 2020 16:45
> To: dev@flink.apache.org <de...@flink.apache.org>
> Subject: Re: [DISCUSS] Removing deprecated methods from DataStream API
>
> Did you consider DataStream.project() yet? In general I think we should
> remove most of the relational-ish methods from DataStream. More
> candidates in this set of methods would be the tuple index/expression
> methods for aggregations like min/max etc...
>
> Aljoscha
>
> On 25.08.20 20:52, Konstantin Knauf wrote:
>> I would argue that the guarantees of @Public methods that became
>> ineffective were broken when they became ineffective (and were deprecated).
>>
>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> Removing these methods seems like the better of two evils to me as it shows
>> users that they have been using no-ops for some time.
>>
>> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:
>>
>>> We have removed some public methods in the past, after a careful
>>> deprecation period, if they were not well working any more.
>>>
>>> The sentiment I got from users is that careful cleanup is in fact
>>> appreciated, otherwise things get confusing over time (the deprecated
>>> methods cause noise in the API).
>>> Still, we need to be very careful here.
>>>
>>> I would suggest to
>>>    - start with the non-public breaking methods
>>>    - remove fold() (very long deprecated)
>>>    - remove split() buggy
>>>
>>> Removing the env.socketStream() and env.fileStream() methods would
>>> probably be good, too. They are very long deprecated and they don't work
>>> well (with checkpoints) and the sources are the first thing a user needs to
>>> understand when starting with Flink, so removing noise there is super
>>> helpful.
>>>
>>>
>>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dw...@apache.org>
>>> wrote:
>>>
>>>> Hey Till,
>>>>
>>>> You've got a good point here. Removing some of the methods would cause
>>>> breaking the stability guarantees. I do understand if we decide not to
>>>> remove them for that reason, let me explain though why I am thinking it
>>>> might make sense to remove them already. First of all I am a bit afraid it
>>>> might take a long time before we arrive at the 2.0 version. We have not
>>>> ever discussed that in the community. At the same time a lot of the methods
>>>> already don't work or are buggy, and we do not fix them any more.
>>>>
>>>> Methods which removing would not break the Public guarantees:
>>>>
>>>>     - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>     - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>     - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>     - StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>>     (not the equivalent in the ExecutionConfig)
>>>>     - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>     (deprecated in 1.5)
>>>>
>>>> Methods which removing would break the Public guarantees:
>>>>
>>>> which have no effect:
>>>>
>>>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>
>>>> which are buggy or discouraged and thus we do not support fixing them:
>>>>
>>>>     - DataStream#split (deprecated in 1.8)
>>>>     - DataStream#fold and all related classes and methods such as
>>>>     FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>     1.3/1.4)
>>>>
>>>> The methods like:
>>>>
>>>>     - StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>>>
>>>>     - methods in (Connected)DataStream that specify keys as either
>>>>     indices or field names
>>>>     -
>>>>     ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>>
>>>> should be working just fine and I feel the least eager to remove those.
>>>>
>>>> I'd suggest I will open PRs for removing the methods that will not cause
>>>> breakage of the Public guarantees as the general feedback was rather
>>>> positive. For the rest I do understand the resentment to do so and will not
>>>> do it in the 1.x branch. Still I think it is valuable to have the
>>>> discussion.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>>>
>>>> Having looked at the proposed set of methods to remove I've noticed that
>>>> some of them are actually annotated with @Public. According to our
>>>> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
>>>> with this annotation. Hence, I believe that we cannot simply remove them
>>>> unless the community decides to change the stability guarantees we give or
>>>> by making the next release a major release (Flink 2.0).
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:
>>>>
>>>>> +1 for removing the methods that are deprecated for a while & have
>>>>> alternative methods.
>>>>>
>>>>> One specific thing is that if we remove the DataStream#split, do we
>>>>> consider enabling side-output in more operators in the future ? Currently
>>>>> it should be only available in ProcessFunctions, but not available to other
>>>>> commonly used UDF like Source or AsyncFunction[1].
>>>>>
>>>>> One temporary solution occurs to me is to add a ProcessFunction after
>>>>> the operators want to use side-output. But I think the solution is not very
>>>>> direct to come up with and if it really works we might add it to the
>>>>> document of side-output.
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>>>>
>>>>> Best,
>>>>>   Yun
>>>>>
>>>>> ------------------Original Mail ------------------
>>>>> *Sender:*Kostas Kloudas <kk...@gmail.com>
>>>>> *Send Date:*Tue Aug 18 03:52:44 2020
>>>>> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
>>>>> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
>>>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>>>>>
>>>>>> +1 for removing them.
>>>>>>
>>>>>>
>>>>>>
>>>>>>  From a quick look, most of them (not all) have been deprecated a long
>>>>>> time ago.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Kostas
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>>>>
>>>>>>> @David Yes, my idea was to remove any use of fold method and all
>>>>>> related classes including WindowedStream#fold
>>>>>>
>>>>>>> @Klou Good idea to also remove the deprecated enableCheckpointing() &
>>>>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>>>>>> of the classes and thought we could also drop:
>>>>>>
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode
>>>>>>> ExecutionConfig#disable/enableSysoutLogging
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled
>>>>>>> As for the `forceCheckpointing` I am not fully convinced to doing it.
>>>>>> As far as I know iterations still do not participate in checkpointing
>>>>>> correctly. Therefore it still might make sense to force it. In other words
>>>>>> there is no real alternative to that method. Unless we only remove the
>>>>>> methods from StreamExecutionEnvironment and redirect to the setter in
>>>>>> CheckpointConfig. WDYT?
>>>>>>
>>>>>>> An updated list of methods I suggest to remove:
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>>> 1.1?)
>>>>>>
>>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>>> (deprecated in 1.2)
>>>>>>
>>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>>> (deprecated in 1.5)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>>> Bear in mind that majority of the options listed above in
>>>>>> ExecutionConfig take no effect. They were left there purely to satisfy the
>>>>>> binary compatibility. Personally I don't see any benefit of leaving a
>>>>>> method and silently dropping the underlying feature. The only configuration
>>>>>> that is respected is setting the number of execution retries.
>>>>>>
>>>>>>> I also wanted to make it explicit that most of the changes above
>>>>>> would result in a binary incompatibility and require additional exclusions
>>>>>> in the japicmp. Those are:
>>>>>>
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>>> 1.1?)
>>>>>>
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>>> (deprecated in 1.2)
>>>>>>
>>>>>>> Looking forward to more opinions on the issue.
>>>>>>> Best,
>>>>>>> Dawid
>>>>>>> On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>>>> Thanks a lot for starting this Dawid,
>>>>>>> Big +1 for the proposed clean-up, and I would also add the deprecated
>>>>>>> methods of the StreamExecutionEnvironment like:
>>>>>>> enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>>>> force)
>>>>>>
>>>>>>> enableCheckpointing()
>>>>>>> isForceCheckpointing()
>>>>>>> readFile(FileInputFormat inputFormat,String
>>>>>>> filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>>>> filter)
>>>>>>> readFileStream(...)
>>>>>>> socketTextStream(String hostname, int port, char delimiter, long
>>>>>> maxRetry)
>>>>>>
>>>>>>> socketTextStream(String hostname, int port, char delimiter)
>>>>>>> There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>>>>> deprecated long ago, but I have not investigated to see if they are
>>>>>>> actually easy to remove.
>>>>>>> Cheers,
>>>>>>> Kostas
>>>>>>> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>>>> wrote:
>>>>>>> Hi devs and users,
>>>>>>> I wanted to ask you what do you think about removing some of the
>>>>>> deprecated APIs around the DataStream API.
>>>>>>
>>>>>>> The APIs I have in mind are:
>>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>>> (deprecated in 1.5)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>>> I think the first three should be straightforward. They are long
>>>>>> deprecated. The getAccumulators method is not used very often in my
>>>>>> opinion. The same applies to the DataStream#fold which additionally is not
>>>>>> very performant. Lastly the setStateBackend has an alternative with a class
>>>>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>>>>> compatible. Moreover if we remove the
>>>>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>>>>> have right now when setting a statebackend as the correct method cannot be
>>>>>> used without an explicit casting.
>>>>>>
>>>>>>> As for the DataStream#split I know there were some objections against
>>>>>> removing the #split method in the past. I still believe the output tags can
>>>>>> replace the split method already.
>>>>>>
>>>>>>> The only problem in the last set of methods I propose to remove is
>>>>>> that they were deprecated only in the last release and those method were
>>>>>> only partially deprecated. Moreover some of the methods were not deprecated
>>>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>>>>> methods in this release.
>>>>>>
>>>>>>> Let me know what do you think about it.
>>>>>>> Best,
>>>>>>> Dawid
>>>>>
>


Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Yun Tang <my...@live.com>.
I noticed that the ticket to remove deprecated DataStream#fold() [1] has been created but not yet reach an agreement or assigned.

Actually fold related function and state descriptions has been deprecated for long than 3 years, and I think it's okay to remove such kind of state now.

[1] https://issues.apache.org/jira/browse/FLINK-19035

Best,
Yun Tang
________________________________
From: Aljoscha Krettek <al...@apache.org>
Sent: Thursday, August 27, 2020 16:45
To: dev@flink.apache.org <de...@flink.apache.org>
Subject: Re: [DISCUSS] Removing deprecated methods from DataStream API

Did you consider DataStream.project() yet? In general I think we should
remove most of the relational-ish methods from DataStream. More
candidates in this set of methods would be the tuple index/expression
methods for aggregations like min/max etc...

Aljoscha

On 25.08.20 20:52, Konstantin Knauf wrote:
> I would argue that the guarantees of @Public methods that became
> ineffective were broken when they became ineffective (and were deprecated).
>
>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>
> Removing these methods seems like the better of two evils to me as it shows
> users that they have been using no-ops for some time.
>
> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:
>
>> We have removed some public methods in the past, after a careful
>> deprecation period, if they were not well working any more.
>>
>> The sentiment I got from users is that careful cleanup is in fact
>> appreciated, otherwise things get confusing over time (the deprecated
>> methods cause noise in the API).
>> Still, we need to be very careful here.
>>
>> I would suggest to
>>    - start with the non-public breaking methods
>>    - remove fold() (very long deprecated)
>>    - remove split() buggy
>>
>> Removing the env.socketStream() and env.fileStream() methods would
>> probably be good, too. They are very long deprecated and they don't work
>> well (with checkpoints) and the sources are the first thing a user needs to
>> understand when starting with Flink, so removing noise there is super
>> helpful.
>>
>>
>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dw...@apache.org>
>> wrote:
>>
>>> Hey Till,
>>>
>>> You've got a good point here. Removing some of the methods would cause
>>> breaking the stability guarantees. I do understand if we decide not to
>>> remove them for that reason, let me explain though why I am thinking it
>>> might make sense to remove them already. First of all I am a bit afraid it
>>> might take a long time before we arrive at the 2.0 version. We have not
>>> ever discussed that in the community. At the same time a lot of the methods
>>> already don't work or are buggy, and we do not fix them any more.
>>>
>>> Methods which removing would not break the Public guarantees:
>>>
>>>     - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>     - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>     - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>     - StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>     (not the equivalent in the ExecutionConfig)
>>>     - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>     (deprecated in 1.5)
>>>
>>> Methods which removing would break the Public guarantees:
>>>
>>> which have no effect:
>>>
>>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>
>>> which are buggy or discouraged and thus we do not support fixing them:
>>>
>>>     - DataStream#split (deprecated in 1.8)
>>>     - DataStream#fold and all related classes and methods such as
>>>     FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>     1.3/1.4)
>>>
>>> The methods like:
>>>
>>>     - StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>>
>>>     - methods in (Connected)DataStream that specify keys as either
>>>     indices or field names
>>>     -
>>>     ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>
>>> should be working just fine and I feel the least eager to remove those.
>>>
>>> I'd suggest I will open PRs for removing the methods that will not cause
>>> breakage of the Public guarantees as the general feedback was rather
>>> positive. For the rest I do understand the resentment to do so and will not
>>> do it in the 1.x branch. Still I think it is valuable to have the
>>> discussion.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>>
>>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>>
>>> Having looked at the proposed set of methods to remove I've noticed that
>>> some of them are actually annotated with @Public. According to our
>>> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
>>> with this annotation. Hence, I believe that we cannot simply remove them
>>> unless the community decides to change the stability guarantees we give or
>>> by making the next release a major release (Flink 2.0).
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:
>>>
>>>> +1 for removing the methods that are deprecated for a while & have
>>>> alternative methods.
>>>>
>>>> One specific thing is that if we remove the DataStream#split, do we
>>>> consider enabling side-output in more operators in the future ? Currently
>>>> it should be only available in ProcessFunctions, but not available to other
>>>> commonly used UDF like Source or AsyncFunction[1].
>>>>
>>>> One temporary solution occurs to me is to add a ProcessFunction after
>>>> the operators want to use side-output. But I think the solution is not very
>>>> direct to come up with and if it really works we might add it to the
>>>> document of side-output.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>>>
>>>> Best,
>>>>   Yun
>>>>
>>>> ------------------Original Mail ------------------
>>>> *Sender:*Kostas Kloudas <kk...@gmail.com>
>>>> *Send Date:*Tue Aug 18 03:52:44 2020
>>>> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
>>>> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
>>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>>>>
>>>>> +1 for removing them.
>>>>>
>>>>>
>>>>>
>>>>>  From a quick look, most of them (not all) have been deprecated a long
>>>>> time ago.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Kostas
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>>>
>>>>>>
>>>>>
>>>>>> @David Yes, my idea was to remove any use of fold method and all
>>>>> related classes including WindowedStream#fold
>>>>>
>>>>>>
>>>>>
>>>>>> @Klou Good idea to also remove the deprecated enableCheckpointing() &
>>>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>>>>> of the classes and thought we could also drop:
>>>>>
>>>>>>
>>>>>
>>>>>> ExecutionConfig#set/getCodeAnalysisMode
>>>>>
>>>>>> ExecutionConfig#disable/enableSysoutLogging
>>>>>
>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError
>>>>>
>>>>>> ExecutionConfig#isLatencyTrackingEnabled
>>>>>
>>>>>>
>>>>>
>>>>>> As for the `forceCheckpointing` I am not fully convinced to doing it.
>>>>> As far as I know iterations still do not participate in checkpointing
>>>>> correctly. Therefore it still might make sense to force it. In other words
>>>>> there is no real alternative to that method. Unless we only remove the
>>>>> methods from StreamExecutionEnvironment and redirect to the setter in
>>>>> CheckpointConfig. WDYT?
>>>>>
>>>>>>
>>>>>
>>>>>> An updated list of methods I suggest to remove:
>>>>>
>>>>>>
>>>>>
>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>
>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>
>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>
>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>
>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>> 1.1?)
>>>>>
>>>>>>
>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>> (deprecated in 1.2)
>>>>>
>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>
>>>>>> DataStream#fold and all related classes and methods such as
>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>> 1.3/1.4)
>>>>>
>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>> (deprecated in 1.5)
>>>>>
>>>>>> DataStream#split (deprecated in 1.8)
>>>>>
>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>
>>>>>>
>>>>>
>>>>>> Bear in mind that majority of the options listed above in
>>>>> ExecutionConfig take no effect. They were left there purely to satisfy the
>>>>> binary compatibility. Personally I don't see any benefit of leaving a
>>>>> method and silently dropping the underlying feature. The only configuration
>>>>> that is respected is setting the number of execution retries.
>>>>>
>>>>>>
>>>>>
>>>>>> I also wanted to make it explicit that most of the changes above
>>>>> would result in a binary incompatibility and require additional exclusions
>>>>> in the japicmp. Those are:
>>>>>
>>>>>>
>>>>>
>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>
>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>
>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>
>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>
>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>> 1.1?)
>>>>>
>>>>>> DataStream#fold and all related classes and methods such as
>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>> 1.3/1.4)
>>>>>
>>>>>> DataStream#split (deprecated in 1.8)
>>>>>
>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>
>>>>>>
>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>> (deprecated in 1.2)
>>>>>
>>>>>>
>>>>>
>>>>>> Looking forward to more opinions on the issue.
>>>>>
>>>>>>
>>>>>
>>>>>> Best,
>>>>>
>>>>>>
>>>>>
>>>>>> Dawid
>>>>>
>>>>>>
>>>>>
>>>>>>
>>>>>
>>>>>> On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>>
>>>>>>
>>>>>
>>>>>> Thanks a lot for starting this Dawid,
>>>>>
>>>>>>
>>>>>
>>>>>> Big +1 for the proposed clean-up, and I would also add the deprecated
>>>>>
>>>>>> methods of the StreamExecutionEnvironment like:
>>>>>
>>>>>>
>>>>>
>>>>>> enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>>> force)
>>>>>
>>>>>> enableCheckpointing()
>>>>>
>>>>>> isForceCheckpointing()
>>>>>
>>>>>>
>>>>>
>>>>>> readFile(FileInputFormat inputFormat,String
>>>>>
>>>>>> filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>>
>>>>>> filter)
>>>>>
>>>>>> readFileStream(...)
>>>>>
>>>>>>
>>>>>
>>>>>> socketTextStream(String hostname, int port, char delimiter, long
>>>>> maxRetry)
>>>>>
>>>>>> socketTextStream(String hostname, int port, char delimiter)
>>>>>
>>>>>>
>>>>>
>>>>>> There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>>>
>>>>>> deprecated long ago, but I have not investigated to see if they are
>>>>>
>>>>>> actually easy to remove.
>>>>>
>>>>>>
>>>>>
>>>>>> Cheers,
>>>>>
>>>>>> Kostas
>>>>>
>>>>>>
>>>>>
>>>>>> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>>
>>>>>> wrote:
>>>>>
>>>>>>
>>>>>
>>>>>> Hi devs and users,
>>>>>
>>>>>>
>>>>>
>>>>>> I wanted to ask you what do you think about removing some of the
>>>>> deprecated APIs around the DataStream API.
>>>>>
>>>>>>
>>>>>
>>>>>> The APIs I have in mind are:
>>>>>
>>>>>>
>>>>>
>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>
>>>>>> DataStream#fold and all related classes and methods such as
>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>> 1.3/1.4)
>>>>>
>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>> (deprecated in 1.5)
>>>>>
>>>>>> DataStream#split (deprecated in 1.8)
>>>>>
>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>
>>>>>>
>>>>>
>>>>>> I think the first three should be straightforward. They are long
>>>>> deprecated. The getAccumulators method is not used very often in my
>>>>> opinion. The same applies to the DataStream#fold which additionally is not
>>>>> very performant. Lastly the setStateBackend has an alternative with a class
>>>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>>>> compatible. Moreover if we remove the
>>>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>>>> have right now when setting a statebackend as the correct method cannot be
>>>>> used without an explicit casting.
>>>>>
>>>>>>
>>>>>
>>>>>> As for the DataStream#split I know there were some objections against
>>>>> removing the #split method in the past. I still believe the output tags can
>>>>> replace the split method already.
>>>>>
>>>>>>
>>>>>
>>>>>> The only problem in the last set of methods I propose to remove is
>>>>> that they were deprecated only in the last release and those method were
>>>>> only partially deprecated. Moreover some of the methods were not deprecated
>>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>>>> methods in this release.
>>>>>
>>>>>>
>>>>>
>>>>>> Let me know what do you think about it.
>>>>>
>>>>>>
>>>>>
>>>>>> Best,
>>>>>
>>>>>>
>>>>>
>>>>>> Dawid
>>>>
>>>>
>


Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Dawid Wysakowicz <dw...@apache.org>.
@Aljoscha I have not thought about DataStream#project. I see your
argument and I do agree with you it makes sense to eventually remove
them. They are not deprecated yet though. Moreover I think we could
include deprecating it as part of the efforts of FLIP-131/FLIP-134. I
will add the argument there.

@Konstantin I do share your sentiment that the contract was broken when
they became ineffective. That's why I mentioned the fact they are
ineffective ;)

Let me give a brief summary how I understand the discussion so far.

I will start the effort of removing the non-breaking APIs:

  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5) (ALREADY REMOVED)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10) (ALREADY
    REMOVED)
  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
    (not the equivalent in the ExecutionConfig
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
    (ineffective, thus already broke the guarantees)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
    (ineffective, thus already broke the guarantees)
  * DataStream#writeAsCsv,writeAsText (deprecated in 1.10)

I will start a Vote thread for removing:

  * DataStream#split
  * XxxDataStream#fold

as there was quite a bit of positive feedback for doing so, but there
were also concerns.

For now (probably until 2.0), I'll leave the methods:

  * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
  * methods in XxxDataStream that use field names and indices for
    addressing projections/groupings
  * ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries

BTW you can track the effort in FLINK-19033[1]

Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-19033


On 27/08/2020 10:45, Aljoscha Krettek wrote:
> Did you consider DataStream.project() yet? In general I think we
> should remove most of the relational-ish methods from DataStream. More
> candidates in this set of methods would be the tuple index/expression
> methods for aggregations like min/max etc...
>
> Aljoscha
>
> On 25.08.20 20:52, Konstantin Knauf wrote:
>> I would argue that the guarantees of @Public methods that became
>> ineffective were broken when they became ineffective (and were
>> deprecated).
>>
>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in
>> 1.9)
>>
>> Removing these methods seems like the better of two evils to me as it
>> shows
>> users that they have been using no-ops for some time.
>>
>> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:
>>
>>> We have removed some public methods in the past, after a careful
>>> deprecation period, if they were not well working any more.
>>>
>>> The sentiment I got from users is that careful cleanup is in fact
>>> appreciated, otherwise things get confusing over time (the deprecated
>>> methods cause noise in the API).
>>> Still, we need to be very careful here.
>>>
>>> I would suggest to
>>>    - start with the non-public breaking methods
>>>    - remove fold() (very long deprecated)
>>>    - remove split() buggy
>>>
>>> Removing the env.socketStream() and env.fileStream() methods would
>>> probably be good, too. They are very long deprecated and they don't
>>> work
>>> well (with checkpoints) and the sources are the first thing a user
>>> needs to
>>> understand when starting with Flink, so removing noise there is super
>>> helpful.
>>>
>>>
>>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz
>>> <dw...@apache.org>
>>> wrote:
>>>
>>>> Hey Till,
>>>>
>>>> You've got a good point here. Removing some of the methods would cause
>>>> breaking the stability guarantees. I do understand if we decide not to
>>>> remove them for that reason, let me explain though why I am
>>>> thinking it
>>>> might make sense to remove them already. First of all I am a bit
>>>> afraid it
>>>> might take a long time before we arrive at the 2.0 version. We have
>>>> not
>>>> ever discussed that in the community. At the same time a lot of the
>>>> methods
>>>> already don't work or are buggy, and we do not fix them any more.
>>>>
>>>> Methods which removing would not break the Public guarantees:
>>>>
>>>>     - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>     - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>     - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>     -
>>>> StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>>     (not the equivalent in the ExecutionConfig)
>>>>     - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>     (deprecated in 1.5)
>>>>
>>>> Methods which removing would break the Public guarantees:
>>>>
>>>> which have no effect:
>>>>
>>>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated
>>>> in 1.9)
>>>>
>>>> which are buggy or discouraged and thus we do not support fixing them:
>>>>
>>>>     - DataStream#split (deprecated in 1.8)
>>>>     - DataStream#fold and all related classes and methods such as
>>>>     FoldFunction, FoldingState, FoldingStateDescriptor ...
>>>> (deprecated in
>>>>     1.3/1.4)
>>>>
>>>> The methods like:
>>>>
>>>>     -
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>>>
>>>>     - methods in (Connected)DataStream that specify keys as either
>>>>     indices or field names
>>>>     -
>>>>    
>>>> ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>>
>>>>
>>>> should be working just fine and I feel the least eager to remove
>>>> those.
>>>>
>>>> I'd suggest I will open PRs for removing the methods that will not
>>>> cause
>>>> breakage of the Public guarantees as the general feedback was rather
>>>> positive. For the rest I do understand the resentment to do so and
>>>> will not
>>>> do it in the 1.x branch. Still I think it is valuable to have the
>>>> discussion.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>>>
>>>> Having looked at the proposed set of methods to remove I've noticed
>>>> that
>>>> some of them are actually annotated with @Public. According to our
>>>> stability guarantees, only major releases (1.0, 2.0, etc.) can
>>>> break APIs
>>>> with this annotation. Hence, I believe that we cannot simply remove
>>>> them
>>>> unless the community decides to change the stability guarantees we
>>>> give or
>>>> by making the next release a major release (Flink 2.0).
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:
>>>>
>>>>> +1 for removing the methods that are deprecated for a while & have
>>>>> alternative methods.
>>>>>
>>>>> One specific thing is that if we remove the DataStream#split, do we
>>>>> consider enabling side-output in more operators in the future ?
>>>>> Currently
>>>>> it should be only available in ProcessFunctions, but not available
>>>>> to other
>>>>> commonly used UDF like Source or AsyncFunction[1].
>>>>>
>>>>> One temporary solution occurs to me is to add a ProcessFunction after
>>>>> the operators want to use side-output. But I think the solution is
>>>>> not very
>>>>> direct to come up with and if it really works we might add it to the
>>>>> document of side-output.
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>>>>
>>>>> Best,
>>>>>   Yun
>>>>>
>>>>> ------------------Original Mail ------------------
>>>>> *Sender:*Kostas Kloudas <kk...@gmail.com>
>>>>> *Send Date:*Tue Aug 18 03:52:44 2020
>>>>> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
>>>>> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
>>>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from
>>>>> DataStream API
>>>>>
>>>>>> +1 for removing them.
>>>>>>
>>>>>>
>>>>>>
>>>>>>  From a quick look, most of them (not all) have been deprecated a
>>>>>> long
>>>>>> time ago.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Kostas
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> @David Yes, my idea was to remove any use of fold method and all
>>>>>> related classes including WindowedStream#fold
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> @Klou Good idea to also remove the deprecated
>>>>>>> enableCheckpointing() &
>>>>>> StreamExecutionEnvironment#readFile and alike. I did another pass
>>>>>> over some
>>>>>> of the classes and thought we could also drop:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode
>>>>>>
>>>>>>> ExecutionConfig#disable/enableSysoutLogging
>>>>>>
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError
>>>>>>
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> As for the `forceCheckpointing` I am not fully convinced to
>>>>>>> doing it.
>>>>>> As far as I know iterations still do not participate in
>>>>>> checkpointing
>>>>>> correctly. Therefore it still might make sense to force it. In
>>>>>> other words
>>>>>> there is no real alternative to that method. Unless we only
>>>>>> remove the
>>>>>> methods from StreamExecutionEnvironment and redirect to the
>>>>>> setter in
>>>>>> CheckpointConfig. WDYT?
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> An updated list of methods I suggest to remove:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>>
>>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>>
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>>
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>>
>>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>>> 1.1?)
>>>>>>
>>>>>>>
>>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>>>
>>>>>> (deprecated in 1.2)
>>>>>>
>>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>>
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ...
>>>>>> (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>>> (deprecated in 1.5)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>
>>>>>>> Methods in (Connected)DataStream that specify keys as either
>>>>>>> indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Bear in mind that majority of the options listed above in
>>>>>> ExecutionConfig take no effect. They were left there purely to
>>>>>> satisfy the
>>>>>> binary compatibility. Personally I don't see any benefit of
>>>>>> leaving a
>>>>>> method and silently dropping the underlying feature. The only
>>>>>> configuration
>>>>>> that is respected is setting the number of execution retries.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> I also wanted to make it explicit that most of the changes above
>>>>>> would result in a binary incompatibility and require additional
>>>>>> exclusions
>>>>>> in the japicmp. Those are:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>>
>>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>>
>>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>>
>>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>>
>>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>>> 1.1?)
>>>>>>
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ...
>>>>>> (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>
>>>>>>> Methods in (Connected)DataStream that specify keys as either
>>>>>>> indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>>>
>>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>>>
>>>>>> (deprecated in 1.2)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Looking forward to more opinions on the issue.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Best,
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Dawid
>>>>>>
>>>>>>>
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Thanks a lot for starting this Dawid,
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Big +1 for the proposed clean-up, and I would also add the
>>>>>>> deprecated
>>>>>>
>>>>>>> methods of the StreamExecutionEnvironment like:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>>>> force)
>>>>>>
>>>>>>> enableCheckpointing()
>>>>>>
>>>>>>> isForceCheckpointing()
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> readFile(FileInputFormat inputFormat,String
>>>>>>
>>>>>>> filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>>>
>>>>>>> filter)
>>>>>>
>>>>>>> readFileStream(...)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> socketTextStream(String hostname, int port, char delimiter, long
>>>>>> maxRetry)
>>>>>>
>>>>>>> socketTextStream(String hostname, int port, char delimiter)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> There are more, like the (get)/setNumberOfExecutionRetries()
>>>>>>> that were
>>>>>>
>>>>>>> deprecated long ago, but I have not investigated to see if they are
>>>>>>
>>>>>>> actually easy to remove.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Cheers,
>>>>>>
>>>>>>> Kostas
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>>>
>>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Hi devs and users,
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> I wanted to ask you what do you think about removing some of the
>>>>>> deprecated APIs around the DataStream API.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> The APIs I have in mind are:
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>>
>>>>>>> DataStream#fold and all related classes and methods such as
>>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ...
>>>>>> (deprecated in
>>>>>> 1.3/1.4)
>>>>>>
>>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>>> (deprecated in 1.5)
>>>>>>
>>>>>>> DataStream#split (deprecated in 1.8)
>>>>>>
>>>>>>> Methods in (Connected)DataStream that specify keys as either
>>>>>>> indices
>>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> I think the first three should be straightforward. They are long
>>>>>> deprecated. The getAccumulators method is not used very often in my
>>>>>> opinion. The same applies to the DataStream#fold which
>>>>>> additionally is not
>>>>>> very performant. Lastly the setStateBackend has an alternative
>>>>>> with a class
>>>>>> from the AbstractStateBackend hierarchy, therefore it will be
>>>>>> still code
>>>>>> compatible. Moreover if we remove the
>>>>>> #setStateBackend(AbstractStateBackend) we will get rid off
>>>>>> warnings users
>>>>>> have right now when setting a statebackend as the correct method
>>>>>> cannot be
>>>>>> used without an explicit casting.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> As for the DataStream#split I know there were some objections
>>>>>>> against
>>>>>> removing the #split method in the past. I still believe the
>>>>>> output tags can
>>>>>> replace the split method already.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> The only problem in the last set of methods I propose to remove is
>>>>>> that they were deprecated only in the last release and those
>>>>>> method were
>>>>>> only partially deprecated. Moreover some of the methods were not
>>>>>> deprecated
>>>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove
>>>>>> the
>>>>>> methods in this release.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Let me know what do you think about it.
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Best,
>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Dawid
>>>>>
>>>>>
>>
>

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Aljoscha Krettek <al...@apache.org>.
Did you consider DataStream.project() yet? In general I think we should 
remove most of the relational-ish methods from DataStream. More 
candidates in this set of methods would be the tuple index/expression 
methods for aggregations like min/max etc...

Aljoscha

On 25.08.20 20:52, Konstantin Knauf wrote:
> I would argue that the guarantees of @Public methods that became
> ineffective were broken when they became ineffective (and were deprecated).
> 
>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
> 
> Removing these methods seems like the better of two evils to me as it shows
> users that they have been using no-ops for some time.
> 
> On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:
> 
>> We have removed some public methods in the past, after a careful
>> deprecation period, if they were not well working any more.
>>
>> The sentiment I got from users is that careful cleanup is in fact
>> appreciated, otherwise things get confusing over time (the deprecated
>> methods cause noise in the API).
>> Still, we need to be very careful here.
>>
>> I would suggest to
>>    - start with the non-public breaking methods
>>    - remove fold() (very long deprecated)
>>    - remove split() buggy
>>
>> Removing the env.socketStream() and env.fileStream() methods would
>> probably be good, too. They are very long deprecated and they don't work
>> well (with checkpoints) and the sources are the first thing a user needs to
>> understand when starting with Flink, so removing noise there is super
>> helpful.
>>
>>
>> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dw...@apache.org>
>> wrote:
>>
>>> Hey Till,
>>>
>>> You've got a good point here. Removing some of the methods would cause
>>> breaking the stability guarantees. I do understand if we decide not to
>>> remove them for that reason, let me explain though why I am thinking it
>>> might make sense to remove them already. First of all I am a bit afraid it
>>> might take a long time before we arrive at the 2.0 version. We have not
>>> ever discussed that in the community. At the same time a lot of the methods
>>> already don't work or are buggy, and we do not fix them any more.
>>>
>>> Methods which removing would not break the Public guarantees:
>>>
>>>     - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>     - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>     - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>     - StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>     (not the equivalent in the ExecutionConfig)
>>>     - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>     (deprecated in 1.5)
>>>
>>> Methods which removing would break the Public guarantees:
>>>
>>> which have no effect:
>>>
>>>     - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>     - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>
>>> which are buggy or discouraged and thus we do not support fixing them:
>>>
>>>     - DataStream#split (deprecated in 1.8)
>>>     - DataStream#fold and all related classes and methods such as
>>>     FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>     1.3/1.4)
>>>
>>> The methods like:
>>>
>>>     - StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>>
>>>     - methods in (Connected)DataStream that specify keys as either
>>>     indices or field names
>>>     -
>>>     ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>>
>>> should be working just fine and I feel the least eager to remove those.
>>>
>>> I'd suggest I will open PRs for removing the methods that will not cause
>>> breakage of the Public guarantees as the general feedback was rather
>>> positive. For the rest I do understand the resentment to do so and will not
>>> do it in the 1.x branch. Still I think it is valuable to have the
>>> discussion.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>>
>>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>>
>>> Having looked at the proposed set of methods to remove I've noticed that
>>> some of them are actually annotated with @Public. According to our
>>> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
>>> with this annotation. Hence, I believe that we cannot simply remove them
>>> unless the community decides to change the stability guarantees we give or
>>> by making the next release a major release (Flink 2.0).
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:
>>>
>>>> +1 for removing the methods that are deprecated for a while & have
>>>> alternative methods.
>>>>
>>>> One specific thing is that if we remove the DataStream#split, do we
>>>> consider enabling side-output in more operators in the future ? Currently
>>>> it should be only available in ProcessFunctions, but not available to other
>>>> commonly used UDF like Source or AsyncFunction[1].
>>>>
>>>> One temporary solution occurs to me is to add a ProcessFunction after
>>>> the operators want to use side-output. But I think the solution is not very
>>>> direct to come up with and if it really works we might add it to the
>>>> document of side-output.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>>>
>>>> Best,
>>>>   Yun
>>>>
>>>> ------------------Original Mail ------------------
>>>> *Sender:*Kostas Kloudas <kk...@gmail.com>
>>>> *Send Date:*Tue Aug 18 03:52:44 2020
>>>> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
>>>> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
>>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>>>>
>>>>> +1 for removing them.
>>>>>
>>>>>
>>>>>
>>>>>  From a quick look, most of them (not all) have been deprecated a long
>>>>> time ago.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Kostas
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>>>
>>>>>>
>>>>>
>>>>>> @David Yes, my idea was to remove any use of fold method and all
>>>>> related classes including WindowedStream#fold
>>>>>
>>>>>>
>>>>>
>>>>>> @Klou Good idea to also remove the deprecated enableCheckpointing() &
>>>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>>>>> of the classes and thought we could also drop:
>>>>>
>>>>>>
>>>>>
>>>>>> ExecutionConfig#set/getCodeAnalysisMode
>>>>>
>>>>>> ExecutionConfig#disable/enableSysoutLogging
>>>>>
>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError
>>>>>
>>>>>> ExecutionConfig#isLatencyTrackingEnabled
>>>>>
>>>>>>
>>>>>
>>>>>> As for the `forceCheckpointing` I am not fully convinced to doing it.
>>>>> As far as I know iterations still do not participate in checkpointing
>>>>> correctly. Therefore it still might make sense to force it. In other words
>>>>> there is no real alternative to that method. Unless we only remove the
>>>>> methods from StreamExecutionEnvironment and redirect to the setter in
>>>>> CheckpointConfig. WDYT?
>>>>>
>>>>>>
>>>>>
>>>>>> An updated list of methods I suggest to remove:
>>>>>
>>>>>>
>>>>>
>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>
>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>
>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>
>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>
>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>> 1.1?)
>>>>>
>>>>>>
>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>> (deprecated in 1.2)
>>>>>
>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>
>>>>>> DataStream#fold and all related classes and methods such as
>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>> 1.3/1.4)
>>>>>
>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>> (deprecated in 1.5)
>>>>>
>>>>>> DataStream#split (deprecated in 1.8)
>>>>>
>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>
>>>>>>
>>>>>
>>>>>> Bear in mind that majority of the options listed above in
>>>>> ExecutionConfig take no effect. They were left there purely to satisfy the
>>>>> binary compatibility. Personally I don't see any benefit of leaving a
>>>>> method and silently dropping the underlying feature. The only configuration
>>>>> that is respected is setting the number of execution retries.
>>>>>
>>>>>>
>>>>>
>>>>>> I also wanted to make it explicit that most of the changes above
>>>>> would result in a binary incompatibility and require additional exclusions
>>>>> in the japicmp. Those are:
>>>>>
>>>>>>
>>>>>
>>>>>> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>>
>>>>>> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>>
>>>>>> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>>
>>>>>> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>>
>>>>>> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>>> 1.1?)
>>>>>
>>>>>> DataStream#fold and all related classes and methods such as
>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>> 1.3/1.4)
>>>>>
>>>>>> DataStream#split (deprecated in 1.8)
>>>>>
>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>
>>>>>>
>>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>>> (deprecated in 1.2)
>>>>>
>>>>>>
>>>>>
>>>>>> Looking forward to more opinions on the issue.
>>>>>
>>>>>>
>>>>>
>>>>>> Best,
>>>>>
>>>>>>
>>>>>
>>>>>> Dawid
>>>>>
>>>>>>
>>>>>
>>>>>>
>>>>>
>>>>>> On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>>
>>>>>>
>>>>>
>>>>>> Thanks a lot for starting this Dawid,
>>>>>
>>>>>>
>>>>>
>>>>>> Big +1 for the proposed clean-up, and I would also add the deprecated
>>>>>
>>>>>> methods of the StreamExecutionEnvironment like:
>>>>>
>>>>>>
>>>>>
>>>>>> enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>>> force)
>>>>>
>>>>>> enableCheckpointing()
>>>>>
>>>>>> isForceCheckpointing()
>>>>>
>>>>>>
>>>>>
>>>>>> readFile(FileInputFormat inputFormat,String
>>>>>
>>>>>> filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>>
>>>>>> filter)
>>>>>
>>>>>> readFileStream(...)
>>>>>
>>>>>>
>>>>>
>>>>>> socketTextStream(String hostname, int port, char delimiter, long
>>>>> maxRetry)
>>>>>
>>>>>> socketTextStream(String hostname, int port, char delimiter)
>>>>>
>>>>>>
>>>>>
>>>>>> There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>>>
>>>>>> deprecated long ago, but I have not investigated to see if they are
>>>>>
>>>>>> actually easy to remove.
>>>>>
>>>>>>
>>>>>
>>>>>> Cheers,
>>>>>
>>>>>> Kostas
>>>>>
>>>>>>
>>>>>
>>>>>> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>>
>>>>>> wrote:
>>>>>
>>>>>>
>>>>>
>>>>>> Hi devs and users,
>>>>>
>>>>>>
>>>>>
>>>>>> I wanted to ask you what do you think about removing some of the
>>>>> deprecated APIs around the DataStream API.
>>>>>
>>>>>>
>>>>>
>>>>>> The APIs I have in mind are:
>>>>>
>>>>>>
>>>>>
>>>>>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>>
>>>>>> DataStream#fold and all related classes and methods such as
>>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>>> 1.3/1.4)
>>>>>
>>>>>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>>> (deprecated in 1.5)
>>>>>
>>>>>> DataStream#split (deprecated in 1.8)
>>>>>
>>>>>> Methods in (Connected)DataStream that specify keys as either indices
>>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>>
>>>>>>
>>>>>
>>>>>> I think the first three should be straightforward. They are long
>>>>> deprecated. The getAccumulators method is not used very often in my
>>>>> opinion. The same applies to the DataStream#fold which additionally is not
>>>>> very performant. Lastly the setStateBackend has an alternative with a class
>>>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>>>> compatible. Moreover if we remove the
>>>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>>>> have right now when setting a statebackend as the correct method cannot be
>>>>> used without an explicit casting.
>>>>>
>>>>>>
>>>>>
>>>>>> As for the DataStream#split I know there were some objections against
>>>>> removing the #split method in the past. I still believe the output tags can
>>>>> replace the split method already.
>>>>>
>>>>>>
>>>>>
>>>>>> The only problem in the last set of methods I propose to remove is
>>>>> that they were deprecated only in the last release and those method were
>>>>> only partially deprecated. Moreover some of the methods were not deprecated
>>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>>>> methods in this release.
>>>>>
>>>>>>
>>>>>
>>>>>> Let me know what do you think about it.
>>>>>
>>>>>>
>>>>>
>>>>>> Best,
>>>>>
>>>>>>
>>>>>
>>>>>> Dawid
>>>>
>>>>
> 


Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Konstantin Knauf <kn...@apache.org>.
I would argue that the guarantees of @Public methods that became
ineffective were broken when they became ineffective (and were deprecated).

   - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
   - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

Removing these methods seems like the better of two evils to me as it shows
users that they have been using no-ops for some time.

On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:

> We have removed some public methods in the past, after a careful
> deprecation period, if they were not well working any more.
>
> The sentiment I got from users is that careful cleanup is in fact
> appreciated, otherwise things get confusing over time (the deprecated
> methods cause noise in the API).
> Still, we need to be very careful here.
>
> I would suggest to
>   - start with the non-public breaking methods
>   - remove fold() (very long deprecated)
>   - remove split() buggy
>
> Removing the env.socketStream() and env.fileStream() methods would
> probably be good, too. They are very long deprecated and they don't work
> well (with checkpoints) and the sources are the first thing a user needs to
> understand when starting with Flink, so removing noise there is super
> helpful.
>
>
> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
>> Hey Till,
>>
>> You've got a good point here. Removing some of the methods would cause
>> breaking the stability guarantees. I do understand if we decide not to
>> remove them for that reason, let me explain though why I am thinking it
>> might make sense to remove them already. First of all I am a bit afraid it
>> might take a long time before we arrive at the 2.0 version. We have not
>> ever discussed that in the community. At the same time a lot of the methods
>> already don't work or are buggy, and we do not fix them any more.
>>
>> Methods which removing would not break the Public guarantees:
>>
>>    - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>    - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>    - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>    - StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>    (not the equivalent in the ExecutionConfig)
>>    - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>    (deprecated in 1.5)
>>
>> Methods which removing would break the Public guarantees:
>>
>> which have no effect:
>>
>>    - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>    - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> which are buggy or discouraged and thus we do not support fixing them:
>>
>>    - DataStream#split (deprecated in 1.8)
>>    - DataStream#fold and all related classes and methods such as
>>    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>    1.3/1.4)
>>
>> The methods like:
>>
>>    - StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>
>>    - methods in (Connected)DataStream that specify keys as either
>>    indices or field names
>>    -
>>    ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>
>> should be working just fine and I feel the least eager to remove those.
>>
>> I'd suggest I will open PRs for removing the methods that will not cause
>> breakage of the Public guarantees as the general feedback was rather
>> positive. For the rest I do understand the resentment to do so and will not
>> do it in the 1.x branch. Still I think it is valuable to have the
>> discussion.
>>
>> Best,
>>
>> Dawid
>>
>>
>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>
>> Having looked at the proposed set of methods to remove I've noticed that
>> some of them are actually annotated with @Public. According to our
>> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
>> with this annotation. Hence, I believe that we cannot simply remove them
>> unless the community decides to change the stability guarantees we give or
>> by making the next release a major release (Flink 2.0).
>>
>> Cheers,
>> Till
>>
>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:
>>
>>> +1 for removing the methods that are deprecated for a while & have
>>> alternative methods.
>>>
>>> One specific thing is that if we remove the DataStream#split, do we
>>> consider enabling side-output in more operators in the future ? Currently
>>> it should be only available in ProcessFunctions, but not available to other
>>> commonly used UDF like Source or AsyncFunction[1].
>>>
>>> One temporary solution occurs to me is to add a ProcessFunction after
>>> the operators want to use side-output. But I think the solution is not very
>>> direct to come up with and if it really works we might add it to the
>>> document of side-output.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>>
>>> Best,
>>>  Yun
>>>
>>> ------------------Original Mail ------------------
>>> *Sender:*Kostas Kloudas <kk...@gmail.com>
>>> *Send Date:*Tue Aug 18 03:52:44 2020
>>> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
>>> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>>>
>>>> +1 for removing them.
>>>>
>>>>
>>>>
>>>> From a quick look, most of them (not all) have been deprecated a long
>>>> time ago.
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Kostas
>>>>
>>>>
>>>>
>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>>
>>>> >
>>>>
>>>> > @David Yes, my idea was to remove any use of fold method and all
>>>> related classes including WindowedStream#fold
>>>>
>>>> >
>>>>
>>>> > @Klou Good idea to also remove the deprecated enableCheckpointing() &
>>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>>>> of the classes and thought we could also drop:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled
>>>>
>>>> >
>>>>
>>>> > As for the `forceCheckpointing` I am not fully convinced to doing it.
>>>> As far as I know iterations still do not participate in checkpointing
>>>> correctly. Therefore it still might make sense to force it. In other words
>>>> there is no real alternative to that method. Unless we only remove the
>>>> methods from StreamExecutionEnvironment and redirect to the setter in
>>>> CheckpointConfig. WDYT?
>>>>
>>>> >
>>>>
>>>> > An updated list of methods I suggest to remove:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>
>>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>> 1.1?)
>>>>
>>>> >
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>> (deprecated in 1.2)
>>>>
>>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>> (deprecated in 1.5)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>>
>>>> > Bear in mind that majority of the options listed above in
>>>> ExecutionConfig take no effect. They were left there purely to satisfy the
>>>> binary compatibility. Personally I don't see any benefit of leaving a
>>>> method and silently dropping the underlying feature. The only configuration
>>>> that is respected is setting the number of execution retries.
>>>>
>>>> >
>>>>
>>>> > I also wanted to make it explicit that most of the changes above
>>>> would result in a binary incompatibility and require additional exclusions
>>>> in the japicmp. Those are:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>
>>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>> 1.1?)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>> (deprecated in 1.2)
>>>>
>>>> >
>>>>
>>>> > Looking forward to more opinions on the issue.
>>>>
>>>> >
>>>>
>>>> > Best,
>>>>
>>>> >
>>>>
>>>> > Dawid
>>>>
>>>> >
>>>>
>>>> >
>>>>
>>>> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>
>>>> >
>>>>
>>>> > Thanks a lot for starting this Dawid,
>>>>
>>>> >
>>>>
>>>> > Big +1 for the proposed clean-up, and I would also add the deprecated
>>>>
>>>> > methods of the StreamExecutionEnvironment like:
>>>>
>>>> >
>>>>
>>>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>> force)
>>>>
>>>> > enableCheckpointing()
>>>>
>>>> > isForceCheckpointing()
>>>>
>>>> >
>>>>
>>>> > readFile(FileInputFormat inputFormat,String
>>>>
>>>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>
>>>> > filter)
>>>>
>>>> > readFileStream(...)
>>>>
>>>> >
>>>>
>>>> > socketTextStream(String hostname, int port, char delimiter, long
>>>> maxRetry)
>>>>
>>>> > socketTextStream(String hostname, int port, char delimiter)
>>>>
>>>> >
>>>>
>>>> > There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>>
>>>> > deprecated long ago, but I have not investigated to see if they are
>>>>
>>>> > actually easy to remove.
>>>>
>>>> >
>>>>
>>>> > Cheers,
>>>>
>>>> > Kostas
>>>>
>>>> >
>>>>
>>>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>
>>>> > wrote:
>>>>
>>>> >
>>>>
>>>> > Hi devs and users,
>>>>
>>>> >
>>>>
>>>> > I wanted to ask you what do you think about removing some of the
>>>> deprecated APIs around the DataStream API.
>>>>
>>>> >
>>>>
>>>> > The APIs I have in mind are:
>>>>
>>>> >
>>>>
>>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>> (deprecated in 1.5)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>>
>>>> > I think the first three should be straightforward. They are long
>>>> deprecated. The getAccumulators method is not used very often in my
>>>> opinion. The same applies to the DataStream#fold which additionally is not
>>>> very performant. Lastly the setStateBackend has an alternative with a class
>>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>>> compatible. Moreover if we remove the
>>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>>> have right now when setting a statebackend as the correct method cannot be
>>>> used without an explicit casting.
>>>>
>>>> >
>>>>
>>>> > As for the DataStream#split I know there were some objections against
>>>> removing the #split method in the past. I still believe the output tags can
>>>> replace the split method already.
>>>>
>>>> >
>>>>
>>>> > The only problem in the last set of methods I propose to remove is
>>>> that they were deprecated only in the last release and those method were
>>>> only partially deprecated. Moreover some of the methods were not deprecated
>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>>> methods in this release.
>>>>
>>>> >
>>>>
>>>> > Let me know what do you think about it.
>>>>
>>>> >
>>>>
>>>> > Best,
>>>>
>>>> >
>>>>
>>>> > Dawid
>>>
>>>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Konstantin Knauf <kn...@apache.org>.
I would argue that the guarantees of @Public methods that became
ineffective were broken when they became ineffective (and were deprecated).

   - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
   - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

Removing these methods seems like the better of two evils to me as it shows
users that they have been using no-ops for some time.

On Thu, Aug 20, 2020 at 10:50 AM Stephan Ewen <se...@apache.org> wrote:

> We have removed some public methods in the past, after a careful
> deprecation period, if they were not well working any more.
>
> The sentiment I got from users is that careful cleanup is in fact
> appreciated, otherwise things get confusing over time (the deprecated
> methods cause noise in the API).
> Still, we need to be very careful here.
>
> I would suggest to
>   - start with the non-public breaking methods
>   - remove fold() (very long deprecated)
>   - remove split() buggy
>
> Removing the env.socketStream() and env.fileStream() methods would
> probably be good, too. They are very long deprecated and they don't work
> well (with checkpoints) and the sources are the first thing a user needs to
> understand when starting with Flink, so removing noise there is super
> helpful.
>
>
> On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
>> Hey Till,
>>
>> You've got a good point here. Removing some of the methods would cause
>> breaking the stability guarantees. I do understand if we decide not to
>> remove them for that reason, let me explain though why I am thinking it
>> might make sense to remove them already. First of all I am a bit afraid it
>> might take a long time before we arrive at the 2.0 version. We have not
>> ever discussed that in the community. At the same time a lot of the methods
>> already don't work or are buggy, and we do not fix them any more.
>>
>> Methods which removing would not break the Public guarantees:
>>
>>    - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>    - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>    - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>    - StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>    (not the equivalent in the ExecutionConfig)
>>    - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>    (deprecated in 1.5)
>>
>> Methods which removing would break the Public guarantees:
>>
>> which have no effect:
>>
>>    - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>    - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> which are buggy or discouraged and thus we do not support fixing them:
>>
>>    - DataStream#split (deprecated in 1.8)
>>    - DataStream#fold and all related classes and methods such as
>>    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>    1.3/1.4)
>>
>> The methods like:
>>
>>    - StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>>
>>    - methods in (Connected)DataStream that specify keys as either
>>    indices or field names
>>    -
>>    ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>>
>> should be working just fine and I feel the least eager to remove those.
>>
>> I'd suggest I will open PRs for removing the methods that will not cause
>> breakage of the Public guarantees as the general feedback was rather
>> positive. For the rest I do understand the resentment to do so and will not
>> do it in the 1.x branch. Still I think it is valuable to have the
>> discussion.
>>
>> Best,
>>
>> Dawid
>>
>>
>> On 18/08/2020 09:26, Till Rohrmann wrote:
>>
>> Having looked at the proposed set of methods to remove I've noticed that
>> some of them are actually annotated with @Public. According to our
>> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
>> with this annotation. Hence, I believe that we cannot simply remove them
>> unless the community decides to change the stability guarantees we give or
>> by making the next release a major release (Flink 2.0).
>>
>> Cheers,
>> Till
>>
>> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:
>>
>>> +1 for removing the methods that are deprecated for a while & have
>>> alternative methods.
>>>
>>> One specific thing is that if we remove the DataStream#split, do we
>>> consider enabling side-output in more operators in the future ? Currently
>>> it should be only available in ProcessFunctions, but not available to other
>>> commonly used UDF like Source or AsyncFunction[1].
>>>
>>> One temporary solution occurs to me is to add a ProcessFunction after
>>> the operators want to use side-output. But I think the solution is not very
>>> direct to come up with and if it really works we might add it to the
>>> document of side-output.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>>
>>> Best,
>>>  Yun
>>>
>>> ------------------Original Mail ------------------
>>> *Sender:*Kostas Kloudas <kk...@gmail.com>
>>> *Send Date:*Tue Aug 18 03:52:44 2020
>>> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
>>> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
>>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>>>
>>>> +1 for removing them.
>>>>
>>>>
>>>>
>>>> From a quick look, most of them (not all) have been deprecated a long
>>>> time ago.
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Kostas
>>>>
>>>>
>>>>
>>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>>
>>>> >
>>>>
>>>> > @David Yes, my idea was to remove any use of fold method and all
>>>> related classes including WindowedStream#fold
>>>>
>>>> >
>>>>
>>>> > @Klou Good idea to also remove the deprecated enableCheckpointing() &
>>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>>>> of the classes and thought we could also drop:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled
>>>>
>>>> >
>>>>
>>>> > As for the `forceCheckpointing` I am not fully convinced to doing it.
>>>> As far as I know iterations still do not participate in checkpointing
>>>> correctly. Therefore it still might make sense to force it. In other words
>>>> there is no real alternative to that method. Unless we only remove the
>>>> methods from StreamExecutionEnvironment and redirect to the setter in
>>>> CheckpointConfig. WDYT?
>>>>
>>>> >
>>>>
>>>> > An updated list of methods I suggest to remove:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>
>>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>> 1.1?)
>>>>
>>>> >
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>> (deprecated in 1.2)
>>>>
>>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>> (deprecated in 1.5)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>>
>>>> > Bear in mind that majority of the options listed above in
>>>> ExecutionConfig take no effect. They were left there purely to satisfy the
>>>> binary compatibility. Personally I don't see any benefit of leaving a
>>>> method and silently dropping the underlying feature. The only configuration
>>>> that is respected is setting the number of execution retries.
>>>>
>>>> >
>>>>
>>>> > I also wanted to make it explicit that most of the changes above
>>>> would result in a binary incompatibility and require additional exclusions
>>>> in the japicmp. Those are:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>
>>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>> 1.1?)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>> (deprecated in 1.2)
>>>>
>>>> >
>>>>
>>>> > Looking forward to more opinions on the issue.
>>>>
>>>> >
>>>>
>>>> > Best,
>>>>
>>>> >
>>>>
>>>> > Dawid
>>>>
>>>> >
>>>>
>>>> >
>>>>
>>>> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>
>>>> >
>>>>
>>>> > Thanks a lot for starting this Dawid,
>>>>
>>>> >
>>>>
>>>> > Big +1 for the proposed clean-up, and I would also add the deprecated
>>>>
>>>> > methods of the StreamExecutionEnvironment like:
>>>>
>>>> >
>>>>
>>>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>> force)
>>>>
>>>> > enableCheckpointing()
>>>>
>>>> > isForceCheckpointing()
>>>>
>>>> >
>>>>
>>>> > readFile(FileInputFormat inputFormat,String
>>>>
>>>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>
>>>> > filter)
>>>>
>>>> > readFileStream(...)
>>>>
>>>> >
>>>>
>>>> > socketTextStream(String hostname, int port, char delimiter, long
>>>> maxRetry)
>>>>
>>>> > socketTextStream(String hostname, int port, char delimiter)
>>>>
>>>> >
>>>>
>>>> > There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>>
>>>> > deprecated long ago, but I have not investigated to see if they are
>>>>
>>>> > actually easy to remove.
>>>>
>>>> >
>>>>
>>>> > Cheers,
>>>>
>>>> > Kostas
>>>>
>>>> >
>>>>
>>>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>
>>>> > wrote:
>>>>
>>>> >
>>>>
>>>> > Hi devs and users,
>>>>
>>>> >
>>>>
>>>> > I wanted to ask you what do you think about removing some of the
>>>> deprecated APIs around the DataStream API.
>>>>
>>>> >
>>>>
>>>> > The APIs I have in mind are:
>>>>
>>>> >
>>>>
>>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>> (deprecated in 1.5)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>>
>>>> >
>>>>
>>>> > I think the first three should be straightforward. They are long
>>>> deprecated. The getAccumulators method is not used very often in my
>>>> opinion. The same applies to the DataStream#fold which additionally is not
>>>> very performant. Lastly the setStateBackend has an alternative with a class
>>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>>> compatible. Moreover if we remove the
>>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>>> have right now when setting a statebackend as the correct method cannot be
>>>> used without an explicit casting.
>>>>
>>>> >
>>>>
>>>> > As for the DataStream#split I know there were some objections against
>>>> removing the #split method in the past. I still believe the output tags can
>>>> replace the split method already.
>>>>
>>>> >
>>>>
>>>> > The only problem in the last set of methods I propose to remove is
>>>> that they were deprecated only in the last release and those method were
>>>> only partially deprecated. Moreover some of the methods were not deprecated
>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>>> methods in this release.
>>>>
>>>> >
>>>>
>>>> > Let me know what do you think about it.
>>>>
>>>> >
>>>>
>>>> > Best,
>>>>
>>>> >
>>>>
>>>> > Dawid
>>>
>>>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Stephan Ewen <se...@apache.org>.
We have removed some public methods in the past, after a careful
deprecation period, if they were not well working any more.

The sentiment I got from users is that careful cleanup is in fact
appreciated, otherwise things get confusing over time (the deprecated
methods cause noise in the API).
Still, we need to be very careful here.

I would suggest to
  - start with the non-public breaking methods
  - remove fold() (very long deprecated)
  - remove split() buggy

Removing the env.socketStream() and env.fileStream() methods would probably
be good, too. They are very long deprecated and they don't work well (with
checkpoints) and the sources are the first thing a user needs to understand
when starting with Flink, so removing noise there is super helpful.


On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hey Till,
>
> You've got a good point here. Removing some of the methods would cause
> breaking the stability guarantees. I do understand if we decide not to
> remove them for that reason, let me explain though why I am thinking it
> might make sense to remove them already. First of all I am a bit afraid it
> might take a long time before we arrive at the 2.0 version. We have not
> ever discussed that in the community. At the same time a lot of the methods
> already don't work or are buggy, and we do not fix them any more.
>
> Methods which removing would not break the Public guarantees:
>
>    - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>    - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>    - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>    - StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>    (not the equivalent in the ExecutionConfig)
>    - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>    (deprecated in 1.5)
>
> Methods which removing would break the Public guarantees:
>
> which have no effect:
>
>    - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>    - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>
> which are buggy or discouraged and thus we do not support fixing them:
>
>    - DataStream#split (deprecated in 1.8)
>    - DataStream#fold and all related classes and methods such as
>    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>    1.3/1.4)
>
> The methods like:
>
>    - StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>
>    - methods in (Connected)DataStream that specify keys as either indices
>    or field names
>    -
>    ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>
> should be working just fine and I feel the least eager to remove those.
>
> I'd suggest I will open PRs for removing the methods that will not cause
> breakage of the Public guarantees as the general feedback was rather
> positive. For the rest I do understand the resentment to do so and will not
> do it in the 1.x branch. Still I think it is valuable to have the
> discussion.
>
> Best,
>
> Dawid
>
>
> On 18/08/2020 09:26, Till Rohrmann wrote:
>
> Having looked at the proposed set of methods to remove I've noticed that
> some of them are actually annotated with @Public. According to our
> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
> with this annotation. Hence, I believe that we cannot simply remove them
> unless the community decides to change the stability guarantees we give or
> by making the next release a major release (Flink 2.0).
>
> Cheers,
> Till
>
> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:
>
>> +1 for removing the methods that are deprecated for a while & have
>> alternative methods.
>>
>> One specific thing is that if we remove the DataStream#split, do we
>> consider enabling side-output in more operators in the future ? Currently
>> it should be only available in ProcessFunctions, but not available to other
>> commonly used UDF like Source or AsyncFunction[1].
>>
>> One temporary solution occurs to me is to add a ProcessFunction after the
>> operators want to use side-output. But I think the solution is not very
>> direct to come up with and if it really works we might add it to the
>> document of side-output.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>
>> Best,
>>  Yun
>>
>> ------------------Original Mail ------------------
>> *Sender:*Kostas Kloudas <kk...@gmail.com>
>> *Send Date:*Tue Aug 18 03:52:44 2020
>> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
>> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>>
>>> +1 for removing them.
>>>
>>>
>>>
>>> From a quick look, most of them (not all) have been deprecated a long
>>> time ago.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Kostas
>>>
>>>
>>>
>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>
>>> >
>>>
>>> > @David Yes, my idea was to remove any use of fold method and all
>>> related classes including WindowedStream#fold
>>>
>>> >
>>>
>>> > @Klou Good idea to also remove the deprecated enableCheckpointing() &
>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>>> of the classes and thought we could also drop:
>>>
>>> >
>>>
>>> > ExecutionConfig#set/getCodeAnalysisMode
>>>
>>> > ExecutionConfig#disable/enableSysoutLogging
>>>
>>> > ExecutionConfig#set/isFailTaskOnCheckpointError
>>>
>>> > ExecutionConfig#isLatencyTrackingEnabled
>>>
>>> >
>>>
>>> > As for the `forceCheckpointing` I am not fully convinced to doing it.
>>> As far as I know iterations still do not participate in checkpointing
>>> correctly. Therefore it still might make sense to force it. In other words
>>> there is no real alternative to that method. Unless we only remove the
>>> methods from StreamExecutionEnvironment and redirect to the setter in
>>> CheckpointConfig. WDYT?
>>>
>>> >
>>>
>>> > An updated list of methods I suggest to remove:
>>>
>>> >
>>>
>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>
>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>
>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>
>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>
>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>> 1.1?)
>>>
>>> >
>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>> (deprecated in 1.2)
>>>
>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>
>>> > DataStream#fold and all related classes and methods such as
>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>> 1.3/1.4)
>>>
>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>> (deprecated in 1.5)
>>>
>>> > DataStream#split (deprecated in 1.8)
>>>
>>> > Methods in (Connected)DataStream that specify keys as either indices
>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>
>>> >
>>>
>>> > Bear in mind that majority of the options listed above in
>>> ExecutionConfig take no effect. They were left there purely to satisfy the
>>> binary compatibility. Personally I don't see any benefit of leaving a
>>> method and silently dropping the underlying feature. The only configuration
>>> that is respected is setting the number of execution retries.
>>>
>>> >
>>>
>>> > I also wanted to make it explicit that most of the changes above would
>>> result in a binary incompatibility and require additional exclusions in the
>>> japicmp. Those are:
>>>
>>> >
>>>
>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>
>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>
>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>
>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>
>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>> 1.1?)
>>>
>>> > DataStream#fold and all related classes and methods such as
>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>> 1.3/1.4)
>>>
>>> > DataStream#split (deprecated in 1.8)
>>>
>>> > Methods in (Connected)DataStream that specify keys as either indices
>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>
>>> >
>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>> (deprecated in 1.2)
>>>
>>> >
>>>
>>> > Looking forward to more opinions on the issue.
>>>
>>> >
>>>
>>> > Best,
>>>
>>> >
>>>
>>> > Dawid
>>>
>>> >
>>>
>>> >
>>>
>>> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>
>>> >
>>>
>>> > Thanks a lot for starting this Dawid,
>>>
>>> >
>>>
>>> > Big +1 for the proposed clean-up, and I would also add the deprecated
>>>
>>> > methods of the StreamExecutionEnvironment like:
>>>
>>> >
>>>
>>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>> force)
>>>
>>> > enableCheckpointing()
>>>
>>> > isForceCheckpointing()
>>>
>>> >
>>>
>>> > readFile(FileInputFormat inputFormat,String
>>>
>>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>
>>> > filter)
>>>
>>> > readFileStream(...)
>>>
>>> >
>>>
>>> > socketTextStream(String hostname, int port, char delimiter, long
>>> maxRetry)
>>>
>>> > socketTextStream(String hostname, int port, char delimiter)
>>>
>>> >
>>>
>>> > There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>
>>> > deprecated long ago, but I have not investigated to see if they are
>>>
>>> > actually easy to remove.
>>>
>>> >
>>>
>>> > Cheers,
>>>
>>> > Kostas
>>>
>>> >
>>>
>>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>
>>> > wrote:
>>>
>>> >
>>>
>>> > Hi devs and users,
>>>
>>> >
>>>
>>> > I wanted to ask you what do you think about removing some of the
>>> deprecated APIs around the DataStream API.
>>>
>>> >
>>>
>>> > The APIs I have in mind are:
>>>
>>> >
>>>
>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>
>>> > DataStream#fold and all related classes and methods such as
>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>> 1.3/1.4)
>>>
>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>> (deprecated in 1.5)
>>>
>>> > DataStream#split (deprecated in 1.8)
>>>
>>> > Methods in (Connected)DataStream that specify keys as either indices
>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>
>>> >
>>>
>>> > I think the first three should be straightforward. They are long
>>> deprecated. The getAccumulators method is not used very often in my
>>> opinion. The same applies to the DataStream#fold which additionally is not
>>> very performant. Lastly the setStateBackend has an alternative with a class
>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>> compatible. Moreover if we remove the
>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>> have right now when setting a statebackend as the correct method cannot be
>>> used without an explicit casting.
>>>
>>> >
>>>
>>> > As for the DataStream#split I know there were some objections against
>>> removing the #split method in the past. I still believe the output tags can
>>> replace the split method already.
>>>
>>> >
>>>
>>> > The only problem in the last set of methods I propose to remove is
>>> that they were deprecated only in the last release and those method were
>>> only partially deprecated. Moreover some of the methods were not deprecated
>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>> methods in this release.
>>>
>>> >
>>>
>>> > Let me know what do you think about it.
>>>
>>> >
>>>
>>> > Best,
>>>
>>> >
>>>
>>> > Dawid
>>
>>

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Stephan Ewen <se...@apache.org>.
We have removed some public methods in the past, after a careful
deprecation period, if they were not well working any more.

The sentiment I got from users is that careful cleanup is in fact
appreciated, otherwise things get confusing over time (the deprecated
methods cause noise in the API).
Still, we need to be very careful here.

I would suggest to
  - start with the non-public breaking methods
  - remove fold() (very long deprecated)
  - remove split() buggy

Removing the env.socketStream() and env.fileStream() methods would probably
be good, too. They are very long deprecated and they don't work well (with
checkpoints) and the sources are the first thing a user needs to understand
when starting with Flink, so removing noise there is super helpful.


On Thu, Aug 20, 2020 at 8:53 AM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hey Till,
>
> You've got a good point here. Removing some of the methods would cause
> breaking the stability guarantees. I do understand if we decide not to
> remove them for that reason, let me explain though why I am thinking it
> might make sense to remove them already. First of all I am a bit afraid it
> might take a long time before we arrive at the 2.0 version. We have not
> ever discussed that in the community. At the same time a lot of the methods
> already don't work or are buggy, and we do not fix them any more.
>
> Methods which removing would not break the Public guarantees:
>
>    - ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>    - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>    - ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>    - StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>    (not the equivalent in the ExecutionConfig)
>    - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>    (deprecated in 1.5)
>
> Methods which removing would break the Public guarantees:
>
> which have no effect:
>
>    - ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>    - ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>
> which are buggy or discouraged and thus we do not support fixing them:
>
>    - DataStream#split (deprecated in 1.8)
>    - DataStream#fold and all related classes and methods such as
>    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>    1.3/1.4)
>
> The methods like:
>
>    - StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
>
>    - methods in (Connected)DataStream that specify keys as either indices
>    or field names
>    -
>    ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
>
> should be working just fine and I feel the least eager to remove those.
>
> I'd suggest I will open PRs for removing the methods that will not cause
> breakage of the Public guarantees as the general feedback was rather
> positive. For the rest I do understand the resentment to do so and will not
> do it in the 1.x branch. Still I think it is valuable to have the
> discussion.
>
> Best,
>
> Dawid
>
>
> On 18/08/2020 09:26, Till Rohrmann wrote:
>
> Having looked at the proposed set of methods to remove I've noticed that
> some of them are actually annotated with @Public. According to our
> stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
> with this annotation. Hence, I believe that we cannot simply remove them
> unless the community decides to change the stability guarantees we give or
> by making the next release a major release (Flink 2.0).
>
> Cheers,
> Till
>
> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:
>
>> +1 for removing the methods that are deprecated for a while & have
>> alternative methods.
>>
>> One specific thing is that if we remove the DataStream#split, do we
>> consider enabling side-output in more operators in the future ? Currently
>> it should be only available in ProcessFunctions, but not available to other
>> commonly used UDF like Source or AsyncFunction[1].
>>
>> One temporary solution occurs to me is to add a ProcessFunction after the
>> operators want to use side-output. But I think the solution is not very
>> direct to come up with and if it really works we might add it to the
>> document of side-output.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-7954
>>
>> Best,
>>  Yun
>>
>> ------------------Original Mail ------------------
>> *Sender:*Kostas Kloudas <kk...@gmail.com>
>> *Send Date:*Tue Aug 18 03:52:44 2020
>> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
>> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
>> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>>
>>> +1 for removing them.
>>>
>>>
>>>
>>> From a quick look, most of them (not all) have been deprecated a long
>>> time ago.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Kostas
>>>
>>>
>>>
>>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>>
>>> >
>>>
>>> > @David Yes, my idea was to remove any use of fold method and all
>>> related classes including WindowedStream#fold
>>>
>>> >
>>>
>>> > @Klou Good idea to also remove the deprecated enableCheckpointing() &
>>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>>> of the classes and thought we could also drop:
>>>
>>> >
>>>
>>> > ExecutionConfig#set/getCodeAnalysisMode
>>>
>>> > ExecutionConfig#disable/enableSysoutLogging
>>>
>>> > ExecutionConfig#set/isFailTaskOnCheckpointError
>>>
>>> > ExecutionConfig#isLatencyTrackingEnabled
>>>
>>> >
>>>
>>> > As for the `forceCheckpointing` I am not fully convinced to doing it.
>>> As far as I know iterations still do not participate in checkpointing
>>> correctly. Therefore it still might make sense to force it. In other words
>>> there is no real alternative to that method. Unless we only remove the
>>> methods from StreamExecutionEnvironment and redirect to the setter in
>>> CheckpointConfig. WDYT?
>>>
>>> >
>>>
>>> > An updated list of methods I suggest to remove:
>>>
>>> >
>>>
>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>
>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>
>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>
>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>
>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>> 1.1?)
>>>
>>> >
>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>> (deprecated in 1.2)
>>>
>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>
>>> > DataStream#fold and all related classes and methods such as
>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>> 1.3/1.4)
>>>
>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>> (deprecated in 1.5)
>>>
>>> > DataStream#split (deprecated in 1.8)
>>>
>>> > Methods in (Connected)DataStream that specify keys as either indices
>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>
>>> >
>>>
>>> > Bear in mind that majority of the options listed above in
>>> ExecutionConfig take no effect. They were left there purely to satisfy the
>>> binary compatibility. Personally I don't see any benefit of leaving a
>>> method and silently dropping the underlying feature. The only configuration
>>> that is respected is setting the number of execution retries.
>>>
>>> >
>>>
>>> > I also wanted to make it explicit that most of the changes above would
>>> result in a binary incompatibility and require additional exclusions in the
>>> japicmp. Those are:
>>>
>>> >
>>>
>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>
>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>
>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>
>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>
>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>> 1.1?)
>>>
>>> > DataStream#fold and all related classes and methods such as
>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>> 1.3/1.4)
>>>
>>> > DataStream#split (deprecated in 1.8)
>>>
>>> > Methods in (Connected)DataStream that specify keys as either indices
>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>
>>> >
>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>> (deprecated in 1.2)
>>>
>>> >
>>>
>>> > Looking forward to more opinions on the issue.
>>>
>>> >
>>>
>>> > Best,
>>>
>>> >
>>>
>>> > Dawid
>>>
>>> >
>>>
>>> >
>>>
>>> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>
>>> >
>>>
>>> > Thanks a lot for starting this Dawid,
>>>
>>> >
>>>
>>> > Big +1 for the proposed clean-up, and I would also add the deprecated
>>>
>>> > methods of the StreamExecutionEnvironment like:
>>>
>>> >
>>>
>>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>> force)
>>>
>>> > enableCheckpointing()
>>>
>>> > isForceCheckpointing()
>>>
>>> >
>>>
>>> > readFile(FileInputFormat inputFormat,String
>>>
>>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>
>>> > filter)
>>>
>>> > readFileStream(...)
>>>
>>> >
>>>
>>> > socketTextStream(String hostname, int port, char delimiter, long
>>> maxRetry)
>>>
>>> > socketTextStream(String hostname, int port, char delimiter)
>>>
>>> >
>>>
>>> > There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>
>>> > deprecated long ago, but I have not investigated to see if they are
>>>
>>> > actually easy to remove.
>>>
>>> >
>>>
>>> > Cheers,
>>>
>>> > Kostas
>>>
>>> >
>>>
>>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>
>>> > wrote:
>>>
>>> >
>>>
>>> > Hi devs and users,
>>>
>>> >
>>>
>>> > I wanted to ask you what do you think about removing some of the
>>> deprecated APIs around the DataStream API.
>>>
>>> >
>>>
>>> > The APIs I have in mind are:
>>>
>>> >
>>>
>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>
>>> > DataStream#fold and all related classes and methods such as
>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>> 1.3/1.4)
>>>
>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>> (deprecated in 1.5)
>>>
>>> > DataStream#split (deprecated in 1.8)
>>>
>>> > Methods in (Connected)DataStream that specify keys as either indices
>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>>
>>> >
>>>
>>> > I think the first three should be straightforward. They are long
>>> deprecated. The getAccumulators method is not used very often in my
>>> opinion. The same applies to the DataStream#fold which additionally is not
>>> very performant. Lastly the setStateBackend has an alternative with a class
>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>> compatible. Moreover if we remove the
>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>> have right now when setting a statebackend as the correct method cannot be
>>> used without an explicit casting.
>>>
>>> >
>>>
>>> > As for the DataStream#split I know there were some objections against
>>> removing the #split method in the past. I still believe the output tags can
>>> replace the split method already.
>>>
>>> >
>>>
>>> > The only problem in the last set of methods I propose to remove is
>>> that they were deprecated only in the last release and those method were
>>> only partially deprecated. Moreover some of the methods were not deprecated
>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>> methods in this release.
>>>
>>> >
>>>
>>> > Let me know what do you think about it.
>>>
>>> >
>>>
>>> > Best,
>>>
>>> >
>>>
>>> > Dawid
>>
>>

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hey Till,

You've got a good point here. Removing some of the methods would cause
breaking the stability guarantees. I do understand if we decide not to
remove them for that reason, let me explain though why I am thinking it
might make sense to remove them already. First of all I am a bit afraid
it might take a long time before we arrive at the 2.0 version. We have
not ever discussed that in the community. At the same time a lot of the
methods already don't work or are buggy, and we do not fix them any more.

Methods which removing would not break the Public guarantees:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
    (not the equivalent in the ExecutionConfig)
  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5)

Methods which removing would break the Public guarantees:

which have no effect:

  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

which are buggy or discouraged and thus we do not support fixing them:

  * DataStream#split (deprecated in 1.8)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)

The methods like:

  * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),

  * methods in (Connected)DataStream that specify keys as either indices
    or field names
  * ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries

should be working just fine and I feel the least eager to remove those.

I'd suggest I will open PRs for removing the methods that will not cause
breakage of the Public guarantees as the general feedback was rather
positive. For the rest I do understand the resentment to do so and will
not do it in the 1.x branch. Still I think it is valuable to have the
discussion.

Best,

Dawid


On 18/08/2020 09:26, Till Rohrmann wrote:
> Having looked at the proposed set of methods to remove I've noticed
> that some of them are actually annotated with @Public. According to
> our stability guarantees, only major releases (1.0, 2.0, etc.) can
> break APIs with this annotation. Hence, I believe that we cannot
> simply remove them unless the community decides to change the
> stability guarantees we give or by making the next release a major
> release (Flink 2.0).
>
> Cheers,
> Till
>
> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao.gy@aliyun.com
> <ma...@aliyun.com>> wrote:
>
>     +1 for removing the methods that are deprecated for a while & have
>     alternative methods.
>
>     One specific thing is that if we remove the DataStream#split, do
>     we consider enabling side-output in more operators in the future ?
>     Currently it should be only available in ProcessFunctions, but not
>     available to other commonly used UDF like Source or AsyncFunction[1].
>
>     One temporary solution occurs to me is to add a ProcessFunction
>     after the operators want to use side-output. But I think the
>     solution is not very direct to come up with and if it really works
>     we might add it to the document of side-output. 
>
>     [1] https://issues.apache.org/jira/browse/FLINK-7954
>
>     Best,
>      Yun
>
>         ------------------Original Mail ------------------
>         *Sender:*Kostas Kloudas <kkloudas@gmail.com
>         <ma...@gmail.com>>
>         *Send Date:*Tue Aug 18 03:52:44 2020
>         *Recipients:*Dawid Wysakowicz <dwysakowicz@apache.org
>         <ma...@apache.org>>
>         *CC:*dev <dev@flink.apache.org <ma...@flink.apache.org>>,
>         user <user@flink.apache.org <ma...@flink.apache.org>>
>         *Subject:*Re: [DISCUSS] Removing deprecated methods from
>         DataStream API
>
>             +1 for removing them.
>
>
>
>             From a quick look, most of them (not all) have been
>             deprecated a long time ago.
>
>
>
>             Cheers,
>
>             Kostas
>
>
>
>             On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>
>             >
>
>             > @David Yes, my idea was to remove any use of fold method
>             and all related classes including WindowedStream#fold
>
>             >
>
>             > @Klou Good idea to also remove the deprecated
>             enableCheckpointing() &
>             StreamExecutionEnvironment#readFile and alike. I did
>             another pass over some of the classes and thought we could
>             also drop:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode
>
>             > ExecutionConfig#disable/enableSysoutLogging
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>
>             > ExecutionConfig#isLatencyTrackingEnabled
>
>             >
>
>             > As for the `forceCheckpointing` I am not fully convinced
>             to doing it. As far as I know iterations still do not
>             participate in checkpointing correctly. Therefore it still
>             might make sense to force it. In other words there is no
>             real alternative to that method. Unless we only remove the
>             methods from StreamExecutionEnvironment and redirect to
>             the setter in CheckpointConfig. WDYT?
>
>             >
>
>             > An updated list of methods I suggest to remove:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>
>             > ExecutionConfig#disable/enableSysoutLogging (deprecated
>             in 1.10)
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>             (deprecated in 1.9)
>
>             > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>
>             > ExecutionConfig#(get)/setNumberOfExecutionRetries()
>             (deprecated in 1.1?)
>
>             >
>             StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>             (deprecated in 1.2)
>
>             > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             >
>             StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>             (deprecated in 1.5)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>
>             > Bear in mind that majority of the options listed above
>             in ExecutionConfig take no effect. They were left there
>             purely to satisfy the binary compatibility. Personally I
>             don't see any benefit of leaving a method and silently
>             dropping the underlying feature. The only configuration
>             that is respected is setting the number of execution retries.
>
>             >
>
>             > I also wanted to make it explicit that most of the
>             changes above would result in a binary incompatibility and
>             require additional exclusions in the japicmp. Those are:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>
>             > ExecutionConfig#disable/enableSysoutLogging (deprecated
>             in 1.10)
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>             (deprecated in 1.9)
>
>             > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>
>             > ExecutionConfig#(get)/setNumberOfExecutionRetries()
>             (deprecated in 1.1?)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>             StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>             (deprecated in 1.2)
>
>             >
>
>             > Looking forward to more opinions on the issue.
>
>             >
>
>             > Best,
>
>             >
>
>             > Dawid
>
>             >
>
>             >
>
>             > On 17/08/2020 12:49, Kostas Kloudas wrote:
>
>             >
>
>             > Thanks a lot for starting this Dawid,
>
>             >
>
>             > Big +1 for the proposed clean-up, and I would also add
>             the deprecated
>
>             > methods of the StreamExecutionEnvironment like:
>
>             >
>
>             > enableCheckpointing(long interval, CheckpointingMode
>             mode, boolean force)
>
>             > enableCheckpointing()
>
>             > isForceCheckpointing()
>
>             >
>
>             > readFile(FileInputFormat inputFormat,String
>
>             > filePath,FileProcessingMode watchType,long interval,
>             FilePathFilter
>
>             > filter)
>
>             > readFileStream(...)
>
>             >
>
>             > socketTextStream(String hostname, int port, char
>             delimiter, long maxRetry)
>
>             > socketTextStream(String hostname, int port, char delimiter)
>
>             >
>
>             > There are more, like the
>             (get)/setNumberOfExecutionRetries() that were
>
>             > deprecated long ago, but I have not investigated to see
>             if they are
>
>             > actually easy to remove.
>
>             >
>
>             > Cheers,
>
>             > Kostas
>
>             >
>
>             > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>
>             > wrote:
>
>             >
>
>             > Hi devs and users,
>
>             >
>
>             > I wanted to ask you what do you think about removing
>             some of the deprecated APIs around the DataStream API.
>
>             >
>
>             > The APIs I have in mind are:
>
>             >
>
>             > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             >
>             StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>             (deprecated in 1.5)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>
>             > I think the first three should be straightforward. They
>             are long deprecated. The getAccumulators method is not
>             used very often in my opinion. The same applies to the
>             DataStream#fold which additionally is not very performant.
>             Lastly the setStateBackend has an alternative with a class
>             from the AbstractStateBackend hierarchy, therefore it will
>             be still code compatible. Moreover if we remove the
>             #setStateBackend(AbstractStateBackend) we will get rid off
>             warnings users have right now when setting a statebackend
>             as the correct method cannot be used without an explicit
>             casting.
>
>             >
>
>             > As for the DataStream#split I know there were some
>             objections against removing the #split method in the past.
>             I still believe the output tags can replace the split
>             method already.
>
>             >
>
>             > The only problem in the last set of methods I propose to
>             remove is that they were deprecated only in the last
>             release and those method were only partially deprecated.
>             Moreover some of the methods were not deprecated in
>             ConnectedStreams. Nevertheless I'd still be inclined to
>             remove the methods in this release.
>
>             >
>
>             > Let me know what do you think about it.
>
>             >
>
>             > Best,
>
>             >
>
>             > Dawid
>

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hey Till,

You've got a good point here. Removing some of the methods would cause
breaking the stability guarantees. I do understand if we decide not to
remove them for that reason, let me explain though why I am thinking it
might make sense to remove them already. First of all I am a bit afraid
it might take a long time before we arrive at the 2.0 version. We have
not ever discussed that in the community. At the same time a lot of the
methods already don't work or are buggy, and we do not fix them any more.

Methods which removing would not break the Public guarantees:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
    (not the equivalent in the ExecutionConfig)
  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5)

Methods which removing would break the Public guarantees:

which have no effect:

  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

which are buggy or discouraged and thus we do not support fixing them:

  * DataStream#split (deprecated in 1.8)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)

The methods like:

  * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),

  * methods in (Connected)DataStream that specify keys as either indices
    or field names
  * ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries

should be working just fine and I feel the least eager to remove those.

I'd suggest I will open PRs for removing the methods that will not cause
breakage of the Public guarantees as the general feedback was rather
positive. For the rest I do understand the resentment to do so and will
not do it in the 1.x branch. Still I think it is valuable to have the
discussion.

Best,

Dawid


On 18/08/2020 09:26, Till Rohrmann wrote:
> Having looked at the proposed set of methods to remove I've noticed
> that some of them are actually annotated with @Public. According to
> our stability guarantees, only major releases (1.0, 2.0, etc.) can
> break APIs with this annotation. Hence, I believe that we cannot
> simply remove them unless the community decides to change the
> stability guarantees we give or by making the next release a major
> release (Flink 2.0).
>
> Cheers,
> Till
>
> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao.gy@aliyun.com
> <ma...@aliyun.com>> wrote:
>
>     +1 for removing the methods that are deprecated for a while & have
>     alternative methods.
>
>     One specific thing is that if we remove the DataStream#split, do
>     we consider enabling side-output in more operators in the future ?
>     Currently it should be only available in ProcessFunctions, but not
>     available to other commonly used UDF like Source or AsyncFunction[1].
>
>     One temporary solution occurs to me is to add a ProcessFunction
>     after the operators want to use side-output. But I think the
>     solution is not very direct to come up with and if it really works
>     we might add it to the document of side-output. 
>
>     [1] https://issues.apache.org/jira/browse/FLINK-7954
>
>     Best,
>      Yun
>
>         ------------------Original Mail ------------------
>         *Sender:*Kostas Kloudas <kkloudas@gmail.com
>         <ma...@gmail.com>>
>         *Send Date:*Tue Aug 18 03:52:44 2020
>         *Recipients:*Dawid Wysakowicz <dwysakowicz@apache.org
>         <ma...@apache.org>>
>         *CC:*dev <dev@flink.apache.org <ma...@flink.apache.org>>,
>         user <user@flink.apache.org <ma...@flink.apache.org>>
>         *Subject:*Re: [DISCUSS] Removing deprecated methods from
>         DataStream API
>
>             +1 for removing them.
>
>
>
>             From a quick look, most of them (not all) have been
>             deprecated a long time ago.
>
>
>
>             Cheers,
>
>             Kostas
>
>
>
>             On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>
>             >
>
>             > @David Yes, my idea was to remove any use of fold method
>             and all related classes including WindowedStream#fold
>
>             >
>
>             > @Klou Good idea to also remove the deprecated
>             enableCheckpointing() &
>             StreamExecutionEnvironment#readFile and alike. I did
>             another pass over some of the classes and thought we could
>             also drop:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode
>
>             > ExecutionConfig#disable/enableSysoutLogging
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>
>             > ExecutionConfig#isLatencyTrackingEnabled
>
>             >
>
>             > As for the `forceCheckpointing` I am not fully convinced
>             to doing it. As far as I know iterations still do not
>             participate in checkpointing correctly. Therefore it still
>             might make sense to force it. In other words there is no
>             real alternative to that method. Unless we only remove the
>             methods from StreamExecutionEnvironment and redirect to
>             the setter in CheckpointConfig. WDYT?
>
>             >
>
>             > An updated list of methods I suggest to remove:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>
>             > ExecutionConfig#disable/enableSysoutLogging (deprecated
>             in 1.10)
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>             (deprecated in 1.9)
>
>             > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>
>             > ExecutionConfig#(get)/setNumberOfExecutionRetries()
>             (deprecated in 1.1?)
>
>             >
>             StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>             (deprecated in 1.2)
>
>             > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             >
>             StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>             (deprecated in 1.5)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>
>             > Bear in mind that majority of the options listed above
>             in ExecutionConfig take no effect. They were left there
>             purely to satisfy the binary compatibility. Personally I
>             don't see any benefit of leaving a method and silently
>             dropping the underlying feature. The only configuration
>             that is respected is setting the number of execution retries.
>
>             >
>
>             > I also wanted to make it explicit that most of the
>             changes above would result in a binary incompatibility and
>             require additional exclusions in the japicmp. Those are:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>
>             > ExecutionConfig#disable/enableSysoutLogging (deprecated
>             in 1.10)
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>             (deprecated in 1.9)
>
>             > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>
>             > ExecutionConfig#(get)/setNumberOfExecutionRetries()
>             (deprecated in 1.1?)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>             StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>             (deprecated in 1.2)
>
>             >
>
>             > Looking forward to more opinions on the issue.
>
>             >
>
>             > Best,
>
>             >
>
>             > Dawid
>
>             >
>
>             >
>
>             > On 17/08/2020 12:49, Kostas Kloudas wrote:
>
>             >
>
>             > Thanks a lot for starting this Dawid,
>
>             >
>
>             > Big +1 for the proposed clean-up, and I would also add
>             the deprecated
>
>             > methods of the StreamExecutionEnvironment like:
>
>             >
>
>             > enableCheckpointing(long interval, CheckpointingMode
>             mode, boolean force)
>
>             > enableCheckpointing()
>
>             > isForceCheckpointing()
>
>             >
>
>             > readFile(FileInputFormat inputFormat,String
>
>             > filePath,FileProcessingMode watchType,long interval,
>             FilePathFilter
>
>             > filter)
>
>             > readFileStream(...)
>
>             >
>
>             > socketTextStream(String hostname, int port, char
>             delimiter, long maxRetry)
>
>             > socketTextStream(String hostname, int port, char delimiter)
>
>             >
>
>             > There are more, like the
>             (get)/setNumberOfExecutionRetries() that were
>
>             > deprecated long ago, but I have not investigated to see
>             if they are
>
>             > actually easy to remove.
>
>             >
>
>             > Cheers,
>
>             > Kostas
>
>             >
>
>             > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>
>             > wrote:
>
>             >
>
>             > Hi devs and users,
>
>             >
>
>             > I wanted to ask you what do you think about removing
>             some of the deprecated APIs around the DataStream API.
>
>             >
>
>             > The APIs I have in mind are:
>
>             >
>
>             > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             >
>             StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>             (deprecated in 1.5)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>
>             > I think the first three should be straightforward. They
>             are long deprecated. The getAccumulators method is not
>             used very often in my opinion. The same applies to the
>             DataStream#fold which additionally is not very performant.
>             Lastly the setStateBackend has an alternative with a class
>             from the AbstractStateBackend hierarchy, therefore it will
>             be still code compatible. Moreover if we remove the
>             #setStateBackend(AbstractStateBackend) we will get rid off
>             warnings users have right now when setting a statebackend
>             as the correct method cannot be used without an explicit
>             casting.
>
>             >
>
>             > As for the DataStream#split I know there were some
>             objections against removing the #split method in the past.
>             I still believe the output tags can replace the split
>             method already.
>
>             >
>
>             > The only problem in the last set of methods I propose to
>             remove is that they were deprecated only in the last
>             release and those method were only partially deprecated.
>             Moreover some of the methods were not deprecated in
>             ConnectedStreams. Nevertheless I'd still be inclined to
>             remove the methods in this release.
>
>             >
>
>             > Let me know what do you think about it.
>
>             >
>
>             > Best,
>
>             >
>
>             > Dawid
>

Re: Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Till Rohrmann <tr...@apache.org>.
Having looked at the proposed set of methods to remove I've noticed that
some of them are actually annotated with @Public. According to our
stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
with this annotation. Hence, I believe that we cannot simply remove them
unless the community decides to change the stability guarantees we give or
by making the next release a major release (Flink 2.0).

Cheers,
Till

On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:

> +1 for removing the methods that are deprecated for a while & have
> alternative methods.
>
> One specific thing is that if we remove the DataStream#split, do we
> consider enabling side-output in more operators in the future ? Currently
> it should be only available in ProcessFunctions, but not available to other
> commonly used UDF like Source or AsyncFunction[1].
>
> One temporary solution occurs to me is to add a ProcessFunction after the
> operators want to use side-output. But I think the solution is not very
> direct to come up with and if it really works we might add it to the
> document of side-output.
>
> [1] https://issues.apache.org/jira/browse/FLINK-7954
>
> Best,
>  Yun
>
> ------------------Original Mail ------------------
> *Sender:*Kostas Kloudas <kk...@gmail.com>
> *Send Date:*Tue Aug 18 03:52:44 2020
> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>
>> +1 for removing them.
>>
>>
>>
>> From a quick look, most of them (not all) have been deprecated a long
>> time ago.
>>
>>
>>
>> Cheers,
>>
>> Kostas
>>
>>
>>
>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>
>> >
>>
>> > @David Yes, my idea was to remove any use of fold method and all
>> related classes including WindowedStream#fold
>>
>> >
>>
>> > @Klou Good idea to also remove the deprecated enableCheckpointing() &
>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>> of the classes and thought we could also drop:
>>
>> >
>>
>> > ExecutionConfig#set/getCodeAnalysisMode
>>
>> > ExecutionConfig#disable/enableSysoutLogging
>>
>> > ExecutionConfig#set/isFailTaskOnCheckpointError
>>
>> > ExecutionConfig#isLatencyTrackingEnabled
>>
>> >
>>
>> > As for the `forceCheckpointing` I am not fully convinced to doing it.
>> As far as I know iterations still do not participate in checkpointing
>> correctly. Therefore it still might make sense to force it. In other words
>> there is no real alternative to that method. Unless we only remove the
>> methods from StreamExecutionEnvironment and redirect to the setter in
>> CheckpointConfig. WDYT?
>>
>> >
>>
>> > An updated list of methods I suggest to remove:
>>
>> >
>>
>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>
>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>
>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>
>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
>>
>> >
>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>> (deprecated in 1.2)
>>
>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>
>> > DataStream#fold and all related classes and methods such as
>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>> 1.3/1.4)
>>
>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>> (deprecated in 1.5)
>>
>> > DataStream#split (deprecated in 1.8)
>>
>> > Methods in (Connected)DataStream that specify keys as either indices or
>> field names such as DataStream#keyBy, DataStream#partitionCustom,
>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> >
>>
>> > Bear in mind that majority of the options listed above in
>> ExecutionConfig take no effect. They were left there purely to satisfy the
>> binary compatibility. Personally I don't see any benefit of leaving a
>> method and silently dropping the underlying feature. The only configuration
>> that is respected is setting the number of execution retries.
>>
>> >
>>
>> > I also wanted to make it explicit that most of the changes above would
>> result in a binary incompatibility and require additional exclusions in the
>> japicmp. Those are:
>>
>> >
>>
>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>
>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>
>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>
>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
>>
>> > DataStream#fold and all related classes and methods such as
>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>> 1.3/1.4)
>>
>> > DataStream#split (deprecated in 1.8)
>>
>> > Methods in (Connected)DataStream that specify keys as either indices or
>> field names such as DataStream#keyBy, DataStream#partitionCustom,
>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> >
>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>> (deprecated in 1.2)
>>
>> >
>>
>> > Looking forward to more opinions on the issue.
>>
>> >
>>
>> > Best,
>>
>> >
>>
>> > Dawid
>>
>> >
>>
>> >
>>
>> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>>
>> >
>>
>> > Thanks a lot for starting this Dawid,
>>
>> >
>>
>> > Big +1 for the proposed clean-up, and I would also add the deprecated
>>
>> > methods of the StreamExecutionEnvironment like:
>>
>> >
>>
>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean
>> force)
>>
>> > enableCheckpointing()
>>
>> > isForceCheckpointing()
>>
>> >
>>
>> > readFile(FileInputFormat inputFormat,String
>>
>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>
>> > filter)
>>
>> > readFileStream(...)
>>
>> >
>>
>> > socketTextStream(String hostname, int port, char delimiter, long
>> maxRetry)
>>
>> > socketTextStream(String hostname, int port, char delimiter)
>>
>> >
>>
>> > There are more, like the (get)/setNumberOfExecutionRetries() that were
>>
>> > deprecated long ago, but I have not investigated to see if they are
>>
>> > actually easy to remove.
>>
>> >
>>
>> > Cheers,
>>
>> > Kostas
>>
>> >
>>
>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>
>> > wrote:
>>
>> >
>>
>> > Hi devs and users,
>>
>> >
>>
>> > I wanted to ask you what do you think about removing some of the
>> deprecated APIs around the DataStream API.
>>
>> >
>>
>> > The APIs I have in mind are:
>>
>> >
>>
>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>
>> > DataStream#fold and all related classes and methods such as
>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>> 1.3/1.4)
>>
>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>> (deprecated in 1.5)
>>
>> > DataStream#split (deprecated in 1.8)
>>
>> > Methods in (Connected)DataStream that specify keys as either indices or
>> field names such as DataStream#keyBy, DataStream#partitionCustom,
>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> >
>>
>> > I think the first three should be straightforward. They are long
>> deprecated. The getAccumulators method is not used very often in my
>> opinion. The same applies to the DataStream#fold which additionally is not
>> very performant. Lastly the setStateBackend has an alternative with a class
>> from the AbstractStateBackend hierarchy, therefore it will be still code
>> compatible. Moreover if we remove the
>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>> have right now when setting a statebackend as the correct method cannot be
>> used without an explicit casting.
>>
>> >
>>
>> > As for the DataStream#split I know there were some objections against
>> removing the #split method in the past. I still believe the output tags can
>> replace the split method already.
>>
>> >
>>
>> > The only problem in the last set of methods I propose to remove is that
>> they were deprecated only in the last release and those method were only
>> partially deprecated. Moreover some of the methods were not deprecated in
>> ConnectedStreams. Nevertheless I'd still be inclined to remove the methods
>> in this release.
>>
>> >
>>
>> > Let me know what do you think about it.
>>
>> >
>>
>> > Best,
>>
>> >
>>
>> > Dawid
>
>

Re: Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Till Rohrmann <tr...@apache.org>.
Having looked at the proposed set of methods to remove I've noticed that
some of them are actually annotated with @Public. According to our
stability guarantees, only major releases (1.0, 2.0, etc.) can break APIs
with this annotation. Hence, I believe that we cannot simply remove them
unless the community decides to change the stability guarantees we give or
by making the next release a major release (Flink 2.0).

Cheers,
Till

On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yu...@aliyun.com> wrote:

> +1 for removing the methods that are deprecated for a while & have
> alternative methods.
>
> One specific thing is that if we remove the DataStream#split, do we
> consider enabling side-output in more operators in the future ? Currently
> it should be only available in ProcessFunctions, but not available to other
> commonly used UDF like Source or AsyncFunction[1].
>
> One temporary solution occurs to me is to add a ProcessFunction after the
> operators want to use side-output. But I think the solution is not very
> direct to come up with and if it really works we might add it to the
> document of side-output.
>
> [1] https://issues.apache.org/jira/browse/FLINK-7954
>
> Best,
>  Yun
>
> ------------------Original Mail ------------------
> *Sender:*Kostas Kloudas <kk...@gmail.com>
> *Send Date:*Tue Aug 18 03:52:44 2020
> *Recipients:*Dawid Wysakowicz <dw...@apache.org>
> *CC:*dev <de...@flink.apache.org>, user <us...@flink.apache.org>
> *Subject:*Re: [DISCUSS] Removing deprecated methods from DataStream API
>
>> +1 for removing them.
>>
>>
>>
>> From a quick look, most of them (not all) have been deprecated a long
>> time ago.
>>
>>
>>
>> Cheers,
>>
>> Kostas
>>
>>
>>
>> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>>
>> >
>>
>> > @David Yes, my idea was to remove any use of fold method and all
>> related classes including WindowedStream#fold
>>
>> >
>>
>> > @Klou Good idea to also remove the deprecated enableCheckpointing() &
>> StreamExecutionEnvironment#readFile and alike. I did another pass over some
>> of the classes and thought we could also drop:
>>
>> >
>>
>> > ExecutionConfig#set/getCodeAnalysisMode
>>
>> > ExecutionConfig#disable/enableSysoutLogging
>>
>> > ExecutionConfig#set/isFailTaskOnCheckpointError
>>
>> > ExecutionConfig#isLatencyTrackingEnabled
>>
>> >
>>
>> > As for the `forceCheckpointing` I am not fully convinced to doing it.
>> As far as I know iterations still do not participate in checkpointing
>> correctly. Therefore it still might make sense to force it. In other words
>> there is no real alternative to that method. Unless we only remove the
>> methods from StreamExecutionEnvironment and redirect to the setter in
>> CheckpointConfig. WDYT?
>>
>> >
>>
>> > An updated list of methods I suggest to remove:
>>
>> >
>>
>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>
>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>
>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>
>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
>>
>> >
>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>> (deprecated in 1.2)
>>
>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>
>> > DataStream#fold and all related classes and methods such as
>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>> 1.3/1.4)
>>
>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>> (deprecated in 1.5)
>>
>> > DataStream#split (deprecated in 1.8)
>>
>> > Methods in (Connected)DataStream that specify keys as either indices or
>> field names such as DataStream#keyBy, DataStream#partitionCustom,
>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> >
>>
>> > Bear in mind that majority of the options listed above in
>> ExecutionConfig take no effect. They were left there purely to satisfy the
>> binary compatibility. Personally I don't see any benefit of leaving a
>> method and silently dropping the underlying feature. The only configuration
>> that is respected is setting the number of execution retries.
>>
>> >
>>
>> > I also wanted to make it explicit that most of the changes above would
>> result in a binary incompatibility and require additional exclusions in the
>> japicmp. Those are:
>>
>> >
>>
>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>
>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>
>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>
>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>
>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
>>
>> > DataStream#fold and all related classes and methods such as
>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>> 1.3/1.4)
>>
>> > DataStream#split (deprecated in 1.8)
>>
>> > Methods in (Connected)DataStream that specify keys as either indices or
>> field names such as DataStream#keyBy, DataStream#partitionCustom,
>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> >
>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>> (deprecated in 1.2)
>>
>> >
>>
>> > Looking forward to more opinions on the issue.
>>
>> >
>>
>> > Best,
>>
>> >
>>
>> > Dawid
>>
>> >
>>
>> >
>>
>> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>>
>> >
>>
>> > Thanks a lot for starting this Dawid,
>>
>> >
>>
>> > Big +1 for the proposed clean-up, and I would also add the deprecated
>>
>> > methods of the StreamExecutionEnvironment like:
>>
>> >
>>
>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean
>> force)
>>
>> > enableCheckpointing()
>>
>> > isForceCheckpointing()
>>
>> >
>>
>> > readFile(FileInputFormat inputFormat,String
>>
>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>
>> > filter)
>>
>> > readFileStream(...)
>>
>> >
>>
>> > socketTextStream(String hostname, int port, char delimiter, long
>> maxRetry)
>>
>> > socketTextStream(String hostname, int port, char delimiter)
>>
>> >
>>
>> > There are more, like the (get)/setNumberOfExecutionRetries() that were
>>
>> > deprecated long ago, but I have not investigated to see if they are
>>
>> > actually easy to remove.
>>
>> >
>>
>> > Cheers,
>>
>> > Kostas
>>
>> >
>>
>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>
>> > wrote:
>>
>> >
>>
>> > Hi devs and users,
>>
>> >
>>
>> > I wanted to ask you what do you think about removing some of the
>> deprecated APIs around the DataStream API.
>>
>> >
>>
>> > The APIs I have in mind are:
>>
>> >
>>
>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>
>> > DataStream#fold and all related classes and methods such as
>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>> 1.3/1.4)
>>
>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>> (deprecated in 1.5)
>>
>> > DataStream#split (deprecated in 1.8)
>>
>> > Methods in (Connected)DataStream that specify keys as either indices or
>> field names such as DataStream#keyBy, DataStream#partitionCustom,
>> ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> >
>>
>> > I think the first three should be straightforward. They are long
>> deprecated. The getAccumulators method is not used very often in my
>> opinion. The same applies to the DataStream#fold which additionally is not
>> very performant. Lastly the setStateBackend has an alternative with a class
>> from the AbstractStateBackend hierarchy, therefore it will be still code
>> compatible. Moreover if we remove the
>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>> have right now when setting a statebackend as the correct method cannot be
>> used without an explicit casting.
>>
>> >
>>
>> > As for the DataStream#split I know there were some objections against
>> removing the #split method in the past. I still believe the output tags can
>> replace the split method already.
>>
>> >
>>
>> > The only problem in the last set of methods I propose to remove is that
>> they were deprecated only in the last release and those method were only
>> partially deprecated. Moreover some of the methods were not deprecated in
>> ConnectedStreams. Nevertheless I'd still be inclined to remove the methods
>> in this release.
>>
>> >
>>
>> > Let me know what do you think about it.
>>
>> >
>>
>> > Best,
>>
>> >
>>
>> > Dawid
>
>

Re: Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Yun Gao <yu...@aliyun.com.INVALID>.
+1 for removing the methods that are deprecated for a while & have alternative methods.

One specific thing is that if we remove the DataStream#split, do we consider enabling side-output in more operators in the future ? Currently it should be only available in ProcessFunctions, but not available to other commonly used UDF like Source or AsyncFunction[1].

One temporary solution occurs to me is to add a ProcessFunction after the operators want to use side-output. But I think the solution is not very direct to come up with and if it really works we might add it to the document of side-output. 

[1] https://issues.apache.org/jira/browse/FLINK-7954

Best,
 Yun


 ------------------Original Mail ------------------
Sender:Kostas Kloudas <kk...@gmail.com>
Send Date:Tue Aug 18 03:52:44 2020
Recipients:Dawid Wysakowicz <dw...@apache.org>
CC:dev <de...@flink.apache.org>, user <us...@flink.apache.org>
Subject:Re: [DISCUSS] Removing deprecated methods from DataStream API
+1 for removing them.



From a quick look, most of them (not all) have been deprecated a long time ago.



Cheers,

Kostas



On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz  wrote:

>

> @David Yes, my idea was to remove any use of fold method and all related classes including WindowedStream#fold

>

> @Klou Good idea to also remove the deprecated enableCheckpointing() & StreamExecutionEnvironment#readFile and alike. I did another pass over some of the classes and thought we could also drop:

>

> ExecutionConfig#set/getCodeAnalysisMode

> ExecutionConfig#disable/enableSysoutLogging

> ExecutionConfig#set/isFailTaskOnCheckpointError

> ExecutionConfig#isLatencyTrackingEnabled

>

> As for the `forceCheckpointing` I am not fully convinced to doing it. As far as I know iterations still do not participate in checkpointing correctly. Therefore it still might make sense to force it. In other words there is no real alternative to that method. Unless we only remove the methods from StreamExecutionEnvironment and redirect to the setter in CheckpointConfig. WDYT?

>

> An updated list of methods I suggest to remove:

>

> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)

> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)

> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)

> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)

> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) (deprecated in 1.2)

> RuntimeContext#getAllAccumulators (deprecated in 0.10)

> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)

> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)

> DataStream#split (deprecated in 1.8)

> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)

>

> Bear in mind that majority of the options listed above in ExecutionConfig take no effect. They were left there purely to satisfy the binary compatibility. Personally I don't see any benefit of leaving a method and silently dropping the underlying feature. The only configuration that is respected is setting the number of execution retries.

>

> I also wanted to make it explicit that most of the changes above would result in a binary incompatibility and require additional exclusions in the japicmp. Those are:

>

> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)

> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)

> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)

> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)

> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)

> DataStream#split (deprecated in 1.8)

> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)

> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) (deprecated in 1.2)

>

> Looking forward to more opinions on the issue.

>

> Best,

>

> Dawid

>

>

> On 17/08/2020 12:49, Kostas Kloudas wrote:

>

> Thanks a lot for starting this Dawid,

>

> Big +1 for the proposed clean-up, and I would also add the deprecated

> methods of the StreamExecutionEnvironment like:

>

> enableCheckpointing(long interval, CheckpointingMode mode, boolean force)

> enableCheckpointing()

> isForceCheckpointing()

>

> readFile(FileInputFormat inputFormat,String

> filePath,FileProcessingMode watchType,long interval, FilePathFilter

> filter)

> readFileStream(...)

>

> socketTextStream(String hostname, int port, char delimiter, long maxRetry)

> socketTextStream(String hostname, int port, char delimiter)

>

> There are more, like the (get)/setNumberOfExecutionRetries() that were

> deprecated long ago, but I have not investigated to see if they are

> actually easy to remove.

>

> Cheers,

> Kostas

>

> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz

>  wrote:

>

> Hi devs and users,

>

> I wanted to ask you what do you think about removing some of the deprecated APIs around the DataStream API.

>

> The APIs I have in mind are:

>

> RuntimeContext#getAllAccumulators (deprecated in 0.10)

> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)

> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)

> DataStream#split (deprecated in 1.8)

> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)

>

> I think the first three should be straightforward. They are long deprecated. The getAccumulators method is not used very often in my opinion. The same applies to the DataStream#fold which additionally is not very performant. Lastly the setStateBackend has an alternative with a class from the AbstractStateBackend hierarchy, therefore it will be still code compatible. Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get rid off warnings users have right now when setting a statebackend as the correct method cannot be used without an explicit casting.

>

> As for the DataStream#split I know there were some objections against removing the #split method in the past. I still believe the output tags can replace the split method already.

>

> The only problem in the last set of methods I propose to remove is that they were deprecated only in the last release and those method were only partially deprecated. Moreover some of the methods were not deprecated in ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in this release.

>

> Let me know what do you think about it.

>

> Best,

>

> Dawid

Re: Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Yun Gao <yu...@aliyun.com>.
+1 for removing the methods that are deprecated for a while & have alternative methods.

One specific thing is that if we remove the DataStream#split, do we consider enabling side-output in more operators in the future ? Currently it should be only available in ProcessFunctions, but not available to other commonly used UDF like Source or AsyncFunction[1].

One temporary solution occurs to me is to add a ProcessFunction after the operators want to use side-output. But I think the solution is not very direct to come up with and if it really works we might add it to the document of side-output. 

[1] https://issues.apache.org/jira/browse/FLINK-7954

Best,
 Yun


 ------------------Original Mail ------------------
Sender:Kostas Kloudas <kk...@gmail.com>
Send Date:Tue Aug 18 03:52:44 2020
Recipients:Dawid Wysakowicz <dw...@apache.org>
CC:dev <de...@flink.apache.org>, user <us...@flink.apache.org>
Subject:Re: [DISCUSS] Removing deprecated methods from DataStream API
+1 for removing them.



From a quick look, most of them (not all) have been deprecated a long time ago.



Cheers,

Kostas



On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz  wrote:

>

> @David Yes, my idea was to remove any use of fold method and all related classes including WindowedStream#fold

>

> @Klou Good idea to also remove the deprecated enableCheckpointing() & StreamExecutionEnvironment#readFile and alike. I did another pass over some of the classes and thought we could also drop:

>

> ExecutionConfig#set/getCodeAnalysisMode

> ExecutionConfig#disable/enableSysoutLogging

> ExecutionConfig#set/isFailTaskOnCheckpointError

> ExecutionConfig#isLatencyTrackingEnabled

>

> As for the `forceCheckpointing` I am not fully convinced to doing it. As far as I know iterations still do not participate in checkpointing correctly. Therefore it still might make sense to force it. In other words there is no real alternative to that method. Unless we only remove the methods from StreamExecutionEnvironment and redirect to the setter in CheckpointConfig. WDYT?

>

> An updated list of methods I suggest to remove:

>

> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)

> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)

> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)

> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)

> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) (deprecated in 1.2)

> RuntimeContext#getAllAccumulators (deprecated in 0.10)

> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)

> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)

> DataStream#split (deprecated in 1.8)

> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)

>

> Bear in mind that majority of the options listed above in ExecutionConfig take no effect. They were left there purely to satisfy the binary compatibility. Personally I don't see any benefit of leaving a method and silently dropping the underlying feature. The only configuration that is respected is setting the number of execution retries.

>

> I also wanted to make it explicit that most of the changes above would result in a binary incompatibility and require additional exclusions in the japicmp. Those are:

>

> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)

> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)

> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)

> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)

> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)

> DataStream#split (deprecated in 1.8)

> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)

> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) (deprecated in 1.2)

>

> Looking forward to more opinions on the issue.

>

> Best,

>

> Dawid

>

>

> On 17/08/2020 12:49, Kostas Kloudas wrote:

>

> Thanks a lot for starting this Dawid,

>

> Big +1 for the proposed clean-up, and I would also add the deprecated

> methods of the StreamExecutionEnvironment like:

>

> enableCheckpointing(long interval, CheckpointingMode mode, boolean force)

> enableCheckpointing()

> isForceCheckpointing()

>

> readFile(FileInputFormat inputFormat,String

> filePath,FileProcessingMode watchType,long interval, FilePathFilter

> filter)

> readFileStream(...)

>

> socketTextStream(String hostname, int port, char delimiter, long maxRetry)

> socketTextStream(String hostname, int port, char delimiter)

>

> There are more, like the (get)/setNumberOfExecutionRetries() that were

> deprecated long ago, but I have not investigated to see if they are

> actually easy to remove.

>

> Cheers,

> Kostas

>

> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz

>  wrote:

>

> Hi devs and users,

>

> I wanted to ask you what do you think about removing some of the deprecated APIs around the DataStream API.

>

> The APIs I have in mind are:

>

> RuntimeContext#getAllAccumulators (deprecated in 0.10)

> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)

> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)

> DataStream#split (deprecated in 1.8)

> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)

>

> I think the first three should be straightforward. They are long deprecated. The getAccumulators method is not used very often in my opinion. The same applies to the DataStream#fold which additionally is not very performant. Lastly the setStateBackend has an alternative with a class from the AbstractStateBackend hierarchy, therefore it will be still code compatible. Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get rid off warnings users have right now when setting a statebackend as the correct method cannot be used without an explicit casting.

>

> As for the DataStream#split I know there were some objections against removing the #split method in the past. I still believe the output tags can replace the split method already.

>

> The only problem in the last set of methods I propose to remove is that they were deprecated only in the last release and those method were only partially deprecated. Moreover some of the methods were not deprecated in ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in this release.

>

> Let me know what do you think about it.

>

> Best,

>

> Dawid

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Kostas Kloudas <kk...@gmail.com>.
+1 for removing them.

From a quick look, most of them (not all) have been deprecated a long time ago.

Cheers,
Kostas

On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>
> @David Yes, my idea was to remove any use of fold method and all related classes including WindowedStream#fold
>
> @Klou Good idea to also remove the deprecated enableCheckpointing() & StreamExecutionEnvironment#readFile and alike. I did another pass over some of the classes and thought we could also drop:
>
> ExecutionConfig#set/getCodeAnalysisMode
> ExecutionConfig#disable/enableSysoutLogging
> ExecutionConfig#set/isFailTaskOnCheckpointError
> ExecutionConfig#isLatencyTrackingEnabled
>
> As for the `forceCheckpointing` I am not fully convinced to doing it. As far as I know iterations still do not participate in checkpointing correctly. Therefore it still might make sense to force it. In other words there is no real alternative to that method. Unless we only remove the methods from StreamExecutionEnvironment and redirect to the setter in CheckpointConfig. WDYT?
>
> An updated list of methods I suggest to remove:
>
> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) (deprecated in 1.2)
> RuntimeContext#getAllAccumulators (deprecated in 0.10)
> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)
> DataStream#split (deprecated in 1.8)
> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
>
> Bear in mind that majority of the options listed above in ExecutionConfig take no effect. They were left there purely to satisfy the binary compatibility. Personally I don't see any benefit of leaving a method and silently dropping the underlying feature. The only configuration that is respected is setting the number of execution retries.
>
> I also wanted to make it explicit that most of the changes above would result in a binary incompatibility and require additional exclusions in the japicmp. Those are:
>
> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
> DataStream#split (deprecated in 1.8)
> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) (deprecated in 1.2)
>
> Looking forward to more opinions on the issue.
>
> Best,
>
> Dawid
>
>
> On 17/08/2020 12:49, Kostas Kloudas wrote:
>
> Thanks a lot for starting this Dawid,
>
> Big +1 for the proposed clean-up, and I would also add the deprecated
> methods of the StreamExecutionEnvironment like:
>
> enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
> enableCheckpointing()
> isForceCheckpointing()
>
> readFile(FileInputFormat<OUT> inputFormat,String
> filePath,FileProcessingMode watchType,long interval, FilePathFilter
> filter)
> readFileStream(...)
>
> socketTextStream(String hostname, int port, char delimiter, long maxRetry)
> socketTextStream(String hostname, int port, char delimiter)
>
> There are more, like the (get)/setNumberOfExecutionRetries() that were
> deprecated long ago, but I have not investigated to see if they are
> actually easy to remove.
>
> Cheers,
> Kostas
>
> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
> <dw...@apache.org> wrote:
>
> Hi devs and users,
>
> I wanted to ask you what do you think about removing some of the deprecated APIs around the DataStream API.
>
> The APIs I have in mind are:
>
> RuntimeContext#getAllAccumulators (deprecated in 0.10)
> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)
> DataStream#split (deprecated in 1.8)
> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
>
> I think the first three should be straightforward. They are long deprecated. The getAccumulators method is not used very often in my opinion. The same applies to the DataStream#fold which additionally is not very performant. Lastly the setStateBackend has an alternative with a class from the AbstractStateBackend hierarchy, therefore it will be still code compatible. Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get rid off warnings users have right now when setting a statebackend as the correct method cannot be used without an explicit casting.
>
> As for the DataStream#split I know there were some objections against removing the #split method in the past. I still believe the output tags can replace the split method already.
>
> The only problem in the last set of methods I propose to remove is that they were deprecated only in the last release and those method were only partially deprecated. Moreover some of the methods were not deprecated in ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in this release.
>
> Let me know what do you think about it.
>
> Best,
>
> Dawid

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Kostas Kloudas <kk...@gmail.com>.
+1 for removing them.

From a quick look, most of them (not all) have been deprecated a long time ago.

Cheers,
Kostas

On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>
> @David Yes, my idea was to remove any use of fold method and all related classes including WindowedStream#fold
>
> @Klou Good idea to also remove the deprecated enableCheckpointing() & StreamExecutionEnvironment#readFile and alike. I did another pass over some of the classes and thought we could also drop:
>
> ExecutionConfig#set/getCodeAnalysisMode
> ExecutionConfig#disable/enableSysoutLogging
> ExecutionConfig#set/isFailTaskOnCheckpointError
> ExecutionConfig#isLatencyTrackingEnabled
>
> As for the `forceCheckpointing` I am not fully convinced to doing it. As far as I know iterations still do not participate in checkpointing correctly. Therefore it still might make sense to force it. In other words there is no real alternative to that method. Unless we only remove the methods from StreamExecutionEnvironment and redirect to the setter in CheckpointConfig. WDYT?
>
> An updated list of methods I suggest to remove:
>
> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) (deprecated in 1.2)
> RuntimeContext#getAllAccumulators (deprecated in 0.10)
> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)
> DataStream#split (deprecated in 1.8)
> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
>
> Bear in mind that majority of the options listed above in ExecutionConfig take no effect. They were left there purely to satisfy the binary compatibility. Personally I don't see any benefit of leaving a method and silently dropping the underlying feature. The only configuration that is respected is setting the number of execution retries.
>
> I also wanted to make it explicit that most of the changes above would result in a binary incompatibility and require additional exclusions in the japicmp. Those are:
>
> ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
> ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
> ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
> ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
> ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
> DataStream#split (deprecated in 1.8)
> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...) (deprecated in 1.2)
>
> Looking forward to more opinions on the issue.
>
> Best,
>
> Dawid
>
>
> On 17/08/2020 12:49, Kostas Kloudas wrote:
>
> Thanks a lot for starting this Dawid,
>
> Big +1 for the proposed clean-up, and I would also add the deprecated
> methods of the StreamExecutionEnvironment like:
>
> enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
> enableCheckpointing()
> isForceCheckpointing()
>
> readFile(FileInputFormat<OUT> inputFormat,String
> filePath,FileProcessingMode watchType,long interval, FilePathFilter
> filter)
> readFileStream(...)
>
> socketTextStream(String hostname, int port, char delimiter, long maxRetry)
> socketTextStream(String hostname, int port, char delimiter)
>
> There are more, like the (get)/setNumberOfExecutionRetries() that were
> deprecated long ago, but I have not investigated to see if they are
> actually easy to remove.
>
> Cheers,
> Kostas
>
> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
> <dw...@apache.org> wrote:
>
> Hi devs and users,
>
> I wanted to ask you what do you think about removing some of the deprecated APIs around the DataStream API.
>
> The APIs I have in mind are:
>
> RuntimeContext#getAllAccumulators (deprecated in 0.10)
> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)
> DataStream#split (deprecated in 1.8)
> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
>
> I think the first three should be straightforward. They are long deprecated. The getAccumulators method is not used very often in my opinion. The same applies to the DataStream#fold which additionally is not very performant. Lastly the setStateBackend has an alternative with a class from the AbstractStateBackend hierarchy, therefore it will be still code compatible. Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get rid off warnings users have right now when setting a statebackend as the correct method cannot be used without an explicit casting.
>
> As for the DataStream#split I know there were some objections against removing the #split method in the past. I still believe the output tags can replace the split method already.
>
> The only problem in the last set of methods I propose to remove is that they were deprecated only in the last release and those method were only partially deprecated. Moreover some of the methods were not deprecated in ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in this release.
>
> Let me know what do you think about it.
>
> Best,
>
> Dawid

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Dawid Wysakowicz <dw...@apache.org>.
@David Yes, my idea was to remove any use of fold method and all related
classes including WindowedStream#fold

@Klou Good idea to also remove the deprecated enableCheckpointing() &
StreamExecutionEnvironment#readFile and alike. I did another pass over
some of the classes and thought we could also drop:

  * ExecutionConfig#set/getCodeAnalysisMode
  * ExecutionConfig#disable/enableSysoutLogging
  * ExecutionConfig#set/isFailTaskOnCheckpointError
  * ExecutionConfig#isLatencyTrackingEnabled

As for the `forceCheckpointing` I am not fully convinced to doing it. As
far as I know iterations still do not participate in checkpointing
correctly. Therefore it still might make sense to force it. In other
words there is no real alternative to that method. Unless we only remove
the methods from StreamExecutionEnvironment and redirect to the setter
in CheckpointConfig. WDYT?

An updated list of methods I suggest to remove:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
  * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
    (deprecated in 1.2)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)
  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5)
  * DataStream#split (deprecated in 1.8)
  * Methods in (Connected)DataStream that specify keys as either indices
    or field names such as DataStream#keyBy, DataStream#partitionCustom,
    ConnectedStream#keyBy, .... (deprecated in 1.11)

Bear in mind that majority of the options listed above in
ExecutionConfig take no effect. They were left there purely to satisfy
the binary compatibility. Personally I don't see any benefit of leaving
a method and silently dropping the underlying feature. The only
configuration that is respected is setting the number of execution retries.

I also wanted to make it explicit that most of the changes above would
result in a binary incompatibility and require additional exclusions in
the japicmp. Those are:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)
  * DataStream#split (deprecated in 1.8)
  * Methods in (Connected)DataStream that specify keys as either indices
    or field names such as DataStream#keyBy, DataStream#partitionCustom,
    ConnectedStream#keyBy, .... (deprecated in 1.11)
  * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
    (deprecated in 1.2)

Looking forward to more opinions on the issue.

Best,

Dawid


On 17/08/2020 12:49, Kostas Kloudas wrote:
> Thanks a lot for starting this Dawid,
>
> Big +1 for the proposed clean-up, and I would also add the deprecated
> methods of the StreamExecutionEnvironment like:
>
> enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
> enableCheckpointing()
> isForceCheckpointing()
>
> readFile(FileInputFormat<OUT> inputFormat,String
> filePath,FileProcessingMode watchType,long interval, FilePathFilter
> filter)
> readFileStream(...)
>
> socketTextStream(String hostname, int port, char delimiter, long maxRetry)
> socketTextStream(String hostname, int port, char delimiter)
>
> There are more, like the (get)/setNumberOfExecutionRetries() that were
> deprecated long ago, but I have not investigated to see if they are
> actually easy to remove.
>
> Cheers,
> Kostas
>
> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
> <dw...@apache.org> wrote:
>> Hi devs and users,
>>
>> I wanted to ask you what do you think about removing some of the deprecated APIs around the DataStream API.
>>
>> The APIs I have in mind are:
>>
>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)
>> DataStream#split (deprecated in 1.8)
>> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> I think the first three should be straightforward. They are long deprecated. The getAccumulators method is not used very often in my opinion. The same applies to the DataStream#fold which additionally is not very performant. Lastly the setStateBackend has an alternative with a class from the AbstractStateBackend hierarchy, therefore it will be still code compatible. Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get rid off warnings users have right now when setting a statebackend as the correct method cannot be used without an explicit casting.
>>
>> As for the DataStream#split I know there were some objections against removing the #split method in the past. I still believe the output tags can replace the split method already.
>>
>> The only problem in the last set of methods I propose to remove is that they were deprecated only in the last release and those method were only partially deprecated. Moreover some of the methods were not deprecated in ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in this release.
>>
>> Let me know what do you think about it.
>>
>> Best,
>>
>> Dawid

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Dawid Wysakowicz <dw...@apache.org>.
@David Yes, my idea was to remove any use of fold method and all related
classes including WindowedStream#fold

@Klou Good idea to also remove the deprecated enableCheckpointing() &
StreamExecutionEnvironment#readFile and alike. I did another pass over
some of the classes and thought we could also drop:

  * ExecutionConfig#set/getCodeAnalysisMode
  * ExecutionConfig#disable/enableSysoutLogging
  * ExecutionConfig#set/isFailTaskOnCheckpointError
  * ExecutionConfig#isLatencyTrackingEnabled

As for the `forceCheckpointing` I am not fully convinced to doing it. As
far as I know iterations still do not participate in checkpointing
correctly. Therefore it still might make sense to force it. In other
words there is no real alternative to that method. Unless we only remove
the methods from StreamExecutionEnvironment and redirect to the setter
in CheckpointConfig. WDYT?

An updated list of methods I suggest to remove:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
  * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
    (deprecated in 1.2)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)
  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5)
  * DataStream#split (deprecated in 1.8)
  * Methods in (Connected)DataStream that specify keys as either indices
    or field names such as DataStream#keyBy, DataStream#partitionCustom,
    ConnectedStream#keyBy, .... (deprecated in 1.11)

Bear in mind that majority of the options listed above in
ExecutionConfig take no effect. They were left there purely to satisfy
the binary compatibility. Personally I don't see any benefit of leaving
a method and silently dropping the underlying feature. The only
configuration that is respected is setting the number of execution retries.

I also wanted to make it explicit that most of the changes above would
result in a binary incompatibility and require additional exclusions in
the japicmp. Those are:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in 1.1?)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)
  * DataStream#split (deprecated in 1.8)
  * Methods in (Connected)DataStream that specify keys as either indices
    or field names such as DataStream#keyBy, DataStream#partitionCustom,
    ConnectedStream#keyBy, .... (deprecated in 1.11)
  * StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
    (deprecated in 1.2)

Looking forward to more opinions on the issue.

Best,

Dawid


On 17/08/2020 12:49, Kostas Kloudas wrote:
> Thanks a lot for starting this Dawid,
>
> Big +1 for the proposed clean-up, and I would also add the deprecated
> methods of the StreamExecutionEnvironment like:
>
> enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
> enableCheckpointing()
> isForceCheckpointing()
>
> readFile(FileInputFormat<OUT> inputFormat,String
> filePath,FileProcessingMode watchType,long interval, FilePathFilter
> filter)
> readFileStream(...)
>
> socketTextStream(String hostname, int port, char delimiter, long maxRetry)
> socketTextStream(String hostname, int port, char delimiter)
>
> There are more, like the (get)/setNumberOfExecutionRetries() that were
> deprecated long ago, but I have not investigated to see if they are
> actually easy to remove.
>
> Cheers,
> Kostas
>
> On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
> <dw...@apache.org> wrote:
>> Hi devs and users,
>>
>> I wanted to ask you what do you think about removing some of the deprecated APIs around the DataStream API.
>>
>> The APIs I have in mind are:
>>
>> RuntimeContext#getAllAccumulators (deprecated in 0.10)
>> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
>> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)
>> DataStream#split (deprecated in 1.8)
>> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
>>
>> I think the first three should be straightforward. They are long deprecated. The getAccumulators method is not used very often in my opinion. The same applies to the DataStream#fold which additionally is not very performant. Lastly the setStateBackend has an alternative with a class from the AbstractStateBackend hierarchy, therefore it will be still code compatible. Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get rid off warnings users have right now when setting a statebackend as the correct method cannot be used without an explicit casting.
>>
>> As for the DataStream#split I know there were some objections against removing the #split method in the past. I still believe the output tags can replace the split method already.
>>
>> The only problem in the last set of methods I propose to remove is that they were deprecated only in the last release and those method were only partially deprecated. Moreover some of the methods were not deprecated in ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in this release.
>>
>> Let me know what do you think about it.
>>
>> Best,
>>
>> Dawid

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Kostas Kloudas <kk...@gmail.com>.
Thanks a lot for starting this Dawid,

Big +1 for the proposed clean-up, and I would also add the deprecated
methods of the StreamExecutionEnvironment like:

enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
enableCheckpointing()
isForceCheckpointing()

readFile(FileInputFormat<OUT> inputFormat,String
filePath,FileProcessingMode watchType,long interval, FilePathFilter
filter)
readFileStream(...)

socketTextStream(String hostname, int port, char delimiter, long maxRetry)
socketTextStream(String hostname, int port, char delimiter)

There are more, like the (get)/setNumberOfExecutionRetries() that were
deprecated long ago, but I have not investigated to see if they are
actually easy to remove.

Cheers,
Kostas

On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
<dw...@apache.org> wrote:
>
> Hi devs and users,
>
> I wanted to ask you what do you think about removing some of the deprecated APIs around the DataStream API.
>
> The APIs I have in mind are:
>
> RuntimeContext#getAllAccumulators (deprecated in 0.10)
> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)
> DataStream#split (deprecated in 1.8)
> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
>
> I think the first three should be straightforward. They are long deprecated. The getAccumulators method is not used very often in my opinion. The same applies to the DataStream#fold which additionally is not very performant. Lastly the setStateBackend has an alternative with a class from the AbstractStateBackend hierarchy, therefore it will be still code compatible. Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get rid off warnings users have right now when setting a statebackend as the correct method cannot be used without an explicit casting.
>
> As for the DataStream#split I know there were some objections against removing the #split method in the past. I still believe the output tags can replace the split method already.
>
> The only problem in the last set of methods I propose to remove is that they were deprecated only in the last release and those method were only partially deprecated. Moreover some of the methods were not deprecated in ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in this release.
>
> Let me know what do you think about it.
>
> Best,
>
> Dawid

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by David Anderson <da...@alpinegizmo.com>.
I assume that along with DataStream#fold you would also
remove WindowedStream#fold.

I'm in favor of going ahead with all of these.

David

On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi devs and users,
>
> I wanted to ask you what do you think about removing some of the
> deprecated APIs around the DataStream API.
>
> The APIs I have in mind are:
>
>    - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>    - DataStream#fold and all related classes and methods such as
>    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>    1.3/1.4)
>    - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>    (deprecated in 1.5)
>    - DataStream#split (deprecated in 1.8)
>    - Methods in (Connected)DataStream that specify keys as either indices
>    or field names such as DataStream#keyBy, DataStream#partitionCustom,
>    ConnectedStream#keyBy, .... (deprecated in 1.11)
>
> I think the first three should be straightforward. They are long
> deprecated. The getAccumulators method is not used very often in my
> opinion. The same applies to the DataStream#fold which additionally is not
> very performant. Lastly the setStateBackend has an alternative with a class
> from the AbstractStateBackend hierarchy, therefore it will be still code
> compatible. Moreover if we remove the
> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
> have right now when setting a statebackend as the correct method cannot be
> used without an explicit casting.
>
> As for the DataStream#split I know there were some objections against
> removing the #split method in the past. I still believe the output tags can
> replace the split method already.
>
> The only problem in the last set of methods I propose to remove is that
> they were deprecated only in the last release and those method were only
> partially deprecated. Moreover some of the methods were not deprecated in
> ConnectedStreams. Nevertheless I'd still be inclined to remove the methods
> in this release.
>
> Let me know what do you think about it.
>
> Best,
>
> Dawid
>

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by David Anderson <da...@alpinegizmo.com>.
I assume that along with DataStream#fold you would also
remove WindowedStream#fold.

I'm in favor of going ahead with all of these.

David

On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hi devs and users,
>
> I wanted to ask you what do you think about removing some of the
> deprecated APIs around the DataStream API.
>
> The APIs I have in mind are:
>
>    - RuntimeContext#getAllAccumulators (deprecated in 0.10)
>    - DataStream#fold and all related classes and methods such as
>    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>    1.3/1.4)
>    - StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>    (deprecated in 1.5)
>    - DataStream#split (deprecated in 1.8)
>    - Methods in (Connected)DataStream that specify keys as either indices
>    or field names such as DataStream#keyBy, DataStream#partitionCustom,
>    ConnectedStream#keyBy, .... (deprecated in 1.11)
>
> I think the first three should be straightforward. They are long
> deprecated. The getAccumulators method is not used very often in my
> opinion. The same applies to the DataStream#fold which additionally is not
> very performant. Lastly the setStateBackend has an alternative with a class
> from the AbstractStateBackend hierarchy, therefore it will be still code
> compatible. Moreover if we remove the
> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
> have right now when setting a statebackend as the correct method cannot be
> used without an explicit casting.
>
> As for the DataStream#split I know there were some objections against
> removing the #split method in the past. I still believe the output tags can
> replace the split method already.
>
> The only problem in the last set of methods I propose to remove is that
> they were deprecated only in the last release and those method were only
> partially deprecated. Moreover some of the methods were not deprecated in
> ConnectedStreams. Nevertheless I'd still be inclined to remove the methods
> in this release.
>
> Let me know what do you think about it.
>
> Best,
>
> Dawid
>

Re: [DISCUSS] Removing deprecated methods from DataStream API

Posted by Kostas Kloudas <kk...@gmail.com>.
Thanks a lot for starting this Dawid,

Big +1 for the proposed clean-up, and I would also add the deprecated
methods of the StreamExecutionEnvironment like:

enableCheckpointing(long interval, CheckpointingMode mode, boolean force)
enableCheckpointing()
isForceCheckpointing()

readFile(FileInputFormat<OUT> inputFormat,String
filePath,FileProcessingMode watchType,long interval, FilePathFilter
filter)
readFileStream(...)

socketTextStream(String hostname, int port, char delimiter, long maxRetry)
socketTextStream(String hostname, int port, char delimiter)

There are more, like the (get)/setNumberOfExecutionRetries() that were
deprecated long ago, but I have not investigated to see if they are
actually easy to remove.

Cheers,
Kostas

On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
<dw...@apache.org> wrote:
>
> Hi devs and users,
>
> I wanted to ask you what do you think about removing some of the deprecated APIs around the DataStream API.
>
> The APIs I have in mind are:
>
> RuntimeContext#getAllAccumulators (deprecated in 0.10)
> DataStream#fold and all related classes and methods such as FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in 1.3/1.4)
> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend) (deprecated in 1.5)
> DataStream#split (deprecated in 1.8)
> Methods in (Connected)DataStream that specify keys as either indices or field names such as DataStream#keyBy, DataStream#partitionCustom, ConnectedStream#keyBy, .... (deprecated in 1.11)
>
> I think the first three should be straightforward. They are long deprecated. The getAccumulators method is not used very often in my opinion. The same applies to the DataStream#fold which additionally is not very performant. Lastly the setStateBackend has an alternative with a class from the AbstractStateBackend hierarchy, therefore it will be still code compatible. Moreover if we remove the #setStateBackend(AbstractStateBackend) we will get rid off warnings users have right now when setting a statebackend as the correct method cannot be used without an explicit casting.
>
> As for the DataStream#split I know there were some objections against removing the #split method in the past. I still believe the output tags can replace the split method already.
>
> The only problem in the last set of methods I propose to remove is that they were deprecated only in the last release and those method were only partially deprecated. Moreover some of the methods were not deprecated in ConnectedStreams. Nevertheless I'd still be inclined to remove the methods in this release.
>
> Let me know what do you think about it.
>
> Best,
>
> Dawid