Posted to user@flink.apache.org by Kostas Kloudas <kk...@apache.org> on 2020/10/12 14:30:10 UTC

[DISCUSS] Remove flink-connector-filesystem module.

Hi all,

As the title suggests, this thread is to discuss the removal of the
flink-connector-filesystem module, which contains (only) the deprecated
BucketingSink. The BucketingSink has been deprecated since Flink 1.9 [1] in
favor of the relatively recently introduced StreamingFileSink.

For the sake of a clean and more manageable codebase, I propose to
remove this module for release-1.12, but of course we should first see
if there are any use cases that depend on it.

Let's have a fruitful discussion.

Cheers,
Kostas

[1] https://issues.apache.org/jira/browse/FLINK-13396

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Jingsong Li <ji...@gmail.com>.
Hi,

I share a concern:

Although we now support an ORC writer, it was not easy to support: we need
to override some things in the ORC classes.

Note that we are using a newer version of ORC, which is not forward
compatible. Therefore, data written with the Flink ORC writer may not be
readable by other engines, such as older versions of Hive. However, it is
not easy for users to make the streaming file sink support lower versions
of ORC by themselves.

A replacement may be `HadoopPathBasedBulkFormatBuilder`, which was added in
Flink 1.11.

Best,
Jingsong

On Tue, Oct 13, 2020 at 7:16 PM Chesnay Schepler <ch...@apache.org> wrote:

> How easy is the migration to the StreamingFileSink?
>
> On 10/13/2020 1:01 PM, Aljoscha Krettek wrote:
> > On 13.10.20 11:18, David Anderson wrote:
> >> I think the pertinent question is whether there are interesting cases
> >> where
> >> the BucketingSink is still a better choice. One case I'm not sure
> >> about is
> >> the situation described in docs for the StreamingFileSink under
> >> Important
> >> Note 2 [1]:
> >>
> >>      ... upon normal termination of a job, the last in-progress files
> >> will
> >> not be transitioned to the “finished” state.
> >>
> >> I know this confuses and frustrates users, but I don't know if the
> >> BucketingSink has any advantages in this regard.
> >
> > The BucketingSink suffers from the same problem. It's caused by the
> > fact that we don't do a "final" checkpoint before shutting down a
> > pipeline. We're trying to resolve that with FLIP-147 [1].
> >
> > [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
> >
> >
>
>

-- 
Best, Jingsong Lee
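
The point quoted above (in-progress files are only finalized when a checkpoint completes, so a job that shuts down without a "final" checkpoint leaves its last files unfinished) can be pictured with a small toy model. This is only a sketch, not Flink code: `ToyFileSink`, `run_job`, and the `final_checkpoint` flag are hypothetical names invented for illustration, and the real sinks track part files and state in much more detail.

```python
# Toy model only: NOT Flink code. All names here are hypothetical and exist
# purely to illustrate "files are finalized only on a completed checkpoint".

class ToyFileSink:
    def __init__(self):
        self.in_progress = []  # records written since the last checkpoint
        self.finished = []     # records in files that have been finalized

    def write(self, record):
        self.in_progress.append(record)

    def on_checkpoint_complete(self):
        # Only a completed checkpoint moves data to the "finished" state.
        self.finished.extend(self.in_progress)
        self.in_progress = []

    def close(self):
        # Shutdown without a final checkpoint: nothing gets finalized here.
        pass


def run_job(records, checkpoint_every, final_checkpoint=False):
    """Feed records to the sink, checkpointing every `checkpoint_every` records."""
    sink = ToyFileSink()
    for i, record in enumerate(records, start=1):
        sink.write(record)
        if i % checkpoint_every == 0:
            sink.on_checkpoint_complete()
    if final_checkpoint:
        # Assumption: models the "final" checkpoint mentioned in the thread.
        sink.on_checkpoint_complete()
    sink.close()
    return sink


sink = run_job(["a", "b", "c", "d", "e"], checkpoint_every=2)
print(sink.finished)     # ['a', 'b', 'c', 'd']
print(sink.in_progress)  # ['e']  (never transitioned to "finished")
```

Calling `run_job(..., final_checkpoint=True)` in this toy leaves `in_progress` empty, which is roughly the behaviour the thread says FLIP-147 is meant to provide.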

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Chesnay Schepler <ch...@apache.org>.
How easy is the migration to the StreamingFileSink?

On 10/13/2020 1:01 PM, Aljoscha Krettek wrote:
> On 13.10.20 11:18, David Anderson wrote:
>> I think the pertinent question is whether there are interesting cases 
>> where
>> the BucketingSink is still a better choice. One case I'm not sure 
>> about is
>> the situation described in docs for the StreamingFileSink under 
>> Important
>> Note 2 [1]:
>>
>>      ... upon normal termination of a job, the last in-progress files 
>> will
>> not be transitioned to the “finished” state.
>>
>> I know this confuses and frustrates users, but I don't know if the
>> BucketingSink has any advantages in this regard.
>
> The BucketingSink suffers from the same problem. It's caused by the 
> fact that we don't do a "final" checkpoint before shutting down a 
> pipeline. We're trying to resolve that with FLIP-147 [1].
>
> [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
>
>


Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Jingsong Li <ji...@gmail.com>.
+1 to remove the Bucketing Sink.

Thanks for the effort on ORC and `HadoopPathBasedBulkFormatBuilder`. With
them, I think it's safe to get rid of the old Bucketing API.

Best,
Jingsong

On Thu, Oct 29, 2020 at 3:06 AM Kostas Kloudas <kk...@gmail.com> wrote:

> Thanks for the discussion!
>
> From this thread I do not see any objection to moving forward with
> removing the sink.
> Given this I will open a voting thread tomorrow.
>
> Cheers,
> Kostas
>
> On Wed, Oct 28, 2020 at 6:50 PM Stephan Ewen <se...@apache.org> wrote:
> >
> > +1 to remove the Bucketing Sink.
> >
> > It has been very common in the past to remove code that was deprecated
> for multiple releases in favor of reducing baggage.
> > Also in cases that had no perfect drop-in replacement, but needed users
> to forward fit the code.
> > I am not sure I understand why this case is so different.
> >
> > Why the Bucketing Sink should be thrown out, in my opinion:
> >
> > The Bucketing sink makes it easier for users to add general Hadoop
> writes.
> > But the price is that it easily leads to data loss, because it assumes
> > flush()/sync() work reliably on Hadoop, which they don't (HDFS works
> > somewhat, S3 works not at all).
> > I think the Bucketing sink is a trap for users, that's why it was
> deprecated long ago.
> >
> > The StreamingFileSink covers the majority of cases from the Bucketing
> Sink.
> > It does have some friction when adding/wrapping some general Hadoop
> writers. Parts will be solved with the transactional sink work.
> > If something is missing and blocking users, we can prioritize adding it
> to the Streaming File Sink. Also that is something we did before and it
> helped being pragmatic with moving forward, rather than being held back by
> "maybe there is something we don't know".
> >
> >
> >
> >
> > On Wed, Oct 28, 2020 at 12:36 PM Chesnay Schepler <ch...@apache.org>
> wrote:
> >>
> >> Then we can't remove it, because there is no way for us to ascertain
> >> whether anyone is still using it.
> >>
> >> Sure, the user ML is the best we got, but you can't argue that we don't
> >> want any users to be affected and then use an imperfect means to find
> >> users.
> >> If you are fine with relying on the user ML, then you _are_ fine with
> >> removing it at the cost of friction for some users.
> >>
> >> To be clear, I, personally, don't have a problem with removing it (we
> >> have removed other connectors in the past that did not have a migration
> >> plan), I just reject the argumentation.
> >>
> >> On 10/28/2020 12:21 PM, Kostas Kloudas wrote:
> >> > No, I do not think that "we are fine with removing it at the cost of
> >> > friction for some users".
> >> >
> >> > I believe that this can be another discussion that we should have as
> >> > soon as we establish that someone is actually using it. The point I am
> >> > trying to make is that if no user is using it, we should remove it and
> >> > not leave unmaintained code around.
> >> >
> >> > On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ch...@apache.org>
> wrote:
> >> >> The alternative could also be to use a different argument than "no
> one
> >> >> uses it", e.g., we are fine with removing it at the cost of friction
> for
> >> >> some users because there are better alternatives.
> >> >>
> >> >> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
> >> >>> I think that the mailing lists are the best we can do and I would say
> >> >>> that they seem to be working pretty well (e.g. the recent Mesos
> >> >>> discussion).
> >> >>> Of course they are not perfect but the alternative would be to never
> >> >>> remove anything user facing until the next major release, which I
> find
> >> >>> pretty strict.
> >> >>>
> >> >>> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <
> chesnay@apache.org> wrote:
> >> >>>> If the conclusion is that we shouldn't remove it if _anyone_ is
> using
> >> >>>> it, then we cannot remove it because the user ML obviously does not
> >> >>>> reach all users.
> >> >>>>
> >> >>>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
> >> >>>>> Hi all,
> >> >>>>>
> >> >>>>> I am bringing this up again to see if there are any users actively
> >> >>>>> using the BucketingSink.
> >> >>>>> So far, if I am not mistaken (and really sorry if I forgot
> anything),
> >> >>>>> it is only a discussion between devs about the potential problems
> of
> >> >>>>> removing it. I totally understand Chesnay's concern about not
> >> >>>>> providing compatibility with the StreamingFileSink (SFS) and if
> there
> >> >>>>> are any users, then we should not remove it without trying to
> find a
> >> >>>>> solution for them.
> >> >>>>>
> >> >>>>> But if there are no users then I would still propose to remove the
> >> >>>>> module, given that I am not aware of any efforts to provide
> >> >>>>> compatibility with the SFS any time soon.
> >> >>>>> The reasons for removing it also include the facts that we do not
> >> >>>>> actively maintain it and we do not add new features. As for
> potential
> >> >>>>> missing features in the SFS compared to the BucketingSink that was
> >> >>>>> mentioned before, I am not aware of any fundamental limitations
> and
> >> >>>>> even if there are, I would assume that the solution is not to
> direct
> >> >>>>> the users to a deprecated sink but rather try to increase the
> >> >>>>> functionality of the actively maintained one.
> >> >>>>>
> >> >>>>> Please keep in mind that the BucketingSink is deprecated since
> FLINK
> >> >>>>> 1.9 and there is a new File Sink that is coming as part of
> FLIP-143
> >> >>>>> [1].
> >> >>>>> Again, if there are any active users who cannot migrate easily,
> then
> >> >>>>> we cannot remove it before trying to provide a smooth migration
> path.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Kostas
> >> >>>>>
> >> >>>>> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
> >> >>>>>
> >> >>>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <
> chesnay@apache.org> wrote:
> >> >>>>>> @Seth: Earlier in this discussion it was said that the BucketingSink
> >> >>>>>> would not be usable in 1.12.
> >> >>>>>>
> >> >>>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
> >> >>>>>>> +1 It has been deprecated for some time and the StreamingFileSink has
> >> >>>>>>> stabilized with a large number of formats and features.
> >> >>>>>>>
> >> >>>>>>> Plus, the bucketing sink only implements a small number of
> stable
> >> >>>>>>> interfaces[1]. I would expect users to continue to use the
> bucketing sink
> >> >>>>>>> from the 1.11 release with future versions for some time.
> >> >>>>>>>
> >> >>>>>>> Seth
> >> >>>>>>>
> >> >>>>>>>
> https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
> >> >>>>>>>
> >> >>>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <
> kkloudas@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>>> @Arvid Heise I also do not remember exactly what were all the
> >> >>>>>>>> problems. The fact that we added some more bulk formats to the
> >> >>>>>>>> streaming file sink definitely reduced the non-supported
> features. In
> >> >>>>>>>> addition, the latest discussion I found on the topic was [1]
> and the
> >> >>>>>>>> conclusion of that discussion seems to be to remove it.
> >> >>>>>>>>
> >> >>>>>>>> Currently, I cannot find any obvious reason for keeping the
> >> >>>>>>>> BucketingSink, apart from the fact that we unfortunately do not
> >> >>>>>>>> have a migration plan. This is why I posted this to dev@ and user@.
> >> >>>>>>>>
> >> >>>>>>>> Cheers,
> >> >>>>>>>> Kostas
> >> >>>>>>>>
> >> >>>>>>>> [1]
> >> >>>>>>>>
> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
> >> >>>>>>>>
> >> >>>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <
> arvid@ververica.com> wrote:
> >> >>>>>>>>> I remember this conversation popping up a few times already
> and I'm in
> >> >>>>>>>>> general a big fan of removing BucketingSink.
> >> >>>>>>>>>
> >> >>>>>>>>> However, until now there were a few features lacking in
> StreamingFileSink
> >> >>>>>>>>> that are present in BucketingSink and that are being actively
> used (I
> >> >>>>>>>> can't
> >> >>>>>>>>> exactly remember them now, but I can look it up if everyone
> else is also
> >> >>>>>>>>> suffering from bad memory). Did we manage to add them in the
> meantime? If
> >> >>>>>>>>> not, then it feels rushed to remove it at this point.
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <
> kkloudas@gmail.com>
> >> >>>>>>>> wrote:
> >> >>>>>>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an
> easy way
> >> >>>>>>>>>> to migrate from the BucketingSink to the StreamingFileSink.
> It may be
> >> >>>>>>>>>> possible but it will require some effort because the logic
> would be
> >> >>>>>>>>>> "read the old state, commit it, and start fresh with the
> >> >>>>>>>>>> StreamingFileSink."
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <
> aljoscha@apache.org>
> >> >>>>>>>>>> wrote:
> >> >>>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
> >> >>>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown
> >> >>>>>>>> Handling --
> >> >>>>>>>>>> and
> >> >>>>>>>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP
> as a
> >> >>>>>>>>>> motivating
> >> >>>>>>>>>>>> use case.
> >> >>>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for
> FLIP-46.
> >> >>>>>>>> Thanks
> >> >>>>>>>>>>> for the reminder, we should close FLIP-46 now with an
> explanatory
> >> >>>>>>>>>>> message to avoid confusion.
> >> >>
> >>
>


-- 
Best, Jingsong Lee

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@gmail.com>.
Thanks for the discussion!

From this thread I do not see any objection to moving forward with
removing the sink.
Given this I will open a voting thread tomorrow.

Cheers,
Kostas

On Wed, Oct 28, 2020 at 6:50 PM Stephan Ewen <se...@apache.org> wrote:
>
> +1 to remove the Bucketing Sink.
>
> It has been very common in the past to remove code that was deprecated for multiple releases in favor of reducing baggage.
> Also in cases that had no perfect drop-in replacement, but needed users to forward fit the code.
> I am not sure I understand why this case is so different.
>
> Why the Bucketing Sink should be thrown out, in my opinion:
>
> The Bucketing sink makes it easier for users to add general Hadoop writes.
> But the price is that it easily leads to data loss, because it assumes flush()/sync() work reliably on Hadoop, which they don't (HDFS works somewhat, S3 works not at all).
> I think the Bucketing sink is a trap for users, that's why it was deprecated long ago.
>
> The StreamingFileSink covers the majority of cases from the Bucketing Sink.
> It does have some friction when adding/wrapping some general Hadoop writers. Parts will be solved with the transactional sink work.
> If something is missing and blocking users, we can prioritize adding it to the Streaming File Sink. Also that is something we did before and it helped being pragmatic with moving forward, rather than being held back by "maybe there is something we don't know".
>
>
>
>
> On Wed, Oct 28, 2020 at 12:36 PM Chesnay Schepler <ch...@apache.org> wrote:
>>
>> Then we can't remove it, because there is no way for us to ascertain
>> whether anyone is still using it.
>>
>> Sure, the user ML is the best we got, but you can't argue that we don't
>> want any users to be affected and then use an imperfect means to find users.
>> If you are fine with relying on the user ML, then you _are_ fine with
>> removing it at the cost of friction for some users.
>>
>> To be clear, I, personally, don't have a problem with removing it (we
>> have removed other connectors in the past that did not have a migration
>> plan), I just reject the argumentation.
>>
>> On 10/28/2020 12:21 PM, Kostas Kloudas wrote:
>> > No, I do not think that "we are fine with removing it at the cost of
>> > friction for some users".
>> >
>> > I believe that this can be another discussion that we should have as
>> > soon as we establish that someone is actually using it. The point I am
>> > trying to make is that if no user is using it, we should remove it and
>> > not leave unmaintained code around.
>> >
>> > On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ch...@apache.org> wrote:
>> >> The alternative could also be to use a different argument than "no one
>> >> uses it", e.g., we are fine with removing it at the cost of friction for
>> >> some users because there are better alternatives.
>> >>
>> >> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
>> >>> I think that the mailing lists are the best we can do and I would say
>> >>> that they seem to be working pretty well (e.g. the recent Mesos
>> >>> discussion).
>> >>> Of course they are not perfect but the alternative would be to never
>> >>> remove anything user facing until the next major release, which I find
>> >>> pretty strict.
>> >>>
>> >>> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ch...@apache.org> wrote:
>> >>>> If the conclusion is that we shouldn't remove it if _anyone_ is using
>> >>>> it, then we cannot remove it because the user ML obviously does not
>> >>>> reach all users.
>> >>>>
>> >>>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I am bringing this up again to see if there are any users actively
>> >>>>> using the BucketingSink.
>> >>>>> So far, if I am not mistaken (and really sorry if I forgot anything),
>> >>>>> it is only a discussion between devs about the potential problems of
>> >>>>> removing it. I totally understand Chesnay's concern about not
>> >>>>> providing compatibility with the StreamingFileSink (SFS) and if there
>> >>>>> are any users, then we should not remove it without trying to find a
>> >>>>> solution for them.
>> >>>>>
>> >>>>> But if there are no users then I would still propose to remove the
>> >>>>> module, given that I am not aware of any efforts to provide
>> >>>>> compatibility with the SFS any time soon.
>> >>>>> The reasons for removing it also include the facts that we do not
>> >>>>> actively maintain it and we do not add new features. As for potential
>> >>>>> missing features in the SFS compared to the BucketingSink that was
>> >>>>> mentioned before, I am not aware of any fundamental limitations and
>> >>>>> even if there are, I would assume that the solution is not to direct
>> >>>>> the users to a deprecated sink but rather try to increase the
>> >>>>> functionality of the actively maintained one.
>> >>>>>
>> >>>>> Please keep in mind that the BucketingSink is deprecated since FLINK
>> >>>>> 1.9 and there is a new File Sink that is coming as part of FLIP-143
>> >>>>> [1].
>> >>>>> Again, if there are any active users who cannot migrate easily, then
>> >>>>> we cannot remove it before trying to provide a smooth migration path.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Kostas
>> >>>>>
>> >>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
>> >>>>>
>> >>>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org> wrote:
>> >>>>>> @Seth: Earlier in this discussion it was said that the BucketingSink
>> >>>>>> would not be usable in 1.12.
>> >>>>>>
>> >>>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
>> >>>>>>> +1 It has been deprecated for some time and the StreamingFileSink has
>> >>>>>>> stabilized with a large number of formats and features.
>> >>>>>>>
>> >>>>>>> Plus, the bucketing sink only implements a small number of stable
>> >>>>>>> interfaces[1]. I would expect users to continue to use the bucketing sink
>> >>>>>>> from the 1.11 release with future versions for some time.
>> >>>>>>>
>> >>>>>>> Seth
>> >>>>>>>
>> >>>>>>> https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
>> >>>>>>>
>> >>>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> @Arvid Heise I also do not remember exactly what were all the
>> >>>>>>>> problems. The fact that we added some more bulk formats to the
>> >>>>>>>> streaming file sink definitely reduced the non-supported features. In
>> >>>>>>>> addition, the latest discussion I found on the topic was [1] and the
>> >>>>>>>> conclusion of that discussion seems to be to remove it.
>> >>>>>>>>
>> >>>>>>>> Currently, I cannot find any obvious reason for keeping the
>> >>>>>>>> BucketingSink, apart from the fact that we unfortunately do not
>> >>>>>>>> have a migration plan. This is why I posted this to dev@ and user@.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Kostas
>> >>>>>>>>
>> >>>>>>>> [1]
>> >>>>>>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>> >>>>>>>>
>> >>>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>> >>>>>>>>> I remember this conversation popping up a few times already and I'm in
>> >>>>>>>>> general a big fan of removing BucketingSink.
>> >>>>>>>>>
>> >>>>>>>>> However, until now there were a few features lacking in StreamingFileSink
>> >>>>>>>>> that are present in BucketingSink and that are being actively used (I
>> >>>>>>>> can't
>> >>>>>>>>> exactly remember them now, but I can look it up if everyone else is also
>> >>>>>>>>> suffering from bad memory). Did we manage to add them in the meantime? If
>> >>>>>>>>> not, then it feels rushed to remove it at this point.
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an easy way
>> >>>>>>>>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
>> >>>>>>>>>> possible but it will require some effort because the logic would be
>> >>>>>>>>>> "read the old state, commit it, and start fresh with the
>> >>>>>>>>>> StreamingFileSink."
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
>> >>>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown
>> >>>>>>>> Handling --
>> >>>>>>>>>> and
>> >>>>>>>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP as a
>> >>>>>>>>>> motivating
>> >>>>>>>>>>>> use case.
>> >>>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46.
>> >>>>>>>> Thanks
>> >>>>>>>>>>> for the reminder, we should close FLIP-46 now with an explanatory
>> >>>>>>>>>>> message to avoid confusion.
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Arvid Heise | Senior Java Developer
>> >>>>>>>>>
>> >>>>>>>>> <https://www.ververica.com/>
>> >>>>>>>>>
>> >>>>>>>>> Follow us @VervericaData
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>> >>>>>>>>> Conference
>> >>>>>>>>>
>> >>>>>>>>> Stream Processing | Event Driven | Real Time
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Ververica GmbH
>> >>>>>>>>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> >>>>>>>>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
>> >>>>>>>>> (Toni) Cheng
>> >>
>>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@gmail.com>.
Thanks for the discussion!

From this thread, I do not see any objection to moving forward with
removing the sink.
Given this, I will open a voting thread tomorrow.

Cheers,
Kostas

On Wed, Oct 28, 2020 at 6:50 PM Stephan Ewen <se...@apache.org> wrote:
>
> +1 to remove the Bucketing Sink.
>
> It has been very common in the past to remove code that was deprecated for multiple releases in favor of reducing baggage.
> Also in cases that had no perfect drop-in replacement, but needed users to forward fit the code.
> I am not sure I understand why this case is so different.
>
> Why the Bucketing Sink should be thrown out, in my opinion:
>
> The Bucketing sink makes it easier for users to add general Hadoop writes.
> But the price is that it easily leads to data loss, because it assumes flush()/sync() work reliably on Hadoop, which they don't (HDFS works somewhat, S3 works not at all).
> I think the Bucketing sink is a trap for users, that's why it was deprecated long ago.
>
> The StreamingFileSink covers the majority of cases from the Bucketing Sink.
> It does have some friction when adding/wrapping some general Hadoop writers. Parts will be solved with the transactional sink work.
> If something is missing and blocking users, we can prioritize adding it to the Streaming File Sink. Also that is something we did before and it helped being pragmatic with moving forward, rather than being held back by "maybe there is something we don't know".
>
>
>
>
> On Wed, Oct 28, 2020 at 12:36 PM Chesnay Schepler <ch...@apache.org> wrote:
>>
>> Then we can't remove it, because there is no way for us to ascertain
>> whether anyone is still using it.
>>
>> Sure, the user ML is the best we got, but you can't argue that we don't
>> want any users to be affected and then use an imperfect means to find users.
>> If you are fine with relying on the user ML, then you _are_ fine with
>> removing it at the cost of friction for some users.
>>
>> To be clear, I, personally, don't have a problem with removing it (we
>> have removed other connectors in the past that did not have a migration
>> plan), I just reject the argumentation.
>>
>> On 10/28/2020 12:21 PM, Kostas Kloudas wrote:
>> > No, I do not think that "we are fine with removing it at the cost of
>> > friction for some users".
>> >
>> > I believe that this can be another discussion that we should have as
>> > soon as we establish that someone is actually using it. The point I am
>> > trying to make is that if no user is using it, we should remove it and
>> > not leave unmaintained code around.
>> >
>> > On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ch...@apache.org> wrote:
>> >> The alternative could also be to use a different argument than "no one
>> >> uses it", e.g., we are fine with removing it at the cost of friction for
>> >> some users because there are better alternatives.
>> >>
>> >> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
>> >>> I think that the mailing lists is the best we can do and I would say
>> >>> that they seem to be working pretty well (e.g. the recent Mesos
>> >>> discussion).
>> >>> Of course they are not perfect but the alternative would be to never
>> >>> remove anything user facing until the next major release, which I find
>> >>> pretty strict.
>> >>>
>> >>> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ch...@apache.org> wrote:
>> >>>> If the conclusion is that we shouldn't remove it if _anyone_ is using
>> >>>> it, then we cannot remove it because the user ML obviously does not
>> >>>> reach all users.
>> >>>>
>> >>>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I am bringing this up again to see if there are any users actively
>> >>>>> using the BucketingSink.
>> >>>>> So far, if I am not mistaken (and really sorry if I forgot anything),
>> >>>>> it is only a discussion between devs about the potential problems of
>> >>>>> removing it. I totally understand Chesnay's concern about not
>> >>>>> providing compatibility with the StreamingFileSink (SFS) and if there
>> >>>>> are any users, then we should not remove it without trying to find a
>> >>>>> solution for them.
>> >>>>>
>> >>>>> But if there are no users then I would still propose to remove the
>> >>>>> module, given that I am not aware of any efforts to provide
>> >>>>> compatibility with the SFS any time soon.
>> >>>>> The reasons for removing it also include the facts that we do not
>> >>>>> actively maintain it and we do not add new features. As for potential
>> >>>>> missing features in the SFS compared to the BucketingSink that was
>> >>>>> mentioned before, I am not aware of any fundamental limitations and
>> >>>>> even if there are, I would assume that the solution is not to direct
>> >>>>> the users to a deprecated sink but rather try to increase the
>> >>>>> functionality of the actively maintained one.
>> >>>>>
>> >>>>> Please keep in mind that the BucketingSink is deprecated since FLINK
>> >>>>> 1.9 and there is a new File Sink that is coming as part of FLIP-143
>> >>>>> [1].
>> >>>>> Again, if there are any active users who cannot migrate easily, then
>> >>>>> we cannot remove it before trying to provide a smooth migration path.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Kostas
>> >>>>>
>> >>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
>> >>>>>
>> >>>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org> wrote:
>> >>>>>> @Seth: Earlier in this discussion it was said that the BucketingSink
>> >>>>>> would not be usable in 1.12 .
>> >>>>>>
>> >>>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
>> >>>>>>> +1 It has been deprecated for some time and the StreamingFileSink has
>> >>>>>>> stabilized with a large number of formats and features.
>> >>>>>>>
>> >>>>>>> Plus, the bucketing sink only implements a small number of stable
>> >>>>>>> interfaces[1]. I would expect users to continue to use the bucketing sink
>> >>>>>>> from the 1.11 release with future versions for some time.
>> >>>>>>>
>> >>>>>>> Seth
>> >>>>>>>
>> >>>>>>> https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
>> >>>>>>>
>> >>>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> @Arvid Heise I also do not remember exactly what were all the
>> >>>>>>>> problems. The fact that we added some more bulk formats to the
>> >>>>>>>> streaming file sink definitely reduced the non-supported features. In
>> >>>>>>>> addition, the latest discussion I found on the topic was [1] and the
>> >>>>>>>> conclusion of that discussion seems to be to remove it.
>> >>>>>>>>
>> >>>>>>>> Currently, I cannot find any obvious reason why keeping the
>> >>>>>>>> BucketingSink, apart from the fact that we do not have a migration
>> >>>>>>>> plan unfortunately. This is why I posted this to dev@ and user@.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Kostas
>> >>>>>>>>
>> >>>>>>>> [1]
>> >>>>>>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>> >>>>>>>>
>> >>>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>> >>>>>>>>> I remember this conversation popping up a few times already and I'm in
>> >>>>>>>>> general a big fan of removing BucketingSink.
>> >>>>>>>>>
>> >>>>>>>>> However, until now there were a few features lacking in StreamingFileSink
>> >>>>>>>>> that are present in BucketingSink and that are being actively used (I
>> >>>>>>>> can't
>> >>>>>>>>> exactly remember them now, but I can look it up if everyone else is also
>> >>>>>>>>> suffering from bad memory). Did we manage to add them in the meantime? If
>> >>>>>>>>> not, then it feels rushed to remove it at this point.
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an easy way
>> >>>>>>>>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
>> >>>>>>>>>> possible but it will require some effort because the logic would be
>> >>>>>>>>>> "read the old state, commit it, and start fresh with the
>> >>>>>>>>>> StreamingFileSink."
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
>> >>>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown
>> >>>>>>>> Handling --
>> >>>>>>>>>> and
>> >>>>>>>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP as a
>> >>>>>>>>>> motivating
>> >>>>>>>>>>>> use case.
>> >>>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46.
>> >>>>>>>> Thanks
>> >>>>>>>>>>> for the reminder, we should close FLIP-46 now with an explanatory
>> >>>>>>>>>>> message to avoid confusion.
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Arvid Heise | Senior Java Developer
>> >>>>>>>>>
>> >>>>>>>>> <https://www.ververica.com/>
>> >>>>>>>>>
>> >>>>>>>>> Follow us @VervericaData
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>> >>>>>>>>> Conference
>> >>>>>>>>>
>> >>>>>>>>> Stream Processing | Event Driven | Real Time
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Ververica GmbH
>> >>>>>>>>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> >>>>>>>>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
>> >>>>>>>>> (Toni) Cheng
>> >>
>>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Stephan Ewen <se...@apache.org>.
+1 to remove the Bucketing Sink.

It has been very common in the past to remove code that was deprecated for
multiple releases in favor of reducing baggage.
That included cases that had no perfect drop-in replacement and required
users to forward-fit their code.
I am not sure I understand why this case is so different.

Why the Bucketing Sink should be thrown out, in my opinion:

The Bucketing Sink makes it easier for users to add general Hadoop writes.
But the price is that it easily leads to data loss, because it assumes that
flush()/sync() work reliably on Hadoop, which they don't (HDFS works
somewhat, S3 does not work at all).
I think the Bucketing Sink is a trap for users; that's why it was
deprecated long ago.

The StreamingFileSink covers the majority of cases from the Bucketing Sink.
It does have some friction when adding/wrapping some general Hadoop
writers. Parts of that will be solved with the transactional sink work.
If something is missing and blocking users, we can prioritize adding it to
the StreamingFileSink. That is something we did before, and it helped us
be pragmatic about moving forward, rather than being held back by "maybe
there is something we don't know".




On Wed, Oct 28, 2020 at 12:36 PM Chesnay Schepler <ch...@apache.org>
wrote:

> Then we can't remove it, because there is no way for us to ascertain
> whether anyone is still using it.
>
> Sure, the user ML is the best we got, but you can't argue that we don't
> want any users to be affected and then use an imperfect means to find users.
> If you are fine with relying on the user ML, then you _are_ fine with
> removing it at the cost of friction for some users.
>
> To be clear, I, personally, don't have a problem with removing it (we
> have removed other connectors in the past that did not have a migration
> plan), I just reject the argumentation.
>
> On 10/28/2020 12:21 PM, Kostas Kloudas wrote:
> > No, I do not think that "we are fine with removing it at the cost of
> > friction for some users".
> >
> > I believe that this can be another discussion that we should have as
> > soon as we establish that someone is actually using it. The point I am
> > trying to make is that if no user is using it, we should remove it and
> > not leave unmaintained code around.
> >
> > On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ch...@apache.org>
> wrote:
> >> The alternative could also be to use a different argument than "no one
> >> uses it", e.g., we are fine with removing it at the cost of friction for
> >> some users because there are better alternatives.
> >>
> >> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
> >>> I think that the mailing lists is the best we can do and I would say
> >>> that they seem to be working pretty well (e.g. the recent Mesos
> >>> discussion).
> >>> Of course they are not perfect but the alternative would be to never
> >>> remove anything user facing until the next major release, which I find
> >>> pretty strict.
> >>>
> >>> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ch...@apache.org>
> wrote:
> >>>> If the conclusion is that we shouldn't remove it if _anyone_ is using
> >>>> it, then we cannot remove it because the user ML obviously does not
> >>>> reach all users.
> >>>>
> >>>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I am bringing this up again to see if there are any users actively
> >>>>> using the BucketingSink.
> >>>>> So far, if I am not mistaken (and really sorry if I forgot anything),
> >>>>> it is only a discussion between devs about the potential problems of
> >>>>> removing it. I totally understand Chesnay's concern about not
> >>>>> providing compatibility with the StreamingFileSink (SFS) and if there
> >>>>> are any users, then we should not remove it without trying to find a
> >>>>> solution for them.
> >>>>>
> >>>>> But if there are no users then I would still propose to remove the
> >>>>> module, given that I am not aware of any efforts to provide
> >>>>> compatibility with the SFS any time soon.
> >>>>> The reasons for removing it also include the facts that we do not
> >>>>> actively maintain it and we do not add new features. As for potential
> >>>>> missing features in the SFS compared to the BucketingSink that was
> >>>>> mentioned before, I am not aware of any fundamental limitations and
> >>>>> even if there are, I would assume that the solution is not to direct
> >>>>> the users to a deprecated sink but rather try to increase the
> >>>>> functionality of the actively maintained one.
> >>>>>
> >>>>> Please keep in mind that the BucketingSink is deprecated since FLINK
> >>>>> 1.9 and there is a new File Sink that is coming as part of FLIP-143
> >>>>> [1].
> >>>>> Again, if there are any active users who cannot migrate easily, then
> >>>>> we cannot remove it before trying to provide a smooth migration path.
> >>>>>
> >>>>> Thanks,
> >>>>> Kostas
> >>>>>
> >>>>> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
> >>>>>
> >>>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org>
> wrote:
> >>>>>> @Seth: Earlier in this discussion it was said that the BucketingSink
> >>>>>> would not be usable in 1.12 .
> >>>>>>
> >>>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
> >>>>>>> +1 It has been deprecated for some time and the StreamingFileSink
> has
> >>>>>>> stabilized with a large number of formats and features.
> >>>>>>>
> >>>>>>> Plus, the bucketing sink only implements a small number of stable
> >>>>>>> interfaces[1]. I would expect users to continue to use the
> bucketing sink
> >>>>>>> from the 1.11 release with future versions for some time.
> >>>>>>>
> >>>>>>> Seth
> >>>>>>>
> >>>>>>>
> https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
> >>>>>>>
> >>>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com>
> wrote:
> >>>>>>>
> >>>>>>>> @Arvid Heise I also do not remember exactly what were all the
> >>>>>>>> problems. The fact that we added some more bulk formats to the
> >>>>>>>> streaming file sink definitely reduced the non-supported
> features. In
> >>>>>>>> addition, the latest discussion I found on the topic was [1] and
> the
> >>>>>>>> conclusion of that discussion seems to be to remove it.
> >>>>>>>>
> >>>>>>>> Currently, I cannot find any obvious reason why keeping the
> >>>>>>>> BucketingSink, apart from the fact that we do not have a migration
> >>>>>>>> plan unfortunately. This is why I posted this to dev@ and user@.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Kostas
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
> >>>>>>>>
> >>>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com>
> wrote:
> >>>>>>>>> I remember this conversation popping up a few times already and
> I'm in
> >>>>>>>>> general a big fan of removing BucketingSink.
> >>>>>>>>>
> >>>>>>>>> However, until now there were a few features lacking in
> StreamingFileSink
> >>>>>>>>> that are present in BucketingSink and that are being actively
> used (I
> >>>>>>>> can't
> >>>>>>>>> exactly remember them now, but I can look it up if everyone else
> is also
> >>>>>>>>> suffering from bad memory). Did we manage to add them in the
> meantime? If
> >>>>>>>>> not, then it feels rushed to remove it at this point.
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <
> kkloudas@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an
> easy way
> >>>>>>>>>> to migrate from the BucketingSink to the StreamingFileSink. It
> may be
> >>>>>>>>>> possible but it will require some effort because the logic
> would be
> >>>>>>>>>> "read the old state, commit it, and start fresh with the
> >>>>>>>>>> StreamingFileSink."
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <
> aljoscha@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
> >>>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown
> >>>>>>>> Handling --
> >>>>>>>>>> and
> >>>>>>>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP as a
> >>>>>>>>>> motivating
> >>>>>>>>>>>> use case.
> >>>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for
> FLIP-46.
> >>>>>>>> Thanks
> >>>>>>>>>>> for the reminder, we should close FLIP-46 now with an
> explanatory
> >>>>>>>>>>> message to avoid confusion.
> >>>>>>>>> --
> >>>>>>>>>
> >>>>>>>>> Arvid Heise | Senior Java Developer
> >>>>>>>>>
> >>>>>>>>> <https://www.ververica.com/>
> >>>>>>>>>
> >>>>>>>>> Follow us @VervericaData
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>>
> >>>>>>>>> Join Flink Forward <https://flink-forward.org/> - The Apache
> Flink
> >>>>>>>>> Conference
> >>>>>>>>>
> >>>>>>>>> Stream Processing | Event Driven | Real Time
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>>
> >>>>>>>>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Ververica GmbH
> >>>>>>>>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> >>>>>>>>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung
> Jason, Ji
> >>>>>>>>> (Toni) Cheng
> >>
>
>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Chesnay Schepler <ch...@apache.org>.
Then we can't remove it, because there is no way for us to ascertain 
whether anyone is still using it.

Sure, the user ML is the best we've got, but you can't argue that we don't
want any users to be affected and then use an imperfect means to find users.
If you are fine with relying on the user ML, then you _are_ fine with
removing it at the cost of friction for some users.

To be clear, I, personally, don't have a problem with removing it (we
have removed other connectors in the past that did not have a migration
plan), I just reject the argumentation.

On 10/28/2020 12:21 PM, Kostas Kloudas wrote:
> No, I do not think that "we are fine with removing it at the cost of
> friction for some users".
>
> I believe that this can be another discussion that we should have as
> soon as we establish that someone is actually using it. The point I am
> trying to make is that if no user is using it, we should remove it and
> not leave unmaintained code around.
>
> On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ch...@apache.org> wrote:
>> The alternative could also be to use a different argument than "no one
>> uses it", e.g., we are fine with removing it at the cost of friction for
>> some users because there are better alternatives.
>>
>> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
>>> I think that the mailing lists are the best we can do and I would say
>>> that they seem to be working pretty well (e.g. the recent Mesos
>>> discussion).
>>> Of course they are not perfect but the alternative would be to never
>>> remove anything user facing until the next major release, which I find
>>> pretty strict.
>>>
>>> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ch...@apache.org> wrote:
>>>> If the conclusion is that we shouldn't remove it if _anyone_ is using
>>>> it, then we cannot remove it because the user ML obviously does not
>>>> reach all users.
>>>>
>>>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
>>>>> Hi all,
>>>>>
>>>>> I am bringing this up again to see if there are any users actively
>>>>> using the BucketingSink.
>>>>> So far, if I am not mistaken (and really sorry if I forgot anything),
>>>>> it is only a discussion between devs about the potential problems of
>>>>> removing it. I totally understand Chesnay's concern about not
>>>>> providing compatibility with the StreamingFileSink (SFS) and if there
>>>>> are any users, then we should not remove it without trying to find a
>>>>> solution for them.
>>>>>
>>>>> But if there are no users then I would still propose to remove the
>>>>> module, given that I am not aware of any efforts to provide
>>>>> compatibility with the SFS any time soon.
>>>>> The reasons for removing it also include the facts that we do not
>>>>> actively maintain it and we do not add new features. As for potential
>>>>> missing features in the SFS compared to the BucketingSink that were
>>>>> mentioned before, I am not aware of any fundamental limitations and
>>>>> even if there are, I would assume that the solution is not to direct
>>>>> the users to a deprecated sink but rather try to increase the
>>>>> functionality of the actively maintained one.
>>>>>
>>>>> Please keep in mind that the BucketingSink is deprecated since FLINK
>>>>> 1.9 and there is a new File Sink that is coming as part of FLIP-143
>>>>> [1].
>>>>> Again, if there are any active users who cannot migrate easily, then
>>>>> we cannot remove it before trying to provide a smooth migration path.
>>>>>
>>>>> Thanks,
>>>>> Kostas
>>>>>
>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
>>>>>
>>>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org> wrote:
>>>>>> @Seth: Earlier in this discussion it was said that the BucketingSink
>>>>>> would not be usable in 1.12 .
>>>>>>
>>>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
>>>>>>> +1 It has been deprecated for some time and the StreamingFileSink has
>>>>>>> stabilized with a large number of formats and features.
>>>>>>>
>>>>>>> Plus, the bucketing sink only implements a small number of stable
>>>>>>> interfaces[1]. I would expect users to continue to use the bucketing sink
>>>>>>> from the 1.11 release with future versions for some time.
>>>>>>>
>>>>>>> Seth
>>>>>>>
>>>>>>> [1] https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
>>>>>>>
>>>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
>>>>>>>
>>>>>>>> @Arvid Heise I also do not remember exactly what were all the
>>>>>>>> problems. The fact that we added some more bulk formats to the
>>>>>>>> streaming file sink definitely reduced the non-supported features. In
>>>>>>>> addition, the latest discussion I found on the topic was [1] and the
>>>>>>>> conclusion of that discussion seems to be to remove it.
>>>>>>>>
>>>>>>>> Currently, I cannot find any obvious reason for keeping the
>>>>>>>> BucketingSink, apart from the fact that we do not have a migration
>>>>>>>> plan unfortunately. This is why I posted this to dev@ and user@.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Kostas
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>>>>>>>>
>>>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>>>>>>>>> I remember this conversation popping up a few times already and I'm in
>>>>>>>>> general a big fan of removing BucketingSink.
>>>>>>>>>
>>>>>>>>> However, until now there were a few features lacking in StreamingFileSink
>>>>>>>>> that are present in BucketingSink and that are being actively used (I can't
>>>>>>>>> exactly remember them now, but I can look it up if everyone else is also
>>>>>>>>> suffering from bad memory). Did we manage to add them in the meantime? If
>>>>>>>>> not, then it feels rushed to remove it at this point.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an easy way
>>>>>>>>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
>>>>>>>>>> possible but it will require some effort because the logic would be
>>>>>>>>>> "read the old state, commit it, and start fresh with the
>>>>>>>>>> StreamingFileSink."
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
>>>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling --
>>>>>>>>>>>> and in fact, the StreamingFileSink is mentioned in that FLIP as a
>>>>>>>>>>>> motivating use case.
>>>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
>>>>>>>>>>> for the reminder, we should close FLIP-46 now with an explanatory
>>>>>>>>>>> message to avoid confusion.
>>


Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@gmail.com>.
No, I do not think that "we are fine with removing it at the cost of
friction for some users".

I believe that this can be another discussion that we should have as
soon as we establish that someone is actually using it. The point I am
trying to make is that if no user is using it, we should remove it and
not leave unmaintained code around.

On Wed, Oct 28, 2020 at 12:11 PM Chesnay Schepler <ch...@apache.org> wrote:
>
> The alternative could also be to use a different argument than "no one
> uses it", e.g., we are fine with removing it at the cost of friction for
> some users because there are better alternatives.
>
> On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
> > I think that the mailing lists are the best we can do and I would say
> > that they seem to be working pretty well (e.g. the recent Mesos
> > discussion).
> > Of course they are not perfect but the alternative would be to never
> > remove anything user facing until the next major release, which I find
> > pretty strict.
> >
> > On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ch...@apache.org> wrote:
> >> If the conclusion is that we shouldn't remove it if _anyone_ is using
> >> it, then we cannot remove it because the user ML obviously does not
> >> reach all users.
> >>
> >> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
> >>> Hi all,
> >>>
> >>> I am bringing this up again to see if there are any users actively
> >>> using the BucketingSink.
> >>> So far, if I am not mistaken (and really sorry if I forgot anything),
> >>> it is only a discussion between devs about the potential problems of
> >>> removing it. I totally understand Chesnay's concern about not
> >>> providing compatibility with the StreamingFileSink (SFS) and if there
> >>> are any users, then we should not remove it without trying to find a
> >>> solution for them.
> >>>
> >>> But if there are no users then I would still propose to remove the
> >>> module, given that I am not aware of any efforts to provide
> >>> compatibility with the SFS any time soon.
> >>> The reasons for removing it also include the facts that we do not
> >>> actively maintain it and we do not add new features. As for potential
> >>> missing features in the SFS compared to the BucketingSink that were
> >>> mentioned before, I am not aware of any fundamental limitations and
> >>> even if there are, I would assume that the solution is not to direct
> >>> the users to a deprecated sink but rather try to increase the
> >>> functionality of the actively maintained one.
> >>>
> >>> Please keep in mind that the BucketingSink is deprecated since FLINK
> >>> 1.9 and there is a new File Sink that is coming as part of FLIP-143
> >>> [1].
> >>> Again, if there are any active users who cannot migrate easily, then
> >>> we cannot remove it before trying to provide a smooth migration path.
> >>>
> >>> Thanks,
> >>> Kostas
> >>>
> >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
> >>>
> >>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org> wrote:
> >>>> @Seth: Earlier in this discussion it was said that the BucketingSink
> >>>> would not be usable in 1.12 .
> >>>>
> >>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
> >>>>> +1 It has been deprecated for some time and the StreamingFileSink has
> >>>>> stabilized with a large number of formats and features.
> >>>>>
> >>>>> Plus, the bucketing sink only implements a small number of stable
> >>>>> interfaces[1]. I would expect users to continue to use the bucketing sink
> >>>>> from the 1.11 release with future versions for some time.
> >>>>>
> >>>>> Seth
> >>>>>
> >>>>> [1] https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
> >>>>>
> >>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
> >>>>>
> >>>>>> @Arvid Heise I also do not remember exactly what were all the
> >>>>>> problems. The fact that we added some more bulk formats to the
> >>>>>> streaming file sink definitely reduced the non-supported features. In
> >>>>>> addition, the latest discussion I found on the topic was [1] and the
> >>>>>> conclusion of that discussion seems to be to remove it.
> >>>>>>
> >>>>>> Currently, I cannot find any obvious reason for keeping the
> >>>>>> BucketingSink, apart from the fact that we do not have a migration
> >>>>>> plan unfortunately. This is why I posted this to dev@ and user@.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Kostas
> >>>>>>
> >>>>>> [1]
> >>>>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
> >>>>>>
> >>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
> >>>>>>> I remember this conversation popping up a few times already and I'm in
> >>>>>>> general a big fan of removing BucketingSink.
> >>>>>>>
> >>>>>>> However, until now there were a few features lacking in StreamingFileSink
> >>>>>>> that are present in BucketingSink and that are being actively used (I can't
> >>>>>>> exactly remember them now, but I can look it up if everyone else is also
> >>>>>>> suffering from bad memory). Did we manage to add them in the meantime? If
> >>>>>>> not, then it feels rushed to remove it at this point.
> >>>>>>>
> >>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an easy way
> >>>>>>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
> >>>>>>>> possible but it will require some effort because the logic would be
> >>>>>>>> "read the old state, commit it, and start fresh with the
> >>>>>>>> StreamingFileSink."
> >>>>>>>>
> >>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
> >>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling --
> >>>>>>>>>> and in fact, the StreamingFileSink is mentioned in that FLIP as a
> >>>>>>>>>> motivating use case.
> >>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
> >>>>>>>>> for the reminder, we should close FLIP-46 now with an explanatory
> >>>>>>>>> message to avoid confusion.
>
>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Chesnay Schepler <ch...@apache.org>.
The alternative could also be to use a different argument than "no one 
uses it", e.g., we are fine with removing it at the cost of friction for 
some users because there are better alternatives.

On 10/28/2020 10:46 AM, Kostas Kloudas wrote:
> I think that the mailing lists are the best we can do and I would say
> that they seem to be working pretty well (e.g. the recent Mesos
> discussion).
> Of course they are not perfect but the alternative would be to never
> remove anything user facing until the next major release, which I find
> pretty strict.
>
> On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ch...@apache.org> wrote:
>> If the conclusion is that we shouldn't remove it if _anyone_ is using
>> it, then we cannot remove it because the user ML obviously does not
>> reach all users.
>>
>> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
>>> Hi all,
>>>
>>> I am bringing this up again to see if there are any users actively
>>> using the BucketingSink.
>>> So far, if I am not mistaken (and really sorry if I forgot anything),
>>> it is only a discussion between devs about the potential problems of
>>> removing it. I totally understand Chesnay's concern about not
>>> providing compatibility with the StreamingFileSink (SFS) and if there
>>> are any users, then we should not remove it without trying to find a
>>> solution for them.
>>>
>>> But if there are no users then I would still propose to remove the
>>> module, given that I am not aware of any efforts to provide
>>> compatibility with the SFS any time soon.
>>> The reasons for removing it also include the facts that we do not
>>> actively maintain it and we do not add new features. As for potential
>>> missing features in the SFS compared to the BucketingSink that were
>>> mentioned before, I am not aware of any fundamental limitations and
>>> even if there are, I would assume that the solution is not to direct
>>> the users to a deprecated sink but rather try to increase the
>>> functionality of the actively maintained one.
>>>
>>> Please keep in mind that the BucketingSink is deprecated since FLINK
>>> 1.9 and there is a new File Sink that is coming as part of FLIP-143
>>> [1].
>>> Again, if there are any active users who cannot migrate easily, then
>>> we cannot remove it before trying to provide a smooth migration path.
>>>
>>> Thanks,
>>> Kostas
>>>
>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
>>>
>>> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org> wrote:
>>>> @Seth: Earlier in this discussion it was said that the BucketingSink
>>>> would not be usable in 1.12.
>>>>
>>>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
>>>>> +1 It has been deprecated for some time and the StreamingFileSink has
>>>>> stabilized with a large number of formats and features.
>>>>>
>>>>> Plus, the bucketing sink only implements a small number of stable
>>>>> interfaces[1]. I would expect users to continue to use the bucketing sink
>>>>> from the 1.11 release with future versions for some time.
>>>>>
>>>>> Seth
>>>>>
>>>>> [1] https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
>>>>>
>>>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
>>>>>
>>>>>> @Arvid Heise I also do not remember exactly what all the
>>>>>> problems were. The fact that we added some more bulk formats to the
>>>>>> streaming file sink definitely reduced the non-supported features. In
>>>>>> addition, the latest discussion I found on the topic was [1] and the
>>>>>> conclusion of that discussion seems to be to remove it.
>>>>>>
>>>>>> Currently, I cannot find any obvious reason for keeping the
>>>>>> BucketingSink, apart from the fact that we do not have a migration
>>>>>> plan unfortunately. This is why I posted this to dev@ and user@.
>>>>>>
>>>>>> Cheers,
>>>>>> Kostas
>>>>>>
>>>>>> [1]
>>>>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>>>>>>
>>>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>>>>>>> I remember this conversation popping up a few times already and I'm in
>>>>>>> general a big fan of removing BucketingSink.
>>>>>>>
>>>>>>> However, until now there were a few features lacking in StreamingFileSink
>>>>>>> that are present in BucketingSink and that are being actively used (I can't
>>>>>>> exactly remember them now, but I can look it up if everyone else is also
>>>>>>> suffering from bad memory). Did we manage to add them in the meantime? If
>>>>>>> not, then it feels rushed to remove it at this point.
>>>>>>>
>>>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
>>>>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an easy way
>>>>>>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
>>>>>>>> possible but it will require some effort because the logic would be
>>>>>>>> "read the old state, commit it, and start fresh with the
>>>>>>>> StreamingFileSink."
>>>>>>>>
>>>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org>
>>>>>>>> wrote:
>>>>>>>>> On 13.10.20 14:01, David Anderson wrote:
>>>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
>>>>>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
>>>>>>>>>> use case.
>>>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
>>>>>>>>> for the reminder, we should close FLIP-46 now with an explanatory
>>>>>>>>> message to avoid confusion.
>>>>>>> --
>>>>>>>
>>>>>>> Arvid Heise | Senior Java Developer
>>>>>>>
>>>>>>> <https://www.ververica.com/>
>>>>>>>
>>>>>>> Follow us @VervericaData
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>>>>>>> Conference
>>>>>>>
>>>>>>> Stream Processing | Event Driven | Real Time
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>>>>>>>
>>>>>>> --
>>>>>>> Ververica GmbH
>>>>>>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>>>>>>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
>>>>>>> (Toni) Cheng



Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@gmail.com>.
I think that the mailing lists are the best we can do and I would say
that they seem to be working pretty well (e.g. the recent Mesos
discussion).
Of course they are not perfect but the alternative would be to never
remove anything user facing until the next major release, which I find
pretty strict.

On Wed, Oct 28, 2020 at 10:04 AM Chesnay Schepler <ch...@apache.org> wrote:
>
> If the conclusion is that we shouldn't remove it if _anyone_ is using
> it, then we cannot remove it because the user ML obviously does not
> reach all users.
>
> On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
> > Hi all,
> >
> > I am bringing this up again to see if there are any users actively
> > using the BucketingSink.
> > So far, if I am not mistaken (and really sorry if I forgot anything),
> > it is only a discussion between devs about the potential problems of
> > removing it. I totally understand Chesnay's concern about not
> > providing compatibility with the StreamingFileSink (SFS) and if there
> > are any users, then we should not remove it without trying to find a
> > solution for them.
> >
> > But if there are no users then I would still propose to remove the
> > module, given that I am not aware of any efforts to provide
> > compatibility with the SFS any time soon.
> > The reasons for removing it also include the facts that we do not
> > actively maintain it and we do not add new features. As for potential
> > missing features in the SFS compared to the BucketingSink that were
> > mentioned before, I am not aware of any fundamental limitations and
> > even if there are, I would assume that the solution is not to direct
> > the users to a deprecated sink but rather try to increase the
> > functionality of the actively maintained one.
> >
> > Please keep in mind that the BucketingSink is deprecated since FLINK
> > 1.9 and there is a new File Sink that is coming as part of FLIP-143
> > [1].
> > Again, if there are any active users who cannot migrate easily, then
> > we cannot remove it before trying to provide a smooth migration path.
> >
> > Thanks,
> > Kostas
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
> >
> > On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org> wrote:
> >> @Seth: Earlier in this discussion it was said that the BucketingSink
> >> would not be usable in 1.12.
> >>
> >> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
> >>> +1 It has been deprecated for some time and the StreamingFileSink has
> >>> stabilized with a large number of formats and features.
> >>>
> >>> Plus, the bucketing sink only implements a small number of stable
> >>> interfaces[1]. I would expect users to continue to use the bucketing sink
> >>> from the 1.11 release with future versions for some time.
> >>>
> >>> Seth
> >>>
> >>> [1] https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
> >>>
> >>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
> >>>
> >>>> @Arvid Heise I also do not remember exactly what all the
> >>>> problems were. The fact that we added some more bulk formats to the
> >>>> streaming file sink definitely reduced the non-supported features. In
> >>>> addition, the latest discussion I found on the topic was [1] and the
> >>>> conclusion of that discussion seems to be to remove it.
> >>>>
> >>>> Currently, I cannot find any obvious reason for keeping the
> >>>> BucketingSink, apart from the fact that we do not have a migration
> >>>> plan unfortunately. This is why I posted this to dev@ and user@.
> >>>>
> >>>> Cheers,
> >>>> Kostas
> >>>>
> >>>> [1]
> >>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
> >>>>
> >>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
> >>>>> I remember this conversation popping up a few times already and I'm in
> >>>>> general a big fan of removing BucketingSink.
> >>>>>
> >>>>> However, until now there were a few features lacking in StreamingFileSink
> >>>>> that are present in BucketingSink and that are being actively used (I can't
> >>>>> exactly remember them now, but I can look it up if everyone else is also
> >>>>> suffering from bad memory). Did we manage to add them in the meantime? If
> >>>>> not, then it feels rushed to remove it at this point.
> >>>>>
> >>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
> >>>>>> @Chesnay Schepler  Off the top of my head, I cannot find an easy way
> >>>>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
> >>>>>> possible but it will require some effort because the logic would be
> >>>>>> "read the old state, commit it, and start fresh with the
> >>>>>> StreamingFileSink."
> >>>>>>
> >>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org>
> >>>>>> wrote:
> >>>>>>> On 13.10.20 14:01, David Anderson wrote:
> >>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
> >>>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
> >>>>>>>> use case.
> >>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
> >>>>>>> for the reminder, we should close FLIP-46 now with an explanatory
> >>>>>>> message to avoid confusion.
> >>>>> --
> >>>>>
> >>>>> Arvid Heise | Senior Java Developer
> >>>>>
> >>>>> <https://www.ververica.com/>
> >>>>>
> >>>>> Follow us @VervericaData
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> >>>>> Conference
> >>>>>
> >>>>> Stream Processing | Event Driven | Real Time
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >>>>>
> >>>>> --
> >>>>> Ververica GmbH
> >>>>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> >>>>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> >>>>> (Toni) Cheng
> >>
>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Chesnay Schepler <ch...@apache.org>.
If the conclusion is that we shouldn't remove it if _anyone_ is using 
it, then we cannot remove it because the user ML obviously does not 
reach all users.

On 10/28/2020 9:28 AM, Kostas Kloudas wrote:
> Hi all,
>
> I am bringing this up again to see if there are any users actively
> using the BucketingSink.
> So far, if I am not mistaken (and really sorry if I forgot anything),
> it is only a discussion between devs about the potential problems of
> removing it. I totally understand Chesnay's concern about not
> providing compatibility with the StreamingFileSink (SFS) and if there
> are any users, then we should not remove it without trying to find a
> solution for them.
>
> But if there are no users then I would still propose to remove the
> module, given that I am not aware of any efforts to provide
> compatibility with the SFS any time soon.
> The reasons for removing it also include the facts that we do not
> actively maintain it and we do not add new features. As for potential
> missing features in the SFS compared to the BucketingSink that were
> mentioned before, I am not aware of any fundamental limitations and
> even if there are, I would assume that the solution is not to direct
> the users to a deprecated sink but rather try to increase the
> functionality of the actively maintained one.
>
> Please keep in mind that the BucketingSink is deprecated since FLINK
> 1.9 and there is a new File Sink that is coming as part of FLIP-143
> [1].
> Again, if there are any active users who cannot migrate easily, then
> we cannot remove it before trying to provide a smooth migration path.
>
> Thanks,
> Kostas
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
>
> On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org> wrote:
>> @Seth: Earlier in this discussion it was said that the BucketingSink
>> would not be usable in 1.12.
>>
>> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
>>> +1 It has been deprecated for some time and the StreamingFileSink has
>>> stabilized with a large number of formats and features.
>>>
>>> Plus, the bucketing sink only implements a small number of stable
>>> interfaces[1]. I would expect users to continue to use the bucketing sink
>>> from the 1.11 release with future versions for some time.
>>>
>>> Seth
>>>
>>> [1] https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
>>>
>>> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
>>>
>>>> @Arvid Heise I also do not remember exactly what all the
>>>> problems were. The fact that we added some more bulk formats to the
>>>> streaming file sink definitely reduced the non-supported features. In
>>>> addition, the latest discussion I found on the topic was [1] and the
>>>> conclusion of that discussion seems to be to remove it.
>>>>
>>>> Currently, I cannot find any obvious reason for keeping the
>>>> BucketingSink, apart from the fact that we do not have a migration
>>>> plan unfortunately. This is why I posted this to dev@ and user@.
>>>>
>>>> Cheers,
>>>> Kostas
>>>>
>>>> [1]
>>>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>>>>
>>>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>>>>> I remember this conversation popping up a few times already and I'm in
>>>>> general a big fan of removing BucketingSink.
>>>>>
>>>>> However, until now there were a few features lacking in StreamingFileSink
>>>>> that are present in BucketingSink and that are being actively used (I can't
>>>>> exactly remember them now, but I can look it up if everyone else is also
>>>>> suffering from bad memory). Did we manage to add them in the meantime? If
>>>>> not, then it feels rushed to remove it at this point.
>>>>>
>>>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
>>>>>> @Chesnay Schepler Off the top of my head, I cannot find an easy way
>>>>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
>>>>>> possible but it will require some effort because the logic would be
>>>>>> "read the old state, commit it, and start fresh with the
>>>>>> StreamingFileSink."
>>>>>>
>>>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org> wrote:
>>>>>>> On 13.10.20 14:01, David Anderson wrote:
>>>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
>>>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
>>>>>>>> use case.
>>>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
>>>>>>> for the reminder, we should close FLIP-46 now with an explanatory
>>>>>>> message to avoid confusion.
>>


Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@gmail.com>.
Hi all,

I am bringing this up again to see if there are any users actively
using the BucketingSink.
So far, if I am not mistaken (and really sorry if I forgot anything),
it is only a discussion between devs about the potential problems of
removing it. I totally understand Chesnay's concern about not
providing compatibility with the StreamingFileSink (SFS) and if there
are any users, then we should not remove it without trying to find a
solution for them.

But if there are no users then I would still propose to remove the
module, given that I am not aware of any efforts to provide
compatibility with the SFS any time soon.
The reasons for removing it also include the facts that we do not
actively maintain it and we do not add new features. As for the potential
missing features in the SFS compared to the BucketingSink that were
mentioned before, I am not aware of any fundamental limitations, and
even if there are some, I would assume that the solution is not to direct
users to a deprecated sink but rather to extend the
functionality of the actively maintained one.

Please keep in mind that the BucketingSink has been deprecated since Flink
1.9 and that a new File Sink is coming as part of FLIP-143
[1].
Again, if there are any active users who cannot migrate easily, then
we cannot remove it before trying to provide a smooth migration path.

Thanks,
Kostas

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API

On Fri, Oct 16, 2020 at 4:36 PM Chesnay Schepler <ch...@apache.org> wrote:
>
> @Seth: Earlier in this discussion it was said that the BucketingSink
> would not be usable in 1.12 .
>
> On 10/16/2020 4:25 PM, Seth Wiesman wrote:
> > +1 It has been deprecated for some time and the StreamingFileSink has
> > stabilized with a large number of formats and features.
> >
> > Plus, the bucketing sink only implements a small number of stable
> > interfaces[1]. I would expect users to continue to use the bucketing sink
> > from the 1.11 release with future versions for some time.
> >
> > Seth
> >
> > https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
> >
> > On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
> >
> >> @Arvid Heise I also do not remember exactly what were all the
> >> problems. The fact that we added some more bulk formats to the
> >> streaming file sink definitely reduced the non-supported features. In
> >> addition, the latest discussion I found on the topic was [1] and the
> >> conclusion of that discussion seems to be to remove it.
> >>
> >> Currently, I cannot find any obvious reason why keeping the
> >> BucketingSink, apart from the fact that we do not have a migration
> >> plan unfortunately. This is why I posted this to dev@ and user@.
> >>
> >> Cheers,
> >> Kostas
> >>
> >> [1]
> >> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
> >>
> >> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
> >>> I remember this conversation popping up a few times already and I'm in
> >>> general a big fan of removing BucketingSink.
> >>>
> >>> However, until now there were a few features lacking in StreamingFileSink
> >>> that are present in BucketingSink and that are being actively used (I can't
> >>> exactly remember them now, but I can look it up if everyone else is also
> >>> suffering from bad memory). Did we manage to add them in the meantime? If
> >>> not, then it feels rushed to remove it at this point.
> >>>
> >>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
> >>>> @Chesnay Schepler Off the top of my head, I cannot find an easy way
> >>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
> >>>> possible but it will require some effort because the logic would be
> >>>> "read the old state, commit it, and start fresh with the
> >>>> StreamingFileSink."
> >>>>
> >>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org> wrote:
> >>>>> On 13.10.20 14:01, David Anderson wrote:
> >>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
> >>>>>> in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
> >>>>>> use case.
> >>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
> >>>>> for the reminder, we should close FLIP-46 now with an explanatory
> >>>>> message to avoid confusion.
>
>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Chesnay Schepler <ch...@apache.org>.
@Seth: Earlier in this discussion it was said that the BucketingSink
would not be usable in 1.12.

On 10/16/2020 4:25 PM, Seth Wiesman wrote:
> +1 It has been deprecated for some time and the StreamingFileSink has
> stabilized with a large number of formats and features.
>
> Plus, the bucketing sink only implements a small number of stable
> interfaces[1]. I would expect users to continue to use the bucketing sink
> from the 1.11 release with future versions for some time.
>
> Seth
>
> https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172
>
> On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:
>
>> @Arvid Heise I also do not remember exactly what were all the
>> problems. The fact that we added some more bulk formats to the
>> streaming file sink definitely reduced the non-supported features. In
>> addition, the latest discussion I found on the topic was [1] and the
>> conclusion of that discussion seems to be to remove it.
>>
>> Currently, I cannot find any obvious reason why keeping the
>> BucketingSink, apart from the fact that we do not have a migration
>> plan unfortunately. This is why I posted this to dev@ and user@.
>>
>> Cheers,
>> Kostas
>>
>> [1]
>> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>>
>> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>>> I remember this conversation popping up a few times already and I'm in
>>> general a big fan of removing BucketingSink.
>>>
>>> However, until now there were a few features lacking in StreamingFileSink
>>> that are present in BucketingSink and that are being actively used (I can't
>>> exactly remember them now, but I can look it up if everyone else is also
>>> suffering from bad memory). Did we manage to add them in the meantime? If
>>> not, then it feels rushed to remove it at this point.
>>>
>>> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
>>>> @Chesnay Schepler Off the top of my head, I cannot find an easy way
>>>> to migrate from the BucketingSink to the StreamingFileSink. It may be
>>>> possible but it will require some effort because the logic would be
>>>> "read the old state, commit it, and start fresh with the
>>>> StreamingFileSink."
>>>>
>>>> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org> wrote:
>>>>> On 13.10.20 14:01, David Anderson wrote:
>>>>>> I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
>>>>>> in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
>>>>>> use case.
>>>>> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
>>>>> for the reminder, we should close FLIP-46 now with an explanatory
>>>>> message to avoid confusion.



Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Seth Wiesman <sj...@gmail.com>.
+1 It has been deprecated for some time and the StreamingFileSink has
stabilized with a large number of formats and features.

Plus, the bucketing sink only implements a small number of stable
interfaces[1]. I would expect users to continue to use the bucketing sink
from the 1.11 release with future versions for some time.

Seth

[1] https://github.com/apache/flink/blob/2ff3b771cbb091e1f43686dd8e176cea6d435501/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L170-L172

On Thu, Oct 15, 2020 at 2:57 PM Kostas Kloudas <kk...@gmail.com> wrote:

> @Arvid Heise I also do not remember exactly what were all the
> problems. The fact that we added some more bulk formats to the
> streaming file sink definitely reduced the non-supported features. In
> addition, the latest discussion I found on the topic was [1] and the
> conclusion of that discussion seems to be to remove it.
>
> Currently, I cannot find any obvious reason why keeping the
> BucketingSink, apart from the fact that we do not have a migration
> plan unfortunately. This is why I posted this to dev@ and user@.
>
> Cheers,
> Kostas
>
> [1]
> https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E
>
> On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
> >
> > I remember this conversation popping up a few times already and I'm in
> > general a big fan of removing BucketingSink.
> >
> > However, until now there were a few features lacking in StreamingFileSink
> > that are present in BucketingSink and that are being actively used (I can't
> > exactly remember them now, but I can look it up if everyone else is also
> > suffering from bad memory). Did we manage to add them in the meantime? If
> > not, then it feels rushed to remove it at this point.
> >
> > On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
> >
> > > @Chesnay Schepler Off the top of my head, I cannot find an easy way
> > > to migrate from the BucketingSink to the StreamingFileSink. It may be
> > > possible but it will require some effort because the logic would be
> > > "read the old state, commit it, and start fresh with the
> > > StreamingFileSink."
> > >
> > > On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org> wrote:
> > > >
> > > > On 13.10.20 14:01, David Anderson wrote:
> > > > > I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
> > > > > in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
> > > > > use case.
> > > >
> > > > Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
> > > > for the reminder, we should close FLIP-46 now with an explanatory
> > > > message to avoid confusion.
> > >
>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@gmail.com>.
@Arvid Heise I also do not remember exactly what all the
problems were. The fact that we added some more bulk formats to the
streaming file sink definitely reduced the number of unsupported features. In
addition, the latest discussion I found on the topic was [1] and the
conclusion of that discussion seems to be to remove it.

Currently, I cannot find any obvious reason for keeping the
BucketingSink, apart from the fact that, unfortunately, we do not have a
migration plan. This is why I posted this to dev@ and user@.

Cheers,
Kostas

[1] https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E

On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>
> I remember this conversation popping up a few times already and I'm in
> general a big fan of removing BucketingSink.
>
> However, until now there were a few features lacking in StreamingFileSink
> that are present in BucketingSink and that are being actively used (I can't
> exactly remember them now, but I can look it up if everyone else is also
> suffering from bad memory). Did we manage to add them in the meantime? If
> not, then it feels rushed to remove it at this point.
>
> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
>
> > @Chesnay Schepler  Off the top of my head, I cannot find an easy way
> > to migrate from the BucketingSink to the StreamingFileSink. It may be
> > possible but it will require some effort because the logic would be
> > "read the old state, commit it, and start fresh with the
> > StreamingFileSink."
> >
> > On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org> wrote:
> > >
> > > On 13.10.20 14:01, David Anderson wrote:
> > > > I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
> > > > in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
> > > > use case.
> > >
> > > Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
> > > for the reminder, we should close FLIP-46 now with an explanatory
> > > message to avoid confusion.
> >

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@gmail.com>.
@Arvid Heise I also do not remember exactly what were all the
problems. The fact that we added some more bulk formats to the
streaming file sink definitely reduced the non-supported features. In
addition, the latest discussion I found on the topic was [1] and the
conclusion of that discussion seems to be to remove it.

Currently, I cannot find any obvious reason why keeping the
BucketingSink, apart from the fact that we do not have a migration
plan unfortunately. This is why I posted this to dev@ and user@.

Cheers,
Kostas

[1] https://lists.apache.org/thread.html/r799be74658bc7e169238cc8c1e479e961a9e85ccea19089290940ff0%40%3Cdev.flink.apache.org%3E

On Wed, Oct 14, 2020 at 8:03 AM Arvid Heise <ar...@ververica.com> wrote:
>
> I remember this conversation popping up a few times already and I'm in
> general a big fan of removing BucketingSink.
>
> However, until now there were a few features lacking in StreamingFileSink
> that are present in BucketingSink and that are being actively used (I can't
> exactly remember them now, but I can look it up if everyone else is also
> suffering from bad memory). Did we manage to add them in the meantime? If
> not, then it feels rushed to remove it at this point.
>
> On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:
>
> > @Chesnay Schepler  Off the top of my head, I cannot find an easy way
> > to migrate from the BucketingSink to the StreamingFileSink. It may be
> > possible but it will require some effort because the logic would be
> > "read the old state, commit it, and start fresh with the
> > StreamingFileSink."
> >
> > On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org>
> > wrote:
> > >
> > > On 13.10.20 14:01, David Anderson wrote:
> > > > I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling --
> > and
> > > > in fact, the StreamingFileSink is mentioned in that FLIP as a
> > motivating
> > > > use case.
> > >
> > > Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
> > > for the reminder, we should close FLIP-46 now with an explanatory
> > > message to avoid confusion.
> >

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Arvid Heise <ar...@ververica.com>.
I remember this conversation popping up a few times already and I'm in
general a big fan of removing BucketingSink.

However, until now there were a few features lacking in StreamingFileSink
that are present in BucketingSink and that are being actively used (I can't
exactly remember them now, but I can look it up if everyone else is also
suffering from bad memory). Did we manage to add them in the meantime? If
not, then it feels rushed to remove it at this point.

On Tue, Oct 13, 2020 at 2:33 PM Kostas Kloudas <kk...@gmail.com> wrote:

> @Chesnay Schepler  Off the top of my head, I cannot find an easy way
> to migrate from the BucketingSink to the StreamingFileSink. It may be
> possible but it will require some effort because the logic would be
> "read the old state, commit it, and start fresh with the
> StreamingFileSink."
>
> On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org>
> wrote:
> >
> > On 13.10.20 14:01, David Anderson wrote:
> > > I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling --
> > > and in fact, the StreamingFileSink is mentioned in that FLIP as a
> > > motivating use case.
> >
> > Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
> > for the reminder, we should close FLIP-46 now with an explanatory
> > message to avoid confusion.
>


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@gmail.com>.
@Chesnay Schepler  Off the top of my head, I cannot find an easy way
to migrate from the BucketingSink to the StreamingFileSink. It may be
possible but it will require some effort because the logic would be
"read the old state, commit it, and start fresh with the
StreamingFileSink."

On Tue, Oct 13, 2020 at 2:09 PM Aljoscha Krettek <al...@apache.org> wrote:
>
> On 13.10.20 14:01, David Anderson wrote:
> > I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
> > in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
> > use case.
>
> Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks
> for the reminder, we should close FLIP-46 now with an explanatory
> message to avoid confusion.

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Aljoscha Krettek <al...@apache.org>.
On 13.10.20 14:01, David Anderson wrote:
> I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
> in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
> use case.

Ah yes, I see FLIP-147 as a more general replacement for FLIP-46. Thanks 
for the reminder, we should close FLIP-46 now with an explanatory 
message to avoid confusion.

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by David Anderson <da...@apache.org>.
> The BucketingSink suffers from the same problem. It's caused by the fact
> that we don't do a "final" checkpoint before shutting down a pipeline.
> We're trying to resolve that with FLIP-147 [1].

I thought this was waiting on FLIP-46 -- Graceful Shutdown Handling -- and
in fact, the StreamingFileSink is mentioned in that FLIP as a motivating
use case.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-46%3A+Graceful+Shutdown+Handling+by+UDFs

On Tue, Oct 13, 2020 at 1:01 PM Aljoscha Krettek <al...@apache.org>
wrote:

> On 13.10.20 11:18, David Anderson wrote:
> > I think the pertinent question is whether there are interesting cases
> > where the BucketingSink is still a better choice. One case I'm not sure
> > about is the situation described in docs for the StreamingFileSink under
> > Important Note 2 [1]:
> >
> >      ... upon normal termination of a job, the last in-progress files
> >      will not be transitioned to the “finished” state.
> >
> > I know this confuses and frustrates users, but I don't know if the
> > BucketingSink has any advantages in this regard.
>
> The BucketingSink suffers from the same problem. It's caused by the fact
> that we don't do a "final" checkpoint before shutting down a pipeline.
> We're trying to resolve that with FLIP-147 [1].
>
> [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
>
>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Aljoscha Krettek <al...@apache.org>.
On 13.10.20 11:18, David Anderson wrote:
> I think the pertinent question is whether there are interesting cases where
> the BucketingSink is still a better choice. One case I'm not sure about is
> the situation described in docs for the StreamingFileSink under Important
> Note 2 [1]:
> 
>      ... upon normal termination of a job, the last in-progress files will
> not be transitioned to the “finished” state.
> 
> I know this confuses and frustrates users, but I don't know if the
> BucketingSink has any advantages in this regard.

The BucketingSink suffers from the same problem. It's caused by the fact 
that we don't do a "final" checkpoint before shutting down a pipeline. 
We're trying to resolve that with FLIP-147 [1].

[1] https://cwiki.apache.org/confluence/x/mw-ZCQ
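
The effect described above can be illustrated with a small toy model. This
is only a sketch and not Flink code: the class, its methods, and the
simplification that the open part file is rolled on every checkpoint are
all assumptions made for illustration. It shows why data written after the
last completed checkpoint stays in an in-progress file unless a "final"
checkpoint is taken before shutdown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of part-file states (hypothetical, not the Flink implementation). */
final class PartFileLifecycleSketch {

    enum State { IN_PROGRESS, PENDING, FINISHED }

    // Part file name -> current state, in creation order.
    private final Map<String, State> files = new LinkedHashMap<>();
    private int counter = 0;
    private String current = null;

    /** Writes a record, opening a new in-progress part file if none is open. */
    public void write(String record) {
        if (current == null) {
            current = "part-" + counter++;
            files.put(current, State.IN_PROGRESS);
        }
    }

    /** Checkpoint is taken: the open file is closed and becomes pending. */
    public void snapshot() {
        if (current != null) {
            files.put(current, State.PENDING);
            current = null;
        }
    }

    /** Checkpoint completed: pending files are committed as finished. */
    public void notifyCheckpointComplete() {
        files.replaceAll((name, state) -> state == State.PENDING ? State.FINISHED : state);
    }

    /** Returns the names of all part files currently in the given state. */
    public List<String> filesIn(State wanted) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, State> entry : files.entrySet()) {
            if (entry.getValue() == wanted) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Simulates a short job that ends with or without a final checkpoint. */
    public static PartFileLifecycleSketch runJob(boolean finalCheckpoint) {
        PartFileLifecycleSketch sink = new PartFileLifecycleSketch();
        sink.write("a");
        sink.snapshot();
        sink.notifyCheckpointComplete();
        sink.write("b"); // written after the last regular checkpoint
        if (finalCheckpoint) {
            sink.snapshot();
            sink.notifyCheckpointComplete();
        }
        return sink;
    }

    public static void main(String[] args) {
        // Without a final checkpoint, the last part file is never finished.
        System.out.println(runJob(false).filesIn(State.IN_PROGRESS)); // [part-1]
        // With a final checkpoint, nothing is left in progress.
        System.out.println(runJob(true).filesIn(State.IN_PROGRESS));  // []
    }
}
```

In this model the two sinks behave the same way, because the leftover file is
a consequence of how the job shuts down rather than of the sink itself.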


Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by David Anderson <da...@apache.org>.
I think the pertinent question is whether there are interesting cases where
the BucketingSink is still a better choice. One case I'm not sure about is
the situation described in docs for the StreamingFileSink under Important
Note 2 [1]:

    ... upon normal termination of a job, the last in-progress files will
not be transitioned to the “finished” state.

I know this confuses and frustrates users, but I don't know if the
BucketingSink has any advantages in this regard.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html#important-considerations

On Tue, Oct 13, 2020 at 11:06 AM Konstantin Knauf <kn...@apache.org> wrote:

> Given that it has been deprecated for three releases now, I am +1 to
> dropping it.
>
> On Mon, Oct 12, 2020 at 9:38 PM Chesnay Schepler <ch...@apache.org>
> wrote:
>
>> Is there a way for us to change the module (in a reasonable way) that
>> would allow users to continue using it?
>> Is it an API problem, or one of semantics?
>>
>> On 10/12/2020 4:57 PM, Kostas Kloudas wrote:
>> > Hi Chesnay,
>> >
>> > Unfortunately not from what I can see in the code.
>> > This is the reason why I am opening a discussion. I think that if we
>> > supported backwards compatibility, this would have been an easier
>> > process.
>> >
>> > Kostas
>> >
>> > On Mon, Oct 12, 2020 at 4:32 PM Chesnay Schepler <ch...@apache.org>
>> wrote:
>> >> Are older versions of the module compatible with 1.12+?
>> >>
>> >> On 10/12/2020 4:30 PM, Kostas Kloudas wrote:
>> >>> Hi all,
>> >>>
>> >>> As the title suggests, this thread is to discuss the removal of the
>> >>> flink-connector-filesystem module which contains (only) the deprecated
>> >>> BucketingSink. The BucketingSink has been deprecated since Flink 1.9 [1]
>> >>> in favor of the relatively recently introduced StreamingFileSink.
>> >>>
>> >>> For the sake of a clean and more manageable codebase, I propose to
>> >>> remove this module for release-1.12, but of course we should see first
>> >>> if there are any use cases that depend on it.
>> >>>
>> >>> Let's have a fruitful discussion.
>> >>>
>> >>> Cheers,
>> >>> Kostas
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/FLINK-13396
>> >>>
>>
>>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Konstantin Knauf <kn...@apache.org>.
Given that it has been deprecated for three releases now, I am +1 to
dropping it.

On Mon, Oct 12, 2020 at 9:38 PM Chesnay Schepler <ch...@apache.org> wrote:

> Is there a way for us to change the module (in a reasonable way) that
> would allow users to continue using it?
> Is it an API problem, or one of semantics?
>
> On 10/12/2020 4:57 PM, Kostas Kloudas wrote:
> > Hi Chesnay,
> >
> > Unfortunately not from what I can see in the code.
> > This is the reason why I am opening a discussion. I think that if we
> > supported backwards compatibility, this would have been an easier
> > process.
> >
> > Kostas
> >
> > On Mon, Oct 12, 2020 at 4:32 PM Chesnay Schepler <ch...@apache.org>
> wrote:
> >> Are older versions of the module compatible with 1.12+?
> >>
> >> On 10/12/2020 4:30 PM, Kostas Kloudas wrote:
> >>> Hi all,
> >>>
> >>> As the title suggests, this thread is to discuss the removal of the
> >>> flink-connector-filesystem module which contains (only) the deprecated
> >>> BucketingSink. The BucketingSink has been deprecated since Flink 1.9 [1]
> >>> in favor of the relatively recently introduced StreamingFileSink.
> >>>
> >>> For the sake of a clean and more manageable codebase, I propose to
> >>> remove this module for release-1.12, but of course we should see first
> >>> if there are any use cases that depend on it.
> >>>
> >>> Let's have a fruitful discussion.
> >>>
> >>> Cheers,
> >>> Kostas
> >>>
> >>> [1] https://issues.apache.org/jira/browse/FLINK-13396
> >>>
>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Chesnay Schepler <ch...@apache.org>.
Is there a way for us to change the module (in a reasonable way) that 
would allow users to continue using it?
Is it an API problem, or one of semantics?

On 10/12/2020 4:57 PM, Kostas Kloudas wrote:
> Hi Chesnay,
>
> Unfortunately not from what I can see in the code.
> This is the reason why I am opening a discussion. I think that if we
> supported backwards compatibility, this would have been an easier
> process.
>
> Kostas
>
> On Mon, Oct 12, 2020 at 4:32 PM Chesnay Schepler <ch...@apache.org> wrote:
>> Are older versions of the module compatible with 1.12+?
>>
>> On 10/12/2020 4:30 PM, Kostas Kloudas wrote:
>>> Hi all,
>>>
>>> As the title suggests, this thread is to discuss the removal of the
>>> flink-connector-filesystem module which contains (only) the deprecated
>>> BucketingSink. The BucketingSink has been deprecated since Flink 1.9 [1]
>>> in favor of the relatively recently introduced StreamingFileSink.
>>>
>>> For the sake of a clean and more manageable codebase, I propose to
>>> remove this module for release-1.12, but of course we should see first
>>> if there are any use cases that depend on it.
>>>
>>> Let's have a fruitful discussion.
>>>
>>> Cheers,
>>> Kostas
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-13396
>>>


Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Kostas Kloudas <kk...@apache.org>.
Hi Chesnay,

Unfortunately not from what I can see in the code.
This is the reason why I am opening a discussion. I think that if we
supported backwards compatibility, this would have been an easier
process.

Kostas

On Mon, Oct 12, 2020 at 4:32 PM Chesnay Schepler <ch...@apache.org> wrote:
>
> Are older versions of the module compatible with 1.12+?
>
> On 10/12/2020 4:30 PM, Kostas Kloudas wrote:
> > Hi all,
> >
> > As the title suggests, this thread is to discuss the removal of the
> > flink-connector-filesystem module which contains (only) the deprecated
> > BucketingSink. The BucketingSink has been deprecated since Flink 1.9 [1]
> > in favor of the relatively recently introduced StreamingFileSink.
> >
> > For the sake of a clean and more manageable codebase, I propose to
> > remove this module for release-1.12, but of course we should see first
> > if there are any use cases that depend on it.
> >
> > Let's have a fruitful discussion.
> >
> > Cheers,
> > Kostas
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-13396
> >
>

Re: [DISCUSS] Remove flink-connector-filesystem module.

Posted by Chesnay Schepler <ch...@apache.org>.
Are older versions of the module compatible with 1.12+?

On 10/12/2020 4:30 PM, Kostas Kloudas wrote:
> Hi all,
>
> As the title suggests, this thread is to discuss the removal of the
> flink-connector-filesystem module which contains (only) the deprecated
> BucketingSink. The BucketingSink has been deprecated since Flink 1.9 [1]
> in favor of the relatively recently introduced StreamingFileSink.
>
> For the sake of a clean and more manageable codebase, I propose to
> remove this module for release-1.12, but of course we should see first
> if there are any use cases that depend on it.
>
> Let's have a fruitful discussion.
>
> Cheers,
> Kostas
>
> [1] https://issues.apache.org/jira/browse/FLINK-13396
>

