You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by "Tzu-Li (Gordon) Tai" <tz...@apache.org> on 2020/07/01 08:35:50 UTC
Re: [DISCUSS] FLIP-128: Enhanced Fan Out for AWS Kinesis Consumers
Thanks for updating the FLIP Danny. Changes look good to me.
Please feel free to proceed with a vote soon.
On Tue, Jun 30, 2020 at 11:19 PM Cranmer, Danny <cr...@amazon.com.invalid>
wrote:
> Hey Gordon,
>
> I have updated the FLIP [1] to include support for configurable
> registration strategies:
> - Added 2 additional configuration keys
> - Added Registration/De-registration Configuration section
> - Updated Stream Consumer Registration/Tear Down section
> - Remove rejected alternative (since we are now optionally supporting it)
>
> A slight tweak. I have added a third option, "none". This will disable
> registration/de-registration and allow the user to directly pass in the
> ConsumerARN. This option adds overhead/complexity to the user configuration
> but it will remove all start-up and teardown AWS SDK calls.
>
> Let me know if you are happy to proceed to a vote or have any further
> feedback.
>
> Thanks,
> Danny Cranmer
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-128%3A+Enhanced+Fan+Out+for+AWS+Kinesis+Consumers
>
> On 29/06/2020, 15:28, "Cranmer, Danny" <cr...@amazon.com.INVALID>
> wrote:
>
> Hey Gordon,
>
> Thank-you for you review and feedback.
>
> I agree with your suggestion for the contribution plan. I have updated
> the FLIP to include the additional step with the FanOutKinesisProxy/AWS SDK
> 2.x dependency. I have also added another precursor step to generally
> improve test coverage. I have been playing around in the connector code and
> discovered that restarting consumption from an aggregated record is not
> covered by unit/integration tests. I have written some additional tests
> with simulated Kinesis behaviour, pushing these tests in advance would
> increase confidence in the next contribution step.
>
> Regarding the consumer de-/registration. I had not considered making
> it optional via configuration as you propose. I agree with your observation
> on the additional configuration complexity and it would also expose
> internal implementation details to the user. That being said, if a user
> application has a very high parallelism they could quite easily exceed the
> ListStreamConsumers quota (5 TPS [1]), and increase the application
> start-up time substantially with back-off delays. The client could
> conditionally (based on config) register the stream consumer and add the
> ConsumerARN(s) to the consumer properties, eliminating the required
> parallel calls to ListStreamConsumers by the Flink tasks. I will update the
> FLIP to include this change, and reply to this thread once it is done.
>
> [1]
> https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListStreamConsumers.html
>
> Thanks,
> Danny,
>
> On 29/06/2020, 05:19, "Tzu-Li (Gordon) Tai" <tz...@apache.org>
> wrote:
>
> CAUTION: This email originated from outside of the organization.
> Do not click links or open attachments unless you can confirm the sender
> and know the content is safe.
>
>
>
> Also, if it wasn't clear, I'll be happy to provide committer
> support on
> reviewing and merging this FLIP, if it gets approved :)
>
> On Mon, Jun 29, 2020 at 12:12 PM Tzu-Li (Gordon) Tai <
> tzulitai@apache.org>
> wrote:
>
> > Hi Cranmer,
> >
> > Thank you for proposing the feature and starting the discussion
> thread.
> > This is really great work!
> >
> > Overall, +1 to adding EFO support to the Kinesis connector.
> > I can see that having a dedicated throughput quota for each
> consuming
> > Flink application is definitely a requirement for AWS users.
> > In the past, we worked around this by using adaptive polling to
> avoid
> > exceeding the quotas with multiple consumers, this would
> probably go away
> > with this implemented.
> >
> > There are a few things I like about the current proposal:
> > - EFO is an opt-in feature for now. Once we decide to
> reimplement the
> > Kinesis connector on top of the new source interface (FLIP-27),
> we can
> > probably consider enabling EFO by default to match the defaults
> of the
> > higher-level KCL library.
> > - From the design, it seems like the changes can indeed be fairly
> > consolidated in the Kinesis connector. The change should be
> fairly safe as
> > well, since we're essentially abstracting record publishing
> concerns only,
> > which is transparent to the exactly-once semantics / watermark
> components.
> >
> > Concerning competing stream consumer de-/registration:
> > this would most likely go away with the new source interface,
> where this
> > can be done on the source split enumerator.
> > I'm personally okay with the proposed strategy of competing with
> backoff.
> > As food for thought, have you considered an opt-in / opt-out,
> where the
> > user knows that the client has access to KDS, and can choose to
> register
> > once only on the client side instead of competing in TMs?
> > I'm not sure if this is worth the extra configuration complexity
> though.
> >
> > For the concrete next steps for implementation after this FLIP
> has passed:
> > From the FLIP I can conclude that implementation wise, this
> would come in
> > a few steps -
> > 1. Abstract away record publishing behind a new interface.
> Initially we
> > only have one implementation (the current polling mechanism).
> > 2. Add in AWS SDK 2.x dependency, with a new FanOutKinesisProxy
> > implementation.
> > 3. Add FanOutRecordPublisher to finalize EFO support.
> >
> > I think step 2. wasn't explicitly mentioned in the FLIP, but I
> strongly
> > suggest to consolidate that step as a single PR, as we would
> need to do a
> > license check for dependency changes and it would be nice to
> move forward
> > with that with at least interference of code changes as possible.
> >
> > Pushing this FLIP forward for approval:
> > Since this FLIP is a fairly consolidated change, we should be
> safe to
> > proceed with a vote soon.
> > That usually happens in a separate vote thread, linking to this
> discussion
> > thread.
> >
> > cc'ing Thomas Weise as well, as he has also worked substantially
> on the
> > Kinesis connector in the past.
> >
> > Cheers,
> > Gordon
> >
> > On Tue, Jun 23, 2020 at 6:02 PM hey_wxl <
> xiaolong.wang@smartnews.com>
> > wrote:
> >
> >> Hi, Cranmer.
> >>
> >> I'm Roland Wang. I've read the FLIP you wrote, and agree
> with your
> >> design.
> >> Recently, I'm working on this feature too, and have made
> some progress:
> >>
> >> 1. I add two methods: getOrRegisterConsumer &
> subscribeToShard on
> >> KinesisProxyInterface.
> >> 2. I re-implement the KinesisProxy using AWS SDK V2.x.
> >> 3. I use the new KinesisProxy to implement ShardConsumer.
> >>
> >> Though my design is not fully considered, I hope we can
> discuss a
> >> little
> >> bit about this feature. I wish to make some contribution to the
> community.
> >>
> >>
> >>
> >>
> >> --
> >> Sent from:
> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
> >>
> >
>
>
>