You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by "Tzu-Li (Gordon) Tai" <tz...@apache.org> on 2020/07/01 08:35:50 UTC

Re: [DISCUSS] FLIP-128: Enhanced Fan Out for AWS Kinesis Consumers

Thanks for updating the FLIP Danny. Changes look good to me.
Please feel free to proceed with a vote soon.

On Tue, Jun 30, 2020 at 11:19 PM Cranmer, Danny <cr...@amazon.com.invalid>
wrote:

> Hey Gordon,
>
> I have updated the FLIP [1] to include support for configurable
> registration strategies:
> - Added 2 additional configuration keys
> - Added Registration/De-registration Configuration section
> - Updated Stream Consumer Registration/Tear Down section
> - Remove rejected alternative (since we are now optionally supporting it)
>
> A slight tweak. I have added a third option, "none". This will disable
> registration/de-registration and allow the user to directly pass in the
> ConsumerARN. This option adds overhead/complexity to the user configuration
> but it will remove all start-up and teardown AWS SDK calls.
>
> Let me know if you are happy to proceed to a vote or have any further
> feedback.
>
> Thanks,
> Danny Cranmer
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-128%3A+Enhanced+Fan+Out+for+AWS+Kinesis+Consumers
>
> On 29/06/2020, 15:28, "Cranmer, Danny" <cr...@amazon.com.INVALID>
> wrote:
>
>     Hey Gordon,
>
>     Thank-you for you review and feedback.
>
>     I agree with your suggestion for the contribution plan. I have updated
> the FLIP to include the additional step with the FanOutKinesisProxy/AWS SDK
> 2.x dependency. I have also added another precursor step to generally
> improve test coverage. I have been playing around in the connector code and
> discovered that restarting consumption from an aggregated record is not
> covered by unit/integration tests. I have written some additional tests
> with simulated Kinesis behaviour, pushing these tests in advance would
> increase confidence in the next contribution step.
>
>     Regarding the consumer de-/registration. I had not considered making
> it optional via configuration as you propose. I agree with your observation
> on the additional configuration complexity and it would also expose
> internal implementation details to the user. That being said, if a user
> application has a very high parallelism they could quite easily exceed the
> ListStreamConsumers quota (5 TPS [1]), and increase the application
> start-up time substantially with back-off delays. The client could
> conditionally (based on config) register the stream consumer and add the
> ConsumerARN(s) to the consumer properties, eliminating the required
> parallel calls to ListStreamConsumers by the Flink tasks. I will update the
> FLIP to include this change, and reply to this thread once it is done.
>
>     [1]
> https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListStreamConsumers.html
>
>     Thanks,
>     Danny,
>
>     On 29/06/2020, 05:19, "Tzu-Li (Gordon) Tai" <tz...@apache.org>
> wrote:
>
>         CAUTION: This email originated from outside of the organization.
> Do not click links or open attachments unless you can confirm the sender
> and know the content is safe.
>
>
>
>         Also, if it wasn't clear, I'll be happy to provide committer
> support on
>         reviewing and merging this FLIP, if it gets approved :)
>
>         On Mon, Jun 29, 2020 at 12:12 PM Tzu-Li (Gordon) Tai <
> tzulitai@apache.org>
>         wrote:
>
>         > Hi Cranmer,
>         >
>         > Thank you for proposing the feature and starting the discussion
> thread.
>         > This is really great work!
>         >
>         > Overall, +1 to adding EFO support to the Kinesis connector.
>         > I can see that having a dedicated throughput quota for each
> consuming
>         > Flink application is definitely a requirement for AWS users.
>         > In the past, we worked around this by using adaptive polling to
> avoid
>         > exceeding the quotas with multiple consumers, this would
> probably go away
>         > with this implemented.
>         >
>         > There are a few things I like about the current proposal:
>         > - EFO is an opt-in feature for now. Once we decide to
> reimplement the
>         > Kinesis connector on top of the new source interface (FLIP-27),
> we can
>         > probably consider enabling EFO by default to match the defaults
> of the
>         > higher-level KCL library.
>         > - From the design, it seems like the changes can indeed be fairly
>         > consolidated in the Kinesis connector. The change should be
> fairly safe as
>         > well, since we're essentially abstracting record publishing
> concerns only,
>         > which is transparent to the exactly-once semantics / watermark
> components.
>         >
>         > Concerning competing stream consumer de-/registration:
>         > this would most likely go away with the new source interface,
> where this
>         > can be done on the source split enumerator.
>         > I'm personally okay with the proposed strategy of competing with
> backoff.
>         > As food for thought, have you considered an opt-in / opt-out,
> where the
>         > user knows that the client has access to KDS, and can choose to
> register
>         > once only on the client side instead of competing in TMs?
>         > I'm not sure if this is worth the extra configuration complexity
> though.
>         >
>         > For the concrete next steps for implementation after this FLIP
> has passed:
>         > From the FLIP I can conclude that implementation wise, this
> would come in
>         > a few steps -
>         > 1. Abstract away record publishing behind a new interface.
> Initially we
>         > only have one implementation (the current polling mechanism).
>         > 2. Add in AWS SDK 2.x dependency, with a new FanOutKinesisProxy
>         > implementation.
>         > 3. Add FanOutRecordPublisher to finalize EFO support.
>         >
>         > I think step 2. wasn't explicitly mentioned in the FLIP, but I
> strongly
>         > suggest to consolidate that step as a single PR, as we would
> need to do a
>         > license check for dependency changes and it would be nice to
> move forward
>         > with that with at least interference of code changes as possible.
>         >
>         > Pushing this FLIP forward for approval:
>         > Since this FLIP is a fairly consolidated change, we should be
> safe to
>         > proceed with a vote soon.
>         > That usually happens in a separate vote thread, linking to this
> discussion
>         > thread.
>         >
>         > cc'ing Thomas Weise as well, as he has also worked substantially
> on the
>         > Kinesis connector in the past.
>         >
>         > Cheers,
>         > Gordon
>         >
>         > On Tue, Jun 23, 2020 at 6:02 PM hey_wxl <
> xiaolong.wang@smartnews.com>
>         > wrote:
>         >
>         >> Hi, Cranmer.
>         >>
>         >>    I'm Roland Wang. I've read the FLIP you wrote, and agree
> with your
>         >> design.
>         >>    Recently, I'm working on this feature too, and have made
> some progress:
>         >>
>         >>    1. I add two methods: getOrRegisterConsumer &
> subscribeToShard on
>         >> KinesisProxyInterface.
>         >>    2. I re-implement the KinesisProxy using AWS SDK V2.x.
>         >>    3. I use the new KinesisProxy to implement ShardConsumer.
>         >>
>         >>    Though my design is not fully considered, I hope we can
> discuss a
>         >> little
>         >> bit about this feature. I wish to make some contribution to the
> community.
>         >>
>         >>
>         >>
>         >>
>         >> --
>         >> Sent from:
>         >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>         >>
>         >
>
>
>