Posted to dev@flink.apache.org by Ying Xu <yx...@lyft.com> on 2018/07/04 05:59:14 UTC

Re: Consuming data from dynamoDB streams to flink

Hi Gordon:

We are starting to implement some of the primitives along this path. Please
let us know if you have any suggestions.

Thanks!

On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <yx...@lyft.com> wrote:

> Hi Gordon:
>
> Really appreciate the reply.
>
> Yes our plan is to build the connector on top of the FlinkKinesisConsumer.
> At the high level, FlinkKinesisConsumer mainly interacts with Kinesis
> through the AmazonKinesis client, more specifically through the following
> three function calls:
>
>    - describeStream
>    - getRecords
>    - getShardIterator
>
> Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient)
> has already implemented similar calls, it is possible to use that client to
> interact with the dynamoDB streams, and adapt the results from the dynamoDB
> streams model to the kinesis model.
>
> It appears this is exactly what the AmazonDynamoDBStreamsAdapterClient
> <https://github.com/awslabs/dynamodb-streams-kinesis-adapter/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/streamsadapter/AmazonDynamoDBStreamsAdapterClient.java>
> does. The adaptor client implements the AmazonKinesis client interface,
> and is officially supported by AWS.  Hence it is possible to replace the
> internal Kinesis client inside FlinkKinesisConsumer with this adapter
> client when interacting with dynamoDB streams.  The new object can be a
> subclass of FlinkKinesisConsumer with a new name, e.g., FlinkDynamoStreamConsumer.
>
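As a rough illustration of the idea, a minimal sketch (the class name and the
createKinesisClient() hook are assumptions made for the sketch, not an existing
FlinkKinesisConsumer extension point):

import java.util.Properties;

import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient;
import com.amazonaws.services.kinesis.AmazonKinesis;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;

public class FlinkDynamoDBStreamsConsumer<T> extends FlinkKinesisConsumer<T> {

    public FlinkDynamoDBStreamsConsumer(String streamArn,
                                        DeserializationSchema<T> schema,
                                        Properties configProps) {
        super(streamArn, schema, configProps);
    }

    // Hypothetical hook standing in for wherever FlinkKinesisConsumer builds its
    // internal AmazonKinesis client.  Because the AWS adapter implements the
    // AmazonKinesis interface, the describeStream / getShardIterator / getRecords
    // calls issued by the consumer would be answered by DynamoDB streams instead.
    protected AmazonKinesis createKinesisClient() {
        return new AmazonDynamoDBStreamsAdapterClient();
    }
}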
> In the best case this could simply work, but we would like to hear if there
> are other situations to take care of.  In particular, I am wondering what the
> *"resharding behavior"* mentioned in FLINK-4582 refers to.
>
> Thanks a lot!
>
> -
> Ying
>
> On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <tzulitai@apache.org
> > wrote:
>
>> Hi!
>>
>> I think it would be definitely nice to have this feature.
>>
>> No actual previous work has been made on this issue, but AFAIK, we should
>> be able to build this on top of the FlinkKinesisConsumer.
>> Whether this should live within the Kinesis connector module or an
>> independent module of its own is still TBD.
>> If you want, I would be happy to look at any concrete design proposals you
>> have for this before you start the actual development efforts.
>>
>> Cheers,
>> Gordon
>>
>> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <yx...@lyft.com> wrote:
>>
>> > Thanks Fabian for the suggestion.
>> >
>> > *Ying Xu*
>> > Software Engineer
>> > 510.368.1252 <+15103681252>
>> > [image: Lyft] <http://www.lyft.com/>
>> >
>> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <fh...@gmail.com>
>> wrote:
>> >
>> > > Hi Ying,
>> > >
>> > > I'm not aware of any effort for this issue.
>> > > You could check with the assigned contributor in Jira if there is some
>> > > previous work.
>> > >
>> > > Best, Fabian
>> > >
>> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <yx...@lyft.com>:
>> > >
>> > > > Hello Flink dev:
>> > > >
> > > We have a number of use cases which involve pulling data from DynamoDB
> > > streams into Flink.
>> > > >
> > > Given that this issue is tracked by FLINK-4582
> > > <https://issues.apache.org/jira/browse/FLINK-4582>, we would like to check
> > > if any prior work has been completed by the community.  We are also very
> > > interested in contributing to this effort.  Currently, we have a high-level
> > > proposal which is based on extending the existing FlinkKinesisConsumer and
> > > making it work with DynamoDB streams (via integrating with the
> > > AmazonDynamoDBStreams API).
>> > > >
>> > > > Any suggestion is welcome. Thank you very much.
>> > > >
>> > > >
>> > > > -
>> > > > Ying
>> > > >
>> > >
>> >
>>
>
>

Re: Consuming data from dynamoDB streams to flink

Posted by Vinay Patil <vi...@gmail.com>.
Hello,

For anyone looking to set up alerts for a Flink application, here is a good
blog post from the folks behind Flink:
https://www.ververica.com/blog/monitoring-apache-flink-applications-101
So, for DynamoDB streams we can set the alert on millisBehindLatest.

Regards,
Vinay Patil


On Wed, Aug 7, 2019 at 2:24 PM Vinay Patil <vi...@gmail.com> wrote:

> Hi Andrey,
>
> Thank you for your reply, I understand that the checkpoints are gone when
> the job is cancelled or killed, may be configuring external checkpoints
> will help here so that we can resume from there.
>
> My points was if the job is terminated, and the stream position is set to
> TRIM_HORIZON , the consumer will start processing from the start, I was
> curious to know if there is a configuration like kafka_group_id that we can
> set for dynamoDB streams.
>
> Also, can you please let me know on which metrics should I generate an
> alert in case of DynamoDb Streams  (I am sending the metrics to
> prometheus), I see these metrics :
> https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/metrics/ShardMetricsReporter.java
>
>
> For Example: in case of Kafka we generate  alert when the consumer is
> lagging behind.
>
>
> Regards,
> Vinay Patil
>
>
> On Fri, Jul 19, 2019 at 10:40 PM Andrey Zagrebin <an...@ververica.com>
> wrote:
>
>> Hi Vinay,
>>
>> 1. I would assume it works similar to kinesis connector (correct me if
>> wrong, people who actually developed it)
>> 2. If you have activated just checkpointing, the checkpoints are gone if
>> you externally kill the job. You might be interested in savepoints [1]
>> 3. See paragraph in [2] about kinesis consumer parallelism
>>
>> Best,
>> Andrey
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/savepoints.html
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/connectors/kinesis.html#kinesis-consumer
>>
>> On Fri, Jul 19, 2019 at 1:45 PM Vinay Patil <vi...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am using this consumer for processing records from DynamoDb Streams ,
>>> few questions on this :
>>>
>>> 1. How does checkpointing works with Dstreams, since this class is
>>> extending FlinkKinesisConsumer, I am assuming it will start from the last
>>> successful checkpoint in case of failure, right ?
>>> 2. Currently when I kill the pipeline and start again it reads all the
>>> data from the start of the stream, is there any configuration to avoid this
>>> (apart from ConsumerConfigConstants.InitialPosition.LATEST), similar to
>>> group-id in Kafka.
>>> 3. As these DynamoDB Streams are separated by shards what is the
>>> recommended parallelism to be set for the source , should it be one to one
>>> mapping , for example if there are 3 shards , then parallelism should be 3 ?
>>>
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>>
>>> On Wed, Aug 1, 2018 at 3:42 PM Ying Xu [via Apache Flink Mailing List
>>> archive.] <ml...@n3.nabble.com> wrote:
>>>
>>>> Thank you so much Fabian!
>>>>
>>>> Will update status in the JIRA.
>>>>
>>>> -
>>>> Ying
>>>>
>>>> On Tue, Jul 31, 2018 at 1:37 AM, Fabian Hueske <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=0>> wrote:
>>>>
>>>> > Done!
>>>> >
>>>> > Thank you :-)
>>>> >
>>>> > 2018-07-31 6:41 GMT+02:00 Ying Xu <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=1>>:
>>>> >
>>>> > > Thanks Fabian and Thomas.
>>>> > >
>>>> > > Please assign FLINK-4582 to the following username:
>>>> > >     *yxu-apache
>>>> > > <
>>>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=yxu-apache
>>>> > >*
>>>> > >
>>>> > > If needed I can get a ICLA or CCLA whichever is proper.
>>>> > >
>>>> > > *Ying Xu*
>>>> > > Software Engineer
>>>> > > 510.368.1252 <+15103681252>
>>>> > > [image: Lyft] <http://www.lyft.com/>
>>>> > >
>>>> > > On Mon, Jul 30, 2018 at 8:31 PM, Thomas Weise <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=2>> wrote:
>>>> > >
>>>> > > > The user is yxu-lyft, Ying had commented on that JIRA as well.
>>>> > > >
>>>> > > > https://issues.apache.org/jira/browse/FLINK-4582
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Jul 30, 2018 at 1:25 AM Fabian Hueske <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=3>>
>>>> > wrote:
>>>> > > >
>>>> > > > > Hi Ying,
>>>> > > > >
>>>> > > > > Thanks for considering to contribute the connector!
>>>> > > > >
>>>> > > > > In general, you don't need special permissions to contribute to
>>>> > Flink.
>>>> > > > > Anybody can open Jiras and PRs.
>>>> > > > > You only need to be assigned to the Contributor role in Jira to
>>>> be
>>>> > able
>>>> > > > to
>>>> > > > > assign an issue to you.
>>>> > > > > I can give you these permissions if you tell me your Jira user.
>>>> > > > >
>>>> > > > > It would also be good if you could submit a CLA [1] if you plan
>>>> to
>>>> > > > > contribute a larger feature.
>>>> > > > >
>>>> > > > > Thanks, Fabian
>>>> > > > >
>>>> > > > > [1] https://www.apache.org/licenses/#clas
>>>> > > > >
>>>> > > > >
>>>> > > > > 2018-07-30 10:07 GMT+02:00 Ying Xu <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=4>>:
>>>> > > > >
>>>> > > > > > Hello Flink dev:
>>>> > > > > >
>>>> > > > > > We have implemented the prototype design and the initial PoC
>>>> worked
>>>> > > > > pretty
>>>> > > > > > well.  Currently, we plan to move ahead with this design in
>>>> our
>>>> > > > internal
>>>> > > > > > production system.
>>>> > > > > >
>>>> > > > > > We are thinking of contributing this connector back to the
>>>> flink
>>>> > > > > community
>>>> > > > > > sometime soon.  May I request to be granted with a
>>>> contributor
>>>> > role?
>>>> > > > > >
>>>> > > > > > Many thanks in advance.
>>>> > > > > >
>>>> > > > > > *Ying Xu*
>>>> > > > > > Software Engineer
>>>> > > > > > 510.368.1252 <+15103681252>
>>>> > > > > > [image: Lyft] <http://www.lyft.com/>
>>>> > > > > >
>>>> > > > > > On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=5>> wrote:
>>>> > > > > >
>>>> > > > > > > Hi Gordon:
>>>> > > > > > >
>>>> > > > > > > Cool. Thanks for the thumb-up!
>>>> > > > > > >
>>>> > > > > > > We will include some test cases around the behavior of
>>>> > re-sharding.
>>>> > > > If
>>>> > > > > > > needed we can double check the behavior with AWS, and see
>>>> if
>>>> > > > additional
>>>> > > > > > > changes are needed.  Will keep you posted.
>>>> > > > > > >
>>>> > > > > > > -
>>>> > > > > > > Ying
>>>> > > > > > >
>>>> > > > > > > On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai <
>>>> > > > > [hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=6>
>>>> > > > > > >
>>>> > > > > > > wrote:
>>>> > > > > > >
>>>> > > > > > >> Hi Ying,
>>>> > > > > > >>
>>>> > > > > > >> Sorry for the late reply here.
>>>> > > > > > >>
>>>> > > > > > >> From the looks of the AmazonDynamoDBStreamsClient, yes it
>>>> seems
>>>> > > like
>>>> > > > > > this
>>>> > > > > > >> should simply work.
>>>> > > > > > >>
>>>> > > > > > >> Regarding the resharding behaviour I mentioned in the
>>>> JIRA:
>>>> > > > > > >> I'm not sure if this is really a difference in behaviour.
>>>> > > > Internally,
>>>> > > > > if
>>>> > > > > > >> DynamoDB streams is actually just working on Kinesis
>>>> Streams,
>>>> > then
>>>> > > > the
>>>> > > > > > >> resharding primitives should be similar.
>>>> > > > > > >> The shard discovery logic of the Flink Kinesis Consumer
>>>> assumes
>>>> > > that
>>>> > > > > > >> splitting / merging shards will result in new shards of
>>>> > > increasing,
>>>> > > > > > >> consecutive shard ids. As long as this is also the
>>>> behaviour for
>>>> > > > > > DynamoDB
>>>> > > > > > >> resharding, then we should be fine.
>>>> > > > > > >>
>>>> > > > > > >> Feel free to start with the implementation for this, I
>>>> think
>>>> > > > > design-wise
>>>> > > > > > >> we're good to go. And thanks for working on this!
>>>> > > > > > >>
>>>> > > > > > >> Cheers,
>>>> > > > > > >> Gordon
>>>> > > > > > >>
>>>> > > > > > >> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=7>> wrote:
>>>> > > > > > >>
>>>> > > > > > >> > HI Gordon:
>>>> > > > > > >> >
>>>> > > > > > >> > We are starting to implement some of the primitives
>>>> along this
>>>> > > > path.
>>>> > > > > > >> Please
>>>> > > > > > >> > let us know if you have any suggestions.
>>>> > > > > > >> >
>>>> > > > > > >> > Thanks!
>>>> > > > > > >> >
>>>> > > > > > >> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <[hidden
>>>> email] <http:///user/SendEmail.jtp?type=node&node=23597&i=8>>
>>>> > wrote:
>>>> > > > > > >> >
>>>> > > > > > >> > > Hi Gordon:
>>>> > > > > > >> > >
>>>> > > > > > >> > > Really appreciate the reply.
>>>> > > > > > >> > >
>>>> > > > > > >> > > Yes our plan is to build the connector on top of the
>>>> > > > > > >> > FlinkKinesisConsumer.
>>>> > > > > > >> > > At the high level, FlinkKinesisConsumer mainly
>>>> interacts
>>>> > with
>>>> > > > > > Kinesis
>>>> > > > > > >> > > through the AmazonKinesis client, more specifically
>>>> through
>>>> > > the
>>>> > > > > > >> following
>>>> > > > > > >> > > three function calls:
>>>> > > > > > >> > >
>>>> > > > > > >> > >    - describeStream
>>>> > > > > > >> > >    - getRecords
>>>> > > > > > >> > >    - getShardIterator
>>>> > > > > > >> > >
>>>> > > > > > >> > > Given that the low-level DynamoDB client
>>>> > > > > > (AmazonDynamoDBStreamsClient)
>>>> > > > > > >> > > has already implemented similar calls, it is possible
>>>> to use
>>>> > > > that
>>>> > > > > > >> client
>>>> > > > > > >> > to
>>>> > > > > > >> > > interact with the dynamoDB streams, and adapt the
>>>> results
>>>> > from
>>>> > > > the
>>>> > > > > > >> > dynamoDB
>>>> > > > > > >> > > streams model to the kinesis model.
>>>> > > > > > >> > >
>>>> > > > > > >> > > It appears this is exactly what the
>>>> > > > AmazonDynamoDBStreamsAdapterCl
>>>> > > > > > >> ient
>>>> > > > > > >> > > <
>>>> > > > > > >> >
>>>> https://github.com/awslabs/dynamodb-streams-kinesis-adapter/
>>>> > > > > > >>
>>>> blob/master/src/main/java/com/amazonaws/services/dynamodbv2/
>>>> > > > > > >> streamsadapter/AmazonDynamoDBStreamsAdapterClient.java
>>>> > > > > > >> > >
>>>> > > > > > >> > > does. The adaptor client implements the AmazonKinesis
>>>> client
>>>> > > > > > >> interface,
>>>> > > > > > >> > > and is officially supported by AWS.  Hence it is
>>>> possible to
>>>> > > > > replace
>>>> > > > > > >> the
>>>> > > > > > >> > > internal Kinesis client inside FlinkKinesisConsumer
>>>> with
>>>> > this
>>>> > > > > > adapter
>>>> > > > > > >> > > client when interacting with dynamoDB streams.  The
>>>> new
>>>> > object
>>>> > > > can
>>>> > > > > > be
>>>> > > > > > >> a
>>>> > > > > > >> > > subclass of FlinkKinesisConsumer with a new name e.g,
>>>> > > > > > >> > FlinkDynamoStreamCon
>>>> > > > > > >> > > sumer.
>>>> > > > > > >> > >
>>>> > > > > > >> > > At best this could simply work. But we would like to
>>>> hear if
>>>> > > > there
>>>> > > > > > are
>>>> > > > > > >> > > other situations to take care of.  In particular, I am
>>>> > > wondering
>>>> > > > > > >> what's
>>>> > > > > > >> > the *"resharding
>>>> > > > > > >> > > behavior"* mentioned in FLINK-4582.
>>>> > > > > > >> > >
>>>> > > > > > >> > > Thanks a lot!
>>>> > > > > > >> > >
>>>> > > > > > >> > > -
>>>> > > > > > >> > > Ying
>>>> > > > > > >> > >
>>>> > > > > > >> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai
>>>> <
>>>> > > > > > >> > [hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=9>
>>>> > > > > > >> > > > wrote:
>>>> > > > > > >> > >
>>>> > > > > > >> > >> Hi!
>>>> > > > > > >> > >>
>>>> > > > > > >> > >> I think it would be definitely nice to have this
>>>> feature.
>>>> > > > > > >> > >>
>>>> > > > > > >> > >> No actual previous work has been made on this issue,
>>>> but
>>>> > > AFAIK,
>>>> > > > > we
>>>> > > > > > >> > should
>>>> > > > > > >> > >> be able to build this on top of the
>>>> FlinkKinesisConsumer.
>>>> > > > > > >> > >> Whether this should live within the Kinesis connector
>>>> > module
>>>> > > or
>>>> > > > > an
>>>> > > > > > >> > >> independent module of its own is still TBD.
>>>> > > > > > >> > >> If you want, I would be happy to look at any concrete
>>>> > design
>>>> > > > > > >> proposals
>>>> > > > > > >> > you
>>>> > > > > > >> > >> have for this before you start the actual development
>>>> > > efforts.
>>>> > > > > > >> > >>
>>>> > > > > > >> > >> Cheers,
>>>> > > > > > >> > >> Gordon
>>>> > > > > > >> > >>
>>>> > > > > > >> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <[hidden
>>>> email] <http:///user/SendEmail.jtp?type=node&node=23597&i=10>>
>>>> > > wrote:
>>>> > > > > > >> > >>
>>>> > > > > > >> > >> > Thanks Fabian for the suggestion.
>>>> > > > > > >> > >> >
>>>> > > > > > >> > >> > *Ying Xu*
>>>> > > > > > >> > >> > Software Engineer
>>>> > > > > > >> > >> > 510.368.1252 <+15103681252>
>>>> > > > > > >> > >> > [image: Lyft] <http://www.lyft.com/>
>>>> > > > > > >> > >> >
>>>> > > > > > >> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <
>>>> > > > > > [hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=11>>
>>>> > > > > > >> > >> wrote:
>>>> > > > > > >> > >> >
>>>> > > > > > >> > >> > > Hi Ying,
>>>> > > > > > >> > >> > >
>>>> > > > > > >> > >> > > I'm not aware of any effort for this issue.
>>>> > > > > > >> > >> > > You could check with the assigned contributor in
>>>> Jira
>>>> > if
>>>> > > > > there
>>>> > > > > > is
>>>> > > > > > >> > some
>>>> > > > > > >> > >> > > previous work.
>>>> > > > > > >> > >> > >
>>>> > > > > > >> > >> > > Best, Fabian
>>>> > > > > > >> > >> > >
>>>> > > > > > >> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <[hidden email]
>>>> <http:///user/SendEmail.jtp?type=node&node=23597&i=12>>:
>>>> > > > > > >> > >> > >
>>>> > > > > > >> > >> > > > Hello Flink dev:
>>>> > > > > > >> > >> > > >
>>>> > > > > > >> > >> > > > We have a number of use cases which involves
>>>> pulling
>>>> > > data
>>>> > > > > > from
>>>> > > > > > >> > >> DynamoDB
>>>> > > > > > >> > >> > > > streams into Flink.
>>>> > > > > > >> > >> > > >
>>>> > > > > > >> > >> > > > Given that this issue is tracked by Flink-4582
>>>> > > > > > >> > >> > > > <
>>>> https://issues.apache.org/jira/browse/FLINK-4582>.
>>>> > we
>>>> > > > > would
>>>> > > > > > >> like
>>>> > > > > > >> > >> to
>>>> > > > > > >> > >> > > check
>>>> > > > > > >> > >> > > > if any prior work has been completed by the
>>>> > community.
>>>> > > >  We
>>>> > > > > > are
>>>> > > > > > >> > also
>>>> > > > > > >> > >> > very
>>>> > > > > > >> > >> > > > interested in contributing to this effort.
>>>> > Currently,
>>>> > > we
>>>> > > > > > have
>>>> > > > > > >> a
>>>> > > > > > >> > >> > > high-level
>>>> > > > > > >> > >> > > > proposal which is based on extending the
>>>> existing
>>>> > > > > > >> > >> FlinkKinesisConsumer
>>>> > > > > > >> > >> > > and
>>>> > > > > > >> > >> > > > making it work with DynamoDB streams (via
>>>> integrating
>>>> > > > with
>>>> > > > > > the
>>>> > > > > > >> > >> > > > AmazonDynamoDBStreams API).
>>>> > > > > > >> > >> > > >
>>>> > > > > > >> > >> > > > Any suggestion is welcome. Thank you very much.
>>>> > > > > > >> > >> > > >
>>>> > > > > > >> > >> > > >
>>>> > > > > > >> > >> > > > -
>>>> > > > > > >> > >> > > > Ying
>>>> > > > > > >> > >> > > >
>>>> > > > > > >> > >> > >
>>>> > > > > > >> > >> >
>>>> > > > > > >> > >>
>>>> > > > > > >> > >
>>>> > > > > > >> > >
>>>> > > > > > >> >
>>>> > > > > > >>
>>>> > > > > > >
>>>> > > > > > >
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>>
>>>

Re: Consuming data from dynamoDB streams to flink

Posted by Vinay Patil <vi...@gmail.com>.
Hi Andrey,

Thank you for your reply. I understand that the checkpoints are gone when
the job is cancelled or killed; maybe configuring externalized checkpoints
will help here so that we can resume from there.

My point was that if the job is terminated and the stream position is set to
TRIM_HORIZON, the consumer will start processing from the start. I was
curious to know whether there is a configuration like Kafka's group.id that
we can set for DynamoDB streams.
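To make that concrete, a rough sketch of the kind of configuration I mean
(standard Flink checkpointing plus the Kinesis-style consumer properties; the
actual DynamoDB streams source class is left out, since its name is not final):

import java.util.Properties;

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class ResumableDynamoDBStreamsJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Retain checkpoints when the job is cancelled, so the job can later be
        // resumed from the last externalized checkpoint (or from a savepoint
        // taken before stopping it).
        env.enableCheckpointing(60_000L);
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // Without restored state there is no group-id style offset tracking; the
        // consumer simply falls back to this initial position.
        Properties consumerConfig = new Properties();
        consumerConfig.setProperty(
                ConsumerConfigConstants.STREAM_INITIAL_POSITION, "TRIM_HORIZON");

        // ... add the DynamoDB streams source with consumerConfig and the rest of
        // the pipeline here ...

        env.execute("dynamodb-streams-job");
    }
}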

Also, can you please let me know which metrics I should generate an alert on
in the case of DynamoDB Streams (I am sending the metrics to Prometheus)? I
see these metrics:
https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/metrics/ShardMetricsReporter.java


For example, in the case of Kafka we generate an alert when the consumer is
lagging behind.


Regards,
Vinay Patil



Re: Consuming data from dynamoDB streams to flink

Posted by Andrey Zagrebin <an...@ververica.com>.
Hi Vinay,

1. I would assume it works similarly to the Kinesis connector (correct me if
I am wrong, people who actually developed it).
2. If you have activated just checkpointing, the checkpoints are gone if
you externally kill the job. You might be interested in savepoints [1].
3. See the paragraph in [2] about Kinesis consumer parallelism; a rough code
sketch of this is included below the links.

Best,
Andrey

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/savepoints.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/connectors/kinesis.html#kinesis-consumer
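As a rough illustration of point 3 (FlinkKinesisConsumer stands in for the
DynamoDB streams consumer discussed in this thread; the stream name, region
and class name are made up for the example): the consumer spreads shards over
its parallel subtasks, so the parallelism does not have to match the shard
count, and subtasks beyond the number of shards would simply stay idle.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class ShardParallelismExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Region only; credentials configuration is omitted for brevity.
        Properties consumerConfig = new Properties();
        consumerConfig.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");

        DataStream<String> records = env
                .addSource(new FlinkKinesisConsumer<>(
                        "my-stream", new SimpleStringSchema(), consumerConfig))
                // With 3 shards, parallelism 3 gives one shard per subtask; a lower
                // value also works, since a single subtask can read several shards.
                .setParallelism(3);

        records.print();
        env.execute("shard-parallelism-example");
    }
}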


Re: Consuming data from dynamoDB streams to flink

Posted by Vinay Patil <vi...@gmail.com>.
Hi,

I am using this consumer for processing records from DynamoDB Streams; a few
questions on this:

1. How does checkpointing work with DynamoDB streams? Since this class
extends FlinkKinesisConsumer, I am assuming it will start from the last
successful checkpoint in case of failure, right?
2. Currently, when I kill the pipeline and start it again, it reads all the
data from the start of the stream. Is there any configuration to avoid this
(apart from ConsumerConfigConstants.InitialPosition.LATEST), similar to the
group id in Kafka?
3. As these DynamoDB Streams are separated into shards, what is the
recommended parallelism to be set for the source? Should it be a one-to-one
mapping, for example parallelism of 3 if there are 3 shards?


Regards,
Vinay Patil


On Wed, Aug 1, 2018 at 3:42 PM Ying Xu [via Apache Flink Mailing List
archive.] <ml...@n3.nabble.com> wrote:

> Thank you so much Fabian!
>
> Will update status in the JIRA.
>
> -
> Ying
>
> On Tue, Jul 31, 2018 at 1:37 AM, Fabian Hueske <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=23597&i=0>> wrote:
>
> > Done!
> >
> > Thank you :-)
> >

Re: Consuming data from dynamoDB streams to flink

Posted by Ying Xu <yx...@lyft.com>.
Thank you so much Fabian!

Will update status in the JIRA.

-
Ying


Re: Consuming data from dynamoDB streams to flink

Posted by Fabian Hueske <fh...@gmail.com>.
Done!

Thank you :-)


Re: Consuming data from dynamoDB streams to flink

Posted by Ying Xu <yx...@lyft.com>.
Thanks Fabian and Thomas.

Please assign FLINK-4582 to the following username:
    *yxu-apache
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=yxu-apache>*

If needed I can get an ICLA or CCLA, whichever is proper.

*Ying Xu*
Software Engineer
510.368.1252 <+15103681252>
[image: Lyft] <http://www.lyft.com/>


Re: Consuming data from dynamoDB streams to flink

Posted by Thomas Weise <th...@apache.org>.
The user is yxu-lyft; Ying had commented on that JIRA as well.

https://issues.apache.org/jira/browse/FLINK-4582



Re: Consuming data from dynamoDB streams to flink

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Ying,

Thanks for considering contributing the connector!

In general, you don't need special permissions to contribute to Flink.
Anybody can open Jiras and PRs.
You only need to be assigned to the Contributor role in Jira to be able to
assign an issue to yourself.
I can give you these permissions if you tell me your Jira user.

It would also be good if you could submit a CLA [1] if you plan to
contribute a larger feature.

Thanks, Fabian

[1] https://www.apache.org/licenses/#clas



Re: Consuming data from dynamoDB streams to flink

Posted by Ying Xu <yx...@lyft.com>.
Hello Flink dev:

We have implemented the prototype design and the initial PoC worked pretty
well.  Currently, we plan to move ahead with this design in our internal
production system.

We are thinking of contributing this connector back to the Flink community
sometime soon. May I request to be granted a contributor role?

Many thanks in advance.

*Ying Xu*
Software Engineer
510.368.1252 <+15103681252>
[image: Lyft] <http://www.lyft.com/>
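
The prototype described above essentially reuses FlinkKinesisConsumer as-is
and swaps its Kinesis-facing client for the AWS-provided adapter over DynamoDB
Streams, so that describeStream, getShardIterator, and getRecords transparently
hit the DynamoDB stream. A minimal sketch of that shape follows; the class
name, the createKinesisClient() hook, the adapter constructor, and the import
paths are illustrative assumptions, not the actual extension points of
flink-connector-kinesis.

import java.util.Properties;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder;
import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient;
import com.amazonaws.services.kinesis.AmazonKinesis;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;

// Sketch only: the class name and the hook below are assumptions, not the
// final contributed connector.
public class FlinkDynamoDBStreamsConsumer<T> extends FlinkKinesisConsumer<T> {

    public FlinkDynamoDBStreamsConsumer(
            String streamArn,                  // DynamoDB stream ARN instead of a Kinesis stream name
            DeserializationSchema<T> deserializer,
            Properties configProps) {
        super(streamArn, deserializer, configProps);
    }

    // Hypothetical hook: wherever the consumer builds its AmazonKinesis client,
    // hand back the adapter so the Kinesis-model calls are served by DynamoDB
    // Streams. The adapter constructor shown is one option; credential and
    // region wiring is elided.
    protected AmazonKinesis createKinesisClient(Properties configProps) {
        return new AmazonDynamoDBStreamsAdapterClient(
                AmazonDynamoDBStreamsClientBuilder.defaultClient());
    }
}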

On Fri, Jul 6, 2018 at 6:23 PM, Ying Xu <yx...@lyft.com> wrote:

> Hi Gordon:
>
> Cool. Thanks for the thumb-up!
>
> We will include some test cases around the behavior of re-sharding. If
> needed we can double check the behavior with AWS, and see if additional
> changes are needed.  Will keep you posted.
>
> -
> Ying
>
> On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai <tz...@apache.org>
> wrote:
>
>> Hi Ying,
>>
>> Sorry for the late reply here.
>>
>> From the looks of the AmazonDynamoDBStreamsClient, yes it seems like this
>> should simply work.
>>
>> Regarding the resharding behaviour I mentioned in the JIRA:
>> I'm not sure if this is really a difference in behaviour. Internally, if
>> DynamoDB streams is actually just working on Kinesis Streams, then the
>> resharding primitives should be similar.
>> The shard discovery logic of the Flink Kinesis Consumer assumes that
>> splitting / merging shards will result in new shards of increasing,
>> consecutive shard ids. As long as this is also the behaviour for DynamoDB
>> resharding, then we should be fine.
>>
>> Feel free to start with the implementation for this, I think design-wise
>> we're good to go. And thanks for working on this!
>>
>> Cheers,
>> Gordon
>>
>> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <yx...@lyft.com> wrote:
>>
>> > HI Gordon:
>> >
>> > We are starting to implement some of the primitives along this path.
>> Please
>> > let us know if you have any suggestions.
>> >
>> > Thanks!
>> >
>> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <yx...@lyft.com> wrote:
>> >
>> > > Hi Gordon:
>> > >
>> > > Really appreciate the reply.
>> > >
>> > > Yes our plan is to build the connector on top of the
>> > FlinkKinesisConsumer.
>> > > At the high level, FlinkKinesisConsumer mainly interacts with Kinesis
>> > > through the AmazonKinesis client, more specifically through the
>> following
>> > > three function calls:
>> > >
>> > >    - describeStream
>> > >    - getRecords
>> > >    - getShardIterator
>> > >
>> > > Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient)
>> > > has already implemented similar calls, it is possible to use that
>> client
>> > to
>> > > interact with the dynamoDB streams, and adapt the results from the
>> > dynamoDB
>> > > streams model to the kinesis model.
>> > >
>> > > It appears this is exactly what the AmazonDynamoDBStreamsAdapterCl
>> ient
>> > > <
>> > https://github.com/awslabs/dynamodb-streams-kinesis-adapter/
>> blob/master/src/main/java/com/amazonaws/services/dynamodbv2/
>> streamsadapter/AmazonDynamoDBStreamsAdapterClient.java
>> > >
>> > > does. The adaptor client implements the AmazonKinesis client
>> interface,
>> > > and is officially supported by AWS.  Hence it is possible to replace
>> the
>> > > internal Kinesis client inside FlinkKinesisConsumer with this adapter
>> > > client when interacting with dynamoDB streams.  The new object can be
>> a
>> > > subclass of FlinkKinesisConsumer with a new name e.g,
>> > FlinkDynamoStreamCon
>> > > sumer.
>> > >
>> > > At best this could simply work. But we would like to hear if there are
>> > > other situations to take care of.  In particular, I am wondering
>> what's
>> > the *"resharding
>> > > behavior"* mentioned in FLINK-4582.
>> > >
>> > > Thanks a lot!
>> > >
>> > > -
>> > > Ying
>> > >
>> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <
>> > tzulitai@apache.org
>> > > > wrote:
>> > >
>> > >> Hi!
>> > >>
>> > >> I think it would be definitely nice to have this feature.
>> > >>
>> > >> No actual previous work has been made on this issue, but AFAIK, we
>> > should
>> > >> be able to build this on top of the FlinkKinesisConsumer.
>> > >> Whether this should live within the Kinesis connector module or an
>> > >> independent module of its own is still TBD.
>> > >> If you want, I would be happy to look at any concrete design
>> proposals
>> > you
>> > >> have for this before you start the actual development efforts.
>> > >>
>> > >> Cheers,
>> > >> Gordon
>> > >>
>> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <yx...@lyft.com> wrote:
>> > >>
>> > >> > Thanks Fabian for the suggestion.
>> > >> >
>> > >> > *Ying Xu*
>> > >> > Software Engineer
>> > >> > 510.368.1252 <+15103681252>
>> > >> > [image: Lyft] <http://www.lyft.com/>
>> > >> >
>> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <fh...@gmail.com>
>> > >> wrote:
>> > >> >
>> > >> > > Hi Ying,
>> > >> > >
>> > >> > > I'm not aware of any effort for this issue.
>> > >> > > You could check with the assigned contributor in Jira if there is
>> > some
>> > >> > > previous work.
>> > >> > >
>> > >> > > Best, Fabian
>> > >> > >
>> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <yx...@lyft.com>:
>> > >> > >
>> > >> > > > Hello Flink dev:
>> > >> > > >
>> > >> > > > We have a number of use cases which involve pulling data from
>> > >> DynamoDB
>> > >> > > > streams into Flink.
>> > >> > > >
>> > >> > > > Given that this issue is tracked by FLINK-4582
>> > >> > > > <https://issues.apache.org/jira/browse/FLINK-4582>, we would like to check
>> > >> > > > if any prior work has been completed by the community.   We are
>> > also
>> > >> > very
>> > >> > > > interested in contributing to this effort.  Currently, we have
>> a
>> > >> > > high-level
>> > >> > > > proposal which is based on extending the existing
>> > >> FlinkKinesisConsumer
>> > >> > > and
>> > >> > > > making it work with DynamoDB streams (via integrating with the
>> > >> > > > AmazonDynamoDBStreams API).
>> > >> > > >
>> > >> > > > Any suggestion is welcome. Thank you very much.
>> > >> > > >
>> > >> > > >
>> > >> > > > -
>> > >> > > > Ying
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: Consuming data from dynamoDB streams to flink

Posted by Ying Xu <yx...@lyft.com>.
Hi Gordon:

Cool. Thanks for the thumbs-up!

We will include some test cases around the behavior of re-sharding. If
needed, we can double-check the behavior with AWS and see if additional
changes are needed.  Will keep you posted.
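
As a very rough sketch, this is the kind of property such a test might
assert, assuming we can force a split on a test table and then read the
resulting shard list through the adapter client (the helper below is
hypothetical, and plain string comparison of shard ids is a simplification
we still need to confirm with AWS):

import java.util.List;

import com.amazonaws.services.kinesis.model.Shard;

import org.junit.Assert;
import org.junit.Test;

public class DynamoDbStreamReshardingTest {

    @Test
    public void childShardsSortAfterTheirParents() {
        // Hypothetical helper: force a split on a test table, then fetch the
        // shard list of its stream through the adapter client.
        List<Shard> shards = fetchShardsAfterForcedSplit();

        for (Shard shard : shards) {
            if (shard.getParentShardId() != null) {
                // The consumer's shard discovery only picks up shards that
                // order after the ones it has already seen.
                Assert.assertTrue(
                    shard.getShardId().compareTo(shard.getParentShardId()) > 0);
            }
        }
    }

    private List<Shard> fetchShardsAfterForcedSplit() {
        throw new UnsupportedOperationException(
            "wire this up against a real or mocked DynamoDB stream");
    }
}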

-
Ying

On Wed, Jul 4, 2018 at 7:22 PM, Tzu-Li (Gordon) Tai <tz...@apache.org>
wrote:

> Hi Ying,
>
> Sorry for the late reply here.
>
> From the looks of the AmazonDynamoDBStreamsClient, yes it seems like this
> should simply work.
>
> Regarding the resharding behaviour I mentioned in the JIRA:
> I'm not sure if this is really a difference in behaviour. Internally, if
> DynamoDB Streams is actually just built on top of Kinesis Streams, then the
> resharding primitives should be similar.
> The shard discovery logic of the Flink Kinesis Consumer assumes that
> splitting / merging shards will result in new shards of increasing,
> consecutive shard ids. As long as this is also the behaviour for DynamoDB
> resharding, then we should be fine.
>
> Feel free to start with the implementation for this, I think design-wise
> we're good to go. And thanks for working on this!
>
> Cheers,
> Gordon
>
> On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <yx...@lyft.com> wrote:
>
> > Hi Gordon:
> >
> > We are starting to implement some of the primitives along this path.
> Please
> > let us know if you have any suggestions.
> >
> > Thanks!
> >
> > On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <yx...@lyft.com> wrote:
> >
> > > Hi Gordon:
> > >
> > > Really appreciate the reply.
> > >
> > > Yes, our plan is to build the connector on top of the
> > FlinkKinesisConsumer.
> > > At the high level, FlinkKinesisConsumer mainly interacts with Kinesis
> > > through the AmazonKinesis client, more specifically through the
> following
> > > three function calls:
> > >
> > >    - describeStream
> > >    - getRecords
> > >    - getShardIterator
> > >
> > > Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient)
> > > has already implemented similar calls, it is possible to use that client
> > > to interact with DynamoDB Streams, and adapt the results from the
> > > DynamoDB Streams model to the Kinesis model.
> > >
> > > It appears this is exactly what the AmazonDynamoDBStreamsAdapterClient
> > > <https://github.com/awslabs/dynamodb-streams-kinesis-adapter/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/streamsadapter/AmazonDynamoDBStreamsAdapterClient.java>
> > > does. The adapter client implements the AmazonKinesis client interface,
> > > and is officially supported by AWS.  Hence it is possible to replace the
> > > internal Kinesis client inside FlinkKinesisConsumer with this adapter
> > > client when interacting with DynamoDB Streams.  The new object can be a
> > > subclass of FlinkKinesisConsumer with a new name, e.g.
> > > FlinkDynamoStreamConsumer.
> > >
> > > In the best case, this could simply work out of the box. But we would
> > > like to hear if there are other situations to take care of.  In
> > > particular, I am wondering what the *"resharding behavior"* mentioned in
> > > FLINK-4582 refers to.
> > >
> > > Thanks a lot!
> > >
> > > -
> > > Ying
> > >
> > > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <
> > tzulitai@apache.org
> > > > wrote:
> > >
> > >> Hi!
> > >>
> > >> I think it would definitely be nice to have this feature.
> > >>
> > >> No actual previous work has been done on this issue, but AFAIK, we
> > >> should be able to build this on top of the FlinkKinesisConsumer.
> > >> Whether this should live within the Kinesis connector module or an
> > >> independent module of its own is still TBD.
> > >> If you want, I would be happy to look at any concrete design proposals
> > you
> > >> have for this before you start the actual development efforts.
> > >>
> > >> Cheers,
> > >> Gordon
> > >>
> > >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <yx...@lyft.com> wrote:
> > >>
> > >> > Thanks Fabian for the suggestion.
> > >> >
> > >> > *Ying Xu*
> > >> > Software Engineer
> > >> > 510.368.1252 <+15103681252>
> > >> > [image: Lyft] <http://www.lyft.com/>
> > >> >
> > >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <fh...@gmail.com>
> > >> wrote:
> > >> >
> > >> > > Hi Ying,
> > >> > >
> > >> > > I'm not aware of any effort for this issue.
> > >> > > You could check with the assigned contributor in Jira if there is
> > some
> > >> > > previous work.
> > >> > >
> > >> > > Best, Fabian
> > >> > >
> > >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <yx...@lyft.com>:
> > >> > >
> > >> > > > Hello Flink dev:
> > >> > > >
> > >> > > > We have a number of use cases which involve pulling data from
> > >> DynamoDB
> > >> > > > streams into Flink.
> > >> > > >
> > >> > > > Given that this issue is tracked by FLINK-4582
> > >> > > > <https://issues.apache.org/jira/browse/FLINK-4582>, we would like to check
> > >> > > > if any prior work has been completed by the community.   We are
> > also
> > >> > very
> > >> > > > interested in contributing to this effort.  Currently, we have a
> > >> > > high-level
> > >> > > > proposal which is based on extending the existing
> > >> FlinkKinesisConsumer
> > >> > > and
> > >> > > > making it work with DynamoDB streams (via integrating with the
> > >> > > > AmazonDynamoDBStreams API).
> > >> > > >
> > >> > > > Any suggestion is welcome. Thank you very much.
> > >> > > >
> > >> > > >
> > >> > > > -
> > >> > > > Ying
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Consuming data from dynamoDB streams to flink

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Ying,

Sorry for the late reply here.

From the looks of the AmazonDynamoDBStreamsClient, yes it seems like this
should simply work.
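
To make "simply work" a bit more concrete: since the adapter implements the
AmazonKinesis interface, something along these lines should be all that is
needed on the client side (this assumes the credentials-provider constructor
of the adapter; how the client gets wired into the consumer's internals is
of course still open):

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient;
import com.amazonaws.services.kinesis.AmazonKinesis;

public class DynamoDbStreamsClientFactory {

    // The adapter exposes a DynamoDB stream through the AmazonKinesis
    // interface, so code written against AmazonKinesis does not need to know
    // it is actually talking to DynamoDB Streams.
    public static AmazonKinesis createAdapterClient() {
        return new AmazonDynamoDBStreamsAdapterClient(
            new DefaultAWSCredentialsProviderChain());
    }
}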

Regarding the resharding behaviour I mentioned in the JIRA:
I'm not sure if this is really a difference in behaviour. Internally, if
DynamoDB Streams is actually just built on top of Kinesis Streams, then the
resharding primitives should be similar.
The shard discovery logic of the Flink Kinesis Consumer assumes that
splitting / merging shards will result in new shards of increasing,
consecutive shard ids. As long as this is also the behaviour for DynamoDB
resharding, then we should be fine.
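
To spell out that assumption, this is roughly the invariant the discovery
logic depends on (plain string comparison of shard ids is a simplification
here, but it conveys the idea):

import java.util.List;

import com.amazonaws.services.kinesis.model.Shard;

public final class ShardDiscoveryAssumption {

    private ShardDiscoveryAssumption() {}

    // True if every newly discovered shard orders after the shard id the
    // consumer last saw; that is what lets discovery advance a single
    // "last seen shard id" marker per stream without missing shards.
    public static boolean newShardsOrderAfter(String lastSeenShardId, List<Shard> discovered) {
        for (Shard shard : discovered) {
            if (shard.getShardId().compareTo(lastSeenShardId) <= 0) {
                return false;
            }
        }
        return true;
    }
}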

Feel free to start with the implementation for this, I think design-wise
we're good to go. And thanks for working on this!

Cheers,
Gordon

On Wed, Jul 4, 2018 at 1:59 PM Ying Xu <yx...@lyft.com> wrote:

> Hi Gordon:
>
> We are starting to implement some of the primitives along this path. Please
> let us know if you have any suggestions.
>
> Thanks!
>
> On Fri, Jun 29, 2018 at 12:31 AM, Ying Xu <yx...@lyft.com> wrote:
>
> > Hi Gordon:
> >
> > Really appreciate the reply.
> >
> > Yes, our plan is to build the connector on top of the
> FlinkKinesisConsumer.
> > At the high level, FlinkKinesisConsumer mainly interacts with Kinesis
> > through the AmazonKinesis client, more specifically through the following
> > three function calls:
> >
> >    - describeStream
> >    - getRecords
> >    - getShardIterator
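> >
> > To illustrate, these three calls can be issued against any AmazonKinesis
> > implementation, the adapter client included; a minimal sketch (for
> > DynamoDB Streams the stream "name" would be the stream ARN, and the shard
> > choice and iterator type below are placeholders):
> >
> > import com.amazonaws.services.kinesis.AmazonKinesis;
> > import com.amazonaws.services.kinesis.model.DescribeStreamResult;
> > import com.amazonaws.services.kinesis.model.GetRecordsRequest;
> > import com.amazonaws.services.kinesis.model.GetRecordsResult;
> > import com.amazonaws.services.kinesis.model.GetShardIteratorResult;
> >
> > public class KinesisApiCallsSketch {
> >
> >     public static GetRecordsResult readOnce(AmazonKinesis kinesis, String streamName) {
> >         // 1) describeStream: list the shards of the stream.
> >         DescribeStreamResult stream = kinesis.describeStream(streamName);
> >         String shardId = stream.getStreamDescription().getShards().get(0).getShardId();
> >
> >         // 2) getShardIterator: position at the oldest available record.
> >         GetShardIteratorResult iterator =
> >             kinesis.getShardIterator(streamName, shardId, "TRIM_HORIZON");
> >
> >         // 3) getRecords: fetch one batch of records from that shard.
> >         return kinesis.getRecords(
> >             new GetRecordsRequest().withShardIterator(iterator.getShardIterator()));
> >     }
> > }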
> >
> > Given that the low-level DynamoDB client (AmazonDynamoDBStreamsClient)
> > has already implemented similar calls, it is possible to use that client
> > to interact with DynamoDB Streams, and adapt the results from the
> > DynamoDB Streams model to the Kinesis model.
> >
> > It appears this is exactly what the AmazonDynamoDBStreamsAdapterClient
> > <https://github.com/awslabs/dynamodb-streams-kinesis-adapter/blob/master/src/main/java/com/amazonaws/services/dynamodbv2/streamsadapter/AmazonDynamoDBStreamsAdapterClient.java>
> > does. The adapter client implements the AmazonKinesis client interface,
> > and is officially supported by AWS.  Hence it is possible to replace the
> > internal Kinesis client inside FlinkKinesisConsumer with this adapter
> > client when interacting with DynamoDB Streams.  The new object can be a
> > subclass of FlinkKinesisConsumer with a new name, e.g.
> > FlinkDynamoStreamConsumer.
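> >
> > As a naming/shape sketch only (the exact superclass constructor depends on
> > the Flink version, and the hook for routing the internal calls through the
> > adapter client still has to be worked out), the new consumer could look
> > roughly like:
> >
> > import java.util.Properties;
> >
> > import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
> > import org.apache.flink.streaming.connectors.kinesis.serialization.KinesisDeserializationSchema;
> >
> > public class FlinkDynamoStreamConsumer<T> extends FlinkKinesisConsumer<T> {
> >
> >     public FlinkDynamoStreamConsumer(
> >             String streamArn, KinesisDeserializationSchema<T> schema, Properties config) {
> >         super(streamArn, schema, config);
> >         // TODO: have shard discovery and record fetching go through
> >         // AmazonDynamoDBStreamsAdapterClient instead of the default
> >         // Kinesis client.
> >     }
> > }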
> >
> > In the best case, this could simply work out of the box. But we would like
> > to hear if there are other situations to take care of.  In particular, I
> > am wondering what the *"resharding behavior"* mentioned in FLINK-4582
> > refers to.
> >
> > Thanks a lot!
> >
> > -
> > Ying
> >
> > On Wed, Jun 27, 2018 at 10:43 PM, Tzu-Li (Gordon) Tai <
> tzulitai@apache.org
> > > wrote:
> >
> >> Hi!
> >>
> >> I think it would definitely be nice to have this feature.
> >>
> >> No actual previous work has been done on this issue, but AFAIK, we
> >> should be able to build this on top of the FlinkKinesisConsumer.
> >> Whether this should live within the Kinesis connector module or an
> >> independent module of its own is still TBD.
> >> If you want, I would be happy to look at any concrete design proposals
> you
> >> have for this before you start the actual development efforts.
> >>
> >> Cheers,
> >> Gordon
> >>
> >> On Thu, Jun 28, 2018 at 2:12 AM Ying Xu <yx...@lyft.com> wrote:
> >>
> >> > Thanks Fabian for the suggestion.
> >> >
> >> > *Ying Xu*
> >> > Software Engineer
> >> > 510.368.1252 <+15103681252>
> >> > [image: Lyft] <http://www.lyft.com/>
> >> >
> >> > On Wed, Jun 27, 2018 at 2:01 AM, Fabian Hueske <fh...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi Ying,
> >> > >
> >> > > I'm not aware of any effort for this issue.
> >> > > You could check with the assigned contributor in Jira if there is
> some
> >> > > previous work.
> >> > >
> >> > > Best, Fabian
> >> > >
> >> > > 2018-06-26 9:46 GMT+02:00 Ying Xu <yx...@lyft.com>:
> >> > >
> >> > > > Hello Flink dev:
> >> > > >
> >> > > > We have a number of use cases which involve pulling data from
> >> DynamoDB
> >> > > > streams into Flink.
> >> > > >
> >> > > > Given that this issue is tracked by FLINK-4582
> >> > > > <https://issues.apache.org/jira/browse/FLINK-4582>, we would like to check
> >> > > > if any prior work has been completed by the community.   We are
> also
> >> > very
> >> > > > interested in contributing to this effort.  Currently, we have a
> >> > > high-level
> >> > > > proposal which is based on extending the existing
> >> FlinkKinesisConsumer
> >> > > and
> >> > > > making it work with DynamoDB streams (via integrating with the
> >> > > > AmazonDynamoDBStreams API).
> >> > > >
> >> > > > Any suggestion is welcome. Thank you very much.
> >> > > >
> >> > > >
> >> > > > -
> >> > > > Ying
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>