Posted to user@beam.apache.org by Rui Wang <ru...@google.com> on 2018/11/08 19:06:58 UTC

Fwd: [external-thread] at-least once with job changes on Beam KinesisIO

to user@beam.

---------- Forwarded message ---------
From: Pramod Rao <pr...@google.com>
Date: Wed, Nov 7, 2018 at 5:59 PM
Subject: Re: [external-thread] at-least once with job changes on Beam
KinesisIO
To: Fei Xue <fe...@google.com>
Cc: <us...@beam.apache.org>, <ru...@google.com>, Parviz Deyhim <
deyhim@google.com>, Ryan McDowell <ry...@google.com>




I see that KinesisIO does checkpointing. Kinesis checkpointing with the
KCL (Kinesis Client Library) uses DynamoDB internally, which is outside the
Beam code. I do see references to the Kinesis Client Library, but I am not
sure whether KinesisIO takes the same approach as KafkaIO, wherein Beam does
not use the checkpointing facilities provided by the unbounded source and
instead acknowledges each message as it reads it and takes responsibility
for managing the data from then on.
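
To make the at-least-once concern concrete, here is a minimal,
self-contained Java sketch. It does not use Beam, KinesisIO, or the KCL, and
it is not a description of how any of them store checkpoints. The shard
contents, the committedOffset field, and the runJob helper are all
hypothetical stand-ins. It only shows the general idea: if a job stops after
processing records that were not yet committed, a replacement job that
resumes from the last committed offset will see those records again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CheckpointSketch {
    // Hypothetical stand-in for one shard of a stream: an ordered list of records.
    static final List<String> SHARD = Arrays.asList("r0", "r1", "r2", "r3", "r4");

    // Hypothetical checkpoint store: index of the next record not yet committed.
    static int committedOffset = 0;

    // Reads up to maxRecords starting at the committed offset and hands each one
    // to the sink, but commits only the first commitCount of them. This simulates
    // a job that stops before its most recent reads were checkpointed.
    static void runJob(int maxRecords, int commitCount, List<String> sink) {
        int start = committedOffset;
        int end = Math.min(start + maxRecords, SHARD.size());
        for (int i = start; i < end; i++) {
            sink.add(SHARD.get(i));
        }
        committedOffset = Math.min(start + commitCount, end);
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        runJob(3, 2, sink); // old job: processes r0..r2, commits only through r1
        runJob(5, 5, sink); // new job: resumes at r2, so r2 is processed again
        System.out.println(sink); // prints [r0, r1, r2, r2, r3, r4]
    }
}
```

No record is lost, but r2 is delivered twice, so downstream steps would need
to tolerate duplicates.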

Also, you can update the running pipeline with the new code (Java). There
may be side effects in some scenarios. I have not tested how a pipeline
update behaves with KinesisIO, but we can test this quickly.

https://cloud.google.com/dataflow/pipelines/updating-a-pipeline

Does this help?

Adding Ryan to the thread since he may have some insights from working with
Kinesis and Beam.


On Wed, Nov 7, 2018 at 5:29 PM Fei Xue <fe...@google.com> wrote:

> Hi user@beam -
>
> How should one achieve at-least-once delivery across job changes with
> KinesisIO? For example, if one wants to deploy a new version of a job, we
> could drain the old job, but how should the new version pick up where the
> old one left off?
>
> I read "if a kinesis-checkpoint table is present, the stream will start
> consuming from the existing offsets even when TRIM_HORIZON is specified".
> There is KinesisReaderCheckpoint.java but the doc
> <https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/kinesis/KinesisIO.html> does
> not specify if checkpointing is supported or not.
> Does anybody know if/how Beam supports checkpointing for KinesisIO? Does it
> write to the DynamoDB checkpoint table
> <https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html>?
>
> Fei
>
> --
>
> *Fei Xue* | Customer Engineer
>
> <http://cloud.withgoogle.com/next18>
>


-- 

Pramod Rao |  Strategic Cloud Engineer |  pramodrao@google.com |  +1
415-736-6055