Posted to dev@spark.apache.org by Nick Pentreath <ni...@gmail.com> on 2015/12/10 16:04:27 UTC

Spark Streaming Kinesis - DynamoDB Streams compatibility

Hi Spark users & devs

I was just wondering if anyone out there has interest in DynamoDB Streams (
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html)
as an input source for Spark Streaming Kinesis?

Because DynamoDB Streams provides an adapter client that works with the
KCL (Kinesis Client Library), the integration itself is fairly
straightforward, though it would take a little work to add it to Spark
Streaming Kinesis as an option. It would also require updating the AWS SDK
version.

For those using AWS heavily, there are other ways of achieving the same
outcome indirectly. The easiest I've found so far is to use an AWS Lambda
function to read from the DynamoDB Stream, (optionally) transform the
events, and write them to a Kinesis stream, so that one can just use the
existing Spark integration. Still, I'd like to know if there is sufficient
interest or demand for this among the user base to work on a PR adding
DynamoDB Streams support to Spark.
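
To make the Lambda workaround concrete, here is a rough Python sketch of
such a function. It is only an illustration under several assumptions: the
stream name "ddb-events", the payload layout, and the helper name
to_kinesis_entries are all made up for this example, and hashing the item
key into the partition key is just one possible choice (it keeps changes to
the same item on the same shard). The boto3 client is created lazily and
can be swapped out so the logic can be exercised without AWS access.

```python
import hashlib
import json

STREAM_NAME = "ddb-events"  # hypothetical Kinesis stream name
MAX_BATCH = 500  # PutRecords accepts at most 500 records per call


def to_kinesis_entries(event, transform=None):
    """Convert a DynamoDB Streams Lambda event into PutRecords entries."""
    entries = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        payload = {
            "eventName": record["eventName"],  # INSERT, MODIFY or REMOVE
            "keys": change["Keys"],
            # Absent for REMOVE events or key-only stream views.
            "newImage": change.get("NewImage"),
        }
        if transform is not None:
            payload = transform(payload)  # optional user-supplied transformation
        # Hash the item key so that changes to one item share a partition key.
        key_json = json.dumps(change["Keys"], sort_keys=True)
        entries.append({
            "Data": json.dumps(payload, sort_keys=True).encode("utf-8"),
            "PartitionKey": hashlib.sha256(key_json.encode("utf-8")).hexdigest(),
        })
    return entries


def handler(event, context, client=None):
    """Lambda entry point; `client` is injectable so the sketch runs offline."""
    if client is None:
        import boto3  # imported lazily; assumed to be available in the Lambda runtime
        client = boto3.client("kinesis")
    entries = to_kinesis_entries(event)
    for start in range(0, len(entries), MAX_BATCH):
        # A real function should check FailedRecordCount in the response and retry.
        client.put_records(
            StreamName=STREAM_NAME,
            Records=entries[start:start + MAX_BATCH],
        )
    return len(entries)
```

The Spark side then reads "ddb-events" through the existing Kinesis
integration and decodes each record as JSON. The obvious cost is an extra
hop and an extra stream to pay for and monitor, which is part of why native
support might still be worth having.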

(At the same time, the implementation details happen to provide an
opportunity to address https://issues.apache.org/jira/browse/SPARK-10969,
though not sure how much need there is for that either?)

N