Posted to dev@beam.apache.org by Dennis Yung <de...@gmail.com> on 2020/08/14 14:52:43 UTC

DynamoDBIO fail to write to the same key in short consecution

Hi,

I am developing a Beam job that sinks mutable data to DynamoDB. I found that
DynamoDBIO throws an error if multiple write requests to the same key
are made within a short time.

DynamoDBIO.Write uses the batchWriteItem method from the AWS SDK to sink
items, and DynamoDB rejects a single batchWriteItem request that contains
two items with the same primary key.

Currently DynamoDBIO.Write performs no key deduplication before flushing a
batch, which can cause "ValidationException: Provided list of item keys
contains duplicates" if consecutive updates to a single key fall within the
same batch (the batch size is currently hardcoded to 25).
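A minimal sketch of deduplicating a batch right before it is flushed, assuming last-write-wins semantics (the newest update to a key is the one worth keeping). The class and method names (`BatchDeduper`, `dedupeLastWriteWins`) are hypothetical, and items are modelled as plain `Map<String, String>` rather than the SDK's attribute-value maps so the example runs without AWS dependencies. The primary key is built from the listed key attributes (the partition key, plus the sort key if the table has one).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the DynamoDBIO code: items are simplified to
// Map<attributeName, value> so the sketch has no AWS dependencies.
final class BatchDeduper {

    private BatchDeduper() {}

    /** Builds an item's primary key from the given key attribute names. */
    static List<String> keyOf(Map<String, String> item, List<String> keyNames) {
        List<String> key = new ArrayList<>(keyNames.size());
        for (String name : keyNames) {
            key.add(item.get(name));
        }
        return key;
    }

    /**
     * Returns the batch with at most one item per primary key. A later write to a
     * key replaces the earlier one (assumed last-write-wins); the surviving item keeps
     * the position where its key first appeared.
     */
    static List<Map<String, String>> dedupeLastWriteWins(
            List<Map<String, String>> batch, List<String> keyNames) {
        Map<List<String>, Map<String, String>> byKey = new LinkedHashMap<>();
        for (Map<String, String> item : batch) {
            byKey.put(keyOf(item, keyNames), item); // overwrites any earlier write to the same key
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Beam does not generally guarantee element order, so "last" here only means last in the order the writer happened to receive the items.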

I have created an issue on JIRA at
https://issues.apache.org/jira/browse/BEAM-10706?jql=text%20~%20%22dynamodbio%22

The AWS support team confirmed to me that the Java SDK for DynamoDB does not
currently handle deduplication. Taking the Python SDK boto3, which supports
this, as a reference, I modified the code of DynamoDBIO, which solved
the problem for my application. I applied the change to 2.23.0, where I
also modified the test and ran it successfully.
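For comparison, boto3's `Table.batch_writer` accepts an `overwrite_by_pkeys` parameter that drops a buffered request when a new request with the same primary key arrives. The Java sketch below imitates that idea; it is not the actual DynamoDBIO patch. The class name `OverwritingBatchBuffer` is hypothetical, items are simplified to `Map<String, String>`, and the flush callback stands in for the real batchWriteItem call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical buffer, loosely modelled on boto3's batch_writer(overwrite_by_pkeys=...):
// duplicates are collapsed as items enter the buffer, so every flushed batch
// contains distinct primary keys.
final class OverwritingBatchBuffer {
    static final int MAX_BATCH_SIZE = 25; // BatchWriteItem accepts at most 25 requests

    private final List<String> keyNames;
    private final Consumer<List<Map<String, String>>> flushFn; // stand-in for batchWriteItem
    private final Map<List<String>, Map<String, String>> buffer = new LinkedHashMap<>();

    OverwritingBatchBuffer(List<String> keyNames, Consumer<List<Map<String, String>>> flushFn) {
        this.keyNames = new ArrayList<>(keyNames);
        this.flushFn = flushFn;
    }

    /** Buffers an item, replacing any pending item with the same primary key. */
    void put(Map<String, String> item) {
        List<String> key = new ArrayList<>(keyNames.size());
        for (String name : keyNames) {
            key.add(item.get(name));
        }
        buffer.remove(key);    // drop the stale pending write for this key
        buffer.put(key, item); // the newest write goes to the end of the buffer
        if (buffer.size() >= MAX_BATCH_SIZE) {
            flush();
        }
    }

    /** Sends the pending items (if any) as one batch and clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushFn.accept(new ArrayList<>(buffer.values()));
        buffer.clear();
    }
}
```

Collapsing duplicates on insert rather than at flush time means each flushed request can still carry up to 25 distinct keys. A real writer would also need to flush whatever remains in the buffer at the end of processing (for example, at the end of a bundle).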

Shall I apply the change to master and then create a PR? Note that I have
only changed the v1 AWS module, not the v2 one. Also, I haven't submitted
a PR before, so I may need some guidance (I have read the contribution guide
though).

Thanks!

Re: DynamoDBIO fail to write to the same key in short consecution

Posted by Alexey Romanenko <ar...@gmail.com>.
I also added you to the contributors list in Beam Jira and assigned this task to you.
Welcome!

> On 14 Aug 2020, at 17:48, Alexey Romanenko <ar...@gmail.com> wrote:
> 
> Hi Dennis,
> 
> Thank you for your contribution! 
> 
> On 14 Aug 2020, at 16:52, Dennis Yung <dennisylyung@gmail.com> wrote:
>> 
>> Shall I apply the change to master and then create a PR?
> 
> Yes, please make your changes against master, because everything we merge (except cherry-pick commits) is merged into the master branch.
> 
>> However I just changed the ver1 aws module but not the ver2 one,
> 
> It would be great if you could apply this patch to the AWS SDK v2 module as well, since we are going to deprecate the v1 module soon, eventually remove it, and keep only the v2 module.
> 
>> plus I haven't submitted a PR before so may need some guidance (I have read the contribution guide though). 
> 
> You need to fork the Beam GitHub repository, clone it locally, create a branch for your Jira issue, make your changes there, and then push your branch to your GitHub fork. Then, from the GitHub UI, create a PR against the Beam repository.
> You can find details and the related commands here [1]. Also, please take a look at our Beam Contribution Guide [2].
> 
> Please, don’t hesitate to ask if you need any help with that.
> 
> [1] https://cwiki.apache.org/confluence/display/BEAM/Git+Tips
> [2] https://beam.apache.org/contribute/
> 
