Posted to issues@beam.apache.org by "Alexey Romanenko (Jira)" <ji...@apache.org> on 2021/11/09 14:40:00 UTC
[jira] [Commented] (BEAM-13009) DynamoDBIO misses writing items if `withDeduplicateKeys` is not set
[ https://issues.apache.org/jira/browse/BEAM-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441200#comment-17441200 ]
Alexey Romanenko commented on BEAM-13009:
-----------------------------------------
I raised this to P1 since it may cause data loss.
> DynamoDBIO misses writing items if `withDeduplicateKeys` is not set
> -------------------------------------------------------------------
>
> Key: BEAM-13009
> URL: https://issues.apache.org/jira/browse/BEAM-13009
> Project: Beam
> Issue Type: Bug
> Components: io-java-aws
> Affects Versions: 2.27.0
> Reporter: Lei Li
> Assignee: Moritz Mack
> Priority: P1
> Labels: aws, data-loss, dynamodb
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> A new method `withDeduplicateKeys` was added to DynamoDBIO in 2.27.0. According to the [doc|https://beam.apache.org/releases/javadoc/2.27.0/index.html?org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.html] it appears to be optional, and it is not shown in the examples either. However, if no key names are set with it, [the deduplication logic|https://github.com/apache/beam/pull/12583/files#diff-0b5f7a7c1ee0ec890eef82e05e08ef1152421d2c8dcef11fca107f6af0d22e87R479-R492] still takes effect but uses an empty map as the `Map<String, AttributeValue>` part of the deduplication key. As a result, all items get the same key and are deduplicated against each other, so only the last item is written to DynamoDB.
> I think we need to add a check on DeduplicateKeys in `extractDeduplicateKeyValues` and skip the deduplication logic if it is empty.
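Below is a minimal, dependency-free Java sketch of the behaviour described above and of the proposed check. It is not the DynamoDBIO code: `DedupSketch`, `extractKeyValues`, `dedupAlways` and `dedupIfConfigured` are hypothetical names, `AttributeValue` is simplified to `String`, and the non-map part of the real deduplication key is left out by assuming every item in the batch shares it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSketch {

  // Hypothetical stand-in for extractDeduplicateKeyValues: keeps only the
  // attributes named in dedupKeys (AttributeValue simplified to String).
  static Map<String, String> extractKeyValues(Map<String, String> item, List<String> dedupKeys) {
    Map<String, String> keyValues = new LinkedHashMap<>();
    for (String name : dedupKeys) {
      if (item.containsKey(name)) {
        keyValues.put(name, item.get(name));
      }
    }
    return keyValues;
  }

  // Behaviour reported in the issue: deduplication always runs. With no
  // configured keys every item maps to the same empty key map, so each put()
  // overwrites the previous item and only the last one survives.
  static List<Map<String, String>> dedupAlways(
      List<Map<String, String>> items, List<String> dedupKeys) {
    Map<Map<String, String>, Map<String, String>> byKey = new LinkedHashMap<>();
    for (Map<String, String> item : items) {
      byKey.put(extractKeyValues(item, dedupKeys), item);
    }
    return new ArrayList<>(byKey.values());
  }

  // Proposed check: skip deduplication entirely when no keys are configured.
  static List<Map<String, String>> dedupIfConfigured(
      List<Map<String, String>> items, List<String> dedupKeys) {
    if (dedupKeys.isEmpty()) {
      return new ArrayList<>(items);
    }
    return dedupAlways(items, dedupKeys);
  }

  public static void main(String[] args) {
    List<Map<String, String>> batch = List.of(
        Map.of("id", "1", "name", "a"),
        Map.of("id", "2", "name", "b"),
        Map.of("id", "3", "name", "c"),
        Map.of("id", "1", "name", "d"));
    List<String> noKeys = List.of();

    System.out.println(dedupAlways(batch, noKeys).size());              // 1
    System.out.println(dedupIfConfigured(batch, noKeys).size());        // 4
    System.out.println(dedupIfConfigured(batch, List.of("id")).size()); // 3
  }
}
```

With the check in place, an unset `withDeduplicateKeys` writes every item, while a configured key such as `id` still collapses items that share a key value (the later item wins, as in the reported behaviour).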