Posted to commits@beam.apache.org by "Alexander Hoem Rosbach (JIRA)" <ji...@apache.org> on 2017/10/10 09:05:00 UTC

[jira] [Commented] (BEAM-3039) DatastoreIO.Write fails multiple mutations of same entity

    [ https://issues.apache.org/jira/browse/BEAM-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198386#comment-16198386 ] 

Alexander Hoem Rosbach commented on BEAM-3039:
----------------------------------------------

To get around this issue I've been using the Distinct transform together with a trigger on the global window:
{code}
Window.<String>into(new GlobalWindows())
    .triggering(
        Repeatedly.forever(
            AfterFirst.of(
                AfterPane.elementCountAtLeast(BATCH_SIZE), // BATCH_SIZE => 1001
                AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(2)))))
    .discardingFiredPanes();
{code}
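
For context, this is roughly how that windowing and Distinct fit into the pipeline. The ToEntityFn step is just a placeholder for whatever converts the deduplicated strings into Datastore entities:
{code}
PCollection<String> messages = ...; // at-least-once source, may contain duplicates
messages
    .apply(Window.<String>into(new GlobalWindows())
        .triggering(Repeatedly.forever(AfterFirst.of(
            AfterPane.elementCountAtLeast(BATCH_SIZE),
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(2)))))
        .discardingFiredPanes())
    .apply(Distinct.create())          // removes duplicates within each fired pane
    .apply(ParDo.of(new ToEntityFn())) // placeholder: String -> Entity
    .apply(DatastoreIO.v1().write().withProjectId(projectId));
{code}
Note that Distinct deduplicates per window and pane, so duplicates that land in different panes are not removed; this workaround relies on the redelivered messages arriving close enough together to be caught by the same pane.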

> DatastoreIO.Write fails multiple mutations of same entity
> ---------------------------------------------------------
>
>                 Key: BEAM-3039
>                 URL: https://issues.apache.org/jira/browse/BEAM-3039
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>    Affects Versions: 2.1.0
>            Reporter: Alexander Hoem Rosbach
>            Assignee: Reuven Lax
>            Priority: Minor
>
> When streaming messages from a source that offers at-least-once rather than exactly-once delivery, DatastoreIO.Write will throw an exception, which leads to Dataflow retrying the same commit multiple times before giving up. This creates a significant bottleneck in the pipeline, with the end result that the data is dropped. This should be handled better.
> There are a number of ways to fix this. One of them could be to drop any duplicate mutations within one batch (sketched below). Non-duplicate mutations of the same entity should also be handled in some way: perhaps use a NON-TRANSACTIONAL commit, or make sure the mutations are committed in separate commits.
> {code}
> com.google.datastore.v1.client.DatastoreException: A non-transactional commit may not contain multiple mutations affecting the same entity., code=INVALID_ARGUMENT
>         com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:126)
>         com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:169)
>         com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:89)
>         com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
>         org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$DatastoreWriterFn.flushBatch(DatastoreV1.java:1288)
>         org.apache.beam.sdk.io.gcp.datastore.DatastoreV1$DatastoreWriterFn.processElement(DatastoreV1.java:1253) 
> {code}
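
A minimal sketch of the first suggestion above (dropping duplicate mutations within one batch), e.g. as a helper that DatastoreWriterFn.flushBatch could call before committing. The dedupeByKey helper is hypothetical, not part of DatastoreV1; it keeps the last mutation seen for each entity key:
{code}
import com.google.datastore.v1.Key;
import com.google.datastore.v1.Mutation;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collapse a batch so that a single non-transactional
// commit never contains two mutations affecting the same entity.
static List<Mutation> dedupeByKey(List<Mutation> batch) {
  Map<Key, Mutation> lastPerKey = new LinkedHashMap<>(); // later mutations win
  List<Mutation> keyless = new ArrayList<>();
  for (Mutation m : batch) {
    switch (m.getOperationCase()) {
      case INSERT: lastPerKey.put(m.getInsert().getKey(), m); break;
      case UPDATE: lastPerKey.put(m.getUpdate().getKey(), m); break;
      case UPSERT: lastPerKey.put(m.getUpsert().getKey(), m); break;
      case DELETE: lastPerKey.put(m.getDelete(), m); break;
      default:     keyless.add(m); // no key to dedupe on; keep as-is
    }
  }
  List<Mutation> result = new ArrayList<>(lastPerKey.values());
  result.addAll(keyless);
  return result;
}
{code}
Keeping the last mutation per key matches the at-least-once case, where duplicates are redeliveries of the same write. Silently dropping genuinely conflicting mutations would change behavior, though, which is why the description also suggests splitting them across separate commits as the safer alternative.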


