Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/04/02 03:19:00 UTC

[jira] [Work logged] (BEAM-9660) StreamingDataflowWorker has confusing exception on commits over 2GB

     [ https://issues.apache.org/jira/browse/BEAM-9660?focusedWorklogId=414450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-414450 ]

ASF GitHub Bot logged work on BEAM-9660:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Apr/20 03:18
            Start Date: 02/Apr/20 03:18
    Worklog Time Spent: 10m 
      Work Description: spoortikundargi commented on issue #11289: [BEAM-9660]: Add an explicit check for integer overflow.
URL: https://github.com/apache/beam/pull/11289#issuecomment-607598330
 
 
   R: @scwhittle
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 414450)
    Time Spent: 20m  (was: 10m)

> StreamingDataflowWorker has confusing exception on commits over 2GB
> -------------------------------------------------------------------
>
>                 Key: BEAM-9660
>                 URL: https://issues.apache.org/jira/browse/BEAM-9660
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.18.0, 2.19.0
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Commits over 2GB end up with a negative serialized commit size, because the size overflows a signed 32-bit int.
> When not using Streaming Engine, the max commit limit is 2GB:
> https://github.com/apache/beam/blob/v2.19.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java#L450
> There appears to be a logging regression introduced by
> https://github.com/apache/beam/pull/10013
> With the new code, if the serialized size overflows, the estimated byte count is set to Integer.MAX_VALUE, which equals the commit limit for appliance.
> Then the comparison here:
> https://github.com/apache/beam/blob/v2.19.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java#L1371
> which uses a strict >, does not trigger, and the oversized commit is passed on to the commit queue. That triggers the exception seen in #3 [2] when the weigher uses the negative serialized size for the semaphore acquire call.
> So where we previously would have thrown a KeyCommitTooLargeException, we now throw an IllegalArgumentException.
> From that exception description: https://github.com/apache/beam/blob/v2.19.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java#L236
>           ". This may be caused by grouping a very "
>               + "large amount of data in a single window without using Combine,"
>               + " or by producing a large amount of data from a single input element."
> The overflow could be tracked explicitly instead of being inferred by comparing the estimate with Integer.MAX_VALUE (see the sketch below).
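
As an illustration of that suggestion, here is a minimal Java sketch of explicit overflow tracking. The class and method names (CommitSizeEstimator, addBytes, exceedsLimit) are hypothetical, not Beam's actual API; the real change went into StreamingDataflowWorker via PR #11289.

    /**
     * Sketch only: track int overflow explicitly instead of clamping the
     * estimate to Integer.MAX_VALUE, which a strict > limit check cannot
     * distinguish from a commit that is exactly at the limit.
     */
    public class CommitSizeEstimator {
      private int estimatedBytes = 0;
      private boolean overflowed = false;

      /** Adds bytes, remembering overflow instead of letting the sum wrap negative. */
      public void addBytes(int bytes) {
        if (bytes > Integer.MAX_VALUE - estimatedBytes) {
          overflowed = true;
          estimatedBytes = Integer.MAX_VALUE;
        } else {
          estimatedBytes += bytes;
        }
      }

      /** True if the counter overflowed or the estimate exceeds the limit. */
      public boolean exceedsLimit(int maxCommitBytes) {
        return overflowed || estimatedBytes > maxCommitBytes;
      }

      public static void main(String[] args) {
        CommitSizeEstimator estimator = new CommitSizeEstimator();
        // Simulate a commit just over 2GB: the second addition overflows an int.
        estimator.addBytes(Integer.MAX_VALUE);
        estimator.addBytes(1);
        // With the 2GB appliance limit, the oversized commit is caught here and
        // can be rejected with a KeyCommitTooLargeException instead of reaching
        // the commit queue with a negative size.
        System.out.println("exceeds limit: " + estimator.exceedsLimit(Integer.MAX_VALUE));
      }
    }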



--
This message was sent by Atlassian Jira
(v8.3.4#803005)