You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2020/10/07 16:34:00 UTC
[jira] [Updated] (BEAM-10942) beam_PostCommit_Java_Nexmark_Flink
failing due to loss of state when executing splittable DoFn
[ https://issues.apache.org/jira/browse/BEAM-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Cwik updated BEAM-10942:
-----------------------------
Status: Resolved (was: Open)
> beam_PostCommit_Java_Nexmark_Flink failing due to loss of state when executing splittable DoFn
> ----------------------------------------------------------------------------------------------
>
> Key: BEAM-10942
> URL: https://issues.apache.org/jira/browse/BEAM-10942
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Affects Versions: 2.25.0
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: P0
>
> Nexmark fails due to NPE caused by failure in loading element/restriction state for splittable DoFn.
> Additional logging in https://github.com/lukecwik/incubator-beam/tree/beam10670.2 shows that the element restriction is null/null
> {noformat}
> Sep 21, 2020 5:26:23 PM org.apache.beam.runners.core.SplittableParDoViaKeyedWorkItems$ProcessFn processElement
> INFO: before: KV{null, null}
> {noformat}
> Earlier logging shows that we have set this from state and then set a timer and then failed to load the data:
> {noformat}
> INFO: Setting: //:713916102:element:ValueInGlobalWindow{value=UnboundedEventSource(0, 100000), pane=PaneInfo.NO_FIRING}
> INFO: Setting: //:713916102:restriction:UnboundedSourceRestriction{source=UnboundedEventSource(33000, 34000), checkpoint=GeneratorCheckpoint{numEvents=1000, wallclockBaseTime=1600734382646}, watermark=294247-01-10T04:00:54.775Z}
> INFO: Reading: //:713916102:element:ValueInGlobalWindow{value=UnboundedEventSource(0, 100000), pane=PaneInfo.NO_FIRING}
> INFO: Reading: //:713916102:restriction:UnboundedSourceRestriction{source=UnboundedEventSource(33000, 34000), checkpoint=GeneratorCheckpoint{numEvents=1000, wallclockBaseTime=1600734382646}, watermark=294247-01-10T04:00:54.775Z}
> INFO: Setting: //:713916102:watermarkEstimatorState:294247-01-10T04:00:54.775Z
> INFO: Setting timer: 713916102 1:1600734383019// at 1600734383019 with output time 1600734383019
> INFO: Reading: //:713916102:element:null
> INFO: Reading: //:713916102:restriction:null
> INFO: Reading: //:713916102:watermarkEstimatorState:null
> {noformat}
> for byte[] key with hash 713916102 in the global window namespace.
> Note that in a rerun the keys will be different since they are UUIDs but the pattern is the same.
> To rerun:
> checkout https://github.com/lukecwik/incubator-beam/tree/beam10670.2 for additional log output
> setup gradle task:
> with task:
> {noformat}
> :sdks:java:testing:nexmark:run
> {noformat}
> with args:
> {noformat}
> -Pnexmark.runner=":runners:flink:1.11"
> -Pnexmark.args=" --runner=FlinkRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true --flinkMaster=[local] --numEvents=100000 --query=4 --checkpointingInterval=3000 --shutdownSourcesAfterIdleMs=60000"
> {noformat}
> Note, you may need to increase your console size (Editor>General>Console) to capture enough output before it fails. I usually add a breakpoint for the UserCodeException constructor to pause the program and inspect the state at the point in time.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)