You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2020/09/22 00:35:00 UTC

[jira] [Created] (BEAM-10942) beam_PostCommit_Java_Nexmark_Flink failing

Luke Cwik created BEAM-10942:
--------------------------------

             Summary: beam_PostCommit_Java_Nexmark_Flink failing
                 Key: BEAM-10942
                 URL: https://issues.apache.org/jira/browse/BEAM-10942
             Project: Beam
          Issue Type: Bug
          Components: runner-flink
    Affects Versions: 2.25.0
            Reporter: Luke Cwik
            Assignee: Luke Cwik


Nexmark fails due to NPE caused by failure in loading element/restriction state for splittable DoFn.

Additional logging in https://github.com/lukecwik/incubator-beam/tree/beam10670.2 shows that the element restriction is null/null 

{noformat}
Sep 21, 2020 5:26:23 PM org.apache.beam.runners.core.SplittableParDoViaKeyedWorkItems$ProcessFn processElement
INFO: before: KV{null, null}
{noformat}

Earlier logging shows that we have set this from state and then set a timer and then failed to load the data:
{noformat}
INFO: Setting: //:713916102:element:ValueInGlobalWindow{value=UnboundedEventSource(0, 100000), pane=PaneInfo.NO_FIRING}
INFO: Setting: //:713916102:restriction:UnboundedSourceRestriction{source=UnboundedEventSource(33000, 34000), checkpoint=GeneratorCheckpoint{numEvents=1000, wallclockBaseTime=1600734382646}, watermark=294247-01-10T04:00:54.775Z}
INFO: Reading: //:713916102:element:ValueInGlobalWindow{value=UnboundedEventSource(0, 100000), pane=PaneInfo.NO_FIRING}
INFO: Reading: //:713916102:restriction:UnboundedSourceRestriction{source=UnboundedEventSource(33000, 34000), checkpoint=GeneratorCheckpoint{numEvents=1000, wallclockBaseTime=1600734382646}, watermark=294247-01-10T04:00:54.775Z}
INFO: Setting: //:713916102:watermarkEstimatorState:294247-01-10T04:00:54.775Z
INFO: Setting timer: 713916102 1:1600734383019// at 1600734383019 with output time 1600734383019
INFO: Reading: //:713916102:element:null
INFO: Reading: //:713916102:restriction:null
INFO: Reading: //:713916102:watermarkEstimatorState:null
{noformat}
for byte[] key with hash 713916102 in the global window namespace.

Note that in a rerun the keys will be different since they are UUIDs but the pattern is the same.

To rerun:
checkout https://github.com/lukecwik/incubator-beam/tree/beam10670.2 for additional log output
setup gradle task:
with task:
{noformat}
:sdks:java:testing:nexmark:run
{noformat}
with args:
{noformat}
-Pnexmark.runner=":runners:flink:1.11"
-Pnexmark.args="     --runner=FlinkRunner     --suite=SMOKE     --streamTimeout=60     --streaming=true     --manageResources=false     --monitorJobs=true     --flinkMaster=[local] --numEvents=100000 --query=4 --checkpointingInterval=3000 --shutdownSourcesAfterIdleMs=60000"
{noformat}

Note, you may need to increase your console size (Editor>General>Console) to capture enough output before it fails. I usually add a breakpoint for the UserCodeException constructor to pause the program and inspect the state at the point in time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)