Posted to dev@flink.apache.org by "Michael (Jira)" <ji...@apache.org> on 2022/08/05 00:32:00 UTC

[jira] [Created] (FLINK-28817) NullPointerException in HybridSource when restoring from checkpoint

Michael created FLINK-28817:
-------------------------------

             Summary: NullPointerException in HybridSource when restoring from checkpoint
                 Key: FLINK-28817
                 URL: https://issues.apache.org/jira/browse/FLINK-28817
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Common
    Affects Versions: 1.15.1, 1.14.4
            Reporter: Michael
         Attachments: bf-29-JM-err-analysis.log

Scenario:
 # CheckpointCoordinator - Completed checkpoint 14 for job 00000000000000000000000000000000
 # HybridSource successfully finished processing several SourceFactories that read from S3
 # The next SourceFactory tries to read the contents of an S3 directory, which fails with the error: Unable to execute HTTP request: Read timed out
 # CheckpointCoordinator - Restoring job 00000000000000000000000000000000 from Checkpoint 14
 # HybridSourceSplitEnumerator - Restoring enumerator for sourceIndex=47
 # The restore fails with a NullPointerException in HybridSourceSplitEnumerator.close
 # Because of this issue, all subsequent attempts to restore from the checkpoint also fail

Extract from the log: --------------
2022/08/02 22:26:51.227 INFO  o.a.f.r.c.CheckpointCoordinator - Restoring job 00000000000000000000000000000000 from Checkpoint 14 @ 1659478803949 for 00000000000000000000000000000000 located at s3://spp-state-371299021277-tech-aidata-di/mb-backfill-jul-20-backfill-prd/2/checkpoints/00000000000000000000000000000000/chk-14.
2022/08/02 22:26:51.240 INFO  o.a.f.r.c.CheckpointCoordinator - No master state to restore
2022/08/02 22:26:51.240 INFO  o.a.f.r.o.c.RecreateOnResetOperatorCoordinator - Resetting coordinator to checkpoint.
2022/08/02 22:26:51.241 INFO  o.a.f.r.s.c.SourceCoordinator - Closing SourceCoordinator for source Source: hybrid-source.
2022/08/02 22:26:51.424 INFO  o.a.f.r.s.c.SourceCoordinator - Restoring SplitEnumerator of source Source: hybrid-source from checkpoint.
2022/08/02 22:26:51.425 INFO  o.a.f.r.s.c.SourceCoordinator - Starting split enumerator for source Source: hybrid-source.
2022/08/02 22:26:51.426 INFO  c.i.d.s.f.s.c.b.HourlyFileSourceFactory - Reading input data from path s3://idl-kafka-connect-ued-raw-uw2-data-lake-prd/data/topics/sbseg-qbo-clickstream/d_20220729-2300 for 2022-07-29T23:00:00Z
2022/08/02 22:26:51.426 INFO  o.a.f.c.b.s.h.HybridSourceSplitEnumerator - Restoring enumerator for sourceIndex=47
 
2022/08/02 22:26:51.435 INFO  o.a.f.runtime.jobmaster.JobMaster - Trying to recover from a global failure.
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: hybrid-source -> decrypt -> map2Events -> filterOutNulls -> assignTimestampsAndWatermarks -> logRawJson' (operator fd9fbc680ee884c4eafd0b9c2d3d007f).
at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
...
Caused by: java.lang.NullPointerException: null
at org.apache.flink.connector.base.source.hybridspp.HybridSourceSplitEnumerator.close(HybridSourceSplitEnumerator.java:246)
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.close(SourceCoordinator.java:151)
at org.apache.flink.runtime.operators.coordination.ComponentClosingUtils.lambda$closeAsyncWithTimeout$0(ComponentClosingUtils.java:70)
at java.lang.Thread.run(Thread.java:750)
-----------------------------------
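
The trace above shows the NullPointerException being thrown from HybridSourceSplitEnumerator.close when SourceCoordinator.close calls it. One possible reading, which is an assumption and has not been verified against the Flink source, is that close() dereferences the underlying enumerator before one has been created. The sketch below illustrates that failure mode and a null guard in isolation. NullSafeCloseSketch, UnderlyingEnumerator and CompositeEnumerator are hypothetical stand-ins, not Flink classes, and this is not the actual fix.

```java
import java.util.function.Supplier;

public class NullSafeCloseSketch {

    /** Hypothetical stand-in for an underlying split enumerator; not the Flink interface. */
    interface UnderlyingEnumerator {
        void close();
    }

    /** Hypothetical composite enumerator that creates its delegate in start(). */
    static class CompositeEnumerator {
        // Remains null until start() succeeds (ASSUMPTION about the failure mode).
        private UnderlyingEnumerator current;
        private boolean closed;

        void start(Supplier<UnderlyingEnumerator> factory) {
            // If the factory throws (for example, a storage read times out),
            // the assignment never happens and current stays null.
            current = factory.get();
        }

        void close() {
            // Guard: close the delegate only if one was actually created.
            if (current != null) {
                current.close();
                current = null;
            }
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    public static void main(String[] args) {
        CompositeEnumerator enumerator = new CompositeEnumerator();
        try {
            // Simulate a source factory that fails before a delegate exists.
            enumerator.start(() -> {
                throw new IllegalStateException("Read timed out");
            });
        } catch (IllegalStateException e) {
            System.out.println("start failed: " + e.getMessage());
        }
        // Without the null check in close(), this call would throw a NullPointerException.
        enumerator.close();
        System.out.println("closed=" + enumerator.isClosed());
    }
}
```

With the guard in place, close() is safe to call whether or not start() completed, so a failed start would not block a later reset to the checkpoint.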



--
This message was sent by Atlassian Jira
(v8.20.10#820010)