Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2014/10/13 16:49:34 UTC

[jira] [Commented] (FLINK-1152) Failed task cancellation leads to NullPointerException

    [ https://issues.apache.org/jira/browse/FLINK-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169357#comment-14169357 ] 

Stephan Ewen commented on FLINK-1152:
-------------------------------------

The exception {{ERROR org.apache.flink.runtime.taskmanager.TaskManager - Could not instantiate task
java.lang.Exception: Cannot start task. Task was canceled or failed.}} is okay. It can happen when the cancel call intercepts the deployment. I think we should not log it as an error, since it is an acceptable thing to happen. The exception was introduced because throwing at that point ensures that the cleanup logic runs. We could change it to a {{TaskCanceledException}} and not log those, which keeps the logs clean and avoids confusing messages.
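
A minimal sketch of that idea, not the actual TaskManager code: the names TaskCancelSketch, startTask, submitTask, cleanup and errorLog are hypothetical stand-ins, and only TaskCanceledException is taken from the suggestion above. It shows a dedicated exception type being caught separately, so that cleanup still runs but nothing is written to the error log when a cancel intercepts deployment.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskCancelSketch {

    /** Hypothetical exception type, named after the suggestion in the comment. */
    static class TaskCanceledException extends Exception {
        TaskCanceledException(String message) {
            super(message);
        }
    }

    /** Stand-in for the ERROR log; collects messages so they can be inspected. */
    static final List<String> errorLog = new ArrayList<>();

    /** Counts how often the (hypothetical) cleanup logic ran. */
    static int cleanups = 0;

    static void cleanup() {
        cleanups++;
    }

    /** Hypothetical start routine: throws if a cancel call arrived during deployment. */
    static void startTask(boolean canceled, boolean broken) throws Exception {
        if (canceled) {
            throw new TaskCanceledException("Cannot start task. Task was canceled or failed.");
        }
        if (broken) {
            throw new Exception("Unexpected failure while setting up the task.");
        }
    }

    /** Returns true if the task started. Cleanup runs on every failure path. */
    static boolean submitTask(boolean canceled, boolean broken) {
        try {
            startTask(canceled, broken);
            return true;
        } catch (TaskCanceledException e) {
            // Expected when a cancel intercepts deployment: clean up, do not log an error.
            cleanup();
            return false;
        } catch (Exception e) {
            // Anything else is a real problem and is still logged.
            errorLog.add("Could not instantiate task: " + e.getMessage());
            cleanup();
            return false;
        }
    }

    public static void main(String[] args) {
        submitTask(true, false);   // canceled during deployment
        submitTask(false, true);   // genuine failure
        System.out.println("cleanups=" + cleanups + ", errors logged=" + errorLog.size());
    }
}
```

The more specific catch clause has to come first, since TaskCanceledException extends Exception in this sketch. Whether the real exception should be checked or unchecked is left open here.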

> Failed task cancellation leads to NullPointerException
> ------------------------------------------------------
>
>                 Key: FLINK-1152
>                 URL: https://issues.apache.org/jira/browse/FLINK-1152
>             Project: Flink
>          Issue Type: Bug
>          Components: TaskManager
>    Affects Versions: 0.7-incubating
>            Reporter: Robert Metzger
>
> As part of the testing for release 0.7-incubating, I found the following exception:
> {code}
> 20:33:47,737 WARN  org.apache.hadoop.hdfs.DFSClient                              - Failed to connect to /130.149.21.17:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
> java.nio.channels.ClosedByInterruptException
>         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>         at org.apache.hadoop.hdfs.DFSInputStream.newTcpPeer(DFSInputStream.java:955)
>         at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1107)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:533)
>         at org.apache.hadoop.hdfs.DFSInputStream.seekToBlockSource(DFSInputStream.java:1273)
>         at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:722)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:752)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.flink.runtime.fs.hdfs.DistributedDataInputStream.read(DistributedDataInputStream.java:66)
>         at org.apache.flink.api.common.io.DelimitedInputFormat.fillBuffer(DelimitedInputFormat.java:616)
>         at org.apache.flink.api.common.io.DelimitedInputFormat.readLine(DelimitedInputFormat.java:522)
>         at org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:488)
>         at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:214)
>         at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:235)
>         at java.lang.Thread.run(Thread.java:745)
> 20:33:47,739 INFO  org.apache.flink.runtime.execution.RuntimeEnvironment         - Canceling CHAIN DataSource (TextInputFormat (hdfs:/datasets/enwiki-latest-pages-meta-current.xml) - UTF-8) -> FlatMap (org.apache.flink.examples.java.wordcount.WordCount$Tokenizer) -> Combine(SUM(1)) (215/400)
> [...]
> 20:34:01,584 INFO  org.apache.flink.runtime.execution.RuntimeEnvironment         - Canceling CHAIN DataSource (TextInputFormat (hdfs:/datasets/generatedKMeans/centers-10mio-10dim) - UTF-8) -> Map (com.github.projectflink.testPlan.KMeansArbitraryDimension$ConvertToCentroid) (164/400)
> 20:34:01,584 INFO  org.apache.flink.runtime.execution.RuntimeEnvironment         - Canceling CHAIN DataSource (TextInputFormat (hdfs:/datasets/generatedKMeans/centers-10mio-10dim) - UTF-8) -> Map (com.github.projectflink.testPlan.KMeansArbitraryDimension$ConvertToCentroid) (164/400)
> 20:34:01,634 ERROR org.apache.flink.runtime.taskmanager.TaskManager              - Could not instantiate task
> java.lang.Exception: Cannot start task. Task was canceled or failed.
>         at org.apache.flink.runtime.taskmanager.TaskManager.submitTask(TaskManager.java:621)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.flink.runtime.ipc.RPC$Server.call(RPC.java:418)
>         at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:947)
> 20:34:01,644 INFO  org.apache.flink.runtime.execution.RuntimeEnvironment         - Canceling PartialSolution (BulkIteration (Bulk Iteration)) (140/400)
> 20:34:01,649 ERROR org.apache.flink.runtime.util.ExecutorThreadFactory           - Thread 'Flink Executor Thread - 22' produced an uncaught exception.
> java.lang.NullPointerException
>         at org.apache.flink.runtime.taskmanager.TaskManager.unregisterTask(TaskManager.java:674)
>         at org.apache.flink.runtime.taskmanager.TaskManager.notifyExecutionStateChange(TaskManager.java:709)
>         at org.apache.flink.runtime.taskmanager.Task.cancelExecution(Task.java:222)
>         at org.apache.flink.runtime.taskmanager.TaskManager$3.run(TaskManager.java:555)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 20:34:01,656 ERROR org.apache.flink.runtime.taskmanager.TaskManager              - Could not instantiate task
> java.lang.Exception: Cannot start task. Task was canceled or failed.
>         at org.apache.flink.runtime.taskmanager.TaskManager.submitTask(TaskManager.java:621)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.flink.runtime.ipc.RPC$Server.call(RPC.java:418)
>         at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:947)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)