Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/03/21 02:07:24 UTC

[GitHub] [incubator-druid] Jinstarry commented on issue #5343: Kafka indexing tasks occasionally fail with 'Stream closed'

URL: https://github.com/apache/incubator-druid/issues/5343#issuecomment-475091973
 
 
   Hi @jihoonson,
   I ran into a similar error on Druid 0.13.0.
   The problem most likely occurs when the server does not have enough free memory:
   at the moment the error occurred, available memory jumped from 162056 to 4570104.
   During the test the client kept buffering the data, so nothing was lost.
   However, memory usage had been climbing slowly; perhaps there was not enough
   memory left for the segment to be created.
   
   
   The relevant part of middlemanager.log is below.
   
   ```
   2019-03-18T10:21:45,196 INFO [forking-task-runner-6] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/index_kafka_ds_test2_9d87293708d7ae2_pdpdjhlh
   2019-03-18T10:21:46,079 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Job's finished. Completed [index_kafka_ds_test4_ee46e16cbb67785_okdcidna] with status [FAILED]
   2019-03-18T10:21:46,097 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Job's finished. Completed [index_kafka_ds_test2_9d87293708d7ae2_pdpdjhlh] with status [FAILED]
   2019-03-18T10:21:53,750 INFO [qtp1431064234-77] org.apache.druid.indexing.overlord.ForkingTaskRunner - Killing process for task: index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk
   2019-03-18T10:21:53,756 INFO [forking-task-runner-10] org.apache.druid.indexing.common.tasklogs.FileTaskLogs - Wrote task log to: var/druid/indexing-logs/index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk.log
   2019-03-18T10:21:53,761 INFO [forking-task-runner-10] org.apache.druid.indexing.overlord.ForkingTaskRunner - Exception caught during execution
   java.io.IOException: Stream closed
           at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[?:1.8.0_181]
           at java.io.BufferedInputStream.read1(BufferedInputStream.java:291) ~[?:1.8.0_181]
           at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_181]
           at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[?:1.8.0_181]
           at com.google.common.io.ByteStreams.copy(ByteStreams.java:175) ~[guava-16.0.1.jar:?]
           at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:438) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
           at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:223) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
           at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
   2019-03-18T10:21:53,792 INFO [forking-task-runner-10] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk
   2019-03-18T10:21:53,806 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Job's finished. Completed [index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk] with status [FAILED]
   2019-03-18T10:21:54,362 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task[index_kafka_ds_test2_4d88b40775ab307_knbgkefd] started.
   2019-03-18T10:21:57,111 INFO [forking-task-runner-15] org.apache.druid.indexing.overlord.ForkingTaskRunner - Running command: java -cp
   ```
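
   The stack trace shows the ForkingTaskRunner thread failing inside ByteStreams.copy while it was still reading the forked task's log stream after the process had been killed. As a rough illustration (this is a minimal, hypothetical sketch, not Druid code), a BufferedInputStream that has been closed underneath a copy loop raises exactly this exception from getBufIfOpen:

   ```java
   import java.io.BufferedInputStream;
   import java.io.ByteArrayInputStream;
   import java.io.IOException;

   public class StreamClosedDemo {
       public static void main(String[] args) throws Exception {
           // Stand-in for the forked task's output stream that the runner copies.
           BufferedInputStream taskLog = new BufferedInputStream(
                   new ByteArrayInputStream("task output...".getBytes("UTF-8")));

           // Killing the child process closes the stream while the copy loop is mid-read.
           taskLog.close();

           try {
               taskLog.read(); // the read that ByteStreams.copy() would perform next
               throw new AssertionError("expected IOException");
           } catch (IOException e) {
               // BufferedInputStream.getBufIfOpen() throws IOException("Stream closed")
               // once close() has discarded the internal buffer.
               System.out.println("java.io.IOException: " + e.getMessage());
           }
       }
   }
   ```

   That would explain why the exception is logged immediately after the "Killing process for task" line rather than pointing at the task's real failure cause.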
   
   
   The overlord.log from the same time is below.
   
   ```
   2019-03-18T10:21:53,939 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.TaskLockbox - TaskLock is now empty: TaskLock{type=EXCLUSIVE, groupId=index_kafka_ds_test2, dataSource=ds_test2, interval=2019-03-18T00:00:00.000Z/2019-03-19T00:00:00.000Z, version=2019-03-18T10:21:47.300Z, priority=75, revoked=false}
   2019-03-18T10:21:53,946 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.MetadataTaskStorage - Deleting TaskLock with id[232]: TaskLock{type=EXCLUSIVE, groupId=index_kafka_ds_test2, dataSource=ds_test2, interval=2019-03-18T00:00:00.000Z/2019-03-19T00:00:00.000Z, version=2019-03-18T10:21:47.300Z, priority=75, revoked=false}
   2019-03-18T10:21:53,957 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk to status: TaskStatus{id=index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk, status=FAILED, duration=-1, errorMsg=null}
   2019-03-18T10:21:53,964 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk', groupId='index_kafka_ds_test2', taskResource=TaskResource{availabilityGroup='index_kafka_ds_test2_4d88b40775ab307', requiredCapacity=1}, dataSource='ds_test2', context={checkpoints={"0":{"0":8275}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}}
   2019-03-18T10:21:53,964 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - Killing task [index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn] in the task group
   2019-03-18T10:21:53,964 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk] on worker[localhost:8091]
   2019-03-18T10:21:53,968 WARN [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Unknown task completed: index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk
   2019-03-18T10:21:53,968 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.RemoteTaskRunner - Removed task from pending queue: index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn
   2019-03-18T10:21:53,968 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn] from activeTasks
   2019-03-18T10:21:53,968 INFO [Curator-PathChildrenCache-1] org.apache.druid.java.util.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2019-03-18T10:21:53.968Z","service":"druid/overlord","host":"localhost:8090","version":"0.13.0-incubating","metric":"task/run/time","value":34710,"dataSource":"ds_test2","taskId":"index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk","taskStatus":"FAILED","taskType":"index_kafka"}]
   2019-03-18T10:21:53,969 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Task FAILED: AbstractTask{id='index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk', groupId='index_kafka_ds_test2', taskResource=TaskResource{availabilityGroup='index_kafka_ds_test2_4d88b40775ab307', requiredCapacity=1}, dataSource='ds_test2', context={checkpoints={"0":{"0":8275}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}} (34710 run duration)
   2019-03-18T10:21:53,969 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk] status changed to [FAILED].
   2019-03-18T10:21:53,970 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task[index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk] went bye bye.
   2019-03-18T10:21:53,972 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn to status: TaskStatus{id=index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn, status=FAILED, duration=-1, errorMsg=null}
   2019-03-18T10:21:53,977 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn', groupId='index_kafka_ds_test2', taskResource=TaskResource{availabilityGroup='index_kafka_ds_test2_4d88b40775ab307', requiredCapacity=1}, dataSource='ds_test2', context={checkpoints={"0":{"0":8275}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}}
   2019-03-18T10:21:53,978 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - [ds_test2] supervisor is running.
   2019-03-18T10:21:53,978 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - Creating new task group [0] for partitions [0]
   ```
   
   
   The zookeeper.log from the same time is below.
   
   ```
   2019-03-18 10:21:45,993 INFO [SyncThread:0] org.apache.zookeeper.server.ZooKeeperServer - Established session 0x169904864e50013 with negotiated timeout 30000 for client /127.0.0.1:50068
   2019-03-18 10:21:53,880 INFO [ProcessThread(sid:0 cport:2181):] org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x169904864e5000d
   2019-03-18 10:21:53,886 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
   EndOfStreamException: Unable to read additional data from client sessionid 0x169904864e5000d, likely client has closed socket
           at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
           at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
           at java.lang.Thread.run(Thread.java:748)
   2019-03-18 10:21:53,891 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /127.0.0.1:49798 which had sessionid 0x169904864e5000d
   2019-03-18 10:21:56,640 INFO [ProcessThread(sid:0 cport:2181):] org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x169904864e50013
   ```
