Posted to user@spark.apache.org by Mahesh Sawaiker <ma...@persistent.com> on 2017/06/01 04:56:09 UTC

RE: The following Error seems to happen once in every ten minutes (Spark Structured Streaming)?

Your datanode(s) are going down for some reason; check the datanode logs and fix the underlying issue that is causing them to go down.
There should be no need to delete any data; simply restarting the datanodes should do the trick for you.
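For reference, a minimal set of checks, assuming a standard Hadoop 2.x layout (exact paths and script names vary by install):

    # Ask the namenode which datanodes it currently considers live or dead
    hdfs dfsadmin -report

    # Inspect the datanode log for the reason it stopped (log path is an assumption)
    less $HADOOP_HOME/logs/hadoop-*-datanode-*.log

    # Bring a stopped datanode back up (Hadoop 2.x sbin scripts)
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode

If the report shows fewer live datanodes than expected, or nodes flapping between live and dead, the datanode log usually names the cause (full disk, long GC pauses, network timeouts, and so on).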

From: kant kodali [mailto:kanth909@gmail.com]
Sent: Thursday, June 01, 2017 4:35 AM
To: user @spark
Subject: The following Error seems to happen once in every ten minutes (Spark Structured Streaming)?


Hi All,

When my query is streaming, I get the following error once every ten minutes or so. A lot of the solutions online seem to suggest simply clearing the data directories under the datanode and namenode and restarting the HDFS cluster, but I didn't see anything that explains the cause. If it happens this frequently, what do I need to do? I use Spark standalone 2.1.1 (I don't use a resource manager like YARN or Mesos at this time).



org.apache.spark.util.TaskCompletionListenerException: File /usr/local/hadoop/metrics/state/0/5/temp-6025335567362823423 could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and no node(s) are excluded in this operation.
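For context, the path in that error belongs to Spark's streaming state store rather than to application data: under the query's checkpointLocation, state/0/5 is operator 0, partition 5, and the temp-... file is written on every trigger before being renamed into place, which is why an unhealthy datanode surfaces as a failure every few minutes. Below is a minimal sketch of the kind of stateful query that produces these writes; the socket source, host/port, and checkpoint path are assumptions inferred from the error, not the original poster's code:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("StateStoreSketch")
      .getOrCreate()

    // Any streaming aggregation keeps per-partition state in the state store.
    val lines = spark.readStream
      .format("socket")                 // hypothetical source, for illustration
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val counts = lines.groupBy("value").count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      // Each trigger writes temp state files under
      // <checkpointLocation>/state/<operatorId>/<partitionId>/ on HDFS.
      .option("checkpointLocation", "/usr/local/hadoop/metrics") // inferred from the error path
      .start()

    query.awaitTermination()

The replication failure itself means the namenode could not place even a single replica of that temp file (minReplication = 1) despite two registered datanodes, which typically points at the datanodes being out of disk space, not heartbeating, or otherwise unable to accept the block at write time, rather than at Spark itself.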



Thanks!
