Posted to user@spark.apache.org by Eric Friedman <er...@gmail.com> on 2014/07/23 17:27:54 UTC

Lost executors

I'm using Spark 1.0.1 on quite a large cluster with gobs of memory.
Cluster resources are available to me via YARN, and I am seeing these
errors quite often.

ERROR YarnClientClusterScheduler: Lost executor 63 on <host>: remote Akka
client disassociated


This is in an interactive shell session.  I don't know a lot about YARN
plumbing and am wondering if there's some constraint in play, such as
executors being cleared out if they sit idle for too long.


Any insights here?

Re: Lost executors

Posted by Eric Friedman <er...@gmail.com>.
And... PEBCAK (problem exists between chair and keyboard).

I mistakenly believed I had set PYSPARK_PYTHON to a Python 2.7 install, but
it was pointing at a Python 2.6 install on the remote nodes, and so was
incompatible with what the master was sending.  I have set it to point to
the correct version everywhere and it now works.
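
For anyone who finds this thread later: PYSPARK_PYTHON has to name the same
interpreter on every node (set it in conf/spark-env.sh or in the shell
environment before launching pyspark; a path like /usr/bin/python2.7 is only
illustrative).  A quick sanity check from the PySpark shell, where sc
already exists, is to compare interpreter versions on the driver and on the
executors.  A minimal sketch, in Python 2 print syntax to match the versions
in question:

import sys

def worker_version(_):
    import sys  # evaluated on the executor, not the driver
    return ".".join(map(str, sys.version_info[:3]))

print "driver :", ".".join(map(str, sys.version_info[:3]))
print "workers:", sorted(set(sc.parallelize(range(8), 8).map(worker_version).collect()))

If the two lines disagree, the mismatch is still in place somewhere.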

Apologies for the false alarm.



Re: Lost executors

Posted by Eric Friedman <er...@gmail.com>.
Hi Andrew,

Thanks for your note.  Yes, I see a stack trace now.  It seems to be an
issue with Python interpreting a function I wish to apply to an RDD.  The
stack trace is below.  The function is a simple factorial:

def f(n):
    if n <= 1:  # <= rather than == so non-positive n can't recurse forever
        return 1
    return n * f(n - 1)

and I'm trying to use it like this:

tf = sc.textFile(...)
tf.map(lambda line: line and len(line)).map(f).collect()
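
One caveat on that lambda: line and len(line) evaluates to '' for an empty
line, so f would be handed a string rather than an int.  A guarded sketch
that skips empty lines instead:

tf.map(len).filter(lambda n: n > 0).map(f).collect()

The error below is unrelated to that, though; it happens with the call as
written.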

I get the following error, which does not occur if I use a built-in
function like math.sqrt:

TypeError: __import__() argument 1 must be string, not X#

The full stack trace follows:



WARN TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "<ipython-input-39-0f0dafaf1ed4>", line 2, in f
TypeError: __import__() argument 1 must be string, not X#

        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)






Re: Lost executors

Posted by Andrew Or <an...@databricks.com>.
Hi Eric,

Have you checked the executor logs? It is possible they died because of
some exception, and the message you see is just a side effect.
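
If log aggregation is enabled on your cluster, you can also pull the
executor logs for a completed application with the YARN CLI (substituting
the real application id):

yarn logs -applicationId <application id>

While the application is running, they are linked from the executor entries
in the ResourceManager web UI.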

Andrew

