Posted to user@spark.apache.org by Joaquin Alzola <Jo...@lebara.com> on 2016/06/30 11:34:05 UTC

Remote RPC client disassociated

Hi List,

I am launching this spark-submit job:

hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py

spark_v2.py is:
from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host", "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()


Error I get when running the above command:

[Stage 0:>                                                          (0 + 3) / 7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:>                                                          (0 + 7) / 7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:>                                                          (0 + 5) / 7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 0:>                                                          (0 + 4) / 7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; aborting job
Traceback (most recent call last):
  File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>
    food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
  File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
       at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


From the Spark web UI jobs page (stderr log page for app-20160630104030-0086/4):

stderr log page for app-20160630104030-0086/4
16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout writer for python
java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
              at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)
              at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
              at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)
              at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
              at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
              at scala.collection.Iterator$class.foreach(Iterator.scala:727)
              at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
              at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
              at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
              at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
              at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
16/06/30 10:44:34 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python,5,main]
java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
              at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
              at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
              at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
              at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)
              at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
              at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)
              at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
              at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
              at scala.collection.Iterator$class.foreach(Iterator.scala:727)
              at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
              at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
              at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
              at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
               at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)

BR

Joaquin

RE: Remote RPC client disassociated

Posted by Joaquin Alzola <Jo...@lebara.com>.
>>> 16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout writer for python
>>> java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
>> You are trying to call an abstract method.  Please check the method DeferringRowReader.read

I do not know how to fix this issue.
I have seen many tutorials around the net, and they make the same call I am currently making:

from pyspark_cassandra import CassandraSparkContext, Row
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("test").setMaster("spark://192.168.23.31:7077").set("spark.cassandra.connection.host", "192.168.23.31")
sc = CassandraSparkContext(conf=conf)
table = sc.cassandraTable("lebara_diameter_codes","nl_lebara_diameter_codes")
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
food_count.collect()

I am really new to this Spark thing. I was able to configure it correctly and am now learning the API.

Re: Remote RPC client disassociated

Posted by Jeff Zhang <zj...@gmail.com>.
>>> 16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout
writer for python

java.lang.AbstractMethodError:
pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;


You are trying to call an abstract method.  Please check the method
DeferringRowReader.read
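
A minimal Scala sketch of the mechanism. The new read signature below is
taken from the error message; the old signature and the trait names are
assumptions for illustration:

import com.datastax.driver.core.Row
import com.datastax.spark.connector.CassandraRowMetadata

// What the reader was compiled against (assumed 1.5-era connector API):
trait RowReaderOld[T] {
  def read(row: Row, columnNames: Array[String]): T
}

// What spark-cassandra-connector 1.6.0 declares, per the error message:
trait RowReaderNew[T] {
  def read(row: Row, metadata: CassandraRowMetadata): T
}

// DeferringRowReader carries bytecode only for the old read. When the 1.6.0
// connector invokes read(row, metadata) at run time, the JVM finds no
// implementation and throws java.lang.AbstractMethodError; the compiler
// never catches this, because the two jars were built separately.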

On Thu, Jun 30, 2016 at 4:34 AM, Joaquin Alzola <Jo...@lebara.com>
wrote:

> [original message quoted in full -- snipped]



-- 
Best Regards

Jeff Zhang

Re: Remote RPC client disassociated

Posted by Akhil Das <ak...@hacked.work>.
Can you try the Cassandra connector 1.5? It is also compatible with Spark
1.6 according to their documentation
https://github.com/datastax/spark-cassandra-connector#version-compatibility
You can also crosspost it over here
https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
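
If you go that route, only the connector version in the submit command needs
to change; an untested sketch, keeping your pyspark-cassandra jar as-is:

bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0 --jars /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py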

On Fri, Jul 1, 2016 at 5:45 PM, Joaquin Alzola <Jo...@lebara.com>
wrote:

> [earlier messages quoted in full -- snipped]



-- 
Cheers!

RE: Remote RPC client disassociated

Posted by Joaquin Alzola <Jo...@lebara.com>.
Hi Akhil,

I am using:
Cassandra: 3.0.5
Spark: 1.6.1
Scala 2.10
Spark-cassandra connector: 1.6.0

From: Akhil Das [mailto:akhld@hacked.work]
Sent: 01 July 2016 11:38
To: Joaquin Alzola <Jo...@lebara.com>
Cc: user@spark.apache.org
Subject: Re: Remote RPC client disassociated

This looks like a version conflict; which version of Spark are you using? The Cassandra connector you are using is built for Scala 2.10.x and Spark 1.6.

On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola <Jo...@lebara.com> wrote:
[original message quoted in full -- snipped]



--
Cheers!


Re: Remote RPC client disassociated

Posted by Akhil Das <ak...@hacked.work>.
This looks like a version conflict; which version of Spark are you using?
The Cassandra connector you are using is built for Scala 2.10.x and Spark 1.6.
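
A quick way to confirm what is actually running, from spark-shell on the
cluster (the output lines are illustrative):

scala> sc.version
res0: String = 1.6.1

scala> scala.util.Properties.versionString
res1: String = version 2.10.5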

On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola <Jo...@lebara.com>
wrote:

> [original message quoted in full -- snipped]



-- 
Cheers!