Posted to issues@spark.apache.org by "Amo A (JIRA)" <ji...@apache.org> on 2015/01/23 23:42:35 UTC

[jira] [Commented] (SPARK-5209) Jobs fail with "unexpected value" exception in certain environments

    [ https://issues.apache.org/jira/browse/SPARK-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290153#comment-14290153 ] 

Amo A commented on SPARK-5209:
------------------------------

After some further testing, as promised, we found that the job runs without any issue when "spark.akka.heartbeat.pauses" is set to 6000 (as recommended in http://spark.apache.org/docs/1.2.0/configuration.html#execution-behavior) while all the other settings in the conf file remain the same.

After doing a bit of reading on Akka actor behavior and the impact of these settings (with the caveat that my understanding of how Akka works is limited), the relevant values are:

spark.akka.heartbeat.pauses = 6000 (used to be 600 in Spark 1.1.1)
spark.akka.failure-detector.threshold = 300
spark.akka.heartbeat.interval = 1000
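
For reference, a programmatic way to apply the same values (a minimal sketch only; we actually set them in spark-defaults.conf, and the app name below is made up):

{code}
from pyspark import SparkConf, SparkContext

# Sketch: the same settings from the conf file, applied via SparkConf
# before the context starts. Values mirror the list above.
conf = (SparkConf()
        .setAppName("heartbeat-repro")  # hypothetical app name
        .set("spark.akka.heartbeat.pauses", "6000")
        .set("spark.akka.failure-detector.threshold", "300")
        .set("spark.akka.heartbeat.interval", "1000"))
sc = SparkContext(conf=conf)
{code}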

My guess is that the time between two heartbeats from a given actor (spark.akka.heartbeat.interval) has to stay smaller than the total of the acceptable pause (to allow for GC or higher load) plus the padding that activates the failure detector (spark.akka.failure-detector.threshold); otherwise the detector fires and triggers a kill.
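
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch of that reasoning with the values above (this is only my simplified reading, not Akka's actual phi-accrual failure detector):

{code}
heartbeat_pauses   = 6000   # spark.akka.heartbeat.pauses (was 600 in Spark 1.1.1)
detector_threshold = 300    # spark.akka.failure-detector.threshold
heartbeat_interval = 1000   # spark.akka.heartbeat.interval

def failure_detector_fires(observed_gap):
    # Simplified assumption: the observed gap between two heartbeats (the
    # scheduled interval plus any GC/load pause) must stay below the
    # acceptable pause plus the detector padding, or the executor is killed.
    return observed_gap > heartbeat_pauses + detector_threshold

# A gap of 1000 + 2000 is tolerated with the 1.2.0 value of 6000,
# but would exceed 600 + 300 with the old 1.1.1 default of 600.
print(failure_detector_fires(heartbeat_interval + 2000))   # False
{code}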

Looking at the Spark 1.1.1 docs, the default appears to be 600; however, the 1.2.0 docs at the link above suggest a default of 6000. If my theory/understanding above is correct, I wonder how this worked in Spark 1.1.x. Could someone help explain this?

Thank you.

> Jobs fail with "unexpected value" exception in certain environments
> -------------------------------------------------------------------
>
>                 Key: SPARK-5209
>                 URL: https://issues.apache.org/jira/browse/SPARK-5209
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Amazon Elastic Map Reduce
>            Reporter: Sven Krasser
>         Attachments: driver_log.txt, exec_log.txt, gen_test_data.py, repro.py, spark-defaults.conf
>
>
> Jobs fail consistently and reproducibly with exceptions of the following type in PySpark using Spark 1.2.0:
> {noformat}
> 2015-01-13 00:14:05,898 ERROR [Executor task launch worker-1] executor.Executor (Logging.scala:logError(96)) - Exception in task 27.0 in stage 0.0 (TID 28)
> org.apache.spark.SparkException: PairwiseRDD: unexpected value: List([B@4c09f3e0)
> {noformat}
> The issue appeared the first time in Spark 1.2.0 and is sensitive to the environment (configuration, cluster size), i.e. some changes to the environment will cause the error to not occur.
> The following steps yield a reproduction on Amazon Elastic Map Reduce. Launch an EMR cluster with the following parameters (this will bootstrap Spark 1.2.0 onto it):
> {code}
> aws emr create-cluster --region us-west-1 --no-auto-terminate \
>    --ec2-attributes KeyName=your-key-here,SubnetId=your-subnet-here \
>    --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args='["-g","-v","1.2.0.a"]' \
>    --ami-version 3.3 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
>    InstanceGroupType=CORE,InstanceCount=3,InstanceType=r3.xlarge --name "Spark Issue Repro" \
>    --visible-to-all-users --applications Name=Ganglia
> {code}
> Next, copy the attached {{spark-defaults.conf}} to {{~/spark/conf/}}.
> Run {{~/spark/bin/spark-submit gen_test_data.py}} to generate a test data set on HDFS. Then lastly run {{~/spark/bin/spark-submit repro.py}} to reproduce the error.
> Driver and executor logs are attached. For reference, a spark-user thread on the topic is here: http://mail-archives.us.apache.org/mod_mbox/spark-user/201501.mbox/%3CC5A80834-8F1C-4C0A-89F9-E04D3F1C4469@gmail.com%3E


