Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/27 11:11:39 UTC

[jira] [Resolved] (SPARK-5209) Jobs fail with "unexpected value" exception in certain environments

     [ https://issues.apache.org/jira/browse/SPARK-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-5209.
------------------------------
    Resolution: Not A Problem

> Jobs fail with "unexpected value" exception in certain environments
> -------------------------------------------------------------------
>
>                 Key: SPARK-5209
>                 URL: https://issues.apache.org/jira/browse/SPARK-5209
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Amazon Elastic Map Reduce
>            Reporter: Sven Krasser
>         Attachments: driver_log.txt, exec_log.txt, gen_test_data.py, repro.py, spark-defaults.conf
>
>
> Jobs fail consistently and reproducibly in PySpark on Spark 1.2.0 with exceptions of the following type:
> {noformat}
> 2015-01-13 00:14:05,898 ERROR [Executor task launch worker-1] executor.Executor (Logging.scala:logError(96)) - Exception in task 27.0 in stage 0.0 (TID 28)
> org.apache.spark.SparkException: PairwiseRDD: unexpected value: List([B@4c09f3e0)
> {noformat}
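> The attached {{repro.py}} is not inlined in this report. As a rough illustration only, a minimal PySpark job of the shape that exercises {{PairwiseRDD}} (keyed records shuffled via {{partitionBy}}) might look like the sketch below; the HDFS path, key format, and partition count are assumptions, not values taken from the attachment:
> {code}
> from pyspark import SparkContext
>
> sc = SparkContext(appName="Spark5209Repro")
>
> # partitionBy() is the PySpark operation backed by PairwiseRDD on the
> # JVM side; PairwiseRDD expects each deserialized element to be a
> # (key, value) pair and raises "unexpected value" otherwise.
> lines = sc.textFile("hdfs:///tmp/spark5209/test_data")  # assumed path
> pairs = lines.map(lambda line: (line.split("\t")[0], line))
> print(pairs.partitionBy(64).count())  # the shuffle triggers the failure
> {code}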
> The issue first appeared in Spark 1.2.0 and is sensitive to the environment (configuration, cluster size); some changes to the environment cause the error not to occur.
> The following steps yield a reproduction on Amazon Elastic Map Reduce. Launch an EMR cluster with the following parameters (this will bootstrap Spark 1.2.0 onto it):
> {code}
> aws emr create-cluster --region us-west-1 --no-auto-terminate \
>    --ec2-attributes KeyName=your-key-here,SubnetId=your-subnet-here \
>    --bootstrap-actions Path=s3://support.elasticmapreduce/spark/install-spark,Args='["-g","-v","1.2.0.a"]' \
>    --ami-version 3.3 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
>    InstanceGroupType=CORE,InstanceCount=3,InstanceType=r3.xlarge --name "Spark Issue Repro" \
>    --visible-to-all-users --applications Name=Ganglia
> {code}
> Next, copy the attached {{spark-defaults.conf}} to {{~/spark/conf/}}.
> Run {{~/spark/bin/spark-submit gen_test_data.py}} to generate a test data set on HDFS. Then run {{~/spark/bin/spark-submit repro.py}} to reproduce the error.
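> The attached {{gen_test_data.py}} is likewise not shown here. A hypothetical sketch of a generator producing keyed, tab-separated records for a job like the one above, with the record shape, sizes, and output path all assumptions, could be:
> {code}
> import random
> from pyspark import SparkContext
>
> sc = SparkContext(appName="GenTestData")
>
> # Emit synthetic "key<TAB>payload" lines so the repro job has keyed
> # records to shuffle; all sizes here are illustrative only.
> def make_record(i):
>     return "%d\t%s" % (random.randint(0, 10000), "x" * 512)
>
> sc.parallelize(range(1000000), 256) \
>   .map(make_record) \
>   .saveAsTextFile("hdfs:///tmp/spark5209/test_data")  # assumed path
> {code}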
> Driver and executor logs are attached. For reference, a spark-user thread on the topic is here: http://mail-archives.us.apache.org/mod_mbox/spark-user/201501.mbox/%3CC5A80834-8F1C-4C0A-89F9-E04D3F1C4469@gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org