Posted to user@spark.apache.org by Ian Wilkinson <ia...@me.com> on 2014/07/16 15:10:12 UTC

Problem running Spark shell (1.0.0) on EMR

Hi,

I’m trying to run the Spark (1.0.0) shell on EMR and encountering a classpath issue.
I suspect I’m missing something gloriously obvious, but so far it is eluding me.

I launch the EMR Cluster (using the aws cli) with:

aws emr create-cluster --name "Test Cluster"  \
	--ami-version 3.0.3 \
	--no-auto-terminate \
	--ec2-attributes KeyName=<...> \
	--bootstrap-actions Path=s3://elasticmapreduce/samples/spark/1.0.0/install-spark-shark.rb \
	--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium  \
	InstanceGroupType=CORE,InstanceCount=1,InstanceType=m1.medium --region eu-west-1

then,

$ aws emr ssh --cluster-id <...> --key-pair-file <...> --region eu-west-1

On the master node, I then launch the shell with:

[hadoop@ip-... spark]$ ./bin/spark-shell

and try performing:

scala> val logs = sc.textFile("s3n://.../")

this produces:

14/07/16 12:40:35 WARN storage.BlockManager: Putting block broadcast_0 failed
java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
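
A NoSuchMethodError on a Guava class usually points to a different Guava version being loaded at runtime than the one the caller was compiled against. One way to check, as a rough sketch only, is to list every jar that carries the class and see whether more than one turns up. The helper name find_class_jars is made up for illustration (it is not part of Spark or EMR), and which directories are worth searching on the master node is an assumption.

```shell
# Hypothetical helper: print every jar under the given directories that
# contains an entry for the given fully qualified class name.
# Jar (zip) archives store entry names uncompressed, so a fixed-string grep
# over the raw bytes is enough to find candidate jars. A hit only means the
# name appears in the archive; it does not say which copy the JVM loads.
find_class_jars() {
  # com.google.common.hash.HashFunction -> com/google/common/hash/HashFunction.class
  class_path="$(printf '%s' "$1" | tr '.' '/').class"
  shift
  # Remaining arguments are the directories to search.
  find "$@" -type f -name '*.jar' -exec grep -lF -- "$class_path" {} + 2>/dev/null | sort
}
```

From the spark directory on the master node it could be called as:

    find_class_jars com.google.common.hash.HashFunction .

Additional directories (for example wherever the Hadoop jars live on the AMI) can be appended as further arguments. If several jars are listed, an older Guava ahead of the expected one on the classpath would be consistent with the error above.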


Any help mighty welcome,
ian


Re: Problem running Spark shell (1.0.0) on EMR

Posted by Martin Goodson <ma...@skimlinks.com>.
I am having exactly the same problem when calling from pyspark. Has
anyone managed to get this script to work?


-- 
Martin Goodson  |  VP Data Science
(0)20 3397 1240

