Posted to user@spark.apache.org by cheez <11...@seecs.edu.pk> on 2016/02/11 22:47:24 UTC

AmpLab Big Data Benchmark for Spark error on EC2

I am trying to run the Big Data benchmark
<https://amplab.cs.berkeley.edu/benchmark/> on my EC2 cluster against my own
fork of Spark 1.5, which only modifies some files in Spark core. My
cluster contains 1 master and 2 slave nodes of type m1.large. I used the ec2
scripts bundled with Spark to launch the cluster. The cluster launched
without problems and I can ssh into the master. However, when I
try to run the benchmark from the master using the command

./runner/prepare-benchmark.sh --shark --aws-key-id=xxxxxxxx \
  --aws-key=xxxxxxxx --shark-host=<my-spark-master> \
  --shark-identity-file=/root/.ssh/id_rsa --scale-factor=1

I get the following error:

=== IMPORTING BENCHMARK DATA FROM S3 ===
bash: /root/ephemeral-hdfs/bin/hdfs: No such file or directory
Connection to ec2-54-201-169-165.us-west-2.compute.amazonaws.com closed.
bash: /root/mapreduce/bin/start-mapred.sh: No such file or directory
Connection to ec2-54-201-169-165.us-west-2.compute.amazonaws.com closed.
Traceback (most recent call last):
  File "./prepare_benchmark.py", line 606, in <module>
    main()
  File "./prepare_benchmark.py", line 594, in main
    prepare_shark_dataset(opts)
  File "./prepare_benchmark.py", line 192, in prepare_shark_dataset
    ssh_shark("/root/mapreduce/bin/start-mapred.sh")
  File "./prepare_benchmark.py", line 180, in ssh_shark
    ssh(opts.shark_host, "root", opts.shark_identity_file, command)
  File "./prepare_benchmark.py", line 139, in ssh
    (identity_file, username, host, command), shell=True)
  File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'ssh -t -o StrictHostKeyChecking=no
-i         /root/.ssh/id_rsa
root@ec2-54-201-169-165.us-west-2.compute.amazonaws.com 'source    
/root/.bash_profile; 
/root/mapreduce/bin/start-mapred.sh'' returned non-zero exit     status 127

I have tried terminating the cluster and launching it again multiple times,
but the problem persists. What could be the issue?
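
One observation on the trace: exit status 127 is what a POSIX shell reports
when the command it was asked to run cannot be found, and ssh passes the
remote command's exit status back to the caller. That matches the two "No
such file or directory" lines above. The following is only a minimal sketch of
that behaviour, not the benchmark's code: it assumes a POSIX shell, and the
helper name run_and_report and the path in it are made up for illustration.

```python
import subprocess


def run_and_report(command):
    """Run a shell command with check_call(shell=True), as the ssh() helper
    in the traceback does, and return its exit status instead of raising."""
    try:
        subprocess.check_call(command, shell=True)
        return 0
    except subprocess.CalledProcessError as err:
        return err.returncode


# Hypothetical path, chosen only because it should not exist on any machine.
missing = "/nonexistent-dir-for-illustration/bin/start-mapred.sh"

# Assumption: a POSIX shell, which reports a missing command as status 127.
status = run_and_report(missing)
print(status)
```

If this is what is happening, the script itself is working as written and the
paths it expects (/root/ephemeral-hdfs/bin/hdfs and
/root/mapreduce/bin/start-mapred.sh) are simply absent on the master.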


