Posted to user@spark.apache.org by "nicholas.chammas" <ni...@gmail.com> on 2014/02/24 03:33:10 UTC

Spark Quick Start - call to open README.md needs explicit fs prefix

I just deployed Spark 0.9.0 to EC2 using the guide here:
http://spark.incubator.apache.org/docs/latest/ec2-scripts.html
I then turned to the Quick Start guide here:
http://spark.incubator.apache.org/docs/latest/quick-start.html
and walked through it using the Python shell.

When I do this:

>>> textFile = sc.textFile("README.md")
>>> textFile.count()


I get a long error output right after the count() that includes this:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
hdfs://ec2-my-node-address.compute-1.amazonaws.com:9000/user/root/README.md

So I guess Spark assumed that the file was in HDFS.
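As far as I can tell, the resolution rule I'm hitting looks roughly like this (my own sketch of the behavior, not Hadoop's actual code): a path with no explicit URI scheme gets resolved relative to the current user's home directory on the cluster's default filesystem.

```python
from urllib.parse import urlparse

def resolve_path(path,
                 default_fs="hdfs://ec2-my-node-address.compute-1.amazonaws.com:9000",
                 user="root"):
    # Sketch only: paths with an explicit scheme (file://, hdfs://)
    # are used as-is; scheme-less relative paths are taken relative
    # to /user/<user> on the default filesystem.
    if urlparse(path).scheme:
        return path
    if path.startswith("/"):
        return default_fs + path
    return "%s/user/%s/%s" % (default_fs, user, path)

print(resolve_path("README.md"))
# -> hdfs://ec2-my-node-address.compute-1.amazonaws.com:9000/user/root/README.md
```

which matches the path Spark was looking for in the error output above.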

To get the file to open and the count to work, I had to do this:

>>> textFile = sc.textFile("file:///root/spark/README.md")
>>> textFile.count()


I get the same results if I use the Scala shell.
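A small helper along these lines makes the workaround less repetitive (the function name is mine, purely illustrative; it just adds the file:// prefix for scheme-less paths):

```python
import os
from urllib.parse import urlparse

def local_path(path):
    # Prefix scheme-less paths with file:// so Spark reads them from
    # the local filesystem instead of the cluster's default FS (HDFS here).
    if urlparse(path).scheme:
        return path  # already has a scheme (file://, hdfs://, ...), leave as-is
    return "file://" + os.path.abspath(path)

# textFile = sc.textFile(local_path("/root/spark/README.md"))
```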

Does the quick start guide need to be updated, or did I miss something?

Nick




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Quick-Start-call-to-open-README-md-needs-explicit-fs-prefix-tp1952.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Quick Start - call to open README.md needs explicit fs prefix

Posted by Nicholas Chammas <ni...@gmail.com>.
Makes sense. Thank you.


On Sun, Feb 23, 2014 at 9:57 PM, Matei Zaharia <ma...@gmail.com> wrote:

> Good catch; the Spark cluster on EC2 is configured to use HDFS as its
> default filesystem, so it can't find this file. The quick start was written
> to run on a single machine with an out-of-the-box install. If you'd like to
> upload this file to the HDFS cluster on EC2, use the following command:
>
> ~/ephemeral-hdfs/bin/hadoop fs -put README.md README.md
>
> Matei
>
> On Feb 23, 2014, at 6:33 PM, nicholas.chammas <ni...@gmail.com>
> wrote:
>
> I just deployed Spark 0.9.0 to EC2 using the guide here<http://spark.incubator.apache.org/docs/latest/ec2-scripts.html>.
> I then turned to the Quick Start guide here<http://spark.incubator.apache.org/docs/latest/quick-start.html> and
> walked through it using the Python shell.
>
> When I do this:
>
> >>> textFile = sc.textFile("README.md")
> >>> textFile.count()
>
>
> I get a long error output right after the count() that includes this:
>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> hdfs://
> ec2-my-node-address.compute-1.amazonaws.com:9000/user/root/README.md
>
> So I guess Spark assumed that the file was in HDFS.
>
> To get the file open and count to work, I had to do this:
>
> >>> textFile = sc.textFile("file:///root/spark/README.md")
> >>> textFile.count()
>
>
> I get the same results if I use the Scala shell.
>
> Does the quick start guide need to updated, or did I miss something?
>
> Nick
>
>
> ------------------------------
> View this message in context: Spark Quick Start - call to open README.md
> needs explicit fs prefix<http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Quick-Start-call-to-open-README-md-needs-explicit-fs-prefix-tp1952.html>
> Sent from the Apache Spark User List mailing list archive<http://apache-spark-user-list.1001560.n3.nabble.com/>at
> Nabble.com.
>
>
>

Re: Spark Quick Start - call to open README.md needs explicit fs prefix

Posted by Matei Zaharia <ma...@gmail.com>.
Good catch; the Spark cluster on EC2 is configured to use HDFS as its default filesystem, so it can’t find this file. The quick start was written to run on a single machine with an out-of-the-box install. If you’d like to upload this file to the HDFS cluster on EC2, use the following command:

~/ephemeral-hdfs/bin/hadoop fs -put README.md README.md

Matei
