Posted to user@spark.apache.org by "Mozumder, Monir" <Mo...@amd.com> on 2014/09/11 21:14:40 UTC

RE: cannot read file from a local path

I am seeing the same issue with Spark 1.0.1 (tried with a file:// URI for the local file):



scala> val lines = sc.textFile("file:///home/monir/.bashrc")
lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> val linecount = lines.count
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/monir/.bashrc
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)

-----Original Message-----
From: wsun
Sent: Feb 03, 2014; 12:44pm
To: user@spark.incubator.apache.org
Subject: cannot read file from a local path


After installing Spark 0.8.1 on an EC2 cluster, I launched the Spark shell on the master. This is what happened to me: 

scala>val textFile=sc.textFile("README.md") 
14/02/03 20:38:08 INFO storage.MemoryStore: ensureFreeSpace(34380) called with curMem=0, maxMem=4082116853 
14/02/03 20:38:08 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 33.6 KB, free 3.8 GB) 
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12 


scala> textFile.count() 
14/02/03 20:38:39 WARN snappy.LoadSnappy: Snappy native library is available 
14/02/03 20:38:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library 
14/02/03 20:38:39 INFO snappy.LoadSnappy: Snappy native library loaded 
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ec2-54-234-136-50.compute-1.amazonaws.com:9000/user/root/README.md 
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197) 
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208) 
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199) 
        at scala.Option.getOrElse(Option.scala:108) 
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199) 
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199) 
        at scala.Option.getOrElse(Option.scala:108) 
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:199) 
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:886) 
        at org.apache.spark.rdd.RDD.count(RDD.scala:698) 


Spark seems to be looking for "README.md" in HDFS. However, I did not specify that the file is located in HDFS. I am just wondering if there is any configuration in Spark that forces Spark to read files from the local file system. Thanks in advance for any help. 
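
For reference, a rough sketch of what I think is happening (the local path below is only a guess at where the file actually lives): a bare path is resolved against the cluster's default filesystem, which here is HDFS, while an explicit scheme selects the filesystem directly.

    // A bare path like "README.md" is resolved against the default filesystem
    // (fs.defaultFS), which on this EC2 cluster is HDFS, so it effectively becomes
    // hdfs://.../user/root/README.md. An explicit scheme picks the filesystem directly.
    val fromHdfs  = sc.textFile("README.md")
    // Hypothetical local path; the file must be readable on the nodes that run the tasks.
    val fromLocal = sc.textFile("file:///root/spark/README.md")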

wp

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


RE: cannot read file from a local path

Posted by "Mozumder, Monir" <Mo...@amd.com>.
Starting spark-shell in local mode seems to solve this, but even then it cannot recognize a file whose name begins with a '.':

    MASTER=local[4] ./bin/spark-shell
    ....
    .....
    scala> val lineCount = sc.textFile("/home/monir/ref").count
    lineCount: Long = 68

    scala> val lineCount2 = sc.textFile("/home/monir/.ref").count
    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/monir/.ref
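
If I understand the Hadoop side correctly, FileInputFormat's default input filter skips paths whose names begin with '.' or '_', which would explain why the dot-file above is reported as missing; a small sketch (the copied file name is made up):

    // Sketch (file names are hypothetical): Hadoop's default input filter skips hidden
    // paths (names starting with '.' or '_'), so the dot-file is reported as missing
    // even though it exists on disk.
    val hiddenFile  = sc.textFile("/home/monir/.ref")      // filtered out -> InvalidInputException on an action
    val visibleCopy = sc.textFile("/home/monir/ref_copy")  // hypothetical copy: cp .ref ref_copy
    visibleCopy.count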


Though I am OK with running spark-shell in local mode to get the basic examples running, I was wondering whether reading local files on the cluster nodes is possible when all of the worker nodes have the file in question on their local file system.
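
From what I have read so far, my understanding is that an explicit file:// URI makes each task read the path from its own node's local filesystem, so it should work as long as the file sits at the same path on the driver and on every worker; a rough sketch:

    // Rough sketch: each task reads this path from the local filesystem of the node it
    // runs on, so /home/monir/ref must exist (and be readable) on the driver and on
    // every worker for the count to succeed on a cluster.
    val refLines = sc.textFile("file:///home/monir/ref")
    refLines.count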

I am still fairly new to Spark, so bear with me if this is easily tunable by some config params.

Bests,
-Monir




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org