Posted to user@spark.apache.org by saatvikshah1994 <sa...@gmail.com> on 2017/06/23 00:21:10 UTC

Using Spark with Local File System/NFS

Hi,

I've downloaded and kept the same set of data files on all my cluster nodes,
in the same absolute path - say /home/xyzuser/data/*. I am now trying to
perform an operation (say open(filename).read()) on all these files in Spark,
but by passing local file paths. I was under the assumption that as long as
the worker can find the file path, it will be able to execute it. However, my
Spark tasks fail with the error (/home/xyzuser/data/* is not present) - and
I'm sure it's present on all my worker nodes.

If this experiment were successful, I was planning to set up an NFS share
(actually more like a read-only cloud persistent disk connected to my cluster
nodes in Dataproc) and use that instead.

What exactly is going wrong here?

Thanks





Re: Using Spark with Local File System/NFS

Posted by Michael Mior <mi...@gmail.com>.
If you put a * in the path, Spark will look for a file or directory named
*. To read all the files in a directory, just remove the star.
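
As a rough illustration of the literal-star behaviour with plain Python file
access (the open(filename).read() case from the question), here is a minimal
sketch. It uses a temporary directory as a stand-in for /home/xyzuser/data,
and the helper names read_file and expand_paths are made up for this example.
It only shows that open() does not expand wildcards and that the standard
glob module can turn the pattern into concrete paths first.

```python
import glob
import os
import tempfile


def read_file(path):
    """Read one file and return its contents as a string."""
    with open(path) as f:
        return f.read()


def expand_paths(pattern):
    """Expand a shell-style wildcard into a sorted list of concrete paths."""
    return sorted(glob.glob(pattern))


# Stand-in for /home/xyzuser/data so the sketch runs anywhere (assumption).
data_dir = tempfile.mkdtemp()
for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
    with open(os.path.join(data_dir, name), "w") as f:
        f.write(text)

pattern = os.path.join(data_dir, "*")

# open() treats "*" as part of a literal file name, so this attempt fails.
try:
    read_file(pattern)
    literal_star_worked = True
except OSError:
    literal_star_worked = False

# Expanding the pattern first yields real paths that open() can handle.
paths = expand_paths(pattern)
contents = [read_file(p) for p in paths]
```

In a Spark job, one possible way to use this is to expand the pattern on the
driver and distribute the list, e.g. sc.parallelize(paths).map(read_file),
assuming sc is an existing SparkContext and every worker really has the files
at the same absolute paths. That step is not run in the sketch above.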

--
Michael Mior
michael.mior@gmail.com
