Posted to dev@spark.apache.org by "Ulanov, Alexander" <al...@hp.com> on 2015/04/10 02:23:15 UTC

Access to hdfs FileSystem through Spark

Hi,

Is there a way to access the hdfs FileSystem through Spark? For example, I need to check the file size before opening it with sc.binaryFiles("hdfs://mynetwork.com:9000/myfile"). Can I do it without creating a hadoop FileSystem myself?

val fs = FileSystem.get(new URI("hdfs://mynetwork.com:9000"), new Configuration())

Best regards, Alexander

Re: Access to hdfs FileSystem through Spark

Posted by jay vyas <ja...@gmail.com>.
Whoa! Sorry about the typos above; I tried to refactor the email and it
sent. Anyway, you get the idea :)

Basically, the spark context will read hadoop settings in the same way
yarn does... so if you're already on a hadoop cluster, it should work
quite naturally without needing to explicitly set anything at all.
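
For instance, a quick sanity check from the spark shell (just a sketch; the
key is the standard Hadoop 2 name, and the value depends on your cluster's
core-site.xml):

// Spark exposes the Hadoop configuration it loaded on the driver.
// On a cluster configured for HDFS this should print something like
// hdfs://namenode:9000 rather than file:///.
println(sc.hadoopConfiguration.get("fs.defaultFS"))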

-- 
jay vyas

Re: Access to hdfs FileSystem through Spark

Posted by jay vyas <ja...@gmail.com>.
If you're already on a hadoop cluster, it should just work out of the box.
Spark is smart enough to work on hadoop filesystems... it reads the hadoop
conf on an existing normal HCFS cluster, and sc.textFile will just use
whatever your default hadoop fs URI is.

In general this is quite easy to test... once spark is set up properly, it
should naturally load text files using the spark context:

1) First, put a file into your HCFS file system:

hadoop fs -put /etc/passwd /tmp/passwd

2) Then just confirm spark sees it:

val lines = sc.textFile("/tmp/passwd")

lines.collect can print this out for you.
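
For example (a minimal sketch; the output is just the lines of the file you
put above):

// collect() pulls the whole RDD back to the driver; fine for a small test file
lines.collect().foreach(println)

// or just peek at the first few lines
lines.take(3).foreach(println)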

As a test of this... you can just use ASF BigTop's spark vagrant recipes:
we don't do anything special, and I found hdfs integration "just worked",
since by default we deploy with hadoop configuration for HDFS.





-- 
jay vyas

Re: Access to hdfs FileSystem through Spark

Posted by Sean Owen <so...@cloudera.com>.
What you have there is how to do it, although you want to use
sc.hadoopConfiguration, IIRC.
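
Something along these lines should work (an untested sketch; the path is
just the one from your example):

import org.apache.hadoop.fs.{FileSystem, Path}

// Reuse the Hadoop configuration Spark has already loaded instead of
// building a new Configuration by hand.
val path = new Path("hdfs://mynetwork.com:9000/myfile")
val fs = path.getFileSystem(sc.hadoopConfiguration)

// Check the size (in bytes) before reading the file with sc.binaryFiles.
val size = fs.getFileStatus(path).getLen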