Posted to user@spark.apache.org by Ognen Duzlevski <og...@nengoiksvelzud.com> on 2014/01/16 00:56:52 UTC

Reading files on a cluster / shared file system

On a cluster where the worker nodes and the master all have access to a shared
filesystem, does Spark read a file (such as one passed to sc.textFile()) in
parallel, with each node reading different sections? Or is the file read
sequentially on the master, with the chunks processed on the nodes afterwards?

Thanks!
Ognen

Re: Reading files on a cluster / shared file system

Posted by Ognen Duzlevski <og...@plainvanillagames.com>.
Makes sense. Thanks!
Ognen


On Thu, Jan 16, 2014 at 12:54 AM, Tathagata Das <tathagata.das1565@gmail.com> wrote:

> If you are running a distributed Spark cluster over the nodes, then the
> reading should be done in a distributed manner. If you give sc.textFile() a
> "local path" to a directory in the shared file system, then each worker
> should read a subset of the files in the directory by accessing them locally.
> Nothing should be read on the master.
>
> TD


-- 
"The secret of great fortunes with no apparent cause is a forgotten crime,
forgotten because it was properly done" - Honore de Balzac

Re: Reading files on a cluster / shared file system

Posted by Tathagata Das <ta...@gmail.com>.
If you are running a distributed Spark cluster over the nodes, then the
reading should be done in a distributed manner. If you give sc.textFile() a
"local path" to a directory in the shared file system, then each worker
should read a subset of the files in the directory by accessing them locally.
Nothing should be read on the master.

TD
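
To make the division of labour described above concrete, here is a minimal
sketch in plain Python. It is a toy model only and does not use Spark or
reflect its internals: the function names (plan_assignments, worker_read,
read_distributed, demo), the round-robin assignment, the thread pool standing
in for worker nodes, and the temporary directory standing in for the shared
mount are all assumptions made for illustration. The point it tries to show is
that the coordinating side only lists file names, while each worker opens its
own subset of files through the shared path.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def plan_assignments(shared_dir, num_workers):
    """Coordinator step (hypothetical): list file names and deal them out
    round-robin. No file contents are read here."""
    names = sorted(os.listdir(shared_dir))
    return [names[i::num_workers] for i in range(num_workers)]


def worker_read(shared_dir, assigned_names):
    """Worker step (hypothetical): open only the assigned files via the
    shared path and return their lines."""
    lines = []
    for name in assigned_names:
        with open(os.path.join(shared_dir, name), encoding="utf-8") as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines


def read_distributed(shared_dir, num_workers):
    """Run one simulated worker per assignment; threads stand in for nodes."""
    assignments = plan_assignments(shared_dir, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(
            pool.map(lambda names: worker_read(shared_dir, names), assignments)
        )


def demo():
    """Create four small files in a temporary 'shared' directory and read
    them with two simulated workers."""
    with tempfile.TemporaryDirectory() as shared_dir:
        for i in range(4):
            path = os.path.join(shared_dir, f"part-{i}.txt")
            with open(path, "w", encoding="utf-8") as f:
                f.write(f"file {i} line a\nfile {i} line b\n")
        return read_distributed(shared_dir, num_workers=2)


if __name__ == "__main__":
    for worker_id, lines in enumerate(demo()):
        print(worker_id, lines)
```

In this sketch each simulated worker ends up with the lines of two of the four
files, and the coordinator never opens any of them. For the same pattern to
work on a real cluster, the path handed to sc.textFile() would need to resolve
to the same files on every worker node.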

