Posted to user@spark.apache.org by sudhir k <k....@gmail.com> on 2017/06/24 17:30:07 UTC

Fwd: Can we access files on Cluster mode

I am new to Spark and I need some guidance on how to fetch files passed with
the --files option of spark-submit.

I read on some forums that we can fetch such files with
SparkFiles.get(fileName), use them in our code, and all nodes should be
able to read them.

But I am facing an issue.

Below is the command I am using:

spark-submit --deploy-mode cluster --class com.check.Driver --files
/home/sql/first.sql test.jar 20170619

When I use SparkFiles.get("first.sql"), I should be able to read the file
path, but it throws a FileNotFoundException.

I tried SparkContext.addFile("/home/sql/first.sql") and then
SparkFiles.get("first.sql"), but I still get the same error.

It works in standalone mode but not in cluster mode. Any help is
appreciated. Using Spark 2.1.0 and Scala 2.11.

Thanks.


Regards,
Sudhir K

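[Editor's note: for readers hitting the same error, SparkFiles.get takes the bare file name as a String (note the quotes, which the commands above omit), and in cluster mode the local path given to --files must be readable on the machine where spark-submit runs. A minimal sketch of the intended usage, reusing the file name from this thread (not a tested program):]

```scala
import org.apache.spark.SparkFiles
import scala.io.Source

// Resolve the localized copy of a file shipped with `--files`.
// The argument is the bare file name, passed as a String literal.
val path: String = SparkFiles.get("first.sql")

// Read the SQL text from the localized copy.
val query: String = Source.fromFile(path).getLines().mkString("\n")
```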

Re: Can we access files on Cluster mode

Posted by sudhir k <k....@gmail.com>.
Thank you. I guess I will have to use a common mount or S3 to access those files.


Re: Can we access files on Cluster mode

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks. In my experience, certain distros like Cloudera only support YARN
client mode, so AFAIK the driver stays on the edge node. Happy to be
corrected :)



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




Re: Can we access files on Cluster mode

Posted by Anastasios Zouzias <zo...@gmail.com>.
Hi Mich,

If the driver starts on the edge node with cluster mode, then I don't see
the difference between client and cluster deploy mode.

In cluster mode, it is the responsibility of the resource manager (YARN,
etc.) to decide where to run the driver (at least for Spark 1.6, this is
what I have experienced).

Best,
Anastasios



-- 
-- Anastasios Zouzias
<az...@zurich.ibm.com>

Re: Can we access files on Cluster mode

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Anastasios.

Are you implying that in YARN cluster mode, even if you submit your Spark
application from an edge node, the driver can start on any node? I was under
the impression that the driver starts on the edge node and that the executors
can be on any node in the cluster (where Spark agents are running)?

Thanks






Re: Can we access files on Cluster mode

Posted by Anastasios Zouzias <zo...@gmail.com>.
Just to note that in cluster mode the Spark driver might run on any node of
the cluster, so you need to make sure that the file exists on *all*
nodes. Push the file to all nodes, or use client deploy mode.

Best,
Anastasios
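[Editor's note: one common workaround, given that the driver may land on an arbitrary node, is to stage the file on storage every node can read (HDFS, S3, etc.) and pass that URI to --files. A sketch; the HDFS path is illustrative, not taken from the thread:]

```shell
# Stage the file on HDFS so whichever node runs the driver can fetch it.
hdfs dfs -put /home/sql/first.sql /tmp/first.sql

# --files also accepts an HDFS URI; Spark localizes the file for the
# driver and each executor, and SparkFiles.get("first.sql") resolves it.
spark-submit --deploy-mode cluster --class com.check.Driver \
  --files hdfs:///tmp/first.sql test.jar 20170619
```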

Am 24.06.2017 23:24 schrieb "Holden Karau" <ho...@pigscanfly.ca>:

> addFile is supposed to not depend on a shared FS unless the semantics have
> changed recently.
>
> On Sat, Jun 24, 2017 at 11:55 AM varma dantuluri <dv...@gmail.com>
> wrote:
>
>> Hi Sudhir,
>>
>> I believe you have to use a shared file system that is accused by all
>> nodes.
>>
>>
>> On Jun 24, 2017, at 1:30 PM, sudhir k <k....@gmail.com> wrote:
>>
>>
>> I am new to Spark and i need some guidance on how to fetch files from
>> --files option on Spark-Submit.
>>
>> I read on some forums that we can fetch the files from
>> Spark.getFiles(fileName) and can use it in our code and all nodes should
>> read it.
>>
>> But i am facing some issue
>>
>> Below is the command i am using
>>
>> spark-submit --deploy-mode cluster --class com.check.Driver --files
>> /home/sql/first.sql test.jar 20170619
>>
>> so when i use SparkFiles.get(first.sql) , i should be able to read the
>> file Path but it is throwing File not Found exception.
>>
>> I tried SpackContext.addFile(/home/sql/first.sql) and then
>> SparkFiles.get(first.sql) but still the same error.
>>
>> Its working on the stand alone mode but not on cluster mode. Any help is
>> appreciated.. Using Spark 2.1.0 and Scala 2.11
>>
>> Thanks.
>>
>>
>> Regards,
>> Sudhir K
>>
>>
>>
>> --
>> Regards,
>> Sudhir K
>>
>>
>> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>

Re: Can we access files on Cluster mode

Posted by Holden Karau <ho...@pigscanfly.ca>.
addFile is supposed to not depend on a shared FS unless the semantics have
changed recently.
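[Editor's note: to illustrate Holden's point, a sketch under the thread's file names. addFile ships a driver-local file to every node over Spark's own file server, so no shared filesystem is needed; the catch is that the path must be readable where the *driver* runs, which in cluster mode is whichever node the resource manager picked, not where spark-submit was typed:]

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

// Distribute a driver-local file to all nodes without a shared FS.
val sc = new SparkContext(new SparkConf().setAppName("addFile-demo"))
sc.addFile("/home/sql/first.sql")           // must exist on the driver's node
val localPath = SparkFiles.get("first.sql") // localized copy on this JVM
```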

Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: Can we access files on Cluster mode

Posted by varma dantuluri <dv...@gmail.com>.
Hi Sudhir,

I believe you have to use a shared file system that is accessible by all nodes.
