Posted to user@spark.apache.org by Ajay Chander <it...@gmail.com> on 2015/08/30 17:21:57 UTC

submit_spark_job_to_YARN

Hi Everyone,

Recently we installed Spark on YARN in a Hortonworks cluster. I am
trying to run a word count program from my Eclipse: with setMaster("local")
I see the expected results. Now I want to submit the same job to my YARN
cluster from Eclipse. In Storm I did essentially the same thing by using
the StormSubmitter class and passing the Nimbus and ZooKeeper hosts to a
Config object, and I was looking for something exactly like that.

When I went through the documentation online, it read that I am supposed
to "export HADOOP_HOME_DIR=path to the conf dir". So I copied the conf
folder from one of the Spark gateway nodes to my local Unix box and
exported that directory:

export HADOOP_HOME_DIR=/Users/user1/Documents/conf/

I did the same in .bash_profile too, and when I do echo $HADOOP_HOME_DIR,
I see the path printed at the command prompt. My assumption is that when I
change setMaster("local") to setMaster("yarn-client"), my program should
pick up the resource manager, i.e. the YARN cluster info, from the
directory I exported, and the job should get submitted to the resource
manager from my Eclipse. But somehow it is not happening. Please tell me
if my assumption is wrong or if I am missing anything here.

I have attached the word count program that I was using. Any help is highly
appreciated.

Thank you,
Ajay
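[Editorial note: in the Spark "Running on YARN" documentation, the environment variable the client actually reads is HADOOP_CONF_DIR (or YARN_CONF_DIR), not HADOOP_HOME_DIR, and the conventional submission path is the spark-submit script rather than the IDE. A sketch of that setup; the class and jar names below are placeholders, not taken from the thread:]

```shell
# The directory must contain the cluster's client configs
# (core-site.xml and yarn-site.xml copied from a gateway node).
export HADOOP_CONF_DIR=/Users/user1/Documents/conf/

# Spark 1.x style submission in yarn-client mode.
spark-submit \
  --master yarn-client \
  --class com.example.WordCount \
  target/wordcount.jar
```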

Re: submit_spark_job_to_YARN

Posted by Ajay Chander <it...@gmail.com>.
Thanks everyone for your valuable time and information. It was helpful.


Re: submit_spark_job_to_YARN

Posted by Ted Yu <yu...@gmail.com>.
This is related:
SPARK-10288 Add a rest client for Spark on Yarn

FYI


Re: submit_spark_job_to_YARN

Posted by Dawid Wysakowicz <wy...@gmail.com>.
Hi Ajay,

In short: no, there is no easy way to do that. But if you would like to
play around with this topic, a good starting point is this blog post from
SequenceIQ: blog
<http://blog.sequenceiq.com/blog/2014/08/22/spark-submit-in-java/>.

I have heard rumors that there is some work going on to prepare a submit
API, but I am not a contributor, so I cannot say whether that is true or
how the work is progressing.
For now, the suggested way is to use the provided script: spark-submit.

Regards
Dawid
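[Editorial note: since spark-submit is the recommended entry point, one workable route from an IDE is simply to shell out to that script. A sketch using plain ProcessBuilder; the Spark home path, main class, and jar names are placeholders, and "--master yarn-client" is the Spark 1.x spelling:]

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class SubmitViaScript {

    // Assemble the spark-submit command line; every path and name here
    // is illustrative, not taken from the thread.
    static List<String> buildCommand(String sparkHome, String mainClass, String appJar) {
        return Arrays.asList(
                sparkHome + "/bin/spark-submit",
                "--master", "yarn-client",  // Spark 1.x syntax
                "--class", mainClass,
                appJar);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = buildCommand("/opt/spark", "com.example.WordCount", "/tmp/wordcount.jar");
        if (!new File(cmd.get(0)).canExecute()) {
            System.out.println("spark-submit not found at " + cmd.get(0) + "; adjust sparkHome.");
            return;
        }
        // inheritIO streams the script's output into this process's console.
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println("spark-submit exited with " + exit);
    }
}
```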


Re: submit_spark_job_to_YARN

Posted by Ajay Chander <it...@gmail.com>.
Hi David,

Thanks for responding! My main intention was to submit a Spark job/jar to
the YARN cluster from my Eclipse, from within the code. Is there any way I
could pass my YARN configuration somewhere in the code to submit the jar
to the cluster?

Thank you,
Ajay
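[Editorial note: a likely gotcha behind the original problem is that a GUI-launched Eclipse typically does not inherit exports from .bash_profile, so the JVM running the driver may never see the variable at all. A small pre-flight check; HADOOP_CONF_DIR and YARN_CONF_DIR are the variables the Spark YARN client conventionally reads:]

```java
public class YarnConfCheck {

    // Returns the config directory a YARN client would look at, or null if
    // neither variable is visible to this JVM.
    static String confDir() {
        String dir = System.getenv("HADOOP_CONF_DIR");
        if (dir == null) {
            dir = System.getenv("YARN_CONF_DIR");
        }
        return dir;
    }

    public static void main(String[] args) {
        String dir = confDir();
        if (dir == null) {
            System.out.println("HADOOP_CONF_DIR/YARN_CONF_DIR not set for this JVM; "
                    + "an IDE does not automatically inherit .bash_profile exports.");
        } else {
            System.out.println("YARN client config expected under: " + dir);
        }
    }
}
```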


Re: submit_spark_job_to_YARN

Posted by David Mitchell <jd...@gmail.com>.
Hi Ajay,

Are you trying to save to your local file system or to HDFS?

// This would save to HDFS under "/user/hadoop/counter"
counter.saveAsTextFile("/user/hadoop/counter");

David
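[Editorial note: David's distinction turns on path resolution. In Hadoop, a path without a scheme is resolved against fs.defaultFS from core-site.xml, while an explicit file:// or hdfs:// scheme always wins. A toy illustration of that rule only; the namenode host is a placeholder, and real Spark jobs delegate this to the Hadoop FileSystem API:]

```java
import java.net.URI;

public class OutputPaths {

    // Mimics the resolution rule: a scheme-less path is prefixed with the
    // configured default filesystem; a path with a scheme is used as-is.
    static String resolve(String defaultFs, String path) {
        URI u = URI.create(path);
        return (u.getScheme() == null) ? defaultFs + path : path;
    }

    public static void main(String[] args) {
        // Local run with no cluster conf: default FS is the local filesystem.
        System.out.println(resolve("file://", "/user/hadoop/counter"));
        // Same call with a cluster's core-site.xml in effect writes to HDFS.
        System.out.println(resolve("hdfs://namenode:8020", "/user/hadoop/counter"));
        // An explicit scheme overrides the default.
        System.out.println(resolve("hdfs://namenode:8020", "file:///tmp/counter"));
    }
}
```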





-- 
### Confidential e-mail, for recipient's (or recipients') eyes only, not
for distribution. ###