Posted to user@spark.apache.org by "Xu (Simon) Chen" <xc...@gmail.com> on 2014/06/01 02:51:51 UTC

spark 1.0.0 on yarn

Hi all,

I tried a couple of ways, but couldn't get it to work.

The following seems to be what the online document
(http://spark.apache.org/docs/latest/running-on-yarn.html) suggests:
SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client

The help info of spark-shell, on the other hand, seems to suggest "--master
yarn --deploy-mode cluster".

But either way, I am seeing the following messages:
14/06/01 00:33:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/01 00:33:21 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/06/01 00:33:22 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

My guess is that spark-shell is trying to talk to the resource manager to set
up Spark master/worker nodes - I am not sure where 0.0.0.0:8032 came from,
though. I am running CDH5 with two resource managers in HA mode. Their
IP/port should be in /opt/hadoop/conf/yarn-site.xml. I tried both
HADOOP_CONF_DIR and YARN_CONF_DIR, but that info isn't picked up.
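
(A quick way to double-check what that config directory actually contains - a
generic sanity check, nothing Spark-specific:)

grep 'yarn.resourcemanager' /opt/hadoop/conf/yarn-site.xml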

Any ideas? Thanks.
-Simon

Re: spark 1.0.0 on yarn

Posted by "Xu (Simon) Chen" <xc...@gmail.com>.
I built my new package like this:
"mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.1 -DskipTests clean package"

Spark-shell is working now, but pyspark is still broken. I reported the
problem on a different thread. Please take a look if you can... desperately
need ideas.

Thanks.
-Simon

Re: spark 1.0.0 on yarn

Posted by Patrick Wendell <pw...@gmail.com>.
Okay, I'm guessing that our upstream "Hadoop2" package isn't new
enough to work with CDH5. We should probably clarify this in our
downloads. Thanks for reporting this. What was the exact string you
used when building? Also, which CDH5 version are you building against?

On Mon, Jun 2, 2014 at 8:11 AM, Xu (Simon) Chen <xc...@gmail.com> wrote:
> OK, rebuilding the assembly jar file with cdh5 works now...
> Thanks..
>
> -Simon
>
>
> On Sun, Jun 1, 2014 at 9:37 PM, Xu (Simon) Chen <xc...@gmail.com> wrote:
>>
>> That helped a bit... Now I have a different failure: the start up process
>> is stuck in an infinite loop outputting the following message:
>>
>> 14/06/02 01:34:56 INFO cluster.YarnClientSchedulerBackend: Application
>> report from ASM:
>> appMasterRpcPort: -1
>> appStartTime: 1401672868277
>> yarnAppState: ACCEPTED
>>
>> I am using the hadoop 2 prebuild package. Probably it doesn't have the
>> latest yarn client.
>>
>> -Simon
>>
>>
>>
>>
>> On Sun, Jun 1, 2014 at 9:03 PM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>>>
>>> As a debugging step, does it work if you use a single resource manager
>>> with the key "yarn.resourcemanager.address" instead of using two named
>>> resource managers? I wonder if somehow the YARN client can't detect
>>> this multi-master set-up.
>>>
>>> On Sun, Jun 1, 2014 at 12:49 PM, Xu (Simon) Chen <xc...@gmail.com>
>>> wrote:
>>> > Note that everything works fine in spark 0.9, which is packaged in
>>> > CDH5: I
>>> > can launch a spark-shell and interact with workers spawned on my yarn
>>> > cluster.
>>> >
>>> > So in my /opt/hadoop/conf/yarn-site.xml, I have:
>>> >     ...
>>> >     <property>
>>> >         <name>yarn.resourcemanager.address.rm1</name>
>>> >         <value>controller-1.mycomp.com:23140</value>
>>> >     </property>
>>> >     ...
>>> >     <property>
>>> >         <name>yarn.resourcemanager.address.rm2</name>
>>> >         <value>controller-2.mycomp.com:23140</value>
>>> >     </property>
>>> >     ...
>>> >
>>> > And the other usual stuff.
>>> >
>>> > So spark 1.0 is launched like this:
>>> > Spark Command: java -cp
>>> >
>>> > ::/home/chenxu/spark-1.0.0-bin-hadoop2/conf:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/opt/hadoop/conf
>>> > -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>>> > org.apache.spark.deploy.SparkSubmit spark-shell --master yarn-client
>>> > --class
>>> > org.apache.spark.repl.Main
>>> >
>>> > I do see "/opt/hadoop/conf" included, but not sure it's the right
>>> > place.
>>> >
>>> > Thanks..
>>> > -Simon
>>> >
>>> >
>>> >
>>> > On Sun, Jun 1, 2014 at 1:57 PM, Patrick Wendell <pw...@gmail.com>
>>> > wrote:
>>> >>
>>> >> I would agree with your guess, it looks like the yarn library isn't
>>> >> correctly finding your yarn-site.xml file. If you look in
>>> >> yarn-site.xml do you definitely the resource manager
>>> >> address/addresses?
>>> >>
>>> >> Also, you can try running this command with
>>> >> SPARK_PRINT_LAUNCH_COMMAND=1 to make sure the classpath is being
>>> >> set-up correctly.
>>> >>
>>> >> - Patrick
>>> >>
>>> >> On Sat, May 31, 2014 at 5:51 PM, Xu (Simon) Chen <xc...@gmail.com>
>>> >> wrote:
>>> >> > Hi all,
>>> >> >
>>> >> > I tried a couple ways, but couldn't get it to work..
>>> >> >
>>> >> > The following seems to be what the online document
>>> >> > (http://spark.apache.org/docs/latest/running-on-yarn.html) is
>>> >> > suggesting:
>>> >> >
>>> >> >
>>> >> > SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
>>> >> > YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client
>>> >> >
>>> >> > Help info of spark-shell seems to be suggesting "--master yarn
>>> >> > --deploy-mode
>>> >> > cluster".
>>> >> >
>>> >> > But either way, I am seeing the following messages:
>>> >> > 14/06/01 00:33:20 INFO client.RMProxy: Connecting to ResourceManager
>>> >> > at
>>> >> > /0.0.0.0:8032
>>> >> > 14/06/01 00:33:21 INFO ipc.Client: Retrying connect to server:
>>> >> > 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
>>> >> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
>>> >> > SECONDS)
>>> >> > 14/06/01 00:33:22 INFO ipc.Client: Retrying connect to server:
>>> >> > 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
>>> >> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
>>> >> > SECONDS)
>>> >> >
>>> >> > My guess is that spark-shell is trying to talk to resource manager
>>> >> > to
>>> >> > setup
>>> >> > spark master/worker nodes - I am not sure where 0.0.0.0:8032 came
>>> >> > from
>>> >> > though. I am running CDH5 with two resource managers in HA mode.
>>> >> > Their
>>> >> > IP/port should be in /opt/hadoop/conf/yarn-site.xml. I tried both
>>> >> > HADOOP_CONF_DIR and YARN_CONF_DIR, but that info isn't picked up.
>>> >> >
>>> >> > Any ideas? Thanks.
>>> >> > -Simon
>>> >
>>> >
>>
>>
>

Re: spark 1.0.0 on yarn

Posted by "Xu (Simon) Chen" <xc...@gmail.com>.
OK, rebuilding the assembly jar file against CDH5 works now.
Thanks.

-Simon

Re: spark 1.0.0 on yarn

Posted by "Xu (Simon) Chen" <xc...@gmail.com>.
That helped a bit... Now I have a different failure: the startup process
is stuck in an infinite loop, outputting the following message:

14/06/02 01:34:56 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
 appMasterRpcPort: -1
 appStartTime: 1401672868277
 yarnAppState: ACCEPTED

I am using the prebuilt Hadoop 2 package. Probably it doesn't have the
latest YARN client.
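
(For an app stuck in ACCEPTED, the standard YARN CLI checks are worth
running - these are stock yarn commands, not specific to this failure:)

yarn application -list                      # shows each app's state and queue
yarn logs -applicationId <application_id>   # AM/container logs, once the app ends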

-Simon

Re: spark 1.0.0 on yarn

Posted by Patrick Wendell <pw...@gmail.com>.
As a debugging step, does it work if you use a single resource manager
with the key "yarn.resourcemanager.address" instead of using two named
resource managers? I wonder if somehow the YARN client can't detect
this multi-master set-up.
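
(Concretely, that would mean temporarily replacing the rm1/rm2 entries in
yarn-site.xml with the single un-suffixed key, e.g. with the first
controller's address from earlier in the thread:)

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>controller-1.mycomp.com:23140</value>
    </property>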

On Sun, Jun 1, 2014 at 12:49 PM, Xu (Simon) Chen <xc...@gmail.com> wrote:
> Note that everything works fine in spark 0.9, which is packaged in CDH5: I
> can launch a spark-shell and interact with workers spawned on my yarn
> cluster.
>
> So in my /opt/hadoop/conf/yarn-site.xml, I have:
>     ...
>     <property>
>         <name>yarn.resourcemanager.address.rm1</name>
>         <value>controller-1.mycomp.com:23140</value>
>     </property>
>     ...
>     <property>
>         <name>yarn.resourcemanager.address.rm2</name>
>         <value>controller-2.mycomp.com:23140</value>
>     </property>
>     ...
>
> And the other usual stuff.
>
> So spark 1.0 is launched like this:
> Spark Command: java -cp
> ::/home/chenxu/spark-1.0.0-bin-hadoop2/conf:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/opt/hadoop/conf
> -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
> org.apache.spark.deploy.SparkSubmit spark-shell --master yarn-client --class
> org.apache.spark.repl.Main
>
> I do see "/opt/hadoop/conf" included, but not sure it's the right place.
>
> Thanks..
> -Simon
>
>
>
> On Sun, Jun 1, 2014 at 1:57 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>
>> I would agree with your guess, it looks like the yarn library isn't
>> correctly finding your yarn-site.xml file. If you look in
>> yarn-site.xml do you definitely the resource manager
>> address/addresses?
>>
>> Also, you can try running this command with
>> SPARK_PRINT_LAUNCH_COMMAND=1 to make sure the classpath is being
>> set-up correctly.
>>
>> - Patrick
>>
>> On Sat, May 31, 2014 at 5:51 PM, Xu (Simon) Chen <xc...@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > I tried a couple ways, but couldn't get it to work..
>> >
>> > The following seems to be what the online document
>> > (http://spark.apache.org/docs/latest/running-on-yarn.html) is
>> > suggesting:
>> >
>> > SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
>> > YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client
>> >
>> > Help info of spark-shell seems to be suggesting "--master yarn
>> > --deploy-mode
>> > cluster".
>> >
>> > But either way, I am seeing the following messages:
>> > 14/06/01 00:33:20 INFO client.RMProxy: Connecting to ResourceManager at
>> > /0.0.0.0:8032
>> > 14/06/01 00:33:21 INFO ipc.Client: Retrying connect to server:
>> > 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
>> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>> > 14/06/01 00:33:22 INFO ipc.Client: Retrying connect to server:
>> > 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
>> > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>> >
>> > My guess is that spark-shell is trying to talk to resource manager to
>> > setup
>> > spark master/worker nodes - I am not sure where 0.0.0.0:8032 came from
>> > though. I am running CDH5 with two resource managers in HA mode. Their
>> > IP/port should be in /opt/hadoop/conf/yarn-site.xml. I tried both
>> > HADOOP_CONF_DIR and YARN_CONF_DIR, but that info isn't picked up.
>> >
>> > Any ideas? Thanks.
>> > -Simon
>
>

Re: spark 1.0.0 on yarn

Posted by "Xu (Simon) Chen" <xc...@gmail.com>.
Note that everything works fine in Spark 0.9, which is packaged in CDH5: I
can launch a spark-shell and interact with workers spawned on my YARN
cluster.

So in my /opt/hadoop/conf/yarn-site.xml, I have:
    ...
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>controller-1.mycomp.com:23140</value>
    </property>
    ...
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>controller-2.mycomp.com:23140</value>
    </property>
    ...

And the other usual stuff.
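
(For reference, the "usual stuff" for ResourceManager HA typically includes
keys like these - a sketch of a standard Hadoop/CDH5 HA config, not a copy of
the actual file:)

    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>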

So spark 1.0 is launched like this:
Spark Command: java -cp
::/home/chenxu/spark-1.0.0-bin-hadoop2/conf:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/opt/hadoop/conf
-XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
org.apache.spark.deploy.SparkSubmit spark-shell --master yarn-client
--class org.apache.spark.repl.Main

I do see "/opt/hadoop/conf" included, but I'm not sure it's in the right place.

Thanks..
-Simon

Re: spark 1.0.0 on yarn

Posted by Patrick Wendell <pw...@gmail.com>.
I would agree with your guess; it looks like the YARN library isn't
correctly finding your yarn-site.xml file. If you look in
yarn-site.xml, do you definitely see the resource manager
address/addresses?

Also, you can try running this command with
SPARK_PRINT_LAUNCH_COMMAND=1 to make sure the classpath is being
set up correctly.
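
(For example, reusing the invocation from earlier in the thread:)

SPARK_PRINT_LAUNCH_COMMAND=1 YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client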

- Patrick