Posted to user@spark.apache.org by ge ko <ko...@gmail.com> on 2014/04/12 15:19:23 UTC

Master registers itself at startup?

Hi,

I'm wondering why the master registers itself at startup, exactly 3
times (the same as the number of workers). Log excerpt:
""
2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger
started
2014-04-11 21:08:15,478 INFO Remoting: Starting remoting
2014-04-11 21:08:15,838 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://sparkMaster@hadoop-pg-5.cluster:7077]
2014-04-11 21:08:16,252 INFO org.apache.spark.deploy.master.Master:
Starting Spark master at spark://hadoop-pg-5.cluster:7077
2014-04-11 21:08:16,299 INFO org.eclipse.jetty.server.Server:
jetty-7.x.y-SNAPSHOT
2014-04-11 21:08:16,341 INFO org.eclipse.jetty.server.AbstractConnector:
Started SelectChannelConnector@0.0.0.0:18080
2014-04-11 21:08:16,343 INFO org.apache.spark.deploy.master.ui.MasterWebUI:
Started Master web UI at http://hadoop-pg-5.cluster:18080
2014-04-11 21:08:16,374 INFO org.apache.spark.deploy.master.Master: I have
been elected leader! New state: ALIVE
2014-04-11 21:08:21,492 INFO org.apache.spark.deploy.master.Master:
Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
2014-04-11 21:08:31,362 INFO org.apache.spark.deploy.master.Master:
Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
2014-04-11 21:08:34,819 INFO org.apache.spark.deploy.master.Master:
Registering worker hadoop-pg-5.cluster:7078 with 2 cores, 64.0 MB RAM
""

The workers should be hadoop-pg-7/-8/-9.
This seems strange to me, or am I just misinterpreting the log entries?
Perhaps this relates to my other post titled "TaskSchedulerImpl: Initial
job has not accepted any resources", and both issues are caused by the
same problem/misconfiguration.

All the nodes can reach each other via ping and telnet on the corresponding
ports.

Any hints on what is going wrong there?

br, Gerd

Re: Master registers itself at startup?

Posted by Gerd Koenig <ko...@googlemail.com>.
@YouPeng, @Aaron

many thanks for the memory-setting hint.
That solved the issue; I just increased it to the default value of 512 MB.

thanks, Gerd


Re: Master registers itself at startup?

Posted by YouPeng Yang <yy...@gmail.com>.
Hi

512 MB is the default memory size that each executor requests, and your
job may not actually need that much. You can create a SparkContext with
 sc = new SparkContext("local-cluster[2,1,512]", "test") // assuming you use
the local-cluster mode.
Here the 512 is the memory size in MB; you can change it.
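
For reference, a self-contained version of that snippet (just an
illustrative sketch, assuming Spark 0.9.x; the app name and the values
are made up) could look like:

    import org.apache.spark.SparkContext

    // local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB]:
    // two simulated workers with 1 core and 256 MB each, so the executors
    // never ask for more memory than those workers can offer.
    val sc = new SparkContext("local-cluster[2,1,256]", "local-cluster-test")

    println(sc.parallelize(1 to 100).count())  // quick sanity check
    sc.stop()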


Re: Master registers itself at startup?

Posted by Aaron Davidson <il...@gmail.com>.
This is usually due to a memory misconfiguration somewhere. Your job may be
requesting that each executor has 512MB, and your cluster may not be able
to satisfy that (if you're only allowing 64MB executors, for instance). Try
setting spark.executor.memory to be the same as SPARK_WORKER_MEMORY.
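
For example (an illustrative sketch only, not the actual configuration in
use here; the master URL is taken from the log above, the app name is made
up), with each worker started with SPARK_WORKER_MEMORY=512m the driver
could request a matching amount:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumes every worker was launched with SPARK_WORKER_MEMORY=512m,
    // so a 512 MB executor request can actually be granted.
    val conf = new SparkConf()
      .setMaster("spark://hadoop-pg-5.cluster:7077") // master URL from the log excerpt
      .setAppName("resource-check")                  // made-up application name
      .set("spark.executor.memory", "512m")          // keep this <= SPARK_WORKER_MEMORY
    val sc = new SparkContext(conf)

    // A tiny job: if the executors register, this prints a count instead of
    // hanging with "Initial job has not accepted any resources".
    println(sc.parallelize(1 to 1000).count())
    sc.stop()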


Re: Master registers itself at startup?

Posted by Gerd Koenig <ko...@googlemail.com>.
Many thanks for your explanation.

So the only remaining issue is the "TaskSchedulerImpl: Initial job has not
accepted any resources" problem, which prevents me from getting started with
Spark (or at least from running the examples successfully) ;)

br, Gerd


Re: Master registers itself at startup?

Posted by Aaron Davidson <il...@gmail.com>.
By the way, 64 MB of RAM per machine is really small; I'm surprised Spark
can even start up on that! Perhaps you meant to set SPARK_DAEMON_MEMORY so
that the actual worker process itself would be small, but
SPARK_WORKER_MEMORY (which controls the amount of memory available for
Spark executors) should be at least 512 MB, and ideally many times that.
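
For instance (an illustrative sketch with made-up values, not a
recommendation for this particular cluster), conf/spark-env.sh on each
worker node could contain something like:

    # Keep the Master/Worker daemon JVMs themselves small...
    export SPARK_DAEMON_MEMORY=128m
    # ...but leave plenty of memory for the executors the worker launches.
    export SPARK_WORKER_MEMORY=2g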


Re: Master registers itself at startup?

Posted by Aaron Davidson <il...@gmail.com>.
This was actually a bug in the log message itself, where the Master would
print its own IP and port instead of the registered worker's. It has been
fixed in 0.9.1 and 1.0.0 (here's the patch:
https://github.com/apache/spark/commit/c0795cf481d47425ec92f4fd0780e2e0b3fdda85
).

Sorry about the confusion!


Re: Master registers itself at startup?

Posted by Mark Baker <ma...@coactus.com>.
I had this happen in a VirtualBox configuration I was testing with and
never got it fixed, but I suspected it was because I wasn't using FQDNs
(as you're also not doing). I had found - can't find it again now, of
course - a message in the archives from somebody suffering from this
issue even while using FQDNs, but they claimed to have found an error in
/etc/hosts.

Good luck.