Posted to dev@mesos.apache.org by David Greenberg <ds...@gmail.com> on 2013/04/16 19:58:21 UTC

Launching a Mesos cluster w/ zookeeper for reliability

I am trying to use the automatic master failover feature of zookeeper, but
I'm seeing several issues:

When I launch multiple masters with
./mesos-master.sh --url=zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos
all 3 servers elect themselves as master and I don't see anything in the logs
about zookeeper.

Similarly, when I launch slaves, they require a --master setting, which, if
I provide the zoo:// URL, causes them to fault (and I don't see why I
should provide a hostname, given that a host could be down).

I assume that I'm making some silly mistake in how I'm launching these
processes.

Thanks,
David

Re: Launching a Mesos cluster w/ zookeeper for reliability

Posted by Benjamin Mahler <be...@gmail.com>.
Great!! There are several spots that need fixing:

src/master/main.cpp
src/mesos/main.cpp
src/slave/main.cpp
src/java/src/org/apache/mesos/MesosSchedulerDriver.java (just the javadoc
needs fixing)

Also, yes that would be a bug, can you provide more information / logs /
etc?

On Thu, Apr 18, 2013 at 2:33 PM, David Greenberg <ds...@gmail.com>wrote:

> I will be happy to! I'm just finishing up the process with my employer to
> be able to start submitting patches (I have them all ready and waiting).
>
> By the way, I have discovered a bug, I think (unless it's already been
> found): after master failover, new frameworks I launch don't get resource
> offers.

Re: Launching a Mesos cluster w/ zookeeper for reliability

Posted by David Greenberg <ds...@gmail.com>.
I will be happy to! I'm just finishing up the process with my employer to
be able to start submitting patches (I have them all ready and waiting).

By the way, I have discovered a bug, I think (unless it's already been
found): after master failover, new frameworks I launch don't get resource
offers.


On Wed, Apr 17, 2013 at 4:47 PM, Vinod Kone <vi...@gmail.com> wrote:

> Great to hear you were able to debug this David. Sounds like we should
> either fix our help message or make the code work with the format that the
> 'help' claims. I would think the former is easiest. Would you mind sending
> us a patch?

Re: Launching a Mesos cluster w/ zookeeper for reliability

Posted by Vinod Kone <vi...@gmail.com>.
Great to hear you were able to debug this David. Sounds like we should
either fix our help message or make the code work with the format that the
'help' claims. I would think the former is easiest. Would you mind sending
us a patch?


On Wed, Apr 17, 2013 at 12:53 PM, David Greenberg <ds...@gmail.com>wrote:

> I got things to work, sort of, using the zk:// url type. I am now using the
> 0.12.X branch from the Github mirror. When I try to bring up the masters,
> often multiple machines decide to be the master. Similarly, when I try to
> bring up slaves, they rarely detect the masters (maybe 5-10% of the time).
>
> I triaged the issue and determined that the correct zk url to use is this:
>
> zk://myserver1.com:2181/mesos,myserver2.com:2181/mesos,myserver3.com:2181/mesos
>
> Note that you must specify the same hierarchy path for each server. If you
> don't do this, things will work, but unreliably.

Re: Launching a Mesos cluster w/ zookeeper for reliability

Posted by David Greenberg <ds...@gmail.com>.
I got things to work, sort of, using the zk:// url type. I am now using the
0.12.X branch from the Github mirror. When I try to bring up the masters,
often multiple machines decide to be the master. Similarly, when I try to
bring up slaves, they rarely detect the masters (maybe 5-10% of the time).

I triaged the issue and determined that the correct zk url to use is this:

zk://myserver1.com:2181/mesos,myserver2.com:2181/mesos,myserver3.com:2181/mesos

Note that you must specify the same hierarchy path for each server. If you
don't do this, things will work, but unreliably.
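
To make the "same hierarchy path for each server" rule concrete, here is a minimal sketch of a check for it. The helper name commonZkPath is hypothetical and this is not how Mesos parses the URL; it only assumes the comma-separated host:port/path form shown above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Mesos. Given a URL in the form
// "zk://host:port/path,host:port/path,...", it stores the shared path in
// *path and returns true only if every entry carries the same path.
bool commonZkPath(const std::string& url, std::string* path) {
  const std::string prefix = "zk://";
  if (url.compare(0, prefix.size(), prefix) != 0) {
    return false;  // Not a zk:// URL at all (for example "zoo://...").
  }

  std::istringstream entries(url.substr(prefix.size()));
  std::string entry;
  std::string common;
  bool seen = false;

  while (std::getline(entries, entry, ',')) {
    const std::string::size_type slash = entry.find('/');
    if (slash == std::string::npos) {
      return false;  // This entry has no path component.
    }
    const std::string current = entry.substr(slash);
    if (seen && current != common) {
      return false;  // Two servers were given different paths.
    }
    common = current;
    seen = true;
  }

  if (!seen) {
    return false;  // Nothing followed the prefix.
  }
  *path = common;
  return true;
}
```

Calling it on the URL above yields "/mesos"; a URL where one server omits the path or uses a different one is rejected.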


On Tue, Apr 16, 2013 at 4:50 PM, Benjamin Mahler
<be...@gmail.com>wrote:

> I believe it needs to be prefixed with "zk://" rather than zoo.
>
> The relevant code is in detector.cpp:
>
>   } else if (master.find("zk://") == 0) {
>     Try<zookeeper::URL> url = zookeeper::URL::parse(master);
>     if (url.isError()) {
>       return Error(url.error());
>     }
>     if (url.get().path == "/") {
>       return Error(
>           "Expecting a (chroot) path for ZooKeeper ('/' is not supported)");
>     }
>     return new ZooKeeperMasterDetector(url.get(), pid, contend, quiet);
>   }

Re: Launching a Mesos cluster w/ zookeeper for reliability

Posted by Benjamin Mahler <be...@gmail.com>.
I believe it needs to be prefixed with "zk://" rather than zoo.

The relevant code is in detector.cpp:

  } else if (master.find("zk://") == 0) {
    Try<zookeeper::URL> url = zookeeper::URL::parse(master);
    if (url.isError()) {
      return Error(url.error());
    }
    if (url.get().path == "/") {
      return Error(
          "Expecting a (chroot) path for ZooKeeper ('/' is not supported)");
    }
    return new ZooKeeperMasterDetector(url.get(), pid, contend, quiet);
  }
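
For anyone who wants to try the two checks outside of Mesos, a rough self-contained sketch follows. checkZkMaster is a hypothetical stand-in, not Mesos code, and it assumes the chroot path starts at the first '/' after the prefix and that a URL without one counts as "/".

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the checks quoted above; not Mesos code.
// Returns an empty string when the value passes both checks, otherwise a
// short description of why it was rejected.
std::string checkZkMaster(const std::string& master) {
  const std::string prefix = "zk://";

  // Same test as in the quoted snippet: the prefix must start at index 0,
  // so a value beginning with "zoo://" does not match.
  if (master.find(prefix) != 0) {
    return "not a zk:// URL";
  }

  // Assumption: the chroot path begins at the first '/' after the prefix,
  // and a URL without one is treated as having the path "/".
  const std::string rest = master.substr(prefix.size());
  const std::string::size_type slash = rest.find('/');
  const std::string path =
      (slash == std::string::npos) ? "/" : rest.substr(slash);

  if (path == "/") {
    return "Expecting a (chroot) path for ZooKeeper ('/' is not supported)";
  }
  return "";
}
```

With this sketch, "zoo://myserver1.com:2181/mesos" is rejected by the first check, and "zk://myserver1.com:2181/" by the second.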


On Tue, Apr 16, 2013 at 1:01 PM, David Greenberg <ds...@gmail.com>wrote:

> Hi Vinod,
> That's correct. I tried starting the masters with --zk instead of --url. I
> am running mesos from the git mirror at commit 3fa8389. Should I try
> updating to head, or is there a particular more stable version I should
> use?
>
> dgrnbrg@myserver1.com:~/mesos/bin$ ./mesos-master.sh --zk=zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos
> I0416 19:59:45.205003 48438 main.cpp:116] Build: 2013-04-08 19:16:35 by dgrnbrg
> I0416 19:59:45.205140 48438 main.cpp:117] Starting Mesos master
> I0416 19:59:45.205313 48466 master.cpp:309] Master started on 172.21.97.196:5050
> I0416 19:59:45.205397 48466 master.cpp:324] Master ID: 201304161959-3294696876-5050-48438
> W0416 19:59:45.205567 48484 master.cpp:81] No whitelist given. Advertising offers for all slaves
> F0416 19:59:45.205613 48438 main.cpp:129] CHECK_SOME(detector) failed: Failed to create a master detector: Cannot parse '@0.0.0.0:0'
> *** Check failure stack trace: ***
>     @     0x7f230ef49f1d  google::LogMessage::Fail()
>     @     0x7f230ef4e5cf  google::LogMessage::SendToLog()
>     @     0x7f230ef4db07  google::LogMessage::Flush()
>     @     0x7f230ef4f25d  google::LogMessageFatal::~LogMessageFatal()
>     @           0x41c079  main
>     @     0x7f230cf74abd  (unknown)
>     @           0x418979  (unknown)
> Aborted

Re: Launching a Mesos cluster w/ zookeeper for reliability

Posted by David Greenberg <ds...@gmail.com>.
Hi Vinod,
That's correct. I tried starting the masters with --zk instead of --url. I
am running mesos from the git mirror at commit 3fa8389. Should I try
updating to head, or is there a particular more stable version I should use?

dgrnbrg@myserver1.com:~/mesos/bin$ ./mesos-master.sh --zk=zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos
I0416 19:59:45.205003 48438 main.cpp:116] Build: 2013-04-08 19:16:35 by dgrnbrg
I0416 19:59:45.205140 48438 main.cpp:117] Starting Mesos master
I0416 19:59:45.205313 48466 master.cpp:309] Master started on 172.21.97.196:5050
I0416 19:59:45.205397 48466 master.cpp:324] Master ID: 201304161959-3294696876-5050-48438
W0416 19:59:45.205567 48484 master.cpp:81] No whitelist given. Advertising offers for all slaves
F0416 19:59:45.205613 48438 main.cpp:129] CHECK_SOME(detector) failed: Failed to create a master detector: Cannot parse '@0.0.0.0:0'
*** Check failure stack trace: ***
    @     0x7f230ef49f1d  google::LogMessage::Fail()
    @     0x7f230ef4e5cf  google::LogMessage::SendToLog()
    @     0x7f230ef4db07  google::LogMessage::Flush()
    @     0x7f230ef4f25d  google::LogMessageFatal::~LogMessageFatal()
    @           0x41c079  main
    @     0x7f230cf74abd  (unknown)
    @           0x418979  (unknown)
Aborted



On Tue, Apr 16, 2013 at 2:38 PM, Vinod Kone <vi...@gmail.com> wrote:

> Hi David,
>
> I'm assuming the myserver[1-2-3].com above are your zk servers?
>
> Also, masters take "--zk" instead of "--url" for zookeeper address. "--url"
> might have been our old flag, which is deprecated (which version of mesos
> are you running?).
>
> For slaves, "--master" should be the same set of zk servers that you
> started your masters with.
>
> So, --master="zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos"
>
> Let me know if that works. If not, please paste the master and slave logs.

Re: Launching a Mesos cluster w/ zookeeper for reliability

Posted by Vinod Kone <vi...@gmail.com>.
Hi David,

I'm assuming the myserver[1-2-3].com above are your zk servers?

Also, masters take "--zk" instead of "--url" for zookeeper address. "--url"
might have been our old flag, which is deprecated (which version of mesos
are you running?).

For slaves, "--master" should be the same set of zk servers that you
started your masters with.

So, --master="zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos"

Let me know if that works. If not, please paste the master and slave logs.



On Tue, Apr 16, 2013 at 10:58 AM, David Greenberg <ds...@gmail.com>wrote:

> I am trying to use the automatic master failover feature of zookeeper, but
> I'm seeing several issues:
>
> When I launch multiple masters with
> ./mesos-master.sh --url=zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos
> all 3 servers elect themselves as master and I don't see anything in the logs
> about zookeeper.
>
> Similarly, when I launch slaves, they require a --master setting, which, if
> I provide the zoo:// URL, causes them to fault (and I don't see why I
> should provide a hostname, given that a host could be down).
>
> I assume that I'm making some silly mistake in how I'm launching these
> processes.
>
> Thanks,
> David
>