Posted to dev@mesos.apache.org by Alberto Rodriguez <ar...@gmail.com> on 2015/05/25 18:30:09 UTC

Not able to connect to mesos from different machine

Hi all,

I managed to get a Mesos cluster up & running on an Ubuntu VM. I've
also been able to run and connect a spark-shell from this machine and
it works properly.

Unfortunately, when I try to connect from the host machine where the
VM is running in order to launch Spark jobs, I cannot.

See the spark-shell console output below:

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/05/25 18:13:00 INFO SecurityManager: Changing view acls to: arodriguez
15/05/25 18:13:00 INFO SecurityManager: Changing modify acls to: arodriguez
15/05/25 18:13:00 INFO SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view
permissions: Set(arodriguez); users with modify permissions:
Set(arodriguez)
15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
15/05/25 18:13:01 INFO Remoting: Starting remoting
15/05/25 18:13:01 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://sparkDriver@localhost.localdomain:47229]
15/05/25 18:13:01 INFO Utils: Successfully started service
'sparkDriver' on port 47229.
15/05/25 18:13:01 INFO SparkEnv: Registering MapOutputTracker
15/05/25 18:13:01 INFO SparkEnv: Registering BlockManagerMaster
15/05/25 18:13:01 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20150525181301-7fa8
15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where
applicable
15/05/25 18:13:01 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
15/05/25 18:13:01 INFO Utils: Successfully started service 'HTTP file
server' on port 51659.
15/05/25 18:13:01 INFO Utils: Successfully started service 'SparkUI'
on port 4040.
15/05/25 18:13:01 INFO SparkUI: Started SparkUI at
http://localhost.localdomain:4040
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0525 18:13:01.749449 10908 sched.cpp:1323]
**************************************************
Scheduler driver bound to loopback interface! Cannot communicate with
remote master(s). You might want to set 'LIBPROCESS_IP' environment
variable to use a routable IP address.
**************************************************
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@712:
Client environment:zookeeper.version=zookeeper C client 3.4.6
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@716:
Client environment:host.name=localhost.localdomain
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@723:
Client environment:os.name=Linux
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@724:
Client environment:os.arch=3.19.7-200.fc21.x86_64
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@725:
Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC 2015
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@733:
Client environment:user.name=arodriguez
I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@741:
Client environment:user.home=/home/arodriguez
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@753:
Client environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init@786:
Initiating client connection, host=10.141.141.10:2181
sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0
sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
2015-05-25 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1705:
initiated connection to server [10.141.141.10:2181]
2015-05-25 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1752:
session establishment complete on server [10.141.141.10:2181],
sessionId=0x14d8babef360022, negotiated timeout=10000
I0525 18:13:01.752760 10913 group.cpp:313] Group process
(group(1)@127.0.0.1:48557) connected to ZooKeeper
I0525 18:13:01.752787 10913 group.cpp:790] Syncing group operations:
queue size (joins, cancels, datas) = (0, 0, 0)
I0525 18:13:01.752807 10913 group.cpp:385] Trying to create path
'/mesos' in ZooKeeper
I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new leader: (id='16')
I0525 18:13:01.754408 10913 group.cpp:659] Trying to get
'/mesos/info_0000000016' in ZooKeeper
I0525 18:13:01.755056 10913 detector.cpp:452] A new leading master
(UPID=master@127.0.1.1:5050) is detected
I0525 18:13:01.755113 10911 sched.cpp:254] New master detected at
master@127.0.1.1:5050
I0525 18:13:01.755345 10911 sched.cpp:264] No credentials provided.
Attempting to register without authentication


It hangs at the last line.

I've tried to set the LIBPROCESS_IP env variable with no luck.
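
Roughly, what I've been trying looks like this before starting the shell
(the exported address is just a placeholder for my host's IP on the VMs'
network):

  export LIBPROCESS_IP=10.141.141.1
  ./bin/spark-shell --master mesos://zk://10.141.141.10:2181/mesos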

Any advice?

Thank you in advance.

Kind regards,

Alberto

Re: Not able to connect to mesos from different machine

Posted by Marco Massenzio <ma...@mesosphere.io>.
You're most welcome!

Just another thing, then: please be aware that, if you are on a Mac and
running Cisco's VPN client, that one messes with VBox's firewall rules for
host-only and will cause "baffling behavior" :)
(just thought I'd mention it; at another place I worked, it gave us a lot
of grief until we found out)

*Marco Massenzio*
*Distributed Systems Engineer*

On Fri, May 29, 2015 at 12:03 AM, Alberto Rodriguez <ar...@gmail.com>
wrote:

> Hi Marco,
>
> there is no need to apologize! Thank you very, very much for your detailed
> explanation. As you said, I tested it out NAT'ing the VMs but it didn't
> work. I'll try to test your solution when I've got some spare time and get
> back to the group to let you guys know whether it works.
>
> Thank you again!
>
> 2015-05-29 8:48 GMT+02:00 Marco Massenzio <ma...@mesosphere.io>:
>
> > Apologies in advance if you already know all this and are an expert on
> vbox
> > & networking - but maybe this either helps or at least may point you in
> the
> > right direction (hopefully!)
> >
> > The problem is most likely to be found in the fact that your laptop (or
> > whatever box you're running vbox in) has a hostname that's not
> > DNS-resolvable (and your VMs probably aren't either).
> >
> > Further, by default, VBox configures the VM's NICs to be on a 'Bridged'
> > private subnet, which means that you can 'net out' (eg, ping google.com
> > from the VM) but not get in (eg, run a server accessible from outside the
> > VM)
> >
> > Mesos master/slave need to be able to talk to each other,
> bi-directionally,
> > which is possibly what was causing the issue in the first place.
> >
> > NAT'ing the VMs probably won't work either (you won't know in advance
> which
> > port the Slave will be listening on - I think!)
> >
> > One option is to configure vbox's VMs to be on their own subnet (I forget
> > the exact terminology, it's been almost a year now since I fiddled with
> it:
> > I think it's the Host-Only option
> > <https://www.virtualbox.org/manual/ch06.html#network_hostonly>) but
> > essentially vbox will create a subnet and act as a router - the host
> > machine will also have a virtual NIC in that subnet, so you'll be able to
> > route requests to/from the VMs.
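> >
> > Off the top of my head, setting that up from the command line looks
> > roughly like this (the adapter name, VM name and addresses below are
> > just examples, so double-check them against your setup):
> >
> >   # create a host-only adapter on the host (VirtualBox names it e.g. vboxnet0)
> >   VBoxManage hostonlyif create
> >   VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1
> >   # attach a second NIC of the (powered-off) VM to that adapter
> >   VBoxManage modifyvm "mesos-master-vm" --nic2 hostonly --hostonlyadapter2 vboxnet0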
> >
> > There's also the fact that the Spark driver (pyspark, or spark-submit)
> will
> > need to be able to talk to the worker nodes, but that should "just work"
> > once you get Mesos to work.
> >
> > HTH,
> >
> >
> > *Marco Massenzio*
> > *Distributed Systems Engineer*
> >
> > On Thu, May 28, 2015 at 11:13 PM, Alberto Rodriguez <ar...@gmail.com>
> > wrote:
> >
> > > To be honest I don't know what the problem was. I didn't manage to make
> > > my Spark jobs work on the mesos cluster running on two virtual machines.
> > > I did manage to make it work when I run my Spark jobs on my local machine
> > > and both the master and the mesos slaves are also running on my machine.
> > >
> > > I guess something is not working properly in the way that virtualbox is
> > > assigning network interfaces to the virtual machines, but I can't
> > > spend more time on the issue.
> > >
> > > Thank you again for your help!
> > >
> > > 2015-05-28 19:28 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
> > >
> > > > Great! Mind sharing with the list what the problem was (for future
> > > > reference)?
> > > >
> > > > On Thu, May 28, 2015 at 5:25 PM, Alberto Rodriguez <
> ardlema@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Alex,
> > > > >
> > > > > I managed to make it work!! Finally, I'm running both the mesos
> > > > > master and slave on my laptop and picking up the spark jar from an
> > > > > HDFS installed in a VM. I've just launched a Spark job and it is
> > > > > working fine!
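> > > > >
> > > > > In case it helps anyone else, the bit that points the executors at
> > > > > the Spark tarball on that HDFS looks roughly like this in my case
> > > > > (the host name and path below are approximate):
> > > > >
> > > > >   ./bin/spark-shell --master mesos://zk://10.141.141.10:2181/mesos \
> > > > >     --conf spark.executor.uri=hdfs://10.141.141.10:8020/spark/spark-1.2.0-bin-hadoop2.4.tgz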
> > > > >
> > > > > Thank you very much for your help
> > > > >
> > > > > 2015-05-28 16:20 GMT+02:00 Alberto Rodriguez <ar...@gmail.com>:
> > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > see below an extract of the Chronos log (not sure whether this is
> > > > > > the log you were talking about):
> > > > > >
> > > > > > 2015-05-28_14:18:28.49322 [2015-05-28 14:18:28,491] INFO No tasks
> > > > > > scheduled! Declining offers
> > > > > > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > > > > 2015-05-28_14:18:34.49896 [2015-05-28 14:18:34,497] INFO Received
> > > > > resource
> > > > > > offers
> > > > > > 2015-05-28_14:18:34.49903
> > > > > >  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > > > > > 2015-05-28_14:18:34.50036 [2015-05-28 14:18:34,498] INFO No tasks
> > > > > > scheduled! Declining offers
> > > > > > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > > > > 2015-05-28_14:18:40.50442 [2015-05-28 14:18:40,503] INFO Received
> > > > > resource
> > > > > > offers
> > > > > > 2015-05-28_14:18:40.50444
> > > > > >  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > > > > > 2015-05-28_14:18:40.50506 [2015-05-28 14:18:40,503] INFO No tasks
> > > > > > scheduled! Declining offers
> > > > > > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > > > >
> > > > > > I'm using 0.20.1 because I'm using this vagrant machine:
> > > > > > https://github.com/Banno/vagrant-mesos
> > > > > >
> > > > > > Kind regards and thank you again for your help
> > > > > >
> > > > > > 2015-05-28 14:09 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
> > > > > >
> > > > > >> Alberto,
> > > > > >>
> > > > > >> it looks like the Spark scheduler disconnects right after
> > > > > >> establishing the connection. Would you mind sharing the scheduler
> > > > > >> logs as well? Also, I see that you haven't specified the
> > > > > >> failover_timeout; try setting this value to something meaningful
> > > > > >> (several hours for test purposes).
> > > > > >>
> > > > > >> And by the way, any reason you're still on Mesos 0.20.1?
> > > > > >>
> > > > > >> On Wed, May 27, 2015 at 5:32 PM, Alberto Rodriguez <
> > > ardlema@gmail.com
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi Alex,
> > > > > >> >
> > > > > >> > I do not know what's going on; now I'm unable to access the
> > > > > >> > spark console again, it's hanging at the same point as before.
> > > > > >> > See the master logs below:
> > > > > >> >
> > > > > >> > 2015-05-27_15:30:53.68764 I0527 15:30:53.687494   944
> > > > master.cpp:3760]
> > > > > >> > Sending 1 offers to framework
> > > > 20150527-100126-169978048-5050-1851-0001
> > > > > >> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at
> > > > scheduler-be29901f-39ab-4bdf
> > > > > >> > -a9ec-691032775860@192.168.33.10:32768
> > > > > >> > 2015-05-27_15:30:53.69032 I0527 15:30:53.690196   942
> > > > master.cpp:2273]
> > > > > >> > Processing ACCEPT call for offers: [
> > > > > >> > 20150527-152023-169978048-5050-876-O241 ] on slave
> > > > > >> > 20150527-152023-169978048-5050-876-S0 at slave(1)@19
> > > > > >> > 2.168.33.11:5051 (mesos-slave1) for framework
> > > > > >> > 20150527-100126-169978048-5050-1851-0001
> > > > > >> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at
> > > > > >> >
> > > scheduler-be29901f-39ab-4bdf-a9ec-691032775860@192.168.33.10:32768
> > > > > >> > 2015-05-27_15:30:53.69038 I0527 15:30:53.690300   942
> > > > > >> hierarchical.hpp:648]
> > > > > >> > Recovered mem(*):1024; cpus(*):2; disk(*):33375;
> > > > > ports(*):[31000-32000]
> > > > > >> > (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375;
> port
> > > > > >> > s(*):[31000-32000]) on slave
> > 20150527-152023-169978048-5050-876-S0
> > > > > from
> > > > > >> > framework 20150527-100126-169978048-5050-1851-0001
> > > > > >> > 2015-05-27_15:30:54.00952 I0527 15:30:54.009363   937
> > > > master.cpp:1574]
> > > > > >> > Received registration request for framework 'Spark shell' at
> > > > > >> >
> scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> > > > > >> > 2015-05-27_15:30:54.00957 I0527 15:30:54.009461   937
> > > > master.cpp:1638]
> > > > > >> > Registering framework 20150527-152023-169978048-5050-876-0026
> > > (Spark
> > > > > >> shell)
> > > > > >> > at
> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:5556
> > > > > >> > 2
> > > > > >> > 2015-05-27_15:30:54.00994 I0527 15:30:54.009703   937
> > > > > >> hierarchical.hpp:321]
> > > > > >> > Added framework 20150527-152023-169978048-5050-876-0026
> > > > > >> > 2015-05-27_15:30:54.00996 I0527 15:30:54.009826   937
> > > > master.cpp:3760]
> > > > > >> > Sending 1 offers to framework
> > > > 20150527-152023-169978048-5050-876-0026
> > > > > >> > (Spark shell) at
> > > > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.
> > > > > >> > 0.1:55562
> > > > > >> > 2015-05-27_15:30:54.01035 I0527 15:30:54.010267   944
> > > > master.cpp:878]
> > > > > >> > Framework 20150527-152023-169978048-5050-876-0026 (Spark
> shell)
> > at
> > > > > >> >
> scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> > > > > >> disconnecte
> > > > > >> > d
> > > > > >> > 2015-05-27_15:30:54.01037 I0527 15:30:54.010308   944
> > > > master.cpp:1948]
> > > > > >> > Disconnecting framework
> 20150527-152023-169978048-5050-876-0026
> > > > (Spark
> > > > > >> > shell) at
> > > > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55
> > > > > >> > 562
> > > > > >> > 2015-05-27_15:30:54.01038 I0527 15:30:54.010326   944
> > > > master.cpp:1964]
> > > > > >> > Deactivating framework 20150527-152023-169978048-5050-876-0026
> > > > (Spark
> > > > > >> > shell) at
> > > > > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:555
> > > > > >> > 62
> > > > > >> > 2015-05-27_15:30:54.01053 I0527 15:30:54.010447   939
> > > > > >> hierarchical.hpp:400]
> > > > > >> > Deactivated framework 20150527-152023-169978048-5050-876-0026
> > > > > >> > 2015-05-27_15:30:54.01055 I0527 15:30:54.010459   944
> > > > master.cpp:900]
> > > > > >> > Giving framework 20150527-152023-169978048-5050-876-0026
> (Spark
> > > > shell)
> > > > > >> at
> > > > > >> >
> scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> > > 0ns
> > > > > >> > to failover
> > > > > >> >
> > > > > >> >
> > > > > >> > Kind regards and thank you very much for your help!!
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > 2015-05-27 16:28 GMT+02:00 Alex Rukletsov <
> alex@mesosphere.com
> > >:
> > > > > >> >
> > > > > >> > > Alberto,
> > > > > >> > >
> > > > > >> > > would you mind providing the slave and master logs (or the
> > > > > >> > > appropriate parts of them)? Have you specified the --work_dir
> > > > > >> > > flag for your Mesos Workers?
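> > > > > >> > >
> > > > > >> > > For example, something along these lines when starting each
> > > > > >> > > slave (the master address and work dir below are just examples,
> > > > > >> > > adjust them to your setup):
> > > > > >> > >
> > > > > >> > >   mesos-slave --master=zk://192.168.33.10:2181/mesos \
> > > > > >> > >     --work_dir=/var/lib/mesos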
> > > > > >> > >
> > > > > >> > > On Wed, May 27, 2015 at 3:56 PM, Alberto Rodriguez <
> > > > > ardlema@gmail.com
> > > > > >> >
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > Hi Alex,
> > > > > >> > > >
> > > > > >> > > > Thank you for replying. I managed to fix the first problem,
> > > > > >> > > > but now when I launch a Spark job through my console, mesos
> > > > > >> > > > is losing all the tasks. I can see them all in my mesos slave
> > > > > >> > > > but their status is LOST. The stderr & stdout files of the
> > > > > >> > > > tasks are both empty.
> > > > > >> > > >
> > > > > >> > > > Any ideas?
> > > > > >> > > >
> > > > > >> > > > 2015-05-26 17:35 GMT+02:00 Alex Rukletsov <
> > > alex@mesosphere.com
> > > > >:
> > > > > >> > > >
> > > > > >> > > > > Alberto,
> > > > > >> > > > >
> > > > > >> > > > > What may be happening in your case is that the Master is
> > > > > >> > > > > not able to talk to your scheduler. When responding to a
> > > > > >> > > > > scheduler, the Mesos Master doesn't use the IP from which a
> > > > > >> > > > > request came, but rather the IP set in the
> > > > > >> > > > > "Libprocess-from" field. That's exactly what you specify in
> > > > > >> > > > > the LIBPROCESS_IP env var prior to starting your scheduler.
> > > > > >> > > > > Could you please double-check that it is set up correctly
> > > > > >> > > > > and that the IP is reachable for the Mesos Master?
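> > > > > >> > > > >
> > > > > >> > > > > A quick sanity check, for example (the address below stands
> > > > > >> > > > > for whatever routable IP you exported, not 127.x.x.x):
> > > > > >> > > > >
> > > > > >> > > > >   # on the machine running spark-shell
> > > > > >> > > > >   echo $LIBPROCESS_IP
> > > > > >> > > > >   # from the Mesos Master VM, make sure that address answers
> > > > > >> > > > >   ping -c 1 10.141.141.1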
> > > > > >> > > > >
> > > > > >> > > > > In case you are not able to solve the problem, please
> > > provide
> > > > > >> > scheduler
> > > > > >> > > > and
> > > > > >> > > > > Master logs together with master, zookeeper, and
> scheduler
> > > > > >> > > > configurations.
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <
> > > > > >> > ardlema@gmail.com>
> > > > > >> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hi all,
> > > > > >> > > > > >
> > > > > >> > > > > > I managed to get a mesos cluster up & running on a
> > Ubuntu
> > > > VM.
> > > > > >> I've
> > > > > >> > > > > > been also able to run and connect a spark-shell from
> > this
> > > > > >> machine
> > > > > >> > and
> > > > > >> > > > > > it works properly.
> > > > > >> > > > > >
> > > > > >> > > > > > Unfortunately, I'm trying to connect from the host
> > machine
> > > > > where
> > > > > >> > the
> > > > > >> > > > > > VM is running to launch spark jobs and I can not.
> > > > > >> > > > > >
> > > > > >> > > > > > See below the spark console output:
> > > > > >> > > > > >
> > > > > >> > > > > > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit
> > Server
> > > > VM,
> > > > > >> Java
> > > > > >> > > > > > 1.7.0_75)
> > > > > >> > > > > > Type in expressions to have them evaluated.
> > > > > >> > > > > > Type :help for more information.
> > > > > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing view
> > acls
> > > > to:
> > > > > >> > > > arodriguez
> > > > > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing
> modify
> > > acls
> > > > > to:
> > > > > >> > > > > arodriguez
> > > > > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager:
> SecurityManager:
> > > > > >> > > > > > authentication disabled; ui acls disabled; users with
> > view
> > > > > >> > > > > > permissions: Set(arodriguez); users with modify
> > > permissions:
> > > > > >> > > > > > Set(arodriguez)
> > > > > >> > > > > > 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger
> started
> > > > > >> > > > > > 15/05/25 18:13:01 INFO Remoting: Starting remoting
> > > > > >> > > > > > 15/05/25 18:13:01 INFO Remoting: Remoting started;
> > > listening
> > > > > on
> > > > > >> > > > > > addresses
> :[akka.tcp://sparkDriver@localhost.localdomain
> > > > > :47229]
> > > > > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started
> > service
> > > > > >> > > > > > 'sparkDriver' on port 47229.
> > > > > >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering
> > > > MapOutputTracker
> > > > > >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering
> > > > > BlockManagerMaster
> > > > > >> > > > > > 15/05/25 18:13:01 INFO DiskBlockManager: Created local
> > > > > >> directory at
> > > > > >> > > > > > /tmp/spark-local-20150525181301-7fa8
> > > > > >> > > > > > 15/05/25 18:13:01 INFO MemoryStore: MemoryStore
> started
> > > with
> > > > > >> > capacity
> > > > > >> > > > > > 265.4 MB
> > > > > >> > > > > > 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to
> load
> > > > > >> > native-hadoop
> > > > > >> > > > > > library for your platform... using builtin-java
> classes
> > > > where
> > > > > >> > > > > > applicable
> > > > > >> > > > > > 15/05/25 18:13:01 INFO HttpFileServer: HTTP File
> server
> > > > > >> directory
> > > > > >> > is
> > > > > >> > > > > > /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
> > > > > >> > > > > > 15/05/25 18:13:01 INFO HttpServer: Starting HTTP
> Server
> > > > > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started
> > service
> > > > > 'HTTP
> > > > > >> > file
> > > > > >> > > > > > server' on port 51659.
> > > > > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started
> > service
> > > > > >> > 'SparkUI'
> > > > > >> > > > > > on port 4040.
> > > > > >> > > > > > 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at
> > > > > >> > > > > > http://localhost.localdomain:4040
> > > > > >> > > > > > WARNING: Logging before InitGoogleLogging() is written
> > to
> > > > > STDERR
> > > > > >> > > > > > W0525 18:13:01.749449 10908 sched.cpp:1323]
> > > > > >> > > > > > **************************************************
> > > > > >> > > > > > Scheduler driver bound to loopback interface! Cannot
> > > > > communicate
> > > > > >> > with
> > > > > >> > > > > > remote master(s). You might want to set
> 'LIBPROCESS_IP'
> > > > > >> environment
> > > > > >> > > > > > variable to use a routable IP address.
> > > > > >> > > > > > **************************************************
> > > > > >> > > > > > 2015-05-25
> > > > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > > > >> > @712:
> > > > > >> > > > > > Client environment:zookeeper.version=zookeeper C
> client
> > > > 3.4.6
> > > > > >> > > > > > 2015-05-25
> > > > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > > > >> > @716:
> > > > > >> > > > > > Client environment:host.name=localhost.localdomain
> > > > > >> > > > > > 2015-05-25
> > > > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > > > >> > @723:
> > > > > >> > > > > > Client environment:os.name=Linux
> > > > > >> > > > > > 2015-05-25
> > > > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > > > >> > @724:
> > > > > >> > > > > > Client environment:os.arch=3.19.7-200.fc21.x86_64
> > > > > >> > > > > > 2015-05-25
> > > > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > > > >> > @725:
> > > > > >> > > > > > Client environment:os.version=#1 SMP Thu May 7
> 22:00:21
> > > UTC
> > > > > 2015
> > > > > >> > > > > > 2015-05-25
> > > > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > > > >> > @733:
> > > > > >> > > > > > Client environment:user.name=arodriguez
> > > > > >> > > > > > I0525 18:13:01.749791 10908 sched.cpp:157] Version:
> > 0.22.1
> > > > > >> > > > > > 2015-05-25
> > > > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > > > >> > @741:
> > > > > >> > > > > > Client environment:user.home=/home/arodriguez
> > > > > >> > > > > > 2015-05-25
> > > > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > > > >> > @753:
> > > > > >> > > > > > Client
> > > > > >> > > > > >
> > > > > >> > >
> > > > > >>
> > > >
> environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
> > > > > >> > > > > > 2015-05-25
> > > > > >> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init
> > > > > >> > > > > @786:
> > > > > >> > > > > > Initiating client connection, host=10.141.141.10:2181
> > > > > >> > > > > > sessionTimeout=10000 watcher=0x7fd4c2f0d5b0
> sessionId=0
> > > > > >> > > > > > sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
> > > > > >> > > > > > 2015-05-25
> > > > > >> 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
> > > > > >> > > > > @1705:
> > > > > >> > > > > > initiated connection to server [10.141.141.10:2181]
> > > > > >> > > > > > 2015-05-25
> > > > > >> 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
> > > > > >> > > > > @1752:
> > > > > >> > > > > > session establishment complete on server [
> > > > 10.141.141.10:2181
> > > > > ],
> > > > > >> > > > > > sessionId=0x14d8babef360022, negotiated timeout=10000
> > > > > >> > > > > > I0525 18:13:01.752760 10913 group.cpp:313] Group
> process
> > > > > >> > > > > > (group(1)@127.0.0.1:48557) connected to ZooKeeper
> > > > > >> > > > > > I0525 18:13:01.752787 10913 group.cpp:790] Syncing
> group
> > > > > >> > operations:
> > > > > >> > > > > > queue size (joins, cancels, datas) = (0, 0, 0)
> > > > > >> > > > > > I0525 18:13:01.752807 10913 group.cpp:385] Trying to
> > > create
> > > > > path
> > > > > >> > > > > > '/mesos' in ZooKeeper
> > > > > >> > > > > > I0525 18:13:01.754317 10909 detector.cpp:138]
> Detected a
> > > new
> > > > > >> > leader:
> > > > > >> > > > > > (id='16')
> > > > > >> > > > > > I0525 18:13:01.754408 10913 group.cpp:659] Trying to
> get
> > > > > >> > > > > > '/mesos/info_0000000016' in ZooKeeper
> > > > > >> > > > > > I0525 18:13:01.755056 10913 detector.cpp:452] A new
> > > leading
> > > > > >> master
> > > > > >> > > > > > (UPID=master@127.0.1.1:5050) is detected
> > > > > >> > > > > > I0525 18:13:01.755113 10911 sched.cpp:254] New master
> > > > detected
> > > > > >> at
> > > > > >> > > > > > master@127.0.1.1:5050
> > > > > >> > > > > > I0525 18:13:01.755345 10911 sched.cpp:264] No
> > credentials
> > > > > >> provided.
> > > > > >> > > > > > Attempting to register without authentication
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > It hangs up in the last line.
> > > > > >> > > > > >
> > > > > >> > > > > > I've tried to set the LIBPROCESS_IP env variable with
> no
> > > > luck.
> > > > > >> > > > > >
> > > > > >> > > > > > Any advice?
> > > > > >> > > > > >
> > > > > >> > > > > > Thank you in advance.
> > > > > >> > > > > >
> > > > > >> > > > > > Kind regards,
> > > > > >> > > > > >
> > > > > >> > > > > > Alberto
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > >> > @712:
> > > >> > > > > > Client environment:zookeeper.version=zookeeper C client
> > 3.4.6
> > > >> > > > > > 2015-05-25
> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > >> > @716:
> > > >> > > > > > Client environment:host.name=localhost.localdomain
> > > >> > > > > > 2015-05-25
> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > >> > @723:
> > > >> > > > > > Client environment:os.name=Linux
> > > >> > > > > > 2015-05-25
> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > >> > @724:
> > > >> > > > > > Client environment:os.arch=3.19.7-200.fc21.x86_64
> > > >> > > > > > 2015-05-25
> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > >> > @725:
> > > >> > > > > > Client environment:os.version=#1 SMP Thu May 7 22:00:21
> UTC
> > > 2015
> > > >> > > > > > 2015-05-25
> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > >> > @733:
> > > >> > > > > > Client environment:user.name=arodriguez
> > > >> > > > > > I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
> > > >> > > > > > 2015-05-25
> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > >> > @741:
> > > >> > > > > > Client environment:user.home=/home/arodriguez
> > > >> > > > > > 2015-05-25
> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > > >> > @753:
> > > >> > > > > > Client
> > > >> > > > > >
> > > >> > >
> > > >>
> > environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
> > > >> > > > > > 2015-05-25
> > > >> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init
> > > >> > > > > @786:
> > > >> > > > > > Initiating client connection, host=10.141.141.10:2181
> > > >> > > > > > sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0
> > > >> > > > > > sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
> > > >> > > > > > 2015-05-25
> > > >> 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
> > > >> > > > > @1705:
> > > >> > > > > > initiated connection to server [10.141.141.10:2181]
> > > >> > > > > > 2015-05-25
> > > >> 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
> > > >> > > > > @1752:
> > > >> > > > > > session establishment complete on server [
> > 10.141.141.10:2181
> > > ],
> > > >> > > > > > sessionId=0x14d8babef360022, negotiated timeout=10000
> > > >> > > > > > I0525 18:13:01.752760 10913 group.cpp:313] Group process
> > > >> > > > > > (group(1)@127.0.0.1:48557) connected to ZooKeeper
> > > >> > > > > > I0525 18:13:01.752787 10913 group.cpp:790] Syncing group
> > > >> > operations:
> > > >> > > > > > queue size (joins, cancels, datas) = (0, 0, 0)
> > > >> > > > > > I0525 18:13:01.752807 10913 group.cpp:385] Trying to
> create
> > > path
> > > >> > > > > > '/mesos' in ZooKeeper
> > > >> > > > > > I0525 18:13:01.754317 10909 detector.cpp:138] Detected a
> new
> > > >> > leader:
> > > >> > > > > > (id='16')
> > > >> > > > > > I0525 18:13:01.754408 10913 group.cpp:659] Trying to get
> > > >> > > > > > '/mesos/info_0000000016' in ZooKeeper
> > > >> > > > > > I0525 18:13:01.755056 10913 detector.cpp:452] A new
> leading
> > > >> master
> > > >> > > > > > (UPID=master@127.0.1.1:5050) is detected
> > > >> > > > > > I0525 18:13:01.755113 10911 sched.cpp:254] New master
> > detected
> > > >> at
> > > >> > > > > > master@127.0.1.1:5050
> > > >> > > > > > I0525 18:13:01.755345 10911 sched.cpp:264] No credentials
> > > >> provided.
> > > >> > > > > > Attempting to register without authentication
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > It hangs up in the last line.
> > > >> > > > > >
> > > >> > > > > > I've tried to set the LIBPROCESS_IP env variable with no
> > luck.
> > > >> > > > > >
> > > >> > > > > > Any advice?
> > > >> > > > > >
> > > >> > > > > > Thank you in advance.
> > > >> > > > > >
> > > >> > > > > > Kind regards,
> > > >> > > > > >
> > > >> > > > > > Alberto
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Not able to connect to mesos from different machine

Posted by Alberto Rodriguez <ar...@gmail.com>.
To be honest, I don't know what the problem was. I never managed to make my
Spark jobs work against the Mesos cluster running on two virtual machines. I
only got them working when I run my Spark jobs on my local machine, with both
the Mesos master and the slaves also running there.

My guess is that something is off in the way VirtualBox assigns network
interfaces to the virtual machines, but I can't spend more time on the issue.
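
For future reference, this is roughly the kind of setup I was attempting on the host
before starting the shell, to force the scheduler driver onto a routable address instead
of the loopback interface. The host-only address and the ZooKeeper host below are
placeholders I'm assuming, not values confirmed anywhere in this thread:

export LIBPROCESS_IP=192.168.33.1    # assumed VirtualBox host-only IP of the laptop
export SPARK_LOCAL_IP=192.168.33.1   # keep the Spark driver off localhost.localdomain
./bin/spark-shell --master mesos://zk://<zookeeper-host>:2181/mesos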

Thank you again for your help!

2015-05-28 19:28 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:

> Great! Mind sharing with the list what the problem was (for future
> reference)?
>
> On Thu, May 28, 2015 at 5:25 PM, Alberto Rodriguez <ar...@gmail.com>
> wrote:
>
> > Hi Alex,
> >
> > I managed to make it work!! Finally I'm running both mesos master and
> slave
> > in my laptop and picking up the spark jar from a hdfs installed in a VM.
> > I've just launched an spark job and is working fine!
> >
> > Thank you very much for your help
> >
> > 2015-05-28 16:20 GMT+02:00 Alberto Rodriguez <ar...@gmail.com>:
> >
> > > Hi Alex,
> > >
> > > see following an extract of the chronos log (not sure whether this is
> the
> > > log you were talking about):
> > >
> > > 2015-05-28_14:18:28.49322 [2015-05-28 14:18:28,491] INFO No tasks
> > > scheduled! Declining offers
> > > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > 2015-05-28_14:18:34.49896 [2015-05-28 14:18:34,497] INFO Received
> > resource
> > > offers
> > > 2015-05-28_14:18:34.49903
> > >  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > > 2015-05-28_14:18:34.50036 [2015-05-28 14:18:34,498] INFO No tasks
> > > scheduled! Declining offers
> > > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > > 2015-05-28_14:18:40.50442 [2015-05-28 14:18:40,503] INFO Received
> > resource
> > > offers
> > > 2015-05-28_14:18:40.50444
> > >  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > > 2015-05-28_14:18:40.50506 [2015-05-28 14:18:40,503] INFO No tasks
> > > scheduled! Declining offers
> > > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > >
> > > I'm using 0.20.1 because I'm using this vagrant machine:
> > > https://github.com/Banno/vagrant-mesos
> > >
> > > Kind regards and thank you again for your help
> > >
> > > 2015-05-28 14:09 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
> > >
> > >> Alberto,
> > >>
> > >> it looks like Spark scheduler disconnects right after establishing the
> > >> connection. Would you mind sharing scheduler logs as well? Also I see
> > that
> > >> you haven't specified the failover_timeout, try setting this value to
> > >> something meaningful (several hours for test purposes).
> > >>
> > >> And by the way, any reason you're still on Mesos 0.20.1?
> > >>
> > >> On Wed, May 27, 2015 at 5:32 PM, Alberto Rodriguez <ardlema@gmail.com
> >
> > >> wrote:
> > >>
> > >> > Hi Alex,
> > >> >
> > >> > I do not know what's going on, now I'm unable to access the spark
> > >> console
> > >> > again, it's hanging up in the same point as before. See following
> the
> > >> > master logs:
> > >> >
> > >> > 2015-05-27_15:30:53.68764 I0527 15:30:53.687494   944
> master.cpp:3760]
> > >> > Sending 1 offers to framework
> 20150527-100126-169978048-5050-1851-0001
> > >> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at
> scheduler-be29901f-39ab-4bdf
> > >> > -a9ec-691032775860@192.168.33.10:32768
> > >> > 2015-05-27_15:30:53.69032 I0527 15:30:53.690196   942
> master.cpp:2273]
> > >> > Processing ACCEPT call for offers: [
> > >> > 20150527-152023-169978048-5050-876-O241 ] on slave
> > >> > 20150527-152023-169978048-5050-876-S0 at slave(1)@19
> > >> > 2.168.33.11:5051 (mesos-slave1) for framework
> > >> > 20150527-100126-169978048-5050-1851-0001
> > >> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at
> > >> > scheduler-be29901f-39ab-4bdf-a9ec-691032775860@192.168.33.10:32768
> > >> > 2015-05-27_15:30:53.69038 I0527 15:30:53.690300   942
> > >> hierarchical.hpp:648]
> > >> > Recovered mem(*):1024; cpus(*):2; disk(*):33375;
> > ports(*):[31000-32000]
> > >> > (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375; port
> > >> > s(*):[31000-32000]) on slave 20150527-152023-169978048-5050-876-S0
> > from
> > >> > framework 20150527-100126-169978048-5050-1851-0001
> > >> > 2015-05-27_15:30:54.00952 I0527 15:30:54.009363   937
> master.cpp:1574]
> > >> > Received registration request for framework 'Spark shell' at
> > >> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> > >> > 2015-05-27_15:30:54.00957 I0527 15:30:54.009461   937
> master.cpp:1638]
> > >> > Registering framework 20150527-152023-169978048-5050-876-0026 (Spark
> > >> shell)
> > >> > at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:5556
> > >> > 2
> > >> > 2015-05-27_15:30:54.00994 I0527 15:30:54.009703   937
> > >> hierarchical.hpp:321]
> > >> > Added framework 20150527-152023-169978048-5050-876-0026
> > >> > 2015-05-27_15:30:54.00996 I0527 15:30:54.009826   937
> master.cpp:3760]
> > >> > Sending 1 offers to framework
> 20150527-152023-169978048-5050-876-0026
> > >> > (Spark shell) at
> scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.
> > >> > 0.1:55562
> > >> > 2015-05-27_15:30:54.01035 I0527 15:30:54.010267   944
> master.cpp:878]
> > >> > Framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at
> > >> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> > >> disconnecte
> > >> > d
> > >> > 2015-05-27_15:30:54.01037 I0527 15:30:54.010308   944
> master.cpp:1948]
> > >> > Disconnecting framework 20150527-152023-169978048-5050-876-0026
> (Spark
> > >> > shell) at
> scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55
> > >> > 562
> > >> > 2015-05-27_15:30:54.01038 I0527 15:30:54.010326   944
> master.cpp:1964]
> > >> > Deactivating framework 20150527-152023-169978048-5050-876-0026
> (Spark
> > >> > shell) at
> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:555
> > >> > 62
> > >> > 2015-05-27_15:30:54.01053 I0527 15:30:54.010447   939
> > >> hierarchical.hpp:400]
> > >> > Deactivated framework 20150527-152023-169978048-5050-876-0026
> > >> > 2015-05-27_15:30:54.01055 I0527 15:30:54.010459   944
> master.cpp:900]
> > >> > Giving framework 20150527-152023-169978048-5050-876-0026 (Spark
> shell)
> > >> at
> > >> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562 0ns
> > >> > to failover
> > >> >
> > >> >
> > >> > Kind regards and thank you very much for your help!!
> > >> >
> > >> >
> > >> >
> > >> > 2015-05-27 16:28 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
> > >> >
> > >> > > Alberto,
> > >> > >
> > >> > > would you mind providing slave and master logs (or appropriate
> parts
> > >> of
> > >> > > them)? Have you specified the --work_dir flag for your Mesos
> > Workers?
> > >> > >
> > >> > > On Wed, May 27, 2015 at 3:56 PM, Alberto Rodriguez <
> > ardlema@gmail.com
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > > > Hi Alex,
> > >> > > >
> > >> > > > Thank you for replying. I managed to fix the first problem but
> now
> > >> > when I
> > >> > > > launch a spark job through my console mesos is losing all the
> > >> tasks. I
> > >> > > can
> > >> > > > see them all in my mesos slave but their status is LOST. The
> > stderr
> > >> &
> > >> > > > stdout files of the tasks are both empty.
> > >> > > >
> > >> > > > Any ideas?
> > >> > > >
> > >> > > > 2015-05-26 17:35 GMT+02:00 Alex Rukletsov <alex@mesosphere.com
> >:
> > >> > > >
> > >> > > > > Alberto,
> > >> > > > >
> > >> > > > > What may be happening in your case is that Master is not able
> to
> > >> talk
> > >> > > to
> > >> > > > > your scheduler. When responding to a scheduler, Mesos Master
> > >> doesn't
> > >> > > use
> > >> > > > > the IP from which a request came from, but rather an IP set in
> > the
> > >> > > > > "Libprocess-from" field instead. That's exactly what you
> specify
> > >> in
> > >> > > > > LIBPROCESS_IP env var prior starting your scheduler. Could you
> > >> please
> > >> > > > > double check the it set up correctly and that IP is reachable
> > for
> > >> > Mesos
> > >> > > > > Master?
> > >> > > > >
> > >> > > > > In case you are not able to solve the problem, please provide
> > >> > scheduler
> > >> > > > and
> > >> > > > > Master logs together with master, zookeeper, and scheduler
> > >> > > > configurations.
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <
> > >> > ardlema@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi all,
> > >> > > > > >
> > >> > > > > > I managed to get a mesos cluster up & running on a Ubuntu
> VM.
> > >> I've
> > >> > > > > > been also able to run and connect a spark-shell from this
> > >> machine
> > >> > and
> > >> > > > > > it works properly.
> > >> > > > > >
> > >> > > > > > Unfortunately, I'm trying to connect from the host machine
> > where
> > >> > the
> > >> > > > > > VM is running to launch spark jobs and I can not.
> > >> > > > > >
> > >> > > > > > See below the spark console output:
> > >> > > > > >
> > >> > > > > > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server
> VM,
> > >> Java
> > >> > > > > > 1.7.0_75)
> > >> > > > > > Type in expressions to have them evaluated.
> > >> > > > > > Type :help for more information.
> > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing view acls
> to:
> > >> > > > arodriguez
> > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing modify acls
> > to:
> > >> > > > > arodriguez
> > >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: SecurityManager:
> > >> > > > > > authentication disabled; ui acls disabled; users with view
> > >> > > > > > permissions: Set(arodriguez); users with modify permissions:
> > >> > > > > > Set(arodriguez)
> > >> > > > > > 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
> > >> > > > > > 15/05/25 18:13:01 INFO Remoting: Starting remoting
> > >> > > > > > 15/05/25 18:13:01 INFO Remoting: Remoting started; listening
> > on
> > >> > > > > > addresses :[akka.tcp://sparkDriver@localhost.localdomain
> > :47229]
> > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service
> > >> > > > > > 'sparkDriver' on port 47229.
> > >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering
> MapOutputTracker
> > >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering
> > BlockManagerMaster
> > >> > > > > > 15/05/25 18:13:01 INFO DiskBlockManager: Created local
> > >> directory at
> > >> > > > > > /tmp/spark-local-20150525181301-7fa8
> > >> > > > > > 15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with
> > >> > capacity
> > >> > > > > > 265.4 MB
> > >> > > > > > 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load
> > >> > native-hadoop
> > >> > > > > > library for your platform... using builtin-java classes
> where
> > >> > > > > > applicable
> > >> > > > > > 15/05/25 18:13:01 INFO HttpFileServer: HTTP File server
> > >> directory
> > >> > is
> > >> > > > > > /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
> > >> > > > > > 15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
> > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service
> > 'HTTP
> > >> > file
> > >> > > > > > server' on port 51659.
> > >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service
> > >> > 'SparkUI'
> > >> > > > > > on port 4040.
> > >> > > > > > 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at
> > >> > > > > > http://localhost.localdomain:4040
> > >> > > > > > WARNING: Logging before InitGoogleLogging() is written to
> > STDERR
> > >> > > > > > W0525 18:13:01.749449 10908 sched.cpp:1323]
> > >> > > > > > **************************************************
> > >> > > > > > Scheduler driver bound to loopback interface! Cannot
> > communicate
> > >> > with
> > >> > > > > > remote master(s). You might want to set 'LIBPROCESS_IP'
> > >> environment
> > >> > > > > > variable to use a routable IP address.
> > >> > > > > > **************************************************
> > >> > > > > > 2015-05-25
> 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > >> > @712:
> > >> > > > > > Client environment:zookeeper.version=zookeeper C client
> 3.4.6
> > >> > > > > > 2015-05-25
> 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > >> > @716:
> > >> > > > > > Client environment:host.name=localhost.localdomain
> > >> > > > > > 2015-05-25
> 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > >> > @723:
> > >> > > > > > Client environment:os.name=Linux
> > >> > > > > > 2015-05-25
> 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > >> > @724:
> > >> > > > > > Client environment:os.arch=3.19.7-200.fc21.x86_64
> > >> > > > > > 2015-05-25
> 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > >> > @725:
> > >> > > > > > Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC
> > 2015
> > >> > > > > > 2015-05-25
> 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > >> > @733:
> > >> > > > > > Client environment:user.name=arodriguez
> > >> > > > > > I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
> > >> > > > > > 2015-05-25
> 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > >> > @741:
> > >> > > > > > Client environment:user.home=/home/arodriguez
> > >> > > > > > 2015-05-25
> 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> > >> > @753:
> > >> > > > > > Client
> > >> > > > > >
> > >> > >
> > >>
> environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
> > >> > > > > > 2015-05-25
> > >> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init
> > >> > > > > @786:
> > >> > > > > > Initiating client connection, host=10.141.141.10:2181
> > >> > > > > > sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0
> > >> > > > > > sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
> > >> > > > > > 2015-05-25
> > >> 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
> > >> > > > > @1705:
> > >> > > > > > initiated connection to server [10.141.141.10:2181]
> > >> > > > > > 2015-05-25
> > >> 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
> > >> > > > > @1752:
> > >> > > > > > session establishment complete on server [
> 10.141.141.10:2181
> > ],
> > >> > > > > > sessionId=0x14d8babef360022, negotiated timeout=10000
> > >> > > > > > I0525 18:13:01.752760 10913 group.cpp:313] Group process
> > >> > > > > > (group(1)@127.0.0.1:48557) connected to ZooKeeper
> > >> > > > > > I0525 18:13:01.752787 10913 group.cpp:790] Syncing group
> > >> > operations:
> > >> > > > > > queue size (joins, cancels, datas) = (0, 0, 0)
> > >> > > > > > I0525 18:13:01.752807 10913 group.cpp:385] Trying to create
> > path
> > >> > > > > > '/mesos' in ZooKeeper
> > >> > > > > > I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new
> > >> > leader:
> > >> > > > > > (id='16')
> > >> > > > > > I0525 18:13:01.754408 10913 group.cpp:659] Trying to get
> > >> > > > > > '/mesos/info_0000000016' in ZooKeeper
> > >> > > > > > I0525 18:13:01.755056 10913 detector.cpp:452] A new leading
> > >> master
> > >> > > > > > (UPID=master@127.0.1.1:5050) is detected
> > >> > > > > > I0525 18:13:01.755113 10911 sched.cpp:254] New master
> detected
> > >> at
> > >> > > > > > master@127.0.1.1:5050
> > >> > > > > > I0525 18:13:01.755345 10911 sched.cpp:264] No credentials
> > >> provided.
> > >> > > > > > Attempting to register without authentication
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > It hangs up in the last line.
> > >> > > > > >
> > >> > > > > > I've tried to set the LIBPROCESS_IP env variable with no
> luck.
> > >> > > > > >
> > >> > > > > > Any advice?
> > >> > > > > >
> > >> > > > > > Thank you in advance.
> > >> > > > > >
> > >> > > > > > Kind regards,
> > >> > > > > >
> > >> > > > > > Alberto
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Not able to connect to mesos from different machine

Posted by Alex Rukletsov <al...@mesosphere.com>.
Great! Mind sharing with the list what the problem was (for future
reference)?

On Thu, May 28, 2015 at 5:25 PM, Alberto Rodriguez <ar...@gmail.com>
wrote:

> Hi Alex,
>
> I managed to make it work!! Finally I'm running both mesos master and slave
> in my laptop and picking up the spark jar from a hdfs installed in a VM.
> I've just launched an spark job and is working fine!
>
> Thank you very much for your help
>
> 2015-05-28 16:20 GMT+02:00 Alberto Rodriguez <ar...@gmail.com>:
>
> > Hi Alex,
> >
> > see following an extract of the chronos log (not sure whether this is the
> > log you were talking about):
> >
> > 2015-05-28_14:18:28.49322 [2015-05-28 14:18:28,491] INFO No tasks
> > scheduled! Declining offers
> > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > 2015-05-28_14:18:34.49896 [2015-05-28 14:18:34,497] INFO Received
> resource
> > offers
> > 2015-05-28_14:18:34.49903
> >  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > 2015-05-28_14:18:34.50036 [2015-05-28 14:18:34,498] INFO No tasks
> > scheduled! Declining offers
> > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> > 2015-05-28_14:18:40.50442 [2015-05-28 14:18:40,503] INFO Received
> resource
> > offers
> > 2015-05-28_14:18:40.50444
> >  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> > 2015-05-28_14:18:40.50506 [2015-05-28 14:18:40,503] INFO No tasks
> > scheduled! Declining offers
> > (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> >
> > I'm using 0.20.1 because I'm using this vagrant machine:
> > https://github.com/Banno/vagrant-mesos
> >
> > Kind regards and thank you again for your help
> >
> > 2015-05-28 14:09 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
> >
> >> Alberto,
> >>
> >> it looks like Spark scheduler disconnects right after establishing the
> >> connection. Would you mind sharing scheduler logs as well? Also I see
> that
> >> you haven't specified the failover_timeout, try setting this value to
> >> something meaningful (several hours for test purposes).
> >>
> >> And by the way, any reason you're still on Mesos 0.20.1?
> >>
> >> On Wed, May 27, 2015 at 5:32 PM, Alberto Rodriguez <ar...@gmail.com>
> >> wrote:
> >>
> >> > Hi Alex,
> >> >
> >> > I do not know what's going on, now I'm unable to access the spark
> >> console
> >> > again, it's hanging up in the same point as before. See following the
> >> > master logs:
> >> >
> >> > 2015-05-27_15:30:53.68764 I0527 15:30:53.687494   944 master.cpp:3760]
> >> > Sending 1 offers to framework 20150527-100126-169978048-5050-1851-0001
> >> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at scheduler-be29901f-39ab-4bdf
> >> > -a9ec-691032775860@192.168.33.10:32768
> >> > 2015-05-27_15:30:53.69032 I0527 15:30:53.690196   942 master.cpp:2273]
> >> > Processing ACCEPT call for offers: [
> >> > 20150527-152023-169978048-5050-876-O241 ] on slave
> >> > 20150527-152023-169978048-5050-876-S0 at slave(1)@19
> >> > 2.168.33.11:5051 (mesos-slave1) for framework
> >> > 20150527-100126-169978048-5050-1851-0001
> >> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at
> >> > scheduler-be29901f-39ab-4bdf-a9ec-691032775860@192.168.33.10:32768
> >> > 2015-05-27_15:30:53.69038 I0527 15:30:53.690300   942
> >> hierarchical.hpp:648]
> >> > Recovered mem(*):1024; cpus(*):2; disk(*):33375;
> ports(*):[31000-32000]
> >> > (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375; port
> >> > s(*):[31000-32000]) on slave 20150527-152023-169978048-5050-876-S0
> from
> >> > framework 20150527-100126-169978048-5050-1851-0001
> >> > 2015-05-27_15:30:54.00952 I0527 15:30:54.009363   937 master.cpp:1574]
> >> > Received registration request for framework 'Spark shell' at
> >> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> >> > 2015-05-27_15:30:54.00957 I0527 15:30:54.009461   937 master.cpp:1638]
> >> > Registering framework 20150527-152023-169978048-5050-876-0026 (Spark
> >> shell)
> >> > at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:5556
> >> > 2
> >> > 2015-05-27_15:30:54.00994 I0527 15:30:54.009703   937
> >> hierarchical.hpp:321]
> >> > Added framework 20150527-152023-169978048-5050-876-0026
> >> > 2015-05-27_15:30:54.00996 I0527 15:30:54.009826   937 master.cpp:3760]
> >> > Sending 1 offers to framework 20150527-152023-169978048-5050-876-0026
> >> > (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.
> >> > 0.1:55562
> >> > 2015-05-27_15:30:54.01035 I0527 15:30:54.010267   944 master.cpp:878]
> >> > Framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at
> >> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> >> disconnecte
> >> > d
> >> > 2015-05-27_15:30:54.01037 I0527 15:30:54.010308   944 master.cpp:1948]
> >> > Disconnecting framework 20150527-152023-169978048-5050-876-0026 (Spark
> >> > shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55
> >> > 562
> >> > 2015-05-27_15:30:54.01038 I0527 15:30:54.010326   944 master.cpp:1964]
> >> > Deactivating framework 20150527-152023-169978048-5050-876-0026 (Spark
> >> > shell) at
> scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:555
> >> > 62
> >> > 2015-05-27_15:30:54.01053 I0527 15:30:54.010447   939
> >> hierarchical.hpp:400]
> >> > Deactivated framework 20150527-152023-169978048-5050-876-0026
> >> > 2015-05-27_15:30:54.01055 I0527 15:30:54.010459   944 master.cpp:900]
> >> > Giving framework 20150527-152023-169978048-5050-876-0026 (Spark shell)
> >> at
> >> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562 0ns
> >> > to failover
> >> >
> >> >
> >> > Kind regards and thank you very much for your help!!
> >> >
> >> >
> >> >
> >> > 2015-05-27 16:28 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
> >> >
> >> > > Alberto,
> >> > >
> >> > > would you mind providing slave and master logs (or appropriate parts
> >> of
> >> > > them)? Have you specified the --work_dir flag for your Mesos
> Workers?
> >> > >
> >> > > On Wed, May 27, 2015 at 3:56 PM, Alberto Rodriguez <
> ardlema@gmail.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi Alex,
> >> > > >
> >> > > > Thank you for replying. I managed to fix the first problem but now
> >> > when I
> >> > > > launch a spark job through my console mesos is losing all the
> >> tasks. I
> >> > > can
> >> > > > see them all in my mesos slave but their status is LOST. The
> stderr
> >> &
> >> > > > stdout files of the tasks are both empty.
> >> > > >
> >> > > > Any ideas?
> >> > > >
> >> > > > 2015-05-26 17:35 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
> >> > > >
> >> > > > > Alberto,
> >> > > > >
> >> > > > > What may be happening in your case is that Master is not able to
> >> talk
> >> > > to
> >> > > > > your scheduler. When responding to a scheduler, Mesos Master
> >> doesn't
> >> > > use
> >> > > > > the IP from which a request came from, but rather an IP set in
> the
> >> > > > > "Libprocess-from" field instead. That's exactly what you specify
> >> in
> >> > > > > LIBPROCESS_IP env var prior starting your scheduler. Could you
> >> please
> >> > > > > double check the it set up correctly and that IP is reachable
> for
> >> > Mesos
> >> > > > > Master?
> >> > > > >
> >> > > > > In case you are not able to solve the problem, please provide
> >> > scheduler
> >> > > > and
> >> > > > > Master logs together with master, zookeeper, and scheduler
> >> > > > configurations.
> >> > > > >
> >> > > > >
> >> > > > > On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <
> >> > ardlema@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi all,
> >> > > > > >
> >> > > > > > I managed to get a mesos cluster up & running on a Ubuntu VM.
> >> I've
> >> > > > > > been also able to run and connect a spark-shell from this
> >> machine
> >> > and
> >> > > > > > it works properly.
> >> > > > > >
> >> > > > > > Unfortunately, I'm trying to connect from the host machine
> where
> >> > the
> >> > > > > > VM is running to launch spark jobs and I can not.
> >> > > > > >
> >> > > > > > See below the spark console output:
> >> > > > > >
> >> > > > > > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM,
> >> Java
> >> > > > > > 1.7.0_75)
> >> > > > > > Type in expressions to have them evaluated.
> >> > > > > > Type :help for more information.
> >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing view acls to:
> >> > > > arodriguez
> >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing modify acls
> to:
> >> > > > > arodriguez
> >> > > > > > 15/05/25 18:13:00 INFO SecurityManager: SecurityManager:
> >> > > > > > authentication disabled; ui acls disabled; users with view
> >> > > > > > permissions: Set(arodriguez); users with modify permissions:
> >> > > > > > Set(arodriguez)
> >> > > > > > 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
> >> > > > > > 15/05/25 18:13:01 INFO Remoting: Starting remoting
> >> > > > > > 15/05/25 18:13:01 INFO Remoting: Remoting started; listening
> on
> >> > > > > > addresses :[akka.tcp://sparkDriver@localhost.localdomain
> :47229]
> >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service
> >> > > > > > 'sparkDriver' on port 47229.
> >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering MapOutputTracker
> >> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering
> BlockManagerMaster
> >> > > > > > 15/05/25 18:13:01 INFO DiskBlockManager: Created local
> >> directory at
> >> > > > > > /tmp/spark-local-20150525181301-7fa8
> >> > > > > > 15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with
> >> > capacity
> >> > > > > > 265.4 MB
> >> > > > > > 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load
> >> > native-hadoop
> >> > > > > > library for your platform... using builtin-java classes where
> >> > > > > > applicable
> >> > > > > > 15/05/25 18:13:01 INFO HttpFileServer: HTTP File server
> >> directory
> >> > is
> >> > > > > > /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
> >> > > > > > 15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
> >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service
> 'HTTP
> >> > file
> >> > > > > > server' on port 51659.
> >> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service
> >> > 'SparkUI'
> >> > > > > > on port 4040.
> >> > > > > > 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at
> >> > > > > > http://localhost.localdomain:4040
> >> > > > > > WARNING: Logging before InitGoogleLogging() is written to
> STDERR
> >> > > > > > W0525 18:13:01.749449 10908 sched.cpp:1323]
> >> > > > > > **************************************************
> >> > > > > > Scheduler driver bound to loopback interface! Cannot
> communicate
> >> > with
> >> > > > > > remote master(s). You might want to set 'LIBPROCESS_IP'
> >> environment
> >> > > > > > variable to use a routable IP address.
> >> > > > > > **************************************************
> >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> >> > @712:
> >> > > > > > Client environment:zookeeper.version=zookeeper C client 3.4.6
> >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> >> > @716:
> >> > > > > > Client environment:host.name=localhost.localdomain
> >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> >> > @723:
> >> > > > > > Client environment:os.name=Linux
> >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> >> > @724:
> >> > > > > > Client environment:os.arch=3.19.7-200.fc21.x86_64
> >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> >> > @725:
> >> > > > > > Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC
> 2015
> >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> >> > @733:
> >> > > > > > Client environment:user.name=arodriguez
> >> > > > > > I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
> >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> >> > @741:
> >> > > > > > Client environment:user.home=/home/arodriguez
> >> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
> >> > @753:
> >> > > > > > Client
> >> > > > > >
> >> > >
> >> environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
> >> > > > > > 2015-05-25
> >> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init
> >> > > > > @786:
> >> > > > > > Initiating client connection, host=10.141.141.10:2181
> >> > > > > > sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0
> >> > > > > > sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
> >> > > > > > 2015-05-25
> >> 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
> >> > > > > @1705:
> >> > > > > > initiated connection to server [10.141.141.10:2181]
> >> > > > > > 2015-05-25
> >> 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
> >> > > > > @1752:
> >> > > > > > session establishment complete on server [10.141.141.10:2181
> ],
> >> > > > > > sessionId=0x14d8babef360022, negotiated timeout=10000
> >> > > > > > I0525 18:13:01.752760 10913 group.cpp:313] Group process
> >> > > > > > (group(1)@127.0.0.1:48557) connected to ZooKeeper
> >> > > > > > I0525 18:13:01.752787 10913 group.cpp:790] Syncing group
> >> > operations:
> >> > > > > > queue size (joins, cancels, datas) = (0, 0, 0)
> >> > > > > > I0525 18:13:01.752807 10913 group.cpp:385] Trying to create
> path
> >> > > > > > '/mesos' in ZooKeeper
> >> > > > > > I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new
> >> > leader:
> >> > > > > > (id='16')
> >> > > > > > I0525 18:13:01.754408 10913 group.cpp:659] Trying to get
> >> > > > > > '/mesos/info_0000000016' in ZooKeeper
> >> > > > > > I0525 18:13:01.755056 10913 detector.cpp:452] A new leading
> >> master
> >> > > > > > (UPID=master@127.0.1.1:5050) is detected
> >> > > > > > I0525 18:13:01.755113 10911 sched.cpp:254] New master detected
> >> at
> >> > > > > > master@127.0.1.1:5050
> >> > > > > > I0525 18:13:01.755345 10911 sched.cpp:264] No credentials
> >> provided.
> >> > > > > > Attempting to register without authentication
> >> > > > > >
> >> > > > > >
> >> > > > > > It hangs up in the last line.
> >> > > > > >
> >> > > > > > I've tried to set the LIBPROCESS_IP env variable with no luck.
> >> > > > > >
> >> > > > > > Any advice?
> >> > > > > >
> >> > > > > > Thank you in advance.
> >> > > > > >
> >> > > > > > Kind regards,
> >> > > > > >
> >> > > > > > Alberto
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Not able to connect to mesos from different machine

Posted by Alberto Rodriguez <ar...@gmail.com>.
Hi Alex,

I managed to make it work!! In the end I'm running both the Mesos master and the slave
on my laptop and picking up the Spark jar from an HDFS instance installed in a VM. I've
just launched a Spark job and it's working fine!
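
In case it is useful to someone else, the working launch looks roughly like the
following; the library path, ZooKeeper host and HDFS URI are illustrative placeholders
rather than the exact values from my setup:

export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so   # wherever libmesos is installed
./bin/spark-shell \
  --master mesos://zk://<zookeeper-host>:2181/mesos \
  --conf spark.executor.uri=hdfs://<namenode>:8020/spark/spark-1.2.0-bin-hadoop2.4.tgz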

Thank you very much for your help

2015-05-28 16:20 GMT+02:00 Alberto Rodriguez <ar...@gmail.com>:

> Hi Alex,
>
> see following an extract of the chronos log (not sure whether this is the
> log you were talking about):
>
> 2015-05-28_14:18:28.49322 [2015-05-28 14:18:28,491] INFO No tasks
> scheduled! Declining offers
> (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> 2015-05-28_14:18:34.49896 [2015-05-28 14:18:34,497] INFO Received resource
> offers
> 2015-05-28_14:18:34.49903
>  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> 2015-05-28_14:18:34.50036 [2015-05-28 14:18:34,498] INFO No tasks
> scheduled! Declining offers
> (com.airbnb.scheduler.mesos.MesosJobFramework:106)
> 2015-05-28_14:18:40.50442 [2015-05-28 14:18:40,503] INFO Received resource
> offers
> 2015-05-28_14:18:40.50444
>  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
> 2015-05-28_14:18:40.50506 [2015-05-28 14:18:40,503] INFO No tasks
> scheduled! Declining offers
> (com.airbnb.scheduler.mesos.MesosJobFramework:106)
>
> I'm using 0.20.1 because I'm using this vagrant machine:
> https://github.com/Banno/vagrant-mesos
>
> Kind regards and thank you again for your help
>
> 2015-05-28 14:09 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
>
>> Alberto,
>>
>> it looks like Spark scheduler disconnects right after establishing the
>> connection. Would you mind sharing scheduler logs as well? Also I see that
>> you haven't specified the failover_timeout, try setting this value to
>> something meaningful (several hours for test purposes).
>>
>> And by the way, any reason you're still on Mesos 0.20.1?
>>
>> On Wed, May 27, 2015 at 5:32 PM, Alberto Rodriguez <ar...@gmail.com>
>> wrote:
>>
>> > Hi Alex,
>> >
>> > I do not know what's going on, now I'm unable to access the spark
>> console
>> > again, it's hanging up in the same point as before. See following the
>> > master logs:
>> >
>> > 2015-05-27_15:30:53.68764 I0527 15:30:53.687494   944 master.cpp:3760]
>> > Sending 1 offers to framework 20150527-100126-169978048-5050-1851-0001
>> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at scheduler-be29901f-39ab-4bdf
>> > -a9ec-691032775860@192.168.33.10:32768
>> > 2015-05-27_15:30:53.69032 I0527 15:30:53.690196   942 master.cpp:2273]
>> > Processing ACCEPT call for offers: [
>> > 20150527-152023-169978048-5050-876-O241 ] on slave
>> > 20150527-152023-169978048-5050-876-S0 at slave(1)@19
>> > 2.168.33.11:5051 (mesos-slave1) for framework
>> > 20150527-100126-169978048-5050-1851-0001
>> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at
>> > scheduler-be29901f-39ab-4bdf-a9ec-691032775860@192.168.33.10:32768
>> > 2015-05-27_15:30:53.69038 I0527 15:30:53.690300   942
>> hierarchical.hpp:648]
>> > Recovered mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000]
>> > (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375; port
>> > s(*):[31000-32000]) on slave 20150527-152023-169978048-5050-876-S0 from
>> > framework 20150527-100126-169978048-5050-1851-0001
>> > 2015-05-27_15:30:54.00952 I0527 15:30:54.009363   937 master.cpp:1574]
>> > Received registration request for framework 'Spark shell' at
>> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
>> > 2015-05-27_15:30:54.00957 I0527 15:30:54.009461   937 master.cpp:1638]
>> > Registering framework 20150527-152023-169978048-5050-876-0026 (Spark
>> shell)
>> > at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:5556
>> > 2
>> > 2015-05-27_15:30:54.00994 I0527 15:30:54.009703   937
>> hierarchical.hpp:321]
>> > Added framework 20150527-152023-169978048-5050-876-0026
>> > 2015-05-27_15:30:54.00996 I0527 15:30:54.009826   937 master.cpp:3760]
>> > Sending 1 offers to framework 20150527-152023-169978048-5050-876-0026
>> > (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.
>> > 0.1:55562
>> > 2015-05-27_15:30:54.01035 I0527 15:30:54.010267   944 master.cpp:878]
>> > Framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at
>> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
>> disconnecte
>> > d
>> > 2015-05-27_15:30:54.01037 I0527 15:30:54.010308   944 master.cpp:1948]
>> > Disconnecting framework 20150527-152023-169978048-5050-876-0026 (Spark
>> > shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55
>> > 562
>> > 2015-05-27_15:30:54.01038 I0527 15:30:54.010326   944 master.cpp:1964]
>> > Deactivating framework 20150527-152023-169978048-5050-876-0026 (Spark
>> > shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:555
>> > 62
>> > 2015-05-27_15:30:54.01053 I0527 15:30:54.010447   939
>> hierarchical.hpp:400]
>> > Deactivated framework 20150527-152023-169978048-5050-876-0026
>> > 2015-05-27_15:30:54.01055 I0527 15:30:54.010459   944 master.cpp:900]
>> > Giving framework 20150527-152023-169978048-5050-876-0026 (Spark shell)
>> at
>> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562 0ns
>> > to failover
>> >
>> >
>> > Kind regards and thank you very much for your help!!
>> >
>> >
>> >
>> > 2015-05-27 16:28 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
>> >
>> > > Alberto,
>> > >
>> > > would you mind providing slave and master logs (or appropriate parts
>> of
>> > > them)? Have you specified the --work_dir flag for your Mesos Workers?
>> > >
>> > > On Wed, May 27, 2015 at 3:56 PM, Alberto Rodriguez <ardlema@gmail.com
>> >
>> > > wrote:
>> > >
>> > > > Hi Alex,
>> > > >
>> > > > Thank you for replying. I managed to fix the first problem but now
>> > when I
>> > > > launch a spark job through my console mesos is losing all the
>> tasks. I
>> > > can
>> > > > see them all in my mesos slave but their status is LOST. The stderr
>> &
>> > > > stdout files of the tasks are both empty.
>> > > >
>> > > > Any ideas?
>> > > >
>> > > > 2015-05-26 17:35 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:
>> > > >
>> > > > > Alberto,
>> > > > >
>> > > > > What may be happening in your case is that Master is not able to
>> talk
>> > > to
>> > > > > your scheduler. When responding to a scheduler, Mesos Master
>> doesn't
>> > > use
>> > > > > the IP from which a request came from, but rather an IP set in the
>> > > > > "Libprocess-from" field instead. That's exactly what you specify
>> in
>> > > > > LIBPROCESS_IP env var prior starting your scheduler. Could you
>> please
>> > > > > double check the it set up correctly and that IP is reachable for
>> > Mesos
>> > > > > Master?
>> > > > >
>> > > > > In case you are not able to solve the problem, please provide
>> > scheduler
>> > > > and
>> > > > > Master logs together with master, zookeeper, and scheduler
>> > > > configurations.
>> > > > >
>> > > > >
>> > > > > On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <
>> > ardlema@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I managed to get a mesos cluster up & running on a Ubuntu VM.
>> I've
>> > > > > > been also able to run and connect a spark-shell from this
>> machine
>> > and
>> > > > > > it works properly.
>> > > > > >
>> > > > > > Unfortunately, I'm trying to connect from the host machine where
>> > the
>> > > > > > VM is running to launch spark jobs and I can not.
>> > > > > >
>> > > > > > See below the spark console output:
>> > > > > >
>> > > > > > Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM,
>> Java
>> > > > > > 1.7.0_75)
>> > > > > > Type in expressions to have them evaluated.
>> > > > > > Type :help for more information.
>> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing view acls to:
>> > > > arodriguez
>> > > > > > 15/05/25 18:13:00 INFO SecurityManager: Changing modify acls to:
>> > > > > arodriguez
>> > > > > > 15/05/25 18:13:00 INFO SecurityManager: SecurityManager:
>> > > > > > authentication disabled; ui acls disabled; users with view
>> > > > > > permissions: Set(arodriguez); users with modify permissions:
>> > > > > > Set(arodriguez)
>> > > > > > 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
>> > > > > > 15/05/25 18:13:01 INFO Remoting: Starting remoting
>> > > > > > 15/05/25 18:13:01 INFO Remoting: Remoting started; listening on
>> > > > > > addresses :[akka.tcp://sparkDriver@localhost.localdomain:47229]
>> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service
>> > > > > > 'sparkDriver' on port 47229.
>> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering MapOutputTracker
>> > > > > > 15/05/25 18:13:01 INFO SparkEnv: Registering BlockManagerMaster
>> > > > > > 15/05/25 18:13:01 INFO DiskBlockManager: Created local
>> directory at
>> > > > > > /tmp/spark-local-20150525181301-7fa8
>> > > > > > 15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with
>> > capacity
>> > > > > > 265.4 MB
>> > > > > > 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load
>> > native-hadoop
>> > > > > > library for your platform... using builtin-java classes where
>> > > > > > applicable
>> > > > > > 15/05/25 18:13:01 INFO HttpFileServer: HTTP File server
>> directory
>> > is
>> > > > > > /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
>> > > > > > 15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
>> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service 'HTTP
>> > file
>> > > > > > server' on port 51659.
>> > > > > > 15/05/25 18:13:01 INFO Utils: Successfully started service
>> > 'SparkUI'
>> > > > > > on port 4040.
>> > > > > > 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at
>> > > > > > http://localhost.localdomain:4040
>> > > > > > WARNING: Logging before InitGoogleLogging() is written to STDERR
>> > > > > > W0525 18:13:01.749449 10908 sched.cpp:1323]
>> > > > > > **************************************************
>> > > > > > Scheduler driver bound to loopback interface! Cannot communicate
>> > with
>> > > > > > remote master(s). You might want to set 'LIBPROCESS_IP'
>> environment
>> > > > > > variable to use a routable IP address.
>> > > > > > **************************************************
>> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
>> > @712:
>> > > > > > Client environment:zookeeper.version=zookeeper C client 3.4.6
>> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
>> > @716:
>> > > > > > Client environment:host.name=localhost.localdomain
>> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
>> > @723:
>> > > > > > Client environment:os.name=Linux
>> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
>> > @724:
>> > > > > > Client environment:os.arch=3.19.7-200.fc21.x86_64
>> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
>> > @725:
>> > > > > > Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC 2015
>> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
>> > @733:
>> > > > > > Client environment:user.name=arodriguez
>> > > > > > I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
>> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
>> > @741:
>> > > > > > Client environment:user.home=/home/arodriguez
>> > > > > > 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env
>> > @753:
>> > > > > > Client
>> > > > > >
>> > >
>> environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
>> > > > > > 2015-05-25
>> > 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init
>> > > > > @786:
>> > > > > > Initiating client connection, host=10.141.141.10:2181
>> > > > > > sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0
>> > > > > > sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
>> > > > > > 2015-05-25
>> 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
>> > > > > @1705:
>> > > > > > initiated connection to server [10.141.141.10:2181]
>> > > > > > 2015-05-25
>> 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events
>> > > > > @1752:
>> > > > > > session establishment complete on server [10.141.141.10:2181],
>> > > > > > sessionId=0x14d8babef360022, negotiated timeout=10000
>> > > > > > I0525 18:13:01.752760 10913 group.cpp:313] Group process
>> > > > > > (group(1)@127.0.0.1:48557) connected to ZooKeeper
>> > > > > > I0525 18:13:01.752787 10913 group.cpp:790] Syncing group
>> > operations:
>> > > > > > queue size (joins, cancels, datas) = (0, 0, 0)
>> > > > > > I0525 18:13:01.752807 10913 group.cpp:385] Trying to create path
>> > > > > > '/mesos' in ZooKeeper
>> > > > > > I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new
>> > leader:
>> > > > > > (id='16')
>> > > > > > I0525 18:13:01.754408 10913 group.cpp:659] Trying to get
>> > > > > > '/mesos/info_0000000016' in ZooKeeper
>> > > > > > I0525 18:13:01.755056 10913 detector.cpp:452] A new leading
>> master
>> > > > > > (UPID=master@127.0.1.1:5050) is detected
>> > > > > > I0525 18:13:01.755113 10911 sched.cpp:254] New master detected
>> at
>> > > > > > master@127.0.1.1:5050
>> > > > > > I0525 18:13:01.755345 10911 sched.cpp:264] No credentials
>> provided.
>> > > > > > Attempting to register without authentication
>> > > > > >
>> > > > > >
>> > > > > > It hangs up in the last line.
>> > > > > >
>> > > > > > I've tried to set the LIBPROCESS_IP env variable with no luck.
>> > > > > >
>> > > > > > Any advice?
>> > > > > >
>> > > > > > Thank you in advance.
>> > > > > >
>> > > > > > Kind regards,
>> > > > > >
>> > > > > > Alberto
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Not able to connect to mesos from different machine

Posted by Alberto Rodriguez <ar...@gmail.com>.
Hi Alex,

See below an extract of the Chronos log (I'm not sure whether this is the log you were
talking about):

2015-05-28_14:18:28.49322 [2015-05-28 14:18:28,491] INFO No tasks
scheduled! Declining offers
(com.airbnb.scheduler.mesos.MesosJobFramework:106)
2015-05-28_14:18:34.49896 [2015-05-28 14:18:34,497] INFO Received resource
offers
2015-05-28_14:18:34.49903  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
2015-05-28_14:18:34.50036 [2015-05-28 14:18:34,498] INFO No tasks
scheduled! Declining offers
(com.airbnb.scheduler.mesos.MesosJobFramework:106)
2015-05-28_14:18:40.50442 [2015-05-28 14:18:40,503] INFO Received resource
offers
2015-05-28_14:18:40.50444  (com.airbnb.scheduler.mesos.MesosJobFramework:87)
2015-05-28_14:18:40.50506 [2015-05-28 14:18:40,503] INFO No tasks
scheduled! Declining offers
(com.airbnb.scheduler.mesos.MesosJobFramework:106)
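
(If this is not the log you meant: I would expect the Mesos master and slave logs to be
somewhere like the path below on the Vagrant boxes, but that location is a guess on my
part and depends on the --log_dir the daemons were started with.)

tail -n 100 /var/log/mesos/mesos-master.INFO   # assumed location; the slaves would have mesos-slave.INFO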

I'm on Mesos 0.20.1 because I'm using this Vagrant box:
https://github.com/Banno/vagrant-mesos

Kind regards and thank you again for your help

2015-05-28 14:09 GMT+02:00 Alex Rukletsov <al...@mesosphere.com>:

> Alberto,
>
> it looks like Spark scheduler disconnects right after establishing the
> connection. Would you mind sharing scheduler logs as well? Also I see that
> you haven't specified the failover_timeout, try setting this value to
> something meaningful (several hours for test purposes).
>
> And by the way, any reason you're still on Mesos 0.20.1?
>
> On Wed, May 27, 2015 at 5:32 PM, Alberto Rodriguez <ar...@gmail.com>
> wrote:
>
> > Hi Alex,
> >
> > I do not know what's going on, now I'm unable to access the spark console
> > again, it's hanging up in the same point as before. See following the
> > master logs:
> >
> > 2015-05-27_15:30:53.68764 I0527 15:30:53.687494   944 master.cpp:3760]
> > Sending 1 offers to framework 20150527-100126-169978048-5050-1851-0001
> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at scheduler-be29901f-39ab-4bdf
> > -a9ec-691032775860@192.168.33.10:32768
> > 2015-05-27_15:30:53.69032 I0527 15:30:53.690196   942 master.cpp:2273]
> > Processing ACCEPT call for offers: [
> > 20150527-152023-169978048-5050-876-O241 ] on slave
> > 20150527-152023-169978048-5050-876-S0 at slave(1)@19
> > 2.168.33.11:5051 (mesos-slave1) for framework
> > 20150527-100126-169978048-5050-1851-0001
> > (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at
> > scheduler-be29901f-39ab-4bdf-a9ec-691032775860@192.168.33.10:32768
> > 2015-05-27_15:30:53.69038 I0527 15:30:53.690300   942
> hierarchical.hpp:648]
> > Recovered mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000]
> > (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375; port
> > s(*):[31000-32000]) on slave 20150527-152023-169978048-5050-876-S0 from
> > framework 20150527-100126-169978048-5050-1851-0001
> > 2015-05-27_15:30:54.00952 I0527 15:30:54.009363   937 master.cpp:1574]
> > Received registration request for framework 'Spark shell' at
> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> > 2015-05-27_15:30:54.00957 I0527 15:30:54.009461   937 master.cpp:1638]
> > Registering framework 20150527-152023-169978048-5050-876-0026 (Spark
> shell)
> > at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:5556
> > 2
> > 2015-05-27_15:30:54.00994 I0527 15:30:54.009703   937
> hierarchical.hpp:321]
> > Added framework 20150527-152023-169978048-5050-876-0026
> > 2015-05-27_15:30:54.00996 I0527 15:30:54.009826   937 master.cpp:3760]
> > Sending 1 offers to framework 20150527-152023-169978048-5050-876-0026
> > (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.
> > 0.1:55562
> > 2015-05-27_15:30:54.01035 I0527 15:30:54.010267   944 master.cpp:878]
> > Framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at
> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
> disconnecte
> > d
> > 2015-05-27_15:30:54.01037 I0527 15:30:54.010308   944 master.cpp:1948]
> > Disconnecting framework 20150527-152023-169978048-5050-876-0026 (Spark
> > shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55
> > 562
> > 2015-05-27_15:30:54.01038 I0527 15:30:54.010326   944 master.cpp:1964]
> > Deactivating framework 20150527-152023-169978048-5050-876-0026 (Spark
> > shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:555
> > 62
> > 2015-05-27_15:30:54.01053 I0527 15:30:54.010447   939
> hierarchical.hpp:400]
> > Deactivated framework 20150527-152023-169978048-5050-876-0026
> > 2015-05-27_15:30:54.01055 I0527 15:30:54.010459   944 master.cpp:900]
> > Giving framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at
> > scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562 0ns
> > to failover
> >
> >
> > Kind regards and thank you very much for your help!!
> >
> >
> >

Re: Not able to connect to mesos from different machine

Posted by Alex Rukletsov <al...@mesosphere.com>.
Alberto,

it looks like the Spark scheduler disconnects right after establishing the
connection. Would you mind sharing the scheduler logs as well? Also, I see
that you haven't specified a failover_timeout; try setting it to something
meaningful (several hours for test purposes).
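
A quick way to confirm that this is a reachability problem is to check
whether the master VM can open a connection back to the address your
scheduler advertises. A rough sketch, assuming 192.168.33.10 is the master
VM (as in your logs) and 192.168.33.1 is the host machine's address on that
private network -- adjust both to your actual setup:

# From the host: confirm the master itself is reachable.
nc -vz 192.168.33.10 5050

# From the master VM: try the endpoint the framework registered from. With
# the driver bound to loopback it advertises 127.0.0.1:55562, which the
# master can never reach; after exporting LIBPROCESS_IP on the host it
# should advertise a routable address such as 192.168.33.1 instead.
nc -vz 192.168.33.1 55562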

And by the way, any reason you're still on Mesos 0.20.1?


Re: Not able to connect to mesos from different machine

Posted by Alberto Rodriguez <ar...@gmail.com>.
Hi Alex,

I do not know what's going on; now I'm unable to access the spark console
again, and it's hanging at the same point as before. See the master logs
below:

2015-05-27_15:30:53.68764 I0527 15:30:53.687494   944 master.cpp:3760] Sending 1 offers to framework 20150527-100126-169978048-5050-1851-0001 (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at scheduler-be29901f-39ab-4bdf-a9ec-691032775860@192.168.33.10:32768
2015-05-27_15:30:53.69032 I0527 15:30:53.690196   942 master.cpp:2273] Processing ACCEPT call for offers: [ 20150527-152023-169978048-5050-876-O241 ] on slave 20150527-152023-169978048-5050-876-S0 at slave(1)@192.168.33.11:5051 (mesos-slave1) for framework 20150527-100126-169978048-5050-1851-0001 (chronos-2.3.0_mesos-0.20.1-SNAPSHOT) at scheduler-be29901f-39ab-4bdf-a9ec-691032775860@192.168.33.10:32768
2015-05-27_15:30:53.69038 I0527 15:30:53.690300   942 hierarchical.hpp:648] Recovered mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000] (total allocatable: mem(*):1024; cpus(*):2; disk(*):33375; ports(*):[31000-32000]) on slave 20150527-152023-169978048-5050-876-S0 from framework 20150527-100126-169978048-5050-1851-0001
2015-05-27_15:30:54.00952 I0527 15:30:54.009363   937 master.cpp:1574] Received registration request for framework 'Spark shell' at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
2015-05-27_15:30:54.00957 I0527 15:30:54.009461   937 master.cpp:1638] Registering framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
2015-05-27_15:30:54.00994 I0527 15:30:54.009703   937 hierarchical.hpp:321] Added framework 20150527-152023-169978048-5050-876-0026
2015-05-27_15:30:54.00996 I0527 15:30:54.009826   937 master.cpp:3760] Sending 1 offers to framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
2015-05-27_15:30:54.01035 I0527 15:30:54.010267   944 master.cpp:878] Framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562 disconnected
2015-05-27_15:30:54.01037 I0527 15:30:54.010308   944 master.cpp:1948] Disconnecting framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
2015-05-27_15:30:54.01038 I0527 15:30:54.010326   944 master.cpp:1964] Deactivating framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562
2015-05-27_15:30:54.01053 I0527 15:30:54.010447   939 hierarchical.hpp:400] Deactivated framework 20150527-152023-169978048-5050-876-0026
2015-05-27_15:30:54.01055 I0527 15:30:54.010459   944 master.cpp:900] Giving framework 20150527-152023-169978048-5050-876-0026 (Spark shell) at scheduler-15df0294-c03c-4645-9079-a48128c68422@127.0.0.1:55562 0ns to failover


Kind regards and thank you very much for your help!!




Re: Not able to connect to mesos from different machine

Posted by Alex Rukletsov <al...@mesosphere.com>.
Alberto,

would you mind providing slave and master logs (or appropriate parts of
them)? Have you specified the --work_dir flag for your Mesos Workers?
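
For reference, this is roughly how an explicit work directory is usually
passed to the slave (the master address and paths below are only examples,
not your actual values):

# Example slave invocation; adjust the ZooKeeper/master address and paths.
sudo mesos-slave \
  --master=zk://192.168.33.10:2181/mesos \
  --work_dir=/var/lib/mesos \
  --log_dir=/var/log/mesos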


Re: Not able to connect to mesos from different machine

Posted by Alberto Rodriguez <ar...@gmail.com>.
Hi Alex,

Thank you for replying. I managed to fix the first problem, but now when I
launch a Spark job through my console Mesos loses all the tasks. I can see
them all in my Mesos slave, but their status is LOST. The stderr & stdout
files of the tasks are both empty.
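
In case it is useful, this is roughly how I have been inspecting the slave
so far (the sandbox path assumes the default work_dir of /tmp/mesos and the
log path assumes the slave logs to /var/log/mesos, which may differ on other
setups):

# Per-task sandboxes; each run has its own stdout/stderr.
ls /tmp/mesos/slaves/*/frameworks/*/executors/*/runs/latest/
cat /tmp/mesos/slaves/*/frameworks/*/executors/*/runs/latest/stderr

# The slave log usually says why an executor exited or a task went LOST.
less /var/log/mesos/mesos-slave.INFO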

Any ideas?


Re: Not able to connect to mesos from different machine

Posted by Alex Rukletsov <al...@mesosphere.com>.
Alberto,

What may be happening in your case is that the Master is not able to talk to
your scheduler. When responding to a scheduler, the Mesos Master doesn't use
the IP the request came from, but rather the IP set in the "Libprocess-from"
field. That's exactly what you specify in the LIBPROCESS_IP env var prior to
starting your scheduler. Could you please double check that it is set up
correctly and that the IP is reachable from the Mesos Master?
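
For example, on the host machine something along these lines should do it
(the address is only an illustration -- use whichever IP of the host the
master VM can actually route to, e.g. the host side of the VM's private
network):

# Run on the host before starting the driver; 10.141.141.1 is an assumed
# host-side address on the VM's 10.141.141.x network -- substitute your own.
export LIBPROCESS_IP=10.141.141.1
export SPARK_LOCAL_IP=10.141.141.1   # keeps the Spark driver off localhost too
./bin/spark-shell --master mesos://zk://10.141.141.10:2181/mesos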

In case you are not able to solve the problem, please provide scheduler and
Master logs together with master, zookeeper, and scheduler configurations.


On Mon, May 25, 2015 at 6:30 PM, Alberto Rodriguez <ar...@gmail.com>
wrote:

> Hi all,
>
> I managed to get a mesos cluster up & running on a Ubuntu VM. I've
> been also able to run and connect a spark-shell from this machine and
> it works properly.
>
> Unfortunately, I'm trying to connect from the host machine where the
> VM is running to launch spark jobs and I can not.
>
> See below the spark console output:
>
> Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
> 1.7.0_75)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 15/05/25 18:13:00 INFO SecurityManager: Changing view acls to: arodriguez
> 15/05/25 18:13:00 INFO SecurityManager: Changing modify acls to: arodriguez
> 15/05/25 18:13:00 INFO SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view
> permissions: Set(arodriguez); users with modify permissions:
> Set(arodriguez)
> 15/05/25 18:13:01 INFO Slf4jLogger: Slf4jLogger started
> 15/05/25 18:13:01 INFO Remoting: Starting remoting
> 15/05/25 18:13:01 INFO Remoting: Remoting started; listening on
> addresses :[akka.tcp://sparkDriver@localhost.localdomain:47229]
> 15/05/25 18:13:01 INFO Utils: Successfully started service
> 'sparkDriver' on port 47229.
> 15/05/25 18:13:01 INFO SparkEnv: Registering MapOutputTracker
> 15/05/25 18:13:01 INFO SparkEnv: Registering BlockManagerMaster
> 15/05/25 18:13:01 INFO DiskBlockManager: Created local directory at
> /tmp/spark-local-20150525181301-7fa8
> 15/05/25 18:13:01 INFO MemoryStore: MemoryStore started with capacity
> 265.4 MB
> 15/05/25 18:13:01 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where
> applicable
> 15/05/25 18:13:01 INFO HttpFileServer: HTTP File server directory is
> /tmp/spark-1249c23f-adc8-4fcd-a044-b65a80f40e16
> 15/05/25 18:13:01 INFO HttpServer: Starting HTTP Server
> 15/05/25 18:13:01 INFO Utils: Successfully started service 'HTTP file
> server' on port 51659.
> 15/05/25 18:13:01 INFO Utils: Successfully started service 'SparkUI'
> on port 4040.
> 15/05/25 18:13:01 INFO SparkUI: Started SparkUI at
> http://localhost.localdomain:4040
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> W0525 18:13:01.749449 10908 sched.cpp:1323]
> **************************************************
> Scheduler driver bound to loopback interface! Cannot communicate with
> remote master(s). You might want to set 'LIBPROCESS_IP' environment
> variable to use a routable IP address.
> **************************************************
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@712:
> Client environment:zookeeper.version=zookeeper C client 3.4.6
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@716:
> Client environment:host.name=localhost.localdomain
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@723:
> Client environment:os.name=Linux
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@724:
> Client environment:os.arch=3.19.7-200.fc21.x86_64
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@725:
> Client environment:os.version=#1 SMP Thu May 7 22:00:21 UTC 2015
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@733:
> Client environment:user.name=arodriguez
> I0525 18:13:01.749791 10908 sched.cpp:157] Version: 0.22.1
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@741:
> Client environment:user.home=/home/arodriguez
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@log_env@753:
> Client
> environment:user.dir=/home/arodriguez/dev/spark-1.2.0-bin-hadoop2.4/bin
> 2015-05-25 18:13:01,749:10746(0x7fd4b1ffb700):ZOO_INFO@zookeeper_init@786:
> Initiating client connection, host=10.141.141.10:2181
> sessionTimeout=10000 watcher=0x7fd4c2f0d5b0 sessionId=0
> sessionPasswd=<null> context=0x7fd3d40063c0 flags=0
> 2015-05-25 18:13:01,750:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1705:
> initiated connection to server [10.141.141.10:2181]
> 2015-05-25 18:13:01,752:10746(0x7fd4ab7fe700):ZOO_INFO@check_events@1752:
> session establishment complete on server [10.141.141.10:2181],
> sessionId=0x14d8babef360022, negotiated timeout=10000
> I0525 18:13:01.752760 10913 group.cpp:313] Group process
> (group(1)@127.0.0.1:48557) connected to ZooKeeper
> I0525 18:13:01.752787 10913 group.cpp:790] Syncing group operations:
> queue size (joins, cancels, datas) = (0, 0, 0)
> I0525 18:13:01.752807 10913 group.cpp:385] Trying to create path
> '/mesos' in ZooKeeper
> I0525 18:13:01.754317 10909 detector.cpp:138] Detected a new leader:
> (id='16')
> I0525 18:13:01.754408 10913 group.cpp:659] Trying to get
> '/mesos/info_0000000016' in ZooKeeper
> I0525 18:13:01.755056 10913 detector.cpp:452] A new leading master
> (UPID=master@127.0.1.1:5050) is detected
> I0525 18:13:01.755113 10911 sched.cpp:254] New master detected at
> master@127.0.1.1:5050
> I0525 18:13:01.755345 10911 sched.cpp:264] No credentials provided.
> Attempting to register without authentication
>
>
> It hangs up in the last line.
>
> I've tried to set the LIBPROCESS_IP env variable with no luck.
>
> Any advice?
>
> Thank you in advance.
>
> Kind regards,
>
> Alberto
>