Posted to dev@mesos.apache.org by Scott Smith <sc...@gmail.com> on 2012/04/20 19:39:28 UTC

Releasing resources

I'm running Spark git head / Mesos 1205738.  My cluster is small -- a
single slave with 2 CPUs and 1.2GB of available RAM.

I can run SparkPi once, given:
./run spark.examples.SparkPi master@...

but I can't run it twice.  It seems that each invocation of SparkPi
creates a new framework entry in the webui:

201204200627-0-0022 	ubuntu 	SparkPi 	0 	0 	800.0 MB 	0.68 	2012-04-20 17:24:47

Even after waiting a couple of minutes, the memory is still reserved.

I'm not sure what is supposed to release the resources -- the
program has exited, so the framework shouldn't exist anymore.  I
added 'spark.stop()' at the end of the program, but that doesn't
help.  The only way I've found to clean up the slave is to kill and
restart it.  Doing this, however, still leaves stale, empty
framework entries in the master:

201204200627-0-0018 	ubuntu 	SparkPi 	0 	0 	0.0 MB 	0.00 	2012-04-20 17:09:28
201204200627-0-0019 	ubuntu 	SparkPi 	0 	0 	0.0 MB 	0.00 	2012-04-20 17:17:25
201204200627-0-0016 	ubuntu 	SparkPi 	0 	0 	0.0 MB 	0.00 	2012-04-20 16:50:35
201204200627-0-0017 	ubuntu 	SparkPi 	0 	0 	0.0 MB 	0.00 	2012-04-20 16:51:19
.....
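
For reference, the change is roughly this -- a sketch of the
program's shape with the usual SparkPi computation elided, not the
exact example source:

  import spark._

  object SparkPi {
    def main(args: Array[String]) {
      val spark = new SparkContext(args(0), "SparkPi")
      // ... the usual pi estimation over spark.parallelize(...) ...
      spark.stop()  // the call I added at the end
    }
  }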

I'm also not sure whether the correct behavior is instead for
subsequent invocations of SparkPi to reuse the existing framework --
if so, how do I make that happen?

Thanks!
-- 
        Scott

Re: Releasing resources

Posted by Scott Smith <sc...@gmail.com>.
Ah, thanks!  I'll probably set it to a minute or 30 seconds.

BTW, I discovered I'd done something stupid (again) -- I added
spark.stop() to the example and recompiled it with scalac -cp ... -d
mypi.jar mypi.scala, but that didn't actually update the jar.  I
have to delete the jar first to get it to recompile.  Once I did
that, the resources were released.
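
Concretely, the sequence that works (classpath elided as above):

  # delete the stale jar first; scalac didn't update the existing
  # jar in place for me
  rm mypi.jar
  scalac -cp ... -d mypi.jar mypi.scala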

It's a bit rough trying to learn two frameworks and one language
(two if you count Java) at the same time :-)

Which version of Mesos should I use with the latest version of
Spark?  I noticed the head of svn generates mesos-0.9.0.jar, which I
don't think Spark knows how to find.  I can update the Spark run
script if need be, but if that isn't an approved version combination
then I won't bother.

On Fri, Apr 20, 2012 at 12:48 PM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:
> Ah yes, this is due to a feature called "framework failover" in that version of Mesos, which has an overly large timeout by default. Basically, the idea is that if a framework's scheduler disconnects from the master, we give it some time to reconnect before killing its executors and tasks -- but that timeout defaults to 1 day. You can fix it by passing --failover_timeout=1 when running mesos-master. If you're running through the deploy scripts, add failover_timeout=1 to your mesos.conf.
>
> I'll update the Spark wiki to mention this because it's come up a bunch. It will not be an issue in Mesos 0.9.
>
> Matei

-- 
        Scott

Re: Releasing resources

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.
Ah yes, this is due to a feature called "framework failover" in that version of Mesos, which has an overly large timeout by default. Basically, the idea is that if a framework's scheduler disconnects from the master, we give it some time to reconnect before killing its executors and tasks -- but that timeout defaults to 1 day. You can fix it by passing --failover_timeout=1 when running mesos-master. If you're running through the deploy scripts, add failover_timeout=1 to your mesos.conf.
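
For example, when starting the master by hand:

  mesos-master --failover_timeout=1

or, if you're using the deploy scripts, in your mesos.conf:

  failover_timeout=1

(The value is in seconds, so this has the master wait only one second before cleaning up a disconnected framework's executors and tasks.)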

I'll update the Spark wiki to mention this because it's come up a bunch. It will not be an issue in Mesos 0.9.

Matei
