Posted to dev@spark.apache.org by Yu Wei <yu...@hotmail.com> on 2017/03/30 17:18:43 UTC

[Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Hi guys,

I encountered a problem with Spark on Mesos.

I set up a Mesos cluster and launched a Spark framework on it successfully.

Then the Mesos master was killed and restarted.

However, the Spark framework did not re-register the way a Mesos agent does, and I couldn't find any error logs.

The MesosClusterDispatcher is still running.

I suspect this is a Spark framework issue.

What's your opinion?



Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux

Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Posted by Yu Wei <yu...@hotmail.com>.
Got that.


Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux


Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Posted by Timothy Chen <tn...@gmail.com>.
Hi Yu,

As mentioned earlier, the Spark framework currently will not
re-register because failover_timeout is not set, and there is no
configuration option for it yet.
Failover is only enabled in MesosClusterScheduler, since the
dispatcher is meant to be an HA framework.

We should add that configuration for users who want their Spark
frameworks to fail over in case of master failover, network
disconnects, etc.

Tim
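As a sketch of what such a configuration could look like once added (the property below does not exist in Spark 2.0.2; later Spark releases introduced a spark.mesos.failoverTimeout setting with this meaning, so treat the name and value as illustrative):

```
# spark-defaults.conf -- illustrative sketch, not a setting available in 2.0.2.
# Intended meaning: how long (in seconds) the Mesos master should keep the
# framework entry alive, waiting for the scheduler to re-register after a
# disconnect, instead of tearing it down immediately.
spark.mesos.failoverTimeout  604800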


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Posted by Yu Wei <yu...@hotmail.com>.
Hi Tim,

I tested the scenario again with the settings below:

[dcos@agent spark-2.0.2-bin-hadoop2.7]$ cat conf/spark-defaults.conf
spark.deploy.recoveryMode  ZOOKEEPER
spark.deploy.zookeeper.url 192.168.111.53:2181
spark.deploy.zookeeper.dir /spark
spark.executor.memory 512M
spark.mesos.principal agent-dev-1


However, the case still failed: after the Mesos master restarted, the Spark framework did not re-register.
From the Spark framework log, it seems the following method in MesosClusterScheduler was never called:
override def reregistered(driver: SchedulerDriver, masterInfo: MasterInfo): Unit

Did I miss something? Any advice?



Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux
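One way to check this from the master's side is to inspect the Mesos master's /state HTTP endpoint and see whether the framework appears in its frameworks list after the restart. A minimal sketch of that check, assuming the usual /state JSON shape (the sample payload and framework names below are illustrative, not taken from this cluster):

```python
import json

# Illustrative /state payload from a Mesos master (real output has many more
# fields). In practice you would fetch this from http://<master>:5050/state.
state_json = """
{
  "frameworks": [
    {"id": "a1b2-0001", "name": "marathon", "active": true},
    {"id": "a1b2-0002", "name": "Spark Cluster", "active": true}
  ],
  "completed_frameworks": [
    {"id": "a1b2-0003", "name": "Spark shell", "active": false}
  ]
}
"""

def registered_framework_names(state: dict) -> list:
    """Names of frameworks the master currently considers registered."""
    return [f["name"] for f in state.get("frameworks", []) if f.get("active")]

state = json.loads(state_json)
print(registered_framework_names(state))  # the dispatcher should appear here
```

If the framework shows up only under completed_frameworks after the master restart, the master has torn it down rather than waiting for it to re-register.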



Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

Posted by Timothy Chen <tn...@gmail.com>.
I think failover isn't enabled for the regular Spark job framework, since we assume those jobs are more ephemeral.

It could be a good setting to add to the Spark framework to enable failover.

Tim
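For context on the mechanism: whether the Mesos master keeps a framework entry across a disconnect is governed by the failover_timeout field the scheduler sets in its FrameworkInfo at registration time. A sketch of the relevant fields in protobuf text format (values illustrative):

```
# FrameworkInfo (protobuf text format) -- illustrative values.
name: "Spark Cluster"
user: "spark"
principal: "agent-dev-1"
checkpoint: true
# With the default of 0, the master removes the framework as soon as it
# disconnects; a non-zero value tells the master to wait this many seconds
# for the scheduler to re-register before tearing the framework down.
failover_timeout: 604800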
