Posted to user@spark.apache.org by "igor.berman" <ig...@gmail.com> on 2018/01/20 16:33:55 UTC

external shuffle service in mesos

Hi,
I'd like some advice on managing the external shuffle service in Mesos
environments.

The Spark documentation mentions Marathon, but gives very little detail.
From what I've found, it doesn't seem too difficult to configure under
Marathon (e.g.
https://github.com/NBCUAS/dcos-spark-shuffle-service/blob/master/marathon/mesos-shuffle-service.json),
however I see a few problems:

There is no clear way to deploy an application on every node in Mesos,
see https://jira.mesosphere.com/browse/MARATHON-3791
* it's not possible to guarantee on which nodes the shuffle service
application will be placed (the UNIQUE constraint can only guarantee that
at most 1 shuffle service instance is placed on any given node)
* in a cluster with nodes dynamically joining/leaving, the shuffle service
config must be adjusted (specifically the number of instances)

So any production ops advice would be welcome.
Igor



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: external shuffle service in mesos

Posted by "igor.berman" <ig...@gmail.com>.

Hi Susan,
Yes, I agree with you regarding resource accounting. IMHO, in this case the
shuffle service must run on the node no matter what resources are available
(just as we don't account for the resources that the "system" takes: the
Mesos agent, the OS itself, and any other process running on the same
machine).

One additional argument against managing it with puppet/chef is that this
management becomes a "leaky abstraction": usually we submit Spark frameworks
through Mesos and give them any Spark distribution URI, whereas to get the
shuffle service running as a daemon on every node I need to install a
specific version of the Spark distribution on that node. Then, when
upgrading the Spark version, it's not enough to give a new URI to Mesos; I
need to create a new shuffle service that uses the new Spark distro (and
then port/dir/other conflicts have to be resolved).
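
To illustrate the port part of that conflict: a minimal Python sketch,
assuming two shuffle services (old and new Spark version) run side by side
on each node during an upgrade, each started with its own
spark.shuffle.service.port. The version-to-port mapping below is purely
hypothetical; only the Spark configuration keys are real.

```python
def shuffle_conf_args(port):
    """Return spark-submit --conf arguments that point a job at the
    external shuffle service listening on the given port."""
    conf = {
        "spark.shuffle.service.enabled": "true",
        "spark.shuffle.service.port": str(port),
        "spark.dynamicAllocation.enabled": "true",
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical mapping: which port the shuffle service of each installed
# Spark version listens on (7337 is Spark's default port).
PORT_BY_SPARK_VERSION = {"2.2.1": 7337, "2.3.0": 7338}

print(" ".join(shuffle_conf_args(PORT_BY_SPARK_VERSION["2.3.0"])))
```

Each job then has to be submitted with the port matching the Spark
distribution it uses, which is exactly the extra bookkeeping described above.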




Re: external shuffle service in mesos

Posted by "Susan X. Huynh" <xh...@mesosphere.io>.

Hi Igor,

You made a good point about the tradeoffs. I think the main thing you would
get with Marathon is the accounting for resources (the memory and cpus
specified in the config file). That allows Mesos to manage the resources
properly. I don't think the other tools mentioned would reserve resources
from Mesos.

If you want more information about production ops for Mesos, you might want
to ask in the Mesos mailing list. Or, you can check out the
https://dcos.io/community/ project.

Susan

On Sat, Jan 20, 2018 at 11:59 PM, igor.berman <ig...@gmail.com> wrote:

> Hi Susan
>
> In general I can get what I need without Marathon, with configuring
> external-shuffle-service with puppet/ansible/chef + maybe some alerts for
> checks.
>
> I mean in companies that don't have strong Devops teams and want to install
> services as simple as possible just by config - Marathon might be useful,
> however if company already has strong puppet/ansible/chef whatever infra,
> the Marathon addition(additional component) and management is less clear
>
> WDYT?

-- 
Susan X. Huynh
Software engineer, Data Agility
xhuynh@mesosphere.com

Re: external shuffle service in mesos

Posted by "igor.berman" <ig...@gmail.com>.

Hi Susan

In general I can get what I need without Marathon, by configuring the
external shuffle service with puppet/ansible/chef, plus maybe some alerts
for health checks.
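
As a rough illustration of the kind of check meant here, a minimal Python
sketch that only tests whether something accepts TCP connections on the
shuffle service port. The function name is made up for this example, and
7337 is assumed because it is Spark's default spark.shuffle.service.port;
a real alert would plug into whatever monitoring is already in place.

```python
import socket


def shuffle_service_is_up(host, port=7337, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only proves that a listener exists on the port, not that the
    shuffle service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

A cron job or monitoring agent on each node could call
shuffle_service_is_up("localhost") and raise an alert when it returns False.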

I mean, for companies that don't have strong DevOps teams and want to
install services as simply as possible, just by config, Marathon might be
useful. However, if a company already has strong puppet/ansible/chef (or
whatever) infra, the benefit of adding and managing Marathon as an
additional component is less clear.

WDYT?




Re: external shuffle service in mesos

Posted by "Susan X. Huynh" <xh...@mesosphere.io>.

Hi Igor,

The best way I know of is with Marathon.
* Placement constraint: you could combine constraints in Marathon. Like:
"constraints": [
        ["hostname", "UNIQUE"],
        ["hostname", "LIKE", "host1|host2|host3"]
]
https://groups.google.com/forum/#!topic/marathon-framework/hfLUw3TIw2I

* You would have to use a workaround to deal with a dynamically sized
cluster: set the number of instances to be greater than the expected
cluster size.
https://jira.mesosphere.com/browse/MARATHON-3791?focusedCommentId=79976&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-79976
As the commenter notes, it's not ideal, it's just a workaround.
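
To make the two points above concrete, here is a minimal Python sketch that
assembles a Marathon app definition combining both constraints with an
over-provisioned instance count. The app id, the install path in "cmd", the
cpus/mem figures, and the headroom of 5 are all assumptions for
illustration, not values from a real deployment.

```python
import json


def shuffle_service_app(hosts, expected_cluster_size, headroom=5):
    """Build a Marathon app definition (as a dict) for the shuffle service.

    Every concrete value here is an illustrative assumption.
    """
    return {
        "id": "/spark-shuffle-service",  # hypothetical app id
        # Hypothetical install path; the class ships with Spark's Mesos support.
        "cmd": "/opt/spark/bin/spark-class "
               "org.apache.spark.deploy.mesos.MesosExternalShuffleService",
        "cpus": 1.0,   # resources that Mesos will account for
        "mem": 1024,
        # Workaround: request more instances than the expected cluster size.
        "instances": expected_cluster_size + headroom,
        "constraints": [
            ["hostname", "UNIQUE"],                 # at most one per node
            ["hostname", "LIKE", "|".join(hosts)],  # only on the listed nodes
        ],
    }


app = shuffle_service_app(["host1", "host2", "host3"], expected_cluster_size=3)
print(json.dumps(app, indent=2))
```

The printed JSON could be saved and submitted to Marathon as an app
definition. With this workaround, the instances that cannot be placed would
be expected to stay pending, which is part of why it's not ideal.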

Susan

On Sat, Jan 20, 2018 at 8:33 AM, igor.berman <ig...@gmail.com> wrote:

> Hi,
> wanted to get some advice regarding managing external shuffle service in
> mesos environments
>
> In spark documentation the Marathon is mentioned, however there is very
> limited documentation.
> I've tried to search for some documentation and it's seems not too difficult
> to configure it under Marathon(e.g.
> https://github.com/NBCUAS/dcos-spark-shuffle-service/blob/master/marathon/mesos-shuffle-service.json),
> however I see few problems:
>
> There is no clear way to deploy some application in mesos on every node
> see https://jira.mesosphere.com/browse/MARATHON-3791
> * it's not possible to guarantee on which nodes shuffle service application
> will be placed(it's possible to guarantee with mesos unique constrain that
> only 1 shuffle service instance will be placed on some node)
> * cluster that has dynamic nodes joining/leaving - the config of shuffle
> service must be adjusted(specifically number of instances config)
>
> So any production ops advices will be welcome
> Igor

