Posted to user@mesos.apache.org by Florian Pfeiffer <fp...@x8s.de> on 2016/06/30 15:57:23 UTC

Mesos on hybrid AWS&DC - Best practices?

Hi,

For the last 2 years I managed a Mesos cluster on bare metal on-premise. Now,
at my new company, the situation is a little bit different, and I'm
wondering if there are any best practices:
The company is in the middle of a transition from on-premise to AWS. The
old stuff is still running in the DC, the newer microservices are running
within Auto Scaling groups on AWS, and other AWS services like DynamoDB,
Kinesis and Lambda are also on the rise.

So in my naive view of the world (where no problems occur... never!) I'm
thinking that it would be great to span a hybrid Mesos cluster over AWS&DC
to leverage the still-available resources in the DC, which is becoming more
and more underutilized over time.

Now my naive world view is slowly crumbling, and I realize that I'm missing
experience with AWS. Questions that are already popping up (besides all
the questions I don't yet know I will have...) are:
* Is a Virtual Private Gateway to my VPC enough, or do I need to aim for
Direct Connect?
* Put everything into one account, or use a multi-account strategy? (Mainly
to prevent things from running amok and dragging other stuff down by hitting
an account-wide shared limit.)
* Will e.g. DynamoDB be "fast" enough if it's accessed from the datacenter?

I'd appreciate any feedback or lessons learned on this topic :)

Thanks,
Florian

Re: Mesos on hybrid AWS&DC - Best practices?

Posted by "olaf@magnetic.io" <ol...@magnetic.io>.
I agree about separate clusters and tooling on top. This is exactly what several of our customers are using Vamp (vamp.io) for: gradual and controlled (canary) moves from legacy/current environments and applications (often in their own DCs) to modern container-based environments (often on public clouds like AWS). Vamp's gateways can manage the canary routing based on HAProxy, and our integration with DC/OS can handle container deployments, (auto)scaling and routing/load balancing on the modern DC/OS cluster.
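
To make the canary idea concrete, here is a minimal sketch of percentage-based routing between the old and the new environment. It is not Vamp's API or an HAProxy configuration; the function name, the backend labels "legacy-dc" and "aws-dcos", and the hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


def pick_backend(request_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of ids to the new environment.

    Hashing the request (or user) id keeps the decision sticky: the same
    id always lands on the same side for a given percentage, and an id
    that is already on the canary stays there as the percentage grows.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    # Hypothetical backend labels, not names from any real setup.
    return "aws-dcos" if bucket < canary_percent else "legacy-dc"
```

Raising canary_percent step by step (for example 5, 25, 50, 100) shifts traffic gradually, and setting it back to 0 returns everything to the legacy side.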

Cheers, Olaf

Olaf Molenveld
co-founder / CEO
-------------------------------------
VAMP: the Canary test and release platform for containers by magnetic.io
E: olaf@magnetic.io
T: +31653362783
Skype: olafmol
www.vamp.io <http://www.vamp.io/>
www.magnetic.io <http://www.magnetic.io/>







Re: Mesos on hybrid AWS&DC - Best practices?

Posted by Sharma Podila <sp...@netflix.com>.
I would second the suggestion of separate Mesos clusters for DC and AWS,
with a layer on top for picking one or either based on the job SLAs and
resource requirements.
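
As a rough illustration of such a layer, the sketch below picks a cluster for a job from its resource needs and a single SLA-like field. The Cluster and Job types, their fields (including the availability figure) and the tie-breaking rule are hypothetical; this is not how any particular framework does it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    free_cpus: float
    free_mem_mb: int
    availability: float  # hypothetical availability figure, e.g. 0.999


@dataclass
class Job:
    name: str
    cpus: float
    mem_mb: int
    min_availability: float = 0.0  # hypothetical stand-in for a job SLA


def pick_cluster(job: Job, clusters: List[Cluster]) -> Optional[Cluster]:
    """Return the eligible cluster with the most free CPUs, or None."""
    eligible = [
        c for c in clusters
        if c.free_cpus >= job.cpus
        and c.free_mem_mb >= job.mem_mb
        and c.availability >= job.min_availability
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.free_cpus)
```

Each cluster keeps its own Mesos masters and agents; only this thin decision layer knows about both.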
The local storage on cloud instances is more ephemeral than I'd expect the
DC instances to be. So, persistent storage of job metadata needs
consideration. Using something like DynamoDB may work, however, depending
on the scale of your operations, you may have to plan for EC2 rate limiting
its API calls and/or paying for higher IOPS for data storage/access.
Treating the cloud instances as immutable infrastructure has additional
benefits. For example, we deploy new Mesos master ASG for version upgrades,
let them join the quorum, and then "tear down" the old master ASG. Same for
agents. Although, for agent migration our framework does coordinate
migration of jobs from old agent ASG to new one with some SLAs on not too
many instances of a service being down at a time. Sort of what the
maintenance primitives from Mesos aim to address.
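
The "not too many instances down at a time" constraint can be sketched as simple batching. This is only an illustration under assumed names (migration_batches, max_down), not the framework described above and not the Mesos maintenance API.

```python
from typing import List


def migration_batches(instances: List[str], max_down: int) -> List[List[str]]:
    """Split a service's instances into batches of at most max_down.

    Migrating one batch at a time, and waiting for it to become healthy
    on the new agent ASG before starting the next batch, keeps the number
    of instances that are down simultaneously at or below max_down.
    """
    if max_down < 1:
        raise ValueError("max_down must be at least 1")
    return [
        instances[start:start + max_down]
        for start in range(0, len(instances), max_down)
    ]


# Five instances, at most two down at once:
print(migration_batches(["a", "b", "c", "d", "e"], 2))
# prints [['a', 'b'], ['c', 'd'], ['e']]
```

The health check between batches is deliberately left out; how "healthy" is decided depends on the service.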



Re: Mesos on hybrid AWS&DC - Best practices?

Posted by Ken Sipe <ke...@gmail.com>.
I would suggest a cluster on AWS and a cluster on-prem, then tooling on top to manage between the two.
It is unlikely that a failure of a task on-prem should have a scheduled replacement on AWS or vice versa. It is likely that you will end up creating constraints to statically partition the clusters anyway, IMO.
Two clusters eliminate most of your proposed questions.

ken



Re: Mesos on hybrid AWS&DC - Best practices?

Posted by Chris Baker <ch...@galacticfog.com>.
I would also be concerned regarding the latency involved in having a Mesos
cluster span across the DC and the cloud provider. There have been some
discussions previously about tolerable latency for master/master and
master/slave; you might search the archives for this.
