Posted to dev@samza.apache.org by Geoffry Sumter <vi...@gmail.com> on 2015/02/09 22:24:23 UTC

Your experience running Samza in AWS?

Hello,

I'm looking to experiment with Samza more but wanted to get feedback on
using it in AWS, particularly in production. Are you using YARN? Mesos?
Something custom? Have you documented tradeoffs you've made, reliability
concerns, or pitfalls you've discovered? Is there anything you wish you had
known first? I'd love to benefit from past experience if you have time! :)

I see "This means that YARN can be replaced with other virtualization
frameworks — in particular, we are interested in adding direct AWS
integration. Many companies run in AWS which is itself a virtualization
framework" from
http://samza.apache.org/learn/documentation/0.8/comparisons/introduction.html
Is there work currently being done on this effort?

Thanks for the help,
Geoffry

Re: Your experience running Samza in AWS?

Posted by Chris Riccomini <cr...@apache.org>.
Hey Geoffry,

You will probably be quite interested in SAMZA-516. :)

Jae has been messing with Samza on AWS, and can probably comment about his
experience.

The plan is that we are going to implement Samza as a standalone service,
which you can run inside of AWS (or any other virtualization tech). You
won't need to run YARN/Mesos, or anything. In the meantime, you'll have to
run either YARN or Mesos on top of AWS.

Cheers,
Chris

Re: Your experience running Samza in AWS?

Posted by Chris Riccomini <cr...@apache.org>.
Hey Gian,

Thanks for this info. I've updated Samza's FAQ with these recommendations.

Cheers,
Chris

On Sat, Feb 14, 2015 at 8:45 AM, Gian Merlino <gi...@metamarkets.com> wrote:

> Hi Geoffry,
>
> We've been using Samza in production on AWS for a little over a month.
> We're just using the YARN runner on a mostly stock Hadoop 2.4.0 cluster
> (not EMR). Our experience is that c3s work well for the YARN instances and
> i2s work well for the Kafka instances. Things have been pretty solid with
> that setup.
>
> For scaling up and scaling down YARN, we just terminate instances or add
> instances, and this works pretty well. It can take a few minutes for the
> cluster to realize a node has gone and respawn containers elsewhere.
>
> We have a separate Kafka cluster just for Samza's use, different from our
> main Kafka cluster. The main reason is that we wanted to isolate the
> disk and network load of state compactions and restores (we don't use
> compacted topics in our main Kafka cluster, but we do use them with Samza,
> and the extra load on Kafka can be substantial).
>
> Gian
>

Re: Your experience running Samza in AWS?

Posted by Gian Merlino <gi...@metamarkets.com>.
Hi Geoffry,

We've been using Samza in production on AWS for a little over a month.
We're just using the YARN runner on a mostly stock Hadoop 2.4.0 cluster
(not EMR). Our experience is that c3s work well for the YARN instances and
i2s work well for the Kafka instances. Things have been pretty solid with
that setup.

For scaling up and scaling down YARN, we just terminate instances or add
instances, and this works pretty well. It can take a few minutes for the
cluster to realize a node has gone and respawn containers elsewhere.

We have a separate Kafka cluster just for Samza's use, different from our
main Kafka cluster. The main reason is that we wanted to isolate the
disk and network load of state compactions and restores (we don't use
compacted topics in our main Kafka cluster, but we do use them with Samza,
and the extra load on Kafka can be substantial).

Gian
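
To make the dedicated-Kafka arrangement Gian describes more concrete, here is
a minimal sketch of how a job's properties might point its inputs at the main
Kafka cluster while sending store changelogs to a separate one. It is not
Gian's actual configuration: the host names, the system names "main" and
"samza", and the job, stream, and store names are all hypothetical. The keys
follow the 0.8-era Samza configuration reference (the producer key in
particular was renamed in later releases), so check them against the version
you run. A real job also needs task.class, yarn.package.path, a store factory,
and serdes, which are left out here.

```python
# Hypothetical cluster endpoints, for illustration only.
MAIN_KAFKA = {
    "zookeeper": "main-zk.example.com:2181",
    "brokers": "main-kafka.example.com:9092",
}
SAMZA_KAFKA = {
    "zookeeper": "samza-zk.example.com:2181",
    "brokers": "samza-kafka.example.com:9092",
}

KAFKA_FACTORY = "org.apache.samza.system.kafka.KafkaSystemFactory"


def kafka_system(name, cluster):
    """Return the properties that define one Kafka-backed Samza system."""
    prefix = f"systems.{name}"
    return {
        f"{prefix}.samza.factory": KAFKA_FACTORY,
        f"{prefix}.consumer.zookeeper.connect": cluster["zookeeper"],
        # 0.8-era key name; an assumption to verify for newer releases.
        f"{prefix}.producer.metadata.broker.list": cluster["brokers"],
    }


def build_job_config(job_name, input_stream, store_name):
    """Build a partial job config: inputs on 'main', changelog on 'samza'."""
    config = {
        "job.factory.class": "org.apache.samza.job.yarn.YarnJobFactory",
        "job.name": job_name,
        # Input is read from the main Kafka cluster.
        "task.inputs": f"main.{input_stream}",
        # The store's changelog (a compacted topic) lives on the dedicated
        # cluster, so compaction and restore load stays off the main one.
        f"stores.{store_name}.changelog": f"samza.{store_name}-changelog",
    }
    config.update(kafka_system("main", MAIN_KAFKA))
    config.update(kafka_system("samza", SAMZA_KAFKA))
    return config


def to_properties(config):
    """Render the config as sorted key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    print(to_properties(build_job_config("example-job", "events", "example-store")))
```

Defining the two clusters as separate Samza systems keeps the split visible in
one place: moving the changelog back to the main cluster would only mean
changing the system prefix on the stores.*.changelog line.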
