Posted to users@kafka.apache.org by James Cheng <jc...@tivo.com> on 2015/02/27 07:36:11 UTC

If you run Kafka in AWS or Docker, how do you persist data?

Hi,

I know that Netflix might be talking about "Kafka on AWS" at the March meetup, but I wanted to bring up the topic anyway.

I'm sure that some people are running Kafka in AWS. Is anyone running Kafka within docker in production? How does that work?

For both of these, how do you persist data? If on AWS, do you use EBS? Do you use ephemeral storage and then rely on replication? And if using docker, do you persist data outside the docker container and on the host machine?

And related, how do you deal with broker failure? Do you simply replace it, and repopulate a new broker via replication? Or do you bring back up the broker with the persisted files?

Trying to learn about what people are doing, beyond "on premises and dedicated hardware".

Thanks,
-James


Re: If you run Kafka in AWS or Docker, how do you persist data?

Posted by Colin <co...@clark.ws>.
Hello,

We use Docker for Kafka on VMs with both NAS and local disk.  We mount the volumes externally.  We haven't had many problems at all, and a restart has cleared any issue.  We are on 0.8.1.

We have also started to deploy to AWS.

--
Colin 
+1 612 859 6129
Skype colin.p.clark


Re: If you run Kafka in AWS or Docker, how do you persist data?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Fri, Feb 27, 2015 at 1:36 AM, James Cheng <jc...@tivo.com> wrote:

> Hi,
>
> I know that Netflix might be talking about "Kafka on AWS" at the March
> meetup, but I wanted to bring up the topic anyway.
>
> I'm sure that some people are running Kafka in AWS.


I'd say most, not some :)


> Is anyone running Kafka within docker in production? How does that work?
>

Not us.  When I was at DevOps Days in NYC last year, everyone was talking
about Docker, but only about 2.5 people in the room actually really used it.

For both of these, how do you persist data? If on AWS, do you use EBS? Do
> you use ephemeral storage and then rely on replication? And if using
> docker, do you persist data outside the docker container and on the host
> machine?
>

We've used both EBS and local disks in AWS.  We don't have Kafka
replication, as far as I know.

And related, how do you deal with broker failure? Do you simply replace it,
> and repopulate a new broker via replication? Or do you bring back up the
> broker with the persisted files?
>

We monitor all Kafka pieces - producers, consumers, and brokers - with SPM.
We have alerts and anomaly detection enabled for various Kafka metrics
(yeah, consumer lag being one of them).
Broker failures have been very rare (we've used 0.7.2, 0.8.1.x, and are now
on 0.8.2).  When they happened, a restart was typically enough. I can recall
one instance where segment recovery took a long time (minutes, maybe more
than an hour), but this was >6 months ago.


> Trying to learn about what people are doing, beyond "on premises and
> dedicated hardware".
>

In my world almost everyone I talk to is in AWS.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

Re: If you run Kafka in AWS or Docker, how do you persist data?

Posted by Joseph Lawson <jl...@roomkey.com>.
Side question: why run Kafka on Docker for AWS? Is the Docker config being used for configuration management? Are there more systems running on the instance other than Kafka?


Sent by Outlook<http://taps.io/outlookmobile> for Android




Re: If you run Kafka in AWS or Docker, how do you persist data?

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
On Fri, Feb 27, 2015 at 8:09 PM, Jeff Schroeder <je...@computer.org>
wrote:

> Kafka on dedicated hosts running in docker under marathon under Mesos. It
> was a real bear to get working, but is really beautiful once I did manage
> to get it working. I simply run with a unique hostname constraint and
> number of instances = replication factor. If a broker dies and it isn't a
> hardware or network issue, marathon restarts it.
>
> The hardest part was that Kafka was registering to ZK with the internal (to
> docker) port. My workaround was that you have to use the same port inside
> and outside docker or it will register to ZK with whatever the port is
> inside the container.
>

You should be able to use advertised.host.name and advertised.port to
control this, so you aren't required to use the same port inside and
outside Docker.
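
For reference, a minimal sketch of the relevant 0.8.x broker settings (the hostname and port values here are placeholders; adjust to your actual port mapping):

```properties
# server.properties (inside the container)
port=9092                              # port the broker binds to inside the container
advertised.host.name=kafka1.example.com  # address registered in ZK for clients to use
advertised.port=19092                    # host port that is mapped to the container's 9092
```

With these set, the broker registers the advertised values in ZooKeeper regardless of what it binds to internally.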


>
> FYI this is an on premise dedicated Mesos cluster running on bare metal :)
>
> On Friday, February 27, 2015, James Cheng <jc...@tivo.com> wrote:
>
> > Hi,
> >
> > I know that Netflix might be talking about "Kafka on AWS" at the March
> > meetup, but I wanted to bring up the topic anyway.
> >
> > I'm sure that some people are running Kafka in AWS. Is anyone running
> > Kafka within docker in production? How does that work?
> >
> > For both of these, how do you persist data? If on AWS, do you use EBS? Do
> > you use ephemeral storage and then rely on replication? And if using
> > docker, do you persist data outside the docker container and on the host
> > machine?
>

On AWS, your choice will depend on a tradeoff of tolerance for data loss,
performance, and price sensitivity. You might be able to get better/more
predictable performance out of the ephemeral instance storage, but since
you are presumably running all instances in the same AZ you leave yourself
open to significant data loss if there's a coordinated outage. It's pretty
rare, but it does happen. With EBS you may have to do more work or spread
across more volumes to get the same throughput. Relevant quote from the
docs on provisioned IOPS: "Additionally, you can stripe multiple volumes
together to achieve up to 48,000 IOPS or 800MBps when attached to larger
EC2 instances". (Note MBps not Mbps.) Other considerations: AWS has been
moving most of its instance storage to SSDs, so getting enough instance
storage space can be relatively pricey, and you can also potentially go
with a hybrid setup to get a balance of the two, but you'll need to be very
careful about partition assignment then to ensure at least one copy of
every partition ends up on an EBS-backed node.
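
To make the striping arithmetic concrete, here is a quick sketch. The 4,000 IOPS per-volume cap is an assumption (roughly the provisioned-IOPS limit at the time); check the current EBS limits for your volume type:

```python
import math

def volumes_needed(target_iops, per_volume_iops=4000):
    """Number of EBS volumes to stripe (RAID 0) to reach target_iops.

    per_volume_iops is an assumed per-volume provisioned IOPS cap;
    check current EBS limits for your volume type before relying on it.
    """
    return math.ceil(target_iops / per_volume_iops)

print(volumes_needed(48000))  # 12 volumes at an assumed 4,000 IOPS each
```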

For Docker, you probably want the data to be stored on a volume. If
possible, it would be better if non-hardware errors could be resolved just
by restarting the broker. You'll avoid a lot of needless copying of data.
Storing data in a volume would let you simply restart a new container and
have it pick up where the last one left off. The example of Postgres given
for a volume container in https://docs.docker.com/userguide/dockervolumes/
isn't too far from Kafka if you were to assume Postgres was replicating to
a slave -- you'd prefer to reuse the existing data on the existing node
(which a volume container enables), but could still handle bringing up a
new node if necessary.
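
As a rough sketch of the volume approach (the image name and paths are placeholders; point the volume at whatever directory your image's log.dirs is set to):

```shell
# Keep the Kafka data directory on the host, so a replacement
# container picks up the existing segments instead of re-replicating
docker run -d --name kafka \
  -v /srv/kafka-data:/var/kafka-logs \
  your-org/kafka:0.8.2
```

If the container dies for a non-hardware reason, a new `docker run` against the same host directory resumes with the old broker's data intact.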



> >
> > And related, how do you deal with broker failure? Do you simply replace
> > it, and repopulate a new broker via replication? Or do you bring back up
> > the broker with the persisted files?
> >
> > Trying to learn about what people are doing, beyond "on premises and
> > dedicated hardware".
> >
> > Thanks,
> > -James
> >
> >
>
> --
> Text by Jeff, typos by iPhone
>



-- 
Thanks,
Ewen

Re: If you run Kafka in AWS or Docker, how do you persist data?

Posted by Jeff Schroeder <je...@computer.org>.
Kafka on dedicated hosts running in Docker under Marathon under Mesos. It
was a real bear to get working, but is really beautiful once I did manage
to get it working. I simply run with a unique hostname constraint and
number of instances = replication factor. If a broker dies and it isn't a
hardware or network issue, Marathon restarts it.

The hardest part was that Kafka was registering to ZK with the internal (to
Docker) port. My workaround was that you have to use the same port inside
and outside Docker or it will register to ZK with whatever the port is
inside the container.
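
A minimal sketch of what such a Marathon app definition might look like (image name and resource numbers are placeholders, not from the original post):

```json
{
  "id": "kafka",
  "instances": 3,
  "cpus": 2,
  "mem": 4096,
  "constraints": [["hostname", "UNIQUE"]],
  "container": {
    "type": "DOCKER",
    "docker": { "image": "your-org/kafka:0.8.1", "network": "HOST" }
  }
}
```

The `["hostname", "UNIQUE"]` constraint is what keeps Marathon from scheduling two brokers on the same host, so instances = replication factor gives one replica per machine.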

FYI this is an on-premises dedicated Mesos cluster running on bare metal :)

On Friday, February 27, 2015, James Cheng <jc...@tivo.com> wrote:

> Hi,
>
> I know that Netflix might be talking about "Kafka on AWS" at the March
> meetup, but I wanted to bring up the topic anyway.
>
> I'm sure that some people are running Kafka in AWS. Is anyone running
> Kafka within docker in production? How does that work?
>
> For both of these, how do you persist data? If on AWS, do you use EBS? Do
> you use ephemeral storage and then rely on replication? And if using
> docker, do you persist data outside the docker container and on the host
> machine?
>
> And related, how do you deal with broker failure? Do you simply replace
> it, and repopulate a new broker via replication? Or do you bring back up
> the broker with the persisted files?
>
> Trying to learn about what people are doing, beyond "on premises and
> dedicated hardware".
>
> Thanks,
> -James
>
>

-- 
Text by Jeff, typos by iPhone