Posted to dev@kafka.apache.org by Jay Kreps <ja...@confluent.io> on 2015/02/25 20:51:28 UTC

Tips for working with Kafka and data streams

Hey guys,

One thing we tried to do along with the product release was start to put
together a practical guide for using Kafka. I wrote this up here:
http://blog.confluent.io/2015/02/25/stream-data-platform-1/

I'd like to keep expanding on this as good practices emerge and we learn
more stuff. So two questions:
1. Anything you think other people should know about working with data
streams? What did you wish you knew when you got started?
2. Anything you don't know about but would like to hear more about?

-Jay

Re: Tips for working with Kafka and data streams

Posted by Tong Li <li...@us.ibm.com>.
+2. Articles like these, coming from the people who created Kafka, always
provide great value to Kafka users and developers. For my 2 cents, I would
love to see one or two articles for developers who are involved in Kafka
development, on the topics of how to develop test cases and how to run
them, what to expect when errors occur, and typical system settings. I
suspect that most of us run it on Linux-based systems, so a few pointers
would probably help a lot. Most importantly, they could cover how to set up
your dev environment so that you are not struggling with the things the
pioneers have already figured out: for example, a recommended IDE and
debugging methods. Of course, these will be the preference of the writer
and no one is obligated to use them, but they can certainly get people
started quicker. As Kafka draws more interest, I suspect more developers
will join, and having something like that can be extremely helpful.

Jay, articles similar to the one linked in your original email can actually
be submitted to developerWorks, and you can get some money out of it if you
like. If you are interested but do not know how to do that, I can certainly
provide some pointers.

Thanks.

Tong Li
OpenStack & Kafka Community Development
Building 501/B205
litong01@us.ibm.com




Re: Tips for working with Kafka and data streams

Posted by Christian Csar <ch...@csar.us>.
Yeah, we do have scenarios where we use customer-specific keys, so our
envelopes end up containing key identification information for accessing
our key repository. I'll certainly follow any changes you propose in this
area with interest, but I'd expect that sort of centralized key management
to be fairly separate from Kafka, even if there's a handy optional layer
that integrates with it.
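
One way to picture an envelope that carries key identification information
is the Python sketch below. Everything in it is an assumption made for
illustration: the `KEY_REPOSITORY` dictionary, the key ids, the JSON field
names, and the `seal`/`open_envelope` helpers are hypothetical, and the
keystream function is only a toy stand-in for a real, vetted cipher.

```python
import hashlib
import hmac
import json
import os

# Hypothetical key repository: key id -> secret key for one customer.
KEY_REPOSITORY = {
    "customer-a/v1": os.urandom(32),
    "customer-b/v1": os.urandom(32),
}


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in for a real cipher: XOR with a SHA-256 counter keystream.

    Illustration only; a real system would use a vetted AEAD cipher and
    separate keys for encryption and authentication.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def seal(key_id: str, payload: bytes) -> bytes:
    """Wrap payload in an envelope that names the key used to encrypt it."""
    key = KEY_REPOSITORY[key_id]
    nonce = os.urandom(16)
    body = _keystream_xor(key, nonce, payload)
    tag = hmac.new(key, nonce + body, hashlib.sha256).hexdigest()
    return json.dumps({
        "key_id": key_id,  # cleartext: tells the reader which key to fetch
        "nonce": nonce.hex(),
        "body": body.hex(),
        "tag": tag,
    }).encode()


def open_envelope(raw: bytes) -> bytes:
    """Look up the key named in the envelope, verify integrity, and decrypt."""
    env = json.loads(raw)
    key = KEY_REPOSITORY[env["key_id"]]
    nonce, body = bytes.fromhex(env["nonce"]), bytes.fromhex(env["body"])
    expected = hmac.new(key, nonce + body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, env["tag"]):
        raise ValueError("envelope failed integrity check")
    return _keystream_xor(key, nonce, body)


message = seal("customer-a/v1", b'{"event": "signup"}')
print(open_envelope(message))
```

The point of the sketch is only that the key id travels in the clear next
to the ciphertext, so the stream itself never needs to know anything about
the keys; the lookup stays in a separate repository.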

Christian


Re: Tips for working with Kafka and data streams

Posted by Julio Castillo <jc...@FinancialEngines.com>.
Although full disk encryption appears to be an easy solution, in our case
that may not be sufficient. For cases where the actual payload needs to be
encrypted, the cost of encryption is paid by the consumers and producers.
Further complicating the matter would be the handling of encryption keys,
etc. I think this is an area where enhancements to Kafka could facilitate
key exchange between consumers and producers, still leaving encryption up
to the clients but easing the key handling.

Julio


NOTICE: This e-mail and any attachments to it may be privileged, confidential or contain trade secret information and is intended only for the use of the individual or entity to which it is addressed. If this e-mail was sent to you in error, please notify me immediately by either reply e-mail or by phone at 408.498.6000, and do not use, disseminate, retain, print or copy the e-mail or any attachment. All messages sent to and from this e-mail address may be monitored as permitted by or necessary under applicable law and regulations.

Re: Tips for working with Kafka and data streams

Posted by Christian Csar <ch...@csar.us>.
The questions we get from customers typically end up being general, so we
break out our answer into network-level and on-disk scenarios.

The on-disk/at-rest scenario may just be to use full disk encryption at the
OS level, so Kafka doesn't need to worry about it. But documenting any
issues around it would be good: for example, what sort of Kafka-specific
performance impacts it has, i.e. budgeting for better processors.

The security story right now is to run on a private network, but I believe
some of our customers like to be told that within-datacenter transmissions
are encrypted on the wire. Based on
https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean
waiting for TLS support, or using a VPN/ssh tunnel for the network
connections.

Since we're in hosted stream land we can't do either of the above, so we
encrypt the messages themselves. For those enterprises that are like our
customers but would run Kafka or use Confluent, having a story like the
above so they don't give up the benefits of your schema management layers
would be good.

Since I didn't mention it before I did find your blog posts handy (though
I'm already moving us towards stream centric land).

Christian


Re: Tips for working with Kafka and data streams

Posted by Jay Kreps <ja...@gmail.com>.
Hey Christian,

That makes sense. I agree that would be a good area to dive into. Are you
primarily interested in network level security or encryption on disk?

-Jay


Re: Tips for working with Kafka and data streams

Posted by Christian Csar <ch...@csar.us>.
I wouldn't say no to some discussion of encryption. We're running on Azure
EventHubs (with preparations for Kinesis for EC2, and Kafka for deployments
in customer datacenters when needed), so we can't just use disk-level
encryption (which would have its own overhead). We're putting all of our
messages inside encrypted envelopes before sending them to the stream,
which limits our opportunities for schema verification of the underlying
messages to the declared type of the message.
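
As an illustration of that limitation, here is a minimal Python sketch. The
`Envelope` class, its field names, and the `ALLOWED_TYPES` registry are
hypothetical names made up for the example, not the envelope format
described above. It only shows that, without the key, a check can look at
the declared type and nothing else.

```python
from dataclasses import dataclass

# Hypothetical registry of message types a stream is expected to carry.
ALLOWED_TYPES = {"order.created", "order.cancelled"}


@dataclass(frozen=True)
class Envelope:
    declared_type: str  # cleartext label, readable without any key
    ciphertext: bytes   # opaque to anything that lacks the key


def check_declared_type(envelope: Envelope) -> bool:
    """Return True if the cleartext type label is one we expect.

    This is the only check available without decrypting: the fields of the
    underlying message are hidden inside `ciphertext`, so they cannot be
    validated against a schema at this point.
    """
    return envelope.declared_type in ALLOWED_TYPES


good = Envelope("order.created", b"\x8f\x02\x11")
bad = Envelope("unknown.event", b"\x00\x01")
print(check_declared_type(good), check_declared_type(bad))
```

Anything stricter, such as checking required fields, would have to run on
the producer before encryption or on the consumer after decryption.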

Encryption at rest mostly works out to a sales point for customers who want
assurances, and in a Kafka focused discussion might be dealt with by
covering disk encryption and how the conversations between Kafka instances
are protected.

Christian


