Posted to users@kafka.apache.org by Philip O'Toole <ph...@loggly.com> on 2012/08/30 18:33:00 UTC

Re: kafka internals -- and example client code

Yes -- thanks for this post.

I am new to Kafka, and I'd like clarification on one point. The
classes referenced by this post:

http://people.apache.org/~joestein/kafka-0.7.1-incubating-docs/kafka/consumer/package.html
http://people.apache.org/~joestein/kafka-0.7.1-incubating-docs/kafka/producer/package.html

are the canonical Scala classes for writing Producer and Consumer
clients, correct? I am comparing these docs to the example clients
(particularly the Python and C++ examples). It seems the example
clients simply hard-code values such as "Partition ID", whereas these
docs show the complete way to access such information.

By the way, it seems that if one has to hit Zookeeper every time
before sending a message to Kafka, throughput will take a hit. If one
wants a high-performance system, clients must "use [a] local copy of
the list of brokers and their number of partitions". Is this also
correct?
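
To illustrate the "local copy" idea in the question above, here is a minimal
Java sketch of choosing a broker and partition from an in-memory table, with no
Zookeeper round trip per message. It is not Kafka's client code: the class
LocalPartitionChooser, its Slot type, and the hash-the-key policy are all
assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any Kafka API.
final class LocalPartitionChooser {

    /** One (broker, partition) pair that a message can be sent to. */
    static final class Slot {
        final int brokerId;
        final int partition;

        Slot(int brokerId, int partition) {
            this.brokerId = brokerId;
            this.partition = partition;
        }
    }

    // Flattened local copy of "brokers and their number of partitions".
    private final List<Slot> slots = new ArrayList<>();

    /** @param partitionsPerBroker broker id -> number of partitions on that broker */
    LocalPartitionChooser(Map<Integer, Integer> partitionsPerBroker) {
        // Iterate in broker-id order so every client builds the same slot list.
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(partitionsPerBroker).entrySet()) {
            for (int p = 0; p < e.getValue(); p++) {
                slots.add(new Slot(e.getKey(), p));
            }
        }
    }

    /** Picks a slot for the key using only local state (assumes at least one slot). */
    Slot choose(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return slots.get(Math.floorMod(key.hashCode(), slots.size()));
    }

    int slotCount() {
        return slots.size();
    }
}
```

With two brokers holding 2 and 3 partitions, the chooser has 5 slots, and the
same key always maps to the same slot until the table is rebuilt.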

Thanks,

Philip

--
Philip O'Toole
Senior Developer
Loggly, Inc.
San Francisco, CA

On Wed, Aug 29, 2012 at 6:12 PM, Pankaj Gupta <pa...@brightroll.com> wrote:
> Hey Ming,
>
> Thanks for blogging. The Kafka documentation is really good, but it is always good to see it from another perspective.
>
> Pankaj
> On Aug 29, 2012, at 3:57 PM, Ming Han wrote:
>
>> I wrote a blog post about some of the Kafka internals, if anyone is interested:
>> http://hanworks.blogspot.com/2012/08/down-rabbit-hole-with-kafka.html
>>
>> Thanks,
>> Ming Han
>

Re: kafka internals -- and example client code

Posted by Ming Han <te...@gmail.com>.

Hi Philip,

For most use cases, the (Java) examples for the high-level consumer API,
the low-level consumer API, and the producer API show the correct way to
access Kafka.
If you need to do more funky stuff (such as automatic broker discovery
with the low-level API), then the docs are the correct place to look.
I am not familiar with the Python/C++ API.

Also, based on my reading of the code, what Jay Kreps (the Kafka dev)
wrote is correct.
The information from ZK is cached locally, and ZK watches trigger updates
to that cache.

Thanks,
Ming Han

On Fri, Aug 31, 2012 at 9:35 AM, Philip O'Toole <ph...@loggly.com> wrote:
> Jay - thanks. And my understanding of the Scala docs is correct?
>
> Philip
>
> On Aug 30, 2012, at 2:52 PM, Jay Kreps <ja...@gmail.com> wrote:
>
>> Yeah, we definitely aren't talking to zk on every request. There is a hash
>> map in memory that holds the active brokers, and that is updated when the
>> zk watcher fires, which only happens when the set of brokers changes.
>>
>> -Jay

Re: kafka internals -- and example client code

Posted by Philip O'Toole <ph...@loggly.com>.

Jay - thanks. And my understanding of the Scala docs is correct?

Philip

On Aug 30, 2012, at 2:52 PM, Jay Kreps <ja...@gmail.com> wrote:

> Yeah, we definitely aren't talking to zk on every request. There is a hash
> map in memory that holds the active brokers, and that is updated when the
> zk watcher fires, which only happens when the set of brokers changes.
> 
> -Jay

Re: kafka internals -- and example client code

Posted by Jay Kreps <ja...@gmail.com>.

Yeah, we definitely aren't talking to zk on every request. There is a hash
map in memory that holds the active brokers, and that is updated when the
zk watcher fires, which only happens when the set of brokers changes.

-Jay
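
The caching pattern described above can be sketched roughly as follows. This is
an illustration only, not Kafka's actual code: BrokerCache and its method names
are hypothetical, and the Zookeeper watcher is represented by a plain callback
(onBrokersChanged) that whatever watches Zookeeper is assumed to invoke when the
broker set changes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not part of any Kafka or Zookeeper API.
final class BrokerCache {

    // Broker id -> "host:port". The map is swapped as a whole on every update,
    // so a reader sees either the old snapshot or the new one, never a mix.
    private volatile Map<Integer, String> brokers = Collections.emptyMap();

    /** Assumed to be called by the watcher only when the broker set changes. */
    void onBrokersChanged(Map<Integer, String> latest) {
        // Defensive copy: later changes to the caller's map do not leak in.
        brokers = Collections.unmodifiableMap(new HashMap<>(latest));
    }

    /** Per-request path: a local map lookup, no Zookeeper call. */
    String addressOf(int brokerId) {
        return brokers.get(brokerId);
    }

    int activeBrokerCount() {
        return brokers.size();
    }
}
```

Replacing the whole map on each change keeps the per-request path free of locks,
which is a reasonable trade when broker changes are rare and lookups are frequent.
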