Posted to users@kafka.apache.org by Mark <st...@gmail.com> on 2011/12/01 17:28:27 UTC

General questions on functionality and usage

- Does Kafka support pattern matching?

- What are the limitations of one Kafka server in terms of number of 
topics and number of consumers?

- Can you load balance publishing/subscribing across multiple Kafka 
servers to increase redundancy?

- Other than the lack of map/reduce support, how does Kafka differ from, say,
Redis Pub/Sub? (http://redis.io/topics/pubsub)

- Would anyone mind sharing their Kafka setup in terms of both
functionality/usage and architecture... basically more in depth than the
usual "Kafka serves our real-time X"
(https://cwiki.apache.org/confluence/display/KAFKA/Powered+By). Having
concrete use cases on the wiki could help gain adoption, especially
among new users of the pub/sub paradigm, by showing what pub/sub
real-time messaging can accomplish.

- Any good papers on what problems pub/sub in general can solve?

Thanks

Re: General questions on functionality and usage

Posted by Jun Rao <ju...@gmail.com>.

Mark,

A topic can have multiple partitions spread over multiple brokers. Those
partitions are evenly assigned to consumers within a group for parallel
consumption.
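
As an illustration only, the even assignment described above can be sketched in Python. This is not Kafka's actual rebalancing code; the function name, the "broker-partition" naming, and the consumer ids are all assumptions made up for the example. It only shows the idea that each consumer in a group ends up with roughly the same number of partitions.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers so that counts differ by at most one."""
    consumers = sorted(consumers)
    ordered = sorted(partitions)
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    per_consumer, extra = divmod(len(ordered), len(consumers))
    start = 0
    for index, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition each.
        count = per_consumer + (1 if index < extra else 0)
        assignment[consumer] = ordered[start:start + count]
        start += count
    return assignment


# Hypothetical topic: 2 brokers with 3 partitions each, named "broker-partition".
partitions = [f"{broker}-{part}" for broker in range(2) for part in range(3)]
group = ["consumer-a", "consumer-b", "consumer-c", "consumer-d"]
result = assign_partitions(partitions, group)
```

With six partitions and four consumers, two consumers get two partitions and two get one. Adding a fifth consumer would simply change the split the next time the assignment is computed.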

Thanks,

Jun

On Fri, Dec 2, 2011 at 10:09 AM, Mark <st...@gmail.com> wrote:

> Would you mind explaining how you go about:
>
>
> (1) partitioning and load balancing data across a cluster of machines

Re: General questions on functionality and usage

Posted by Mark <st...@gmail.com>.

Would you mind explaining how you go about:

(1) partitioning and load balancing data across a cluster of machines


On 12/2/11 6:42 AM, Jay Kreps wrote:
> I think there are two things here: (1) partitioning and load balancing data
> across a cluster of machines, and (2) replicating each message on N
> machines. We do (1) but not (2). We are working on (2), as Jun says.
>
> -Jay

Re: General questions on functionality and usage

Posted by Jay Kreps <ja...@gmail.com>.

I think there are two things here: (1) partitioning and load balancing data
across a cluster of machines, and (2) replicating each message on N
machines. We do (1) but not (2). We are working on (2), as Jun says.

-Jay
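
A rough sketch of point (1), partitioning and load balancing, might look like the following. It is an assumption-laden illustration rather than Kafka's producer code: the broker names, the helper functions, and the choice of CRC32 as the hash are all hypothetical. Point (2), replication, is deliberately left out, since the message above says it is not implemented yet.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index with a stable hash."""
    # zlib.crc32 is used because Python's built-in hash() is salted per process.
    return zlib.crc32(key) % num_partitions


def route(keyed_messages, brokers, partitions_per_broker):
    """Group (key, payload) pairs by the broker that owns the chosen partition."""
    total = len(brokers) * partitions_per_broker
    routed = {broker: [] for broker in brokers}
    for key, payload in keyed_messages:
        partition = choose_partition(key, total)
        # Assumption: partitions are numbered contiguously per broker.
        broker = brokers[partition // partitions_per_broker]
        routed[broker].append((partition, payload))
    return routed


brokers = ["broker-0", "broker-1"]  # hypothetical broker ids
messages = [(b"user-1", "login"), (b"user-2", "click"), (b"user-1", "logout")]
routed = route(messages, brokers, partitions_per_broker=2)
```

Messages with the same key always land on the same partition, so load spreads across machines while per-key ordering is kept. Each message still lives on exactly one machine in this sketch.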

On Thu, Dec 1, 2011 at 5:29 PM, Jun Rao <ju...@gmail.com> wrote:

> No, multiple servers in each cluster.
>
> Jun
>
> On Thu, Dec 1, 2011 at 4:48 PM, Mark <st...@gmail.com> wrote:
>
> > So at LinkedIn you only use 1 Kafka server?

Re: General questions on functionality and usage

Posted by Jun Rao <ju...@gmail.com>.

No, multiple servers in each cluster.

Jun

On Thu, Dec 1, 2011 at 4:48 PM, Mark <st...@gmail.com> wrote:

> So at LinkedIn you only use 1 Kafka server?

Re: General questions on functionality and usage

Posted by Mark <st...@gmail.com>.

So at LinkedIn you only use 1 Kafka server?

On 12/1/11 9:12 AM, Jun Rao wrote:
> Mark,
>
> See my inlined answers below.
>
> Thanks,
>
> Jun
>
> On Thu, Dec 1, 2011 at 8:28 AM, Mark<st...@gmail.com>  wrote:
>
>> - Does Kafka support pattern matching?
>>
> There is no server-side filtering in Kafka right now.
>
>
>> - What are the limitations of one Kafka server in terms of number of
>> topics and number of consumers?
>>
> There is no hard limit. However, at LinkedIn, we are dealing with hundreds
> of topics and tens of consumers. A large number of topics/consumers could be
> limited by ZK capacity and OS capacity (e.g., open file handles). Also, if
> a consumer consumes a large number of topics, the time to balance load will
> be longer.
>
>
>> - Can you load balance publishing/subscribing across multiple Kafka
>> servers to increase redundancy?
>>
>>
> It's possible, but it's not something that's built-in now. We do plan to
> support intra-cluster replication. See the design in
> https://issues.apache.org/jira/browse/KAFKA-50
>
>
>> - Other than the lack of map/reduce support, how does Kafka differ from, say,
>> Redis Pub/Sub? (http://redis.io/topics/pubsub)
>>
>>
> Don't know about Redis Pub/Sub. However, Kafka differs from some other
> pub/sub/messaging systems in that it focuses more on scalability,
> efficiency, and throughput.
>
>
>> - Would anyone mind sharing their Kafka setup in terms of both
>> functionality/usage and architecture... basically more in depth than the
>> usual "Kafka serves our real-time X"
>> (https://cwiki.apache.org/confluence/display/KAFKA/Powered+By).
>> Having concrete use cases on the wiki could help gain adoption, especially
>> among new users of the pub/sub paradigm, by showing what pub/sub
>> real-time messaging can accomplish.
>>
>>
> Yes, we will update the wiki later.
>
>
>> - Any good papers on what problems pub/sub in general can solve?
>>
>>
> Some of the design and usage of Kafka can be found in this paper:
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
>
>
>> Thanks

Re: General questions on functionality and usage

Posted by Jun Rao <ju...@gmail.com>.

Mark,

See my inlined answers below.

Thanks,

Jun

On Thu, Dec 1, 2011 at 8:28 AM, Mark <st...@gmail.com> wrote:

> - Does Kafka support pattern matching?
>

There is no server-side filtering in Kafka right now.
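
Since filtering is not done on the server, a consumer would have to fetch messages and discard the ones it does not want. A minimal client-side sketch, assuming messages arrive as plain strings (the sample payloads and the function name are invented for illustration and are not part of any Kafka API):

```python
import re


def filter_messages(messages, pattern):
    """Yield only the messages whose text matches the regular expression."""
    regex = re.compile(pattern)
    for message in messages:
        if regex.search(message):
            yield message


# Hypothetical payloads already fetched from a topic by a consumer.
fetched = ["page_view /home", "ad_click banner-7", "page_view /jobs"]
page_views = list(filter_messages(fetched, r"^page_view\b"))
```

The cost of this approach is that every message still crosses the network before being dropped, which is the trade-off of not filtering on the broker.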


>
> - What are the limitations of one Kafka server in terms of number of
> topics and number of consumers?
>

There is no hard limit. However, at LinkedIn, we are dealing with hundreds
of topics and tens of consumers. A large number of topics/consumers could be
limited by ZK capacity and OS capacity (e.g., open file handles). Also, if
a consumer consumes a large number of topics, the time to balance load will
be longer.
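
To get a feel for the OS-side limit mentioned above, one could compare a rough estimate of open files against the per-process limit. The sketch below uses Python's standard `resource` module (Unix only). The estimate itself (topics x partitions x files per partition) is purely an assumption for illustration, not a formula taken from Kafka.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_limit(topics, partitions_per_topic, files_per_partition, soft_limit):
    """Rough check: does the estimated number of open files fit under the limit?"""
    # Assumption: every partition keeps `files_per_partition` files open at once.
    estimated = topics * partitions_per_topic * files_per_partition
    return estimated <= soft_limit


soft, hard = open_file_limits()
# Hypothetical sizing: 300 topics, 4 partitions each, 2 open files per partition.
ok = fits_within_limit(300, 4, 2, soft)
```

If the estimate exceeds the soft limit, the usual remedy is raising the limit for the broker process (for example with `ulimit -n`) rather than changing anything in Kafka itself.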


>
> - Can you load balance publishing/subscribing across multiple Kafka
> servers to increase redundancy?
>
>
It's possible, but it's not something that's built-in now. We do plan to
support intra-cluster replication. See the design in
https://issues.apache.org/jira/browse/KAFKA-50


> - Other than the lack of map/reduce support, how does Kafka differ from, say,
> Redis Pub/Sub? (http://redis.io/topics/pubsub)
>
>
Don't know about Redis Pub/Sub. However, Kafka differs from some other
pub/sub/messaging systems in that it focuses more on scalability,
efficiency, and throughput.


> - Would anyone mind sharing their Kafka setup in terms of both
> functionality/usage and architecture... basically more in depth than the
> usual "Kafka serves our real-time X"
> (https://cwiki.apache.org/confluence/display/KAFKA/Powered+By).
> Having concrete use cases on the wiki could help gain adoption, especially
> among new users of the pub/sub paradigm, by showing what pub/sub
> real-time messaging can accomplish.
>
>
Yes, we will update the wiki later.


> - Any good papers on what problems pub/sub in general can solve?
>
>
Some of the design and usage of Kafka can be found in this paper:
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf


> Thanks