Posted to dev@kafka.apache.org by Bao Thai Ngo <ba...@gmail.com> on 2012/02/17 11:57:54 UTC

Re: Kafka is live in prod @ 100%

Hi Taylor,

I found your email and the Kafka use case by chance. Our use case is
somewhat similar to yours. We implement semantic partitioning to keep
different kinds of produced data separate, and we are also running several
thousand topics, as you are.

One issue we have been facing is that it is very inconvenient for us to
maintain and update the Kafka server configuration (server.properties) when
running several thousand topics. We have to specify the number of partitions
on a per-topic basis in the way Kafka requires:

### Overrides for the default given by num.partitions on a per-topic
basis
topic.partition.count.map = topic1:4, topic2:4, ..., topicn:4
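For now we regenerate that line with a small script whenever the topic list changes. A minimal sketch of the idea (the helper name and the fixed partition count here are just illustrative, not anything Kafka provides):

```python
# Hypothetical helper (names are ours, not Kafka's): rebuild the
# per-topic partition map line from a plain list of topic names, so
# server.properties never has to be edited by hand.
def partition_map_line(topics, partitions=4):
    entries = ", ".join("%s:%d" % (t, partitions) for t in topics)
    return "topic.partition.count.map = " + entries

print(partition_map_line(["topic1", "topic2", "topicn"]))
```

This only papers over the problem, though: the broker still has to be restarted to pick up the regenerated file.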

I am almost sure that you ran into this issue as well, so I am
curious to know how you solved it.

Thanks,
~Thai

On Wed, Dec 7, 2011 at 12:34 AM, Taylor Gautier <tg...@tagged.com> wrote:

> We had to isolate topics to specific servers because we are running
> several hundred thousand topics in aggregate.
>
> Due to Kafka's directory strategy it's not feasible to put that
> many topics on every host, since they all reside in a single directory.
>
> An improvement we considered was to nest the data directory,
> which would have alleviated this problem.  We also could have
> tried a different filesystem but we weren't confident that would solve
> the problem entirely.
>
> The advantage of our solution is that each host in our Kafka tier is
> literally shared-nothing. It will scale horizontally for a long, long
> way.
>
> And it's also a contingency plan. Since Kafka was unproven (for us
> anyway at the time) it was easier to build smaller components with
> less overall functionality and glue them together in a scalable way.
> If we had had to, we could have put a different message bus in place.
> But we didn't want to do that if we could avoid it :)
>
>
>
> On Dec 6, 2011, at 9:13 AM, Neha Narkhede <ne...@gmail.com> wrote:
>
> > Taylor,
> >
> > This sounds great ! Congratulations on this launch.
> >
> >>> But basically we have many topics, few messages (relatively) per topic
> >
> > Can you explain your strategy of mapping topics to brokers ? The default
> in
> > Kafka today is to have all brokers host all topics.
> >
> >>> An end user browser makes a long-poll event http connection to receive
> >  1:1 messages and 1:M messages from a specialized http server we built
> for
> >  this purpose.  1:M messages are delivered from Kafka.
> >
> > What do you use for receiving 1:1 messages ?
> >
> > Your use case is interesting and different. It will be great if you add
> > relevant details here -
> > https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
> >
> > Thanks,
> > Neha
> >
> >
> > On Tue, Dec 6, 2011 at 8:44 AM, Jun Rao <ju...@gmail.com> wrote:
> >
> >> Hi, Taylor,
> >>
> >> Thanks for the update. This is great. Could you update your usage in
> Kafka
> >> wiki? Also, do you delete topics online? If so, how do you do that?
> >>
> >> Jun
> >>
> >> On Tue, Dec 6, 2011 at 8:30 AM, Taylor Gautier <tg...@tagged.com>
> >> wrote:
> >>
> >>> I've already mentioned this before, but I wanted to give a quick shout
> to
> >>> let you guys know that our newest game, Deckadence, is 100% live as of
> >>> yesterday.
> >>>
> >>> Check it out at http://www.tagged.com/deckadence.html
> >>>
> >>> A little about our use case:
> >>>
> >>>  - Deckadence is a game of buying and selling - or rather trading -
> >>>  cards.  Every user on Tagged owns a card.  There are 100M users on
> >> Tagged,
> >>>  so that means there are 100M cards to trade.
> >>>  - Kafka enables real-time delivery of events in the game
> >>>  - An end user browser makes a long-poll event http connection to
> >> receive
> >>>  1:1 messages and 1:M messages from a specialized http server we built
> >> for
> >>>  this purpose.  1:M messages are delivered from Kafka.
> >>>  - Because of this design, we can publish a message anywhere inside our
> >>>  datacenter and send it directly and immediately to any other system
> >> that
> >>> is
> >>>  subscribed to Kafka, or to an end-user browser
> >>>  - Every update event for every card is sent to a unique topic that
> >>>  represents the user's card.
> >>>  - When a user is browsing any card or list of cards - say a search
> >>>  result - their browser subscribes to all of the cards on screen.
> >>>  - The effect of this is that any changes to any card seen on-screen
> are
> >>>  seen in real-time by all users of the game
> >>>  - Our primary producers and consumers are PHP and NodeJS, respectively
> >>>
> >>> Well, I plan to write up more about this use case in the near future.
>  As
> >>> you might have guessed, this is just about as far away from the
> original
> >>> intent of Kafka as you could get - we have PHP that sends messages to
> >>> Kafka.  Since it's not good to hold a TCP connection open in PHP, we
> had
> >> to
> >>> do some trickery here.  There was no existing Node client so we had to
> >>> write our own.  And since there are 100 million users registered on
> >> Tagged,
> >>> that means we could have in theory 100M topics.  Of course in practice
> we
> >>> have far fewer than that.  One of the main things we currently have to
> do
> >>> is aggressively clean topics.  But basically we have many topics, few
> >>> messages (relatively) per topic.  And order matters, so we had to deal
> >> with
> >>> ensuring that we could handle the number of topics we would create, and
> >>> ensure ordered delivery and receipt.
> >>>
> >>> In the future I have big plans for Kafka, another feature is currently
> in
> >>> private test and will be released to the public soon (it uses Kafka in
> a
> >>> more traditional way).  And we hope to have many more in 2012...
> >>>
> >>
>

Re: Kafka is live in prod @ 100%

Posted by Bao Thai Ngo <ba...@gmail.com>.
Neha,

Thanks for your response.

Yes, we really do need to specify the number of partitions differently for
thousands of topics. The num.partitions option helps only a little in our
use case, for reasons Taylor and I explained in the previous emails. In
addition, we currently implement semantic partitioning to keep various kinds
of produced data separate. High throughput is needed for the important,
quickly produced and processed kinds of data, but not for the others.
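To make the semantic partitioning concrete, here is a rough sketch of the routing scheme we follow (the kind names, partition counts, and function are made up for illustration; this is not Kafka's client API):

```python
import zlib

# Illustrative only: each kind of data gets its own partition count,
# so the important, quickly produced kinds get more partitions for
# throughput while low-volume kinds stay at one.
PARTITIONS_BY_KIND = {"chat": 8, "presence": 4, "audit": 1}

def choose_partition(kind, key):
    # The same key always maps to the same partition, which preserves
    # per-key ordering of messages.
    return zlib.crc32(key.encode("utf-8")) % PARTITIONS_BY_KIND[kind]
```

This is why a single global num.partitions default doesn't fit us: the right count differs by kind of data.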

What do you think?

Thanks,
~Thai

On Sat, Mar 3, 2012 at 6:35 AM, Neha Narkhede <ne...@gmail.com> wrote:

> Thai,
>
> Do you really need to specify the number of partitions differently for
> so many topics? I wonder if setting the right default for num.partitions
> would work instead?
>
> Thanks,
> Neha
>
> On Sun, Feb 19, 2012 at 6:47 PM, Bao Thai Ngo <ba...@gmail.com>
> wrote:
> > Hi,
> >
> > I like the idea Taylor suggested. This will definitely help a lot.
> >
> > Another approach I would suggest is to let Kafka load information of
> > topic.partition.count.map from an external file (plain text, XML, etc.) in
> > some format like:
> > topic1:#partition
> > topic2:#partition
> > ....
> > topicn:#partition
> >
> > This way, a Kafka user would also be able to modify this information
> > (manually, or automatically via a script) as needed.
> >
> > What do you think?
> >
> > Thanks,
> > ~Thai
> >
> > On Sat, Feb 18, 2012 at 1:52 AM, Taylor Gautier <tg...@tagged.com>
> wrote:
> >
> >> Jun,
> >>
> >> No, it's necessary for us to modify the tuning parameters on a per topic
> >> basis using wildcards, e.g.
> >>
> >> topic.flush.intervals.ms=chat*:100,presence*:1000
> >>
> >> On Fri, Feb 17, 2012 at 10:09 AM, Jun Rao <ju...@gmail.com> wrote:
> >>
> >> > Taylor,
> >> >
> >> > We don't have a jira for that. Please open one.
> >> >
> >> > In 0.8, we will have DDLs for creating topics, which you can use to
> >> > customize # partitions. Will that be enough?
> >> >
> >> > Jun
> >> >
> >> > On Fri, Feb 17, 2012 at 9:02 AM, Taylor Gautier <tg...@tagged.com>
> >> > wrote:
> >> >
> >> > > Hi Thai.
> >> > >
> >> > > Well, actually we didn't solve this problem.  We had to use the
> global
> >> > > topic settings that apply to all topics.
> >> > >
> >> > > I would really like to see globs (wildcards) supported in the config
> >> > > settings.  This is something my team and I have discussed on several
> >> > > occasions.
> >> > >
> >> > > I'm not sure if there is a Kafka JIRA to cover that feature…
> >> > >
> >> > > -Taylor

Re: Kafka is live in prod @ 100%

Posted by Neha Narkhede <ne...@gmail.com>.
Thai,

Do you really need to specify the number of partitions differently for
so many topics? I wonder if setting the right default for num.partitions
would work instead?

Thanks,
Neha

On Sun, Feb 19, 2012 at 6:47 PM, Bao Thai Ngo <ba...@gmail.com> wrote:
> Hi,
>
> I like the idea Taylor suggested. This will definitely help a lot.
>
> Another approach I would suggest is to let Kafka load information of
> topic.partition.count.map from an external file (plain text, XML, etc.) in
> some format like:
> topic1:#partition
> topic2:#partition
> ....
> topicn:#partition
>
> This way, a Kafka user would also be able to modify this information
> (manually, or automatically via a script) as needed.
>
> What do you think?
>
> Thanks,
> ~Thai

Re: Kafka is live in prod @ 100%

Posted by Bao Thai Ngo <ba...@gmail.com>.
Hi,

I like the idea Taylor suggested. This will definitely help a lot.

Another approach I would suggest is to let Kafka load information of
topic.partition.count.map from an external file (plain text, XML, etc.) in
some format like:
topic1:#partition
topic2:#partition
....
topicn:#partition

This way, a Kafka user would also be able to modify this information
(manually, or automatically via a script) as needed.
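A rough sketch of the loader I have in mind (the file name and format are only a suggestion, one "topic:count" pair per line):

```python
# Parse a plain-text file of "topic:partition_count" lines into a map
# the broker could consult instead of topic.partition.count.map.
def load_partition_counts(path):
    counts = {}
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line:  # skip blank lines
                continue
            topic, _, count = line.partition(":")
            counts[topic.strip()] = int(count)
    return counts
```

The broker could reread this file periodically, and a script could regenerate it automatically as topics are created.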

What do you think?

Thanks,
~Thai

On Sat, Feb 18, 2012 at 1:52 AM, Taylor Gautier <tg...@tagged.com> wrote:

> Jun,
>
> No, it's necessary for us to modify the tuning parameters on a per topic
> basis using wildcards, e.g.
>
> topic.flush.intervals.ms=chat*:100,presence*:1000
>
> On Fri, Feb 17, 2012 at 10:09 AM, Jun Rao <ju...@gmail.com> wrote:
>
> > Taylor,
> >
> > We don't have a jira for that. Please open one.
> >
> > In 0.8, we will have DDLs for creating topics, which you can use to
> > customize # partitions. Will that be enough?
> >
> > Jun
> >
> > On Fri, Feb 17, 2012 at 9:02 AM, Taylor Gautier <tg...@tagged.com>
> > wrote:
> >
> > > Hi Thai.
> > >
> > > Well, actually we didn't solve this problem.  We had to use the global
> > > topic settings that apply to all topics.
> > >
> > > I would really like to see globs (wildcards) supported in the config
> > > settings.  This is something my team and I have discussed on several
> > > occasions.
> > >
> > > I'm not sure if there is a Kafka JIRA to cover that feature…
> > >
> > > -Taylor
> > >
> > > On Fri, Feb 17, 2012 at 2:57 AM, Bao Thai Ngo <ba...@gmail.com>
> > > wrote:
> > >
> > > > Hi Taylor,
> > > >
> > > > I found your email and the Kafka use case by chance. Our use case is
> a
> > > > little similar to yours. We actually implement semantic partitioning
> to
> > > > maintain some kind of produced data and we are also running several
> > > > thousand topics as you.
> > > >
> > > > One issue we have been facing is that it is totally inconvenient for
> us
> > > to
> > > > maintain and update Kafka server configuration (server.properties)
> when
> > > > running several thousand topics. We have to put number of partitions
> > on a
> > > > per-topic in the way Kafka requires:
> > > >
> > > > ### Overrides for for the default given by num.partitions on a
> > per-topic
> > > > basis
> > > > topic.partition.count.map = topic1:4, topic2:4, ..., topicn:4
> > > >
> > > > I am almost sure that you did meet this issue I have mentioned, so I
> am
> > > > curious to know how you solved it.
> > > >
> > > > Thanks,
> > > > ~Thai
> > > >
> > > > On Wed, Dec 7, 2011 at 12:34 AM, Taylor Gautier <tgautier@tagged.com
> > > >wrote:
> > > >
> > > >> We had to isolate topics to specific servers because we are running
> > > >> several hundred thousand topics in aggregate.
> > > >>
> > > >> Due to the directory strategy of Kafka it's not feasible to put that
> > > >> many topics in every host since they reside in a single directory.
> > > >>
> > > >> An improvement we considered making was to make the data directory
> > > >> nested which would have alleviated this problem.  We also could have
> > > >> tried a different filesystem but we weren't confident that would
> solve
> > > >> the problem entirely.
> > > >>
> > > >> The advantage to our solution is that each host in our Kafka tier is
> > > >> literally share nothing. It will scale horizontally for a long, long
> > > >> way.
> > > >>
> > > >> And it's also a contingency plan. Since Kafka was unproven (for us
> > > >> anyway at the time) it was easier to build smaller components with
> > > >> less overall functionality and glue them together in a scalable way.
> > > >> If we had had to we could have out a different message bus in place.
> > > >> But we didn't want to do that if we could avoid it :)
> > > >>
> > > >>
> > > >>
> > > >> On Dec 6, 2011, at 9:13 AM, Neha Narkhede <ne...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Taylor,
> > > >> >
> > > >> > This sounds great ! Congratulations on this launch.
> > > >> >
> > > >> >>> But basically we have many topics, few messages (relatively) per
> > > topic
> > > >> >
> > > >> > Can you explain your strategy of mapping topics to brokers ? The
> > > >> default in
> > > >> > Kafka today is to have all brokers host all topics.
> > > >> >
> > > >> >>> An end user browser makes a long-poll event http connection to
> > > receive
> > > >> >  1:1 messages and 1:M messages from a specialized http server we
> > built
> > > >> for
> > > >> >  this purpose.  1:M messages are delivered from Kafka.
> > > >> >
> > > >> > What do you use for receiving 1:1 messages ?
> > > >> >
> > > >> > Your use case is interesting and different. It will be great if
> you
> > > add
> > > >> > relevant details here -
> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
> > > >> >
> > > >> > Thanks,
> > > >> > Neha
> > > >> >
> > > >> >
> > > >> > On Tue, Dec 6, 2011 at 8:44 AM, Jun Rao <ju...@gmail.com> wrote:
> > > >> >
> > > >> >> Hi, Taylor,
> > > >> >>
> > > >> >> Thanks for the update. This is great. Could you update your usage
> > in
> > > >> Kafka
> > > >> >> wiki? Also, do you delete topics online? If so, how do you do
> that?
> > > >> >>
> > > >> >> Jun
> > > >> >>
> > > >> >> On Tue, Dec 6, 2011 at 8:30 AM, Taylor Gautier <
> > tgautier@tagged.com>
> > > >> >> wrote:
> > > >> >>
> > > >> >>> I've already mentioned this before, but I wanted to give a quick
> > > >> shout to
> > > >> >>> let you guys know that our newest game, Deckadence, is 100% live
> > as
> > > of
> > > >> >>> yesterday.
> > > >> >>>
> > > >> >>> Check it out at http://www.tagged.com/deckadence.html
> > > >> >>>
> > > >> >>> A little about our use case:
> > > >> >>>
> > > >> >>>  - Deckadence is a game of buying and selling - or rather
> trading
> > -
> > > >> >>>  cards.  Every user on Tagged owns a card.  There are 100M uses
> on
> > > >> >> Tagged,
> > > >> >>>  so that means there are 100M cards to trade.
> > > >> >>>  - Kafka enables real-time delivery of events in the game
> > > >> >>>  - An end user browser makes a long-poll event http connection
> to
> > > >> >> receive
> > > >> >>>  1:1 messages and 1:M messages from a specialized http server we
> > > built
> > > >> >> for
> > > >> >>>  this purpose.  1:M messages are delivered from Kafka.
> > > >> >>>  - Because of this design, we can publish a message anywhere
> > inside
> > > >> our
> > > >> >>>  datacenter and send it directly and immediately to any other
> > system
> > > >> >> that
> > > >> >>> is
> > > >> >>>  subscribed to Kafka, or to an end-user browser
> > > >> >>>  - Every update event for every card is sent to a unique topic
> > that
> > > >> >>>  represents the user's card.
> > > >> >>>  - When a user is browsing any card or list of cards - say a
> > search
> > > >> >>>  result - their browser subscribes to all of the cards on
> screen.
> > > >> >>>  - The effect of this is that any changes to any card seen
> > on-screen
> > > >> are
> > > >> >>>  seen in real-time by all users of the game
> > > >> >>>  - Our primary producers and consumers are PHP and NodeJS,
> > > >> respectively
> > > >> >>>
> > > >> >>> Well, I plan to write up more about this use case in the near
> > > future.
> > > >>  As
> > > >> >>> you might have guessed, this is just about as far away from the
> > > >> original
> > > >> >>> intent of Kafka as you could get - we have PHP that sends
> messages
> > > to
> > > >> >>> Kafka.  Since it's not good to hold a TCP connection open in
> PHP,
> > we
> > > >> had
> > > >> >> to
> > > >> >>> do some trickery here.  There was no existing Node client so we
> > had
> > > to
> > > >> >>> write our own.  And since there are 100 million users registered
> > on
> > > >> >> Tagged,
> > > >> >>> that means we could have in theory 100M topics.  Of course in
> > > >> practice we
> > > >> >>> have far fewer than that.  One of the main things we currently
> > have
> > > >> to do
> > > >> >>> is aggressively clean topics.  But basically we have many
> topics,
> > > few
> > > >> >>> messages (relatively) per topic.  And order matters, so we had
> to
> > > deal
> > > >> >> with
> > > >> >>> ensuring that we could handle the number of topics we would
> > create,
> > > >> and
> > > >> >>> ensure ordered delivery and receipt.
> > > >> >>>
> > > >> >>> In the future I have big plans for Kafka, another feature is
> > > >> currently in
> > > >> >>> private test and will be released to the public soon (it uses
> > Kafka
> > > >> in a
> > > >> >>> more traditional way).  And we hope to have many more in 2012...
> > > >> >>>
> > > >> >>
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Kafka is live in prod @ 100%

Posted by Taylor Gautier <tg...@tagged.com>.
Jun,

No, we also need to modify the tuning parameters on a per-topic basis
using wildcards, e.g.

topic.flush.intervals.ms=chat*:100,presence*:1000
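[Editor's note: the wildcard matching Taylor asks for was not implemented in Kafka at the time of this thread. As a sketch of the intended semantics only, a broker could resolve a per-topic setting against glob patterns roughly like this (function name and the 3000 ms default are illustrative, not Kafka code):

```python
from fnmatch import fnmatch


def resolve_flush_interval(topic, overrides, default_ms=3000):
    """Return the flush interval for `topic`, using the first glob
    pattern that matches; fall back to the broker-wide default."""
    for pattern, interval_ms in overrides:
        if fnmatch(topic, pattern):
            return interval_ms
    return default_ms


# Parsed form of: topic.flush.intervals.ms=chat*:100,presence*:1000
overrides = [("chat*", 100), ("presence*", 1000)]

print(resolve_flush_interval("chat.room.42", overrides))    # 100
print(resolve_flush_interval("presence.user.7", overrides)) # 1000
print(resolve_flush_interval("cards.user.9", overrides))    # 3000 (default)
```

With patterns like these, thousands of per-user topics collapse into a handful of config entries, which is the whole point of the request.]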

On Fri, Feb 17, 2012 at 10:09 AM, Jun Rao <ju...@gmail.com> wrote:

> Taylor,
>
> We don't have a jira for that. Please open one.
>
> In 0.8, we will have DDLs for creating topics, which you can use to
> customize # partitions. Will that be enough?
>
> Jun
>

Re: Kafka is live in prod @ 100%

Posted by Jun Rao <ju...@gmail.com>.
Taylor,

We don't have a JIRA for that. Please open one.

In 0.8, we will have DDLs for creating topics, which you can use to
customize # partitions. Will that be enough?

Jun

On Fri, Feb 17, 2012 at 9:02 AM, Taylor Gautier <tg...@tagged.com> wrote:

> Hi Thai.
>
> Well, actually we didn't solve this problem.  We had to use the global
> topic settings that apply to all topics.
>
> I would really like to see globs (wildcards) supported in the config
> settings.  This is something my team and I have discussed on several
> occasions.
>
> I'm not sure if there is a Kafka JIRA to cover that feature…
>
> -Taylor
>

Re: Kafka is live in prod @ 100%

Posted by Taylor Gautier <tg...@tagged.com>.
Hi Thai.

Well, actually we didn't solve this problem.  We had to use the global
topic settings that apply to all topics.

I would really like to see globs (wildcards) supported in the config
settings.  This is something my team and I have discussed on several
occasions.

I'm not sure if there is a Kafka JIRA to cover that feature…

-Taylor

On Fri, Feb 17, 2012 at 2:57 AM, Bao Thai Ngo <ba...@gmail.com> wrote:

> Hi Taylor,
>
> I found your email and the Kafka use case by chance. Our use case is a
> little similar to yours. We actually implement semantic partitioning to
> maintain some kind of produced data and we are also running several
> thousand topics, as you are.
>
> One issue we have been facing is that it is totally inconvenient for us to
> maintain and update Kafka server configuration (server.properties) when
> running several thousand topics. We have to set the number of partitions
> on a per-topic basis in the way Kafka requires:
>
> ### Overrides for the default given by num.partitions on a per-topic
> basis
> topic.partition.count.map = topic1:4, topic2:4, ..., topicn:4
>
> I am almost sure you have run into this issue as well, so I am
> curious to know how you solved it.
>
> Thanks,
> ~Thai
>
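[Editor's note: the `topic.partition.count.map` property quoted above is a flat comma-separated map with one explicit entry per topic, which is exactly why it becomes unmanageable at several thousand topics. A minimal sketch of a parser for that format (an illustration of the syntax, not Kafka's actual parsing code):

```python
def parse_topic_map(value):
    """Parse a 'topic1:4, topic2:4' style property value into a
    {topic_name: partition_count} dict."""
    result = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas / extra whitespace
        topic, _, count = entry.rpartition(":")
        result[topic] = int(count)
    return result


print(parse_topic_map("topic1:4, topic2:4, topicn:4"))
# {'topic1': 4, 'topic2': 4, 'topicn': 4}
```

Because every topic must be listed by its full name, the property grows linearly with the topic count; glob patterns, as discussed above, would let many topics share one entry.]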