Posted to users@kafka.apache.org by Chen Wang <ch...@gmail.com> on 2014/08/11 23:19:52 UTC

Issue with 240 topics per day

Folks,
Is there any potential issue with creating 240 topics every day? Although
the retention of each topic is set to 2 days, I am a little concerned
that, since there is currently no delete-topic API, ZooKeeper might become
overloaded.
Thanks,
Chen

Re: Issue with 240 topics per day

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.
Why do you need to read it every 6 minutes? Why not just read it as it arrives? If it naturally arrives in 6-minute bursts, you'll read it in 6-minute bursts, no?

Perhaps the data does not have timestamps embedded in it, and that is why you are relying on time-based topic names? In that case I would add an intermediate stage that tags the data with a timestamp and writes it to a single topic; a third stage can then process it at your leisure.
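
A rough sketch of that intermediate tagging stage (an untested illustration only, using the newer Java producer API; the broker address, topic name, and "timestamp|payload" envelope are all made up):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TimestampTagger {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // illustrative broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            String rawMessage = "some event without a timestamp"; // would come from the raw feed
            // Tag each message with the ingest time so a later stage can bucket it by time.
            String tagged = System.currentTimeMillis() + "|" + rawMessage;
            producer.send(new ProducerRecord<>("events", tagged)); // one topic, hypothetical name
            producer.close();
        }
    }

The third stage just consumes that one topic and groups messages by the embedded timestamp, at whatever pace it likes.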

Perhaps I am still missing a key difficulty with your system.

Your original suggestion is going to be difficult to get working. You'll quickly run out of file descriptors, amongst other issues.

Philip




---------------------------------
http://www.philipotoole.com

> On Aug 11, 2014, at 6:42 PM, Chen Wang <ch...@gmail.com> wrote:
> 
> "And if you can't consume it all within 6 minutes, partition the topic
> until you can run enough consumers such that you can keep up.", this is
> what I intend to do for each 6min -topic.
> 
> What I really need is a partitioned queue: each 6 minute of data can put
> into a separate partition, so that I can read that specific partition at
> the end of each 6 minutes. So apparently redis naturally fit this case, but
> the only issue is the performance,(well also some trick in ensuring the
> reliable message delivery). As I said, we have kakfa infrastructure in
> place, if without too much work, i can make the design work with kafka, i
> would rather go this path instead of setting up another queue system.
> 
> Chen
> 
> Chen
> 
> 
> On Mon, Aug 11, 2014 at 6:07 PM, Philip O'Toole <
> philip.otoole@yahoo.com.invalid> wrote:
> 
>> It's still not clear to me why you need to create so many topics.
>> 
>> Write the data to a single topic and consume it when it arrives. It
>> doesn't matter if it arrives in bursts, as long as you can process it all
>> within 6 minutes, right?
>> 
>> And if you can't consume it all within 6 minutes, partition the topic
>> until you can run enough consumers such that you can keep up. The fact that
>> you are thinking about so many topics is a sign your design is wrong, or
>> Kafka is the wrong solution.
>> 
>> Philip
>> 
>>>> On Aug 11, 2014, at 5:18 PM, Chen Wang <ch...@gmail.com>
>>> wrote:
>>> 
>>> Philip,
>>> That is right. There is huge amount of data flushed into the topic
>> within each 6 minutes. Then at the end of each 6 min, I only want to read
>> from that specify topic, and data within that topic has to be processed as
>> fast as possible. I was originally using redis queue for this purpose, but
>> it takes much longer to process a redis queue than kafka queue(testing data
>> is 2M messages). Since we already have kafka infrastructure setup, instead
>> of seeking other tools(activeMQ, rabbitMQ etc), I would rather make use of
>> kafka, although it does not seem like a common kafka user case.
>>> 
>>> Chen
>>> 
>>> 
>>>> On Mon, Aug 11, 2014 at 5:01 PM, Philip O'Toole
>> <ph...@yahoo.com.invalid> wrote:
>>>> I'd love to know more about what you're trying to do here. It sounds
>> like you're trying to create topics on a schedule, trying to make it easy
>> to locate data for a given time range? I'm not sure it makes sense to use
>> Kafka in this manner.
>>>> 
>>>> Can you provide more detail?
>>>> 
>>>> 
>>>> Philip
>>>> 
>>>> 
>>>> -----------------------------------------
>>>> http://www.philipotoole.com
>>>> 
>>>> 
>>>> On Monday, August 11, 2014 4:45 PM, Chen Wang <
>> chen.apache.solr@gmail.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> Todd,
>>>> I actually only intend to keep each topic valid for 3 days most. Each of
>>>> our topic has 3 partitions, so its around 3*240*3 =2160 partitions.
>> Since
>>>> there is no api for deleting topic, i guess i could set up a cron job
>>>> deleting the out dated topics(folders) from zookeeper..
>>>> do you know when the delete topic api will be available in kafka?
>>>> Chen
>>>> 
>>>> 
>>>> 
>>>> On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino
>> <tp...@linkedin.com.invalid>
>>>> wrote:
>>>> 
>>>>> You need to consider your total partition count as you do this. After
>> 30
>>>>> days, assuming 1 partition per topic, you have 7200 partitions.
>> Depending
>>>>> on how many brokers you have, this can start to be a problem. We just
>>>>> found an issue on one of our clusters that has over 70k partitions
>> that
>>>>> there's now a problem with doing actions like a preferred replica
>> election
>>>>> for all topics because the JSON object that gets written to the
>> zookeeper
>>>>> node to trigger it is too large for Zookeeper's default 1 MB data
>> size.
>>>>> 
>>>>> You also need to think about the number of open file handles. Even
>> with no
>>>>> data, there will be open files for each topic.
>>>>> 
>>>>> -Todd
>>>>> 
>>>>> 
>>>>>> On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
>>>>>> 
>>>>>> Folks,
>>>>>> Is there any potential issue with creating 240 topics every day?
>> Although
>>>>>> the retention of each topic is set to be 2 days, I am a little
>> concerned
>>>>>> that since right now there is no delete topic api, the zookeepers
>> might be
>>>>>> overloaded.
>>>>>> Thanks,
>>>>>> Chen
>> 

Re: Issue with 240 topics per day

Posted by Chen Wang <ch...@gmail.com>.
"And if you can't consume it all within 6 minutes, partition the topic
until you can run enough consumers such that you can keep up.", this is
what I intend to do for each 6min -topic.

What I really need is a partitioned queue: each 6 minutes of data goes
into a separate partition, so that I can read that specific partition at
the end of each 6-minute window. Redis naturally fits this case, but the
issue is performance (and there is also some trickery involved in ensuring
reliable message delivery). As I said, we already have Kafka infrastructure
in place; if I can make the design work with Kafka without too much effort,
I would rather go that route than set up another queue system.
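
Something like the following is the partitioned-queue idea I have in mind, on top of a single Kafka topic (an untested sketch only, using the newer Java producer; the topic name, broker address, and partition count are made up -- 10 partitions cover one hour of 6-minute buckets before wrapping around):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SixMinuteBucketProducer {
        private static final int PARTITIONS = 10;            // one hour of 6-minute buckets
        private static final long BUCKET_MS = 6 * 60 * 1000L;

        // All messages produced in the same 6-minute window land in the same partition.
        static int bucketPartition(long timestampMs) {
            return (int) ((timestampMs / BUCKET_MS) % PARTITIONS);
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // illustrative
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            int partition = bucketPartition(System.currentTimeMillis());
            // "events" is a made-up topic name; a consumer attaches to just this
            // partition once the 6-minute window has closed.
            producer.send(new ProducerRecord<>("events", partition, null, "payload"));
            producer.close();
        }
    }

The obvious caveat is that each bucket has to be fully consumed before its partition is reused an hour later.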

Chen


On Mon, Aug 11, 2014 at 6:07 PM, Philip O'Toole <
philip.otoole@yahoo.com.invalid> wrote:

> It's still not clear to me why you need to create so many topics.
>
> Write the data to a single topic and consume it when it arrives. It
> doesn't matter if it arrives in bursts, as long as you can process it all
> within 6 minutes, right?
>
> And if you can't consume it all within 6 minutes, partition the topic
> until you can run enough consumers such that you can keep up. The fact that
> you are thinking about so many topics is a sign your design is wrong, or
> Kafka is the wrong solution.
>
> Philip
>
> > On Aug 11, 2014, at 5:18 PM, Chen Wang <ch...@gmail.com>
> wrote:
> >
> > Philip,
> > That is right. There is huge amount of data flushed into the topic
> within each 6 minutes. Then at the end of each 6 min, I only want to read
> from that specify topic, and data within that topic has to be processed as
> fast as possible. I was originally using redis queue for this purpose, but
> it takes much longer to process a redis queue than kafka queue(testing data
> is 2M messages). Since we already have kafka infrastructure setup, instead
> of seeking other tools(activeMQ, rabbitMQ etc), I would rather make use of
> kafka, although it does not seem like a common kafka user case.
> >
> > Chen
> >
> >
> >> On Mon, Aug 11, 2014 at 5:01 PM, Philip O'Toole
> <ph...@yahoo.com.invalid> wrote:
> >> I'd love to know more about what you're trying to do here. It sounds
> like you're trying to create topics on a schedule, trying to make it easy
> to locate data for a given time range? I'm not sure it makes sense to use
> Kafka in this manner.
> >>
> >> Can you provide more detail?
> >>
> >>
> >> Philip
> >>
> >>
> >> -----------------------------------------
> >> http://www.philipotoole.com
> >>
> >>
> >> On Monday, August 11, 2014 4:45 PM, Chen Wang <
> chen.apache.solr@gmail.com> wrote:
> >>
> >>
> >>
> >> Todd,
> >> I actually only intend to keep each topic valid for 3 days most. Each of
> >> our topic has 3 partitions, so its around 3*240*3 =2160 partitions.
> Since
> >> there is no api for deleting topic, i guess i could set up a cron job
> >> deleting the out dated topics(folders) from zookeeper..
> >> do you know when the delete topic api will be available in kafka?
> >> Chen
> >>
> >>
> >>
> >> On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino
> <tp...@linkedin.com.invalid>
> >> wrote:
> >>
> >> > You need to consider your total partition count as you do this. After
> 30
> >> > days, assuming 1 partition per topic, you have 7200 partitions.
> Depending
> >> > on how many brokers you have, this can start to be a problem. We just
> >> > found an issue on one of our clusters that has over 70k partitions
> that
> >> > there's now a problem with doing actions like a preferred replica
> election
> >> > for all topics because the JSON object that gets written to the
> zookeeper
> >> > node to trigger it is too large for Zookeeper's default 1 MB data
> size.
> >> >
> >> > You also need to think about the number of open file handles. Even
> with no
> >> > data, there will be open files for each topic.
> >> >
> >> > -Todd
> >> >
> >> >
> >> > On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
> >> >
> >> > >Folks,
> >> > >Is there any potential issue with creating 240 topics every day?
> Although
> >> > >the retention of each topic is set to be 2 days, I am a little
> concerned
> >> > >that since right now there is no delete topic api, the zookeepers
> might be
> >> > >overloaded.
> >> > >Thanks,
> >> > >Chen
> >> >
> >> >
> >
>

Re: Issue with 240 topics per day

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.
It's still not clear to me why you need to create so many topics. 

Write the data to a single topic and consume it when it arrives. It doesn't matter if it arrives in bursts, as long as you can process it all within 6 minutes, right?

And if you can't consume it all within 6 minutes, partition the topic until you can run enough consumers such that you can keep up. The fact that you are thinking about so many topics is a sign your design is wrong, or Kafka is the wrong solution. 
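
Back-of-the-envelope sizing for that, with made-up numbers (the 2M figure is the test set mentioned earlier in the thread; the per-consumer rate is an assumption you would measure yourself):

    public class PartitionSizing {
        public static void main(String[] args) {
            long messagesPerWindow = 2_000_000L;    // assumed: 2M messages per 6-minute burst
            long windowSeconds = 6 * 60;
            long perConsumerMsgsPerSec = 5_000L;    // assumed throughput of a single consumer

            long incomingRate = messagesPerWindow / windowSeconds;   // ~5,555 msg/s average
            long partitionsNeeded =                                  // ceiling division
                    (incomingRate + perConsumerMsgsPerSec - 1) / perConsumerMsgsPerSec;
            System.out.println("Partitions (one consumer each) needed to keep up: "
                    + partitionsNeeded);
        }
    }

With one consumer per partition in a consumer group, that many partitions keeps the group at or ahead of the incoming rate; extra partitions just add headroom.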

Philip

> On Aug 11, 2014, at 5:18 PM, Chen Wang <ch...@gmail.com> wrote:
> 
> Philip,
> That is right. There is huge amount of data flushed into the topic within each 6 minutes. Then at the end of each 6 min, I only want to read from that specify topic, and data within that topic has to be processed as fast as possible. I was originally using redis queue for this purpose, but it takes much longer to process a redis queue than kafka queue(testing data is 2M messages). Since we already have kafka infrastructure setup, instead of seeking other tools(activeMQ, rabbitMQ etc), I would rather make use of kafka, although it does not seem like a common kafka user case.
> 
> Chen
> 
> 
>> On Mon, Aug 11, 2014 at 5:01 PM, Philip O'Toole <ph...@yahoo.com.invalid> wrote:
>> I'd love to know more about what you're trying to do here. It sounds like you're trying to create topics on a schedule, trying to make it easy to locate data for a given time range? I'm not sure it makes sense to use Kafka in this manner.
>> 
>> Can you provide more detail?
>> 
>> 
>> Philip
>> 
>>  
>> -----------------------------------------
>> http://www.philipotoole.com
>> 
>> 
>> On Monday, August 11, 2014 4:45 PM, Chen Wang <ch...@gmail.com> wrote:
>> 
>> 
>> 
>> Todd,
>> I actually only intend to keep each topic valid for 3 days most. Each of
>> our topic has 3 partitions, so its around 3*240*3 =2160 partitions. Since
>> there is no api for deleting topic, i guess i could set up a cron job
>> deleting the out dated topics(folders) from zookeeper..
>> do you know when the delete topic api will be available in kafka?
>> Chen
>> 
>> 
>> 
>> On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino <tp...@linkedin.com.invalid>
>> wrote:
>> 
>> > You need to consider your total partition count as you do this. After 30
>> > days, assuming 1 partition per topic, you have 7200 partitions. Depending
>> > on how many brokers you have, this can start to be a problem. We just
>> > found an issue on one of our clusters that has over 70k partitions that
>> > there's now a problem with doing actions like a preferred replica election
>> > for all topics because the JSON object that gets written to the zookeeper
>> > node to trigger it is too large for Zookeeper's default 1 MB data size.
>> >
>> > You also need to think about the number of open file handles. Even with no
>> > data, there will be open files for each topic.
>> >
>> > -Todd
>> >
>> >
>> > On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
>> >
>> > >Folks,
>> > >Is there any potential issue with creating 240 topics every day? Although
>> > >the retention of each topic is set to be 2 days, I am a little concerned
>> > >that since right now there is no delete topic api, the zookeepers might be
>> > >overloaded.
>> > >Thanks,
>> > >Chen
>> >
>> >
> 

Re: Issue with 240 topics per day

Posted by Chen Wang <ch...@gmail.com>.
Philip,
That is right. There is a huge amount of data flushed into the topic within
each 6-minute window. At the end of each window, I only want to read from
that specific topic, and the data within it has to be processed as fast
as possible. I was originally using a Redis queue for this purpose, but it
takes much longer to process a Redis queue than a Kafka topic (the test data
is 2M messages). Since we already have Kafka infrastructure set up, I would
rather make use of Kafka than look into other tools (ActiveMQ, RabbitMQ,
etc.), although this does not seem like a common Kafka use case.

Chen


On Mon, Aug 11, 2014 at 5:01 PM, Philip O'Toole <
philip.otoole@yahoo.com.invalid> wrote:

> I'd love to know more about what you're trying to do here. It sounds like
> you're trying to create topics on a schedule, trying to make it easy to
> locate data for a given time range? I'm not sure it makes sense to use
> Kafka in this manner.
>
> Can you provide more detail?
>
>
> Philip
>
>
> -----------------------------------------
> http://www.philipotoole.com
>
>
> On Monday, August 11, 2014 4:45 PM, Chen Wang <ch...@gmail.com>
> wrote:
>
>
>
> Todd,
> I actually only intend to keep each topic valid for 3 days most. Each of
> our topic has 3 partitions, so its around 3*240*3 =2160 partitions. Since
> there is no api for deleting topic, i guess i could set up a cron job
> deleting the out dated topics(folders) from zookeeper..
> do you know when the delete topic api will be available in kafka?
> Chen
>
>
>
> On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino <tpalino@linkedin.com.invalid
> >
> wrote:
>
> > You need to consider your total partition count as you do this. After 30
> > days, assuming 1 partition per topic, you have 7200 partitions. Depending
> > on how many brokers you have, this can start to be a problem. We just
> > found an issue on one of our clusters that has over 70k partitions that
> > there's now a problem with doing actions like a preferred replica
> election
> > for all topics because the JSON object that gets written to the zookeeper
> > node to trigger it is too large for Zookeeper's default 1 MB data size.
> >
> > You also need to think about the number of open file handles. Even with
> no
> > data, there will be open files for each topic.
> >
> > -Todd
> >
> >
> > On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
> >
> > >Folks,
> > >Is there any potential issue with creating 240 topics every day?
> Although
> > >the retention of each topic is set to be 2 days, I am a little concerned
> > >that since right now there is no delete topic api, the zookeepers might
> be
> > >overloaded.
> > >Thanks,
> > >Chen
> >
> >
>

Re: Issue with 240 topics per day

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.
I'd love to know more about what you're trying to do here. It sounds like you're trying to create topics on a schedule so that it's easy to locate data for a given time range? I'm not sure it makes sense to use Kafka in this manner.

Can you provide more detail?


Philip

 
-----------------------------------------
http://www.philipotoole.com 


On Monday, August 11, 2014 4:45 PM, Chen Wang <ch...@gmail.com> wrote:
 


Todd,
I actually only intend to keep each topic valid for 3 days most. Each of
our topic has 3 partitions, so its around 3*240*3 =2160 partitions. Since
there is no api for deleting topic, i guess i could set up a cron job
deleting the out dated topics(folders) from zookeeper..
do you know when the delete topic api will be available in kafka?
Chen



On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino <tp...@linkedin.com.invalid>
wrote:

> You need to consider your total partition count as you do this. After 30
> days, assuming 1 partition per topic, you have 7200 partitions. Depending
> on how many brokers you have, this can start to be a problem. We just
> found an issue on one of our clusters that has over 70k partitions that
> there's now a problem with doing actions like a preferred replica election
> for all topics because the JSON object that gets written to the zookeeper
> node to trigger it is too large for Zookeeper's default 1 MB data size.
>
> You also need to think about the number of open file handles. Even with no
> data, there will be open files for each topic.
>
> -Todd
>
>
> On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
>
> >Folks,
> >Is there any potential issue with creating 240 topics every day? Although
> >the retention of each topic is set to be 2 days, I am a little concerned
> >that since right now there is no delete topic api, the zookeepers might be
> >overloaded.
> >Thanks,
> >Chen
>
>

Re: Issue with 240 topics per day

Posted by Philip O'Toole <ph...@yahoo.com.INVALID>.
Todd -- can you share details of the ZK cluster you are running to support this scale? Is it one single Kafka cluster? Are you using a single ZK cluster?


Thanks,

Philip

 
-----------------------------------------
http://www.philipotoole.com 


On Monday, August 11, 2014 9:32 PM, Todd Palino <tp...@linkedin.com.INVALID> wrote:
 


As I noted, we have a cluster right now with 70k partitions. It’s running
on over 30 brokers, partly to cover the number of partitions and
partly to cover the amount of data that we push through it. If you can
have at least 4 or 5 brokers, I wouldn’t anticipate any problems with the
number of partitions. You may need more than that depending on the
throughput you want to handle.

-Todd


On 8/11/14, 9:20 PM, "Chen Wang" <ch...@gmail.com> wrote:

>Todd,
>Yes I actually thought about that. My concern is that even a weeks topic
>partition(240*7*3 = 5040) is too many. Does linkedin have a good
>experience
>in using this many topics in your system?:-)
>Thanks,
>Chen
>
>
>On Mon, Aug 11, 2014 at 9:02 PM, Todd Palino
><tp...@linkedin.com.invalid>
>wrote:
>
>> In order to delete topics, you need to shut down the entire cluster (all
>> brokers), delete the topics from Zookeeper, and delete the log files and
>> partition directory from the disk on the brokers. Then you can restart
>>the
>> cluster. Assuming that you can take a periodic outage on your cluster,
>>you
>> can do it this way.
>>
>> Reading what you’re intending to do in other parts of this thread, have
>> you considered setting up 1 week’s worth of topics with 3 day retention,
>> and having your producer and consumer rotate between them. That is, on
>> Sunday at 12:00 AM, you start with topic1, then proceed to topic2 at
>> 12:06, and so on. The next week, you loop around over exactly the same
>> topics, knowing that the retention settings have cleared out the old
>>data.
>>
>> -Todd
>>
>> On 8/11/14, 4:45 PM, "Chen Wang" <ch...@gmail.com> wrote:
>>
>> >Todd,
>> >I actually only intend to keep each topic valid for 3 days most. Each
>>of
>> >our topic has 3 partitions, so its around 3*240*3 =2160 partitions.
>>Since
>> >there is no api for deleting topic, i guess i could set up a cron job
>> >deleting the out dated topics(folders) from zookeeper..
>> >do you know when the delete topic api will be available in kafka?
>> >Chen
>> >
>> >
>> >On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino
>> ><tp...@linkedin.com.invalid>
>> >wrote:
>> >
>> >> You need to consider your total partition count as you do this.
>>After 30
>> >> days, assuming 1 partition per topic, you have 7200 partitions.
>> >>Depending
>> >> on how many brokers you have, this can start to be a problem. We just
>> >> found an issue on one of our clusters that has over 70k partitions
>>that
>> >> there's now a problem with doing actions like a preferred replica
>> >>election
>> >> for all topics because the JSON object that gets written to the
>> >>zookeeper
>> >> node to trigger it is too large for Zookeeper's default 1 MB data
>>size.
>> >>
>> >> You also need to think about the number of open file handles. Even
>>with
>> >>no
>> >> data, there will be open files for each topic.
>> >>
>> >> -Todd
>> >>
>> >>
>> >> On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
>> >>
>> >> >Folks,
>> >> >Is there any potential issue with creating 240 topics every day?
>> >>Although
>> >> >the retention of each topic is set to be 2 days, I am a little
>> >>concerned
>> >> >that since right now there is no delete topic api, the zookeepers
>> >>might be
>> >> >overloaded.
>> >> >Thanks,
>> >> >Chen
>> >>
>> >>
>>
>>

Re: Issue with 240 topics per day

Posted by Chen Wang <ch...@gmail.com>.
Got it. Thanks for the input, Todd!
Chen


On Mon, Aug 11, 2014 at 9:31 PM, Todd Palino <tp...@linkedin.com.invalid>
wrote:

> As I noted, we have a cluster right now with 70k partitions. It’s running
> on over 30 brokers, partly to cover the number of partitions and
> partly to cover the amount of data that we push through it. If you can
> have at least 4 or 5 brokers, I wouldn’t anticipate any problems with the
> number of partitions. You may need more than that depending on the
> throughput you want to handle.
>
> -Todd
>
> On 8/11/14, 9:20 PM, "Chen Wang" <ch...@gmail.com> wrote:
>
> >Todd,
> >Yes I actually thought about that. My concern is that even a weeks topic
> >partition(240*7*3 = 5040) is too many. Does linkedin have a good
> >experience
> >in using this many topics in your system?:-)
> >Thanks,
> >Chen
> >
> >
> >On Mon, Aug 11, 2014 at 9:02 PM, Todd Palino
> ><tp...@linkedin.com.invalid>
> >wrote:
> >
> >> In order to delete topics, you need to shut down the entire cluster (all
> >> brokers), delete the topics from Zookeeper, and delete the log files and
> >> partition directory from the disk on the brokers. Then you can restart
> >>the
> >> cluster. Assuming that you can take a periodic outage on your cluster,
> >>you
> >> can do it this way.
> >>
> >> Reading what you’re intending to do in other parts of this thread, have
> >> you considered setting up 1 week’s worth of topics with 3 day retention,
> >> and having your producer and consumer rotate between them. That is, on
> >> Sunday at 12:00 AM, you start with topic1, then proceed to topic2 at
> >> 12:06, and so on. The next week, you loop around over exactly the same
> >> topics, knowing that the retention settings have cleared out the old
> >>data.
> >>
> >> -Todd
> >>
> >> On 8/11/14, 4:45 PM, "Chen Wang" <ch...@gmail.com> wrote:
> >>
> >> >Todd,
> >> >I actually only intend to keep each topic valid for 3 days most. Each
> >>of
> >> >our topic has 3 partitions, so its around 3*240*3 =2160 partitions.
> >>Since
> >> >there is no api for deleting topic, i guess i could set up a cron job
> >> >deleting the out dated topics(folders) from zookeeper..
> >> >do you know when the delete topic api will be available in kafka?
> >> >Chen
> >> >
> >> >
> >> >On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino
> >> ><tp...@linkedin.com.invalid>
> >> >wrote:
> >> >
> >> >> You need to consider your total partition count as you do this.
> >>After 30
> >> >> days, assuming 1 partition per topic, you have 7200 partitions.
> >> >>Depending
> >> >> on how many brokers you have, this can start to be a problem. We just
> >> >> found an issue on one of our clusters that has over 70k partitions
> >>that
> >> >> there's now a problem with doing actions like a preferred replica
> >> >>election
> >> >> for all topics because the JSON object that gets written to the
> >> >>zookeeper
> >> >> node to trigger it is too large for Zookeeper's default 1 MB data
> >>size.
> >> >>
> >> >> You also need to think about the number of open file handles. Even
> >>with
> >> >>no
> >> >> data, there will be open files for each topic.
> >> >>
> >> >> -Todd
> >> >>
> >> >>
> >> >> On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
> >> >>
> >> >> >Folks,
> >> >> >Is there any potential issue with creating 240 topics every day?
> >> >>Although
> >> >> >the retention of each topic is set to be 2 days, I am a little
> >> >>concerned
> >> >> >that since right now there is no delete topic api, the zookeepers
> >> >>might be
> >> >> >overloaded.
> >> >> >Thanks,
> >> >> >Chen
> >> >>
> >> >>
> >>
> >>
>
>

Re: Issue with 240 topics per day

Posted by Todd Palino <tp...@linkedin.com.INVALID>.
As I noted, we have a cluster right now with 70k partitions. It’s running
on over 30 brokers, partly to cover the number of partitions and
partly to cover the amount of data that we push through it. If you can
have at least 4 or 5 brokers, I wouldn’t anticipate any problems with the
number of partitions. You may need more than that depending on the
throughput you want to handle.

-Todd

On 8/11/14, 9:20 PM, "Chen Wang" <ch...@gmail.com> wrote:

>Todd,
>Yes I actually thought about that. My concern is that even a weeks topic
>partition(240*7*3 = 5040) is too many. Does linkedin have a good
>experience
>in using this many topics in your system?:-)
>Thanks,
>Chen
>
>
>On Mon, Aug 11, 2014 at 9:02 PM, Todd Palino
><tp...@linkedin.com.invalid>
>wrote:
>
>> In order to delete topics, you need to shut down the entire cluster (all
>> brokers), delete the topics from Zookeeper, and delete the log files and
>> partition directory from the disk on the brokers. Then you can restart
>>the
>> cluster. Assuming that you can take a periodic outage on your cluster,
>>you
>> can do it this way.
>>
>> Reading what you’re intending to do in other parts of this thread, have
>> you considered setting up 1 week’s worth of topics with 3 day retention,
>> and having your producer and consumer rotate between them. That is, on
>> Sunday at 12:00 AM, you start with topic1, then proceed to topic2 at
>> 12:06, and so on. The next week, you loop around over exactly the same
>> topics, knowing that the retention settings have cleared out the old
>>data.
>>
>> -Todd
>>
>> On 8/11/14, 4:45 PM, "Chen Wang" <ch...@gmail.com> wrote:
>>
>> >Todd,
>> >I actually only intend to keep each topic valid for 3 days most. Each
>>of
>> >our topic has 3 partitions, so its around 3*240*3 =2160 partitions.
>>Since
>> >there is no api for deleting topic, i guess i could set up a cron job
>> >deleting the out dated topics(folders) from zookeeper..
>> >do you know when the delete topic api will be available in kafka?
>> >Chen
>> >
>> >
>> >On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino
>> ><tp...@linkedin.com.invalid>
>> >wrote:
>> >
>> >> You need to consider your total partition count as you do this.
>>After 30
>> >> days, assuming 1 partition per topic, you have 7200 partitions.
>> >>Depending
>> >> on how many brokers you have, this can start to be a problem. We just
>> >> found an issue on one of our clusters that has over 70k partitions
>>that
>> >> there's now a problem with doing actions like a preferred replica
>> >>election
>> >> for all topics because the JSON object that gets written to the
>> >>zookeeper
>> >> node to trigger it is too large for Zookeeper's default 1 MB data
>>size.
>> >>
>> >> You also need to think about the number of open file handles. Even
>>with
>> >>no
>> >> data, there will be open files for each topic.
>> >>
>> >> -Todd
>> >>
>> >>
>> >> On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
>> >>
>> >> >Folks,
>> >> >Is there any potential issue with creating 240 topics every day?
>> >>Although
>> >> >the retention of each topic is set to be 2 days, I am a little
>> >>concerned
>> >> >that since right now there is no delete topic api, the zookeepers
>> >>might be
>> >> >overloaded.
>> >> >Thanks,
>> >> >Chen
>> >>
>> >>
>>
>>


Re: Issue with 240 topics per day

Posted by Chen Wang <ch...@gmail.com>.
Todd,
Yes, I actually thought about that. My concern is that even a week's worth
of topic partitions (240*7*3 = 5040) is too many. Does LinkedIn have good
experience using this many topics in your system? :-)
Thanks,
Chen


On Mon, Aug 11, 2014 at 9:02 PM, Todd Palino <tp...@linkedin.com.invalid>
wrote:

> In order to delete topics, you need to shut down the entire cluster (all
> brokers), delete the topics from Zookeeper, and delete the log files and
> partition directory from the disk on the brokers. Then you can restart the
> cluster. Assuming that you can take a periodic outage on your cluster, you
> can do it this way.
>
> Reading what you’re intending to do in other parts of this thread, have
> you considered setting up 1 week’s worth of topics with 3 day retention,
> and having your producer and consumer rotate between them. That is, on
> Sunday at 12:00 AM, you start with topic1, then proceed to topic2 at
> 12:06, and so on. The next week, you loop around over exactly the same
> topics, knowing that the retention settings have cleared out the old data.
>
> -Todd
>
> On 8/11/14, 4:45 PM, "Chen Wang" <ch...@gmail.com> wrote:
>
> >Todd,
> >I actually only intend to keep each topic valid for 3 days most. Each of
> >our topic has 3 partitions, so its around 3*240*3 =2160 partitions. Since
> >there is no api for deleting topic, i guess i could set up a cron job
> >deleting the out dated topics(folders) from zookeeper..
> >do you know when the delete topic api will be available in kafka?
> >Chen
> >
> >
> >On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino
> ><tp...@linkedin.com.invalid>
> >wrote:
> >
> >> You need to consider your total partition count as you do this. After 30
> >> days, assuming 1 partition per topic, you have 7200 partitions.
> >>Depending
> >> on how many brokers you have, this can start to be a problem. We just
> >> found an issue on one of our clusters that has over 70k partitions that
> >> there's now a problem with doing actions like a preferred replica
> >>election
> >> for all topics because the JSON object that gets written to the
> >>zookeeper
> >> node to trigger it is too large for Zookeeper's default 1 MB data size.
> >>
> >> You also need to think about the number of open file handles. Even with
> >>no
> >> data, there will be open files for each topic.
> >>
> >> -Todd
> >>
> >>
> >> On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
> >>
> >> >Folks,
> >> >Is there any potential issue with creating 240 topics every day?
> >>Although
> >> >the retention of each topic is set to be 2 days, I am a little
> >>concerned
> >> >that since right now there is no delete topic api, the zookeepers
> >>might be
> >> >overloaded.
> >> >Thanks,
> >> >Chen
> >>
> >>
>
>

Re: Issue with 240 topics per day

Posted by Todd Palino <tp...@linkedin.com.INVALID>.
In order to delete topics, you need to shut down the entire cluster (all
brokers), delete the topics from Zookeeper, and delete the log files and
partition directory from the disk on the brokers. Then you can restart the
cluster. Assuming that you can take a periodic outage on your cluster, you
can do it this way.
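
For the ZooKeeper half of that procedure, something along these lines would work (a sketch only, to be run with the cluster fully stopped; the connect string and topic name are made up, and the znode paths are the usual defaults -- double-check them against your Kafka version before deleting anything):

    import org.apache.zookeeper.ZooKeeper;

    public class DeleteTopicZnodes {
        // Recursively delete a znode and all of its children, if it exists.
        static void deleteRecursive(ZooKeeper zk, String path) throws Exception {
            if (zk.exists(path, false) == null) {
                return;
            }
            for (String child : zk.getChildren(path, false)) {
                deleteRecursive(zk, path + "/" + child);
            }
            zk.delete(path, -1);                       // -1 matches any version
        }

        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, null);  // illustrative connect string
            String topic = "topic_20140811_0006";                   // made-up dated topic name
            deleteRecursive(zk, "/brokers/topics/" + topic);        // topic and partition metadata
            deleteRecursive(zk, "/config/topics/" + topic);         // per-topic config overrides
            zk.close();
        }
    }

The log directories for the topic still have to be removed from each broker's disk before the cluster comes back up.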

Reading what you’re intending to do in other parts of this thread, have
you considered setting up 1 week's worth of topics with 3-day retention,
and having your producer and consumer rotate between them? That is, on
Sunday at 12:00 AM, you start with topic1, then proceed to topic2 at
12:06, and so on. The next week, you loop around over exactly the same
topics, knowing that the retention settings have cleared out the old data.
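
A sketch of how the producer and consumer could agree on the rotating topic name (illustrative only; the naming scheme is invented, and this version wraps on the epoch rather than anchoring the rotation to Sunday midnight, which would need a small calendar adjustment):

    public class RotatingTopicName {
        private static final long SLOT_MS = 6 * 60 * 1000L;   // one 6-minute slot
        private static final int SLOTS_PER_WEEK = 7 * 240;    // 1680 topics in the weekly rotation

        // Map a timestamp to one of the 1680 fixed topic names. The same name comes
        // around again a week later, after 3-day retention has cleared the old data.
        static String topicFor(long timestampMs) {
            long slot = (timestampMs / SLOT_MS) % SLOTS_PER_WEEK;
            return "events_slot_" + slot;                      // hypothetical naming scheme
        }

        public static void main(String[] args) {
            System.out.println(topicFor(System.currentTimeMillis()));
        }
    }

Both sides only ever see a fixed set of topic names, so nothing needs to be created or deleted on a schedule.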

-Todd

On 8/11/14, 4:45 PM, "Chen Wang" <ch...@gmail.com> wrote:

>Todd,
>I actually only intend to keep each topic valid for 3 days most. Each of
>our topic has 3 partitions, so its around 3*240*3 =2160 partitions. Since
>there is no api for deleting topic, i guess i could set up a cron job
>deleting the out dated topics(folders) from zookeeper..
>do you know when the delete topic api will be available in kafka?
>Chen
>
>
>On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino
><tp...@linkedin.com.invalid>
>wrote:
>
>> You need to consider your total partition count as you do this. After 30
>> days, assuming 1 partition per topic, you have 7200 partitions.
>>Depending
>> on how many brokers you have, this can start to be a problem. We just
>> found an issue on one of our clusters that has over 70k partitions that
>> there's now a problem with doing actions like a preferred replica
>>election
>> for all topics because the JSON object that gets written to the
>>zookeeper
>> node to trigger it is too large for Zookeeper's default 1 MB data size.
>>
>> You also need to think about the number of open file handles. Even with
>>no
>> data, there will be open files for each topic.
>>
>> -Todd
>>
>>
>> On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
>>
>> >Folks,
>> >Is there any potential issue with creating 240 topics every day?
>>Although
>> >the retention of each topic is set to be 2 days, I am a little
>>concerned
>> >that since right now there is no delete topic api, the zookeepers
>>might be
>> >overloaded.
>> >Thanks,
>> >Chen
>>
>>


Re: Issue with 240 topics per day

Posted by Chen Wang <ch...@gmail.com>.
Todd,
I actually only intend to keep each topic valid for 3 days at most. Each of
our topics has 3 partitions, so it's around 3*240*3 = 2160 partitions. Since
there is no API for deleting a topic, I guess I could set up a cron job
deleting the outdated topics (folders) from ZooKeeper.
Do you know when the delete-topic API will be available in Kafka?
Chen


On Mon, Aug 11, 2014 at 3:47 PM, Todd Palino <tp...@linkedin.com.invalid>
wrote:

> You need to consider your total partition count as you do this. After 30
> days, assuming 1 partition per topic, you have 7200 partitions. Depending
> on how many brokers you have, this can start to be a problem. We just
> found an issue on one of our clusters that has over 70k partitions that
> there's now a problem with doing actions like a preferred replica election
> for all topics because the JSON object that gets written to the zookeeper
> node to trigger it is too large for Zookeeper's default 1 MB data size.
>
> You also need to think about the number of open file handles. Even with no
> data, there will be open files for each topic.
>
> -Todd
>
>
> On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:
>
> >Folks,
> >Is there any potential issue with creating 240 topics every day? Although
> >the retention of each topic is set to be 2 days, I am a little concerned
> >that since right now there is no delete topic api, the zookeepers might be
> >overloaded.
> >Thanks,
> >Chen
>
>

Re: Issue with 240 topics per day

Posted by Todd Palino <tp...@linkedin.com.INVALID>.
You need to consider your total partition count as you do this. After 30
days, assuming 1 partition per topic, you have 7200 partitions. Depending
on how many brokers you have, this can start to be a problem. We just
found an issue on one of our clusters, which has over 70k partitions:
there's now a problem with doing actions like a preferred replica election
for all topics, because the JSON object that gets written to the ZooKeeper
node to trigger it is too large for ZooKeeper's default 1 MB data size.

You also need to think about the number of open file handles. Even with no
data, there will be open files for each topic.

-Todd


On 8/11/14, 2:19 PM, "Chen Wang" <ch...@gmail.com> wrote:

>Folks,
>Is there any potential issue with creating 240 topics every day? Although
>the retention of each topic is set to be 2 days, I am a little concerned
>that since right now there is no delete topic api, the zookeepers might be
>overloaded.
>Thanks,
>Chen