Posted to users@kafka.apache.org by Hyounmin Wang <hy...@gmail.com> on 2016/07/05 06:49:40 UTC

Kafka Beginners planning problem.

Hi there!

I'm a new-grad engineer and pretty new to the Kafka world.

I'm trying to replace RabbitMQ with Apache Kafka, and while planning I
bumped into several conceptual planning problems.

First, we currently use RabbitMQ with a per-user queue policy, meaning each
user has one queue. This suits our needs because each user represents some
job to be done for that particular user, and if that user causes a problem,
the other users are unaffected because the queues are separated. (By
"problem" I mean this: messages in the queue are dispatched to the users via
HTTP requests. If a user refuses to receive a message (their server is down,
perhaps), the message goes back into a retry queue, so no messages are lost
unless the queue itself goes down.)

Now, Kafka is fault tolerant and failure-safe because it writes messages to
disk, and that is exactly why I am trying to bring Kafka into our architecture.

But I have run into some problems with my planning.

First, I was thinking of creating one topic per user, so each user would
have their own topic. (What problems will this cause? My maximum estimate is
around 1 to 5 million topics.)
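A rough count of what a topic-per-user layout implies: every partition replica is a separate log on some broker's disk, and in the Kafka versions of this era the controller and ZooKeeper track state for every topic and partition. The sketch below just multiplies it out; the one-partition-per-topic and replication-factor-3 defaults are illustrative assumptions, and `partition_replicas` is a hypothetical helper.

```python
# Back-of-the-envelope count of partition replicas for a topic-per-user design.
# Both default values below are illustrative assumptions, not from the thread.

def partition_replicas(topics: int, partitions_per_topic: int = 1,
                       replication_factor: int = 3) -> int:
    """Each partition replica is its own log (a directory of segment files)
    on some broker, and cluster metadata has to track every partition."""
    return topics * partitions_per_topic * replication_factor

for users in (1_000_000, 5_000_000):
    print(f"{users:,} topics -> {partition_replicas(users):,} partition replicas")
# 1,000,000 topics -> 3,000,000 partition replicas
# 5,000,000 topics -> 15,000,000 partition replicas
```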

Second, if I instead create topics based on operation and partition them by
a hash of the user ID, what happens when one user is currently not consuming
messages? Will all the users in that partition have to wait? What would be
the best way to structure this?
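To make the "will everyone in the partition wait?" question concrete: a Kafka consumer reads a partition in offset order, so if the consumer's logic keeps retrying a failed delivery in place instead of moving on, every later message in that partition waits, including messages for other users who happen to hash to it. The sketch below simulates that with plain Python lists; `drain_blocking`, `is_up`, and the user names are hypothetical stand-ins, not Kafka APIs.

```python
from collections import deque
from typing import Callable

def drain_blocking(partition: list[tuple[str, str]],
                   is_up: Callable[[str], bool],
                   max_attempts: int = 3) -> list[tuple[str, str]]:
    """Consume one partition strictly in offset order, retrying failures in place.

    Returns the (user, message) pairs delivered before the consumer gives up.
    `is_up(user)` is a hypothetical stand-in for the HTTP delivery call.
    """
    delivered = []
    pending = deque(partition)
    while pending:
        user, msg = pending[0]
        for _ in range(max_attempts):
            if is_up(user):
                delivered.append((user, msg))
                pending.popleft()
                break
        else:
            # Every attempt failed. Because the consumer never moves past this
            # record, all later records in the partition (other users) wait too.
            return delivered
    return delivered

partition = [("alice", "m1"), ("bob", "m2"), ("carol", "m3")]
print(drain_blocking(partition, is_up=lambda user: user != "bob"))
# [('alice', 'm1')]  (carol's message is stuck behind bob's)
```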

In short: we have 1 to 5 million users, and we do not want one user to
block a large number of other users from being processed. Having a topic per
user would solve this, but it seems there might be an issue with ZooKeeper
if the number of topics gets that large. (Is this true?)

What would be the best way to structure this, considering scalability?

Re: Kafka Beginners planning problem.

Posted by Hyounmin Wang <hy...@gmail.com>.
Hi David,

Thank you for your comments. My concern with that idea is that having only
one topic will slow a lot of things down. I am assuming we will have at
least 6 to 7 physical consumers, so I think it is safe to have more topics
(separate topics by operation, perhaps?).
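For context on the "more topics for more consumers" idea: within one consumer group, Kafka assigns each partition to exactly one consumer, so it is the number of partitions, not topics, that caps how many consumers can work in parallel. The sketch below mimics the contiguous-range style of Kafka's range assignor for a single topic; `assign_range` is a hypothetical helper, and the partition and consumer counts are illustrative.

```python
def assign_range(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Split partitions 0..num_partitions-1 into contiguous ranges, one per consumer.

    The first `num_partitions % len(consumers)` consumers (in sorted order) get
    one extra partition; consumers beyond the partition count get nothing.
    """
    consumers = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment

group = [f"consumer-{i}" for i in range(7)]
print(assign_range(12, group))  # 7 consumers share 12 partitions of one topic
print(assign_range(4, group))   # only 4 of the 7 consumers get any partitions
```

So a single topic with enough partitions already keeps 6 to 7 consumers busy; extra topics are not needed for parallelism.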

Also, with your approach, wouldn't partitions be created for 100 million
users? As far as I know, each partition is backed by files on disk, which
means that many partitions would slow the entire system down. (Am I even
correct on this?)

It's all a matter of making sure user A's activity does not block user B.
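One common pattern for keeping user A from blocking user B (not something proposed in this thread) mirrors the RabbitMQ retry queue described above: when a delivery fails, the consumer republishes the message to a separate retry topic and moves on, so the main partition keeps flowing. The sketch below simulates this with in-memory lists standing in for topics; `process_partition`, `deliver`, and the retry list are hypothetical, not Kafka client APIs.

```python
from typing import Callable

Record = tuple[str, str]  # (user_id, payload)

def process_partition(records: list[Record],
                      deliver: Callable[[str, str], bool]) -> tuple[list[Record], list[Record]]:
    """Consume records in order without ever stalling on one user.

    `deliver` stands in for the HTTP dispatch and returns True on success.
    Failed records go to a retry list (a stand-in for a retry topic) and the
    consumer continues with the next record.
    """
    delivered, retry = [], []
    for user, payload in records:
        if deliver(user, payload):
            delivered.append((user, payload))
        else:
            retry.append((user, payload))  # in Kafka, this would be produced to a retry topic
    return delivered, retry

records = [("alice", "m1"), ("bob", "m2"), ("carol", "m3"), ("bob", "m4")]
down = {"bob"}  # pretend bob's server is down
delivered, retry = process_partition(records, lambda user, _: user not in down)
print(delivered)  # [('alice', 'm1'), ('carol', 'm3')]
print(retry)      # [('bob', 'm2'), ('bob', 'm4')]
```

The trade-off is per-user ordering: if bob's server came back between m2 and m4, m4 would be delivered before m2 is retried. Keeping order would require also diverting any later message for a user who already has pending retries.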

Thank you for your answers!


On Wed, Jul 6, 2016 at 12:24 AM, David Newberger <
david.newberger@wandcorp.com> wrote:

> Hi,
>
> I think the recommended approach to this would be to have a single topic
> and partition it by userId. This will give you locality and order by user.
> If you think about it, this would give you a better ordering guarantee than
> if you had one topic per user. It's also a lot more efficient. If you are
> using Kafka as a log or messaging system, you really should not need
> millions of topics or partitions. If I'm misunderstanding the use case,
> please let me know.
>
> Cheers,
>
> David Newberger
>
> -----Original Message-----
> From: Hyounmin Wang [mailto:hyunmin90@gmail.com]
> Sent: Tuesday, July 5, 2016 1:50 AM
> To: users@kafka.apache.org
> Subject: Kafka Beginners planning problem.
>
> Hi there!
>
> I'm new grad engineer and is pretty new to kafka world.
>
> I'm trying to replace rabbit mq with apache-kafka and while planning, I
> bumped in to several conceptual planning problem.
>
> First we are using rabbit mq for per user queue policy meaning each user
> uses one queue. This suits our need because each user represent some job to
> be done with that particular user, and if that user causes a problem, the
> queue will never have a problem with other users because queues are
> seperated ( Problem meaning messages in the queue will be dispatch to the
> users using http request. If user refuses to receive a message (server down
> perhaps?) it will go back in retry queue, which will result in no loses of
> message (Unless queue goes down))
>
> Now kafka is fault tolerant and failure safe because it write to a disk.
> And its exactly why I am trying to implement kafka to our structure.
>
> but there are problem to my plannings.
>
> First, I was thinking to create as many topic as per user meaning each
> user would have each topic (What problem will this cause? My max estimate
> is that I will have around 1~5 million topics)
>
> Second, If I decide to go for topics based on operation and partition by
> random hash of users id, if there was a problem with one user not consuming
> message currently, will the all user in the partition have to wait ? What
> would be the best way to structure this situation?
>
> So as conclusion, 1~5 millions users. We do not want to have one user
> blocking large number of other users being processed. Having topic per user
> will solve this issue, it seems like there might be an issue with zookeeper
> if such large number gets in (Is this true? )
>
> what would be the best solution for structuring? Considering scalability?
>

RE: Kafka Beginners planning problem.

Posted by David Newberger <da...@wandcorp.com>.
Hi,

I think the recommended approach to this would be to have a single topic and partition it by userId. This will give you locality and ordering by user. If you think about it, this would give you a better ordering guarantee than if you had one topic per user. It's also a lot more efficient. If you are using Kafka as a log or messaging system, you really should not need millions of topics or partitions. If I'm misunderstanding the use case, please let me know.
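A minimal sketch of what "partition it by userId" means: the producer uses the user ID as the message key, and the key is hashed to pick one of a fixed number of partitions. Many users share each partition (there is no partition per user), and all of one user's messages land in the same partition, in the order they were produced. Kafka's default partitioner hashes the key bytes with murmur2; this sketch substitutes `zlib.crc32` as a stand-in, so the exact partition numbers differ from a real cluster. `partition_for` and the count of 12 partitions are illustrative assumptions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 12  # fixed when the topic is created; illustrative value

def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition (crc32 stands in for Kafka's murmur2)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions

# Simulate producing keyed messages: each partition is an append-only list.
partitions = defaultdict(list)
events = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-3", "d"), ("user-1", "e")]
for user_id, payload in events:
    partitions[partition_for(user_id)].append((user_id, payload))

# All of user-1's messages sit in one partition, in production order.
p = partition_for("user-1")
print([payload for uid, payload in partitions[p] if uid == "user-1"])  # ['a', 'c', 'e']

# 100,000 distinct users still map onto at most NUM_PARTITIONS partitions.
used = {partition_for(f"user-{i}") for i in range(100_000)}
print(len(used))
```

Because the mapping is a hash modulo the partition count, adding partitions later moves some users to different partitions, so the per-user ordering guarantee only holds while the count stays the same.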

Cheers, 

David Newberger
