Posted to users@kafka.apache.org by "Doyle, Keith" <Ke...@greenwayhealth.com> on 2016/08/11 23:43:55 UTC

Co-locating brokers

Hi,


We're considering Kafka to provide a queued transport mechanism for an ETL process with near-real-time capability.  Kafka is looking pretty good but I'm wondering about a couple of things.

I'm not sure what the right topology is. My first inclination is to co-locate a broker on each client server to provide a queueing mechanism on the clients, just for getting the data from the databases into Kafka. This would allow the data to back up on a client if necessary, without holding off the producer, in the case of external network or server availability problems. Using broker replication, the queue would then be duplicated on the warehouse server, where the consumer can process the data for storage in the warehouse database. So each client database server would get its own Kafka partition, replicated across two brokers: one residing on the client and one on the warehouse.

Otherwise, network outages or performance hits could slow the producer, whenever it is unable to contact a broker, to the point where it can't keep up, and we'd need to implement a storage queue for that as well. Some amount of database-to-producer queue may still be required, but I was hoping to keep it short and rely on the local broker to minimize the problem. My thinking is to shorten the path for getting the data into Kafka by providing a local broker, and let its replication abilities take over from there.

Does it make any sense to think about it this way?   I realize this would mean the client broker could then persist a lot of data, eating up disk space but that's the nature of the problem if the source database is producing a lot of transactions which need to be stored somewhere.
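
As a rough way to reason about the disk-space trade-off above, the backlog that builds up during an outage is simply message rate times message size times outage duration. The sketch below compares that against the Java producer's in-memory buffer (buffer.memory, which defaults to 32 MB). The message rate, message size, and outage length are made-up numbers for illustration, not figures from this thread.

```python
# Back-of-the-envelope sizing. All workload numbers here are hypothetical.

# Default value of the Kafka producer setting buffer.memory, in bytes.
DEFAULT_BUFFER_MEMORY = 32 * 1024 * 1024


def backlog_bytes(msgs_per_sec, avg_msg_bytes, outage_sec):
    """Bytes that pile up while nothing can be delivered downstream."""
    return msgs_per_sec * avg_msg_bytes * outage_sec


def seconds_until_full(capacity_bytes, msgs_per_sec, avg_msg_bytes):
    """How long a buffer of the given size can absorb the incoming stream."""
    return capacity_bytes / (msgs_per_sec * avg_msg_bytes)


if __name__ == "__main__":
    rate, size = 2000, 1024  # assumed: 2000 messages/s of 1 KiB each
    print("producer buffer lasts (s):",
          seconds_until_full(DEFAULT_BUFFER_MEMORY, rate, size))
    print("one-hour outage backlog (GiB):",
          backlog_bytes(rate, size, 3600) / 1024 ** 3)
```

With these assumed numbers the default producer buffer covers only about 16 seconds, while an hour-long outage needs roughly 6.9 GiB of storage somewhere, which is why an on-disk queue comes into the picture.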

Also, are Kafka's published latency figures based on a single broker, without taking broker-to-broker replication across networks into account? What effect does broker-to-broker replication over a network have on latency?

I'm also wondering if a queue can be cleared "on demand."   I know you can configure the persistence based on time or size, but I'm wondering if the consumer could trigger the removal of data as the messages are processed.


--

Keith Doyle
Greenway Health

NOTICE: This e-mail message and all attachments transmitted with it may contain legally privileged and confidential information intended solely for the use of the addressee. If the reader of this message is not the intended recipient, you are hereby notified that any reading, dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by electronic mail and delete this message and all copies and backups thereof. Thank you. Greenway Health.

Re: Co-locating brokers

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
You would have to give the client servers their own "cluster" and use MirrorMaker to replicate to the main cluster.
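
A minimal sketch of what that setup might look like: MirrorMaker reads from the client-side cluster with a consumer config and writes to the main cluster with a producer config. The host names, group id, and topic name below are hypothetical placeholders, and depending on the Kafka version the consumer side may need zookeeper.connect instead of bootstrap.servers.

```python
# Sketch: render the two config files and the command line for MirrorMaker.
# Host names, group id, and topic name are hypothetical placeholders.


def render_properties(settings):
    """Format a dict as Java-style .properties text, one key=value per line."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


# MirrorMaker consumes from the source (client-side) cluster...
consumer_props = render_properties({
    "bootstrap.servers": "client-db-01:9092",
    "group.id": "etl-mirror",
})

# ...and produces to the target (main/warehouse) cluster.
producer_props = render_properties({
    "bootstrap.servers": "warehouse-01:9092",
    "acks": "all",
})

# Command line that would be run from the Kafka installation directory,
# after saving the two texts above as consumer.properties and
# producer.properties.
mirror_maker_cmd = [
    "bin/kafka-mirror-maker.sh",
    "--consumer.config", "consumer.properties",
    "--producer.config", "producer.properties",
    "--whitelist", "etl-events",
]

if __name__ == "__main__":
    print(consumer_props)
    print(producer_props)
    print(" ".join(mirror_maker_cmd))
```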

I haven't found a way to clear on demand, but you can temporarily set the topic's retention time (retention.ms) very short, like one second, and wait for Kafka to clear the messages.
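
The retention trick could be scripted roughly as follows. This sketch only builds the kafka-configs.sh command lines and does not run them; the ZooKeeper address and topic name are hypothetical, and newer Kafka releases take --bootstrap-server instead of --zookeeper.

```python
# Sketch: build kafka-configs.sh invocations that shorten a topic's retention
# and later remove the override again. Address and topic are hypothetical.


def retention_commands(zookeeper, topic, temporary_ms=1000):
    """Return (shorten, restore) argument lists for kafka-configs.sh."""
    base = [
        "bin/kafka-configs.sh", "--zookeeper", zookeeper, "--alter",
        "--entity-type", "topics", "--entity-name", topic,
    ]
    # Step 1: override retention.ms on the topic with a very small value.
    shorten = base + ["--add-config", f"retention.ms={temporary_ms}"]
    # Step 2: drop the override so the broker default applies again.
    restore = base + ["--delete-config", "retention.ms"]
    return shorten, restore


if __name__ == "__main__":
    shorten, restore = retention_commands("zk-01:2181", "etl-events")
    print(" ".join(shorten))
    print(" ".join(restore))
```

Two caveats: the broker only checks for expired log segments periodically (log.retention.check.interval.ms, five minutes by default), so "wait" can mean minutes, and if the topic already had its own retention.ms override, the restore step should set that value back rather than delete it.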

Dave

This e-mail and any files transmitted with it are confidential, may contain sensitive information, and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error, please notify the sender by reply e-mail immediately and destroy all copies of the e-mail and any attachments.