Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2017/03/03 08:15:51 UTC

kafka and zookeeper set up in prod for spark streaming

Hi,

In DEV, Kafka and ZooKeeper services can be co-located on the same
physical hosts.

In Prod, moving forward, do we need to set up ZooKeeper on its own cluster,
not shared with the Hadoop cluster? Or can these services be shared within
the Hadoop cluster?

How best to set up the ZooKeeper that Kafka needs for use with Spark
Streaming?

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Re: kafka and zookeeper set up in prod for spark streaming

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks all. What about Kafka HA, which is important? Is it best to use
application-specific Kafka delivery or Kafka MirrorMaker?

Cheers

Dr Mich Talebzadeh




Re: kafka and zookeeper set up in prod for spark streaming

Posted by vincent gromakowski <vi...@gmail.com>.
I forgot to mention that it also depends on the Spark Kafka connector you
use. If it is receiver-based, I recommend a dedicated ZooKeeper cluster
because ZooKeeper is used to store offsets. If it is receiver-less (the
direct approach), ZooKeeper can be shared.
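
The difference between the two connectors can be sketched roughly as
follows, assuming the Kafka 0.8 integration that PySpark shipped in Spark
1.x/2.x (pyspark.streaming.kafka, removed in Spark 3.0). The host names,
group id and topic name are hypothetical, and the helper functions are only
illustrative. The point is which endpoint each connector is given: the
receiver-based one takes a ZooKeeper quorum, the direct one takes a broker
list.

```python
def receiver_stream_args(zk_quorum, group_id, topics):
    """Arguments for the receiver-based connector.

    It is handed a ZooKeeper quorum, and consumed offsets are kept there.
    `topics` maps topic name -> number of consumer threads.
    """
    return {"zkQuorum": zk_quorum, "groupId": group_id, "topics": dict(topics)}


def direct_stream_args(brokers, topics):
    """Arguments for the receiver-less (direct) connector.

    It is handed the Kafka brokers only; offsets are tracked by Spark
    Streaming itself rather than committed to ZooKeeper.
    """
    return {"topics": list(topics),
            "kafkaParams": {"metadata.broker.list": brokers}}


def open_receiver_stream(ssc, zk_quorum, group_id, topics):
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.streaming.kafka import KafkaUtils
    a = receiver_stream_args(zk_quorum, group_id, topics)
    return KafkaUtils.createStream(ssc, a["zkQuorum"], a["groupId"], a["topics"])


def open_direct_stream(ssc, brokers, topics):
    from pyspark.streaming.kafka import KafkaUtils
    a = direct_stream_args(brokers, topics)
    return KafkaUtils.createDirectStream(ssc, a["topics"], a["kafkaParams"])


# Hypothetical endpoints, for illustration only.
receiver_args = receiver_stream_args("zk1:2181,zk2:2181,zk3:2181",
                                     "example-group", {"example-topic": 1})
direct_args = direct_stream_args("kafka1:9092,kafka2:9092", ["example-topic"])
```

With the direct connector the streaming job puts no offset traffic on
ZooKeeper, which is why sharing the ensemble is less of a concern there.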


Re: kafka and zookeeper set up in prod for spark streaming

Posted by Jörn Franke <jo...@gmail.com>.
I think this depends heavily on how much risk you are willing to accept. If you run ZooKeeper on dedicated nodes, it is less affected by other processes.

I have seen both setups: on Hadoop nodes and on dedicated nodes. On Hadoop, I would not recommend putting it on data nodes or other heavily utilized nodes.

ZooKeeper does not need many resources (if you do not abuse it), so you may want to consider putting it on a small dedicated infrastructure of several nodes.
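
As a rough guide to what "several nodes" buys you: a ZooKeeper ensemble stays available only while a majority of its servers are up. The sketch below is plain arithmetic on that majority rule; the function names are illustrative, and the thread itself does not prescribe a node count.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# Print a small table: ensemble size, quorum size, tolerated failures.
for n in (1, 3, 4, 5, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

An even-sized ensemble tolerates no more failures than the odd size below it (4 servers tolerate 1 failure, the same as 3), which is why odd sizes such as 3 or 5 are the usual choice.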


Re: kafka and zookeeper set up in prod for spark streaming

Posted by vincent gromakowski <vi...@gmail.com>.
Hi,
Depending on the Kafka version (< 0.8.2, I think), offsets are managed in
ZooKeeper, and if you have lots of consumers it is recommended to use a
dedicated ZooKeeper cluster (always with dedicated disks; SSDs are even
better). On newer versions, offsets are managed in special Kafka topics and
ZooKeeper is only used to store metadata, so you can share it with Hadoop.
You might reach a limit depending on the size of your Kafka cluster, the
number of topics, producers/consumers... but I have never heard of that
happening. Another point is to be careful about security on ZooKeeper:
sharing a cluster means you get the same security level (authentication or
not).
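
For the older high-level consumer, where offsets are stored is a consumer
setting. The sketch below builds such a configuration, assuming the
0.8.2-era property names offsets.storage and dual.commit.enabled; the
ZooKeeper address and group id are hypothetical, and the helper functions
are only illustrative. Check the property names against the documentation
for your Kafka version before relying on them.

```python
def old_consumer_properties(zk_connect, group_id, offsets_in_kafka=True):
    """Build properties for the old (ZooKeeper-based) high-level consumer."""
    props = {
        # Still required: this consumer uses ZooKeeper for group metadata.
        "zookeeper.connect": zk_connect,
        "group.id": group_id,
        # "kafka" keeps offsets in an internal Kafka topic,
        # "zookeeper" keeps them in ZooKeeper znodes.
        "offsets.storage": "kafka" if offsets_in_kafka else "zookeeper",
    }
    if offsets_in_kafka:
        # Stop committing offsets to ZooKeeper as well. Dual commit is
        # meant for the migration period only.
        props["dual.commit.enabled"] = "false"
    return props


def to_properties_text(props):
    """Render the settings in Java .properties style, sorted by key."""
    return "\n".join("{}={}".format(k, v) for k, v in sorted(props.items()))


# Hypothetical ZooKeeper address and group id, for illustration only.
example = old_consumer_properties("zk1:2181/kafka", "example-group")
print(to_properties_text(example))
```

With offsets kept in Kafka, consumer commits no longer turn into ZooKeeper
writes, which is the reason a shared ensemble becomes more reasonable.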
