Posted to user@storm.apache.org by Ali Nazemian <al...@gmail.com> on 2017/04/11 01:16:32 UTC

Apache storm HW recommendation

Hi all,

I was wondering if there is any benchmark or recommendation for running
STORM on physical HW vs. virtual machines. I am trying to calculate the HW
requirements for a STORM cluster with a hard SLA. My questions are as
follows.

- How much on-heap and off-heap memory would be required per node? Is there
any additional improvement we may gain by adding more memory? I think the
STORM supervisor is not a disk-intensive workload. Does that mean on-heap
memory is all that matters?

- Is there any rule for calculating the number of required CPU cores per
supervisor node?

- Since Storm is a CPU-intensive rather than a disk-intensive workload, how
bad would it be to colocate STORM with a non-CPU-intensive workload like a
Kafka broker?

Regards,
Ali

Re: Apache storm HW recommendation

Posted by Ambud Sharma <as...@gmail.com>.
Kafka is NOT CPU-intensive, whereas Storm, depending on what you are trying
to accomplish, is. Memory, however, is going to be a point of contention; with
physical machines this is going to be even more evident, as the OS page cache
will be affected. Virtualizing will not provide you any additional
bandwidth, simply separation of the OS to avoid any potential performance
issues due to memory contention.
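
To make the memory-contention point concrete, here is a minimal sketch of a per-host RAM budget. Nothing in it comes from this thread: the function name, the 25% off-heap allowance, the 4 GB OS reserve, and the 32 GB page-cache figure are all hypothetical placeholders that you would replace with numbers from your own profiling.

```python
def node_memory_budget_gb(workers_per_node, heap_gb_per_worker,
                          off_heap_fraction=0.25, os_reserve_gb=4.0,
                          page_cache_gb=0.0):
    """Rough total RAM (GB) one host needs. All defaults are illustrative guesses."""
    # On-heap memory: one JVM heap per Storm worker process.
    heap = workers_per_node * heap_gb_per_worker
    # Off-heap allowance (thread stacks, direct buffers, etc.), assumed
    # here to be a fixed fraction of the heap.
    off_heap = heap * off_heap_fraction
    # page_cache_gb models memory you want left free for the OS page cache,
    # which a colocated Kafka broker would rely on.
    return heap + off_heap + os_reserve_gb + page_cache_gb


# Hypothetical host running 4 workers with 2 GB heaps each.
storm_only = node_memory_budget_gb(workers_per_node=4, heap_gb_per_worker=2.0)
# Same host, additionally leaving 32 GB of page cache for a Kafka broker.
colocated = node_memory_budget_gb(4, 2.0, page_cache_gb=32.0)
print(storm_only, colocated)
```

The gap between the two results is the memory that Storm and Kafka would compete for if the host were sized for Storm alone.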


On Mon, Apr 10, 2017 at 7:54 PM, Ali Nazemian <al...@gmail.com> wrote:

> You are right. It is very application specific.
>
> The advantage of colocating is that dedicating an entire rack server to
> Storm or Kafka alone might leave too much headroom. In a virtualized
> environment it would be easy to just use separate VMs, but on bare metal
> that might not be cost-effective. By colocating on bare metal, we
> eliminate the hypervisor overhead and gain resource locality for extra
> performance. I am investigating the performance and throughput of
> colocating Kafka and Storm on bare-metal hardware vs. separate VMs for
> Kafka and Storm.
>
> Thanks,
> Ali

Re: Apache storm HW recommendation

Posted by Ali Nazemian <al...@gmail.com>.
You are right. It is very application specific.

The advantage of colocating is that dedicating an entire rack server to
Storm or Kafka alone might leave too much headroom. In a virtualized
environment it would be easy to just use separate VMs, but on bare metal
that might not be cost-effective. By colocating on bare metal, we
eliminate the hypervisor overhead and gain resource locality for extra
performance. I am investigating the performance and throughput of
colocating Kafka and Storm on bare-metal hardware vs. separate VMs for
Kafka and Storm.

Thanks,
Ali

On Tue, Apr 11, 2017 at 12:36 PM, Erik Weathers <ew...@groupon.com>
wrote:

> There are so many variables. Again, I think you just need to run the
> application and profile it. Maybe just run it on VMs to get some
> profiling info and then determine if you need real h/w. This is something
> that I don't think you can expect a mailing list to really help you solve.
> Storm is *really* just a bunch of APIs and interfaces for running Java code
> (ignoring the shell-based stuff for running logic written in other
> languages). So the question boils down to: what type of h/w do I need to
> run some arbitrary program? I don't see how anyone can answer that for
> you.
>
> If you wanna co-locate them you could choose to do so; I just like to keep
> stuff separate. These are large, complex systems, and I don't know what
> benefit you'll be gaining from shoving them onto the same host.
>
> - Erik

-- 
A.Nazemian

Re: Apache storm HW recommendation

Posted by Erik Weathers <ew...@groupon.com>.
There are so many variables. Again, I think you just need to run the
application and profile it. Maybe just run it on VMs to get some
profiling info and then determine if you need real h/w. This is something
that I don't think you can expect a mailing list to really help you solve.
Storm is *really* just a bunch of APIs and interfaces for running Java code
(ignoring the shell-based stuff for running logic written in other
languages). So the question boils down to: what type of h/w do I need to
run some arbitrary program? I don't see how anyone can answer that for
you.
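
Once a profiling run exists, turning it into a core count is simple arithmetic. The sketch below assumes you have measured an average CPU cost per tuple across the whole topology; the function name, the 60% utilization target, and the sample figures are hypothetical and are not taken from this thread.

```python
import math


def cores_needed(tuples_per_sec, cpu_ms_per_tuple, target_utilization=0.6):
    """Estimate the core count from a profiled per-tuple CPU cost.

    target_utilization leaves headroom for bursts, GC, and the OS; the
    0.6 default is an arbitrary illustration, not a recommendation.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Cores kept fully busy at the given throughput.
    busy_cores = tuples_per_sec * cpu_ms_per_tuple / 1000.0
    # Scale up so the busy cores are only the targeted share of the total.
    return math.ceil(busy_cores / target_utilization)


# Hypothetical profile: 50,000 tuples/s at 0.2 ms of CPU time per tuple.
print(cores_needed(50_000, 0.2))
```

The result is only as good as the profile behind it: a topology that mostly waits on external stores such as HBase or Elasticsearch is bound by I/O latency rather than CPU, and this estimate would not capture that.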

If you wanna co-locate them you could choose to do so; I just like to keep
stuff separate. These are large, complex systems, and I don't know what
benefit you'll be gaining from shoving them onto the same host.

- Erik

On Mon, Apr 10, 2017 at 7:18 PM, Ali Nazemian <al...@gmail.com> wrote:

> Thank you very much, Erik.
>
> What if I dedicate a separate disk to Storm logs? Let's say, for example,
> dedicating 23 disks to Kafka and one additional disk to Storm.
>
> As for the application, it would parse unstructured data, enrich it with
> additional data stored in HBase, send it to Elasticsearch/Solr, and store
> it on HDFS. Does that narrow down the use case enough to give a better
> understanding of the HW requirements?
>
> Regards,
> Ali

Re: Apache storm HW recommendation

Posted by Ali Nazemian <al...@gmail.com>.
Thank you very much, Erik.

What if I dedicate a separate disk to Storm logs? Let's say, for example,
dedicating 23 disks to Kafka and one additional disk to Storm.

As for the application, it would parse unstructured data, enrich it with
additional data stored in HBase, send it to Elasticsearch/Solr, and store
it on HDFS. Does that narrow down the use case enough to give a better
understanding of the HW requirements?

Regards,
Ali

On Tue, Apr 11, 2017 at 12:04 PM, Erik Weathers <ew...@groupon.com>
wrote:

> Hi Ali,
>
> Unfortunately the answer to these questions is *very* dependent on the
> application logic in your Storm topology. I don't think anyone can really
> speak to many of these questions; they're just too broad. You'll need to do
> your own profiling with your application and figure out your own particular
> resource needs.
>
> I wouldn't recommend running Storm colocated with Kafka on the same host.
> You *can* create a lot of disk IO if you log a lot from your Storm
> topology, and you wouldn't want anything messing with Kafka's ability to
> use the disk (Kafka must read from disk if you read older data that isn't
> resident in memory, and Kafka is also writing everything to disk). But if
> you're using VMs you may not have control over whether Kafka brokers and
> Storm worker nodes get placed onto the same physical host.
>
> - Erik
>
> P.S., Storm isn't normally fully capitalized as you're writing (STORM);
> i.e., it's not an acronym.

-- 
A.Nazemian

Re: Apache storm HW recommendation

Posted by Erik Weathers <ew...@groupon.com>.
Hi Ali,

Unfortunately the answer to these questions is *very* dependent on the
application logic in your Storm topology. I don't think anyone can really
speak to many of these questions; they're just too broad. You'll need to do
your own profiling with your application and figure out your own particular
resource needs.

I wouldn't recommend running Storm colocated with Kafka on the same host.
You *can* create a lot of disk IO if you log a lot from your Storm
topology, and you wouldn't want anything messing with Kafka's ability to
use the disk (Kafka must read from disk if you read older data that isn't
resident in memory, and Kafka is also writing everything to disk). But if
you're using VMs you may not have control over whether Kafka brokers and
Storm worker nodes get placed onto the same physical host.
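
As a rough way to gauge how much disk IO topology logging could generate, the sketch below multiplies throughput by log volume per tuple and compares it with a disk's sequential write bandwidth. Every figure is a made-up placeholder (one 200-byte log line per tuple, a disk sustaining 150 MB/s), and both function names are hypothetical.

```python
def log_write_mb_per_sec(tuples_per_sec, log_lines_per_tuple, bytes_per_line):
    """Sustained log write rate in MB/s (1 MB = 1,000,000 bytes)."""
    return tuples_per_sec * log_lines_per_tuple * bytes_per_line / 1_000_000


def share_of_disk(write_mb_per_sec, disk_mb_per_sec):
    """Fraction of a disk's sequential write bandwidth consumed by logging."""
    return write_mb_per_sec / disk_mb_per_sec


# Hypothetical topology: 50,000 tuples/s, one 200-byte log line per tuple.
rate = log_write_mb_per_sec(50_000, 1, 200)
# Hypothetical disk that sustains 150 MB/s of sequential writes.
print(rate, share_of_disk(rate, 150.0))
```

This only counts raw bytes written. It ignores seeks, fsync behavior, and log rotation, so treat it as a lower bound on the interference a shared disk would see.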

- Erik

P.S., Storm isn't normally fully capitalized as you're writing (STORM);
i.e., it's not an acronym.

On Mon, Apr 10, 2017 at 6:16 PM, Ali Nazemian <al...@gmail.com> wrote:

> Hi all,
>
> I was wondering if there is any benchmark or recommendation for running
> STORM on physical HW vs. virtual machines. I am trying to calculate the HW
> requirements for a STORM cluster with a hard SLA. My questions are as
> follows.
>
> - How much on-heap and off-heap memory would be required per node? Is
> there any additional improvement we may gain by adding more memory? I
> think the STORM supervisor is not a disk-intensive workload. Does that
> mean on-heap memory is all that matters?
>
> - Is there any rule for calculating the number of required CPU cores per
> supervisor node?
>
> - Since Storm is a CPU-intensive rather than a disk-intensive workload, how
> bad would it be to colocate STORM with a non-CPU-intensive workload like a
> Kafka broker?
>
> Regards,
> Ali
>