Posted to user@storm.apache.org by Ambud Sharma <as...@gmail.com> on 2017/05/06 16:48:55 UTC

Re: Apache storm HW recommendation

Kafka is NOT CPU-intensive, whereas Storm can be, depending on what you are
trying to accomplish. Memory, however, is going to be a point of contention.
On physical machines this will be even more evident, because the OS page
cache will be affected. Virtualizing will not give you any additional
bandwidth; it simply separates the OS instances to avoid potential
performance issues due to memory contention.
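
As a rough illustration of the page cache point, here is a minimal sketch of
the memory budget on a single host. Every figure in it (128 GB of RAM, a 6 GB
Kafka heap, eight Storm workers at 4 GB each, a 4 GB OS reserve) is a
hypothetical placeholder, not a measurement or a recommendation, and it
ignores JVM off-heap usage, so real numbers would be somewhat worse.

```python
def page_cache_gb(total_ram_gb, os_reserved_gb, jvm_heaps_gb):
    """Return the RAM (in GB) left for the OS page cache after the OS
    reserve and all JVM heaps on the host are subtracted."""
    left = total_ram_gb - os_reserved_gb - sum(jvm_heaps_gb)
    if left < 0:
        raise ValueError("configured heaps exceed physical memory")
    return left


# Hypothetical figures, chosen only to make the arithmetic concrete.
TOTAL_RAM_GB = 128
OS_RESERVED_GB = 4
KAFKA_HEAP_GB = 6
STORM_WORKER_HEAPS_GB = [4] * 8  # eight workers, 4 GB heap each

# Kafka and Storm share the host: every worker heap eats into the page cache.
colocated = page_cache_gb(
    TOTAL_RAM_GB, OS_RESERVED_GB, [KAFKA_HEAP_GB] + STORM_WORKER_HEAPS_GB
)

# Kafka alone on the host: only its own heap competes with the page cache.
kafka_only = page_cache_gb(TOTAL_RAM_GB, OS_RESERVED_GB, [KAFKA_HEAP_GB])

print(f"page cache, colocated:  {colocated} GB")
print(f"page cache, Kafka only: {kafka_only} GB")
```

Whatever the Storm workers take comes straight out of the memory Kafka would
otherwise use to serve reads from cache, which is where the contention shows up.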


On Mon, Apr 10, 2017 at 7:54 PM, Ali Nazemian <al...@gmail.com> wrote:

> You are right. It is very application specific.
>
> The advantage of colocating is that dedicating an entire rack server to
> Storm or Kafka alone might leave too much unused headroom. In a
> virtualized environment it would be easy to just use separate VMs, but
> on bare metal that might not be cost-effective. By colocating on bare
> metal we eliminate the hypervisor overhead and gain resource locality
> for extra performance. I am investigating the performance and throughput
> of colocating Kafka and Storm on bare-metal hardware vs. separate VMs
> for Kafka and Storm.
>
> Thanks,
> Ali
>
> On Tue, Apr 11, 2017 at 12:36 PM, Erik Weathers <ew...@groupon.com>
> wrote:
>
>> There are so many variables.  Again, I think you just need to run the
>> application and profile it.  Maybe just run it on VMs to get some
>> profiling info and then determine if you need real h/w.  This is something
>> that I don't think you can expect a mailing list to really help you solve.
>> Storm is *really* just a bunch of APIs and interfaces for running Java code
>> (ignoring the shell-based stuff for running logic written in other
>> languages).  So the question boils down to:  what type of h/w do I need to
>> run some arbitrary program?  I don't see how anyone can answer that for
>> you.
>>
>> If you wanna co-locate them you could choose to do so; I just like to keep
>> stuff separate.  These are large, complex systems, and I don't know what
>> benefit you'll be gaining from shoving them onto the same host.
>>
>> - Erik
>>
>> On Mon, Apr 10, 2017 at 7:18 PM, Ali Nazemian <al...@gmail.com>
>> wrote:
>>
>>> Thank you very much, Erik.
>>>
>>> What if I dedicate a separate disk to Storm logs? For example, dedicating
>>> 23 disks to Kafka and one additional disk to Storm.
>>>
>>> In terms of the application, it would be parsing unstructured data,
>>> enriching it with some additional data stored in HBase, and sending it to
>>> Elasticsearch/Solr as well as storing it on HDFS. Does this narrow down
>>> the use case enough to give a better understanding of the HW requirements?
>>>
>>> Regards,
>>> Ali
>>>
>>> On Tue, Apr 11, 2017 at 12:04 PM, Erik Weathers <ew...@groupon.com>
>>> wrote:
>>>
>>>> hi Ali,
>>>>
>>>> Unfortunately the answer to these questions is *very* dependent on the
>>>> application logic in your Storm topology.  I don't think anyone can really
>>>> speak to many of these questions; it's just too broad.  You'll need to do
>>>> your own profiling with your application and figure out your own particular
>>>> resource needs.
>>>>
>>>> I wouldn't recommend running Storm colocated with Kafka on the same
>>>> host.  You *can* create a lot of disk IO if you log a lot from your Storm
>>>> topology, and you wouldn't want anything messing with Kafka's ability to
>>>> use the disk (Kafka must read from disk if you read older data that isn't
>>>> resident in memory, and Kafka is also writing everything to disk).  But if
>>>> you're using VMs you may not have control over whether Kafka brokers and
>>>> Storm worker nodes get placed onto the same physical host.
>>>>
>>>> - Erik
>>>>
>>>> P.S., Storm isn't normally fully capitalized as you're writing (STORM);
>>>> i.e., it's not an acronym.
>>>>
>>>> On Mon, Apr 10, 2017 at 6:16 PM, Ali Nazemian <al...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I was wondering if there is any benchmark or recommendation for
>>>>> running STORM on physical HW vs. virtual. I am trying to calculate the
>>>>> HW requirements for a STORM cluster with a hard SLA. My questions are as
>>>>> follows.
>>>>>
>>>>> - How much on-heap and off-heap memory would be required per node?
>>>>> Would adding more memory bring any further improvement? I think the
>>>>> STORM supervisor is not a disk-intensive workload. Does that mean
>>>>> on-heap memory is all that matters?
>>>>>
>>>>> - Is there any rule for calculating the number of required CPU cores
>>>>> per supervisor node?
>>>>>
>>>>> - Since Storm is a CPU-intensive rather than a disk-intensive workload,
>>>>> how bad would it be to colocate STORM with a non-CPU-intensive workload
>>>>> like a Kafka broker?
>>>>>
>>>>> Regards,
>>>>> Ali
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> A.Nazemian
>>>
>>
>>
>
>
> --
> A.Nazemian
>