Posted to user@storm.apache.org by I PVP <ip...@hotmail.com> on 2017/05/02 14:44:10 UTC

Storm Topology design

What is the high-level best practice for topology design in Apache Storm?

a) Create one OrderTopology that receives and processes data from all Order-related topics/Spouts, such as OrderCreated, OrderUpdated, OrderCancelled, and so on

OR

b) Create individual Topologies such as OrderCreatedTopology, OrderUpdatedTopology, OrderCancelledTopology

I am asking because CPU usage is at 100% on all supervisor machines/instances, no matter how big the machines/instances are or how many topologies are running.
The overhead of running a topology itself seems to be the issue, since the CPUs on the supervisors are at 100% even when no data is coming into the Spouts or going out to the Bolts.

Our application has Topologies in which a KafkaSpout receives data and Bolts write it to Cassandra. So far there are 32 Topologies.
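
One way to reason about the idle overhead: in Storm, every worker process is a JVM that belongs to exactly one topology, so each submitted topology brings at least one JVM of its own to the supervisors, even when no tuples flow. The rough arithmetic below compares the two layouts. It is a back-of-the-envelope sketch, not a measurement: the count of 4 business domains and the one-worker-per-topology setting are assumptions for illustration, and only the figure of 32 topologies comes from this thread.

```python
def worker_jvms(topologies: int, workers_per_topology: int = 1) -> int:
    """Minimum number of worker JVMs the supervisors have to host.

    Workers are never shared between topologies, so the count grows
    linearly with the number of topologies.
    """
    return topologies * workers_per_topology


# 32 is the topology count mentioned in this thread.
per_event = worker_jvms(32)

# Hypothetical: the same events grouped into 4 business domains.
per_domain = worker_jvms(4)

print(f"topology per event:  at least {per_event} worker JVMs")
print(f"topology per domain: at least {per_domain} worker JVMs")
print(f"JVMs saved by consolidating: {per_event - per_domain}")
```

Whether the idle CPU actually comes from the number of worker JVMs would still need to be confirmed on the cluster, for example by checking per-process CPU on a supervisor.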

Should I focus on consolidating all activities of a business domain (such as Order or Payment) into a single Topology (such as OrderTopology or PaymentTopology)?

How do Storm-based solutions design their topologies?
Aside from individual logging, what are the pros and cons from an Apache Storm perspective?


thanks

IPVP

Re: Storm Topology design

Posted by Dmitry Semenov <dm...@saritasa.com>.
I see what you mean. We always keep tasks as atomic as possible, but I
would group all events of a single business domain together. That is, a
single Kafka topic carries all order-related events, which you then read in
your Kafka spout and move through your bolts to persist in the DB.
You only need to separate each event into its own topic if the processing
of those events is very different and needs a different topology. If all
you do is persist the events into Cassandra, then one Kafka topic -> one
Kafka spout -> one persistence bolt is all you need. It should be pretty
simple.
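
The single-topic layout described above implies that one consumer sees every order event and decides what to do with each one. A minimal sketch of that dispatch step follows, in plain Python rather than Storm's API. The `type` and `order_id` fields, the handler names, and the in-memory `rows` dict standing in for Cassandra are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a Cassandra table, keyed by order id.
rows: Dict[str, dict] = {}


def on_created(event: dict) -> None:
    rows[event["order_id"]] = {"status": "created"}


def on_updated(event: dict) -> None:
    rows.setdefault(event["order_id"], {})["status"] = "updated"


def on_cancelled(event: dict) -> None:
    rows.setdefault(event["order_id"], {})["status"] = "cancelled"


# One lookup table replaces one topology per event type.
HANDLERS: Dict[str, Callable[[dict], None]] = {
    "OrderCreated": on_created,
    "OrderUpdated": on_updated,
    "OrderCancelled": on_cancelled,
}


def persist(event: dict) -> bool:
    """Route one event from the shared topic; return False for unknown types."""
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return False
    handler(event)
    return True


for e in [
    {"type": "OrderCreated", "order_id": "o-1"},
    {"type": "OrderCancelled", "order_id": "o-1"},
]:
    persist(e)

print(rows)
```

Adding a new order event then means adding one handler and one table entry, not deploying another topology.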

The cost of debugging a topology (especially in production) is very high,
so the less complexity you have (topics, spouts, bolts, and so on), the
easier your life will be :)

Dmitry


-- 
Dmitry Semenov
www.saritasa.com
20411 Birch St., Suite 330, Newport Beach, CA 92660

Re: Storm Topology design

Posted by I PVP <ip...@hotmail.com>.
Thanks for answering, and sorry for not being clearer.
I will try to clarify.

All topologies run simple logic.
It is an event-driven approach, and I am trying to figure out how people conceptually design/organize Topologies on Apache Storm.

So far I have created one Kafka topic per event (for example OrderCreated, OrderUpdated) and one Topology per event (for example OrderCreatedTopology). Each Topology has one KafkaSpout that receives data from the Kafka topic and passes it to one Bolt that writes the data to Cassandra.


My question is: is this Topology-per-event design the way to go, or would experienced Storm developers build one Topology per business domain, such as an OrderTopology containing all Order-related KafkaSpouts and Bolts?

Thanks
IPVP




Re: Storm Topology design

Posted by Dmitry Semenov <dm...@saritasa.com>.
It's hard to understand your question or recommend a solution.

If you put too much activity (business logic / processing) in a single
task, it will be hard to scale up the topology and your hardware
utilization will be very high. Make tasks atomic and small, and use batched
inserts to the DB if possible. Analyze whether Cassandra becomes a
bottleneck. Cache data in the task's memory to avoid lookup queries to the
DB.
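
The batching and caching advice above can be sketched in a few lines. This is an illustration in plain Python, not Storm or Cassandra driver code: `BatchingWriter`, `LruCache`, `flush_fn`, `load_fn`, and the thresholds are hypothetical names and values chosen for the example.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, List, Optional


class BatchingWriter:
    """Buffer rows and pass them to flush_fn in batches.

    A batch is flushed when it reaches max_rows or when its oldest row
    is older than max_age_s. The age is only checked when a row is
    added, so a real component would also need a periodic flush.
    """

    def __init__(self, flush_fn: Callable[[List[Any]], None],
                 max_rows: int = 100, max_age_s: float = 1.0) -> None:
        self._flush_fn = flush_fn
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._rows: List[Any] = []
        self._oldest: Optional[float] = None

    def add(self, row: Any) -> None:
        if not self._rows:
            self._oldest = time.monotonic()
        self._rows.append(row)
        too_old = time.monotonic() - self._oldest >= self._max_age_s
        if len(self._rows) >= self._max_rows or too_old:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush_fn(batch)


class LruCache:
    """Keep the most recently used lookups in memory; call load_fn on a miss."""

    def __init__(self, capacity: int, load_fn: Callable[[Any], Any]) -> None:
        self._capacity = capacity
        self._load_fn = load_fn
        self._items: "OrderedDict[Any, Any]" = OrderedDict()

    def get(self, key: Any) -> Any:
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            return self._items[key]
        value = self._load_fn(key)            # the only place the DB is hit
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # evict least recently used
        return value


# Usage: 7 rows with a batch size of 3 produce three writes instead of seven.
batches: List[List[Any]] = []
writer = BatchingWriter(batches.append, max_rows=3, max_age_s=60.0)
for i in range(7):
    writer.add({"order_id": i})
writer.flush()  # flush the remainder, e.g. on shutdown
print([len(b) for b in batches])

# Usage: repeated keys are served from memory, so fewer lookups reach the DB.
lookups: List[str] = []


def load_customer(key: str) -> dict:
    lookups.append(key)  # stands in for a lookup query
    return {"id": key}


cache = LruCache(2, load_customer)
for k in ["a", "b", "a", "c", "b"]:
    cache.get(k)
print(lookups)
```

The trade-off is that buffered rows are lost if the process dies before a flush, so the batch size and age limit should match how much replay the upstream source can provide.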
