Posted to user@storm.apache.org by Hasan Riaz <ha...@gmail.com> on 2015/04/22 02:50:30 UTC

New to apache storm

Hello to all,
I am new to apache storm and have been working with it for the last month
or so. We are trying to design a topology wherein:
- A json message is broken up into multiple parts
- each of these parts is processed in a parallel manner
- the results are aggregated via a Grouping Bolt

This topology needs to work in both a synchronous and an asynchronous manner,
meaning that the message can arrive either synchronously via a DRPC request
or asynchronously via a message queue (Kafka).

I have the following questions:
- Is there a way to achieve the above via a single topology or would I need
to have separate topologies?
- Since DRPC is deprecated is it safe to assume that the best way to code
is through the trident abstraction?
- Using storm primitives is there a way to process a message exactly once?

Lastly, in order to monitor whether a topology is running, I have a script
which invokes the REST API as documented at
<https://github.com/apache/storm/blob/master/STORM-UI-REST-API.md>, reads
the topology summary response and then, based on whether the topology
is present or not, starts or stops the topology on a given server. Is this
approach prudent? I am using monit to invoke the script.
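
For illustration, the check described above might look like the following minimal Python sketch. It assumes the topology summary endpoint and the "topologies", "name" and "status" fields as described in the linked REST API document; the host, port and function names are hypothetical and would need adjusting for a real cluster.

```python
import json
from urllib.request import urlopen

# Assumption: the Storm UI is reachable here; change for your cluster.
UI_BASE = "http://localhost:8080"


def topology_status(summary, name):
    """Return the status string of the named topology, or None if it is absent."""
    for topo in summary.get("topologies", []):
        if topo.get("name") == name:
            return topo.get("status")
    return None


def is_running(summary, name):
    """True only when the topology is listed and reported as ACTIVE."""
    return topology_status(summary, name) == "ACTIVE"


def fetch_summary(base=UI_BASE, timeout=5):
    """Fetch and parse the topology summary from the UI REST API."""
    with urlopen(f"{base}/api/v1/topology/summary", timeout=timeout) as resp:
        return json.load(resp)
```

A script built this way should probably treat "the REST call failed" differently from "the topology is not listed". Otherwise a UI outage could cause a topology that is already running to be submitted again.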

Thanks in advance for your help

Re: New to apache storm

Posted by "Grant Overby (groverby)" <gr...@cisco.com>.
I haven’t used DRPC, so I can’t speak to it. That said, Kafka is pretty awesome and can deliver really jaw-dropping performance. If I were you, I’d consider standardizing around Kafka. If that isn’t viable, storm topologies are directed acyclic graphs, so you can merge two streams into a single stream. The illustration on wikipedia is pretty nice: http://en.wikipedia.org/wiki/Directed_acyclic_graph.
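
To make the "merge two streams" idea concrete, here is a rough sketch in plain Python. It is not Storm code: it only shows the shape of the data flow, with one shared split/process/aggregate pipeline reached from two entry points. The message layout ("id" and "parts") and all function names are made up for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def split(message):
    """Break one JSON message into (message_id, part) pairs."""
    doc = json.loads(message)
    return [(doc["id"], part) for part in doc["parts"]]


def process(item):
    """Stand-in for the per-part work; here it just doubles a number."""
    msg_id, part = item
    return msg_id, part * 2


def aggregate(results):
    """Group processed parts by message id and sum them."""
    totals = {}
    for msg_id, value in results:
        totals[msg_id] = totals.get(msg_id, 0) + value
    return totals


def run_pipeline(message):
    """Shared pipeline: split, process the parts in parallel, aggregate."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return aggregate(pool.map(process, split(message)))


def handle_sync_request(message):
    """Synchronous entry point: the caller waits for the result."""
    return run_pipeline(message)


def handle_queue(messages):
    """Asynchronous entry point: drain queued messages one by one."""
    return [run_pipeline(m) for m in messages]
```

Both entry points feed the same downstream stages, which is the property a single merged topology would give you. Only the way a message arrives and the way its result is returned differ.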

Generally speaking, synchronous anything in distributed computing is expensive and, imho, to be avoided if possible.

I’m a big fan of Trident. I’d use it first unless there is a need for the lower-level spout and bolt API.

Trident can be used to guarantee exactly once processing, but keep in mind this has some external requirements. If you’re writing to a database, for example, the writes still need idempotency. Trident helps with this by providing batch ids, which is more performant than natural keys on an individual tuple, but still can be a pain with things such as columnar stores.
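
As a sketch of why batch ids help with idempotent writes, the following plain Python class stores the id of the last applied batch next to each value and skips a batch it has already seen. The class and method names are hypothetical and this is not the Trident API. It also assumes that a replayed batch carries the same contents as the original and that batches are applied in order.

```python
class BatchIdempotentStore:
    """In-memory stand-in for a database table keyed by 'key'."""

    def __init__(self):
        # key -> (value, id of the last batch applied to this key)
        self._rows = {}

    def apply(self, batch_id, key, delta):
        """Add this batch's delta for key, unless the batch was already applied."""
        value, last_batch = self._rows.get(key, (0, None))
        if last_batch == batch_id:
            # Replay of a batch that was already written: leave the row alone.
            return value
        value += delta
        self._rows[key] = (value, batch_id)
        return value

    def get(self, key):
        """Return the current value for key (0 if never written)."""
        return self._rows.get(key, (0, None))[0]
```

For example, applying batch 1 twice with the same delta changes the stored value only once, so a replay after a failure does not double count.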



From: Hasan Riaz <ha...@gmail.com>
Reply-To: user@storm.apache.org
Date: Thursday, April 23, 2015 at 9:42 PM
To: user@storm.apache.org
Subject: Re: New to apache storm

Hello,
just wanted to inquire if anyone can answer my questions.
Thanks

Re: New to apache storm

Posted by Hasan Riaz <ha...@gmail.com>.
Hello,
just wanted to inquire if anyone can answer my questions.
Thanks
