Posted to user@storm.apache.org by Joël Kuiper <me...@joelkuiper.eu> on 2014/04/18 16:40:06 UTC

Setting up Storm behind HTTP in Clojure

Hey, 

So I’m contemplating using Storm to run rather complicated analyses on user-submitted data (arriving through either HTTP or WebSockets).
Storm seems perfect for the multi-stage processing that I need, and its real-time nature would fit the type of interactions I require.
Furthermore, many steps would involve analyses already written in Python and R, so using bolts for that would be great.

However, hooking up Storm behind an HTTP server like Ring (optionally with http-kit) seems non-trivial.

I first thought of pushing the messages onto a core.async channel and having a Spout consume them. But I realise this might fail in a cluster, since such a channel only exists inside a single JVM process.
So the current thinking is:

* HTTP request -> create a job & push the job onto a Kafka jobs topic
* Inform the user about the created job, including a (WebSocket) URL to listen on for results
* Storm consumes from Kafka
* Final results go to bolts that publish them to a Kafka results topic
* The server listens on the results topic & pushes results to the appropriate jobs (i.e. notifies the user on the job URL)
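
The steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: two in-memory queues stand in for the Kafka jobs and results topics, a plain function call stands in for the Storm topology, and a dict stands in for the WebSocket listeners. All names (submit_job, run_topology_once, dispatch_results, the /jobs/<id>/results URL) are made up for the example.

```python
import queue
import uuid

# In-memory queues stand in for the Kafka "jobs" and "results" topics (assumption).
jobs_topic = queue.Queue()
results_topic = queue.Queue()

# job id -> results delivered so far; stands in for the WebSocket
# connections listening on each job URL (assumption).
delivered = {}


def submit_job(payload):
    """HTTP handler side: create a job, publish it, return the job descriptor."""
    job_id = str(uuid.uuid4())
    delivered[job_id] = []
    jobs_topic.put({"job_id": job_id, "payload": payload})
    return {"job_id": job_id, "results_url": "/jobs/" + job_id + "/results"}


def run_topology_once(analyse):
    """Storm side: consume one job, run the analysis, publish the result."""
    job = jobs_topic.get()
    results_topic.put({"job_id": job["job_id"], "result": analyse(job["payload"])})


def dispatch_results():
    """Server side: drain the results topic and route each result to its job."""
    while not results_topic.empty():
        message = results_topic.get()
        delivered[message["job_id"]].append(message["result"])


if __name__ == "__main__":
    job = submit_job([1, 2, 3])
    run_topology_once(sum)
    dispatch_results()
    print(job["results_url"], delivered[job["job_id"]])
```

The job id is the only thing tying a result back to a request, so it has to travel with the message through every stage.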

But to be honest … this seems like a bit of a hassle to set up. It would require servers/developers to set up Kafka, Storm and all related dependencies.
It’s a lot of “stuff” just to get it running, which might hamper developer adoption at our shop.

Is there a simpler way of getting this going, or does this seem to be the most appropriate way?

Many thanks,
Joël

Re: Setting up Storm behind HTTP in Clojure

Posted by Ruhollah Farchtchi <ru...@gmail.com>.

Seems like you should check out the Storm DRPC API. Since it looks like you are
using Clojure, there is a Clojure Trident DSL documented here:
https://github.com/yieldbot/marceline that might make things easier.
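
For readers unfamiliar with DRPC, the request/response shape is roughly: the client sends a function name plus an argument, the request gets an id, and the client blocks until a result tagged with that id comes back. Below is a toy simulation of that shape in plain Python. It is not Storm's API; ToyDRPCServer, register and execute are invented names, and a thread stands in for the topology.

```python
import itertools
import threading
from concurrent.futures import Future


class ToyDRPCServer:
    """Toy stand-in for a DRPC server: only the request/response shape, not Storm's API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}    # request id -> Future awaiting the result
        self._functions = {}  # function name -> callable standing in for a topology
        self._lock = threading.Lock()

    def register(self, name, topology):
        self._functions[name] = topology

    def execute(self, name, args, timeout=5.0):
        """Block the caller until the 'topology' returns a result for this request."""
        with self._lock:
            request_id = next(self._ids)
            future = Future()
            self._pending[request_id] = future
        # A real system would hand (args, request id) to the topology;
        # here a worker thread does the computation instead.
        threading.Thread(target=self._run, args=(name, args, request_id)).start()
        return future.result(timeout=timeout)

    def _run(self, name, args, request_id):
        # Look up the waiting request by id and complete it with the result.
        with self._lock:
            future = self._pending.pop(request_id)
        try:
            future.set_result(self._functions[name](args))
        except Exception as error:
            future.set_exception(error)


if __name__ == "__main__":
    server = ToyDRPCServer()
    server.register("exclaim", lambda text: text + "!")
    print(server.execute("exclaim", "hello"))
```

From the HTTP server's point of view this is a blocking call, which is what makes it attractive compared with wiring up separate request and result queues.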


Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com


On Fri, Apr 18, 2014 at 11:29 AM, Marc Vaillant <va...@animetrics.com> wrote:

> Have you looked at Trident + DRPC?
>
> https://github.com/nathanmarz/storm/wiki/Trident-tutorial
>
> Also, I came across the following once but I've never tried it and I'm
> not sure how mature it is:
>
> https://github.com/chriskchew/restexpress-storm
>
> Marc

Re: Setting up Storm behind HTTP in Clojure

Posted by Rohan Kapadia <ro...@gmail.com>.

At my company we have implemented this using a node server to take care
of the HTTP part.

I've set up a Storm node to run the DRPC daemon. Currently we are using
LinearDRPCTopologyBuilder, which is deprecated. It works as required, though.
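
A rough sketch of that arrangement, an HTTP front end that forwards each request to DRPC and returns the result, could look like the following. It is written in Python with only the standard library rather than node, and it is not our production code. call_drpc is a hypothetical placeholder that merely echoes its input where a real DRPC client call would go, and the /drpc/<function> path layout is an assumption made for the example.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def call_drpc(function_name, args):
    """Hypothetical placeholder for the call to the DRPC daemon; it only echoes here."""
    return function_name + ":" + args


class DRPCProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Path is /drpc/<function name>; the request body is the argument string.
        parts = self.path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "drpc":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        args = self.rfile.read(length).decode("utf-8")
        body = json.dumps({"result": call_drpc(parts[1], args)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def post(port, function_name, args):
    """Small client helper: POST the argument string and decode the JSON reply."""
    connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        connection.request("POST", "/drpc/" + function_name, body=args.encode("utf-8"))
        return json.loads(connection.getresponse().read().decode("utf-8"))
    finally:
        connection.close()


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), DRPCProxyHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(post(server.server_port, "analyse", "some input"))
    server.shutdown()
    server.server_close()
```

Keeping the DRPC call behind one function means the HTTP layer does not need to know anything about the topology behind it.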

On Sunday 20 April 2014 04:21:30 PM IST, Joël Kuiper wrote:
> Unfortunately using Trident is not an option due to the multi-language requirement.
> There’s a lot of peer-reviewed stuff that cannot be (trivially) ported to the JVM.
>
> I guess creating a generic “call Storm over HTTP” with a nice protocol would be an interesting project in itself though.

Re: Setting up Storm behind HTTP in Clojure

Posted by Joël Kuiper <me...@joelkuiper.eu>.

Unfortunately using Trident is not an option due to the multi-language requirement. 
There’s a lot of peer-reviewed stuff that cannot be (trivially) ported to the JVM. 

I guess creating a generic “call Storm over HTTP” with a nice protocol would be an interesting project in itself though.

On 18 Apr 2014, at 17:29, Marc Vaillant <va...@animetrics.com> wrote:

> Have you looked at Trident + DRPC? 
> 
> https://github.com/nathanmarz/storm/wiki/Trident-tutorial
> 
> Also, I came across the following once but I've never tried it and I'm
> not sure how mature it is:
> 
> https://github.com/chriskchew/restexpress-storm
> 
> Marc

Re: Setting up Storm behind HTTP in Clojure

Posted by Marc Vaillant <va...@animetrics.com>.

Have you looked at Trident + DRPC? 

https://github.com/nathanmarz/storm/wiki/Trident-tutorial

Also, I came across the following once but I've never tried it and I'm
not sure how mature it is:

https://github.com/chriskchew/restexpress-storm

Marc
