Posted to user@storm.apache.org by bhasker datta <bh...@gmail.com> on 2015/09/21 16:10:34 UTC

Re: Implementing multiple(about 1000) storm topologies

On Mon, Sep 21, 2015 at 8:53 AM, bhasker datta <bh...@gmail.com> wrote:

> Hello,
> I am working on a storm topology where I need to build multiple topologies
> for different client locations.
>
> I have a Kafka server producing data that needs to reach remote client
> machines (after some transformation in the bolts). There are about 1000 of
> those clients.
> When new data is produced in Kafka, the clients need to get that
> data within a few minutes.
> Some transformation needs to happen (which can be handled by
> the bolts), and the data is then sent to the remote client via websocket.
>
> e.g: Kafka server -> Kafka spout -> Bolt1 -> bolt2 -> bolt3 -> websocket
> (on a client machine)
> I have 1000 client machines.
>
> I am hoping that this topology can be created dynamically per client. When
> a client is ready, can it initiate this topology and start reading the
> data?
> Or can these topologies (1000 of them) be created on the Storm cluster in
> advance, ready to send the data?
>
> Is this architecture possible with Storm?
> How efficient is this?
> Is the Storm setup manageable?
>
> Thanks for your response.
> Bhasker
>
>

Re: Implementing multiple(about 1000) storm topologies

Posted by bh...@gmail.com.
The data in Kafka is similar for every location, but not all of it applies to all locations.
E.g., I have an insurance product that may not apply to all locations. Data is produced for all products, but each client should receive only what is intended for its location.
The idea is that the bolt will process each message, and the location information in the message determines which client it should go to.
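
As a rough illustration of that routing idea, here is a minimal sketch in plain Java (not Storm code). The class `LocationRouter`, the `"location"` key, and the client ids are all hypothetical names made up for this example; a real bolt would do the equivalent lookup inside its execute method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Storm API: maps the location code
// carried in each message to the clients registered for that location.
class LocationRouter {
    private final Map<String, List<String>> clientsByLocation = new HashMap<>();

    // Registers the clients that serve a given location.
    void register(String location, List<String> clientIds) {
        clientsByLocation.put(location, clientIds);
    }

    // Returns the clients that should receive this message, or an empty list
    // when the message has no location or no client serves that location.
    List<String> route(Map<String, String> message) {
        String location = message.get("location"); // assumed field name
        if (location == null) {
            return Collections.emptyList();
        }
        return clientsByLocation.getOrDefault(location, Collections.emptyList());
    }
}
```

With a lookup like this, a message for a product that does not apply to a location is simply never routed to that location's clients.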

If the topology is the same for all locations, i.e. bolt1 and bolt2 process the data and then hand it to bolt3, which can spray the messages to multiple locations, what would the status of a message be if any one location is down? Error handling will be a little difficult. We don't want to resend the message to every client until all of them have received it successfully.
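
To make the error-handling concern concrete: Storm acks or fails a tuple as a whole, so failing it because one location is down would replay it for every destination. One possible way around that, sketched below in plain Java under my own assumptions, is to track delivery per client and retry only the clients that have not acknowledged. `DeliveryTracker` and its method names are hypothetical, and the state is kept in memory only, so it would be lost if the worker restarts.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical per-client delivery bookkeeping, not part of the Storm API.
class DeliveryTracker {
    // message id -> clients that have not yet confirmed receipt
    private final Map<String, Set<String>> pending = new HashMap<>();

    // Records that a message was sent to the given clients.
    void sent(String messageId, Collection<String> clientIds) {
        pending.put(messageId, new TreeSet<>(clientIds));
    }

    // Marks one client as having received the message.
    void acknowledged(String messageId, String clientId) {
        Set<String> waiting = pending.get(messageId);
        if (waiting == null) {
            return; // unknown or already completed message
        }
        waiting.remove(clientId);
        if (waiting.isEmpty()) {
            pending.remove(messageId);
        }
    }

    // Returns a copy of the clients that still need a retry for this message.
    Set<String> stillPending(String messageId) {
        Set<String> waiting = pending.get(messageId);
        return waiting == null ? new TreeSet<>() : new TreeSet<>(waiting);
    }

    // True when no client is outstanding (also true for unknown message ids).
    boolean isComplete(String messageId) {
        return !pending.containsKey(messageId);
    }
}
```

A retry loop would then resend only to `stillPending(messageId)`, leaving the clients that already received the message alone.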

Alternatively, each location could get its own topic, but that puts overhead on the producer, which would have to maintain 1000 topics of similar data, one per location.

Hope I have clarified the issue.
Thanks
Bhasker


> On Sep 21, 2015, at 11:55 PM, Javier Gonzalez <ja...@gmail.com> wrote:
> 
> Off the top of my head, I would:
> 
> - Have a Storm topology ready and listening on Kafka. If you have a few minutes between the Kafka event and delivery of processed input to clients, I would rather not waste time starting up the topology. 
> - Not implement 1000 topologies. That's at least 1000 JVMs. Is the processing in the internal bolts the same or similar enough that the whole requirement can be solved with a single topology that has 3 types of bolts, each of them parallelized as far as your cluster hardware can bear? 
> 
> Can you provide more information?
> 
> regards,
> JG
> 
> -- 
> Javier González Nicolini

Re: Implementing multiple(about 1000) storm topologies

Posted by Javier Gonzalez <ja...@gmail.com>.
Off the top of my head, I would:

- Have a Storm topology ready and listening on Kafka. If you have a few
minutes between the Kafka event and delivery of processed input to clients,
I would rather not waste time starting up the topology.
- Not implement 1000 topologies. That's at least 1000 JVMs. Is the
processing in the internal bolts the same or similar enough that the whole
requirement can be solved with a single topology that has 3 types of bolts,
each of them parallelized as far as your cluster hardware can bear?
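
To picture the single-topology option: Storm's fields grouping sends tuples with the same field value to the same bolt task, so a small fixed pool of tasks can serve all 1000 locations. The sketch below is plain Java rather than Storm code; `FieldsGroupingSketch`, `taskFor`, and the hash-modulo rule are illustrative assumptions, not Storm's actual implementation.

```java
import java.util.List;

// Illustrative only: mimics the idea behind a fields grouping on "location".
class FieldsGroupingSketch {
    // Picks a task index for a location; the same location always maps to
    // the same task as long as numTasks does not change.
    static int taskFor(String location, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        return Math.floorMod(location.hashCode(), numTasks);
    }

    // Counts how many of the given locations land on each task.
    static int[] locationsPerTask(List<String> locations, int numTasks) {
        int[] counts = new int[numTasks];
        for (String location : locations) {
            counts[taskFor(location, numTasks)]++;
        }
        return counts;
    }
}
```

The number of tasks here plays the role of the bolt's parallelism: it is sized to the cluster hardware, not to the number of clients.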

Can you provide more information?

regards,
JG




-- 
Javier González Nicolini