
Posted to user@storm.apache.org by Preetam Rao <ra...@gmail.com> on 2014/07/09 08:04:09 UTC

Controlling Component to JVM/Host allocation ?

Hi,

I would appreciate any pointers on the following issues, which are causing
us problems in production.

1. Is there a way to force the instances of a given spout to be allocated
on different hosts? Our spouts start an embedded Jetty server that listens
on a well-known port (here, HTTPS), so running more than one instance on
the same host does not help. We observe that most often, even with
parallelism set to 5, all instances get allocated on the same host, which
wastes the parallelism and in turn causes load issues. From the discussions
I have read, I believe this is not possible, but it is a critical issue for
us right now, so any pointers would help.

2. Is there a way to control how many components get allocated per worker
(or to enforce a rule that no worker is ever allocated more than one
component)? Even with a high worker count, occasionally both a spout and a
bolt get allocated to the same worker (that is, the same host and Storm
port). This causes load and GC issues, since the input rate is quite high.

3. Ours is a multi-tenant setup. Along the same lines as item 2, how can we
prevent components from different topologies from running on the same
worker? Otherwise any topology can break the worker on which my topology's
components are running (say, through a memory leak).

Thanks in advance for any pointers/suggestions.
Preetam

Re: Controlling Component to JVM/Host allocation ?

Posted by Preetam Rao <ra...@gmail.com>.

Thanks for the suggestions. However, restricting each host to one worker
might be too restrictive in a shared multi-tenant setup. A custom scheduler
may not be feasible either, given the nature of the setup.


On Thu, Jul 10, 2014 at 6:17 AM, Srinath C <sr...@gmail.com> wrote:

> You could probably do this by restricting one worker per host and setting
> parallelism and num tasks < num of workers.
>
> I recently came across an article. See if this helps
> http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
>
> Probably you can just restrict one worker process per host?

Re: Controlling Component to JVM/Host allocation ?

Posted by Srinath C <sr...@gmail.com>.

On Wed, Jul 9, 2014 at 11:34 AM, Preetam Rao <ra...@gmail.com> wrote:

> Hi,
>
> I would appreciate any pointers on the following issues, which are causing
> us problems in production.
>
> 1. Is there a way to force the instances of a given spout to be allocated
> on different hosts? Our spouts start an embedded Jetty server that listens
> on a well-known port (here, HTTPS), so running more than one instance on
> the same host does not help. We observe that most often, even with
> parallelism set to 5, all instances get allocated on the same host, which
> wastes the parallelism and in turn causes load issues. From the discussions
> I have read, I believe this is not possible, but it is a critical issue for
> us right now, so any pointers would help.
>

You could probably do this by restricting each host to one worker and
setting the spout's parallelism and number of tasks to less than the number
of workers.
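
To illustrate why this suggestion can work, here is a toy model in plain
Java. It is not Storm's scheduler: it assumes executors are spread over
workers round-robin, and the class, method, and host names are all
hypothetical. It only shows that when every worker sits on its own host and
there are fewer executors than workers, no two executors share a host.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class OneWorkerPerHost {
    // Place executor i on worker (i mod number of workers) and return the
    // host each executor ends up on. Assumed round-robin model only.
    public static List<String> place(int executors, List<String> workerHosts) {
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < executors; i++) {
            hosts.add(workerHosts.get(i % workerHosts.size()));
        }
        return hosts;
    }

    // True when no host appears more than once in the placement.
    public static boolean allDistinct(List<String> hosts) {
        return new HashSet<>(hosts).size() == hosts.size();
    }

    public static void main(String[] args) {
        // Five workers, one per host.
        List<String> onePerHost =
            Arrays.asList("host1", "host2", "host3", "host4", "host5");
        // Five workers, but some hosts run two of them.
        List<String> twoPerHost =
            Arrays.asList("host1", "host1", "host2", "host2", "host3");

        System.out.println(allDistinct(place(4, onePerHost))); // true
        System.out.println(allDistinct(place(4, twoPerHost))); // false
    }
}
```

With two workers on a host, the same four executors collide, which is the
port conflict described in question 1.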


>
> 2. Is there a way to control how many components get allocated per worker
> (or to enforce a rule that no worker is ever allocated more than one
> component)? Even with a high worker count, occasionally both a spout and a
> bolt get allocated to the same worker (that is, the same host and Storm
> port). This causes load and GC issues, since the input rate is quite high.
>

I recently came across an article on developing a pluggable scheduler. See
if this helps:
http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
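
As a rough idea of the placement rule a custom scheduler could apply for
question 1, here is a minimal sketch in plain Java. It deliberately uses no
Storm classes; `Slot`, `assign`, the executor names, and the port numbers
are hypothetical stand-ins, and a real pluggable scheduler would have to
obtain free slots and pending executors from the cluster itself. The rule
is simply: give each executor a free slot on a host that has not been used
yet, and fail if there are not enough distinct hosts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DistinctHostPlacement {
    // A worker slot, identified by host and port (hypothetical type).
    public static final class Slot {
        public final String host;
        public final int port;

        public Slot(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Assign each executor to a free slot, using every host at most once.
    public static Map<String, Slot> assign(List<String> executors,
                                           List<Slot> freeSlots) {
        Map<String, Slot> result = new LinkedHashMap<>();
        Set<String> usedHosts = new HashSet<>();
        Iterator<String> pending = executors.iterator();
        for (Slot slot : freeSlots) {
            if (!pending.hasNext()) {
                break; // every executor has been placed
            }
            // add() returns false if this host already holds an executor.
            if (usedHosts.add(slot.host)) {
                result.put(pending.next(), slot);
            }
        }
        if (pending.hasNext()) {
            throw new IllegalStateException("not enough distinct hosts");
        }
        return result;
    }

    public static void main(String[] args) {
        List<Slot> slots = Arrays.asList(
            new Slot("host1", 6700), new Slot("host1", 6701),
            new Slot("host2", 6700), new Slot("host3", 6700));
        Map<String, Slot> placement =
            assign(Arrays.asList("spout-0", "spout-1", "spout-2"), slots);
        for (Map.Entry<String, Slot> e : placement.entrySet()) {
            System.out.println(e.getKey() + " -> "
                + e.getValue().host + ":" + e.getValue().port);
        }
    }
}
```

Failing loudly when hosts run out is one possible design choice; a
scheduler could instead leave the extra executors unassigned until another
host frees up.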


>
> 3. Ours is a multi-tenant setup. Along the same lines as item 2, how can we
> prevent components from different topologies from running on the same
> worker? Otherwise any topology can break the worker on which my topology's
> components are running (say, through a memory leak).
>

You can probably just restrict each host to one worker process?


>
> Thanks in advance for any pointers/suggestions.
> Preetam