Posted to dev@stratos.apache.org by Imesh Gunaratne <im...@apache.org> on 2014/04/12 04:06:34 UTC

Load Balancer Statistics Publishing Sliding Window

Hi Lahiru,

This is regarding the modification we made to the load balancer statistics
publishing logic so that the LB publishes statistics to CEP once every
minute.

Commit Revision: 7d8a8b3cdfb112affc230c6716d71a24e0a35c0a

I think there is a problem with this design. Ideally the time window
processing logic should reside in CEP. I just went through the CEP
artefacts and found that we already have an execution plan that calculates
load balancer statistics every minute and publishes them to the
Auto-scaler:

<queryExpressions><![CDATA[
from lbStats1#window.timeBatch(1 min)
select cluster_id, network_partition_id, avg(in_flight_request_count) as count
group by cluster_id, network_partition_id
insert into average_in_flight_requests;]]></queryExpressions>
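
For anyone not familiar with the query, here is a rough Python sketch of
what a one-minute batch window with a grouped average computes. It is only
an illustration and not how CEP works internally: the function name
average_in_flight_per_window and the tuple layout of the events are
assumptions made for this example, and the windows are aligned to whole
minutes for simplicity, which the real engine may not do.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # one minute, mirroring timeBatch(1 min)


def average_in_flight_per_window(events):
    """Average in-flight requests per (cluster, partition) per one-minute batch.

    events: iterable of tuples (hypothetical layout)
        (timestamp_ms, cluster_id, network_partition_id, in_flight_request_count)
    Returns {window_start_ms: {(cluster_id, network_partition_id): average}}.
    """
    # window start -> group key -> [running total, number of events]
    sums = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for ts, cluster_id, partition_id, in_flight in events:
        window_start = (ts // WINDOW_MS) * WINDOW_MS
        acc = sums[window_start][(cluster_id, partition_id)]
        acc[0] += in_flight
        acc[1] += 1
    return {
        start: {key: total / count for key, (total, count) in groups.items()}
        for start, groups in sums.items()
    }


# Made-up events: two fall in the first minute, one in the second.
events = [
    (1_000, "php", "p1", 10),
    (30_000, "php", "p1", 30),
    (61_000, "php", "p1", 50),
]
result = average_in_flight_per_window(events)
```

Each batch produces one average per cluster and network partition, which
is the value that would be inserted into average_in_flight_requests.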

Could you recall the exact issue we experienced when we decided to make
this modification?

Thanks



-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PPMC Member, Apache Stratos

Re: Load Balancer Statistics Publishing Sliding Window

Posted by Lahiru Sandaruwan <la...@wso2.com>.
Hi Isuru,


On Fri, Apr 11, 2014 at 7:45 PM, Isuru Perera <is...@wso2.com> wrote:

> Do we have any docs explaining these things in auto-scaling?
>
> For example, I have an app. If I get more than two hundred concurrent
> requests, which will take at least 30 seconds to process, I want to
> auto-scale the app. But I'm not sure which values to put in the
> auto-scaling policy.
>
> It would be great if we had docs for these.
>

Currently the end user (cartridge subscriber) has control over only three
values: the requests in flight threshold, the load average threshold, and
the memory consumption threshold.

+1 for documenting this if not already done. Ideally this should go to the
place where we explain the autoscaling policy.

Let us talk about the LB requests in flight (RIF) aspect of your
requirement. Assume that one instance can handle 50 concurrent requests and
that this load (200 concurrent requests) is sustained for one day.

Then we need a minimum of 4 instances (200 / 50) over that day.

If your load is seasonal,

    STRATOS-488 Item #2 should be able to address it, i.e. we define that
we need a minimum of 4 instances during this particular day.

Else if this is an unpredicted load increase,

    Say that we have started with 1 as the minimum. When we start receiving
this load, the RIF will go up and pass 200. So we should set the threshold
to 200. If the RIF goes over 200, the current number of instances (1)
cannot handle the load, so the cluster will scale up until the RIF
decreases.

*BUT* this 200 is the number of concurrent requests that could come to the
service cluster. This is not a very good parameter to depend on, because it
is difficult to predict the exact number of requests at deployment time.

A better way is to let the cartridge subscriber use the number of
concurrent requests that one instance can handle (50 in this case) as the
threshold. It is a better capacity planning attribute and is widely used.
The Autoscaler will then find that the RIF is 200 and that one instance can
bear 50, so this cluster needs 4 instances to bear the complete load. If
the number of instances spawned is less than 4, the Autoscaler will
increase the number of instances to 4. In this case it will directly spawn
the 3 additional instances that are needed to cater for this load.
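
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration of the calculation under the numbers used in this mail, not
the Autoscaler's actual code; the function names required_instances and
instances_to_spawn are made up for the example.

```python
import math


def required_instances(requests_in_flight, per_instance_capacity):
    """Instances needed so that each one stays within its capacity."""
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    # Round up: a partially loaded instance is still a whole instance.
    return math.ceil(requests_in_flight / per_instance_capacity)


def instances_to_spawn(requests_in_flight, per_instance_capacity, running):
    """Additional instances to start, never negative."""
    needed = required_instances(requests_in_flight, per_instance_capacity)
    return max(0, needed - running)


# Numbers from the example: RIF of 200, 50 per instance, 1 instance running.
needed = required_instances(200, 50)
to_spawn = instances_to_spawn(200, 50, running=1)
print(needed, to_spawn)
```

With a RIF of 200, a per-instance capacity of 50 and 1 running instance,
this gives 4 required instances and 3 to spawn.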

This requirement should be catered for by item #1 of STRATOS-488.


> Shall we also improve the design by moving relevant logic to CEP as Imesh
> has mentioned?
>

Read my earlier reply ;)

Thanks.




-- 
Lahiru Sandaruwan
Software Engineer,
Platform Technologies,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

email: lahirus@wso2.com cell: (+94) 773 325 954
blog: http://lahiruwrites.blogspot.com/
twitter: http://twitter.com/lahirus
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146

Re: Load Balancer Statistics Publishing Sliding Window

Posted by Isuru Perera <is...@wso2.com>.
Do we have any docs explaining these things in auto-scaling?

For example, I have an app. If I get more than two hundred concurrent
requests, which will take at least 30 seconds to process, I want to
auto-scale the app. But I'm not sure which values to put in the
auto-scaling policy.

It would be great if we had docs for these.

Shall we also improve the design by moving relevant logic to CEP as Imesh
has mentioned?


On Sat, Apr 12, 2014 at 7:59 AM, Lahiru Sandaruwan <la...@wso2.com> wrote:

> Hi Imesh,
>
> Yes. The reason was that the requests in flight count does not decrease
> if the response is not sent back to the client. It kept increasing and
> stayed at a high value even when no requests were being sent.
>
> Thanks.
>
> Sent from my mobile.


-- 
Isuru Perera
Senior Software Engineer | WSO2, Inc. | http://wso2.com/
Lean . Enterprise . Middleware

about.me/chrishantha

Re: Load Balancer Statistics Publishing Sliding Window

Posted by Lahiru Sandaruwan <la...@wso2.com>.
Hi Imesh,

Yes. The reason was that the requests in flight count does not decrease if
the response is not sent back to the client. It kept increasing and stayed
at a high value even when no requests were being sent.
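
To illustrate the behaviour, here is a minimal Python sketch. It assumes
the count is a simple counter that goes up on each request and down on each
response; the class InFlightCounter and its method names are hypothetical
and are not taken from the LB code.

```python
class InFlightCounter:
    """Hypothetical counter: +1 per request received, -1 per response sent."""

    def __init__(self):
        self.count = 0

    def on_request(self):
        self.count += 1

    def on_response(self):
        # Only decremented when a response actually goes back to the client.
        if self.count > 0:
            self.count -= 1


counter = InFlightCounter()
for _ in range(5):
    counter.on_request()
for _ in range(2):
    counter.on_response()

# Three responses were never sent, so the count stays at 3 even though
# no new requests are arriving.
print(counter.count)
```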

Thanks.

Sent from my mobile.