Posted to issues@dubbo.apache.org by "Daniela Morais (JIRA)" <ji...@apache.org> on 2019/04/04 04:25:00 UTC
[jira] [Comment Edited] (DUBBO-34) GSoC 2019: New Load Balancer for higher availability and resilience.

    [ https://issues.apache.org/jira/browse/DUBBO-34?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809494#comment-16809494 ] 

Daniela Morais edited comment on DUBBO-34 at 4/4/19 4:24 AM:
-------------------------------------------------------------

Hey,

 

I was thinking about it:

At the start of balancing (the first interaction) we have no information about the servers, so we can forward the request to any of them.

After that, each server response will report the server's CPU utilization, what its minimum CPU utilization should be under normal conditions, and its number of inflight requests. We can read these from the response header: Result.java has an attribute called "attachment" that I think can be used. We can also measure the round-trip time here and check whether an error happened (by default, just a 5xx code). We can discuss the best data structure, but persisting only the latest data sent by each server seems better, since it avoids a lot of processing.
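To make the "keep only the last data" idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: ServerSnapshot, LatestSnapshotStore and their fields are names invented for illustration, not Dubbo classes, and the sketch assumes the values have already been read from the response.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder for the last values reported by one server (not a Dubbo class).
final class ServerSnapshot {
    final double cpuPercent;
    final int inflightRequests;
    final long roundTripMillis;
    final boolean failed;

    ServerSnapshot(double cpuPercent, int inflightRequests, long roundTripMillis, boolean failed) {
        this.cpuPercent = cpuPercent;
        this.inflightRequests = inflightRequests;
        this.roundTripMillis = roundTripMillis;
        this.failed = failed;
    }
}

// Keeps exactly one snapshot per server address; a newer report overwrites the older one.
final class LatestSnapshotStore {
    private final Map<String, ServerSnapshot> latest = new ConcurrentHashMap<>();

    void record(String serverAddress, ServerSnapshot snapshot) {
        latest.put(serverAddress, snapshot);
    }

    // Returns null when nothing has been reported for this address yet.
    ServerSnapshot get(String serverAddress) {
        return latest.get(serverAddress);
    }

    boolean isEmpty() {
        return latest.isEmpty();
    }
}
```

An empty store corresponds to the "first interaction" case above, where any server can be chosen.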

So once we have had some interactions, it is possible to select using these criteria:
 * the inflight request count and CPU % (reported by the server)
 * the current number of inflight requests from this load balancer (I think RpcStatus can help)
 * the error percentage and the server response time (this code would be similar to MetricsFilter)

 

I was thinking about filtering for the servers that are available (fewer than x% errors, CPU <= the minimum CPU utilization, server response time <= y, etc.) and returning one of them. Note that these thresholds can be properties; for example, we could allow a server response time of up to Y ms.
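A rough sketch of that filtering step, in plain Java with no Dubbo types. ServerStats and AvailabilityFilter are hypothetical names, and the three thresholds stand in for the x%, CPU and y values above; how they would be loaded from properties is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-server statistics; field names are illustrative, not Dubbo API.
final class ServerStats {
    final String address;
    final double errorPercent;
    final double cpuPercent;
    final long responseMillis;

    ServerStats(String address, double errorPercent, double cpuPercent, long responseMillis) {
        this.address = address;
        this.errorPercent = errorPercent;
        this.cpuPercent = cpuPercent;
        this.responseMillis = responseMillis;
    }
}

// Keeps only the servers that pass every configured threshold.
final class AvailabilityFilter {
    private final double maxErrorPercent;   // "x" in the text: errors must be below this
    private final double maxCpuPercent;     // CPU must be at or below this
    private final long maxResponseMillis;   // "y" in the text: response time must be at or below this

    AvailabilityFilter(double maxErrorPercent, double maxCpuPercent, long maxResponseMillis) {
        this.maxErrorPercent = maxErrorPercent;
        this.maxCpuPercent = maxCpuPercent;
        this.maxResponseMillis = maxResponseMillis;
    }

    List<ServerStats> available(List<ServerStats> servers) {
        List<ServerStats> result = new ArrayList<>();
        for (ServerStats s : servers) {
            boolean healthy = s.errorPercent < maxErrorPercent
                    && s.cpuPercent <= maxCpuPercent
                    && s.responseMillis <= maxResponseMillis;
            if (healthy) {
                result.add(s);
            }
        }
        return result;
    }
}
```

The returned list can be empty when every server fails a threshold, so the caller would still need a fallback, for example picking from the full list.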

Another option is an equation that assigns a score to every server and returns the best one. But what should this equation prioritize? CPU, number of requests? What do you think?
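One possible shape for such an equation is a weighted sum where a lower score is better. This is only a sketch: ScoredServer and ScoreSelector are hypothetical names, and the weights are placeholders, since which metric should dominate is exactly the open question.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical metrics for one server; not a Dubbo class.
final class ScoredServer {
    final String address;
    final double cpuPercent;
    final int inflightRequests;
    final double errorPercent;

    ScoredServer(String address, double cpuPercent, int inflightRequests, double errorPercent) {
        this.address = address;
        this.cpuPercent = cpuPercent;
        this.inflightRequests = inflightRequests;
        this.errorPercent = errorPercent;
    }
}

// Scores each server as a weighted sum of its metrics and picks the lowest score.
final class ScoreSelector {
    private final double cpuWeight;
    private final double inflightWeight;
    private final double errorWeight;

    ScoreSelector(double cpuWeight, double inflightWeight, double errorWeight) {
        this.cpuWeight = cpuWeight;
        this.inflightWeight = inflightWeight;
        this.errorWeight = errorWeight;
    }

    double score(ScoredServer s) {
        return cpuWeight * s.cpuPercent
                + inflightWeight * s.inflightRequests
                + errorWeight * s.errorPercent;
    }

    // Returns the server with the lowest score, or null for an empty list.
    ScoredServer best(List<ScoredServer> servers) {
        return servers.stream()
                .min(Comparator.comparingDouble(this::score))
                .orElse(null);
    }
}
```

Changing the weights changes the winner, which is one way to experiment with the priority question before committing to an answer.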

 

I wrote this snippet:

    import java.util.List;

    import org.apache.dubbo.common.URL;
    import org.apache.dubbo.rpc.Filter;
    import org.apache.dubbo.rpc.Invocation;
    import org.apache.dubbo.rpc.Invoker;
    import org.apache.dubbo.rpc.Result;
    import org.apache.dubbo.rpc.RpcException;
    import org.apache.dubbo.rpc.cluster.loadbalance.AbstractLoadBalance;

    public class MetricsLoadBalance extends AbstractLoadBalance implements Filter {

        @Override
        protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
            // Is there any information about the servers? Note that on the first request we have none.
            if (getServersInfo() != null && !getServersInfo().isEmpty()) {
                // get the least number of calls from this load balancer using RpcStatus
                // select server (I have some questions about it)
            } else {
                // select any server
            }
            return null;
        }

        @Override
        public Result invoke(Invoker<?> invoker, Invocation invocation) throws RpcException {
            // call invoker.invoke(invocation) and save the server response time
            // if an RpcException is thrown, save the error
            // all the data (such as errors and server response time) will be saved in some class similar to FastCompass.java (dubbo-metrics)
            // here we can also save the CPU %, number of inflight requests and server utilization that will be returned
            return null;
        }

        private List<?> getServersInfo() {
            // TODO: get the servers' information (errors, server response time and server utilization) that was saved before
            return null;
        }
    }

I have some questions: I asked on the mailing list about using "active" as a query param in LeastActiveLoadBalance. Is there any server data that can be sent in a query param?

Can we reuse anything from dubbo-metrics? I saw that MetricsFilter already saves the server response time and whether any error happened.

 

 

 


> GSoC 2019: New Load Balancer for higher availability and resilience.
> --------------------------------------------------------------------
>
>                 Key: DUBBO-34
>                 URL: https://issues.apache.org/jira/browse/DUBBO-34
>             Project: Apache Dubbo
>          Issue Type: Task
>            Reporter: Jun Liu
>            Priority: Major
>              Labels: GSoC2019
>
> This is an idea for Google Summer of Code (GSoC). Get to know about Dubbo[0].
> In an RPC framework such as Dubbo, LoadBalance is a key component for distributing traffic among servers. Below are the built-in strategies already supported:
> * Round Robin
> * Least Active
> * Consistent Hash
> * Random
> Now, we are considering some more intelligent and adaptive strategies that can learn the health status of servers at runtime and automatically adjust traffic distribution, something like P2C for Finagle[1] and JSQ for Netflix[2].
> 0. https://issues.apache.org/jira/browse/DUBBO-33.
> 1. https://twitter.github.io/finagle/guide/Clients.html.
> 2. https://medium.com/netflix-techblog/netflix-edge-load-balancing-695308b5548c. 
> How to achieve it, guidance for your reference:  
> The new load-balancing strategy should be able to automatically isolate abnormal instances based on statistics about the load or health status of the back-end Provider instances. This ensures that traffic is forwarded only to instances capable of processing it. The load balancer should also know when to recover: it should periodically check the health status of the isolated instances and put an instance back into the normal pool to be scheduled once it has recovered.
> A quite similar project is [Circuit Breaker|http://example.com], except that circuit breaker treats the downstream cluster as a whole while this Load Balancer needs to distinguish the state of each instance.
> This topic can be achieved by extending the [LoadBalance|https://github.com/apache/incubator-dubbo/blob/master/dubbo-cluster/src/main/java/org/apache/dubbo/rpc/cluster/LoadBalance.java] SPI.
> To provide the basic statistics the LB needs to make a decision, you may need to collect data on each RPC request, such as QPS, RT, Active Request, etc. This can be achieved by extending the [Filter|https://github.com/apache/incubator-dubbo/blob/master/dubbo-rpc/dubbo-rpc-api/src/main/java/org/apache/dubbo/rpc/Filter.java] SPI. For more details, see how [MetricsFilter|http://example.com] does it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)