Posted to dev@apisix.apache.org by Zhang Chao <zc...@gmail.com> on 2020/10/25 04:33:00 UTC

[PROPOSAL] Introduce grayscale for configuration

Hello, community!

As we all know, configuration synchronization in APISIX relies on etcd: once an administrator creates/updates/deletes a config instance, the change is detected by all APISIX instances immediately. That's cool, but the scope is ALL INSTANCES, which also means every instance might break if the config instance is malformed (perhaps due to a lack of validation). That's not ops-friendly.

We're familiar with grayscale releases for server instances: a small fraction of traffic is used to verify that a new release works, so as to reduce the impact of faults. So why not use the same approach to verify a newly issued config instance? I call this "configuration grayscale".

The way to use "configuration grayscale" is simple: all we need is an indication that tells the current APISIX instance whether it should apply a given config instance. So we can add a new item to each configuration (like route, upstream):


{
    "upstream": {
        "nodes": {
            "127.0.0.1:8080": 1
        } 
    },

    "annotations": {
        "grayscale": {
            "hostname": [
                "apisix-node1",
                "apisix-node3"
            ]
        }
    }
}

Here we put "grayscale" into a more general field, "annotations", rather than flattening it; that's more flexible and clear. The above example tells the APISIX instance to verify the grayscale condition first, by simply comparing its hostname against the grayscale targets (i.e. whether it's in the hostname list). If the grayscale condition matches, the APISIX instance applies the config instance; otherwise, it ignores it just as if it had never received it. The hostname comparison is just a simple example, and it doesn't mean this is the only type of grayscale condition we can use. For instance, we may use the Nginx built-in variable system to support more flexible grayscale:

{
    "upstream": {
        "nodes": {
            "127.0.0.1:8080": 1
        } 
    },

    "annotations": {
        "grayscale": {
            "vars": [
                [ "$pid", "==", "12349" ]
            ]
        }
    }
}

We need to discuss the most suitable grayscale approach for APISIX, one that can cover almost all the demands an APISIX administrator may have.

The situation becomes complicated if grayscale is present across a config dependency (e.g. a route depends on an upstream). To better describe this problem, let's say we have two kinds of config, A and B, where A depends on B. There are several situations we need to consider.

1) Both A and B have the grayscale conditions

In such a case, the grayscale conditions must be the same, otherwise some instances will be unable to apply both A and B, and requests on those instances cannot be handled properly.

2) A has grayscale conditions but B does not

Since A depends on B and B can be applied unconditionally, there is no problem when A has grayscale conditions.

3) B has grayscale conditions but A does not

This means that APISIX instances outside of B's apply scope cannot find B, and requests on those instances cannot be handled correctly.

Based on these situations, we should add some limitations to avoid the complicated cases. For example, don't gray-release two config instances that depend on each other; instead, test the "leaf" config instance first (B in the example above), make sure it's stable, and then move on to the next one.
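To make case 1) concrete, here is a rough sketch where the upstream (B) and the route (A) that references it carry identical grayscale conditions, so the same set of instances applies both. The "uri" and "upstream_id" values are only made up for illustration; apart from "annotations", the fields follow the existing upstream/route schemas.

Upstream (B):

{
    "nodes": {
        "127.0.0.1:8080": 1
    },

    "annotations": {
        "grayscale": {
            "hostname": [
                "apisix-node1"
            ]
        }
    }
}

Route (A):

{
    "uri": "/api/v1/trade/*",
    "upstream_id": "trade-upstream",

    "annotations": {
        "grayscale": {
            "hostname": [
                "apisix-node1"
            ]
        }
    }
}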

Let's take a more concrete example. Alice needs to create a new route: requests whose URI is prefixed by "/api/v1/trade" should be proxied to the upstream "trade-system". First she adds the upstream, and no other route in APISIX uses this upstream. Then she creates the route that will use it. Since she isn't sure whether the upstream and the route are absolutely right, when creating the route on the APISIX dashboard she marks it as grayscale, so that only the node named "apigw-sh-1" applies this route. After creating it, she monitors the behavior on that node for a while. One day later, all related requests on "apigw-sh-1" meet expectations, so she cancels the grayscale, and now every APISIX instance applies the route.
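Under this proposal, the route Alice creates might look roughly like the sketch below. The "uri" and "upstream_id" values are only illustrative; note that the upstream itself carries no grayscale condition, which matches case 2) above.

{
    "uri": "/api/v1/trade/*",
    "upstream_id": "trade-system",

    "annotations": {
        "grayscale": {
            "hostname": [
                "apigw-sh-1"
            ]
        }
    }
}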

Support for configuration grayscale can be gradual: we may support core configurations like Route first, and let users try this feature so we can gather more feedback.


Chao Zhang
zchao1995@gmail.com




Re: [PROPOSAL] Introduce grayscale for configuration

Posted by Zhang Chao <zc...@gmail.com>.
Hi, Jin

Thanks for your advice.



   1. The grayscale of the configuration will make APISIX's configuration
   stateful. The reason it is considered stateful is that the configuration
   takes effect only on some nodes, which will cause inconsistent behavior
   among APISIX nodes.

Yeah, it's true that the behavior among APISIX instances will be
inconsistent. This feature may be more suitable for configuration updates,
such as SSL key pair replacement. We may restrict the use cases to avoid
some unexpected circumstances.
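For example, an updated SSL object might carry the grayscale annotation so
that the new key pair is verified on a single node first. This is just a
sketch: the "cert", "key" and "snis" values are placeholders following the
existing SSL schema, and only "annotations" is new in this proposal.

{
    "cert": "-----BEGIN CERTIFICATE-----\n...new certificate...",
    "key": "-----BEGIN PRIVATE KEY-----\n...new key...",
    "snis": [
        "trade.example.com"
    ],

    "annotations": {
        "grayscale": {
            "hostname": [
                "apigw-sh-1"
            ]
        }
    }
}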


   2. Any small-scale verification we do is aimed at the granularity of
   traffic. The granularity of the proposal, the number of APISIX nodes, is
   too coarse. Even if only one node's configuration is affected in the
   production environment, a lot of traffic will be affected.

The node dimension is just an example; it's not necessary to use it. As
mentioned above, we can introduce the variable system to provide finer
granularity.
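For example, a "vars" condition could be based on request-level information
rather than the node dimension. This is just a sketch: "$http_x_canary" is
only an illustrative Nginx variable, and the exact per-request evaluation
semantics would still need to be defined.

{
    "annotations": {
        "grayscale": {
            "vars": [
                [ "$http_x_canary", "==", "true" ]
            ]
        }
    }
}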

Re: [PROPOSAL] Introduce grayscale for configuration

Posted by wei jin <kv...@apache.org>.
From this proposal, I can understand the effect you want to achieve, but I
have some doubts; maybe it is not a good choice for the gateway.

1. The grayscale of the configuration will make APISIX's configuration
stateful. The reason it is considered stateful is that the configuration
takes effect only on some nodes, which will cause inconsistent behavior
among APISIX nodes.
  a. The annotation alone cannot precisely control which APISIX node the
traffic is allocated to; the same request may flow to a normal node at one
time and to a gray node the next. This brings a usage burden and may cause
trouble for the requester.
  b. If we add flow control to the annotation, it will functionally
duplicate APISIX's existing grayscale capabilities. It is better to verify
the configuration through flow control directly.

2. Any small-scale verification we do is aimed at the granularity of
traffic. The granularity of the proposal, the number of APISIX nodes, is
too coarse. Even if only one node's configuration is affected in the
production environment, a lot of traffic will be affected.

If APISIX supports mesh in the future, the traffic granularity of a sidecar
will be much smaller than that of the gateway, and then grayscale
configuration will be a matter of course.

