Posted to dev@buildstream.apache.org by Tristan Van Berkom <tr...@codethink.co.uk> on 2021/01/28 08:12:28 UTC

Re: Proposal: Redesign of configuration surface for all remote services

Hi all,

Since there was no activity on this thread for a very long time, I
decided to go ahead and take a crack at this.

I have a good branch now that is ready for review. The MR is up here: 
https://github.com/apache/buildstream/pull/1453


I'm sending a detailed email because it's a large proposal and I would
like this to be visible, so that people can chime in in case we've
missed an important use case.

Cheers,
    -Tristan


Here is the design/changes I've come up with.
=============================================


The offending junction configurations
-------------------------------------
The junction configurations:

  "cache-junction-elements"
  "ignore-junction-remotes"

are completely removed, eliminating the worrisome ambiguity about what
happens in which configuration.

This is replaced by the enhanced user configuration.


Authentication
--------------
The authentication related properties `server-cert`, `client-cert`
and `client-key` have been split out into a subdictionary named "auth"
for any remote configuration.

This may allow better extensibility for alternative authentication
methods in the future; for now, it serves us very well to be able to
document the "auth" dictionary in one central place in the
documentation.
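
For example (mirroring the examples further down), the certificate
properties now live under the nested "auth" dictionary rather than at
the top level of the remote declaration:

  artifacts:
  - url: https://pony.com:9999
    push: true
    auth:
      server-cert: server.crt
      client-key: client.key
      client-cert: client.crt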


Remote Execution Configuration
------------------------------
As described in this thread, this is now only configurable with user
configuration, and only one "remote-execution" block is ever
considered.

There is no longer any ambiguity here: "remote-execution" applies to an
entire session, and a single build session cannot be spread across
multiple different remote execution build clusters.
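
For illustration only, such a single user configuration block might
look like the following; the exact service sub-keys shown here
(`execution-service`, `storage-service`, `action-cache-service`) are
assumptions on my part, the authoritative schema is the one documented
in the branch:

  remote-execution:
    execution-service:
      url: http://execution.example.com:50051
    storage-service:
      url: https://storage.example.com:11002
      auth:
        server-cert: server.crt
    action-cache-service:
      url: http://cache.example.com:50052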


Artifact and Source cache configuration
---------------------------------------
Projects are still allowed to provide recommendations for artifact and
source cache servers.

User configuration now has the ability to override them, i.e. disregard
artifact and source cache servers declared in projects.

Also, it is no longer possible to declare an artifact/source cache
server as a dictionary, it MUST be a list.

This choice is simply because the dict-or-list tactic does not buy us
any convenience whatsoever here, and the clarity that this is in fact a
list of dictionaries is more worthwhile.

Consider:
~~~~~~~~~

  artifacts:
    url: https://pony.moose/zebra:4040
    push: true

Versus:
~~~~~~~

  artifacts:
  - url: https://pony.moose/zebra:4040
    push: true

It is exactly the same amount of typing, so there is no point in
supporting both here.


Project Configuration
~~~~~~~~~~~~~~~~~~~~~

  #
  # This is mostly unchanged, except for the `auth`
  #
  artifacts:
  - url: https://pony.com:9999
    type: both
    push: false
    instance-name: this-shard
    auth:
      server-cert: server.crt


User Configuration
~~~~~~~~~~~~~~~~~~
We can declare global artifact configuration, which either
overrides or augments project recommended cache servers.

When "augmenting", the user configuration is still at a higher priority
than the project recommendations (as in: user configuration caches will
be consulted *first* when interacting with remotes).


  #
  # Global artifact configuration
  #
  artifacts:

    #
    # Here we decide whether user configuration overrides
    # project recommendations.
    #
    override-project-caches: true

    #
    # And we declare the global artifact configurations
    # under the new "servers" sub-dictionary instead
    #
    servers:
    - url: https://pony.com:9999
      type: both
      push: true
      instance-name: this-shard
      auth:
        server-cert: server.crt
        client-key: client.key
        client-cert: client.crt
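
To make the precedence concrete, here is a minimal Python sketch of the
rule described above; the function name and dictionary shapes are
hypothetical illustrations, not BuildStream's actual internals:

```python
# Hypothetical sketch of the override/augment precedence; the shapes of
# these dictionaries mirror the YAML above but are otherwise illustrative.

def resolve_remotes(user_config, project_remotes):
    """Return the ordered list of cache servers to consult."""
    user_remotes = user_config.get("servers", [])
    if user_config.get("override-project-caches", False):
        # User configuration completely replaces project recommendations
        return list(user_remotes)
    # Augmenting: user configured remotes are still consulted first
    return list(user_remotes) + list(project_remotes)


user = {
    "override-project-caches": False,
    "servers": [{"url": "https://pony.com:9999", "push": True}],
}
project = [{"url": "https://upstream.example:11002", "push": False}]

order = [r["url"] for r in resolve_remotes(user, project)]
```

With `override-project-caches: false` as above, the user remote sorts
ahead of the project remote; with `true`, the project list is ignored
entirely.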


We can still declare artifact configuration in the overrides, with
exactly the same new configuration:

  #
  # Artifact configuration for project "foo"
  #
  projects:
    foo:

      artifacts:
        #
        # Let's completely override the cache for only project "foo"
        #
        override-project-caches: true

        #
        # And declare the servers here
        #
        servers:
        - url: https://pony.com:9999
          type: both
          push: true
          instance-name: this-shard
          auth:
            server-cert: server.crt
            client-key: client.key
            client-cert: client.crt


Use case overview
=================
Here is an overview of the previously discussed desirable use cases.

Inline responses to my initial proposal:

[...]
> Use cases we want
> =================
> Here I will try to provide a bird's eye view of what our use cases are,
> what does a BuildStream client application require from these
> services ?
> 
>   * The ability to store and retrieve artifacts on a remote artifact
>     server.

Of course.

>   * The ability to store and retrieve staged source packages, indexed
>     by source cache key, on remote source cache services.

Of course.

>   * The ability to farm out builds to a remote execution service

Depending only on your ability to set up a remote execution build
cluster, of course.

>   * The ability to make requirements of worker instances on a remote
>     execution service.
> 
>     - Possibly also the ability to bail out early if the remote
>       execution service knows that it cannot provide a worker with
>       the properties which some of the project elements require.

Nothing changes here thus far; that is not to say we are perfect in
this regard yet, but this patch does not affect this.

>   * Ability to have redundancies in configuration of remote servers, in
>     case a service is down we usually allow configuration of services
>     in list format.

We still have this.

>   * Ability to carry artifacts forward from a third party artifact
>     cache which was recommended by project configuration across a
>     junction boundary.
> 
>     I.e. for better repeatability, it is often desirable to re-cache
>     the artifacts from an upstream project on your own infra in order
>     to ensure you have your own copy.
> 
>     NOTE: This is currently only available in project data and not
>           overridable by user configuration in the form of the
>           `cache-junction-elements`[2] configuration, which I already
>           pointed out was problematic in my original report[0].

With this patch, we can achieve this use case by configuring a global
cache server for pushing, and setting `override-project-caches` to
`false`.

The result will be:

  * When pulling, we will:

    - First try to pull from the globally defined cache servers
    - Fall back on project defined cache servers

  * When pushing, we will:

    - First push to our globally defined cache servers
    - Not push to the project defined cache servers
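
In rough Python terms, the behaviour above amounts to the following;
the `Remote` class here is a hypothetical stand-in, not the real API:

```python
# Illustrative stand-in for a configured remote; not BuildStream's real API.
class Remote:
    def __init__(self, url, push, store=None):
        self.url = url
        self.push = push                 # whether pushing here is allowed
        self.store = store if store is not None else {}


def pull(key, remotes):
    # Try each remote in order, returning the first artifact found
    for remote in remotes:
        if key in remote.store:
            return remote.store[key]
    return None


def push(key, artifact, remotes):
    # Push only to remotes configured for pushing
    for remote in remotes:
        if remote.push:
            remote.store[key] = artifact


user_remote = Remote("https://pony.com:9999", push=True)
project_remote = Remote("https://upstream.example:11002", push=False,
                        store={"abc123": "upstream-artifact"})

# User remotes are consulted first, project remotes act as fallbacks
remotes = [user_remote, project_remote]
artifact = pull("abc123", remotes)   # found on the project remote
push("abc123", artifact, remotes)    # re-cached only on the user remote
```

Note how this captures the carry-forward use case: the artifact pulled
from the upstream remote ends up cached on your own infrastructure.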

Why will we not push to the project defined cache servers?

This is simply a consequence of the fact that people normally do not
configure "push" servers in a project.conf, as that would likely imply
publishing the private key needed to push to their server along with
their project, so that everyone and their dog could push anything they
like to the artifact server.

We could additionally police this in the code and completely disallow
such nonsense configurations, but for now I've just left a fat notice
in the project.conf documentation which points out that it is a very
bad idea to configure a "push" remote from your project.conf (following
the "let people shoot their own feet if they really want to" policy).

>   * Ability to avoid downloading artifacts found on third party
>     infrastructure.
> 
>     I.e. for better trustability, you may want to ensure that all of
>     the built artifacts you end up consuming were built on
>     infrastructure you control, rather than downloaded from an upstream
>     project's artifact server.
> 
>     NOTE: This is currently only available in project data and not
>           overridable by user configuration in the form of the 
>           `ignore-junction-remotes`[2] configuration, which I already
>           pointed out was problematic in my original report[0].

This is now possible by simply setting `override-project-caches` to
`true` in the global configuration, regardless of whether or not you
have provided any remotes in your global configuration.
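
Concretely, this minimal user configuration disregards all project
recommended artifact caches without declaring any replacement remotes
(assuming the `servers` list may simply be omitted):

  artifacts:
    override-project-caches: true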

>   * Ability to farm out any local caching work to a remote service, to
>     reduce uploads and downloads for builds when configured on an RE
>     service (by way of specifying the RE service's CAS here),
>     configuring multiple build machines which run without RE may also
>     be optimized by way of using the same remote CAS for this.
> 
>     NOTE: This has not yet landed and is a part of Jürg's ongoing
>     work[3].

This orthogonal feature has yet to land in master.

>   * Ability to clearly override the recommendation of any project.conf
>     in the loaded pipeline using user configuration, which should
>     always have the last word on any user configuration.

This can be achieved on a per-project basis by setting the new
`override-project-caches` attribute to `true` in the overrides section
of the user configuration.





Re: Proposal: Redesign of configuration surface for all remote services

Posted by Sander Striker <s....@striker.nl>.
Hi,

This all makes sense to me.  Once the migration is done again, I'll file an
issue for RE Scenario 2, such that we don't forget :-).

Cheers,

Sander

On Wed, Feb 3, 2021 at 7:50 AM Tristan Van Berkom <
tristan.vanberkom@codethink.co.uk> wrote:

> Hi again,
>
> [...]

Re: Proposal: Redesign of configuration surface for all remote services

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.
Hi again,

One other thought I forgot to include...

On Tue, 2021-02-02 at 23:28 +0900, Tristan Van Berkom wrote:
> Hi Sander,
> 
> Replies inline.
> 
[...]
>   * I am reading between the lines that there is an idea that
>     the entire "push" configuration option is redundant, as one
>     could infer this by the presence of certain credentials.
> 
>     It was my original naive understanding that we could get away
>     with renaming the confusingly named `server-cert`, `client-cert`
>     and `client-key` configurations with `pull-cert` and `push-cert`
>     and `push-key`, alleviating some cognitive effort for the user.
> 
>     However I was mistaken about the relation here, and I don't think
>     that we can infer the ability or decision to push based on what
>     credentials have been provided.
> 
>     Instead, the "server-cert" is only needed when encrypted traffic
>     is only required on the client receiving end, and the client
>     cert/key are additionally required when bidirectional TLS is
>     required.
> 
>     I think it is better to consider this requirement as an attribute
>     of the client server relationship, and the implementation of the
>     service, rather than to assume that two way encryption will never
>     be required for activities which the client deems to be "pulling"
>     or "fetching".
> 
>     Even though the implementations we are currently working with
>     happen to require all three credentials only for push operations,
>     this does not mean it will always be so, for every service
>     implementation (at least, not as far as I understand).

As I am not a big fan of the "push" boolean switch either, I was also
thinking this might be more appropriate as an enumeration.

Instead of "push" being either true/false, we could have an "action"
attribute with three values: "push" / "pull" / "both".
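
A rough sketch of how such an enumeration might behave (again, this is
only a thought experiment, not something implemented in the branch):

```python
from enum import Enum

# Hypothetical "action" attribute replacing the "push" boolean.
class RemoteAction(Enum):
    PUSH = "push"   # push only: never try to pull from this remote
    PULL = "pull"   # pull only: equivalent to today's push: false
    BOTH = "both"   # push and pull: equivalent to today's push: true


def can_pull(action):
    return action in (RemoteAction.PULL, RemoteAction.BOTH)


def can_push(action):
    return action in (RemoteAction.PUSH, RemoteAction.BOTH)
```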

I did not do this in the branch as of yet, but what it would provide in
addition to the current API is the ability to define push remotes where
you only want to push but want to make sure you never try to pull from
those.

However, I haven't come up with a sensible reason for a user to desire
this option, which is why I didn't change this part.

Cheers,
    -Tristan



Re: Proposal: Redesign of configuration surface for all remote services

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.
Hi Sander,

Replies inline.

On Tue, 2021-02-02 at 12:35 +0100, Sander Striker wrote:
> Hi Tristan,
> 
> First of all, Thank You for the write up as well as the progress made so
> far.
> 
> I'll be honest in that I am not sure I got the full picture of implications
> - maybe as a result of your attempt to outline it in detail ;-).  But
> that's on me.
> 
> I have a couple of questions:
> *Remote Execution*
> There seems to be a skew towards only using a single service in the
> proposal.  There are some aspects to Remote Execution that we haven't
> clearly articulated or fully implemented.  We have only implemented
> scenario 1 currently.
> 
> Scenario 1: Full Remote Execution [implemented]
> - BuildStream checks ActionCache
>   - on cache hit, get ActionResult and blobs from CAS associated with
> Remote Execution.  Note: the blob content fetching could be on-demand
>   - on cache miss, uploads any missing blobs for the Action to the CAS
> associated with Remote Execution, finally it uses the Execution service to
> execute the Action.
> 
> Scenario 2: Remote Cache (read-only) [not currently implemented]
> - BuildStream checks ActionCache
>   - on cache hit, get ActionResult and blobs from CAS associated with
> Remote Execution.  Note: the blob content fetching could be on-demand
>   - on cache miss fall back to local execution
> 
> Scenario 3: Remote Cache (read-write) [not currently implemented]
> - BuildStream checks ActionCache
>   - on cache hit, get ActionResult and blobs from CAS associated with
> Remote Execution.  Note: the blob content fetching could be on-demand
>   - on cache miss fall back to local execution, and upload result blobs to
> CAS associated with Remote Execution, finally write ActionResult to
> ActionCache
> 
> Especially in case of Scenario 2, it is not unimaginable to have a project
> propose a read-only endpoint.  Getting these implemented is currently
> mostly tricky due to the "fall back to local execution" bit - as we choose
> either to execute remotely or locally up front.

As far as I can see, yes: once those features are implemented in some
way, it can make sense to figure out a sensible way to have the
project make recommendations of where to pull certain things from.

I think it makes better sense to postpone exposing such options until a
time when they have a defined purpose.

> *Write access configuration*
> With the current credentials it may make sense to only have push be enabled
> when auth credentials have been set up in the user configuration.  To also
> set `push: true` seems redundant if that is also set in project.conf.  If
> you don't want to push as a user, set `push: false` in user conf if you
> have credentials configured.
> At some point in the future, I can imagine credential prompting as well,
> and I would imagine bst helps me out by caching those creds or e.g. storing
> a JWT for later use.

A few things:

  * Setting 'push' to true or false is strictly related to a specific
    endpoint declaration.

    These endpoint lists are recommended in projects and ultimately
    decided in user configuration.

    A given endpoint is never inherited and augmented, it is either
    completely replaced by user configuration, or the user
    configuration decides to keep this end point.

    In other words, you don't set `push: true` in the user config
    for an endpoint that is recommended by the project.conf, that
    never happens.

  * I am reading between the lines that there is an idea that
    the entire "push" configuration option is redundant, as one
    could infer this by the presence of certain credentials.

    It was my original naive understanding that we could get away
    with renaming the confusingly named `server-cert`, `client-cert`
    and `client-key` configurations with `pull-cert` and `push-cert`
    and `push-key`, alleviating some cognitive effort for the user.

    However I was mistaken about the relation here, and I don't think
    that we can infer the ability or decision to push based on what
    credentials have been provided.

    Instead, the "server-cert" is only needed when encrypted traffic
    is only required on the client receiving end, and the client
    cert/key are additionally required when bidirectional TLS is
    required.

    I think it is better to consider this requirement as an attribute
    of the client server relationship, and the implementation of the
    service, rather than to assume that two way encryption will never
    be required for activities which the client deems to be "pulling"
    or "fetching".

    Even though the implementations we are currently working with
    happen to require all three credentials only for push operations,
    this does not mean it will always be so, for every service
    implementation (at least, not as far as I understand).

  * Regarding user prompting for credentials and suchlike, I think
    the refactor and cleanup does bring the code into much better shape
    to handle this sort of thing, as we have a clear decision early
    on of all the services which the client will attempt to access
    over the course of a session, all boiled down into a clean
    focal point/function.

    I don't think the prompting and storing of credentials has
    any effect on the API configuration changes put forward by
    this branch.

Does this make sense?

Cheers,
    -Tristan


> Any thoughts on how the above fits with the current changes?

> Cheers,
> 
> Sander
> 
> 
> On Thu, Jan 28, 2021 at 9:12 AM Tristan Van Berkom <
> tristan.vanberkom@codethink.co.uk> wrote:
>
> > [...]


Re: Proposal: Redesign of configuration surface for all remote services

Posted by Sander Striker <s....@striker.nl>.
Hi Tristan,

First of all, Thank You for the write up as well as the progress made so
far.

I'll be honest in that I am not sure I got the full picture of implications
- maybe as a result of your attempt to outline it in detail ;-).  But
that's on me.

I have a couple of questions:
*Remote Execution*
There seems to be a skew towards only using a single service in the
proposal.  There are some aspects to Remote Execution that we haven't
clearly articulated or fully implemented.  We have only implemented
scenario 1 currently.

Scenario 1: Full Remote Execution [implemented]
- BuildStream checks ActionCache
  - on cache hit, get ActionResult and blobs from CAS associated with
Remote Execution.  Note: the blob content fetching could be on-demand
  - on cache miss, uploads any missing blobs for the Action to the CAS
associated with Remote Execution, finally it uses the Execution service to
execute the Action.

Scenario 2: Remote Cache (read-only) [not currently implemented]
- BuildStream checks ActionCache
  - on cache hit, get ActionResult and blobs from CAS associated with
Remote Execution.  Note: the blob content fetching could be on-demand
  - on cache miss fall back to local execution

Scenario 3: Remote Cache (read-write) [not currently implemented]
- BuildStream checks ActionCache
  - on cache hit, get ActionResult and blobs from CAS associated with
Remote Execution.  Note: the blob content fetching could be on-demand
  - on cache miss fall back to local execution, and upload result blobs to
CAS associated with Remote Execution, finally write ActionResult to
ActionCache

Especially in the case of Scenario 2, it is not unimaginable to have a
project propose a read-only endpoint.  Implementing these is currently
tricky mostly because of the "fall back to local execution" bit - we
currently choose up front whether to execute remotely or locally.
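To make Scenario 2 concrete, here is a configuration sketch (purely
illustrative: the shape of the `remote-execution` block follows the
current proposal, but configuring it without an execution service is
not supported today, and all URLs are placeholders):

```yaml
# Hypothetical sketch of Scenario 2 (read-only remote cache):
# an action cache and CAS are configured, but no execution
# service, so a cache miss would fall back to local execution.
# Not currently implemented.
remote-execution:
  action-cache-service:
    url: https://cache.example.com:50052
    auth:
      server-cert: server.crt
  storage-service:
    url: https://cas.example.com:11001
    auth:
      server-cert: server.crt
```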

*Write access configuration*
With the current credentials it may make sense to only enable push when
auth credentials have been set up in the user configuration.  Having to
also set `push: true` then seems redundant, especially if it is already
set in project.conf.  If you don't want to push as a user but have
credentials configured, set `push: false` in your user configuration.
At some point in the future, I can imagine credential prompting as well,
and I would imagine bst helps me out by caching those creds or e.g. storing
a JWT for later use.
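To illustrate what I mean under the proposed layout (a sketch, not a
concrete proposal; the URL is a placeholder):

```yaml
# Sketch: user configuration where client credentials are set up.
# The suggestion is that `push` could default to true whenever an
# `auth` block with client credentials is present, with
# `push: false` as the explicit opt-out.
artifacts:
  servers:
  - url: https://cache.example.com:11002
    # push: true   # arguably redundant given the credentials below
    auth:
      server-cert: server.crt
      client-key: client.key
      client-cert: client.crt
```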

Any thoughts on how the above fits with the current changes?

Cheers,

Sander


On Thu, Jan 28, 2021 at 9:12 AM Tristan Van Berkom <
tristan.vanberkom@codethink.co.uk> wrote:

> Hi all,
>
> Since there was no activity on this thread for a very long time, I
> decided to go ahead and take a crack at this.
>
> I have a good branch now that is ready for review. The MR is up here:
> https://github.com/apache/buildstream/pull/1453
>
>
> I'm sending a detailed email because it's a large proposal and I would
> like this to be visible, so that people can chime in in case we've
> missed an important use case.
>
> Cheers,
>     -Tristan
>
>
> Here is the design/changes I've come up with.
> =============================================
>
>
> The offending junction configurations
> -------------------------------------
> The junction configurations:
>
>   "cache-junction-elements"
>   "ignore-junction-remotes"
>
> Are completely removed, reducing the worrisome ambiguity of what
> happens in what configuration.
>
> This is replaced by the enhanced user configuration.
>
>
> Authentication
> --------------
> For all of the authentication related properties, `server-cert`,
> `client-cert` and `client-key`, these have been split out into a
> subdictionary named "auth" for any remote configuration.
>
> This may allow better extensibility for alternative authentication
> methods in the future, however right now it serves us very well to be
> able to document the "auth" dictionary in one central place in the
> documentation.
>
>
> Remote Execution Configuration
> ------------------------------
> As described in this thread, this is now only configurable with user
> configuration, and only one "remote-execution" block is ever
> considered.
>
> There is no longer any ambiguity here, "remote-execution" applies to an
> entire session, one build session cannot be built across multiple
> different remote execution build clusters.
>
>
> Artifact and Source cache configuration
> ---------------------------------------
> Projects are still allowed to provide recommendations for artifact and
> source cache servers.
>
> User configuration now has the ability to override them, i.e. disregard
> artifact and source cache servers declared in projects.
>
> Also, it is no longer possible to declare an artifact/source cache
> server as a dictionary, it MUST be a list.
>
> This choice is simply because the dict-or-list tactic here does not
> buy us any convenience whatsoever, and the clarity that it is in fact
> a list of dictionaries is more worthwhile.
>
> Consider:
> ~~~~~~~~~
>
>   artifacts:
>     url: https://pony.moose/zebra:4040
>     push: true
>
> Versus:
> ~~~~~~~
>
>   artifacts:
>   - url: https://pony.moose/zebra:4040
>     push: true
>
> It is exactly the same amount of typing, no point in supporting both
> here.
>
>
> Project Configuration
> ~~~~~~~~~~~~~~~~~~~~~
>
>   #
>   # This is mostly unchanged, except for the `auth`
>   #
>   artifacts:
>   - url: https://pony.com:9999
>     type: both
>     push: false
>     instance-name: this-shard
>     auth:
>       server-cert: server.crt
>
>
> User Configuration
> ~~~~~~~~~~~~~~~~~~
> We can declare global artifact configuration, which either
> overrides or augments project recommended cache servers.
>
> When "augmenting", the user configuration is still at a higher priority
> than the project recommendations (as in: user configuration caches will
> be consulted *first* when interacting with remotes).
>
>
>   #
>   # Global artifact configuration
>   #
>   artifacts:
>
>     #
>     # Here we decide whether user configuration overrides
>     # project recommendations.
>     #
>     override-project-caches: true
>
>     #
>     # And we declare the global artifact configurations
>     # under the new "servers" sub-dictionary instead
>     #
>     servers:
>     - url: https://pony.com:9999
>       type: both
>       push: true
>       instance-name: this-shard
>       auth:
>         server-cert: server.crt
>         client-key: client.key
>         client-cert: client.crt
>
>
> We can still declare artifact configuration in the overrides, with
> exactly the same new configuration:
>
>   #
>   # Artifact configuration for project "foo"
>   #
>   projects:
>     foo:
>
>       artifacts:
>         #
>         # Lets completely override the cache for only project "foo"
>         #
>         override-project-caches: true
>
>         #
>         # And declare the servers here
>         #
>         servers:
>         - url: https://pony.com:9999
>           type: both
>           push: true
>           instance-name: this-shard
>           auth:
>             server-cert: server.crt
>             client-key: client.key
>             client-cert: client.crt
>
>
> Use case overview
> =================
> Here is an overview of the previously discussed desirable use cases.
>
> Inline responses to my initial proposal:
>
> [...]
> > Use cases we want
> > =================
> > Here I will try to provide a birds eye view of what our use cases are,
> > what does a BuildStream client application require from these
> > services ?
> >
> >   * The ability to store and retrieve artifacts on a remote artifact
> >     server.
>
> Of course.
>
> >   * The ability to store and retrieve staged source packages, indexed
> >     by source cache key, on remote source cache services.
>
> Of course.
>
> >   * The ability to farm out builds to a remote execution service
>
> Depending only on your ability to set up a remote execution build
> cluster, of course.
>
> >   * The ability to make requirements of worker instances on a remote
> >     execution service.
> >
> >     - Possibly also the ability to bail out early if the remote
> >       execution service knows that it cannot provide a worker which
> >       with the properties which some of the project elements require.
>
> Nothing changes here thus far; that is not to say we are perfect in
> this regard yet, but this patch does not affect this.
>
> >   * Ability to have redundancies in configuration of remote servers, in
> >     case a service is down we usually allow configuration of services
> >     in list format.
>
> We still have this.
>
> >   * Ability to carry artifacts forward from a third party artifact
> >     cache which was recommended by project configuration across a
> >     junction boundary.
> >
> >     I.e. for better repeatability, it is often desirable to re-cache
> >     the artifacts from an upstream project on your own infra in order
> >     to ensure you have your own copy.
> >
> >     NOTE: This is currently only available in project data and not
> >           overridable by user configuration in the form of the
> >           `cache-junction-elements`[2] configuration, which I already
> >           pointed out was problematic in my original report[0].
>
> With this patch, we can achieve this use case by configuring a global
> cache server for pushing, and setting `override-project-caches` to
> `false`.
>
> The result will be:
>
>   * When pulling, we will:
>
>     - First try to pull from the globally defined cache servers
>     - Fall back on project defined cache servers
>
>   * When pushing, we will:
>
>     - First push to our globally defined cache servers
>     - Not push to the project defined cache servers
>
> Why will we not push to the project defined cache servers ?
>
> Well, it is only a consequence of the simple fact that people do not
> normally configure "push" servers in a project.conf, as that would
> likely imply publishing the private key needed to push to their
> server along with the project, so that everyone and their dog could
> push anything they like to the artifact server.
>
> We could additionally police this in the code and completely disallow
> such nonsense configurations, but for now I've just left a fat notice
> in the project.conf documentation which points out that it is a very
> bad idea to configure a "push" remote from your project.conf (following
> the "let people shoot their own feet if they really want to" policy).
>
> >   * Ability to avoid downloading artifacts found on third party
> >     infrastructure.
> >
> >     I.e. for better trustability, you may want to ensure that all of
> >     the built artifacts you end up consuming were built on
> >     infrastructure you control, rather than downloaded from an upstream
> >     project's artifact server.
> >
> >     NOTE: This is currently only available in project data and not
> >           overridable by user configuration in the form of the
> >           `ignore-junction-remotes`[2] configuration, which I already
> >           pointed out was problematic in my original report[0].
>
> This is now possible by simply declaring `override-project-caches` to
> `true` in the global configuration, regardless of whether or not you
> have provided any remotes in your global configuration.
>
> >   * Ability to farm out any local caching work to a remote service, to
> >     reduce uploads and downloads for builds when configured on an RE
> >     service (by way of specifying the RE service's CAS here),
> >     configuring multiple build machines which run without RE may also
> >     be optimized by way of using the same remote CAS for this.
> >
> >     NOTE: This has not yet landed and is a part of Jürg's ongoing
> >     work[3].
>
> This orthogonal feature has yet to land in master.
>
> >   * Ability to clearly override the recommendation of any project.conf
> >     in the loaded pipeline using user configuration, which should
> >     always have the last word on any user configuration.
>
> This can be achieved on a per-project basis by setting the new
> `override-project-caches` attribute to `true` in the overrides section
> of the user configuration.
>
>
>
>
>

Re: Proposal: Redesign of configuration surface for all remote services

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.
Hi,

Thanks for the reply Abderrahim :)

FWIW, the new PR (after the migration correction) is up at:

    https://github.com/apache/buildstream/pull/1434

On Fri, 2021-02-05 at 21:13 +0100, Abderrahim Kitouni wrote:
> Hi Tristan,
[...]
> > Authentication
> > --------------
> > For all of the authentication related properties, `server-cert`,
> > `client-cert` and `client-key`, these have been split out into a
> > subdictionary named "auth" for any remote configuration.
> > 
> > This may allow better extensibility for alternative authentication
> > methods in the future, however right now it serves us very well to be
> > able to document the "auth" dictionary in one central place in the
> > documentation.
> 
> I think it's better to keep `server-cert` out of the sub-dictionary.
> The server certificate is needed when the server is using an untrusted
> certificate regardless of whether we want to authenticate with the
> server and regardless of the method we use to authenticate (currently
> client certificate, but we may want to support some kind of access
> token in the future, as buildbox does).

I understand that the YAML would be one dictionary less wordy in a lot
of cases where you only want to download, and I also take your point
that the server cert is not exactly for the purpose of "authenticating
the client".

The main motivations behind the `auth` dict are:

  * We can clearly document the authentication related attributes in a
    single location.

    The key/cert exchange is a bit confusing in my opinion; I wanted
    to rename these in terms of "pull" and "push" credentials so that
    the user could be unburdened of the details of how key exchanges
    work, but that would also be incorrect.

    I think that the client/server cert/key is all clearly the same
    topic / subject material, and as it is used in multiple places
    (different kinds of remote asset services, remote execution
    services), it makes good sense to have a central place to
    document this.

  * Extensibility.

    Right now we only support http/https URIs, but this could
    hypothetically be extended in the future.

    By having an `auth` dict we can naturally add a `kind` field
    to the dict and support different types of authentication in
    the future in a nice clean component.
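As a purely hypothetical illustration of that extensibility (no `kind`
field exists today, and the token method below is invented for the
example):

```yaml
# Hypothetical future `auth` dictionaries, illustrating how a
# `kind` field could select between authentication methods.
artifacts:
  servers:
  - url: https://cache.example.com:11002
    auth:
      kind: tls              # today's certificate/key exchange
      server-cert: server.crt
      client-key: client.key
      client-cert: client.crt
  - url: https://other.example.com:11002
    auth:
      kind: access-token     # invented for this example
      token-file: token.txt
```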


> > Artifact and Source cache configuration
> > ---------------------------------------
> > Projects are still allowed to provide recommendations for artifact and
> > source cache servers.
> > 
> > User configuration now has the ability to override them, i.e. disregard
> > artifact and source cache servers declared in projects.
> > 
> > Also, it is no longer possible to declare an artifact/source cache
> > server as a dictionary, it MUST be a list.
> > 
> > This choice is simply because the dict-or-list tactic here does not
> > buy us any convenience whatsoever, and the clarity that it is in
> > fact a list of dictionaries is more worthwhile.
> 
> I think this makes sense.
> 
> > Project Configuration
> > ~~~~~~~~~~~~~~~~~~~~~
> > 
> >   #
> >   # This is mostly unchanged, except for the `auth`
> >   #
> >   artifacts:
> >   - url: https://pony.com:9999
> >     type: both
> >     push: false
> >     instance-name: this-shard
> >     auth:
> >       server-cert: server.crt
> 
> I assume `both` here means index+storage? (as in CAS + RA)

Yes, that is the current name (I didn't change this), although we could
rename this if you have a better name in mind (now is a good time...).

> > User Configuration
> > ~~~~~~~~~~~~~~~~~~
> > We can declare global artifact configuration, which either
> > overrides or augments project recommended cache servers.
> > 
> > When "augmenting", the user configuration is still at a higher priority
> > than the project recommendations (as in: user configuration caches will
> > be consulted *first* when interacting with remotes).
> 
> I think this makes a lot of sense, but is the reverse of what bst1
> does. I hope there won't be objections to changing it.

Indeed.

The point is that all of user configuration can override project
suggestions, which has always been the design, e.g. see:

    https://docs.buildstream.build/master/arch_data_model.html#context

The cache service configuration went a bit astray in this regard, so I
want to bring it back in line :)
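Concretely, the consultation order when augmenting looks like this
(the URL is a placeholder):

```yaml
# With `override-project-caches: false`, user-configured servers
# are consulted first, followed by any servers recommended by the
# project.conf files in the loaded pipeline.
artifacts:
  override-project-caches: false
  servers:
  - url: https://user-cache.example.com:11002   # consulted first
    push: true
# Project-recommended caches are consulted after the servers above.
```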

> >   #
> >   # Global artifact configuration
> >   #
> >   artifacts:
> > 
> >     #
> >     # Here we decide whether user configuration overrides
> >     # project recommendations.
> >     #
> >     override-project-caches: true
> 
> This would be useful as a command line argument too.


Yup, that is already included in my follow-up branch :)

    https://github.com/apache/buildstream/pull/1438

Cheers,
    -Tristan



Re: Proposal: Redesign of configuration surface for all remote services

Posted by Abderrahim Kitouni <a....@gmail.com>.
Hi Tristan,

Le jeu. 28 janv. 2021 à 09:12, Tristan Van Berkom
<tr...@codethink.co.uk> a écrit :
>
> Hi all,
>
> Since there was no activity on this thread for a very long time, I
> decided to go ahead and take a crack at this.
>
> I have a good branch now that is ready for review. The MR is up here:
> https://github.com/apache/buildstream/pull/1453
>
>
> I'm sending a detailed email because it's a large proposal and I would
> like this to be visible, so that people can chime in in case we've
> missed an important use case.

This is something I've been meaning to reply to for a long time, but
never found the time. I've now gone through your proposal, and I must
say I like it. A couple of remarks inline:

> Authentication
> --------------
> For all of the authentication related properties, `server-cert`,
> `client-cert` and `client-key`, these have been split out into a
> subdictionary named "auth" for any remote configuration.
>
> This may allow better extensibility for alternative authentication
> methods in the future, however right now it serves us very well to be
> able to document the "auth" dictionary in one central place in the
> documentation.

I think it's better to keep `server-cert` out of the sub-dictionary.
The server certificate is needed when the server is using an untrusted
certificate regardless of whether we want to authenticate with the
server and regardless of the method we use to authenticate (currently
client certificate, but we may want to support some kind of access
token in the future, as buildbox does).
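In other words, something like this (a sketch of the layout I am
suggesting, not the one in the proposal; the URL is a placeholder):

```yaml
# Sketch: keep `server-cert` at the top level, since trusting the
# server is independent of how (or whether) the client
# authenticates; only client credentials live under `auth`.
artifacts:
- url: https://cache.example.com:11002
  server-cert: server.crt   # needed even for anonymous pulls
  auth:
    client-key: client.key
    client-cert: client.crt
```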

> Artifact and Source cache configuration
> ---------------------------------------
> Projects are still allowed to provide recommendations for artifact and
> source cache servers.
>
> User configuration now has the ability to override them, i.e. disregard
> artifact and source cache servers declared in projects.
>
> Also, it is no longer possible to declare an artifact/source cache
> server as a dictionary, it MUST be a list.
>
> This choice is simply because the dict-or-list tactic here does not
> buy us any convenience whatsoever, and the clarity that it is in fact
> a list of dictionaries is more worthwhile.

I think this makes sense.

> Project Configuration
> ~~~~~~~~~~~~~~~~~~~~~
>
>   #
>   # This is mostly unchanged, except for the `auth`
>   #
>   artifacts:
>   - url: https://pony.com:9999
>     type: both
>     push: false
>     instance-name: this-shard
>     auth:
>       server-cert: server.crt

I assume `both` here means index+storage? (as in CAS + RA)

> User Configuration
> ~~~~~~~~~~~~~~~~~~
> We can declare global artifact configuration, which either
> overrides or augments project recommended cache servers.
>
> When "augmenting", the user configuration is still at a higher priority
> than the project recommendations (as in: user configuration caches will
> be consulted *first* when interacting with remotes).

I think this makes a lot of sense, but is the reverse of what bst1
does. I hope there won't be objections to changing it.

>   #
>   # Global artifact configuration
>   #
>   artifacts:
>
>     #
>     # Here we decide whether user configuration overrides
>     # project recommendations.
>     #
>     override-project-caches: true

This would be useful as a command line argument too.

Regards,

Abderrahim