You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@buildstream.apache.org by Tristan Van Berkom <tr...@codethink.co.uk> on 2020/06/22 11:24:39 UTC

Re: [BuildStream] Protect against plugin modifications of artifacts

Hi William,

Thanks for engaging in this.

On Mon, 2020-06-22 at 11:02 +0100, William Salmon wrote:
> Reply in line.
[...]
> There are already a number of plugins that put files in the sandbox but 
> that do it slightly differently. They run commands that then get 
> executed in the sand box but the effect is the same, files get added 
> that did not come from a source plugin.
> 
> It is very common to use cat and some variable to place files in the 
> sandbox that do not come from a "source plugin", see x86image [4] which 
> Tristan often mentions. I appreciate that by creating the file in this 
> way, the action is recorded in the config but the config can be 
> manipulated by the plugin. I have written rather convoluted plugins to 
> try and make the x86image much more generic. This made assembling the 
> cat command very complex and ultimately lead to the plugin not being 
> successful.
> 
> To me the issue is, do changes in the artifact have changes in the cache 
> key. It is the authors job to make sure that this is true with tools 
> such as get_unique_key.
> 
> A author could cause changes to a artifact by changing the way they use 
> the `blessed` api eg `layout_add`.
> 
> Using cat is actually rather tricky and I think we could make much 
> better plugins by just adding files to the sandbox directly. The 
> x86image is very ridged and so i have had mixed success with a plugin 
> wrapped around genimage, freedesktop-sdk use this approach but not the 
> plugin a lot but as there configuration files are added with cat they 
> end up having to have loads of very different elements.
> 
> I think a good thing would be for use to have a flexible image creation 
> plugin that has a nice way to create configs and assemble the sandbox. I 
> was planing to use the API that Tristan is planning to outlaw to do this 
> as I think it is a the best way. As noted above plugins anthers who are 
> very familiar with bst2 internals have chose to do this for plugins like 
> OCI.
> 
> Looking at the new version of some of the plugins you have talked about 
> [7] [8] I am confused by the problem. The new api has all file 
> manipulation in nice context handlers that control how the new data is 
> added in to the cas. I can see that maybe we could change when you can 
> do this so that the new files can be baked in to the cache key. Maybe at 
> configure time, but I think that would add time to bst show.
> 
> As discussed on IRC adding a cleaner API for changing permissions etc 
> would be nice but I don't really see why having to include a build dep 
> to bring chown/chmod in to a sandbox and coupling this with some config 
> is so much better. Or why we can not found a way to make sure that 
> tweaking the sandbox can not be done safely.

The main differences here are:

  A.) The guarantees we can provide about running safe sandboxed code
      compared with untrusted local python interpreter code.

      This would lead us down a path of providing ever more carefully
      crafted APIs in the core in order to produce reproducible output
      in the artifact, all the while not being certain what python APIs
      a plugin might use to produce output.

  B.) Simply having a "second way" of doing things complicates things
      for us, as the question of which way to choose arises for a
      plugin author, when the safe way should be the only choice.

  C.) I wasn't thinking of this but indeed, as you pointed out: we
      leave open the chance for human error by implying that the plugin
      author should rev their Plugin.get_unique_key() implementation at
      any time that their code changes.

      Seeing how third party plugins have been maintained in the past
      (i.e. rather sloppily, just getting the job done, fire and
      forget), I don't expect this level of diligence from plugin
      authors... and to be clear: I think this is a *good thing*.

If we take a simple example of even the current collect_manifest
plugin, take a look at the lines where we construct dictionaries:

   https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L191
   https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L309
   https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L322

Here we can see that the output is clearly not reproducible, while it
might be reproducible in more recent versions of python, I'm not
certain we can be sure how long this will last.

If we were running this python code which constructs dictionaries in
the sandbox, at the very least we could set PYTHONHASHSEED=0 in the
environment for projects which use such unstable python code (note that
this behavior of dicts has flipflopped over the years, early on it was
accidentally stable, then intentionally randomized, and now finally
forced to behave like the OrderedDict() type).

This is not an invitation to a debate about whether we can trust python
dictionaries or not in the future, it's just an example of a class of
problem we should not be held accountable for: We simply provide the
sandbox for elements to perform permutations within, and provide some
controls to allow for projects to achieve maximal reproducibility.


To answer another point in the above paragraph: `cat` need not be the
only way to construct data in the sandbox.

Currently, the BuildElement only offers support for running commands
provided as text supplied in it's `config` as shell scripts, but this
is really only because nobody has cared enough to give it more
capability yet, see:

    https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/buildelement.py#L309

There's no reason why BuildElements (or other elements) need to
restrict themselves to running shell script (it could equally pass text
to /usr/bin/python or /usr/bin/node or whatever).

Indeed, to take a page from rpm spec files, they have had such support
in their %pre / %post scripts for a very long time and I've been hoping
to add it to BuildElement too but haven't found a personal itch that it
would scratch as of yet, still there's nothing preventing this afaics.

> > But in short, the authors of these plugins should be doing the work to
> > get their features supported properly upstream, and BuildStream 2.0
> > should not be held hostage to provide alternative support for a feature
> > which was never a feature in the first place.
> 
> BST should not be held hostage to any one of course but given that I 
> have found and helped to fix 8 bugs in the last 2 mounts from trying to 
> get bst2 to work with freedesktop-sdk. Several of these bugs have been 
> HUGE eg the handling of variables being broken or bst not being able to 
> pull a project the size of freedesktop-sdk it seems that freedesktop-sdk 
> and buildstream working together is giving both projects a lot of benefit.
> 
> It would seem rather prudent to me to check that bst2 can be used with 
> some real projects before the release. Freedesktop-sdk seems like the 
> most obvious candidate as everyone one can see it and work on it. But I 
> don't expect us to need to block on freedesktop-sdk specifically, if we 
> have a alternative approach to validate bst before release?

This is a very good point.

To be honest, I raised the issue of blocking here as a matter of
principle, mostly because I've noticed that a lot of effort has been
(mis)placed into adding support for something I believe we shouldn't
even be supporting in the first place, and I expect this is due to a
thinking that we should block progress and refactors on ensuring
plugins still work.

For the most part I agree with this, but I think it simple went
overlooked that in this case, we never really explicitly supported this
behavior, and instead we are blocking on supporting an exploitation of
the `directory` argument, lending more ability to plugins than it
should.

For collect_manifest, there should be ways to work around this without
too much effort, and for the purpose of ensuring BuildStream works in
the meantime, surely we can build freedesktop-sdk without involving the
collect_manifest plugin and still know that it works.


Cheers,
    -Tristan

> 
> > 
> > So, what are your thoughts ?
> > 
> > Best regards,
> >      -Tristan
> > 
> > 
> > [0]: https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/storage/directory.py
> > [1]: https://gitlab.com/BuildStream/buildstream/-/issues/235
> > [2]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/55
> > [3]: https://mail.gnome.org/archives/buildstream-list/2018-August/msg00013.html
> > [4]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/376
> > [5]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/735
> > [6]: https://gitlab.com/BuildStream/bst-plugins-experimental/-/issues/2
> [7]: 
> https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/x86image.yaml#L48
> [8]: 
> https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/blob/dwinship/bst2-fv/use-virtual-directories-in-plugins/plugins/elements/url_manifest.py#L174
> [9]: 
> https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/blob/dwinship/bst2-fv/use-virtual-directories-in-plugins/plugins/elements/collect_initial_scripts.py#L50
> > 
> > 
> > 
> _______________________________________________
> buildstream-list mailing list
> buildstream-list@gnome.org
> https://mail.gnome.org/mailman/listinfo/buildstream-list
>

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

Hi,

On Fri, 2020-07-03 at 08:38 +0100, Darius Makovsky wrote:
> Hi Ben,
> 
[...]
> > Will it mean that plugin authors might not be able to write their
> > plugin the easiest way? Possible, however, that's were as a
> > community we need good examples and documentation on how to do
> > "those special things".
> > 
> 
> I agree. I just wanted to emphasize that I don't think it would be a
> case of simply forbidding certain things from happening (in whatever
> way is decided is best). However the way I think it's best to make it
> as simple as practical for a user.

I think that for the most part, we do have the right tools to get the
job done already.

One thing that appears to have been overlooked in some of these
plugins, is that one can stage dependency graphs multiple times into
multiple locations in the sandbox.

> > Moreover, at least for all the 'packaging' plugins, we would just
> > not need to do anything outside the sandbox anymore.
> 
> OK, how do you see this working in the example of the bazelize
> plugin? Do you mean that this plugin would be a source kind instead?

Bazelize might fall into the more uncharted territory.

In terms of 'packaging' plugins, like tar/oci/docker, you really only
need to have tools like `tar` around in the sandbox to use in order to
compose output from a payload staged into a separate subdirectory.

For bazelize we might need to add something more to expose BuildStream
level metadata into the sandbox so that sandboxed tools can process
them, without host code composing/formatting any metadata manually - I
don't know the details of bazelize well enough but I'm aware we have a
shortcoming in this regard and bazelize probably falls into this
category.

Cheers,
    -Tristan

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Darius Makovsky <da...@codethink.co.uk>.

Hi Ben,

Apologies for the late response.

On 26/06/2020 11:01, Benjamin Schubert wrote:
> 
> I think the desire for plugins doing such things can be obtained with different constructs, which we might need to implement and promote, and that "being able to write in the sandbox directory" without running in the sandbox is not a hard requirement for any plugin, if we have a way of doing it differently.
> 
> Will it mean that plugin authors might not be able to write their plugin the easiest way? Possible, however, that's were as a community we need good examples and documentation on how to do "those special things".
> 

I agree. I just wanted to emphasize that I don't think it would be a case of simply forbidding certain things from happening (in whatever way is decided is best). However the way I think it's best to make it as simple as practical for a user.

> Moreover, at least for all the 'packaging' plugins, we would just not need to do anything outside the sandbox anymore.

OK, how do you see this working in the example of the bazelize plugin? Do you mean that this plugin would be a source kind instead?
> Hope my answers better clarify my vision,
> Ben
> 
> 


-- 
Best Regards,
Darius


For Codethink's privacy-policy please see https://www.codethink.co.uk/privacy.html

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Benjamin Schubert <co...@benschubert.me>.

Hey Darius,

Responses inline

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 26 June 2020 10:28, Darius Makovsky <da...@codethink.co.uk> wrote:
>
> > The aim is to more specifically remove the `Directory.open_files()` call for
> > writing files in the directory. Is that accurate? Or is there more things that
> > would disappear?
>
> It seems like there's been quite a bit of churn on api and that's resulted in porting work for plugins and projects needing them. Is it fair to say that some of this is from uncertainty/disagreement about api design? Are there any process changes that can be made to better formalize and enforce acceptable api attributes?
>

My point of view on this is:

- Much of the API has to change in order to make RE a first class citizen in BuildStream
- We had decided when starting BuildStream 2.0 that we would be allowed to break the API in order to move BuildStream in a direction where it would be more performant and better integrated with RE.

And the current process requires agreement on the ML, which was not necessarily the case before. I believe that if we stay diligent on that one, we will end up not having too many of them, and only the necessary ones.

> > I do agree that we should do our best to prevent plugin authors from shooting
> > themselves in the foot and to help them achieve the most repeatable build
> > possible. However I don't think there is a difference between trusting Python
> > for us, and trusting python for the plugins.
>
> That is true but it's also about how and in what stage the artifacts can change right? This is related to the question above about formal requirements for api. Ultimately, there's clearly desire for plugins that do these sorts of things and so developers will find a way to do it regardless which can lead to project fragmentation or rejection. I think it's better to find a practical compromise.
>

I think the desire for plugins doing such things can be obtained with different constructs, which we might need to implement and promote, and that "being able to write in the sandbox directory" without running in the sandbox is not a hard requirement for any plugin, if we have a way of doing it differently.

Will it mean that plugin authors might not be able to write their plugin the easiest way? Possible, however, that's were as a community we need good examples and documentation on how to do "those special things".

> > The second part would then be, for plugins that are 'packaging', like docker or
> > oci, to rewrite the element to work on the `artifact` source instead. And for
> > other elements, like `bazelize`, to actually be a `SourceTransform` based on
> > the previous elements.
>
> But does that resolve the original state problem or just defer it to another element?
>

Sources have a better tracking of keys and are expected to run outside the sandbox, it is thus easier to handle such things there.
Moreover, at least for all the 'packaging' plugins, we would just not need to do anything outside the sandbox anymore.

>
> -----------------------------------------------------------------------------------------
>
> Best Regards,
> Darius

Hope my answers better clarify my vision,
Ben

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by William Salmon <wi...@codethink.co.uk>.

On 26/06/2020 12:18, Tristan Van Berkom wrote:
[...]
> 
> Summary
> =======
> I think that for the summary of plugins which currently exploit the
> directory API to write data to the sandbox, things fall into a few
> categories:
> 
>    A) Stuff that we can totally do in the sandbox, but people didnt
>       do because they were lazy or wanted something a bit more
>       convenient to work with than BuildStream.
> 
>       E.g. oci, docker, tar - these are all expected to be implemented
>       by staging the required tooling into "/" and processing in the
>       sandbox.

When I wrote a first go at a genimage plugin I found that while running 
genimage in side the sandbox was pretty easy, having some basic things 
in element.bst' yaml as switches or option and then the plugin translate 
that in to a genimage config was a great way to keep the load of 
learning genimage off the user.

The script elements that valentine created drawing inspiration from my 
plugin in FD-SDK and GBM are far harder to maintain, share or create and 
i would like to supper seed them at some point with a revamped genimage 
plugin. Various people have had issue junctioning FD and copying script 
elements and having incompatibility issues when updating FD and thus 
genimage but not there scripts.

Its interesting to note that the plugin that i wrote did no use the api 
you are removing but a variation that you have expressed your dis 
approval with, and i think the api you are proposing to remove would 
have been a better bet and would have worked better than what i came up 
with.

There for I think that the comments that you have made for `C` also 
apply here.

> 
>    B) Stuff that cannot be implemented at all in BuildStream.
> 
>       E.g. collect_manifest, this is just a plain hack which cannot be
>       supported at all without designing some way for BuildStream to
>       deliver the data which `collect_manifest` desires.

I actually think a plugin is a really neat way to do this class of 
problem, as bst checks if it needs updating for you to some degree. If 
the cache key is the same bst knows not to redo the ordit. But i realize 
a external thing could implement yet another caching mechanism based on 
our cache keys.

That said allowing bst to be imported and using bst as a pyton lib to 
inspect a project would also make it easier to do such tasks. Did some 
one mention this recently?

Collect_manifest and the other auditing work is a important part of 
FD-SDK so saying we are going to write off this class of issue for bst2 
seems rather sweeping to me. Unless we can demonstrate that `bst show 
--thing` can give the required information. It seems important to me to 
make sure that the required information is rendered in a way that is not 
too onerous for a down stream thing to use, is there a `--json` option 
for bst show?.

> 
>    C) Possibly metadata for package managers like rpm or dpkg.
> 
>       In this case we might be able to create some helper facility to
>       safely create a file based on a %{variable}, in order to avoid
>       having to use a shell command like `cat > file <<EOF ...` and
>       the escaping which this entails.

I think this may be quite a tricky piece of discussion to get the 
flexibility required for the long list of plugins that do this sort of 
thing while still addressing your concerns although i think it will be 
well worth it, and I would imaging the implementation not being too 
difficult.

> 
> 
> Cheers,
>      -Tristan
> 
> 
>

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

Thanks for chiming in Ben,

Some replies inline.

On Fri, 2020-06-26 at 08:52 +0000, Benjamin Schubert wrote:
> Hey all,
> 
> This is a lot of mails to respond inline, so I will rather answer as a single
> email. Please excuse the length of it.
> 
> TLDR
> ====
> 
> I think we should be closing the API to prevent writing in the sandbox, but I
> think we need other construct beforehand, and to rethink how impacted plugins
> actually should work in BuildStream.
> A potential way forward I can see is having a 'artifact' source kind, more
> in details in the mail.
> 

I'm not fond of this idea at all, or the implications it might lead to.

First of all, treating artifacts as Sources might be backdoored as an
excuse to modify otherwise safely constructed output in artifacts,
using host python tooling.

Sources have a heavy responsibility to stage *exactly bit for bit* what
is represented by the ref, the same ref for a given source should never
ever result in anything different being staged.

So when we talk about sources like `tar`, we are extracting content and
the output is guaranteed to be the same, even if the tar files were
constructed non-deterministically at their origins.

> 
> Scope
> =====
> 
> First of all, in order to ensure I correctly understand the premises of this
> thread:
> 
> The aim is to more specifically remove the `Directory.open_files()` call for
> writing files in the directory. Is that accurate? Or is there more things that
> would disappear?
> 
> 
> What would it affect?
> =====================
> 
> A quick look around the plugins shows that the following plugins would be
> affected by such a change:
> 
> bst_plugins_experimental/elements/bazelize.py
> bst_plugins_experimental/elements/collect_integration.py
> bst_plugins_experimental/elements/collect_manifest.py
> bst_plugins_experimental/elements/dpkg_build.py
> bst_plugins_experimental/elements/dpkg_deploy.py
> bst_plugins_experimental/elements/flatpak_image.py
> bst_plugins_experimental/elements/oci.py
> bst_plugins_experimental/elements/tar_element.py
> bst_plugins_containers/elements/docker_container.py
> 
> So this seems to be more than "a few" plugins and most of those seem to have
> a thing in common: their dependencies are actually part of their output,
> in a transformed way. I'll come back to that later.

For the sake of concentrating more on the actual problems, let's cross
at least one of these off the list:

collect_manifest cannot be done
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The `collect_manifest` cannot possibly be implemented under current
BuildStream API, *at all*, there is absolutely no valid way of
supporting this plugin, so it's best to not even discuss this.

This is explained at:

    https://gitlab.com/BuildStream/bst-plugins-experimental/-/issues/2

I still feel this kind of manifest construction is better suited to a
script which calls `bst artifact show --format "..."` for various
artifacts, extracting and parsing relevant data to construct a
manifest.

Whatever the approach could possibly be for this brand of plugin, it
cannot possibly be implemented at all without first augmenting the
Source API contract in some way to allow Source objects to report
something standardized about their `url` and `ref`.

> What should we trust in the sandbox and plugins?
> ================================================
> 
> It is mentionned in this thread that we do not trust python's APIs to run
> repeatably for plugins. However we do rely on it for BuildStream itself, and
> there is little we can do around that area of trust.
> 
> I do agree that we should do our best to prevent plugin authors from shooting
> themselves in the foot and to help them achieve the most repeatable build
> possible. However I don't think there is a difference between trusting Python
> for us, and trusting python for the plugins.

I'm not sure I understand the difference of "for us and for plugins".

From my interpretation, I think you mean:

   * In the core we have python code which writes to the sandbox, for
     instance Element.stage().

   * Would we trust a plugin with this power ?

Of course not, over our dead bodies.

Every bit of trust that we relinquish to a Plugin is forcefully pried
from our cold dead fingers. Ideally, we do not trust plugins at all for
anything: Plugins only exist to chose how the core is to operate on the
given inputs, absolutely nothing more.

Why ?

Because in this way, we can provide real guarantees, if things are
buggy, we can fix them in the core.

Plugins are supposed to have little to no power to get anything wrong,
they can be whipped up on the fly for a given use case, and live in
someones repository for years, miles away from our reach and ability to
fix.

Asides from the above, I should also stress, we do not trust
*ourselves* to write data into artifacts in the core either. Giving
plugins opportunity to compose output in the sandbox (by any means) is
giving plugins a level of trust that we do not give ourselves.

Every manipulation of the sandbox we do in the core, is to preserve the
data as it was created by the sandbox, and ensure the determinism of
everything (e.g. setting deterministic mtimes, ensuring umask
consistency when staging sources, etc).

> What way forward?
> =================
> 
> I agree that having plugins being able to write any files in a sandbox is not
> the best and prevents us (BuildStream core) from making any assumption.
> 
> There seems to be a need for not executing code in the sandbox but rather
> outside of it though. Based on the plugins above, I will try to go deeper and
> explain what I see as common problems they have and alternative ways to allow
> them to continue doing their work.
> 
> All the plugins above have a common point: they treat their dependencies as
> their input, as other elements would treat their sources. As a brief summary,
> here is the list of operations they are doing:

All element treat their dependencies as their input.

If elements want to process other dependencies independently, that's
exactly why they are allowed to call Element.stage_artifact(), and use
Element.search() to obtain specific elements to stage in the sandboxes
so that they can be independently processed.

I don't see why we would need any additional API for any of this,
although Element.search() needs to be fixed for cross-junction lookups,
that's already on my immediate todo list.

> - Archiving / checksuming files and writing that in the sandbox
> - Writing a manifest of some kind
> 
> Having them run script inside the sandbox means they would have a harder time
> knowing what is their input and what is there to allow doing their job, and we
> currently don't have any construct in BuildStream for such things.

Again, with Element.search() and Element.stage_artifact() should be
sufficient to stage elements at locations if they need to be
independently processed, using other tooling staged at "/" to process
this and run scripts.

It can be that we need a core facility to expose artifact public data
deterministically in some way(s) for the sake of processing with
scripts inside a sandbox.

> This seems to show to me that there is a need for an `artifact` source kind,
> which would be the output of a previous element. I will start another
> thread to keep things separate as to how this would work, but I believe this
> would solve the first part of our problem, that is, be able to separate build
> dependencies from "sources".
> 
> The second part would then be, for plugins that are 'packaging', like docker or
> oci, to rewrite the element to work on the `artifact` source instead. And for
> other elements, like `bazelize`, to actually be a `SourceTransform` based on
> the previous elements.
> 
> A brief example of how I could see that working (More in the next ML thread
> I'll start after this, let's not discuss the details here too much unless
> relevent)
> 
> For an element, we could have as sources:
> 
> ```
> kind: oci_image
> 
> sources:
> - kind: artifact
>   elements: my_stack_for_which_to_create_a_oci_image
> - kind: manifesto  # Write a manifesto in the sources based on previous sources
> 
> build-depends:
> - tar (or buildah)
> ```
> 
> I believe that such a new construct would allow BuildStream plugins to achieve
> all they achieve today without writing to the sandbox directly for elements.
> 
> I also think that we should potentially try this first before closing the API
> alltogether on a few plugins.

Coming back to the beginning of my reply, I feel very strongly against
this if I'm interpreting this correctly. This implies host side
processing and manipulation of data which will make it's way to an
artifact, which is equally as incorrect as writing directly into the
sandbox.

Importing sources is the weakest link, it needs to remain as brief as
possible, and do the best job of simply ensuring bit-for-bit equal
staging for any two times it bares the same ref (it should definitely
not be doing the stuff that is going on in `oci` and `docker` plugins
currently, which I consider rather reckless for BuildStream).

There is no reason I can see why oci and docker plugins cannot be
implemented safely using the sandbox, it's very simple; they simply
need to stage tooling to operate on the data, just like we stage
autotools (except possibly in a different sysroot).

Summary
=======
I think that for the summary of plugins which currently exploit the
directory API to write data to the sandbox, things fall into a few
categories:

  A) Stuff that we can totally do in the sandbox, but people didnt
     do because they were lazy or wanted something a bit more
     convenient to work with than BuildStream.

     E.g. oci, docker, tar - these are all expected to be implemented
     by staging the required tooling into "/" and processing in the
     sandbox.

  B) Stuff that cannot be implemented at all in BuildStream.

     E.g. collect_manifest, this is just a plain hack which cannot be
     supported at all without designing some way for BuildStream to
     deliver the data which `collect_manifest` desires.

  C) Possibly metadata for package managers like rpm or dpkg.

     In this case we might be able to create some helper facility to
     safely create a file based on a %{variable}, in order to avoid
     having to use a shell command like `cat > file <<EOF ...` and
     the escaping which this entails.

Cheers,
    -Tristan

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Darius Makovsky <da...@codethink.co.uk>.

On 26/06/2020 09:52, Benjamin Schubert wrote:
> 
> I think we should be closing the API to prevent writing in the sandbox, but I
> think we need other construct beforehand, and to rethink how impacted plugins
> actually should work in BuildStream.

+1

> 
> The aim is to more specifically remove the `Directory.open_files()` call for
> writing files in the directory. Is that accurate? Or is there more things that
> would disappear?

It seems like there's been quite a bit of churn on api and that's resulted in porting work for plugins and projects needing them. Is it fair to say that some of this is from uncertainty/disagreement about api design? Are there any process changes that can be made to better formalize and enforce acceptable api attributes?

> 
> I do agree that we should do our best to prevent plugin authors from shooting
> themselves in the foot and to help them achieve the most repeatable build
> possible. However I don't think there is a difference between trusting Python
> for us, and trusting python for the plugins.

That is true but it's also about how and in what stage the artifacts can change right? This is related to the question above about formal requirements for api. Ultimately, there's clearly desire for plugins that do these sorts of things and so developers will find a way to do it regardless which can lead to project fragmentation or rejection. I think it's better to find a practical compromise.

> 
> The second part would then be, for plugins that are 'packaging', like docker or
> oci, to rewrite the element to work on the `artifact` source instead. And for
> other elements, like `bazelize`, to actually be a `SourceTransform` based on
> the previous elements.

But does that resolve the original state problem or just defer it to another element?

-- 
Best Regards,
Darius

For Codethink's privacy-policy please see https://www.codethink.co.uk/privacy.html

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

Hi Gökçen,

On Fri, 2020-06-26 at 11:18 +0100, Gökçen Nurlu via buildstream-list wrote:
> Hi all,
[...]
> > All the plugins above have a common point: they treat their dependencies as
> > their input, as other elements would treat their sources. As a brief summary,
> > here is the list of operations they are doing:
> > 
> > - Archiving / checksuming files and writing that in the sandbox
> > - Writing a manifest of some kind
> 
> Thanks for the investigation, +1. This part is actually my motivation
> to chime in. Correct me if I'm wrong, all the plugins above except
> dpkg_* ones at least, they are specifically designed to use their
> "artifact" outside of the buildstream universe. dpkg_* ones might be,
> too. I was just too lazy to read the code in detail..
> 
> Here is my 2cs: what about enabling this Directory API
> (write/read/etc) only for the kinds of Elements that mark themselves
> as "leaf-level" and we forbid them to be used in a different bst file?
> Sorry if this has been discussed and rejected before.
> 
> At least for the current plugins, they could just add a
> `LEAF_LEVEL=True` as a class attr, similar to other flags we have.
> For other elements that are actually in the middle of the build plan
> and don't have this flag, we can stop exposing
> `sandbox.get_virtual_directory()` at all, somehow. I find this easier
> to grasp and explain to plugin devs, there'd be less places to modify
> in bst codebase and minimal changes for (element) plugin codes.
> Citation is needed for the last two though, it's been some time since
> I checked the codebase :)

This is close to something which came up many times in the early months
of BuildStream development, essentially what you are calling "leaf"
elements is something we were calling "deployment" elements.

We drew a hard line on this and I maintain it was the right decision.

Basically, there is a need to wrap up bootable images and suchlike,
where you might want to encode a custom hostname or give it some custom
certificates and such.

This should all be done post-checkout, because anything that is a
BuildStream artifact, has a cache key, and should be reproducible.

If you need variance and you want host tools, there is no value to be
doing it in BuildStream anyway, and there is always the expectation
that some post processing may be needed on content which extracted by
`bst artifact checkout` (there are plenty of valid use cases for this).

Cheers,
    -Tristan

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Gökçen Nurlu <go...@gmail.com>.

Hi all,

Really fruitful conversations, thanks! I'm also excited to send my
first mail to Apache ML :)

Benjamin Schubert <co...@benschubert.me>, 26 Haz 2020 Cum, 09:52
tarihinde şunu yazdı:
> What would it affect?
> =====================
>
> A quick look around the plugins shows that the following plugins would be
> affected by such a change:
>
> bst_plugins_experimental/elements/bazelize.py
> bst_plugins_experimental/elements/collect_integration.py
> bst_plugins_experimental/elements/collect_manifest.py
> bst_plugins_experimental/elements/dpkg_build.py
> bst_plugins_experimental/elements/dpkg_deploy.py
> bst_plugins_experimental/elements/flatpak_image.py
> bst_plugins_experimental/elements/oci.py
> bst_plugins_experimental/elements/tar_element.py
> bst_plugins_containers/elements/docker_container.py
>
> So this seems to be more than "a few" plugins and most of those seem to have
> a thing in common: their dependencies are actually part of their output,
> in a transformed way.

+1, I share the same observations

> All the plugins above have a common point: they treat their dependencies as
> their input, as other elements would treat their sources. As a brief summary,
> here is the list of operations they are doing:
>
> - Archiving / checksuming files and writing that in the sandbox
> - Writing a manifest of some kind

Thanks for the investigation, +1. This part is actually my motivation
to chime in. Correct me if I'm wrong, all the plugins above except
dpkg_* ones at least, they are specifically designed to use their
"artifact" outside of the buildstream universe. dpkg_* ones might be,
too. I was just too lazy to read the code in detail..

Here is my 2cs: what about enabling this Directory API
(write/read/etc) only for the kinds of Elements that mark themselves
as "leaf-level" and we forbid them to be used in a different bst file?
Sorry if this has been discussed and rejected before.

At least for the current plugins, they could just add a
`LEAF_LEVEL=True` as a class attr, similar to other flags we have.
For other elements that are actually in the middle of the build plan
and don't have this flag, we can stop exposing
`sandbox.get_virtual_directory()` at all, somehow. I find this easier
to grasp and explain to plugin devs, there'd be less places to modify
in bst codebase and minimal changes for (element) plugin codes.
Citation is needed for the last two though, it's been some time since
I checked the codebase :)

Best,

Gokcen

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Benjamin Schubert <co...@benschubert.me>.

Hey all,

This is a lot of mails to respond inline, so I will rather answer as a single
email. Please excuse the length of it.

TLDR
====

I think we should be closing the API to prevent writing in the sandbox, but I
think we need other construct beforehand, and to rethink how impacted plugins
actually should work in BuildStream.
A potential way forward I can see is having a 'artifact' source kind, more
in details in the mail.


Scope
=====

First of all, in order to ensure I correctly understand the premises of this
thread:

The aim is to more specifically remove the `Directory.open_files()` call for
writing files in the directory. Is that accurate? Or is there more things that
would disappear?


What would it affect?
=====================

A quick look around the plugins shows that the following plugins would be
affected by such a change:

bst_plugins_experimental/elements/bazelize.py
bst_plugins_experimental/elements/collect_integration.py
bst_plugins_experimental/elements/collect_manifest.py
bst_plugins_experimental/elements/dpkg_build.py
bst_plugins_experimental/elements/dpkg_deploy.py
bst_plugins_experimental/elements/flatpak_image.py
bst_plugins_experimental/elements/oci.py
bst_plugins_experimental/elements/tar_element.py
bst_plugins_containers/elements/docker_container.py

So this seems to be more than "a few" plugins and most of those seem to have
a thing in common: their dependencies are actually part of their output,
in a transformed way. I'll come back to that later.


What should we trust in the sandbox and plugins?
================================================

It is mentionned in this thread that we do not trust python's APIs to run
repeatably for plugins. However we do rely on it for BuildStream itself, and
there is little we can do around that area of trust.

I do agree that we should do our best to prevent plugin authors from shooting
themselves in the foot and to help them achieve the most repeatable build
possible. However I don't think there is a difference between trusting Python
for us, and trusting python for the plugins.


What way forward?
=================

I agree that having plugins being able to write any files in a sandbox is not
the best and prevents us (BuildStream core) from making any assumption.

There seems to be a need for not executing code in the sandbox but rather
outside of it though. Based on the plugins above, I will try to go deeper and
explain what I see as common problems they have and alternative ways to allow
them to continue doing their work.

All the plugins above have a common point: they treat their dependencies as
their input, as other elements would treat their sources. As a brief summary,
here is the list of operations they are doing:

- Archiving / checksuming files and writing that in the sandbox
- Writing a manifest of some kind

Having them run script inside the sandbox means they would have a harder time
knowing what is their input and what is there to allow doing their job, and we
currently don't have any construct in BuildStream for such things.

This seems to show to me that there is a need for an `artifact` source kind,
which would be the output of a previous element. I will start another
thread to keep things separate as to how this would work, but I believe this
would solve the first part of our problem, that is, be able to separate build
dependencies from "sources".

The second part would then be, for plugins that are 'packaging', like docker or
oci, to rewrite the element to work on the `artifact` source instead. And for
other elements, like `bazelize`, to actually be a `SourceTransform` based on
the previous elements.

A brief example of how I could see that working (More in the next ML thread
I'll start after this, let's not discuss the details here too much unless
relevent)

For an element, we could have as sources:

```
kind: oci_image

sources:
- kind: artifact
  elements: my_stack_for_which_to_create_a_oci_image
- kind: manifesto  # Write a manifesto in the sources based on previous sources

build-depends:
- tar (or buildah)
```

I believe that such a new construct would allow BuildStream plugins to achieve
all they achieve today without writing to the sandbox directly for elements.

I also think that we should potentially try this first before closing the API
alltogether on a few plugins.


I hope that helps and might be a way forwards.
Cheers,

Ben

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

Hi William,

On Wed, 2020-06-24 at 13:52 +0100, William Salmon wrote:
[...]
> > If this is such an interesting feature to have, I don't see much reason
> > why we could not implement such a feature in BuildStream, even using
> > Python plugins. This could be an Element API which takes a Sandbox, a
> > "%{variable}" name and an absolute path, which could stage a resolved
> > variable as the content of a file in the sandbox safely.
> > 
> > It would be interesting to see a proposal for such a feature, probably
> > I would argue that the permissions used to stage such a file be very
> > limited, or unspecified (something matching the hard coded permissions
> > used to stage files from Sources into the sandbox).
> > 
> 
> I don't strongly disagree with the jist of this. Things like `with 
> conditionally resolved variables` seems like it might hint at the 
> required flexibility.
> 
> > That said, a controlled feature like this would be an extremely far cry
> > from allowing python code to simply write whatever they want into the
> > sandbox, and would not allow for the non-deterministic things which are
> > currently being done by existing plugins which exploit this currently
> > existing weakness.
> 
> In fact I have said in this thread:
> """I am not saying that this API should not be improved or that there is 
> no room to make it better"""
> 
> I would think that we can come up with something that would be enough 
> for many of the EXISTING AND IN-USE plugins that have been created and 
> used in good faith that need to put things in to the sandbox. As well as 
> supporting the plugin's that should exist like a genimage plugin.
> 
> I look forward to getting the details sorted so we can be sure they are 
> useful and reproducible.
> 
> I don't think its a good idea to remove the old API without adding the 
> new API as this would brake and block many existing projects who are 
> trying to track bst-master and are providing feed back and many bug 
> reports and some MR's.

Right, the thing is, I suspect that plugins which are exploiting the
ability to write directly to the sandbox, which could be implemented
with the above kind of suggested API, are the minority (but I've only
so far looked at collect_manifest and oci, which are both rather
hopeless in this regard and need a complete rethink).

From an upstream perspective, I think it's important to close the
floodgates ASAP, and consider any (controlled/limited) API for writing
out variables to files separately.

Further, it would have to be *ensured* that these contents cannot be
modified with python, that's why a variable name would be just about
the only thing I could imagine would be workable.

In any case, I think it's important to untangle these two separate
topics, what we have been discussing and what the main focus of this
topic is, is the ability to write to the sandbox, and the implications
of this, which needs eliminating - what plugin authors do about it for
the various plugins is a separate topic, and may involve adding
appropriate helper features to BuildStream.

Cheers,
    -Tristan

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by William Salmon <wi...@codethink.co.uk>.

On 24/06/2020 09:19, Tristan Van Berkom wrote:
> Hi Will,
> 
[...]
> 
> With all of that out of the way, I'd like to separately reply to this:
> 
>    "A repeatable plugin should not have a issue with writing to the
>     sandbox. So long as what is written is completely determined by the
>     yaml in the project (element.bst + project.conf + dep.bst) and is
>     thus repeatable and deterministic."
> 
> So, to reuse a term which I came up with in the above text,
> "dynamic controlled content" for lack of a better term, could perhaps
> simply be a resolved %{variable} (which could contain a lot of text and
> act like a template, with conditionally resolved variables substituted
> within).
> 
> If this is such an interesting feature to have, I don't see much reason
> why we could not implement such a feature in BuildStream, even using
> Python plugins. This could be an Element API which takes a Sandbox, a
> "%{variable}" name and an absolute path, which could stage a resolved
> variable as the content of a file in the sandbox safely.
> 
> It would be interesting to see a proposal for such a feature, probably
> I would argue that the permissions used to stage such a file be very
> limited, or unspecified (something matching the hard coded permissions
> used to stage files from Sources into the sandbox).
> 

I don't strongly disagree with the jist of this. Things like `with 
conditionally resolved variables` seems like it might hint at the 
required flexibility.

> That said, a controlled feature like this would be an extremely far cry
> from allowing python code to simply write whatever they want into the
> sandbox, and would not allow for the non-deterministic things which are
> currently being done by existing plugins which exploit this currently
> existing weakness.

In fact I have said in this thread:
"""I am not saying that this API should not be improved or that there is 
no room to make it better"""

I would think that we can come up with something that would be enough 
for many of the EXISTING AND IN-USE plugins that have been created and 
used in good faith that need to put things in to the sandbox. As well as 
supporting the plugin's that should exist like a genimage plugin.

I look forward to getting the details sorted so we can be sure they are 
useful and reproducible.

I don't think its a good idea to remove the old API without adding the 
new API as this would brake and block many existing projects who are 
trying to track bst-master and are providing feed back and many bug 
reports and some MR's.

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

Hi Will,

   So I looked at the OCI plugin for the first time today fwiw[0], I
think that we can all agree that one can not possibly have any
expectation for this code to really be reproducible. This is using
various host side constructs directly to create artifact output,
including delegation of complex operations to `tar` and `gzip` on the
host.

Again, this is only an example, even if it were to be proven that every
possible implementation of `tar` and `gzip` used by the given python
libraries in use were to produce deterministic results; the policy that
is at the heart of the BuildStream design is to assume that tooling is
not deterministic, and that in order to produce deterministic build
results, one must at least consider the tooling as a part of the input
(i.e. guarantee that the exact binaries and configuration of the
tooling in use to produce output, is considered in that output's cache
key).

The fact that some plugins are breaking this policy by composing output
and placing it in the sandbox manually, using host tooling, is breaking
this policy, a policy which is the foundation on which we have built
and earned the trust of our user base.


I will now reply to your latest comments.

  "If plugins are not repeatable because they are write in python then 
   surely we should not have our plugins written in python? Plugins
   affect the repeatability in may ways and if they them selves are not
   repeatable then surely this is a different issue and needs
   addressing urgently!

   Once our plugins are repeatable then I don't see a issue with them 
   adding to the sandbox through a strict API. So long as for a given
   input they create the same cache key and they produce the same
   repeatable output."

While the gist of what you say here resonates strongly, the overall
statement is misguided in a variety of ways. I will patiently try to
enumerate these as clearly as I can.


  Core data flow design and trust model
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Setting aside for a moment the fact that BuildStream offers a plugin
  API at all, or how the overall implementation works, let's look at a
  few of the basic core principle in play:

    * We take it as a plain fact that host tools are a variable, and as
      such we never trust these tools to produce output.

    * All data processing is performed within an isolated sandbox
      environment where we can guarantee the tooling is a constant.

    * In order to obtain external data, such as source code and initial
      base runtime libraries, we have no choice but to relinquish 
      control to host tooling.

      Even if we were to sandbox the tooling used to obtain external
      data for the build sandbox, we need to have networking and
      relinquish some measure of trust to external services.

      This is the weak link, and as such we perform as much validation
      as we can to ensure correct base inputs (Sources as we now know
      them in BuildStream).


  Plugin determinism and trustability
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Determinism vs trustability for plugins is not a zero sum game,
  inasmuch as having non-deterministic plugins running code on the host
  does not detract from their trustability per se.

  For instance, the fact that BuildStream might build one element
  before another depending on system load, or if one log message
  appears before another non-deterministically, does not have any
  effect on how artifacts get constructed within the sandbox.

  A lot of this depends on what we trust plugins *for*, i.e. what kind
  of workloads we delegate to them.

  For BuildStream Elements, a simple summary of what tasks we delegate
  is:

    * Call BuildStream APIs to stage Sources and Artifacts to locations
      of the Element's choosing

    * Parse configuration

    * Run commands within the Sandbox

    * Generate a unique key describing it's configuration, which must
      capture everything about how it will behave, for every way it
      can behave differently.

      NOTE: We delegate this responsibility to plugins _only_ because
            we are not using a declarative plugin description language.
	    With a turing complete language like Python, it is not in
	    our power to derive such a unique key without actually
            running the code. Ideally this would not be so (more in the
            next section).

  To summarize this point, Plugins not being deterministic in the way
  they execute code, does not in itself mean that the outputs of the
  pipeline are non-deterministic.


  Why use Python for plugins ?
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  As I've outlined in the previous point: Not because it cannot be
  trusted to generate deterministic output on it's own.

  However, I'm happy you raised this because it is an interesting
  point.

  Ideally, we should be using a completely declarative description of
  how a plugin works, rather than a turing complete programming
  language which implements things in terms of running procedural code.

  If we had a declarative plugin description language instead:

    * We would have 100% control of the implementations of everything a
      plugin does, the chances of a plugin being buggy in such a
      scenario are slim to none (either the declaration of the plugin
      is loaded successfully, or an error message is issued at load
      time).

    * We could indeed allow for constructs where for instance, static
      content, or dynamic controlled content (like a %{variable}) could
      be placed into the sandbox as a given file.

    * Our capacity for long term support would be greatly increased
      (in the current climate, we may be forced to distribute a python3
      interpretor in the future, in order to support exactly the same
      plugins written today, in 10, 20 or 30 years from now).

    * As stated in the previous section, we would not have to trust
      plugins to implement `Plugin.get_unique_key()`, since parsing the
      plugin declaration would give the core all the knowledge it would
      need to compose such a key.

  Of course, this would still preclude turing complete code using host
  libraries and dynamically generating content to be placed in the
  sandbox (like the collect_manifest and oci plugins do).

  For a straight answer to the question "Why Python" in the first
  place, this was basically a matter of cost, the design and
  implementation would have taken a lot more time, we chose Python for
  it's ease of use and ease of integration in order to meet immediate
  targets (something that works well enough in the first 6 months of
  development).

  If there is ever a BuildStream 3, it's main driver might be to switch
  the plugin implementations into something declarative which is
  entirely under BuildStream core control.


With all of that out of the way, I'd like to separately reply to this:

  "A repeatable plugin should not have a issue with writing to the 
   sandbox. So long as what is written is completely determined by the
   yaml in the project (element.bst + project.conf + dep.bst) and is
   thus repeatable and deterministic."

So, to reuse a term which I came up with in the above text,
"dynamic controlled content" for lack of a better term, could perhaps
simply be a resolved %{variable} (which could contain a lot of text and
act like a template, with conditionally resolved variables substituted
within).

If this is such an interesting feature to have, I don't see much reason
why we could not implement such a feature in BuildStream, even using
Python plugins. This could be an Element API which takes a Sandbox, a
"%{variable}" name and an absolute path, which could stage a resolved
variable as the content of a file in the sandbox safely.

It would be interesting to see a proposal for such a feature, probably
I would argue that the permissions used to stage such a file be very
limited, or unspecified (something matching the hard coded permissions
used to stage files from Sources into the sandbox).

That said, a controlled feature like this would be an extremely far cry
from allowing python code to simply write whatever they want into the
sandbox, and would not allow for the non-deterministic things which are
currently being done by existing plugins which exploit this currently
existing weakness.

This took me a while to write today, I hope this has achieved it's goal
in being as informative as I hoped it would be.

Best Regards,
    -Tristan

[0]: https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/oci.py

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by William Salmon <wi...@codethink.co.uk>.


On 23/06/2020 14:04, Tristan Van Berkom wrote:
> One more round...
> 
> On Tue, 2020-06-23 at 11:12 +0100, William Salmon wrote:
>>
> [...]
>>> Of course you need to stage the tooling which is used to create output,
>>> that is because BuildStream intentionally enforces that you always use
>>> deterministic input, and the exact build/copy of the binaries used to
>>> create output, is a part of the input.
>>>
>>>>    > The fact that generating data in python is problematic is not a
>>>> reason to avoid fixing illegal writing to the sandbox (which probably
>>>> mostly happens *due* to the latter problem), its a reason to ensure that
>>>> we *also* ensure that doesn’t happen.
>>>>    >
>>>>
>>>> While the addition may have been a error in your eyes, If writing to the
>>>> sandbox was not public API I would have be arguing that it should be.
>>>> For the reasons above.
>>>
>>> It was most definitely an error.
>>>
>>> The plugins need to have a way to stage files in locations of their
>>> choosing, before the vdir abstraction was in place, the only way to do
>>> this was by providing a directory argument.
>>>
>>> This was abused, and has now lead to generation of artifact data which
>>> is generated non-deterministically, without controlling the inputs of
>>> this output, which is at the heart of BuildStream's promise.
>>
>> I feel you are conflating two things, are plugins `non-deterministic`
>> and should we let plugins alter whats in the sandbox. The answer to
>> should plugins be deterministic should clearly be yes and the answer to
>> should plugins effect what happens in the sandbox seem clearly yes. If
>> plugins are non deterministic then we need to fix that, plugins effect
>> the build in a number of ways and if we cant trust plugins then we cant
>> trust anything about bst as every element is build with a plugin.
>>
>> Once plugins are trust able, and they must be or the hole concept of
>> cache keys falls apart. Then your hole argument for why we cant put
>> things in the sandbox falls apart, given that we must fix plugins so
>> they are trust able then I fail to see a issue with plugins putting
>> things in to the sandbox.
> 
> There are two things you appear to be conflating, which is stability
> and correctness of cache keys, and reproducibility of build artifacts.
> 
> 
> This whole discussion is about reproducibility, not about cache key
> composition (although it was raised as an orthogonal concern, it is not
> centric to why we don't use plugins to create output).
> 
> For reproducibility, the plugins cannot reasonably be trusted to
> compose data reproducibly, the premise of creating reproducible output
> in artifacts is to use deterministic inputs: The host version of
> installed python, plus the versions of any python libraries which are
> running, are not deterministic.
> 
> BuildStream ensures as a promise, that output is based on deterministic
> inputs, do not conflate this with the calculation of cache keys which
> BuildStream uses to identify the inputs.
> 
> I've already demonstrated in my first reply to you an example of how
> the output of collect_manifest is already non-reproducible, precisely
> *because* the host version of python will have an effect on it's
> output, I emphasized that this was an *example* (not an invitation to
> haggle over whether 'dict' can be trusted in the future, there is no
> guarantee that it can), that said, it was a single example to
> demonstrate that host python *matters* when constructing output.
> 
> Keep the following things in mind:
> 
>    * An important set of BuildStream users will be striving to produce
>      reproducible output.
> 
>      If they achieve reproducibility today, they should be able to take
>      the exact same project state, and use the latest version of
>      BuildStream 2, in 10 years from now... and they should be able
>      to produce the exact same output, bit-for-bit.
> 
>    * While BuildStream all by itself cannot guarantee reproducible
>      output, our role is to guarantee that even after upgrading your
>      host BuildStream installation in 10 years from now, BuildStream
>      will repeat the build in *exactly the same way*.
> 
> The goal of deterministic building is entirely centric to the
> BuildStream mission, and cannot be derailed by a desire to have things
> just a little bit more convenient.

I'm sorry if I have, but I have at no point meant to have said that bst 
should not be deterministic or repeatable. My point about cache keys is 
that for a given set of inputs we should have a consistent cache key and 
we should always get the same output, ie. the build should be repeatable.

If plugins are not repeatable because they are write in python then 
surely we should not have our plugins written in python? Plugins affect 
the repeatability in may ways and if they them selves are not repeatable 
then surely this is a different issue and needs addressing urgently!

Once our plugins are repeatable then I don't see a issue with them 
adding to the sandbox through a strict API. So long as for a given input 
they create the same cache key and they produce the same repeatable output.

In my mind these are two distinct issues and I do not understand why 
they need to be coupled:

   * Plugins must be repeatable in all that they do or bst can not claim 
to be repeatable.

   * A repeatable plugin should not have a issue with writing to the 
sandbox. So long as what is written is completely determined by the yaml 
in the project (element.bst + project.conf + dep.bst) and is thus 
repeatable and deterministic.

> 
> 
> So, please keep in mind, plugins are not there to create output, there
> is no way we can make plugins trustable for creating output, because we
> cannot be in control of the environment in which plugins run.
> 
> Cheers,
>      -Tristan
> 
> 
>

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

One more round...

On Tue, 2020-06-23 at 11:12 +0100, William Salmon wrote:
> 
[...]
> > Of course you need to stage the tooling which is used to create output,
> > that is because BuildStream intentionally enforces that you always use
> > deterministic input, and the exact build/copy of the binaries used to
> > create output, is a part of the input.
> > 
> > >   > The fact that generating data in python is problematic is not a
> > > reason to avoid fixing illegal writing to the sandbox (which probably
> > > mostly happens *due* to the latter problem), its a reason to ensure that
> > > we *also* ensure that doesn’t happen.
> > >   >
> > > 
> > > While the addition may have been a error in your eyes, If writing to the
> > > sandbox was not public API I would have be arguing that it should be.
> > > For the reasons above.
> > 
> > It was most definitely an error.
> > 
> > The plugins need to have a way to stage files in locations of their
> > choosing, before the vdir abstraction was in place, the only way to do
> > this was by providing a directory argument.
> > 
> > This was abused, and has now lead to generation of artifact data which
> > is generated non-deterministically, without controlling the inputs of
> > this output, which is at the heart of BuildStream's promise.
> 
> I feel you are conflating two things, are plugins `non-deterministic` 
> and should we let plugins alter whats in the sandbox. The answer to 
> should plugins be deterministic should clearly be yes and the answer to 
> should plugins effect what happens in the sandbox seem clearly yes. If 
> plugins are non deterministic then we need to fix that, plugins effect 
> the build in a number of ways and if we cant trust plugins then we cant 
> trust anything about bst as every element is build with a plugin.
> 
> Once plugins are trust able, and they must be or the hole concept of 
> cache keys falls apart. Then your hole argument for why we cant put 
> things in the sandbox falls apart, given that we must fix plugins so 
> they are trust able then I fail to see a issue with plugins putting 
> things in to the sandbox.

There are two things you appear to be conflating, which is stability
and correctness of cache keys, and reproducibility of build artifacts.

This whole discussion is about reproducibility, not about cache key
composition (although it was raised as an orthogonal concern, it is not
centric to why we don't use plugins to create output).

For reproducibility, the plugins cannot reasonably be trusted to
compose data reproducibly, the premise of creating reproducible output
in artifacts is to use deterministic inputs: The host version of
installed python, plus the versions of any python libraries which are
running, are not deterministic.

BuildStream ensures as a promise, that output is based on deterministic
inputs, do not conflate this with the calculation of cache keys which
BuildStream uses to identify the inputs.

I've already demonstrated in my first reply to you an example of how
the output of collect_manifest is already non-reproducible, precisely
*because* the host version of python will have an effect on it's
output, I emphasized that this was an *example* (not an invitation to
haggle over whether 'dict' can be trusted in the future, there is no
guarantee that it can), that said, it was a single example to
demonstrate that host python *matters* when constructing output.

Keep the following things in mind:

  * An important set of BuildStream users will be striving to produce
    reproducible output.

    If they achieve reproducibility today, they should be able to take
    the exact same project state, and use the latest version of
    BuildStream 2, in 10 years from now... and they should be able
    to produce the exact same output, bit-for-bit.

  * While BuildStream all by itself cannot guarantee reproducible
    output, our role is to guarantee that even after upgrading your
    host BuildStream installation in 10 years from now, BuildStream
    will repeat the build in *exactly the same way*.

The goal of deterministic building is entirely centric to the
BuildStream mission, and cannot be derailed by a desire to have things
just a little bit more convenient.

So, please keep in mind, plugins are not there to create output, there
is no way we can make plugins trustable for creating output, because we
cannot be in control of the environment in which plugins run.

Cheers,
    -Tristan

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by William Salmon <wi...@codethink.co.uk>.


On 22/06/2020 18:28, Tristan Van Berkom wrote:
> Hi,
> 
> On Mon, 2020-06-22 at 18:01 +0100, William Salmon wrote:
> [...]
>>> Definitely agree.
>>>
>>> The thing is, both are highly problematic.
>>>
>>
>> This is my issue, creating PLUGIN_GENERATED_CONTENT is some times highly
>> desirable / necessary*
>>
>> So we should A) make it as easy as possible to make sure
>> PLUGIN_GENERATED_CONTENT is unique to a cache key, B) minimize the risk
>> of PLUGIN_GENERATED_CONTENT not getting correctly translated in to the
>> sand box.
>>
>> Writing via the vdir API to the cas is a much better way to get
>> PLUGIN_GENERATED_CONTENT in to the sandbox, IMO because it dose not have
>> to be escaped to get through sh and cat,
> 
> But this is the whole point, data which ends up in artifacts should be
> created within the sandbox, where the input of that data is
> deterministic (i.e. the data _and_ tooling used to create that output).
> 
> If it is generated with scripts residing in the sandbox as everything
> else is generated as usual, then there is no extra escaping to to be
> done.
> 
> Further, as mentioned in a previous mail here, the shell invocations
> also need not be shell, we can run python or JS or anything in the
> sandbox as well, or stage scripts from sources to run in the sandbox,
> there is a lot of leeway here.
> 
>>   dose not need extra deps and
>> dose not needs manipulation of configuration commands etc.
> 
> This indeed.
> 
> You are focusing on the convenience of breaking the promise which
> BuildStream makes, this is where I cannot get on board.

Yes I am focused on using bst, plugins are a key part of why bst is 
usable, but i do also care about it being reproducible and having trust 
in the cache keys. Things get in to the sandbox by a variety of ways, by 
variable, config etc often as arguments on a programs cli but i really 
fail to see the difference in providing something as a config file or as 
part of somethings cli.

> 
> Of course you need to stage the tooling which is used to create output,
> that is because BuildStream intentionally enforces that you always use
> deterministic input, and the exact build/copy of the binaries used to
> create output, is a part of the input.
> 
>>   > The fact that generating data in python is problematic is not a
>> reason to avoid fixing illegal writing to the sandbox (which probably
>> mostly happens *due* to the latter problem), its a reason to ensure that
>> we *also* ensure that doesn’t happen.
>>   >
>>
>> While the addition may have been a error in your eyes, If writing to the
>> sandbox was not public API I would have be arguing that it should be.
>> For the reasons above.
> 
> It was most definitely an error.
> 
> The plugins need to have a way to stage files in locations of their
> choosing, before the vdir abstraction was in place, the only way to do
> this was by providing a directory argument.
> 
> This was abused, and has now lead to generation of artifact data which
> is generated non-deterministically, without controlling the inputs of
> this output, which is at the heart of BuildStream's promise.

I feel you are conflating two things, are plugins `non-deterministic` 
and should we let plugins alter whats in the sandbox. The answer to 
should plugins be deterministic should clearly be yes and the answer to 
should plugins effect what happens in the sandbox seem clearly yes. If 
plugins are non deterministic then we need to fix that, plugins effect 
the build in a number of ways and if we cant trust plugins then we cant 
trust anything about bst as every element is build with a plugin.

Once plugins are trust able, and they must be or the hole concept of 
cache keys falls apart. Then your hole argument for why we cant put 
things in the sandbox falls apart, given that we must fix plugins so 
they are trust able then I fail to see a issue with plugins putting 
things in to the sandbox.

> 
> Cheers,
>      -Tristan
> 
> 
>

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

Hi,

On Mon, 2020-06-22 at 18:01 +0100, William Salmon wrote:
[...]
> > Definitely agree.
> > 
> > The thing is, both are highly problematic.
> > 
> 
> This is my issue, creating PLUGIN_GENERATED_CONTENT is some times highly 
> desirable / necessary*
> 
> So we should A) make it as easy as possible to make sure 
> PLUGIN_GENERATED_CONTENT is unique to a cache key, B) minimize the risk 
> of PLUGIN_GENERATED_CONTENT not getting correctly translated in to the 
> sand box.
> 
> Writing via the vdir API to the cas is a much better way to get 
> PLUGIN_GENERATED_CONTENT in to the sandbox, IMO because it dose not have 
> to be escaped to get through sh and cat,

But this is the whole point, data which ends up in artifacts should be
created within the sandbox, where the input of that data is
deterministic (i.e. the data _and_ tooling used to create that output).

If it is generated with scripts residing in the sandbox as everything
else is generated as usual, then there is no extra escaping to to be
done.

Further, as mentioned in a previous mail here, the shell invocations
also need not be shell, we can run python or JS or anything in the
sandbox as well, or stage scripts from sources to run in the sandbox,
there is a lot of leeway here.

>  dose not need extra deps and 
> dose not needs manipulation of configuration commands etc.

This indeed.

You are focusing on the convenience of breaking the promise which
BuildStream makes, this is where I cannot get on board.

Of course you need to stage the tooling which is used to create output,
that is because BuildStream intentionally enforces that you always use
deterministic input, and the exact build/copy of the binaries used to
create output, is a part of the input.

>  > The fact that generating data in python is problematic is not a 
> reason to avoid fixing illegal writing to the sandbox (which probably 
> mostly happens *due* to the latter problem), its a reason to ensure that 
> we *also* ensure that doesn’t happen.
>  >
> 
> While the addition may have been a error in your eyes, If writing to the 
> sandbox was not public API I would have be arguing that it should be. 
> For the reasons above.

It was most definitely an error.

The plugins need to have a way to stage files in locations of their
choosing, before the vdir abstraction was in place, the only way to do
this was by providing a directory argument.

This was abused, and has now lead to generation of artifact data which
is generated non-deterministically, without controlling the inputs of
this output, which is at the heart of BuildStream's promise.

Cheers,
    -Tristan

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by William Salmon <wi...@codethink.co.uk>.

On 22/06/2020 16:52, Tristan Van Berkom wrote:
> Hi again,
> 
>> On Jun 22, 2020, at 10:50 PM, William Salmon <wi...@codethink.co.uk> wrote:
>>
>> 
>>
>>> On 22/06/2020 14:26, Tristan Van Berkom wrote:
>>> Hi,
>>>>> On Jun 22, 2020, at 9:37 PM, William Salmon <wi...@codethink.co.uk> wrote:
>>>>
>>>> 
>>>>
>>>>> On 22/06/2020 12:24, Tristan Van Berkom wrote:
>>>>> Hi William,
>>>>> Thanks for engaging in this.
>>>>>> On Mon, 2020-06-22 at 11:02 +0100, William Salmon wrote:
>>>>>> Reply in line.
>>>>> [...]
>>>> [..]
>>>>> The main differences here are:
>>>>>    A.) The guarantees we can provide about running safe sandboxed code
>>>>>        compared with untrusted local python interpreter code.
>>>>>        This would lead us down a path of providing ever more carefully
>>>>>        crafted APIs in the core in order to produce reproducible output
>>>>>        in the artifact, all the while not being certain what python APIs
>>>>>        a plugin might use to produce output.
>>>>
>>>> Else were you mention that you don't want to get in to a debate about if we can trust python. So lets not, lets decide were things should live and then decide if we need to change our plugin language separately.
>>>>
>>>> If a plugin creates a config file with logic dependent on config options and variables etc and then the plugin adds a configure command that calls cat to add this text to a file. This seems just as susceptible to plugin author or plugin language issues as if the plugin directly adds a file to the cas (via the nice api we have now) so it can be included in the sandbox.
>>>>
>>>> These seem equivalent except that using cat is more complex and requires cat be a build dependency. There are many alternatives to cat but they still require some extra bin be in the sandbox and adding new configure commands.
>>> It is not equivalent.
>>> `cat`, or whatever you use inside the sandbox, is part of a controlled environment, the python code your plugin is running, is not.
>>
>> Yes but sometimes the logic to generate the conf is non trivial so in those cases were the plugin needs to effect the text, the danger is in creating the PLUGIN_GENERATED_CONTENT and is a separate issue.
>>
>> Once PLUGIN_GENERATED_CONTENT is created we can ether
>>
>> configuration_commands.append("""
>> cat > /path/to/build/file.conf << EOF
>> """ + PLUGIN_GENERATED_CONTENT + """
>> EOF
>> """)
>>
>> or
>>
>> file.write(PLUGIN_GENERATED_CONTENT)
>>
> 
> Definitely agree.
> 
> The thing is, both are highly problematic.
> 

This is my issue, creating PLUGIN_GENERATED_CONTENT is some times highly 
desirable / necessary*

So we should A) make it as easy as possible to make sure 
PLUGIN_GENERATED_CONTENT is unique to a cache key, B) minimize the risk 
of PLUGIN_GENERATED_CONTENT not getting correctly translated in to the 
sand box.

Writing via the vdir API to the cas is a much better way to get 
PLUGIN_GENERATED_CONTENT in to the sandbox, IMO because it dose not have 
to be escaped to get through sh and cat, dose not need extra deps and 
dose not needs manipulation of configuration commands etc.

 > The fact that generating data in python is problematic is not a 
reason to avoid fixing illegal writing to the sandbox (which probably 
mostly happens *due* to the latter problem), its a reason to ensure that 
we *also* ensure that doesn’t happen.
 >

While the addition may have been a error in your eyes, If writing to the 
sandbox was not public API I would have be arguing that it should be. 
For the reasons above.

Directory access API has been used by plugin authors for a long time and 
it is the cleanest way to do a variety of things and has been used by a 
variety of plugin authors.

Regardless of why/how the API was added this thread is saying that we 
should get rid of the most reliable way to place 
PLUGIN_GENERATED_CONTENT in to the sandbox.

I am not saying that this API should not be improved or that there is no 
room to make it better or more tied to a cache key etc to reduce risks 
of things going wrong. But to completely remove this feature would seem 
to me to be a major step in the wrong direction.

* If we are ever going to generate a good image creation element or 
maybe something like a better OSTree element then having plugins 
generate config/content for the underlying tool is needed.

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

Hi again,

> On Jun 22, 2020, at 10:50 PM, William Salmon <wi...@codethink.co.uk> wrote:
> 
> 
> 
>> On 22/06/2020 14:26, Tristan Van Berkom wrote:
>> Hi,
>>>> On Jun 22, 2020, at 9:37 PM, William Salmon <wi...@codethink.co.uk> wrote:
>>> 
>>> 
>>> 
>>>> On 22/06/2020 12:24, Tristan Van Berkom wrote:
>>>> Hi William,
>>>> Thanks for engaging in this.
>>>>> On Mon, 2020-06-22 at 11:02 +0100, William Salmon wrote:
>>>>> Reply in line.
>>>> [...]
>>> [..]
>>>> The main differences here are:
>>>>   A.) The guarantees we can provide about running safe sandboxed code
>>>>       compared with untrusted local python interpreter code.
>>>>       This would lead us down a path of providing ever more carefully
>>>>       crafted APIs in the core in order to produce reproducible output
>>>>       in the artifact, all the while not being certain what python APIs
>>>>       a plugin might use to produce output.
>>> 
>>> Else were you mention that you don't want to get in to a debate about if we can trust python. So lets not, lets decide were things should live and then decide if we need to change our plugin language separately.
>>> 
>>> If a plugin creates a config file with logic dependent on config options and variables etc and then the plugin adds a configure command that calls cat to add this text to a file. This seems just as susceptible to plugin author or plugin language issues as if the plugin directly adds a file to the cas (via the nice api we have now) so it can be included in the sandbox.
>>> 
>>> These seem equivalent except that using cat is more complex and requires cat be a build dependency. There are many alternatives to cat but they still require some extra bin be in the sandbox and adding new configure commands.
>> It is not equivalent.
>> `cat`, or whatever you use inside the sandbox, is part of a controlled environment, the python code your plugin is running, is not.
> 
> Yes but sometimes the logic to generate the conf is non trivial so in those cases were the plugin needs to effect the text, the danger is in creating the PLUGIN_GENERATED_CONTENT and is a separate issue.
> 
> Once PLUGIN_GENERATED_CONTENT is created we can ether
> 
> configuration_commands.append("""
> cat > /path/to/build/file.conf << EOF
> """ + PLUGIN_GENERATED_CONTENT + """
> EOF
> """)
> 
> or
> 
> file.write(PLUGIN_GENERATED_CONTENT)
> 

Definitely agree.

The thing is, both are highly problematic.

The fact that generating data in python is problematic is not a reason to avoid fixing illegal writing to the sandbox (which probably mostly happens *due* to the latter problem), its a reason to ensure that we *also* ensure that doesn’t happen.

Maybe the key here is to ensure a safe and stable way for a core maintained function to reliably export public data to element assemble methods. In doing so, we can provide any context which elements need to generate output based on this public data safely, and this would be done by a single, well tested codepath ensuring that public data is never exported differently with incoming future python versions.

Cheers,
    -Tristan

PS: I think this whole thread also calls mutability of public data by plugin code into question, if I’m not mistaken this only happens for the sake of Debian build plugins, which could otherwise be transporting this context directly through artifact payload data.

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by William Salmon <wi...@codethink.co.uk>.


On 22/06/2020 14:26, Tristan Van Berkom wrote:
> Hi,
> 
>> On Jun 22, 2020, at 9:37 PM, William Salmon <wi...@codethink.co.uk> wrote:
>>
>> 
>>
>>> On 22/06/2020 12:24, Tristan Van Berkom wrote:
>>> Hi William,
>>> Thanks for engaging in this.
>>>> On Mon, 2020-06-22 at 11:02 +0100, William Salmon wrote:
>>>> Reply in line.
>>> [...]
>> [..]
>>> The main differences here are:
>>>    A.) The guarantees we can provide about running safe sandboxed code
>>>        compared with untrusted local python interpreter code.
>>>        This would lead us down a path of providing ever more carefully
>>>        crafted APIs in the core in order to produce reproducible output
>>>        in the artifact, all the while not being certain what python APIs
>>>        a plugin might use to produce output.
>>
>> Else were you mention that you don't want to get in to a debate about if we can trust python. So lets not, lets decide were things should live and then decide if we need to change our plugin language separately.
>>
>> If a plugin creates a config file with logic dependent on config options and variables etc and then the plugin adds a configure command that calls cat to add this text to a file. This seems just as susceptible to plugin author or plugin language issues as if the plugin directly adds a file to the cas (via the nice api we have now) so it can be included in the sandbox.
>>
>> These seem equivalent except that using cat is more complex and requires cat be a build dependency. There are many alternatives to cat but they still require some extra bin be in the sandbox and adding new configure commands.
> 
> It is not equivalent.
> 
> `cat`, or whatever you use inside the sandbox, is part of a controlled environment, the python code your plugin is running, is not.

Yes but sometimes the logic to generate the conf is non trivial so in 
those cases were the plugin needs to effect the text, the danger is in 
creating the PLUGIN_GENERATED_CONTENT and is a separate issue.

Once PLUGIN_GENERATED_CONTENT is created we can ether

configuration_commands.append("""
cat > /path/to/build/file.conf << EOF
""" + PLUGIN_GENERATED_CONTENT + """
EOF
""")

or

file.write(PLUGIN_GENERATED_CONTENT)

And I don't think one is safer than the other but the second is much 
less complicated for a plugin author.

> 
> Yes it is possible to have non reproducible output also from within the sandbox, but the chances are lower and the tools being used are deterministic, which is kind of the whole point.
> 
> Permutations within the sandbox is what BuildStream provides and ensures, hacking at the content with plugin code breaks this contract.
> 
> This is only considering the point (A) above which is the most important/relevant, other than this you also have a problematic (C) which you raised yourself.
> 
> While it is quite obvious that input configurations such as commands provided in `Plugin.configure()` must be a part of the unique key, this is less obvious for code changes in manual strong arming of the sandbox in python code, and more likely to be overlooked in third party plugin code (which should not be easy to get wrong).
> 
> Regarding the other comment about multiple ways of doing things: it is impossible to remove the possibility of manipulating files within the sandbox, as this is kind of the primary function of BuildStream (while manipulating files with plugin code was never been a function of BuildStream in the first place).
> 
> Cheers,
>      -Tristan
> 
> 
>

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by Tristan Van Berkom <tr...@codethink.co.uk>.

Hi,

> On Jun 22, 2020, at 9:37 PM, William Salmon <wi...@codethink.co.uk> wrote:
> 
> 
> 
>> On 22/06/2020 12:24, Tristan Van Berkom wrote:
>> Hi William,
>> Thanks for engaging in this.
>>> On Mon, 2020-06-22 at 11:02 +0100, William Salmon wrote:
>>> Reply in line.
>> [...]
> [..]
>> The main differences here are:
>>   A.) The guarantees we can provide about running safe sandboxed code
>>       compared with untrusted local python interpreter code.
>>       This would lead us down a path of providing ever more carefully
>>       crafted APIs in the core in order to produce reproducible output
>>       in the artifact, all the while not being certain what python APIs
>>       a plugin might use to produce output.
> 
> Else were you mention that you don't want to get in to a debate about if we can trust python. So lets not, lets decide were things should live and then decide if we need to change our plugin language separately.
> 
> If a plugin creates a config file with logic dependent on config options and variables etc and then the plugin adds a configure command that calls cat to add this text to a file. This seems just as susceptible to plugin author or plugin language issues as if the plugin directly adds a file to the cas (via the nice api we have now) so it can be included in the sandbox.
> 
> These seem equivalent except that using cat is more complex and requires cat be a build dependency. There are many alternatives to cat but they still require some extra bin be in the sandbox and adding new configure commands.

It is not equivalent.

`cat`, or whatever you use inside the sandbox, is part of a controlled environment, the python code your plugin is running, is not.

Yes it is possible to have non reproducible output also from within the sandbox, but the chances are lower and the tools being used are deterministic, which is kind of the whole point.

Permutations within the sandbox is what BuildStream provides and ensures, hacking at the content with plugin code breaks this contract.

This is only considering the point (A) above which is the most important/relevant, other than this you also have a problematic (C) which you raised yourself.

While it is quite obvious that input configurations such as commands provided in `Plugin.configure()` must be a part of the unique key, this is less obvious for code changes in manual strong arming of the sandbox in python code, and more likely to be overlooked in third party plugin code (which should not be easy to get wrong).

Regarding the other comment about multiple ways of doing things: it is impossible to remove the possibility of manipulating files within the sandbox, as this is kind of the primary function of BuildStream (while manipulating files with plugin code was never been a function of BuildStream in the first place).

Cheers,
    -Tristan

Re: [BuildStream] Protect against plugin modifications of artifacts

Posted by William Salmon <wi...@codethink.co.uk>.

On 22/06/2020 12:24, Tristan Van Berkom wrote:
> Hi William,
> 
> Thanks for engaging in this.
> 
> On Mon, 2020-06-22 at 11:02 +0100, William Salmon wrote:
>> Reply in line.
> [...]
[..]
> 
> The main differences here are:
> 
>    A.) The guarantees we can provide about running safe sandboxed code
>        compared with untrusted local python interpreter code.
> 
>        This would lead us down a path of providing ever more carefully
>        crafted APIs in the core in order to produce reproducible output
>        in the artifact, all the while not being certain what python APIs
>        a plugin might use to produce output.

Else were you mention that you don't want to get in to a debate about if 
we can trust python. So lets not, lets decide were things should live 
and then decide if we need to change our plugin language separately.

If a plugin creates a config file with logic dependent on config options 
and variables etc and then the plugin adds a configure command that 
calls cat to add this text to a file. This seems just as susceptible to 
plugin author or plugin language issues as if the plugin directly adds a 
file to the cas (via the nice api we have now) so it can be included in 
the sandbox.

These seem equivalent except that using cat is more complex and requires 
cat be a build dependency. There are many alternatives to cat but they 
still require some extra bin be in the sandbox and adding new configure 
commands.

> 
>    B.) Simply having a "second way" of doing things complicates things
>        for us, as the question of which way to choose arises for a
>        plugin author, when the safe way should be the only choice.
> 

I agree that we should not have lots of ways of doing the same thing. I 
disagree on what is a good way to assemble a configuration file.