Posted to dev@arrow.apache.org by Jacob Wujciak <ja...@voltrondata.com.INVALID> on 2022/12/16 13:46:03 UTC

[DISC] Self-Hosted Runners for Arrow

I would like to propose the addition of a self-hosted runner system to the
arrow repository to add speciality runners (arm64 and CUDA). This will
allow us to compensate for the arm64 jobs that previously ran on Travis,
which will be turned off EOY[1].

The migration to GitHub Issues will require a significant extension of our
existing “comment bot” workflows (e.g. assigning and labeling issues for
non-committers, see [3]). With such a system we could add reserved runners
that only pick up these “comment bot” jobs to guarantee a smooth developer
experience, regardless of the state of the ASF CI resources.
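
A minimal sketch of how reserved runners could be kept for “comment bot”
jobs, assuming routing by runner labels: GitHub Actions only hands a job to
a self-hosted runner that carries every label the job requests. The runner
names and the "comment-bot" label are hypothetical and not taken from the
Arrow setup.

```python
from typing import Dict, List, Set


def eligible_runners(job_labels: Set[str], runners: Dict[str, Set[str]]) -> List[str]:
    """Return the names of runners that carry every label the job requests."""
    return sorted(name for name, labels in runners.items() if job_labels <= labels)


# Hypothetical runner pool: one reserved runner plus two speciality runners.
RUNNERS = {
    "reserved-1": {"self-hosted", "linux", "comment-bot"},
    "arm64-1": {"self-hosted", "linux", "arm64"},
    "cuda-1": {"self-hosted", "linux", "cuda"},
}

# A comment-bot job only matches the reserved runner.
print(eligible_runners({"self-hosted", "comment-bot"}, RUNNERS))  # ['reserved-1']

# A job asking only for generic labels matches every runner in the pool.
print(eligible_runners({"self-hosted", "linux"}, RUNNERS))
```

Note that a job requesting only generic labels would also match the
reserved runner, so other workflows would have to request a label that the
reserved runner does not carry.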

As the allocation of GitHub-hosted runners for the Apache Software
Foundation was recently increased, the queue times are currently low, but
this will inevitably change. Such a system would enable us to react
quickly to such changes by adding new Windows and Linux nodes without any
need for INFRA intervention.

We at Voltron Data have been working on a Kubernetes-based system to deploy
auto-scaling ephemeral GitHub runners that can be seamlessly added to the
arrow repository via a GitHub App. As the runners are ephemeral (each job
is run in an isolated environment that is destroyed once the job is done),
the usual security issues with self-hosted runners do not apply [2].
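
To illustrate the ephemeral lifecycle described above, here is a small
sketch in which every job gets a fresh workspace that is destroyed as soon
as the job finishes. It only mimics the lifecycle with a temporary
directory. The real system runs jobs in Kubernetes, and a temporary
directory provides no isolation by itself. All function names are
hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def ephemeral_workspace() -> Iterator[Path]:
    """Create a fresh, empty workspace and always destroy it afterwards."""
    path = Path(tempfile.mkdtemp(prefix="runner-"))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


def run_job(job: Callable[[Path], str]) -> str:
    """Run one job in its own workspace; nothing survives to the next job."""
    with ephemeral_workspace() as workspace:
        return job(workspace)


def leaky_job(workspace: Path) -> str:
    # Stand-in for a job that leaves a file behind in its workspace.
    (workspace / "leftover.txt").write_text("state from job 1")
    return str(workspace)


first = run_job(leaky_job)
second = run_job(lambda ws: str(sorted(p.name for p in ws.iterdir())))

print(Path(first).exists())  # False: the first workspace was destroyed
print(second)                # []: the second job started from an empty workspace
```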

Voltron Data has open-sourced the necessary Infrastructure as Code [4].
This makes it possible for other interested parties to donate CI capacity
to arrow or other ASF projects by cloning the IaC and setting up and
maintaining their own instance of the system. Voltron Data will set up and
maintain one instance of the system.

The Dockerfiles for the runners will be added to the main arrow repo so
that the community can easily change and update the runner configuration.

Best,
Jacob

[1]: https://cwiki.apache.org/confluence/display/INFRA/Travis+Migrations

[2]:
https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security

[3]: https://github.com/apache/arrow/actions/workflows/comment_bot.yml

[4]: https://github.com/voltrondata-labs/gha-controller-infra

Re: [DISC] Self-Hosted Runners for Arrow

Posted by Martin Grigorov <mg...@apache.org>.
Hi,

On Tue, Feb 14, 2023 at 12:51 PM Raúl Cumplido <ra...@gmail.com>
wrote:

> Hi,
>
> Following up on this thread, I am going to try and coordinate to set up an
> instance of the self-hosted runners for arm64 on the Arrow repository.
>
> There was a question about using Travis CI on Crossbow for those jobs. That
> could be a possibility but I think there are some benefits to the proposed
> solution:
> - Having those runners on the Arrow repo will allow us to run these jobs
> on a per-PR basis, as we do today, instead of as external ad hoc tests.
> - Moving those jobs to GHA would be beneficial for maintenance purposes.
> That's where the majority of our CI is hosted, and it would let us get
> rid of a CI system (Travis).
> - We are already lacking resources on arm64 CI on Crossbow. We had to
> remove libarrow-flight-dev packages built for arm64. See:
> https://github.com/apache/arrow/issues/33934
> - Finding a solution that allows us to increase the number of runners on
> the Arrow repo and run the CI from the Arrow repo would be beneficial not
> only for those jobs but for extra CI capacity if/when needed for future
> purposes.
>
> About the s390x jobs there is some Apache INFRA CI on Jenkins that could be
> used if we can't find an alternative. I've asked on ASF Slack for more
> information about that and here are a couple of examples of builds on other
> Apache projects for s390x:
>
> https://github.com/apache/camel/blob/e7825a48c9f3d1202333c4f311330be55ff30257/Jenkinsfile.s390x#L20
>
> https://github.com/apache/activemq/blob/c58286487d08d155496e571db649f047bd979630/Jenkinsfile#L45
>
> To be honest, it doesn't seem ideal to add a new CI system, but if we can't
> find other possibilities for s390x hosts and we want to keep them in CI,
> I can't think of any others.
>

GitHub Actions does not support s390x on self-hosted runners at the moment,
and there is nothing about it in their plans:
https://github.com/github/roadmap/issues?q=s390x

You could try https://github.com/uraimo/run-on-arch-action/. It uses
QEMU to emulate armv6, armv7, aarch64, s390x and ppc64le. But it might be
too slow for your needs...




Re: [DISC] Self-Hosted Runners for Arrow

Posted by Raúl Cumplido <ra...@gmail.com>.
Hi,

Following up on this thread, I am going to try and coordinate to set up an
instance of the self-hosted runners for arm64 on the Arrow repository.

There was a question about using Travis CI on Crossbow for those jobs. That
could be a possibility but I think there are some benefits to the proposed
solution:
- Having those runners on the Arrow repo will allow us to run these jobs on
a per-PR basis, as we do today, instead of as external ad hoc tests.
- Moving those jobs to GHA would be beneficial for maintenance purposes.
That's where the majority of our CI is hosted, and it would let us get rid
of a CI system (Travis).
- We are already lacking resources on arm64 CI on Crossbow. We had to
remove libarrow-flight-dev packages built for arm64. See:
https://github.com/apache/arrow/issues/33934
- Finding a solution that allows us to increase the number of runners on
the Arrow repo and run the CI from the Arrow repo would be beneficial not
only for those jobs but for extra CI capacity if/when needed for future
purposes.

About the s390x jobs there is some Apache INFRA CI on Jenkins that could be
used if we can't find an alternative. I've asked on ASF Slack for more
information about that and here are a couple of examples of builds on other
Apache projects for s390x:
https://github.com/apache/camel/blob/e7825a48c9f3d1202333c4f311330be55ff30257/Jenkinsfile.s390x#L20
https://github.com/apache/activemq/blob/c58286487d08d155496e571db649f047bd979630/Jenkinsfile#L45

To be honest, it doesn't seem ideal to add a new CI system, but if we can't
find other possibilities for s390x hosts and we want to keep them in CI,
I can't think of any others.

Kind regards,
Raúl

On Thu, Dec 22, 2022 at 22:20, Sutou Kouhei (<ko...@clear-code.com>)
wrote:

> Hi,
>
> We can keep using Travis CI via Crossbow by the following
> approach:
> https://github.com/apache/arrow/pull/14751
>
> Travis CI for https://github.com/ursacomputing/crossbow is
> sponsored by Voltron Data (not ASF) for arm64 Linux
> packages.
>
> How about using the approach for s390x?
>
>
> Thanks,
> --
> kou

Re: [DISC] Self-Hosted Runners for Arrow

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

We can keep using Travis CI via Crossbow by the following
approach:
https://github.com/apache/arrow/pull/14751

Travis CI for https://github.com/ursacomputing/crossbow is
sponsored by Voltron Data (not ASF) for arm64 Linux
packages.

How about using the approach for s390x?


Thanks,
-- 
kou

In <CA...@mail.gmail.com>
  "Re: [DISC] Self-Hosted Runners for Arrow" on Fri, 16 Dec 2022 19:26:36 +0100,
  Jacob Wujciak <ja...@voltrondata.com.INVALID> wrote:

> No news with regards to arrow specific S390x machines but apparently IBM
> has donated a number of S390x VMs to the ASF which we should be able to use
> but I have not had the time yet to investigate this option.
> 
> 
> Matt Topol <zo...@gmail.com> wrote on Fri, Dec 16, 2022, 17:01:
> 
>> These are awesome! Has there been any luck in reaching out to IBM to see if
>> they could donate one or more s390x VMs to use as runners for testing the
>> s390x builds? That is probably my only concern with Travis going away at
>> EOY, since we don't have a way currently to test those builds on GH
>> Actions.
>>
>> --Matt

Re: [DISC] Self-Hosted Runners for Arrow

Posted by Jacob Wujciak <ja...@voltrondata.com.INVALID>.
If there are no objections we will start setting up the instance and
working with INFRA to connect it to the arrow repo after the holidays.

Happy Holidays Everyone!

On Mon, Dec 19, 2022 at 3:15 PM Jacob Wujciak <ja...@voltrondata.com> wrote:

> Jarek, thank you for the glowing review :)
>
> Yes, we will have monitoring set up in the instance we are going to host to
> protect against abuse like that, but as we use a non-FOSS tool for
> monitoring internally, there is no code included for this at this time.
>
> I would like to give a shout out to Álvaro Maldonado Mateos and Ian Flores
> Siaca who have been doing the work of implementing this and are available
> for detailed technical questions or suggestions via the issues of the repo
> [1]!
>
>
> [1]: https://github.com/voltrondata-labs/gha-controller-infra/issues
>

Re: [DISC] Self-Hosted Runners for Arrow

Posted by Jacob Wujciak <ja...@voltrondata.com.INVALID>.
Jarek, thank you for the glowing review :)

Yes, we will have monitoring set up in the instance we are going to host to
protect against abuse like that, but as we use a non-FOSS tool for
monitoring internally, there is no code included for this at this time.

I would like to give a shout out to Álvaro Maldonado Mateos and Ian Flores
Siaca who have been doing the work of implementing this and are available
for detailed technical questions or suggestions via the issues of the repo
[1]!


[1]: https://github.com/voltrondata-labs/gha-controller-infra/issues


On Sun, Dec 18, 2022 at 4:40 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Comment from outside - I looked briefly at the implementation and docs and
> the GHA controller looks very clear and straightforward to implement.
> Fantastic job Jacob and big shoutout to Voltron Data for implementing and
> open-sourcing it.
>
> I am going to try it out in Apache Airflow very soon. We were waiting for
> something that GitHub Actions are cooking up
> https://github.com/orgs/github/projects/4247 but it just moved from Q4 2022
> to Q1 2023 so .... you never know :).
>
> One small comment for the security of hosting your self-hosted runners that
> you might want to take into account.
>
> While it is great that these are ephemeral runners (they provide all the
> necessary security boundaries, escaping from a container in K8S is not an
> easy feat), there is still one case where allowing any PR to run code on
> your self-hosted runners is potentially problematic - i.e. the possibility
> of anyone using the processing power of your machines to do any kind of
> job (and doing it with your donated credits or money), for example
> cryptomining. This is not an academic problem - this has already happened
> in the past
>
> https://github.blog/2021-04-22-github-actions-update-helping-maintainers-combat-bad-actors/
> and that's why GitHub Actions introduced mandatory "Approval" for
> first-time contributors' pull requests - because the bad actors were actually
> abusing GitHub's public runners to mine crypto.
>
> The approval workflow actually protects against the "mass abuse" - i.e.
> creating new accounts and using them to exploit this on multiple repos, but
> it does not protect you against the case that some collaborators will use
> your self-hosted runners to do any kind of computing. There are likely ways
> to mitigate it, like limiting the maximum time a container can run, and of
> course attempts to do so might be caught during reviews (and the offending
> user can be called out) - but I think if you want powerful CI and have a
> lot of contributors, this might slip under the radar easily unless you have
> some monitoring in place. The fact that it is not mass-exploitable by new
> users, makes it less likely to occur (because the regular users might lose
> their reputation if they attempt to do it), but it is still a possibility.
>
> It's up to you if you would like to protect against it in some ways (in
> Airflow we will likely continue using https://github.com/ashb/runner and
> limit the self-hosted workflows to "main" workflows and to maintainer's
> PRs) and it is not a blocker, but I wanted you to be aware of this
> potential abuse scenario.
>
> J.

Re: [DISC] Self-Hosted Runners for Arrow

Posted by Jarek Potiuk <ja...@potiuk.com>.
Comment from outside - I looked briefly at the implementation and docs, and
the GHA controller looks very clear and straightforward to implement.
Fantastic job, Jacob, and a big shoutout to Voltron Data for implementing and
open-sourcing it.

I am going to try it out in Apache Airflow very soon. We were waiting for
something that GitHub Actions is cooking up
(https://github.com/orgs/github/projects/4247), but it just moved from Q4 2022
to Q1 2023, so... you never know :).

One small comment on the security of self-hosted runners that you might
want to take into account.

While it is great that these are ephemeral runners (they provide all the
necessary security boundaries; escaping from a container in K8S is not an
easy feat), there is still one case where allowing any PR to run code on
your self-hosted runners is potentially problematic: anyone could use the
processing power of your machines for any kind of job (and do it with your
donated credits or money), for example cryptomining. This is not an academic
problem. It has already happened in the past
https://github.blog/2021-04-22-github-actions-update-helping-maintainers-combat-bad-actors/
and that is why GitHub Actions introduced mandatory "Approval" for pull
requests from first-time contributors: bad actors were actually abusing
GitHub's public runners to mine crypto.

The approval workflow protects against "mass abuse", i.e. creating new
accounts and using them to exploit this on multiple repos, but it does not
protect you against the case where some collaborators use your self-hosted
runners to do any kind of computing. There are likely ways to mitigate it,
like limiting the maximum time a container can run, and of course attempts
to do so might be caught during reviews (and the offending user can be called
out). But I think if you want powerful CI and have a lot of contributors,
this might easily slip under the radar unless you have some monitoring in
place. The fact that it is not mass-exploitable by new users makes it less
likely to occur (because regular users might lose their reputation if they
attempt it), but it is still a possibility.
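
As a minimal sketch of the time-limit mitigation mentioned above: assuming
each runner job is a Kubernetes pod, the pod spec field
`activeDeadlineSeconds` lets the cluster terminate a pod that runs past a
limit. The function name, pod name, image name and two-hour limit below are
hypothetical placeholders and are not taken from gha-controller-infra.

```python
def runner_pod_manifest(name, image, max_runtime_seconds):
    """Build a Pod manifest (as a plain dict) for one ephemeral runner job.

    All arguments are illustrative; only the field names follow the
    Kubernetes Pod API.
    """
    if max_runtime_seconds <= 0:
        raise ValueError("max_runtime_seconds must be positive")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Do not restart the job container once it exits.
            "restartPolicy": "Never",
            # Kubernetes marks the pod as failed and kills its containers
            # once it has been active for longer than this many seconds.
            "activeDeadlineSeconds": max_runtime_seconds,
            "containers": [{"name": "runner", "image": image}],
        },
    }


# Hypothetical values: a two-hour cap on any single job.
manifest = runner_pod_manifest("runner-example", "example/runner:latest", 2 * 60 * 60)
```

A cap like this bounds how much compute a single job can consume, but it does
not stop someone from launching many short jobs, so monitoring would still be
needed.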

It's up to you whether you want to protect against it in some way (in
Airflow we will likely continue using https://github.com/ashb/runner and
limit the self-hosted workflows to "main" workflows and to maintainers'
PRs). It is not a blocker, but I wanted you to be aware of this
potential abuse scenario.
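
To make the "main workflows and maintainers' PRs" restriction concrete, here
is a rough sketch of the routing decision. It is an illustration only and
not how https://github.com/ashb/runner implements it. It assumes the
`author_association` value that GitHub includes in pull request payloads;
the function name and the choice of trusted associations are hypothetical.

```python
# Associations treated as "maintainers" in this sketch (an assumption;
# a project may want a stricter or looser set).
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def runs_on(event_name, ref, author_association):
    """Return the runner label a job should use.

    Pushes to main and pull requests from trusted authors go to the
    self-hosted runners; everything else stays on GitHub-hosted runners.
    """
    if event_name == "push" and ref == "refs/heads/main":
        return "self-hosted"
    if event_name == "pull_request" and author_association in TRUSTED_ASSOCIATIONS:
        return "self-hosted"
    return "ubuntu-latest"


print(runs_on("push", "refs/heads/main", "NONE"))
print(runs_on("pull_request", "refs/pull/1/merge", "MEMBER"))
print(runs_on("pull_request", "refs/pull/2/merge", "FIRST_TIME_CONTRIBUTOR"))
```

A check like this only helps if it is evaluated somewhere the PR author
cannot edit; otherwise the PR itself could change the rule.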

J.




Re: [DISC] Self-Hosted Runners for Arrow

Posted by Jacob Wujciak <ja...@voltrondata.com.INVALID>.
No news with regard to Arrow-specific s390x machines, but apparently IBM
has donated a number of s390x VMs to the ASF, which we should be able to use.
I have not yet had the time to investigate this option.



Re: [DISC] Self-Hosted Runners for Arrow

Posted by Matt Topol <zo...@gmail.com>.
These are awesome! Has there been any luck in reaching out to IBM to see if
they could donate one or more s390x VMs to use as runners for testing the
s390x builds? That is probably my only concern with Travis going away at
EOY, since we don't have a way currently to test those builds on GH Actions.

--Matt
