Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2020/10/13 12:12:24 UTC

Credits from Google (or other sponsors?) for self-hosted runners

Hello Aizhamal, Everyone,

We've had some problems recently with concurrency for GitHub Actions, and
the solution suggested for now (by GitHub Support) is to use self-hosted
runners.

I made some comments in the issue here:

https://github.com/apache/airflow/issues/11496

I also opened a discussion on builds@:
https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E
and an accompanying ticket in JIRA:
https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20978

Regardless of those discussions, it would be great if we came back to the
idea of Google donating some credits to Apache Airflow so that the project
can set up its own runners.

We did not use them last time because GitLab did not manage to implement
the needed fork support (they still have not implemented it, after more
than 1.5 years!), but with GitHub I am quite certain we could switch and
start using such runners pretty much immediately if we had some credits.

Or maybe some other companies could donate some credits to us?

J.




-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Jarek Potiuk <Ja...@polidea.com>.
<3

On Fri, Oct 16, 2020 at 3:15 AM Aizhamal Nurmamat kyzy <ai...@apache.org>
wrote:

> Let me start pulling internal strings and I will report back.
>
> On Tue, Oct 13, 2020 at 1:42 PM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
>> Really hard to say now. But I did some - rather generic - calculations
>> https://cloud.google.com/products/calculator#id=abb18f23-0ea5-495e-a1fc-9cca1953096b
>> and it comes to some 400 USD/month. But when we combine it with the free
>> tier from GA, I think it could be half that.
>>
>> J.
>>
>> On Tue, Oct 13, 2020 at 10:10 PM Aizhamal Nurmamat kyzy <
>> aizhamal@apache.org> wrote:
>>
>>> What are the estimated yearly costs?
>>>
>>> On Tue, Oct 13, 2020 at 9:17 AM Jarek Potiuk <Ja...@polidea.com>
>>> wrote:
>>>
>>>> Yep, we can do it: *docker build --cpu-shares=100 --memory=1024m *
>>>>
>>>> On Tue, Oct 13, 2020 at 6:15 PM Jarek Potiuk <Ja...@polidea.com>
>>>> wrote:
>>>>
>>>>> Plus the "workflow_run" jobs (image building) for all PRs can also be
>>>>> done on the self-hosted workers. They are safe because they use the
>>>>> master scripts. The only potentially dangerous part is that someone
>>>>> could do some "mining" in a malicious Docker image building step, as
>>>>> this is the only part of "workflow_run" that comes from the PR. But that
>>>>> would be isolated within the docker build process, which I believe has
>>>>> rather limited resources, or we can additionally limit it to a single
>>>>> processor and limited memory.
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>> On Tue, Oct 13, 2020 at 6:12 PM Jarek Potiuk <Ja...@polidea.com>
>>>>> wrote:
>>>>>
>>>>>> I think this part is easy:
>>>>>>
>>>>>> * First of all - it is similar to GA - someone could have used all
>>>>>> the 180 workers of Apache by submitting PRs to various projects. So we just
>>>>>> need a limited worker queue. All those can run as workers in GKE and it
>>>>>> should be easy to manage (we could have an auto-scaling GKE cluster with
>>>>>> an upper limit).
>>>>>> * Secondly - we can - likely - continue using the GA public workers
>>>>>> for all incoming PRs and only use the self-hosted ones for master pushes.
>>>>>> Or we could also use them for PRs coming from maintainers.
>>>>>>
>>>>>> J.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 13, 2020 at 5:52 PM Ash Berlin-Taylor <as...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> And a magic security sandbox :D
>>>>>>>
>>>>>>> On Oct 13 2020, at 4:51 pm, Jarek Potiuk <Ja...@polidea.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Yep. Now we just need credits :)
>>>>>>>
>>>>>>> On Tue, Oct 13, 2020 at 5:30 PM Kaxil Naik <ka...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> That's ace, we should go ahead with self-hosted runners then.
>>>>>>>
>>>>>>> On Tue, Oct 13, 2020 at 4:06 PM Ash Berlin-Taylor <as...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Confirmed, we *can* do it - Arrow has done it already
>>>>>>> https://issues.apache.org/jira/browse/INFRA-19875
>>>>>>>
>>>>>>> But let's have a think on how to not be a botnet :)
>>>>>>>
>>>>>>> On Oct 13 2020, at 3:59 pm, Ash Berlin-Taylor <as...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I've spoken to a few members of ASF Infra directly, and they are
>>>>>>> just confirming, but they are okay with the idea of us adding self-hosted
>>>>>>> runners to our repo, and also okay with us managing those nodes
>>>>>>> ourselves. Should get final confirmation today.
>>>>>>>
>>>>>>> I wanted to double check that we could use the credits before we get
>>>>>>> anyone to stump up the VMs/credits etc.
>>>>>>>
>>>>>>> -ash
>>>>>>>
>>>>>>> On Oct 13 2020, at 2:16 pm, Jarek Potiuk <Ja...@polidea.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> This is also a slight problem, as mentioned in the builds@ thread:
>>>>>>> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E -
>>>>>>> managing self-hosted runners has to be done through INFRA, and they have
>>>>>>> not been very responsive recently (I have tickets waiting for weeks now).
>>>>>>>
>>>>>>> But as I've learned recently that we can manage our own secrets via
>>>>>>> API without INFRA (and completely legitimately according to GitHub
>>>>>>> documentation), maybe self-hosted runners can also be self-managed :D
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>> On Tue, Oct 13, 2020 at 2:22 PM Ash Berlin-Taylor <as...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I've thought about private/self-hosted runners, and I think long
>>>>>>> term that's the way to go to alleviate our CI bottlenecks.
>>>>>>>
>>>>>>> There's a bit of work we need to do around security of builds - as
>>>>>>> mentioned here
>>>>>>> https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
>>>>>>>
>>>>>>> > We recommend that you do not use self-hosted runners with public
>>>>>>> > repositories.
>>>>>>> >
>>>>>>> > Forks of your public repository can potentially run dangerous code
>>>>>>> > on your self-hosted runner machine by creating a pull request that executes
>>>>>>> > the code in a workflow.
>>>>>>> >
>>>>>>> > This is not an issue with GitHub-hosted runners because each
>>>>>>> > GitHub-hosted runner is always a clean isolated virtual machine, and it is
>>>>>>> > destroyed at the end of the job execution.
>>>>>>>
>>>>>>> So we'd need to do something similar.
>>>>>>>
>>>>>>> All for this and happy to help out once 2.0 is out (or at least once
>>>>>>> it starts to quieten down)
>>>>>>>
>>>>>>> -ash
>>>>>>>


Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Aizhamal Nurmamat kyzy <ai...@apache.org>.
Let me start pulling internal strings and I will report back.


Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Jarek Potiuk <Ja...@polidea.com>.
Really hard to say now. But I did some - rather generic - calculations
https://cloud.google.com/products/calculator#id=abb18f23-0ea5-495e-a1fc-9cca1953096b
and it comes to some 400 USD/month. But when we combine it with the free
tier from GA, I think it could be half that.
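
Since the question was about yearly costs, here is a rough sketch of the
projection. It assumes the 400 USD/month figure holds for all twelve months
and that the GA free tier absorbs about half of the load; both numbers are
only the guesses from this message, and the function name is made up for
illustration.

```python
# Rough yearly projection from the monthly estimate in this message.
MONTHLY_ESTIMATE_USD = 400   # from the calculator link; a generic estimate
FREE_TIER_OFFSET = 0.5       # assumption: GA free tier covers about half


def yearly_cost(monthly_usd: float, offset_fraction: float = 0.0) -> float:
    """Return twelve months of cost after removing the offset share."""
    return monthly_usd * 12 * (1.0 - offset_fraction)


full = yearly_cost(MONTHLY_ESTIMATE_USD)
with_free_tier = yearly_cost(MONTHLY_ESTIMATE_USD, FREE_TIER_OFFSET)
print(f"self-hosted only:  {full:.0f} USD/year")
print(f"with GA free tier: {with_free_tier:.0f} USD/year")
```

So the ask would be somewhere between roughly 2400 and 4800 USD per year,
depending on how much of the load stays on the public GA workers.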

J.



Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Aizhamal Nurmamat kyzy <ai...@apache.org>.
What are the estimated yearly costs?

On Tue, Oct 13, 2020 at 9:17 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Yep, we can do it: *docker build --cpu-shares=100 --memory=1024m *
>
> On Tue, Oct 13, 2020 at 6:15 PM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
>> Plus the "workflow_runs" (image building) for all PRs can also be done in
>> the self-hosted workers. They are safe as they are using master scripts
>> (the only potentially dangerous part in them is that someone could do some
>> "mining" as "malicious" Docker image building step, This is the only part
>> that comes from the PR for "workflow_run" but this would be isolated within
>> the docker build process which I believe has rather limited resources or we
>> can limit it additionally to single processor and limited memory.
>>
>> J.
>>
>>
>> On Tue, Oct 13, 2020 at 6:12 PM Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>>
>>> I think this part is easy:
>>>
>>> * First of all -  It is similar to GA - someone could have used all the
>>> 180 workers of Apache by submitting PRs to various projects. So we just
>>> need a limited worker queue. All those can run as workers in GKE and it
>>> should be easy to manage (we could have auto-scaling GKE cluster with upper
>>> limit)
>>> * Secondly - we can - likely - continue using the GA public workers for
>>> all incoming PRs and only use the self-hosted ones for master pushes. Or we
>>> could also use them for PRs coming from maintainers.
>>>
>>> J.
>>>
>>>
>>>
>>> On Tue, Oct 13, 2020 at 5:52 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> And a magic security sandbox :D
>>>>
>>>> On Oct 13 2020, at 4:51 pm, Jarek Potiuk <Ja...@polidea.com>
>>>> wrote:
>>>>
>>>> Yep. Now we just need credits :)
>>>>
>>>> On Tue, Oct 13, 2020 at 5:30 PM Kaxil Naik <ka...@gmail.com> wrote:
>>>>
>>>> That's ace, we should go ahead with self-hosted runners then.
>>>>
>>>> On Tue, Oct 13, 2020 at 4:06 PM Ash Berlin-Taylor <as...@apache.org>
>>>> wrote:
>>>>
>>>> Confirmed, we *can* do it - Arrow has done it already
>>>> https://issues.apache.org/jira/browse/INFRA-19875
>>>>
>>>> But lets have a think on how to not be a bot net :)
>>>>
>>>> On Oct 13 2020, at 3:59 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>>>>
>>>> I've spoken to a few members of ASF Infra directly, and they are just
>>>> confirming but they are okay with the idea of us adding self hosted runners
>>>> to our repo, and also okay that we can manage those nodes ourselves. Should
>>>> get final confirmation today.
>>>>
>>>> I wanted to double check that we could use the credits before we get
>>>> anyone to stump up the VMs/credits etc.
>>>>
>>>> -ash
>>>>
>>>> On Oct 13 2020, at 2:16 pm, Jarek Potiuk <Ja...@polidea.com>
>>>> wrote:
>>>>
>>>> This is also a slight problem as mentioned in the build@ thread:
>>>> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E -
>>>> managing hosting runners has to be done through infrastructure and they are
>>>> not really responsive recently (I have tickets waiting for weeks now).
>>>>
>>>> But as I've learned recently that we can manage our own secrets via API
>>>> without INFRA (and completely legitimately according to GitHub
>>>> documentation), maybe hosted runners will be also possible to self-manage :D
>>>>
>>>> J.
>>>>
>>>> On Tue, Oct 13, 2020 at 2:22 PM Ash Berlin-Taylor <as...@apache.org>
>>>> wrote:
>>>>
>>>> I've thought about private/self-hosted runners, and I think long term
>>>> that's the way to go to alievate our CI bottlenecks.
>>>>
>>>> There's a bit of work we need to do around security of builds - as
>>>> mentioned here
>>>> https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
>>>>
>>>> > We recommend that you do not use self-hosted runners with public
>>>> repositories.
>>>> >
>>>> > Forks of your public repository can potentially run dangerous code on
>>>> your self-hosted runner machine by creating a pull request that executes
>>>> the code in a workflow.
>>>> >
>>>> > This is not an issue with GitHub-hosted runners because each
>>>> GitHub-hosted runner is always a clean isolated virtual machine, and it is
>>>> destroyed at the end of the job execution.
>>>>
>>>> So we'd need to dos something similar.
>>>>
>>>> All for this and happy to help out once 2.0 is out (or at least once it
>>>> starts to quieten down)
>>>>
>>>> -ash
>>>>

Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Jarek Potiuk <Ja...@polidea.com>.
Yep, we can do it: *docker build --cpu-shares=100 --memory=1024m*
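
A minimal sketch of how a CI script might wrap that command so the limits are always passed. The helper name limited_build_command, its defaults and the "ci-image" tag are assumptions made up for illustration; only the two flags come from the command above. Note that --cpu-shares is a relative weight under contention rather than a hard single-CPU cap, and whether either limit is enforced may depend on the builder backend in use, so this is worth checking on the actual runners.

```python
import shlex


def limited_build_command(context_dir, tag, cpu_shares=100, memory="1024m"):
    """Return the argv list for a resource-limited `docker build`.

    Only builds the command; running it (e.g. with subprocess.run) is
    left to the caller, so the sketch works without Docker installed.
    """
    if cpu_shares <= 0:
        raise ValueError("cpu_shares must be positive")
    return [
        "docker", "build",
        f"--cpu-shares={cpu_shares}",  # relative CPU weight, not a hard cap
        f"--memory={memory}",          # memory limit for the build containers
        "--tag", tag,
        context_dir,
    ]


if __name__ == "__main__":
    # Hypothetical tag and build context, purely for illustration.
    print(shlex.join(limited_build_command(".", "ci-image")))
```

Keeping the limits in one helper means a PR cannot drop them by editing its own Dockerfile, as long as the helper itself comes from master.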

On Tue, Oct 13, 2020 at 6:15 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Plus the "workflow_run" jobs (image building) for all PRs can also be done
> on the self-hosted workers. They are safe because they use the scripts from
> master. The only potentially dangerous part is that someone could do some
> "mining" in a malicious Docker image building step, since that is the only
> part of "workflow_run" that comes from the PR. But this would be isolated
> within the docker build process, which I believe has rather limited
> resources, or we can additionally limit it to a single processor and
> limited memory.
>
> J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Jarek Potiuk <Ja...@polidea.com>.
Plus the "workflow_run" jobs (image building) for all PRs can also be done
on the self-hosted workers. They are safe because they use the scripts from
master. The only potentially dangerous part is that someone could do some
"mining" in a malicious Docker image building step, since that is the only
part of "workflow_run" that comes from the PR. But this would be isolated
within the docker build process, which I believe has rather limited
resources, or we can additionally limit it to a single processor and
limited memory.

J.


On Tue, Oct 13, 2020 at 6:12 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> I think this part is easy:
>
> * First of all -  It is similar to GA - someone could have used all the
> 180 workers of Apache by submitting PRs to various projects. So we just
> need a limited worker queue. All those can run as workers in GKE and it
> should be easy to manage (we could have auto-scaling GKE cluster with upper
> limit)
> * Secondly - we can - likely - continue using the GA public workers for
> all incoming PRs and only use the self-hosted ones for master pushes. Or we
> could also use them for PRs coming from maintainers.
>
> J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Daniel Imberman <da...@gmail.com>.
re: security concerns, this is a case where we could require committer
approval before running full tests (though that leaves the risk that a PR
is approved for testing and then the user adds something concerning after).
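
One way to narrow that remaining risk, sketched here under stated assumptions: record the approval against the exact head commit instead of against the PR as a whole, so any later push needs a fresh approval. The class and method names are hypothetical, and this is not an existing GitHub feature or Airflow code, just an illustration of the idea.

```python
class ApprovalRegistry:
    """Remembers which exact commit of a PR a committer approved for CI."""

    def __init__(self):
        self._approved = {}  # PR number -> approved head commit SHA

    def approve(self, pr_number, head_sha):
        """Record that a committer approved this specific commit."""
        self._approved[pr_number] = head_sha

    def may_run_full_tests(self, pr_number, head_sha):
        # A push after approval changes head_sha, so this returns False
        # until a committer approves the new commit as well.
        return self._approved.get(pr_number) == head_sha


if __name__ == "__main__":
    registry = ApprovalRegistry()
    registry.approve(1234, "abc123")  # hypothetical PR number and SHA
    print(registry.may_run_full_tests(1234, "abc123"))
    print(registry.may_run_full_tests(1234, "def456"))
```

The trade-off is more work for committers, since every new push to an approved PR has to be looked at again before the full tests run.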

On Tue, Oct 13, 2020 at 9:12 AM, Jarek Potiuk <Ja...@polidea.com> wrote:

> I think this part is easy:
>
> * First of all - It is similar to GA - someone could have used all the
> 180 workers of Apache by submitting PRs to various projects. So we just
> need a limited worker queue. All those can run as workers in GKE and it
> should be easy to manage (we could have auto-scaling GKE cluster with upper
> limit)
> * Secondly - we can - likely - continue using the GA public workers for
> all incoming PRs and only use the self-hosted ones for master pushes. Or we
> could also use them for PRs coming from maintainers.
>
> J.






Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Jarek Potiuk <Ja...@polidea.com>.
I think this part is easy:

* First of all -  It is similar to GA - someone could have used all the 180
workers of Apache by submitting PRs to various projects. So we just need a
limited worker queue. All those can run as workers in GKE and it should be
easy to manage (we could have auto-scaling GKE cluster with upper limit)
* Secondly - we can - likely - continue using the GA public workers for all
incoming PRs and only use the self-hosted ones for master pushes. Or we
could also use them for PRs coming from maintainers.
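
The two points above can be sketched as a small routing rule. This is only an illustration under stated assumptions: the Event fields, the pick_runner name and the cap of 20 concurrent jobs are all hypothetical, and falling back to the public workers when the cap is reached (instead of waiting in a queue) is one possible choice among several. "ubuntu-latest" and "self-hosted" are the usual GitHub Actions runner labels.

```python
from dataclasses import dataclass

GITHUB_HOSTED = "ubuntu-latest"  # public GitHub Actions runner label
SELF_HOSTED = "self-hosted"      # label for the sponsored runners

# Hypothetical upper limit on concurrent self-hosted jobs
# (the "limited worker queue" / auto-scaling upper limit).
MAX_SELF_HOSTED_JOBS = 20


@dataclass
class Event:
    kind: str                   # "push" or "pull_request"
    branch: str                 # target branch
    author_is_maintainer: bool  # True for PRs opened by maintainers


def pick_runner(event, self_hosted_jobs_running):
    """Send trusted work to self-hosted runners, everything else to GA."""
    trusted = (
        (event.kind == "push" and event.branch == "master")
        or (event.kind == "pull_request" and event.author_is_maintainer)
    )
    if trusted and self_hosted_jobs_running < MAX_SELF_HOSTED_JOBS:
        return SELF_HOSTED
    # Untrusted PRs, and trusted work beyond the cap, use public workers.
    return GITHUB_HOSTED


if __name__ == "__main__":
    print(pick_runner(Event("push", "master", True), 0))
    print(pick_runner(Event("pull_request", "master", False), 0))
```

With this rule, code from forks never reaches the self-hosted machines, which sidesteps most of the concern from the GitHub documentation quoted below.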

J.



On Tue, Oct 13, 2020 at 5:52 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> And a magic security sandbox :D
>
> On Oct 13 2020, at 4:51 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> Yep. Now we just need credits :)
>
> On Tue, Oct 13, 2020 at 5:30 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> That's ace, we should go ahead with self-hosted runners then.
>
> On Tue, Oct 13, 2020 at 4:06 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> Confirmed, we *can* do it - Arrow has done it already
> https://issues.apache.org/jira/browse/INFRA-19875
>
> But let's have a think on how to not be a bot net :)
>
> On Oct 13 2020, at 3:59 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>
> I've spoken to a few members of ASF Infra directly, and they are just
> confirming but they are okay with the idea of us adding self hosted runners
> to our repo, and also okay that we can manage those nodes ourselves. Should
> get final confirmation today.
>
> I wanted to double check that we could use the credits before we get
> anyone to stump up the VMs/credits etc.
>
> -ash
>
> On Oct 13 2020, at 2:16 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> This is also a slight problem as mentioned in the build@ thread:
> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E -
> managing hosting runners has to be done through infrastructure and they are
> not really responsive recently (I have tickets waiting for weeks now).
>
> But as I've learned recently that we can manage our own secrets via API
> without INFRA (and completely legitimately according to GitHub
> documentation), maybe hosted runners will be also possible to self-manage :D
>
> J.
>
> On Tue, Oct 13, 2020 at 2:22 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> I've thought about private/self-hosted runners, and I think long term
> that's the way to go to alleviate our CI bottlenecks.
>
> There's a bit of work we need to do around security of builds - as
> mentioned here
> https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
>
> > We recommend that you do not use self-hosted runners with public
> repositories.
> >
> > Forks of your public repository can potentially run dangerous code on
> your self-hosted runner machine by creating a pull request that executes
> the code in a workflow.
> >
> > This is not an issue with GitHub-hosted runners because each
> GitHub-hosted runner is always a clean isolated virtual machine, and it is
> destroyed at the end of the job execution.
>
> So we'd need to do something similar.
>
> All for this and happy to help out once 2.0 is out (or at least once it
> starts to quieten down)
>
> -ash
>
> On Oct 13 2020, at 1:12 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> Hello Aizhamal, Everyone,
>
> We've had some problems recently with concurrency for GitHub Actions, and
> the suggested solution for now is to use self-hosted runners (this is
> suggested by GitHub Support).
>
> I made some comments in the issue here:
>
> https://github.com/apache/airflow/issues/11496
>
> And also opened build@ discussion
> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E and
> opened an accompanying ticket in JIRA:
> https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20978
>
> Regardless of those discussions, it would be great if we came back to
> the idea of Google donating some credits to Apache Airflow to set up their
> own runners.
>
> We did not use them last time because GitLab did not manage to implement
> the needed fork support (they still have not implemented it after more
> than 1.5 years!), but with GitHub I am quite certain we can switch and start
> using such runners pretty much immediately if we had some credits.
>
> Or maybe some other companies could donate some credits to us ?
>
> J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Ash Berlin-Taylor <as...@apache.org>.
And a magic security sandbox :D

On Oct 13 2020, at 4:51 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
> Yep. Now we just need credits :)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > >
> > > > > > > Jarek Potiuk
> > > > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > M: +48 660 796 129 (tel:+48660796129)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Jarek Potiuk
> > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > > >
> > > > > M: +48 660 796 129 (tel:+48660796129)
> > > > >
> > > >
> > >
> >
>
> --
>
> Jarek Potiuk
> Polidea (https://www.polidea.com/) | Principal Software Engineer
>
> M: +48 660 796 129 (tel:+48660796129)
>


Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Jarek Potiuk <Ja...@polidea.com>.
Yep. Now we just need credits :)

On Tue, Oct 13, 2020 at 5:30 PM Kaxil Naik <ka...@gmail.com> wrote:

> That's ace, we should go ahead with self-hosted runners then.
>
> On Tue, Oct 13, 2020 at 4:06 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> Confirmed, we *can* do it - Arrow has done it already
>> https://issues.apache.org/jira/browse/INFRA-19875
>>
>> But let's have a think on how to not be a bot net :)
>>
>> On Oct 13 2020, at 3:59 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>> I've spoken to a few members of ASF Infra directly, and they are just
>> confirming, but they are okay with the idea of us adding self-hosted runners
>> to our repo, and also okay that we can manage those nodes ourselves. Should
>> get final confirmation today.
>>
>> I wanted to double check that we could use the credits before we get
>> anyone to stump up the VMs/credits etc.
>>
>> -ash
>>
>> On Oct 13 2020, at 2:16 pm, Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>>
>> This is also a slight problem, as mentioned in the build@ thread:
>> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E -
>> managing self-hosted runners has to be done through Infrastructure, and they
>> have not been very responsive recently (I have had tickets waiting for
>> weeks now).
>>
>> But as I've recently learned that we can manage our own secrets via the API
>> without INFRA (and completely legitimately according to the GitHub
>> documentation), maybe self-hosted runners will also be possible to
>> self-manage :D
>>
>> J.
>>
>> On Tue, Oct 13, 2020 at 2:22 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>> I've thought about private/self-hosted runners, and I think long term
>> that's the way to go to alleviate our CI bottlenecks.
>>
>> There's a bit of work we need to do around security of builds - as
>> mentioned here
>> https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
>>
>> > We recommend that you do not use self-hosted runners with public
>> > repositories.
>> >
>> > Forks of your public repository can potentially run dangerous code on
>> > your self-hosted runner machine by creating a pull request that executes
>> > the code in a workflow.
>> >
>> > This is not an issue with GitHub-hosted runners because each
>> > GitHub-hosted runner is always a clean isolated virtual machine, and it is
>> > destroyed at the end of the job execution.
>>
>> So we'd need to do something similar.
>>
>> All for this and happy to help out once 2.0 is out (or at least once it
>> starts to quieten down)
>>
>> -ash
>>
>> On Oct 13 2020, at 1:12 pm, Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>>
>> Hello Aizhamal, Everyone,
>>
>> We've had some problems recently with concurrency for GitHub Actions, and
>> the suggested solution for now is to use self-hosted runners (this is
>> suggested by GitHub Support).
>>
>> I made some comments in the issue here:
>>
>> https://github.com/apache/airflow/issues/11496
>>
>> And also opened build@ discussion
>> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E and
>> opened an accompanying ticket in JIRA:
>> https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20978
>>
>> Regardless of those discussions, it would be great if we came back to
>> the idea of Google donating some credits to Apache Airflow to set up its
>> own runners.
>>
>> We did not use them last time, when GitLab did not manage to implement
>> the needed fork support (they still have not implemented it after more
>> than 1.5 years!), but with GitHub I am quite certain we could switch and
>> start using such runners pretty much immediately if we had some credits.
>>
>> Or maybe some other companies could donate some credits to us?
>>
>> J.
>>
>>
>>
>>
>> --
>>
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>
>> M: +48 660 796 129 <+48660796129>
>> [image: Polidea] <https://www.polidea.com/>
>>
>>
>>
>> --
>>
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>
>> M: +48 660 796 129 <+48660796129>
>> [image: Polidea] <https://www.polidea.com/>
>>
>>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Kaxil Naik <ka...@gmail.com>.
That's ace, we should go ahead with self-hosted runners then.

On Tue, Oct 13, 2020 at 4:06 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> Confirmed, we *can* do it - Arrow has done it already
> https://issues.apache.org/jira/browse/INFRA-19875
>
> But let's have a think on how to not be a bot net :)
>
> On Oct 13 2020, at 3:59 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
>
> I've spoken to a few members of ASF Infra directly, and they are just
> confirming, but they are okay with the idea of us adding self-hosted runners
> to our repo, and also okay that we can manage those nodes ourselves. Should
> get final confirmation today.
>
> I wanted to double check that we could use the credits before we get
> anyone to stump up the VMs/credits etc.
>
> -ash
>
> On Oct 13 2020, at 2:16 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> This is also a slight problem, as mentioned in the build@ thread:
> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E -
> managing self-hosted runners has to be done through Infrastructure, and they
> have not been very responsive recently (I have had tickets waiting for
> weeks now).
>
> But as I've recently learned that we can manage our own secrets via the API
> without INFRA (and completely legitimately according to the GitHub
> documentation), maybe self-hosted runners will also be possible to
> self-manage :D
>
> J.
>
> On Tue, Oct 13, 2020 at 2:22 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> I've thought about private/self-hosted runners, and I think long term
> that's the way to go to alleviate our CI bottlenecks.
>
> There's a bit of work we need to do around security of builds - as
> mentioned here
> https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
>
> > We recommend that you do not use self-hosted runners with public
> > repositories.
> >
> > Forks of your public repository can potentially run dangerous code on
> > your self-hosted runner machine by creating a pull request that executes
> > the code in a workflow.
> >
> > This is not an issue with GitHub-hosted runners because each
> > GitHub-hosted runner is always a clean isolated virtual machine, and it is
> > destroyed at the end of the job execution.
>
> So we'd need to do something similar.
>
> All for this and happy to help out once 2.0 is out (or at least once it
> starts to quieten down)
>
> -ash
>
> On Oct 13 2020, at 1:12 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> Hello Aizhamal, Everyone,
>
> We've had some problems recently with concurrency for GitHub Actions, and
> the suggested solution for now is to use self-hosted runners (this is
> suggested by GitHub Support).
>
> I made some comments in the issue here:
>
> https://github.com/apache/airflow/issues/11496
>
> And also opened build@ discussion
> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E and
> opened an accompanying ticket in JIRA:
> https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20978
>
> Regardless of those discussions, it would be great if we came back to
> the idea of Google donating some credits to Apache Airflow to set up its
> own runners.
>
> We did not use them last time, when GitLab did not manage to implement
> the needed fork support (they still have not implemented it after more
> than 1.5 years!), but with GitHub I am quite certain we could switch and
> start using such runners pretty much immediately if we had some credits.
>
> Or maybe some other companies could donate some credits to us?
>
> J.
>
>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>

Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Ash Berlin-Taylor <as...@apache.org>.
Confirmed, we can do it - Arrow has done it already https://issues.apache.org/jira/browse/INFRA-19875

But let's have a think on how to not be a bot net :)
On Oct 13 2020, at 3:59 pm, Ash Berlin-Taylor <as...@apache.org> wrote:
> I've spoken to a few members of ASF Infra directly, and they are just confirming, but they are okay with the idea of us adding self-hosted runners to our repo, and also okay that we can manage those nodes ourselves. Should get final confirmation today.
>
> I wanted to double check that we could use the credits before we get anyone to stump up the VMs/credits etc.
> -ash
> On Oct 13 2020, at 2:16 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
> > This is also a slight problem, as mentioned in the build@ thread: https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E - managing self-hosted runners has to be done through Infrastructure, and they have not been very responsive recently (I have had tickets waiting for weeks now).
> >
> > But as I've recently learned that we can manage our own secrets via the API without INFRA (and completely legitimately according to the GitHub documentation), maybe self-hosted runners will also be possible to self-manage :D
> >
> > J.
> > On Tue, Oct 13, 2020 at 2:22 PM Ash Berlin-Taylor <ash@apache.org (mailto:ash@apache.org)> wrote:
> > > I've thought about private/self-hosted runners, and I think long term that's the way to go to alleviate our CI bottlenecks.
> > >
> > > There's a bit of work we need to do around security of builds - as mentioned here https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
> > > > We recommend that you do not use self-hosted runners with public repositories.
> > > >
> > > > Forks of your public repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.
> > > >
> > > > This is not an issue with GitHub-hosted runners because each GitHub-hosted runner is always a clean isolated virtual machine, and it is destroyed at the end of the job execution.
> > >
> > > So we'd need to do something similar.
> > > All for this and happy to help out once 2.0 is out (or at least once it starts to quieten down)
> > > -ash
> > > On Oct 13 2020, at 1:12 pm, Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)> wrote:
> > > > Hello Aizhamal, Everyone,
> > > >
> > > > We've had some problems recently with concurrency for GitHub Actions, and the suggested solution for now is to use self-hosted runners (this is suggested by GitHub Support).
> > > >
> > > > I made some comments in the issue here:
> > > >
> > > > https://github.com/apache/airflow/issues/11496
> > > >
> > > > And also opened build@ discussion https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E and opened an accompanying ticket in JIRA: https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20978
> > > >
> > > > Regardless of those discussions, it would be great if we came back to the idea of Google donating some credits to Apache Airflow to set up its own runners.
> > > >
> > > > We did not use them last time, when GitLab did not manage to implement the needed fork support (they still have not implemented it after more than 1.5 years!), but with GitHub I am quite certain we could switch and start using such runners pretty much immediately if we had some credits.
> > > >
> > > > Or maybe some other companies could donate some credits to us?
> > > >
> > > > J.
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 (tel:+48660796129)
> > > >
> > >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea (https://www.polidea.com/) | Principal Software Engineer
> >
> > M: +48 660 796 129 (tel:+48660796129)
> >
>


Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Ash Berlin-Taylor <as...@apache.org>.
I've spoken to a few members of ASF Infra directly, and they are just confirming, but they are okay with the idea of us adding self-hosted runners to our repo, and also okay that we can manage those nodes ourselves. Should get final confirmation today.

I wanted to double check that we could use the credits before we get anyone to stump up the VMs/credits etc.
-ash
On Oct 13 2020, at 2:16 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
> This is also a slight problem, as mentioned in the build@ thread: https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E - managing self-hosted runners has to be done through Infrastructure, and they have not been very responsive recently (I have had tickets waiting for weeks now).
>
> But as I've recently learned that we can manage our own secrets via the API without INFRA (and completely legitimately according to the GitHub documentation), maybe self-hosted runners will also be possible to self-manage :D
>
> J.
> On Tue, Oct 13, 2020 at 2:22 PM Ash Berlin-Taylor <ash@apache.org (mailto:ash@apache.org)> wrote:
> > I've thought about private/self-hosted runners, and I think long term that's the way to go to alleviate our CI bottlenecks.
> >
> > There's a bit of work we need to do around security of builds - as mentioned here https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
> > > We recommend that you do not use self-hosted runners with public repositories.
> > >
> > > Forks of your public repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.
> > >
> > > This is not an issue with GitHub-hosted runners because each GitHub-hosted runner is always a clean isolated virtual machine, and it is destroyed at the end of the job execution.
> >
> > So we'd need to do something similar.
> > All for this and happy to help out once 2.0 is out (or at least once it starts to quieten down)
> > -ash
> > On Oct 13 2020, at 1:12 pm, Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)> wrote:
> > > Hello Aizhamal, Everyone,
> > >
> > > We've had some problems recently with concurrency for GitHub Actions, and the suggested solution for now is to use self-hosted runners (this is suggested by GitHub Support).
> > >
> > > I made some comments in the issue here:
> > >
> > > https://github.com/apache/airflow/issues/11496
> > >
> > > And also opened build@ discussion https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E and opened an accompanying ticket in JIRA: https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20978
> > >
> > > Regardless of those discussions, it would be great if we came back to the idea of Google donating some credits to Apache Airflow to set up its own runners.
> > >
> > > We did not use them last time, when GitLab did not manage to implement the needed fork support (they still have not implemented it after more than 1.5 years!), but with GitHub I am quite certain we could switch and start using such runners pretty much immediately if we had some credits.
> > >
> > > Or maybe some other companies could donate some credits to us?
> > >
> > > J.
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > >
> > > M: +48 660 796 129 (tel:+48660796129)
> > >
> >
>
> --
>
> Jarek Potiuk
> Polidea (https://www.polidea.com/) | Principal Software Engineer
>
> M: +48 660 796 129 (tel:+48660796129)
>


Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Jarek Potiuk <Ja...@polidea.com>.
This is also a slight problem, as mentioned in the build@ thread:
https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E
- managing self-hosted runners has to be done through Infrastructure, and
they have not been very responsive recently (I have had tickets waiting for
weeks now).

But as I've recently learned that we can manage our own secrets via the API
without INFRA (and completely legitimately according to the GitHub
documentation), maybe self-hosted runners will also be possible to
self-manage :D

J.

On Tue, Oct 13, 2020 at 2:22 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> I've thought about private/self-hosted runners, and I think long term
> that's the way to go to alleviate our CI bottlenecks.
>
> There's a bit of work we need to do around security of builds - as
> mentioned here
> https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
>
> > We recommend that you do not use self-hosted runners with public
> > repositories.
> >
> > Forks of your public repository can potentially run dangerous code on
> > your self-hosted runner machine by creating a pull request that executes
> > the code in a workflow.
> >
> > This is not an issue with GitHub-hosted runners because each
> > GitHub-hosted runner is always a clean isolated virtual machine, and it is
> > destroyed at the end of the job execution.
>
> So we'd need to do something similar.
>
> All for this and happy to help out once 2.0 is out (or at least once it
> starts to quieten down)
>
> -ash
>
> On Oct 13 2020, at 1:12 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> Hello Aizhamal, Everyone,
>
> We've had some problems recently with concurrency for GitHub Actions, and
> the suggested solution for now is to use self-hosted runners (this is
> suggested by GitHub Support).
>
> I made some comments in the issue here:
>
> https://github.com/apache/airflow/issues/11496
>
> And also opened build@ discussion
> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E and
> opened an accompanying ticket in JIRA:
> https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20978
>
> Regardless of those discussions, it would be great if we came back to
> the idea of Google donating some credits to Apache Airflow to set up its
> own runners.
>
> We did not use them last time, when GitLab did not manage to implement
> the needed fork support (they still have not implemented it after more
> than 1.5 years!), but with GitHub I am quite certain we could switch and
> start using such runners pretty much immediately if we had some credits.
>
> Or maybe some other companies could donate some credits to us?
>
> J.
>
>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Credits from Google (or other sponsors?) for self-hosted runners

Posted by Ash Berlin-Taylor <as...@apache.org>.
I've thought about private/self-hosted runners, and I think long term that's the way to go to alleviate our CI bottlenecks.

There's a bit of work we need to do around security of builds - as mentioned here https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories
> We recommend that you do not use self-hosted runners with public repositories.
>
> Forks of your public repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.
>
> This is not an issue with GitHub-hosted runners because each GitHub-hosted runner is always a clean isolated virtual machine, and it is destroyed at the end of the job execution.

So we'd need to do something similar.
All for this and happy to help out once 2.0 is out (or at least once it starts to quieten down)
-ash
On Oct 13 2020, at 1:12 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
> Hello Aizhamal, Everyone,
>
> We've had some problems recently with concurrency for GitHub Actions, and the suggested solution for now is to use self-hosted runners (this is suggested by GitHub Support).
>
> I made some comments in the issue here:
>
> https://github.com/apache/airflow/issues/11496
>
> And also opened build@ discussion https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E and opened an accompanying ticket in JIRA: https://issues.apache.org/jira/projects/INFRA/issues/INFRA-20978
>
> Regardless of those discussions, it would be great if we came back to the idea of Google donating some credits to Apache Airflow to set up its own runners.
>
> We did not use them last time, when GitLab did not manage to implement the needed fork support (they still have not implemented it after more than 1.5 years!), but with GitHub I am quite certain we could switch and start using such runners pretty much immediately if we had some credits.
>
> Or maybe some other companies could donate some credits to us?
>
> J.
>
> --
>
> Jarek Potiuk
> Polidea (https://www.polidea.com/) | Principal Software Engineer
>
> M: +48 660 796 129 (tel:+48660796129)
>