Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2019/10/07 22:00:54 UTC

[DRAFT] Apache Arrow Board Report - October 2019

Here is a start for our Q3 board report

## Description:
The mission of Apache Arrow is the creation and maintenance of software related
to columnar in-memory processing and data interchange

## Issues:
There are no issues requiring board attention at this time

## Membership Data:
* Apache Arrow was founded 2016-01-19 (4 years ago)
* There are currently 48 committers and 28 PMC members in this project.
* The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:
- Micah Kornfield was added to the PMC on 2019-08-21
- Sebastien Binet was added to the PMC on 2019-08-21
- Ben Kietzman was added as committer on 2019-09-07
- David Li was added as committer on 2019-08-30
- Kenta Murata was added as committer on 2019-09-05
- Neal Richardson was added as committer on 2019-09-05
- Praveen Kumar was added as committer on 2019-07-14

## Project Activity:

* The project has just made a 0.15.0 release.
* We are discussing ways to make the Arrow libraries as accessible as possible
  to downstream projects for minimal use cases while allowing the development
  of more comprehensive "standard libraries" with larger dependency stacks in
  the project
* We plan to make a 1.0.0 release as our next major release, at which time we
  will declare that the Arrow binary protocol is stable with forward and
  backward compatibility guarantees
* We are struggling with Continuous Integration scalability as the project has
  definitely outgrown what Travis CI and Appveyor can do for us. We are
  exploring alternative solutions such as Buildbot, Buildkite (see
  INFRA-19217), and GitHub Actions to provide a path to migrate away from
  Travis CI / Appveyor

## Community Health:

* The community is overall healthy, with the aforementioned concerns around CI
  scalability. New contributors frequently take notice of the long build queue
  times when submitting pull requests.

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Jacques Nadeau <ja...@apache.org>.
Hey there, I meant to remove the issues section at top and replace with the
one in the community health section but forgot to remove the top part. I
just submitted with the removed top part. Let me know if people want me to
further edit.

Thanks

On Thu, Oct 10, 2019 at 1:54 PM Antoine Pitrou <an...@python.org> wrote:

>
> It's good with me.
>
> Regards
>
> Antoine.
>
>
> On 10/10/2019 at 22:51, Jacques Nadeau wrote:
> > Antoine, is my synopsis fair?
> >
> > On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney <we...@gmail.com> wrote:
> >
> >> +1
> >>
> >> On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau <ja...@apache.org> wrote:
> >>
> >>> Proposed report update below. LMK your thoughts.

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Antoine Pitrou <an...@python.org>.
It's good with me.

Regards

Antoine.


On 10/10/2019 at 22:51, Jacques Nadeau wrote:
> Antoine, is my synopsis fair?
> 
> On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney <we...@gmail.com> wrote:
> 
>> +1
>>
>> On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau <ja...@apache.org> wrote:
>>
>>> Proposed report update below. LMK your thoughts.

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Jacques Nadeau <ja...@apache.org>.
Antoine, is my synopsis fair?

On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney <we...@gmail.com> wrote:

> +1
>
> On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau <ja...@apache.org> wrote:
>
> > Proposed report update below. LMK your thoughts.

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Wes McKinney <we...@gmail.com>.
+1

On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau <ja...@apache.org> wrote:

> Proposed report update below. LMK your thoughts.

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Jacques Nadeau <ja...@apache.org>.
Proposed report update below. LMK your thoughts.

## Description:
The mission of Apache Arrow is the creation and maintenance of software
related to columnar in-memory processing and data interchange

## Issues:

* We are struggling with Continuous Integration scalability as the project
  has definitely outgrown what Travis CI and Appveyor can do for us. Some
  contributors have shown reluctance to submit patches they aren't sure
  about because they don't want to pile on the build queue. We are exploring
  alternative solutions such as Buildbot, Buildkite, and GitHub Actions to
  provide a path to migrate away from Travis CI / Appveyor. In our request
  to Infrastructure (INFRA-19217), some of us were alarmed to find that a
  CI/CD service like Buildkite may not be able to be connected to the
  @apache GitHub account because it requires admin access to repository
  webhooks, even though it needs no ability to modify source code. There
  are workarounds (building custom OAuth bots) that could enable us to use
  Buildkite, but they would require extra development and result in a less
  refined experience for community members.



## Membership Data:
* Apache Arrow was founded 2016-01-19 (4 years ago)
* There are currently 48 committers and 28 PMC members in this project.
* The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:
- Micah Kornfield was added to the PMC on 2019-08-21
- Sebastien Binet was added to the PMC on 2019-08-21
- Ben Kietzman was added as committer on 2019-09-07
- David Li was added as committer on 2019-08-30
- Kenta Murata was added as committer on 2019-09-05
- Neal Richardson was added as committer on 2019-09-05
- Praveen Kumar was added as committer on 2019-07-14

## Project Activity:

* The project has just made a 0.15.0 release.
* We are discussing ways to make the Arrow libraries as accessible as
  possible to downstream projects for minimal use cases while allowing the
  development of more comprehensive "standard libraries" with larger
  dependency stacks in the project
* We plan to make a 1.0.0 release as our next major release, at which time
  we will declare that the Arrow binary protocol is stable with forward and
  backward compatibility guarantees

## Community Health:

* The community is continuing to grow at a great rate. We see good adoption
  among many other projects and fast growth of key metrics.
* Many contributors are struggling with the slowness of pre-commit CI. Arrow
  has a large number of different platforms and components and a complex
  build matrix. As new commits come in, their builds frequently take a long
  time to complete. The community is trying several ways to solve this.
  There is bubbling frustration in the community around the GitHub repo
  rules for using third-party services. This is especially challenging when
  there are free solutions to relieve the community pressure but the
  community is unable to access these resources. This frustration is
  greatest among people who work on many non-ASF OSS projects which don't
  have such restrictive rules around GitHub. Some examples of ways the
  community has tried to resolve these issues include:
  * Try to use CircleCI, rejected in INFRA-15964
  * Try to use Azure Pipelines, rejected in INFRA-17030
  * Try to resolve issues with Travis CI capacity: INFRA-18533 &
    https://s.apache.org/ci-capacity (no resolution beyond "find donations")
  * The creation of new infrastructure design (in progress but a huge
    amount of thankless work)
* While the community has seen great growth in contribution (more than 300
  unique contributors at this point), the vast majority are casual
  contributors. The number of daily active committers (the workhorses of the
  project that bear the load of committing the constant PRs, more than 5000
  closed at this point) has been growing more slowly than adoption. This is
  despite the fact that the community has been very aggressive at being
  inclusive of new committers (with likelihood to have more than 50 in the
  next week). The community is still continuing to try to brainstorm ways
  to improve this.

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Jacques Nadeau <ja...@apache.org>.
Arg... accidental send before ready.

What do you think about the statement below for community health? Does it
fairly capture the concerns/perspective?

On Thu, Oct 10, 2019 at 10:24 AM Jacques Nadeau <ja...@apache.org> wrote:

> Many contributors are struggling with the slowness of pre-commit CI. Arrow
> has a large number of different platforms and components and a complex
> build matrix. As new commits come in, they frequently take a long time to
> complete. The community is trying several ways to solve this. Some of those
> have been:
>
>    - Try to use CircleCI, rejected in INFRA-15964
>    <https://issues.apache.org/jira/browse/INFRA-15964>
>    - Try to use Azure Pipelines, rejected in INFRA-17030
>    - Try to resolve issues with Travis CI capacity: INFRA-18533
>    <https://issues.apache.org/jira/browse/INFRA-18533>,
>    https://s.apache.org/ci-capacity (no resolution beyond "find
>    donations")
>    - The creation of new infrastructure design (in progress but a huge
>    amount of thankless work)
>
>
> There is bubbling frustration in the community around the GitHub repo
> rules for using third party services. This is especially challenging when
> there are free solutions to relieve the community pressure but the
> community is unable to access these resources. This frustration is greatest
> among people who work on many OSS projects which don't have
> such restrictive rules around GitHub.
>
> On Thu, Oct 10, 2019 at 5:36 AM Wes McKinney <we...@gmail.com> wrote:
>
>> Here is a rejection of CircleCI more than 18 months ago
>>
>> https://issues.apache.org/jira/browse/INFRA-15964
>>
>> On Thu, Oct 10, 2019 at 4:33 AM Antoine Pitrou <an...@python.org> wrote:
>> >
>> >
>> > For the record, here is the ticket for Azure Pipelines integration:
>> > https://issues.apache.org/jira/browse/INFRA-17030
>> >
>> > I opened an issue back in May about the Travis-CI capacity situation:
>> > https://issues.apache.org/jira/browse/INFRA-18533
>> >
>> > Apparently CI capacity has been a "hot topic as of late":
>> >
>> > https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E
>> >
>> > (I didn't know this list -- builds@apache.org -- existed, by the way)
>> >
>> > Regards
>> >
>> > Antoine.
>> >
>> >
>> > On 10/10/2019 at 07:34, Wes McKinney wrote:
>> > > On Thu, Oct 10, 2019 at 12:22 AM Jacques Nadeau <ja...@apache.org> wrote:
>> > >>
>> > >> I'm not dismissing that there are issues, but I also don't feel like
>> > >> there has been constant discussion for months on the list that INFRA
>> > >> is not being responsive to Arrow community requests. It seems like
>> > >> you might be saying one of two things (or both?):
>> > >>
>> > >> 1) The Arrow infrastructure requirements are vastly different from
>> > >> those of other projects. Because of Arrow's specialized requirements,
>> > >> we need things that no other project needs.
>> > >> 2) There are many projects that want CircleCI, Buildkite and Azure
>> > >> pipelines but Infrastructure is not responsive. This is putting a big
>> > >> damper on the success of the Arrow project.
>> > >
>> > > Yes, I'm saying both of these things.
>> > >
>> > > 1. Yes, Arrow is special -- validating the project requires running a
>> > > dozen or more different builds (with dozens more nightly builds) that
>> > > test different parts of the project. Different language components, a
>> > > large and diverse packaging matrix, and interproject integration tests
>> > > and integration with external projects (e.g. Apache Spark and others).
>> > >
>> > > 2. Yes, the limited GitHub App availability is hurting us.
>> > >
>> > > I'm OK to place this concern in the "Community Health" section and
>> > > spend more time building a comprehensive case about how Infra's
>> > > conservatism around Apps is causing us to work with one hand tied
>> > > behind our back. I know that I'm not the only one who is unhappy, but
>> > > I'll let the others speak for themselves.
>> > >
>> > >> For each of these, if we're asking the board to do something, we
>> > >> should say more, and more clearly. Sure, CI is a pain in the Arrow
>> > >> project's a**. I also agree that community health is impacted by the
>> > >> challenge to merge things. I also share the perspective that the
>> > >> foundation has been slow to adopt new technologies and has been way
>> > >> too religious about svn. However, if we're asking the board to do
>> > >> something, what is it?
>> > >
>> > > Allow GitHub Apps that do not require write access to the code itself,
>> > > set up appropriate checks and balances to ensure that the Foundation's
>> > > IP provenance webhooks are preserved.
>> > >
>> > >> Looking at the two things you might be saying...
>> > >> If 1, are we confident in that? Many other projects have pretty
>> > >> complex build matrices I think. (I haven't thought about this and
>> > >> evaluated the other projects...maybe it is true.) If 1, we should
>> > >> clarify why we think we're different. If that is the case, what are
>> > >> we asking for from the board?
>> > >>
>> > >> If 2, and you are proposing throwing stones at INFRA, we should back
>> > >> it up with INFRA tickets and numbers (e.g. how many projects have
>> > >> wanted these things and for how long). We should reference multiple
>> > >> threads on the INFRA mailing list where we voiced certain concerns
>> > >> and many other people voiced similar concerns and INFRA turned a deaf
>> > >> ear or blind eye (maybe these exist, I haven't spent much time on the
>> > >> INFRA list lately). As it stands, the one ticket referenced in this
>> > >> thread is a ticket that has only one project asking for a new
>> > >> integration that has been open for less than a week. That may be
>> > >> annoying but it doesn't seem like something that has gotten to the
>> > >> level that we need to get the board's help.
>> > >>
>> > >> In a nutshell, I agree that this is impacting the health and growth
>> > >> of the project but think we should cover that in the community health
>> > >> section of the report. I'm less a fan of saying this is an issue the
>> > >> board needs to help us solve unless it has been a constant point of
>> > >> pain that we've attempted to elevate multiple times in infra forums
>> > >> and experienced unreasonable responses. The board is a blunt
>> > >> instrument and should only be used when we have depleted every other
>> > >> avenue for resolution.
>> > >>
>> > >
>> > > Yes, I'm happy to spend more time building a comprehensive case before
>> > > escalating it to the board level. However, Apache Arrow is a high
>> > > profile project and it is not a good look to have a PMC in a
>> > > fast-growing project growing disgruntled with the Foundation's
>> > > policies in this way. We've been struggling visibly for a long time
>> > > with our CI scalability, and I think we should have all the options on
>> > > the table to utilize GitHub-integrated tools to help us find a way out
>> > > of the mess that we are in.
>> > >
>> > >>
>> > >> On Wed, Oct 9, 2019 at 9:44 PM Wes McKinney <we...@gmail.com>
>> wrote:
>> > >>
>> > >>> hi Jacques,
>> > >>>
>> > >>> I think we need to share the concerns that many PMC members have
>> over
>> > >>> the constraints that INFRA is placing on us. Can we rephrase the
>> > >>> concern in a way that is more helpful?
>> > >>>
>> > >>> Firstly, I respect and appreciate the ASF's desire to limit write
>> > >>> access to committers only from an IP provenance perspective. I
>> > >>> understand that GitHub webhooks are used to log actions taken in
>> > >>> repositories to secure IP provenance. I do not think a third party
>> > >>> application should be given the ability to commit or modify a
>> > >>> repository -- all write operations on the .git repository should be
>> > >>> initiated by committers.
>> > >>>
>> > >>> However, GitHub is the main platform for producing open source
>> > >>> software, and tools are being created to help produce open source
>> more
>> > >>> efficiently. It is frustrating for us to not be able to take
>> advantage
>> > >>> of the tools that are available to everyone else on GitHub. I
>> brought
>> > >>> up the recent request about Buildkite as being representative of
>> this
>> > >>> (after learning that Google has been making a lot of use of it), but
>> > >>> we have previously been denied use of CircleCI and Azure Pipelines
>> > >>> since those services require even more permissions (AFAIK) than in
>> the
>> > >>> case of Buildkite. From our use in
>> > >>> https://github.com/ursa-labs/crossbow CircleCI and Azure seem to
>> be a
>> > >>> lot better than Travis CI and Appveyor
>> > >>>
>> > >>> I think the ASF is going to face an existential crisis in the near
>> > >>> future whether it wants to live in 2020 or 2000. It feels like
>> GitHub
>> > >>> is treated somewhat as ersatz SVN "because people want to use git +
>> > >>> GitHub instead of SVN"
>> > >>>
>> > >>> In the same way that the cloud revolutionized software startups,
>> > >>> enabling small groups of developers to build large SaaS
>> applications,
>> > >>> the same kind of leverage is becoming available to open source
>> > >>> developers to set up infrastructure to automate and scale open
>> source
>> > >>> projects. I think projects considering joining the Foundation are
>> > >>> going to look at these issues around App usage and decide that they
>> > >>> would rather be in control of their own infrastructure.
>> > >>>
>> > >>> I can set aside even more time and money from my non-profit
>> > >>> organization's modest budget to do CI work for Apache Arrow. The
>> > >>> amount that we have invested already is very large, and continues to
>> > >>> grow. I'm raising these issues because as Member of the Foundation
>> I'm
>> > >>> concerned that fast-growing projects like ours are not being
>> > >>> adequately served by INFRA, and we probably aren't the only project
>> > >>> that will face these issues. All that is needed is for INFRA to let
>> us
>> > >>> use third party GitHub Apps and monitor any potentially destructive
>> > >>> actions that they may take, such as modifying unrelated repository
>> > >>> webhooks related to IP provenance.
>> > >>>
>> > >>> - Wes
>> > >>>
>> > >>> On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau <ja...@apache.org>
>> wrote:
>> > >>>>
>> > >>>> I think we need to be more direct in listing issues for the board.
>> > >>>>
>> > >>>> What have we done? What do we want them to do?
>> > >>>>
>> > >>>> In general, any large org is going to be slow to add new deep
>> > >>> integrations
>> > >>>> into GitHub. I don't think we should expect Apache to be any
>> different
>> > >>> (it
>> > >>>> took several years before we could merge things through github for
>> > >>>> example). If I were on the INFRA side, I think I would look and
>> see how
>> > >>>> many different people are asking for BuildKite before considering
>> > >>>> integration. It seems like we only opened the JIRA 6 days ago and
>> no
>> > >>> other
>> > >>>> projects have requested access to this?
>> > >>>>
>> > >>>> I'm not clear why this is a board issue. What do we think the
>> board can
>> > >>> do
>> > >>>> for us that we can't solve ourselves and need them to solve?
>> Remember, a
>> > >>>> board solution to a problem is typically very removed from what
>> matters
>> > >>> to
>> > >>>> individuals on a project.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <we...@gmail.com>
>> wrote:
>> > >>>>
>> > >>>>> New draft
>> > >>>>>
>> > >>>>> ## Description:
>> > >>>>> The mission of Apache Arrow is the creation and maintenance of
>> software
>> > >>>>> related
>> > >>>>> to columnar in-memory processing and data interchange
>> > >>>>>
>> > >>>>> ## Issues:
>> > >>>>>
>> > >>>>> * We are struggling with Continuous Integration scalability as the
>> > >>> project
>> > >>>>> has
>> > >>>>>   definitely outgrown what Travis CI and Appveyor can do for us.
>> Some
>> > >>>>>   contributors have shown reluctance to submit patches they
>> aren't sure
>> > >>>>> about
>> > >>>>>   because they don't want to pile on the build queue. We are
>> exploring
>> > >>>>>   alternative solutions such as Buildbot, Buildkite, and GitHub
>> > >>> Actions to
>> > >>>>>   provide a path to migrate away from Travis CI / Appveyor. In our
>> > >>> request
>> > >>>>> to
>> > >>>>>   Infrastructure INFRA-19217, some of us were alarmed to find
>> that a
>> > >>> CI/CD
>> > >>>>>   service like Buildkite may not be able to be connected to the
>> @apache
>> > >>>>> GitHub
>> > >>>>>   account on account of requiring admin access to repository
>> webhooks,
>> > >>> but
>> > >>>>> no
>> > >>>>>   ability to modify source code. There are workarounds (building
>> custom
>> > >>>>> OAuth
>> > >>>>>   bots) that could enable us to use Buildkite, but it would
>> require
>> > >>> extra
>> > >>>>>   development and result in a less refined experience for
>> community
>> > >>>>> members.
>> > >>>>>
>> > >>>>> ## Membership Data:
>> > >>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
>> > >>>>> * There are currently 48 committers and 28 PMC members in this
>> project.
>> > >>>>> * The Committer-to-PMC ratio is roughly 3:2.
>> > >>>>>
>> > >>>>> Community changes, past quarter:
>> > >>>>> - Micah Kornfield was added to the PMC on 2019-08-21
>> > >>>>> - Sebastien Binet was added to the PMC on 2019-08-21
>> > >>>>> - Ben Kietzman was added as committer on 2019-09-07
>> > >>>>> - David Li was added as committer on 2019-08-30
>> > >>>>> - Kenta Murata was added as committer on 2019-09-05
>> > >>>>> - Neal Richardson was added as committer on 2019-09-05
>> > >>>>> - Praveen Kumar was added as committer on 2019-07-14
>> > >>>>>
>> > >>>>> ## Project Activity:
>> > >>>>>
>> > >>>>> * The project has just made a 0.15.0 release.
>> > >>>>> * We are discussing ways to make the Arrow libraries as
>> accessible as
>> > >>>>> possible
>> > >>>>>   to downstream projects for minimal use cases while allowing the
>> > >>>>> development
>> > >>>>>   of more comprehensive "standard libraries" with larger
>> dependency
>> > >>> stacks
>> > >>>>> in
>> > >>>>>   the project
>> > >>>>> * We plan to make a 1.0.0 release as our next major release, at
>> which
>> > >>> time
>> > >>>>> we
>> > >>>>>   will declare that the Arrow binary protocol is stable with
>> forward
>> > >>> and
>> > >>>>>   backward compatibility guarantees
>> > >>>>>
>> > >>>>> ## Community Health:
>> > >>>>>
>> > >>>>> * The community is overall healthy, with the aforementioned
>> concerns
>> > >>>>> around CI
>> > >>>>>   scalability. New contributors frequently take notice of the long
>> > >>> build
>> > >>>>> queue
>> > >>>>>   times when submitting pull requests.
>> > >>>>>
>> > >>>>> On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com>
>> > >>> wrote:
>> > >>>>>>
>> > >>>>>> Yes, I agree with raising the issue to the board.
>> > >>>>>>
>> > >>>>>> On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <
>> antoine@python.org>
>> > >>>>> wrote:
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> I agree.  Especially given that the constraints imposed by Infra
>> > >>> don't
>> > >>>>>>> help solving the problem.
>> > >>>>>>>
>> > >>>>>>> Regards
>> > >>>>>>>
>> > >>>>>>> Antoine.
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> Le 08/10/2019 à 15:02, Uwe L. Korn a écrit :
>> > >>>>>>>> I'm not sure what qualifies for "board attention" but it seems
>> > >>> that
>> > >>>>> CI is a critical problem in Apache projects, not just Arrow.
>> Should we
>> > >>>>> raise that?
>> > >>>>>>>>
>> > >>>>>>>> Uwe
>> > >>>>>>>>
>> > >>>>>>>> On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
>> > >>>>>>>>> Here is a start for our Q3 board report
>> > >>>>>>>>>
>> > >>>>>>>>> ## Description:
>> > >>>>>>>>> The mission of Apache Arrow is the creation and maintenance of
>> > >>>>> software related
>> > >>>>>>>>> to columnar in-memory processing and data interchange
>> > >>>>>>>>>
>> > >>>>>>>>> ## Issues:
>> > >>>>>>>>> There are no issues requiring board attention at this time
>> > >>>>>>>>>
>> > >>>>>>>>> ## Membership Data:
>> > >>>>>>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
>> > >>>>>>>>> * There are currently 48 committers and 28 PMC members in this
>> > >>>>> project.
>> > >>>>>>>>> * The Committer-to-PMC ratio is roughly 3:2.
>> > >>>>>>>>>
>> > >>>>>>>>> Community changes, past quarter:
>> > >>>>>>>>> - Micah Kornfield was added to the PMC on 2019-08-21
>> > >>>>>>>>> - Sebastien Binet was added to the PMC on 2019-08-21
>> > >>>>>>>>> - Ben Kietzman was added as committer on 2019-09-07
>> > >>>>>>>>> - David Li was added as committer on 2019-08-30
>> > >>>>>>>>> - Kenta Murata was added as committer on 2019-09-05
>> > >>>>>>>>> - Neal Richardson was added as committer on 2019-09-05
>> > >>>>>>>>> - Praveen Kumar was added as committer on 2019-07-14
>> > >>>>>>>>>
>> > >>>>>>>>> ## Project Activity:
>> > >>>>>>>>>
>> > >>>>>>>>> * The project has just made a 0.15.0 release.
>> > >>>>>>>>> * We are discussing ways to make the Arrow libraries as
>> > >>> accessible
>> > >>>>> as possible
>> > >>>>>>>>>   to downstream projects for minimal use cases while allowing
>> > >>> the
>> > >>>>> development
>> > >>>>>>>>>   of more comprehensive "standard libraries" with larger
>> > >>> dependency
>> > >>>>> stacks in
>> > >>>>>>>>>   the project
>> > >>>>>>>>> * We plan to make a 1.0.0 release as our next major release,
>> at
>> > >>>>> which time we
>> > >>>>>>>>>   will declare that the Arrow binary protocol is stable with
>> > >>>>> forward and
>> > >>>>>>>>>   backward compatibility guarantees
>> > >>>>>>>>> * We are struggling with Continuous Integration scalability as
>> > >>> the
>> > >>>>> project has
>> > >>>>>>>>>   definitely outgrown what Travis CI and Appveyor can do for
>> > >>> us. We
>> > >>>>> are
>> > >>>>>>>>>   exploring alternative solutions such as Buildbot, Buildkite
>> > >>> (see
>> > >>>>>>>>>   INFRA-19217), and GitHub Actions to provide a path to
>> migrate
>> > >>>>> away from
>> > >>>>>>>>>   Travis CI / Appveyor
>> > >>>>>>>>>
>> > >>>>>>>>> ## Community Health:
>> > >>>>>>>>>
>> > >>>>>>>>> * The community is overall healthy, with the aforementioned
>> > >>>>> concerns around CI
>> > >>>>>>>>>   scalability. New contributors frequently take notice of the
>> > >>> long
>> > >>>>> build queue
>> > >>>>>>>>>   times when submitting pull requests.
>> > >>>>>>>>>
>> > >>>>>
>> > >>>
>>
>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Jacques Nadeau <ja...@apache.org>.
Many contributors are struggling with the slowness of pre-commit CI. Arrow
has a large number of different platforms and components and a complex
build matrix. As new commits come in, their CI builds frequently take a long
time to complete. The community has tried several ways to solve this. Some of
those have been:

   - Try to use CircleCI, rejected in INFRA-15964
   <https://issues.apache.org/jira/browse/INFRA-15964>
   - Try to use Azure Pipelines, rejected in INFRA-17030
   - Try to resolve issues with Travis CI capacity: INFRA-18533
   <https://issues.apache.org/jira/browse/INFRA-18533>,
   https://s.apache.org/ci-capacity (no resolution beyond "find donations")
   - The creation of a new infrastructure design (in progress, but a huge
   amount of thankless work)


There is bubbling frustration in the community around the GitHub repo rules
for using third party services. This is especially challenging when there
are free solutions to relieve the community pressure but the community is
unable to access these resources. This frustration is greatest among people
who also work on many other OSS projects that don't have such restrictive
rules around GitHub.

On Thu, Oct 10, 2019 at 5:36 AM Wes McKinney <we...@gmail.com> wrote:

> Here is a rejection of CircleCI more than 18 months ago
>
> https://issues.apache.org/jira/browse/INFRA-15964
>
> On Thu, Oct 10, 2019 at 4:33 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > For the record, here is the ticket for Azure Pipelines integration:
> > https://issues.apache.org/jira/browse/INFRA-17030
> >
> > I opened an issue back in May about the Travis-CI capacity situation:
> > https://issues.apache.org/jira/browse/INFRA-18533
> >
> > Apparently CI capacity has been a "hot topic as of late":
> >
> https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E
> >
> > (I didn't know this list -- builds@apache.org -- existed, by the way)
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 10/10/2019 à 07:34, Wes McKinney a écrit :
> > > On Thu, Oct 10, 2019 at 12:22 AM Jacques Nadeau <ja...@apache.org>
> wrote:
> > >>
> > >> I'm not dismissing that there are issues but I also don't feel like
> there
> > >> has been constant discussion for months on the list that INFRA is not
> being
> > >> responsive to Arrow community requests. It seems like you might be
> > >> saying one of two things (or both?):
> > >>
> > >> 1) The Arrow infrastructure requirements are vastly different than
> other
> > >> projects. Because of Arrow's specialized requirements, we need things
> that
> > >> no other project needs.
> > >> 2) There are many projects that want CircleCI, Buildkite and Azure
> > >> pipelines but Infrastructure is not responsive. This is putting a big
> > >> damper on the success of the Arrow project.
> > >
> > > Yes, I'm saying both of these things.
> > >
> > > 1. Yes, Arrow is special -- validating the project requires running a
> > > dozen or more different builds (with dozens more nightly builds) that
> > > test different parts of the project. Different language components, a
> > > large and diverse packaging matrix, and interproject integration tests
> > > and integration with external projects (e.g. Apache Spark and others).
> > >
> > > 2. Yes, the limited GitHub App availability is hurting us.
> > >
> > > I'm OK to place this concern in the "Community Health" section and
> > > spend more time building a comprehensive case about how Infra's
> > > conservatism around Apps is causing us to work with one hand tied
> > > behind our back. I know that I'm not the only one who is unhappy, but
> > > I'll let the others speak for themselves.
> > >
> > >> For each of these, if we're asking the board to do something, we
> should say
> > >> more and more clearly. Sure, CI is a pain in the Arrow project's a**.
> I
> > >> also agree that community health is impacted by the challenge to merge
> > >> things. I also share the perspective that the foundation has been
> slow to
> > >> adopt new technologies and has been way too religious about svn.
> However, If
> > >> we're asking the board to do something, what is it?
> > >
> > > Allow GitHub Apps that do not require write access to the code itself,
> > > set up appropriate checks and balances to ensure that the Foundation's
> > > IP provenance webhooks are preserved.
> > >
> > >> Looking at the two things you might be saying...
> > >> If 1, are we confident in that? Many other projects have pretty
> complex
> > >> build matrices I think. (I haven't thought about this and evaluated
> the
> > >> other projects...maybe it is true.) If 1, we should clarify why we
> think
> > >> we're different. If that is the case, what are we asking for from the
> board?
> > >>
> > >> If 2, and you are proposing throwing stones at INFRA, we should back
> it up
> > >> with INFRA tickets and numbers (e.g. how many projects have wanted
> these
> > >> things and for how long). We should reference multiple threads on the
> INFRA
> > >> mailing list where we voiced certain concerns and many other people
> voiced
> > >> similar concerns and INFRA turned a deaf ear or blind eye (maybe these
> > >> exist, I haven't spent much time on the INFRA list lately). As it
> stands,
> > >> the one ticket referenced in this thread is a ticket that has only one
> > >> project asking for a new integration that has been open for less than
> a
> > >> week. That may be annoying but it doesn't seem like something that has
> > >> gotten to the level that we need to get the board's help.
> > >>
> > >> In a nutshell, I agree that this is impacting the health and growth
> of the
> > >> project but think we should cover that in the community health
> section of
> > >> the report. I'm less a fan of saying this is an issue the board needs
> to
> > >> help us solve unless it has been a constant point of pain that we've
> > >> attempted to elevate multiple times in infra forums and experienced
> > >> unreasonable responses. The board is a blunt instrument and should
> only be
> > >> used when we have depleted every other avenue for resolution.
> > >>
> > >
> > > Yes, I'm happy to spend more time building a comprehensive case before
> > > escalating it to the board level. However, Apache Arrow is a high
> > > profile project and it is not a good look to have a PMC in a
> > > fast-growing project growing disgruntled with the Foundation's
> > > policies in this way. We've been struggling visibly for a long time
> > > with our CI scalability, and I think we should have all the options on
> > > the table to utilize GitHub-integrated tools to help us find a way out
> > > of the mess that we are in.
> > >
> > >>
> > >> On Wed, Oct 9, 2019 at 9:44 PM Wes McKinney <we...@gmail.com>
> wrote:
> > >>
> > >>> hi Jacques,
> > >>>
> > >>> I think we need to share the concerns that many PMC members have over
> > >>> the constraints that INFRA is placing on us. Can we rephrase the
> > >>> concern in a way that is more helpful?
> > >>>
> > >>> Firstly, I respect and appreciate the ASF's desire to limit write
> > >>> access to committers only from an IP provenance perspective. I
> > >>> understand that GitHub webhooks are used to log actions taken in
> > >>> repositories to secure IP provenance. I do not think a third party
> > >>> application should be given the ability to commit or modify a
> > >>> repository -- all write operations on the .git repository should be
> > >>> initiated by committers.
> > >>>
> > >>> However, GitHub is the main platform for producing open source
> > >>> software, and tools are being created to help produce open source
> more
> > >>> efficiently. It is frustrating for us to not be able to take
> advantage
> > >>> of the tools that are available to everyone else on GitHub. I brought
> > >>> up the recent request about Buildkite as being representative of this
> > >>> (after learning that Google has been making a lot of use of it), but
> > >>> we have previously been denied use of CircleCI and Azure Pipelines
> > >>> since those services require even more permissions (AFAIK) than in
> the
> > >>> case of Buildkite. From our use in
> > >>> https://github.com/ursa-labs/crossbow CircleCI and Azure seem to be
> a
> > >>> lot better than Travis CI and Appveyor
> > >>>
> > >>> I think the ASF is going to face an existential crisis in the near
> > >>> future whether it wants to live in 2020 or 2000. It feels like GitHub
> > >>> is treated somewhat as ersatz SVN "because people want to use git +
> > >>> GitHub instead of SVN"
> > >>>
> > >>> In the same way that the cloud revolutionized software startups,
> > >>> enabling small groups of developers to build large SaaS applications,
> > >>> the same kind of leverage is becoming available to open source
> > >>> developers to set up infrastructure to automate and scale open source
> > >>> projects. I think projects considering joining the Foundation are
> > >>> going to look at these issues around App usage and decide that they
> > >>> would rather be in control of their own infrastructure.
> > >>>
> > >>> I can set aside even more time and money from my non-profit
> > >>> organization's modest budget to do CI work for Apache Arrow. The
> > >>> amount that we have invested already is very large, and continues to
> > >>> grow. I'm raising these issues because as Member of the Foundation
> I'm
> > >>> concerned that fast-growing projects like ours are not being
> > >>> adequately served by INFRA, and we probably aren't the only project
> > >>> that will face these issues. All that is needed is for INFRA to let
> us
> > >>> use third party GitHub Apps and monitor any potentially destructive
> > >>> actions that they may take, such as modifying unrelated repository
> > >>> webhooks related to IP provenance.
> > >>>
> > >>> - Wes
> > >>>
> > >>> On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau <ja...@apache.org>
> wrote:
> > >>>>
> > >>>> I think we need to be more direct in listing issues for the board.
> > >>>>
> > >>>> What have we done? What do we want them to do?
> > >>>>
> > >>>> In general, any large org is going to be slow to add new deep
> > >>> integrations
> > >>>> into GitHub. I don't think we should expect Apache to be any
> different
> > >>> (it
> > >>>> took several years before we could merge things through github for
> > >>>> example). If I were on the INFRA side, I think I would look and see
> how
> > >>>> many different people are asking for BuildKite before considering
> > >>>> integration. It seems like we only opened the JIRA 6 days ago and no
> > >>> other
> > >>>> projects have requested access to this?
> > >>>>
> > >>>> I'm not clear why this is a board issue. What do we think the board
> can
> > >>> do
> > >>>> for us that we can't solve ourselves and need them to solve?
> Remember, a
> > >>>> board solution to a problem is typically very removed from what
> matters
> > >>> to
> > >>>> individuals on a project.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <we...@gmail.com>
> wrote:
> > >>>>
> > >>>>> New draft
> > >>>>>
> > >>>>> ## Description:
> > >>>>> The mission of Apache Arrow is the creation and maintenance of
> software
> > >>>>> related
> > >>>>> to columnar in-memory processing and data interchange
> > >>>>>
> > >>>>> ## Issues:
> > >>>>>
> > >>>>> * We are struggling with Continuous Integration scalability as the
> > >>> project
> > >>>>> has
> > >>>>>   definitely outgrown what Travis CI and Appveyor can do for us.
> Some
> > >>>>>   contributors have shown reluctance to submit patches they aren't
> sure
> > >>>>> about
> > >>>>>   because they don't want to pile on the build queue. We are
> exploring
> > >>>>>   alternative solutions such as Buildbot, Buildkite, and GitHub
> > >>> Actions to
> > >>>>>   provide a path to migrate away from Travis CI / Appveyor. In our
> > >>> request
> > >>>>> to
> > >>>>>   Infrastructure INFRA-19217, some of us were alarmed to find that
> a
> > >>> CI/CD
> > >>>>>   service like Buildkite may not be able to be connected to the
> @apache
> > >>>>> GitHub
> > >>>>>   account on account of requiring admin access to repository
> webhooks,
> > >>> but
> > >>>>> no
> > >>>>>   ability to modify source code. There are workarounds (building
> custom
> > >>>>> OAuth
> > >>>>>   bots) that could enable us to use Buildkite, but it would require
> > >>> extra
> > >>>>>   development and result in a less refined experience for community
> > >>>>> members.
> > >>>>>
> > >>>>> ## Membership Data:
> > >>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
> > >>>>> * There are currently 48 committers and 28 PMC members in this
> project.
> > >>>>> * The Committer-to-PMC ratio is roughly 3:2.
> > >>>>>
> > >>>>> Community changes, past quarter:
> > >>>>> - Micah Kornfield was added to the PMC on 2019-08-21
> > >>>>> - Sebastien Binet was added to the PMC on 2019-08-21
> > >>>>> - Ben Kietzman was added as committer on 2019-09-07
> > >>>>> - David Li was added as committer on 2019-08-30
> > >>>>> - Kenta Murata was added as committer on 2019-09-05
> > >>>>> - Neal Richardson was added as committer on 2019-09-05
> > >>>>> - Praveen Kumar was added as committer on 2019-07-14
> > >>>>>
> > >>>>> ## Project Activity:
> > >>>>>
> > >>>>> * The project has just made a 0.15.0 release.
> > >>>>> * We are discussing ways to make the Arrow libraries as accessible
> as
> > >>>>> possible
> > >>>>>   to downstream projects for minimal use cases while allowing the
> > >>>>> development
> > >>>>>   of more comprehensive "standard libraries" with larger dependency
> > >>> stacks
> > >>>>> in
> > >>>>>   the project
> > >>>>> * We plan to make a 1.0.0 release as our next major release, at
> which
> > >>> time
> > >>>>> we
> > >>>>>   will declare that the Arrow binary protocol is stable with
> forward
> > >>> and
> > >>>>>   backward compatibility guarantees
> > >>>>>
> > >>>>> ## Community Health:
> > >>>>>
> > >>>>> * The community is overall healthy, with the aforementioned
> concerns
> > >>>>> around CI
> > >>>>>   scalability. New contributors frequently take notice of the long
> > >>> build
> > >>>>> queue
> > >>>>>   times when submitting pull requests.
> > >>>>>
> > >>>>> On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com>
> > >>> wrote:
> > >>>>>>
> > >>>>>> Yes, I agree with raising the issue to the board.
> > >>>>>>
> > >>>>>> On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <antoine@python.org
> >
> > >>>>> wrote:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> I agree.  Especially given that the constraints imposed by Infra
> > >>> don't
> > >>>>>>> help solving the problem.
> > >>>>>>>
> > >>>>>>> Regards
> > >>>>>>>
> > >>>>>>> Antoine.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Le 08/10/2019 à 15:02, Uwe L. Korn a écrit :
> > >>>>>>>> I'm not sure what qualifies for "board attention" but it seems
> > >>> that
> > >>>>> CI is a critical problem in Apache projects, not just Arrow.
> Should we
> > >>>>> raise that?
> > >>>>>>>>
> > >>>>>>>> Uwe
> > >>>>>>>>
> > >>>>>>>> On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> > >>>>>>>>> Here is a start for our Q3 board report
> > >>>>>>>>>
> > >>>>>>>>> ## Description:
> > >>>>>>>>> The mission of Apache Arrow is the creation and maintenance of
> > >>>>> software related
> > >>>>>>>>> to columnar in-memory processing and data interchange
> > >>>>>>>>>
> > >>>>>>>>> ## Issues:
> > >>>>>>>>> There are no issues requiring board attention at this time
> > >>>>>>>>>
> > >>>>>>>>> ## Membership Data:
> > >>>>>>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
> > >>>>>>>>> * There are currently 48 committers and 28 PMC members in this
> > >>>>> project.
> > >>>>>>>>> * The Committer-to-PMC ratio is roughly 3:2.
> > >>>>>>>>>
> > >>>>>>>>> Community changes, past quarter:
> > >>>>>>>>> - Micah Kornfield was added to the PMC on 2019-08-21
> > >>>>>>>>> - Sebastien Binet was added to the PMC on 2019-08-21
> > >>>>>>>>> - Ben Kietzman was added as committer on 2019-09-07
> > >>>>>>>>> - David Li was added as committer on 2019-08-30
> > >>>>>>>>> - Kenta Murata was added as committer on 2019-09-05
> > >>>>>>>>> - Neal Richardson was added as committer on 2019-09-05
> > >>>>>>>>> - Praveen Kumar was added as committer on 2019-07-14
> > >>>>>>>>>
> > >>>>>>>>> ## Project Activity:
> > >>>>>>>>>
> > >>>>>>>>> * The project has just made a 0.15.0 release.
> > >>>>>>>>> * We are discussing ways to make the Arrow libraries as
> > >>> accessible
> > >>>>> as possible
> > >>>>>>>>>   to downstream projects for minimal use cases while allowing
> > >>> the
> > >>>>> development
> > >>>>>>>>>   of more comprehensive "standard libraries" with larger
> > >>> dependency
> > >>>>> stacks in
> > >>>>>>>>>   the project
> > >>>>>>>>> * We plan to make a 1.0.0 release as our next major release, at
> > >>>>> which time we
> > >>>>>>>>>   will declare that the Arrow binary protocol is stable with
> > >>>>> forward and
> > >>>>>>>>>   backward compatibility guarantees
> > >>>>>>>>> * We are struggling with Continuous Integration scalability as
> > >>> the
> > >>>>> project has
> > >>>>>>>>>   definitely outgrown what Travis CI and Appveyor can do for
> > >>> us. We
> > >>>>> are
> > >>>>>>>>>   exploring alternative solutions such as Buildbot, Buildkite
> > >>> (see
> > >>>>>>>>>   INFRA-19217), and GitHub Actions to provide a path to migrate
> > >>>>> away from
> > >>>>>>>>>   Travis CI / Appveyor
> > >>>>>>>>>
> > >>>>>>>>> ## Community Health:
> > >>>>>>>>>
> > >>>>>>>>> * The community is overall healthy, with the aforementioned
> > >>>>> concerns around CI
> > >>>>>>>>>   scalability. New contributors frequently take notice of the
> > >>> long
> > >>>>> build queue
> > >>>>>>>>>   times when submitting pull requests.
> > >>>>>>>>>
> > >>>>>
> > >>>
>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Wes McKinney <we...@gmail.com>.
Here is a rejection of CircleCI more than 18 months ago

https://issues.apache.org/jira/browse/INFRA-15964

On Thu, Oct 10, 2019 at 4:33 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> For the record, here is the ticket for Azure Pipelines integration:
> https://issues.apache.org/jira/browse/INFRA-17030
>
> I opened an issue back in May about the Travis-CI capacity situation:
> https://issues.apache.org/jira/browse/INFRA-18533
>
> Apparently CI capacity has been a "hot topic as of late":
> https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E
>
> (I didn't know this list -- builds@apache.org -- existed, by the way)
>
> Regards
>
> Antoine.
>
>
> Le 10/10/2019 à 07:34, Wes McKinney a écrit :
> > On Thu, Oct 10, 2019 at 12:22 AM Jacques Nadeau <ja...@apache.org> wrote:
> >>
> >> I'm not dismissing that there are issues but I also don't feel like there
> >> has been constant discussion for months on the list that INFRA is not being
> >> responsive to Arrow community requests. It seems like you might be saying
> >> one of two things (or both?):
> >>
> >> 1) The Arrow infrastructure requirements are vastly different than other
> >> projects. Because of Arrow's specialized requirements, we need things that
> >> no other project needs.
> >> 2) There are many projects that want CircleCI, Buildkite and Azure
> >> pipelines but Infrastructure is not responsive. This is putting a big
> >> damper on the success of the Arrow project.
> >
> > Yes, I'm saying both of these things.
> >
> > 1. Yes, Arrow is special -- validating the project requires running a
> > dozen or more different builds (with dozens more nightly builds) that
> > test different parts of the project. Different language components, a
> > large and diverse packaging matrix, and interproject integration tests
> > and integration with external projects (e.g. Apache Spark adn others)
> >
> > 2. Yes, the limited GitHub App availability is hurting us.
> >
> > I'm OK to place this concern in the "Community Health" section and
> > spend more time building a comprehensive case about how Infra's
> > conservatism around Apps is causing us to work with one hand tied
> > behind our back. I know that I'm not the only one who is unhappy, but
> > I'll let the others speak for themselves.
> >
> >> For each of these, if we're asking the board to do something, we should say
> >> more and more clearly. Sure, CI is a pain in the Arrow project's a**. I
> >> also agree that community health is impacted by the challenge to merge
> >> things. I also share the perspective that the foundation has been slow to
> >> adopt new technologies and has been way to religious about svn. However, If
> >> we're asking the board to do something, what is it?
> >
> > Allow GitHub Apps that do not require write access to the code itself,
> > set up appropriate checks and balances to ensure that the Foundation's
> > IP provenance webhooks are preserved.
> >
> >> Looking at the two things you might be saying...
> >> If 1, are we confident in that? Many other projects have pretty complex
> >> build matrices I think. (I haven't thought about this and evaluated the
> >> other projects...maybe it is true.) If 1, we should clarify why we think
> >> we're different. If that is the case, what are asking for from the board.
> >>
> >> If 2, and you are proposing throwing stones at INFRA, we should back it up
> >> with INFRA tickets and numbers (e.g. how many projects have wanted these
> >> things and for how long). We should reference multiple threads on the INFRA
> >> mailing list where we voiced certain concerns and many other people voiced
> >> similar concerns and INFRA turned a deaf ear or blind eye (maybe these
> >> exist, I haven't spent much time on the INFRA list lately). As it stands,
> >> the one ticket referenced in this thread is a ticket that has only one
> >> project asking for a new integration that has been open for less than a
> >> week. That may be annoying but it doesn't seem like something that has
> >> gotten to the level that we need to get the boards help.
> >>
> >> In a nutshell, I agree that this is impacting the health and growth of the
> >> project but think we should cover that in the community health section of
> >> the report. I'm less a fan of saying this is an issue the board needs to
> >> help us solve unless it has been a constant point of pain that we've
> >> attempted to elevate multiple times in infra forums and experienced
> >> unreasonable responses. The board is a blunt instrument and should only be
> >> used when we have depleted every other avenue for resolution.
> >>
> >
> > Yes, I'm happy to spend more time building a comprehensive case before
> > escalating it to the board level. However, Apache Arrow is a high
> > profile project and it is not a good luck to have a PMC in a
> > fast-growing project growing disgruntled with the Foundation's
> > policies in this way. We've been struggling visibly for a long time
> > with our CI scalability, and I think we should have all the options on
> > the table to utilize GitHub-integrated tools to help us find a way out
> > of the mess that we are in.
> >
> >>
> >> On Wed, Oct 9, 2019 at 9:44 PM Wes McKinney <we...@gmail.com> wrote:
> >>
> >>> hi Jacques,
> >>>
> >>> I think we need to share the concerns that many PMC members have over
> >>> the constraints that INFRA is placing on us. Can we rephrase the
> >>> concern in a way that is more helpful?
> >>>
> >>> Firstly, I respect and appreciate the ASF's desire to limit write
> >>> access to committers only from an IP provenance perspective. I
> >>> understand that GitHub webhooks are used to log actions taken in
> >>> repositories to secure IP provenance. I do not think a third party
> >>> application should be given the ability to commit or modify a
> >>> repository -- all write operations on the .git repository should be
> >>> initiated by committers.
> >>>
> >>> However, GitHub is the main platform for producing open source
> >>> software, and tools are being created to help produce open source more
> >>> efficiently. It is frustrating for us to not be able to take advantage
> >>> of the tools that are available to everyone else on GitHub. I brought
> >>> up the recent request about Buildkite as being representative of this
> >>> (after learning that Google has been making a lot of use of it), but
> >>> we have previously been denied use of CircleCI and Azure Pipelines
> >>> since those services require even more permissions (AFAIK) than in the
> >>> case of Buildkite. From our use in
> >>> https://github.com/ursa-labs/crossbow CircleCI and Azure seem to be a
> >>> lot better than Travis CI and Appveyor
> >>>
> >>> I think the ASF is going to face an existential crisis in the near
> >>> future whether it wants to live in 2020 or 2000. It feels like GitHub
> >>> is treated somewhat as ersatz SVN "because people want to use git +
> >>> GitHub instead of SVN"
> >>>
> >>> In the same way that the cloud revolutionized software startups,
> >>> enabling small groups of developers to build large SaaS applications,
> >>> the same kind of leverage is becoming available to open source
> >>> developers to set up infrastructure to automate and scale open source
> >>> projects. I think projects considering joining the Foundation are
> >>> going to look at these issues around App usage and decide that they
> >>> would rather be in control of their own infrastructure.
> >>>
> >>> I can set aside even more time and money from my non-profit
> >>> organization's modest budget to do CI work for Apache Arrow. The
> >>> amount that we have invested already is very large, and continues to
> >>> grow. I'm raising these issues because as Member of the Foundation I'm
> >>> concerned that fast-growing projects like ours are not being
> >>> adequately served by INFRA, and we probably aren't the only project
> >>> that will face these issues. All that is needed is for INFRA to let us
> >>> use third party GitHub Apps and monitor any potentially destructive
> >>> actions that they may take, such as modifying unrelated repository
> >>> webhooks related to IP provenance.
> >>>
> >>> - Wes
> >>>
> >>> On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau <ja...@apache.org> wrote:
> >>>>
> >>>> I think we need to more direct in listing issues for the board.
> >>>>
> >>>> What have we done? What do we want them to do?
> >>>>
> >>>> In general, any large org is going to be slow to add new deep
> >>> integrations
> >>>> into GitHub. I don't think we should expect Apache to be any different
> >>> (it
> >>>> took several years before we could merge things through github for
> >>>> example). If I were on the INFRA side, I think I would look and see how
> >>>> many different people are asking for BuildKite before considering
> >>>> integration. It seems like we only opened the JIRA 6 days ago and no
> >>> other
> >>>> projects have requested access to this?
> >>>>
> >>>> I'm not clear why this is a board issue. What do we think the board can
> >>> do
> >>>> for us that we can't solve ourselves and need them to solve? Remember, a
> >>>> board solution to a problem is typically very removed from what matters
> >>> to
> >>>> individuals on a project.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <we...@gmail.com> wrote:
> >>>>
> >>>>> New draft
> >>>>>
> >>>>> ## Description:
> >>>>> The mission of Apache Arrow is the creation and maintenance of software
> >>>>> related
> >>>>> to columnar in-memory processing and data interchange
> >>>>>
> >>>>> ## Issues:
> >>>>>
> >>>>> * We are struggling with Continuous Integration scalability as the
> >>> project
> >>>>> has
> >>>>>   definitely outgrown what Travis CI and Appveyor can do for us. Some
> >>>>>   contributors have shown reluctance to submit patches they aren't sure
> >>>>> about
> >>>>>   because they don't want to pile on the build queue. We are exploring
> >>>>>   alternative solutions such as Buildbot, Buildkite, and GitHub
> >>> Actions to
> >>>>>   provide a path to migrate away from Travis CI / Appveyor. In our
> >>> request
> >>>>> to
> >>>>>   Infrastructure INFRA-19217, some of us were alarmed to find that an
> >>> CI/CD
> >>>>>   service like Buildkite may not be able to be connected to the @apache
> >>>>> GitHub
> >>>>>   account on account of requiring admin access to repository webhooks,
> >>> but
> >>>>> no
> >>>>>   ability to modify source code. There are workarounds (building custom
> >>>>> OAuth
> >>>>>   bots) that could enable us to use Buildkite, but it would require
> >>> extra
> >>>>>   development and result in a less refined experience for community
> >>>>> members.
> >>>>>
> >>>>> ## Membership Data:
> >>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
> >>>>> * There are currently 48 committers and 28 PMC members in this project.
> >>>>> * The Committer-to-PMC ratio is roughly 3:2.
> >>>>>
> >>>>> Community changes, past quarter:
> >>>>> - Micah Kornfield was added to the PMC on 2019-08-21
> >>>>> - Sebastien Binet was added to the PMC on 2019-08-21
> >>>>> - Ben Kietzman was added as committer on 2019-09-07
> >>>>> - David Li was added as committer on 2019-08-30
> >>>>> - Kenta Murata was added as committer on 2019-09-05
> >>>>> - Neal Richardson was added as committer on 2019-09-05
> >>>>> - Praveen Kumar was added as committer on 2019-07-14
> >>>>>
> >>>>> ## Project Activity:
> >>>>>
> >>>>> * The project has just made a 0.15.0 release.
> >>>>> * We are discussing ways to make the Arrow libraries as accessible as
> >>>>> possible
> >>>>>   to downstream projects for minimal use cases while allowing the
> >>>>> development
> >>>>>   of more comprehensive "standard libraries" with larger dependency
> >>> stacks
> >>>>> in
> >>>>>   the project
> >>>>> * We plan to make a 1.0.0 release as our next major release, at which
> >>> time
> >>>>> we
> >>>>>   will declare that the Arrow binary protocol is stable with forward
> >>> and
> >>>>>   backward compatibility guarantees
> >>>>>
> >>>>> ## Community Health:
> >>>>>
> >>>>> * The community is overall healthy, with the aforementioned concerns
> >>>>> around CI
> >>>>>   scalability. New contributors frequently take notice of the long
> >>> build
> >>>>> queue
> >>>>>   times when submitting pull requests.
> >>>>>
> >>>>> On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com>
> >>> wrote:
> >>>>>>
> >>>>>> Yes, I agree with raising the issue to the board.
> >>>>>>
> >>>>>> On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <an...@python.org>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> I agree.  Especially given that the constraints imposed by Infra
> >>> don't
> >>>>>>> help solving the problem.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> Antoine.
> >>>>>>>
> >>>>>>>
> >>>>>>> Le 08/10/2019 à 15:02, Uwe L. Korn a écrit :
> >>>>>>>> I'm not sure what qualifies for "board attention" but it seems
> >>> that
> >>>>> CI is a critical problem in Apache projects, not just Arrow. Should we
> >>>>> raise that?
> >>>>>>>>
> >>>>>>>> Uwe
> >>>>>>>>
> >>>>>>>> On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> >>>>>>>>> Here is a start for our Q3 board report
> >>>>>>>>>
> >>>>>>>>> ## Description:
> >>>>>>>>> The mission of Apache Arrow is the creation and maintenance of
> >>>>> software related
> >>>>>>>>> to columnar in-memory processing and data interchange
> >>>>>>>>>
> >>>>>>>>> ## Issues:
> >>>>>>>>> There are no issues requiring board attention at this time
> >>>>>>>>>
> >>>>>>>>> ## Membership Data:
> >>>>>>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
> >>>>>>>>> * There are currently 48 committers and 28 PMC members in this
> >>>>> project.
> >>>>>>>>> * The Committer-to-PMC ratio is roughly 3:2.
> >>>>>>>>>
> >>>>>>>>> Community changes, past quarter:
> >>>>>>>>> - Micah Kornfield was added to the PMC on 2019-08-21
> >>>>>>>>> - Sebastien Binet was added to the PMC on 2019-08-21
> >>>>>>>>> - Ben Kietzman was added as committer on 2019-09-07
> >>>>>>>>> - David Li was added as committer on 2019-08-30
> >>>>>>>>> - Kenta Murata was added as committer on 2019-09-05
> >>>>>>>>> - Neal Richardson was added as committer on 2019-09-05
> >>>>>>>>> - Praveen Kumar was added as committer on 2019-07-14
> >>>>>>>>>
> >>>>>>>>> ## Project Activity:
> >>>>>>>>>
> >>>>>>>>> * The project has just made a 0.15.0 release.
> >>>>>>>>> * We are discussing ways to make the Arrow libraries as
> >>> accessible
> >>>>> as possible
> >>>>>>>>>   to downstream projects for minimal use cases while allowing
> >>> the
> >>>>> development
> >>>>>>>>>   of more comprehensive "standard libraries" with larger
> >>> dependency
> >>>>> stacks in
> >>>>>>>>>   the project
> >>>>>>>>> * We plan to make a 1.0.0 release as our next major release, at
> >>>>> which time we
> >>>>>>>>>   will declare that the Arrow binary protocol is stable with
> >>>>> forward and
> >>>>>>>>>   backward compatibility guarantees
> >>>>>>>>> * We are struggling with Continuous Integration scalability as
> >>> the
> >>>>> project has
> >>>>>>>>>   definitely outgrown what Travis CI and Appveyor can do for
> >>> us. We
> >>>>> are
> >>>>>>>>>   exploring alternative solutions such as Buildbot, Buildkite
> >>> (see
> >>>>>>>>>   INFRA-19217), and GitHub Actions to provide a path to migrate
> >>>>> away from
> >>>>>>>>>   Travis CI / Appveyor
> >>>>>>>>>
> >>>>>>>>> ## Community Health:
> >>>>>>>>>
> >>>>>>>>> * The community is overall healthy, with the aforementioned
> >>>>> concerns around CI
> >>>>>>>>>   scalability. New contributors frequently take notice of the
> >>> long
> >>>>> build queue
> >>>>>>>>>   times when submitting pull requests.
> >>>>>>>>>
> >>>>>
> >>>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Antoine Pitrou <an...@python.org>.
For the record, here is the ticket for Azure Pipelines integration:
https://issues.apache.org/jira/browse/INFRA-17030

I opened an issue back in May about the Travis-CI capacity situation:
https://issues.apache.org/jira/browse/INFRA-18533

Apparently CI capacity has been a "hot topic as of late":
https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E

(I didn't know this list -- builds@apache.org -- existed, by the way)

Regards

Antoine.


On 10/10/2019 at 07:34, Wes McKinney wrote:
> On Thu, Oct 10, 2019 at 12:22 AM Jacques Nadeau <ja...@apache.org> wrote:
>>
>> I'm not dismissing that there are issues, but I also don't feel like there
>> has been constant discussion for months on the list that INFRA is not being
>> responsive to Arrow community requests. It seems like you might be saying
>> one of two things (or both?):
>>
>> 1) The Arrow infrastructure requirements are vastly different than other
>> projects. Because of Arrow's specialized requirements, we need things that
>> no other project needs.
>> 2) There are many projects that want CircleCI, Buildkite and Azure
>> pipelines but Infrastructure is not responsive. This is putting a big
>> damper on the success of the Arrow project.
> 
> Yes, I'm saying both of these things.
> 
> 1. Yes, Arrow is special -- validating the project requires running a
> dozen or more different builds (with dozens more nightly builds) that
> test different parts of the project: different language components, a
> large and diverse packaging matrix, interproject integration tests,
> and integration with external projects (e.g. Apache Spark and others).
> 
> 2. Yes, the limited GitHub App availability is hurting us.
> 
> I'm OK to place this concern in the "Community Health" section and
> spend more time building a comprehensive case about how Infra's
> conservatism around Apps is causing us to work with one hand tied
> behind our back. I know that I'm not the only one who is unhappy, but
> I'll let the others speak for themselves.
> 
>> For each of these, if we're asking the board to do something, we should say
>> more, and say it more clearly. Sure, CI is a pain in the Arrow project's
>> a**. I also agree that community health is impacted by the challenge to
>> merge things. I also share the perspective that the foundation has been
>> slow to adopt new technologies and has been way too religious about svn.
>> However, if we're asking the board to do something, what is it?
> 
> Allow GitHub Apps that do not require write access to the code itself, and
> set up appropriate checks and balances to ensure that the Foundation's
> IP provenance webhooks are preserved.
> 
>> Looking at the two things you might be saying...
>> If 1, are we confident in that? Many other projects have pretty complex
>> build matrices I think. (I haven't thought about this and evaluated the
>> other projects...maybe it is true.) If 1, we should clarify why we think
>> we're different. If that is the case, what are we asking for from the board?
>>
>> If 2, and you are proposing throwing stones at INFRA, we should back it up
>> with INFRA tickets and numbers (e.g. how many projects have wanted these
>> things and for how long). We should reference multiple threads on the INFRA
>> mailing list where we voiced certain concerns and many other people voiced
>> similar concerns and INFRA turned a deaf ear or blind eye (maybe these
>> exist, I haven't spent much time on the INFRA list lately). As it stands,
>> the one ticket referenced in this thread is a ticket that has only one
>> project asking for a new integration that has been open for less than a
>> week. That may be annoying but it doesn't seem like something that has
>> gotten to the level that we need to get the board's help.
>>
>> In a nutshell, I agree that this is impacting the health and growth of the
>> project but think we should cover that in the community health section of
>> the report. I'm less a fan of saying this is an issue the board needs to
>> help us solve unless it has been a constant point of pain that we've
>> attempted to elevate multiple times in infra forums and experienced
>> unreasonable responses. The board is a blunt instrument and should only be
>> used when we have depleted every other avenue for resolution.
>>
> 
> Yes, I'm happy to spend more time building a comprehensive case before
> escalating it to the board level. However, Apache Arrow is a high
> profile project and it is not a good look to have a PMC in a
> fast-growing project growing disgruntled with the Foundation's
> policies in this way. We've been struggling visibly for a long time
> with our CI scalability, and I think we should have all the options on
> the table to utilize GitHub-integrated tools to help us find a way out
> of the mess that we are in.
> 
>>
>> On Wed, Oct 9, 2019 at 9:44 PM Wes McKinney <we...@gmail.com> wrote:
>>
>>> hi Jacques,
>>>
>>> I think we need to share the concerns that many PMC members have over
>>> the constraints that INFRA is placing on us. Can we rephrase the
>>> concern in a way that is more helpful?
>>>
>>> Firstly, I respect and appreciate the ASF's desire to limit write
>>> access to committers only from an IP provenance perspective. I
>>> understand that GitHub webhooks are used to log actions taken in
>>> repositories to secure IP provenance. I do not think a third party
>>> application should be given the ability to commit or modify a
>>> repository -- all write operations on the .git repository should be
>>> initiated by committers.
>>>
>>> However, GitHub is the main platform for producing open source
>>> software, and tools are being created to help produce open source more
>>> efficiently. It is frustrating for us to not be able to take advantage
>>> of the tools that are available to everyone else on GitHub. I brought
>>> up the recent request about Buildkite as being representative of this
>>> (after learning that Google has been making a lot of use of it), but
>>> we have previously been denied use of CircleCI and Azure Pipelines
>>> since those services require even more permissions (AFAIK) than in the
>>> case of Buildkite. From our use in
>>> https://github.com/ursa-labs/crossbow CircleCI and Azure seem to be a
>>> lot better than Travis CI and Appveyor
>>>
>>> I think the ASF is going to face an existential crisis in the near
>>> future over whether it wants to live in 2020 or 2000. It feels like
>>> GitHub is treated somewhat as an ersatz SVN "because people want to use
>>> git + GitHub instead of SVN".
>>>
>>> In the same way that the cloud revolutionized software startups,
>>> enabling small groups of developers to build large SaaS applications,
>>> the same kind of leverage is becoming available to open source
>>> developers to set up infrastructure to automate and scale open source
>>> projects. I think projects considering joining the Foundation are
>>> going to look at these issues around App usage and decide that they
>>> would rather be in control of their own infrastructure.
>>>
>>> I can set aside even more time and money from my non-profit
>>> organization's modest budget to do CI work for Apache Arrow. The
>>> amount that we have invested already is very large, and continues to
>>> grow. I'm raising these issues because, as a Member of the Foundation, I'm
>>> concerned that fast-growing projects like ours are not being
>>> adequately served by INFRA, and we probably aren't the only project
>>> that will face these issues. All that is needed is for INFRA to let us
>>> use third party GitHub Apps and monitor any potentially destructive
>>> actions that they may take, such as modifying unrelated repository
>>> webhooks related to IP provenance.
>>>
>>> - Wes
>>>
>>> On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau <ja...@apache.org> wrote:
>>>>
>>>> I think we need to be more direct in listing issues for the board.
>>>>
>>>> What have we done? What do we want them to do?
>>>>
>>>> In general, any large org is going to be slow to add new deep
>>>> integrations into GitHub. I don't think we should expect Apache to be any
>>>> different (it took several years before we could merge things through
>>>> github for example). If I were on the INFRA side, I think I would look and
>>>> see how many different people are asking for BuildKite before considering
>>>> integration. It seems like we only opened the JIRA 6 days ago and no other
>>>> projects have requested access to this?
>>>>
>>>> I'm not clear why this is a board issue. What do we think the board can do
>>>> for us that we can't solve ourselves and need them to solve? Remember, a
>>>> board solution to a problem is typically very removed from what matters to
>>>> individuals on a project.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <we...@gmail.com> wrote:
>>>>
>>>>> New draft
>>>>>
>>>>> ## Description:
>>>>> The mission of Apache Arrow is the creation and maintenance of software
>>>>> related to columnar in-memory processing and data interchange
>>>>>
>>>>> ## Issues:
>>>>>
>>>>> * We are struggling with Continuous Integration scalability as the
>>>>>   project has definitely outgrown what Travis CI and Appveyor can do for
>>>>>   us. Some contributors have shown reluctance to submit patches they
>>>>>   aren't sure about because they don't want to pile on the build queue.
>>>>>   We are exploring alternative solutions such as Buildbot, Buildkite, and
>>>>>   GitHub Actions to provide a path to migrate away from Travis CI /
>>>>>   Appveyor. In our request to Infrastructure INFRA-19217, some of us were
>>>>>   alarmed to find that a CI/CD service like Buildkite may not be able to
>>>>>   be connected to the @apache GitHub account because it requires admin
>>>>>   access to repository webhooks, even though it needs no ability to
>>>>>   modify source code. There are workarounds (building custom OAuth bots)
>>>>>   that could enable us to use Buildkite, but it would require extra
>>>>>   development and result in a less refined experience for community
>>>>>   members.
>>>>>
>>>>> ## Membership Data:
>>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
>>>>> * There are currently 48 committers and 28 PMC members in this project.
>>>>> * The Committer-to-PMC ratio is roughly 3:2.
>>>>>
>>>>> Community changes, past quarter:
>>>>> - Micah Kornfield was added to the PMC on 2019-08-21
>>>>> - Sebastien Binet was added to the PMC on 2019-08-21
>>>>> - Ben Kietzman was added as committer on 2019-09-07
>>>>> - David Li was added as committer on 2019-08-30
>>>>> - Kenta Murata was added as committer on 2019-09-05
>>>>> - Neal Richardson was added as committer on 2019-09-05
>>>>> - Praveen Kumar was added as committer on 2019-07-14
>>>>>
>>>>> ## Project Activity:
>>>>>
>>>>> * The project has just made a 0.15.0 release.
>>>>> * We are discussing ways to make the Arrow libraries as accessible as
>>>>>   possible to downstream projects for minimal use cases while allowing
>>>>>   the development of more comprehensive "standard libraries" with larger
>>>>>   dependency stacks in the project
>>>>> * We plan to make a 1.0.0 release as our next major release, at which
>>>>>   time we will declare that the Arrow binary protocol is stable with
>>>>>   forward and backward compatibility guarantees
>>>>>
>>>>> ## Community Health:
>>>>>
>>>>> * The community is overall healthy, with the aforementioned concerns
>>>>>   around CI scalability. New contributors frequently take notice of the
>>>>>   long build queue times when submitting pull requests.
>>>>>
>>>>> On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com> wrote:
>>>>>>
>>>>>> Yes, I agree with raising the issue to the board.
>>>>>>
>>>>>> On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <an...@python.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I agree.  Especially given that the constraints imposed by Infra don't
>>>>>>> help solve the problem.
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Antoine.
>>>>>>>
>>>>>>>
>>>>>>> On 08/10/2019 at 15:02, Uwe L. Korn wrote:
>>>>>>>> I'm not sure what qualifies for "board attention" but it seems that
>>>>>>>> CI is a critical problem in Apache projects, not just Arrow. Should we
>>>>>>>> raise that?
>>>>>>>>
>>>>>>>> Uwe
>>>>>>>>
>>>>>>>> On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
>>>>>>>>> Here is a start for our Q3 board report
>>>>>>>>>
>>>>>>>>> ## Description:
>>>>>>>>> The mission of Apache Arrow is the creation and maintenance of
>>>>>>>>> software related to columnar in-memory processing and data interchange
>>>>>>>>>
>>>>>>>>> ## Issues:
>>>>>>>>> There are no issues requiring board attention at this time
>>>>>>>>>
>>>>>>>>> ## Membership Data:
>>>>>>>>> * Apache Arrow was founded 2016-01-19 (4 years ago)
>>>>>>>>> * There are currently 48 committers and 28 PMC members in this project.
>>>>>>>>> * The Committer-to-PMC ratio is roughly 3:2.
>>>>>>>>>
>>>>>>>>> Community changes, past quarter:
>>>>>>>>> - Micah Kornfield was added to the PMC on 2019-08-21
>>>>>>>>> - Sebastien Binet was added to the PMC on 2019-08-21
>>>>>>>>> - Ben Kietzman was added as committer on 2019-09-07
>>>>>>>>> - David Li was added as committer on 2019-08-30
>>>>>>>>> - Kenta Murata was added as committer on 2019-09-05
>>>>>>>>> - Neal Richardson was added as committer on 2019-09-05
>>>>>>>>> - Praveen Kumar was added as committer on 2019-07-14
>>>>>>>>>
>>>>>>>>> ## Project Activity:
>>>>>>>>>
>>>>>>>>> * The project has just made a 0.15.0 release.
>>>>>>>>> * We are discussing ways to make the Arrow libraries as accessible as
>>>>>>>>>   possible to downstream projects for minimal use cases while allowing
>>>>>>>>>   the development of more comprehensive "standard libraries" with
>>>>>>>>>   larger dependency stacks in the project
>>>>>>>>> * We plan to make a 1.0.0 release as our next major release, at which
>>>>>>>>>   time we will declare that the Arrow binary protocol is stable with
>>>>>>>>>   forward and backward compatibility guarantees
>>>>>>>>> * We are struggling with Continuous Integration scalability as the
>>>>>>>>>   project has definitely outgrown what Travis CI and Appveyor can do
>>>>>>>>>   for us. We are exploring alternative solutions such as Buildbot,
>>>>>>>>>   Buildkite (see INFRA-19217), and GitHub Actions to provide a path to
>>>>>>>>>   migrate away from Travis CI / Appveyor
>>>>>>>>>
>>>>>>>>> ## Community Health:
>>>>>>>>>
>>>>>>>>> * The community is overall healthy, with the aforementioned concerns
>>>>>>>>>   around CI scalability. New contributors frequently take notice of
>>>>>>>>>   the long build queue times when submitting pull requests.
>>>>>>>>>
>>>>>
>>>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Wes McKinney <we...@gmail.com>.
On Thu, Oct 10, 2019 at 12:22 AM Jacques Nadeau <ja...@apache.org> wrote:
>
> I'm not dismissing that there are issues, but I also don't feel like there
> has been constant discussion for months on the list that INFRA is not being
> responsive to Arrow community requests. It seems like you might be saying
> one of two things (or both?):
>
> 1) The Arrow infrastructure requirements are vastly different than other
> projects. Because of Arrow's specialized requirements, we need things that
> no other project needs.
> 2) There are many projects that want CircleCI, Buildkite and Azure
> pipelines but Infrastructure is not responsive. This is putting a big
> damper on the success of the Arrow project.

Yes, I'm saying both of these things.

1. Yes, Arrow is special -- validating the project requires running a
dozen or more different builds (with dozens more nightly builds) that
test different parts of the project: different language components, a
large and diverse packaging matrix, interproject integration tests,
and integration with external projects (e.g. Apache Spark and others).

2. Yes, the limited GitHub App availability is hurting us.

I'm OK to place this concern in the "Community Health" section and
spend more time building a comprehensive case about how Infra's
conservatism around Apps is causing us to work with one hand tied
behind our back. I know that I'm not the only one who is unhappy, but
I'll let the others speak for themselves.

> For each of these, if we're asking the board to do something, we should say
> more, and say it more clearly. Sure, CI is a pain in the Arrow project's
> a**. I also agree that community health is impacted by the challenge to
> merge things. I also share the perspective that the foundation has been
> slow to adopt new technologies and has been way too religious about svn.
> However, if we're asking the board to do something, what is it?

Allow GitHub Apps that do not require write access to the code itself, and
set up appropriate checks and balances to ensure that the Foundation's
IP provenance webhooks are preserved.

> Looking at the two things you might be saying...
> If 1, are we confident in that? Many other projects have pretty complex
> build matrices I think. (I haven't thought about this and evaluated the
> other projects...maybe it is true.) If 1, we should clarify why we think
> we're different. If that is the case, what are we asking for from the board?
>
> If 2, and you are proposing throwing stones at INFRA, we should back it up
> with INFRA tickets and numbers (e.g. how many projects have wanted these
> things and for how long). We should reference multiple threads on the INFRA
> mailing list where we voiced certain concerns and many other people voiced
> similar concerns and INFRA turned a deaf ear or blind eye (maybe these
> exist, I haven't spent much time on the INFRA list lately). As it stands,
> the one ticket referenced in this thread is a ticket that has only one
> project asking for a new integration that has been open for less than a
> week. That may be annoying but it doesn't seem like something that has
> gotten to the level that we need to get the boards help.
>
> In a nutshell, I agree that this is impacting the health and growth of the
> project but think we should cover that in the community health section of
> the report. I'm less a fan of saying this is an issue the board needs to
> help us solve unless it has been a constant point of pain that we've
> attempted to elevate multiple times in infra forums and experienced
> unreasonable responses. The board is a blunt instrument and should only be
> used when we have depleted every other avenue for resolution.
>

Yes, I'm happy to spend more time building a comprehensive case before
escalating it to the board level. However, Apache Arrow is a high
profile project and it is not a good look to have a PMC in a
fast-growing project growing disgruntled with the Foundation's
policies in this way. We've been struggling visibly for a long time
with our CI scalability, and I think we should have all the options on
the table to utilize GitHub-integrated tools to help us find a way out
of the mess that we are in.

>
> On Wed, Oct 9, 2019 at 9:44 PM Wes McKinney <we...@gmail.com> wrote:
>
> > hi Jacques,
> >
> > I think we need to share the concerns that many PMC members have over
> > the constraints that INFRA is placing on us. Can we rephrase the
> > concern in a way that is more helpful?
> >
> > Firstly, I respect and appreciate the ASF's desire to limit write
> > access to committers only from an IP provenance perspective. I
> > understand that GitHub webhooks are used to log actions taken in
> > repositories to secure IP provenance. I do not think a third party
> > application should be given the ability to commit or modify a
> > repository -- all write operations on the .git repository should be
> > initiated by committers.
> >
> > However, GitHub is the main platform for producing open source
> > software, and tools are being created to help produce open source more
> > efficiently. It is frustrating for us to not be able to take advantage
> > of the tools that are available to everyone else on GitHub. I brought
> > up the recent request about Buildkite as being representative of this
> > (after learning that Google has been making a lot of use of it), but
> > we have previously been denied use of CircleCI and Azure Pipelines
> > since those services require even more permissions (AFAIK) than in the
> > case of Buildkite. From our use in
> > https://github.com/ursa-labs/crossbow CircleCI and Azure seem to be a
> > lot better than Travis CI and Appveyor
> >
> > I think the ASF is going to face an existential crisis in the near
> > future whether it wants to live in 2020 or 2000. It feels like GitHub
> > is treated somewhat as ersatz SVN "because people want to use git +
> > GitHub instead of SVN"
> >
> > In the same way that the cloud revolutionized software startups,
> > enabling small groups of developers to build large SaaS applications,
> > the same kind of leverage is becoming available to open source
> > developers to set up infrastructure to automate and scale open source
> > projects. I think projects considering joining the Foundation are
> > going to look at these issues around App usage and decide that they
> > would rather be in control of their own infrastructure.
> >
> > I can set aside even more time and money from my non-profit
> > organization's modest budget to do CI work for Apache Arrow. The
> > amount that we have invested already is very large, and continues to
> > grow. I'm raising these issues because as Member of the Foundation I'm
> > concerned that fast-growing projects like ours are not being
> > adequately served by INFRA, and we probably aren't the only project
> > that will face these issues. All that is needed is for INFRA to let us
> > use third party GitHub Apps and monitor any potentially destructive
> > actions that they may take, such as modifying unrelated repository
> > webhooks related to IP provenance.
> >
> > - Wes
> >
> > On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau <ja...@apache.org> wrote:
> > >
> > > I think we need to more direct in listing issues for the board.
> > >
> > > What have we done? What do we want them to do?
> > >
> > > In general, any large org is going to be slow to add new deep
> > integrations
> > > into GitHub. I don't think we should expect Apache to be any different
> > (it
> > > took several years before we could merge things through github for
> > > example). If I were on the INFRA side, I think I would look and see how
> > > many different people are asking for BuildKite before considering
> > > integration. It seems like we only opened the JIRA 6 days ago and no
> > other
> > > projects have requested access to this?
> > >
> > > I'm not clear why this is a board issue. What do we think the board can
> > do
> > > for us that we can't solve ourselves and need them to solve? Remember, a
> > > board solution to a problem is typically very removed from what matters
> > to
> > > individuals on a project.
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > > New draft
> > > >
> > > > ## Description:
> > > > The mission of Apache Arrow is the creation and maintenance of software
> > > > related
> > > > to columnar in-memory processing and data interchange
> > > >
> > > > ## Issues:
> > > >
> > > > * We are struggling with Continuous Integration scalability as the
> > project
> > > > has
> > > >   definitely outgrown what Travis CI and Appveyor can do for us. Some
> > > >   contributors have shown reluctance to submit patches they aren't sure
> > > > about
> > > >   because they don't want to pile on the build queue. We are exploring
> > > >   alternative solutions such as Buildbot, Buildkite, and GitHub
> > Actions to
> > > >   provide a path to migrate away from Travis CI / Appveyor. In our
> > request
> > > > to
> > > >   Infrastructure INFRA-19217, some of us were alarmed to find that an
> > CI/CD
> > > >   service like Buildkite may not be able to be connected to the @apache
> > > > GitHub
> > > >   account on account of requiring admin access to repository webhooks,
> > but
> > > > no
> > > >   ability to modify source code. There are workarounds (building custom
> > > > OAuth
> > > >   bots) that could enable us to use Buildkite, but it would require
> > extra
> > > >   development and result in a less refined experience for community
> > > > members.
> > > >
> > > > ## Membership Data:
> > > > * Apache Arrow was founded 2016-01-19 (4 years ago)
> > > > * There are currently 48 committers and 28 PMC members in this project.
> > > > * The Committer-to-PMC ratio is roughly 3:2.
> > > >
> > > > Community changes, past quarter:
> > > > - Micah Kornfield was added to the PMC on 2019-08-21
> > > > - Sebastien Binet was added to the PMC on 2019-08-21
> > > > - Ben Kietzman was added as committer on 2019-09-07
> > > > - David Li was added as committer on 2019-08-30
> > > > - Kenta Murata was added as committer on 2019-09-05
> > > > - Neal Richardson was added as committer on 2019-09-05
> > > > - Praveen Kumar was added as committer on 2019-07-14
> > > >
> > > > ## Project Activity:
> > > >
> > > > * The project has just made a 0.15.0 release.
> > > > * We are discussing ways to make the Arrow libraries as accessible as
> > > > possible
> > > >   to downstream projects for minimal use cases while allowing the
> > > > development
> > > >   of more comprehensive "standard libraries" with larger dependency
> > stacks
> > > > in
> > > >   the project
> > > > * We plan to make a 1.0.0 release as our next major release, at which
> > time
> > > > we
> > > >   will declare that the Arrow binary protocol is stable with forward
> > and
> > > >   backward compatibility guarantees
> > > >
> > > > ## Community Health:
> > > >
> > > > * The community is overall healthy, with the aforementioned concerns
> > > > around CI
> > > >   scalability. New contributors frequently take notice of the long
> > build
> > > > queue
> > > >   times when submitting pull requests.
> > > >
> > > > On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com>
> > wrote:
> > > > >
> > > > > Yes, I agree with raising the issue to the board.
> > > > >
> > > > > On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <an...@python.org>
> > > > wrote:
> > > > > >
> > > > > >
> > > > > > I agree.  Especially given that the constraints imposed by Infra
> > don't
> > > > > > help solving the problem.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > > > > >
> > > > > > Le 08/10/2019 à 15:02, Uwe L. Korn a écrit :
> > > > > > > I'm not sure what qualifies for "board attention" but it seems
> > that
> > > > CI is a critical problem in Apache projects, not just Arrow. Should we
> > > > raise that?
> > > > > > >
> > > > > > > Uwe
> > > > > > >
> > > > > > > On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> > > > > > >> Here is a start for our Q3 board report
> > > > > > >>
> > > > > > >> ## Description:
> > > > > > >> The mission of Apache Arrow is the creation and maintenance of
> > > > software related
> > > > > > >> to columnar in-memory processing and data interchange
> > > > > > >>
> > > > > > >> ## Issues:
> > > > > > >> There are no issues requiring board attention at this time
> > > > > > >>
> > > > > > >> ## Membership Data:
> > > > > > >> * Apache Arrow was founded 2016-01-19 (4 years ago)
> > > > > > >> * There are currently 48 committers and 28 PMC members in this
> > > > project.
> > > > > > >> * The Committer-to-PMC ratio is roughly 3:2.
> > > > > > >>
> > > > > > >> Community changes, past quarter:
> > > > > > >> - Micah Kornfield was added to the PMC on 2019-08-21
> > > > > > >> - Sebastien Binet was added to the PMC on 2019-08-21
> > > > > > >> - Ben Kietzman was added as committer on 2019-09-07
> > > > > > >> - David Li was added as committer on 2019-08-30
> > > > > > >> - Kenta Murata was added as committer on 2019-09-05
> > > > > > >> - Neal Richardson was added as committer on 2019-09-05
> > > > > > >> - Praveen Kumar was added as committer on 2019-07-14
> > > > > > >>
> > > > > > >> ## Project Activity:
> > > > > > >>
> > > > > > >> * The project has just made a 0.15.0 release.
> > > > > > >> * We are discussing ways to make the Arrow libraries as
> > accessible
> > > > as possible
> > > > > > >>   to downstream projects for minimal use cases while allowing
> > the
> > > > development
> > > > > > >>   of more comprehensive "standard libraries" with larger
> > dependency
> > > > stacks in
> > > > > > >>   the project
> > > > > > >> * We plan to make a 1.0.0 release as our next major release, at
> > > > which time we
> > > > > > >>   will declare that the Arrow binary protocol is stable with
> > > > forward and
> > > > > > >>   backward compatibility guarantees
> > > > > > >> * We are struggling with Continuous Integration scalability as
> > the
> > > > project has
> > > > > > >>   definitely outgrown what Travis CI and Appveyor can do for
> > us. We
> > > > are
> > > > > > >>   exploring alternative solutions such as Buildbot, Buildkite
> > (see
> > > > > > >>   INFRA-19217), and GitHub Actions to provide a path to migrate
> > > > away from
> > > > > > >>   Travis CI / Appveyor
> > > > > > >>
> > > > > > >> ## Community Health:
> > > > > > >>
> > > > > > >> * The community is overall healthy, with the aforementioned
> > > > concerns around CI
> > > > > > >>   scalability. New contributors frequently take notice of the
> > long
> > > > build queue
> > > > > > >>   times when submitting pull requests.
> > > > > > >>
> > > >
> >

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Jacques Nadeau <ja...@apache.org>.
I'm not dismissing that there are issues, but I also don't feel like there
has been constant discussion for months on the list that INFRA is not being
responsive to Arrow community requests. It seems like you might be saying
one of two things (or both?):

1) The Arrow infrastructure requirements are vastly different than other
projects. Because of Arrow's specialized requirements, we need things that
no other project needs.
2) There are many projects that want CircleCI, Buildkite and Azure
pipelines but Infrastructure is not responsive. This is putting a big
damper on the success of the Arrow project.

For each of these, if we're asking the board to do something, we should say
more, and say it more clearly. Sure, CI is a pain in the Arrow project's a**. I
also agree that community health is impacted by the challenge to merge
things. I also share the perspective that the foundation has been slow to
adopt new technologies and has been way too religious about svn. However, if
we're asking the board to do something, what is it?

Looking at the two things you might be saying...
If 1, are we confident in that? Many other projects have pretty complex
build matrices I think. (I haven't thought about this and evaluated the
other projects...maybe it is true.) If 1, we should clarify why we think
we're different. If that is the case, what are we asking for from the board?

If 2, and you are proposing throwing stones at INFRA, we should back it up
with INFRA tickets and numbers (e.g. how many projects have wanted these
things and for how long). We should reference multiple threads on the INFRA
mailing list where we voiced certain concerns and many other people voiced
similar concerns and INFRA turned a deaf ear or blind eye (maybe these
exist, I haven't spent much time on the INFRA list lately). As it stands,
the one ticket referenced in this thread is a ticket that has only one
project asking for a new integration that has been open for less than a
week. That may be annoying but it doesn't seem like something that has
gotten to the level that we need to get the board's help.

In a nutshell, I agree that this is impacting the health and growth of the
project but think we should cover that in the community health section of
the report. I'm less a fan of saying this is an issue the board needs to
help us solve unless it has been a constant point of pain that we've
attempted to elevate multiple times in infra forums and experienced
unreasonable responses. The board is a blunt instrument and should only be
used when we have depleted every other avenue for resolution.




On Wed, Oct 9, 2019 at 9:44 PM Wes McKinney <we...@gmail.com> wrote:

> hi Jacques,
>
> I think we need to share the concerns that many PMC members have over
> the constraints that INFRA is placing on us. Can we rephrase the
> concern in a way that is more helpful?
>
> Firstly, I respect and appreciate the ASF's desire to limit write
> access to committers only from an IP provenance perspective. I
> understand that GitHub webhooks are used to log actions taken in
> repositories to secure IP provenance. I do not think a third party
> application should be given the ability to commit or modify a
> repository -- all write operations on the .git repository should be
> initiated by committers.
>
> However, GitHub is the main platform for producing open source
> software, and tools are being created to help produce open source more
> efficiently. It is frustrating for us to not be able to take advantage
> of the tools that are available to everyone else on GitHub. I brought
> up the recent request about Buildkite as being representative of this
> (after learning that Google has been making a lot of use of it), but
> we have previously been denied use of CircleCI and Azure Pipelines
> since those services require even more permissions (AFAIK) than in the
> case of Buildkite. From our use in
> https://github.com/ursa-labs/crossbow CircleCI and Azure seem to be a
> lot better than Travis CI and Appveyor
>
> I think the ASF is going to face an existential crisis in the near
> future whether it wants to live in 2020 or 2000. It feels like GitHub
> is treated somewhat as ersatz SVN "because people want to use git +
> GitHub instead of SVN"
>
> In the same way that the cloud revolutionized software startups,
> enabling small groups of developers to build large SaaS applications,
> the same kind of leverage is becoming available to open source
> developers to set up infrastructure to automate and scale open source
> projects. I think projects considering joining the Foundation are
> going to look at these issues around App usage and decide that they
> would rather be in control of their own infrastructure.
>
> I can set aside even more time and money from my non-profit
> organization's modest budget to do CI work for Apache Arrow. The
> amount that we have invested already is very large, and continues to
> grow. I'm raising these issues because as Member of the Foundation I'm
> concerned that fast-growing projects like ours are not being
> adequately served by INFRA, and we probably aren't the only project
> that will face these issues. All that is needed is for INFRA to let us
> use third party GitHub Apps and monitor any potentially destructive
> actions that they may take, such as modifying unrelated repository
> webhooks related to IP provenance.
>
> - Wes
>
> On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau <ja...@apache.org> wrote:
> >
> > I think we need to more direct in listing issues for the board.
> >
> > What have we done? What do we want them to do?
> >
> > In general, any large org is going to be slow to add new deep
> integrations
> > into GitHub. I don't think we should expect Apache to be any different
> (it
> > took several years before we could merge things through github for
> > example). If I were on the INFRA side, I think I would look and see how
> > many different people are asking for BuildKite before considering
> > integration. It seems like we only opened the JIRA 6 days ago and no
> other
> > projects have requested access to this?
> >
> > I'm not clear why this is a board issue. What do we think the board can
> do
> > for us that we can't solve ourselves and need them to solve? Remember, a
> > board solution to a problem is typically very removed from what matters
> to
> > individuals on a project.
> >
> >
> >
> >
> >
> >
> > On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > > New draft
> > >
> > > ## Description:
> > > The mission of Apache Arrow is the creation and maintenance of software
> > > related
> > > to columnar in-memory processing and data interchange
> > >
> > > ## Issues:
> > >
> > > * We are struggling with Continuous Integration scalability as the
> project
> > > has
> > >   definitely outgrown what Travis CI and Appveyor can do for us. Some
> > >   contributors have shown reluctance to submit patches they aren't sure
> > > about
> > >   because they don't want to pile on the build queue. We are exploring
> > >   alternative solutions such as Buildbot, Buildkite, and GitHub
> Actions to
> > >   provide a path to migrate away from Travis CI / Appveyor. In our
> request
> > > to
> > >   Infrastructure INFRA-19217, some of us were alarmed to find that an
> CI/CD
> > >   service like Buildkite may not be able to be connected to the @apache
> > > GitHub
> > >   account on account of requiring admin access to repository webhooks,
> but
> > > no
> > >   ability to modify source code. There are workarounds (building custom
> > > OAuth
> > >   bots) that could enable us to use Buildkite, but it would require
> extra
> > >   development and result in a less refined experience for community
> > > members.
> > >
> > > ## Membership Data:
> > > * Apache Arrow was founded 2016-01-19 (4 years ago)
> > > * There are currently 48 committers and 28 PMC members in this project.
> > > * The Committer-to-PMC ratio is roughly 3:2.
> > >
> > > Community changes, past quarter:
> > > - Micah Kornfield was added to the PMC on 2019-08-21
> > > - Sebastien Binet was added to the PMC on 2019-08-21
> > > - Ben Kietzman was added as committer on 2019-09-07
> > > - David Li was added as committer on 2019-08-30
> > > - Kenta Murata was added as committer on 2019-09-05
> > > - Neal Richardson was added as committer on 2019-09-05
> > > - Praveen Kumar was added as committer on 2019-07-14
> > >
> > > ## Project Activity:
> > >
> > > * The project has just made a 0.15.0 release.
> > > * We are discussing ways to make the Arrow libraries as accessible as
> > > possible
> > >   to downstream projects for minimal use cases while allowing the
> > > development
> > >   of more comprehensive "standard libraries" with larger dependency
> stacks
> > > in
> > >   the project
> > > * We plan to make a 1.0.0 release as our next major release, at which
> time
> > > we
> > >   will declare that the Arrow binary protocol is stable with forward
> and
> > >   backward compatibility guarantees
> > >
> > > ## Community Health:
> > >
> > > * The community is overall healthy, with the aforementioned concerns
> > > around CI
> > >   scalability. New contributors frequently take notice of the long
> build
> > > queue
> > >   times when submitting pull requests.
> > >
> > > On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com>
> wrote:
> > > >
> > > > Yes, I agree with raising the issue to the board.
> > > >
> > > > On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <an...@python.org>
> > > wrote:
> > > > >
> > > > >
> > > > > I agree.  Especially given that the constraints imposed by Infra
> don't
> > > > > help solving the problem.
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >
> > > > > Le 08/10/2019 à 15:02, Uwe L. Korn a écrit :
> > > > > > I'm not sure what qualifies for "board attention" but it seems
> that
> > > CI is a critical problem in Apache projects, not just Arrow. Should we
> > > raise that?
> > > > > >
> > > > > > Uwe
> > > > > >
> > > > > > On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> > > > > >> Here is a start for our Q3 board report
> > > > > >>
> > > > > >> ## Description:
> > > > > >> The mission of Apache Arrow is the creation and maintenance of
> > > software related
> > > > > >> to columnar in-memory processing and data interchange
> > > > > >>
> > > > > >> ## Issues:
> > > > > >> There are no issues requiring board attention at this time
> > > > > >>
> > > > > >> ## Membership Data:
> > > > > >> * Apache Arrow was founded 2016-01-19 (4 years ago)
> > > > > >> * There are currently 48 committers and 28 PMC members in this
> > > project.
> > > > > >> * The Committer-to-PMC ratio is roughly 3:2.
> > > > > >>
> > > > > >> Community changes, past quarter:
> > > > > >> - Micah Kornfield was added to the PMC on 2019-08-21
> > > > > >> - Sebastien Binet was added to the PMC on 2019-08-21
> > > > > >> - Ben Kietzman was added as committer on 2019-09-07
> > > > > >> - David Li was added as committer on 2019-08-30
> > > > > >> - Kenta Murata was added as committer on 2019-09-05
> > > > > >> - Neal Richardson was added as committer on 2019-09-05
> > > > > >> - Praveen Kumar was added as committer on 2019-07-14
> > > > > >>
> > > > > >> ## Project Activity:
> > > > > >>
> > > > > >> * The project has just made a 0.15.0 release.
> > > > > >> * We are discussing ways to make the Arrow libraries as
> accessible
> > > as possible
> > > > > >>   to downstream projects for minimal use cases while allowing
> the
> > > development
> > > > > >>   of more comprehensive "standard libraries" with larger
> dependency
> > > stacks in
> > > > > >>   the project
> > > > > >> * We plan to make a 1.0.0 release as our next major release, at
> > > which time we
> > > > > >>   will declare that the Arrow binary protocol is stable with
> > > forward and
> > > > > >>   backward compatibility guarantees
> > > > > >> * We are struggling with Continuous Integration scalability as
> the
> > > project has
> > > > > >>   definitely outgrown what Travis CI and Appveyor can do for
> us. We
> > > are
> > > > > >>   exploring alternative solutions such as Buildbot, Buildkite
> (see
> > > > > >>   INFRA-19217), and GitHub Actions to provide a path to migrate
> > > away from
> > > > > >>   Travis CI / Appveyor
> > > > > >>
> > > > > >> ## Community Health:
> > > > > >>
> > > > > >> * The community is overall healthy, with the aforementioned
> > > concerns around CI
> > > > > >>   scalability. New contributors frequently take notice of the
> long
> > > build queue
> > > > > >>   times when submitting pull requests.
> > > > > >>
> > >
>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Wes McKinney <we...@gmail.com>.
hi Jacques,

I think we need to share the concerns that many PMC members have over
the constraints that INFRA is placing on us. Can we rephrase the
concern in a way that is more helpful?

Firstly, I respect and appreciate the ASF's desire to limit write
access to committers only from an IP provenance perspective. I
understand that GitHub webhooks are used to log actions taken in
repositories to secure IP provenance. I do not think a third party
application should be given the ability to commit or modify a
repository -- all write operations on the .git repository should be
initiated by committers.

However, GitHub is the main platform for producing open source
software, and tools are being created to help produce open source more
efficiently. It is frustrating for us to not be able to take advantage
of the tools that are available to everyone else on GitHub. I brought
up the recent request about Buildkite as being representative of this
(after learning that Google has been making a lot of use of it), but
we have previously been denied use of CircleCI and Azure Pipelines
since those services require even more permissions (AFAIK) than in the
case of Buildkite. From our use in
https://github.com/ursa-labs/crossbow CircleCI and Azure seem to be a
lot better than Travis CI and Appveyor.

I think the ASF is going to face an existential crisis in the near
future over whether it wants to live in 2020 or 2000. It feels like GitHub
is treated somewhat as ersatz SVN "because people want to use git +
GitHub instead of SVN".

In the same way that the cloud revolutionized software startups,
enabling small groups of developers to build large SaaS applications,
the same kind of leverage is becoming available to open source
developers to set up infrastructure to automate and scale open source
projects. I think projects considering joining the Foundation are
going to look at these issues around App usage and decide that they
would rather be in control of their own infrastructure.

I can set aside even more time and money from my non-profit
organization's modest budget to do CI work for Apache Arrow. The
amount that we have invested already is very large, and continues to
grow. I'm raising these issues because as a Member of the Foundation I'm
concerned that fast-growing projects like ours are not being
adequately served by INFRA, and we probably aren't the only project
that will face these issues. All that is needed is for INFRA to let us
use third party GitHub Apps and monitor any potentially destructive
actions that they may take, such as modifying unrelated repository
webhooks related to IP provenance.

- Wes

On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau <ja...@apache.org> wrote:
>
> I think we need to more direct in listing issues for the board.
>
> What have we done? What do we want them to do?
>
> In general, any large org is going to be slow to add new deep integrations
> into GitHub. I don't think we should expect Apache to be any different (it
> took several years before we could merge things through github for
> example). If I were on the INFRA side, I think I would look and see how
> many different people are asking for BuildKite before considering
> integration. It seems like we only opened the JIRA 6 days ago and no other
> projects have requested access to this?
>
> I'm not clear why this is a board issue. What do we think the board can do
> for us that we can't solve ourselves and need them to solve? Remember, a
> board solution to a problem is typically very removed from what matters to
> individuals on a project.
>
>
>
>
>
>
> On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <we...@gmail.com> wrote:
>
> > New draft
> >
> > ## Description:
> > The mission of Apache Arrow is the creation and maintenance of software
> > related
> > to columnar in-memory processing and data interchange
> >
> > ## Issues:
> >
> > * We are struggling with Continuous Integration scalability as the project
> > has
> >   definitely outgrown what Travis CI and Appveyor can do for us. Some
> >   contributors have shown reluctance to submit patches they aren't sure
> > about
> >   because they don't want to pile on the build queue. We are exploring
> >   alternative solutions such as Buildbot, Buildkite, and GitHub Actions to
> >   provide a path to migrate away from Travis CI / Appveyor. In our request
> > to
> >   Infrastructure INFRA-19217, some of us were alarmed to find that an CI/CD
> >   service like Buildkite may not be able to be connected to the @apache
> > GitHub
> >   account on account of requiring admin access to repository webhooks, but
> > no
> >   ability to modify source code. There are workarounds (building custom
> > OAuth
> >   bots) that could enable us to use Buildkite, but it would require extra
> >   development and result in a less refined experience for community
> > members.
> >
> > ## Membership Data:
> > * Apache Arrow was founded 2016-01-19 (4 years ago)
> > * There are currently 48 committers and 28 PMC members in this project.
> > * The Committer-to-PMC ratio is roughly 3:2.
> >
> > Community changes, past quarter:
> > - Micah Kornfield was added to the PMC on 2019-08-21
> > - Sebastien Binet was added to the PMC on 2019-08-21
> > - Ben Kietzman was added as committer on 2019-09-07
> > - David Li was added as committer on 2019-08-30
> > - Kenta Murata was added as committer on 2019-09-05
> > - Neal Richardson was added as committer on 2019-09-05
> > - Praveen Kumar was added as committer on 2019-07-14
> >
> > ## Project Activity:
> >
> > * The project has just made a 0.15.0 release.
> > * We are discussing ways to make the Arrow libraries as accessible as
> > possible
> >   to downstream projects for minimal use cases while allowing the
> > development
> >   of more comprehensive "standard libraries" with larger dependency stacks
> > in
> >   the project
> > * We plan to make a 1.0.0 release as our next major release, at which time
> > we
> >   will declare that the Arrow binary protocol is stable with forward and
> >   backward compatibility guarantees
> >
> > ## Community Health:
> >
> > * The community is overall healthy, with the aforementioned concerns
> > around CI
> >   scalability. New contributors frequently take notice of the long build
> > queue
> >   times when submitting pull requests.
> >
> > On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > Yes, I agree with raising the issue to the board.
> > >
> > > On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <an...@python.org>
> > wrote:
> > > >
> > > >
> > > > I agree.  Especially given that the constraints imposed by Infra don't
> > > > help solving the problem.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > > On 08/10/2019 at 15:02, Uwe L. Korn wrote:
> > > > > > I'm not sure what qualifies for "board attention" but it seems that CI is a critical problem in Apache projects, not just Arrow. Should we raise that?
> > > > >
> > > > > Uwe
> > > > >
> > > > > On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> > > > >> Here is a start for our Q3 board report
> > > > >>
> > > > >> ## Description:
> > > > > >> The mission of Apache Arrow is the creation and maintenance of software related
> > > > > >> to columnar in-memory processing and data interchange
> > > > >>
> > > > >> ## Issues:
> > > > >> There are no issues requiring board attention at this time
> > > > >>
> > > > >> ## Membership Data:
> > > > >> * Apache Arrow was founded 2016-01-19 (4 years ago)
> > > > > >> * There are currently 48 committers and 28 PMC members in this project.
> > > > >> * The Committer-to-PMC ratio is roughly 3:2.
> > > > >>
> > > > >> Community changes, past quarter:
> > > > >> - Micah Kornfield was added to the PMC on 2019-08-21
> > > > >> - Sebastien Binet was added to the PMC on 2019-08-21
> > > > >> - Ben Kietzman was added as committer on 2019-09-07
> > > > >> - David Li was added as committer on 2019-08-30
> > > > >> - Kenta Murata was added as committer on 2019-09-05
> > > > >> - Neal Richardson was added as committer on 2019-09-05
> > > > >> - Praveen Kumar was added as committer on 2019-07-14
> > > > >>
> > > > >> ## Project Activity:
> > > > >>
> > > > >> * The project has just made a 0.15.0 release.
> > > > > >> * We are discussing ways to make the Arrow libraries as accessible as possible
> > > > > >>   to downstream projects for minimal use cases while allowing the development
> > > > > >>   of more comprehensive "standard libraries" with larger dependency stacks in
> > > > > >>   the project
> > > > > >> * We plan to make a 1.0.0 release as our next major release, at which time we
> > > > > >>   will declare that the Arrow binary protocol is stable with forward and
> > > > > >>   backward compatibility guarantees
> > > > > >> * We are struggling with Continuous Integration scalability as the project has
> > > > > >>   definitely outgrown what Travis CI and Appveyor can do for us. We are
> > > > > >>   exploring alternative solutions such as Buildbot, Buildkite (see
> > > > > >>   INFRA-19217), and GitHub Actions to provide a path to migrate away from
> > > > > >>   Travis CI / Appveyor
> > > > >>
> > > > >> ## Community Health:
> > > > >>
> > > > > >> * The community is overall healthy, with the aforementioned concerns around CI
> > > > > >>   scalability. New contributors frequently take notice of the long build queue
> > > > > >>   times when submitting pull requests.
> > > > >>
> >

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Jacques Nadeau <ja...@apache.org>.
I think we need to be more direct in listing issues for the board.

What have we done? What do we want them to do?

In general, any large org is going to be slow to add new deep integrations
into GitHub. I don't think we should expect Apache to be any different (it
took several years before we could merge things through github for
example). If I were on the INFRA side, I think I would look and see how
many different people are asking for BuildKite before considering
integration. It seems like we only opened the JIRA 6 days ago and no other
projects have requested access to this?

I'm not clear why this is a board issue. What do we think the board can do
for us that we can't solve ourselves and need them to solve? Remember, a
board solution to a problem is typically very removed from what matters to
individuals on a project.






On Tue, Oct 8, 2019 at 7:03 AM Wes McKinney <we...@gmail.com> wrote:

> New draft
>
> ## Description:
> The mission of Apache Arrow is the creation and maintenance of software related
> to columnar in-memory processing and data interchange
>
> ## Issues:
>
> * We are struggling with Continuous Integration scalability as the project has
>   definitely outgrown what Travis CI and Appveyor can do for us. Some
>   contributors have shown reluctance to submit patches they aren't sure about
>   because they don't want to pile on the build queue. We are exploring
>   alternative solutions such as Buildbot, Buildkite, and GitHub Actions to
>   provide a path to migrate away from Travis CI / Appveyor. In our request to
>   Infrastructure (INFRA-19217), some of us were alarmed to find that a CI/CD
>   service like Buildkite may not be able to connect to the @apache GitHub
>   account because it requires admin access to repository webhooks, though not
>   the ability to modify source code. There are workarounds (building custom
>   OAuth bots) that could enable us to use Buildkite, but it would require extra
>   development and result in a less refined experience for community members.
>
> ## Membership Data:
> * Apache Arrow was founded 2016-01-19 (4 years ago)
> * There are currently 48 committers and 28 PMC members in this project.
> * The Committer-to-PMC ratio is roughly 3:2.
>
> Community changes, past quarter:
> - Micah Kornfield was added to the PMC on 2019-08-21
> - Sebastien Binet was added to the PMC on 2019-08-21
> - Ben Kietzman was added as committer on 2019-09-07
> - David Li was added as committer on 2019-08-30
> - Kenta Murata was added as committer on 2019-09-05
> - Neal Richardson was added as committer on 2019-09-05
> - Praveen Kumar was added as committer on 2019-07-14
>
> ## Project Activity:
>
> * The project has just made a 0.15.0 release.
> * We are discussing ways to make the Arrow libraries as accessible as possible
>   to downstream projects for minimal use cases while allowing the development
>   of more comprehensive "standard libraries" with larger dependency stacks in
>   the project
> * We plan to make a 1.0.0 release as our next major release, at which time we
>   will declare that the Arrow binary protocol is stable with forward and
>   backward compatibility guarantees
>
> ## Community Health:
>
> * The community is overall healthy, with the aforementioned concerns around CI
>   scalability. New contributors frequently take notice of the long build queue
>   times when submitting pull requests.
>
> On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > Yes, I agree with raising the issue to the board.
> >
> > > On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <an...@python.org> wrote:
> > >
> > >
> > > I agree.  Especially given that the constraints imposed by Infra don't
> > > > help solve the problem.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > > On 08/10/2019 at 15:02, Uwe L. Korn wrote:
> > > > > I'm not sure what qualifies for "board attention" but it seems that CI is a critical problem in Apache projects, not just Arrow. Should we raise that?
> > > >
> > > > Uwe
> > > >
> > > > On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> > > >> Here is a start for our Q3 board report
> > > >>
> > > >> ## Description:
> > > > >> The mission of Apache Arrow is the creation and maintenance of software related
> > > > >> to columnar in-memory processing and data interchange
> > > >>
> > > >> ## Issues:
> > > >> There are no issues requiring board attention at this time
> > > >>
> > > >> ## Membership Data:
> > > >> * Apache Arrow was founded 2016-01-19 (4 years ago)
> > > > >> * There are currently 48 committers and 28 PMC members in this project.
> > > >> * The Committer-to-PMC ratio is roughly 3:2.
> > > >>
> > > >> Community changes, past quarter:
> > > >> - Micah Kornfield was added to the PMC on 2019-08-21
> > > >> - Sebastien Binet was added to the PMC on 2019-08-21
> > > >> - Ben Kietzman was added as committer on 2019-09-07
> > > >> - David Li was added as committer on 2019-08-30
> > > >> - Kenta Murata was added as committer on 2019-09-05
> > > >> - Neal Richardson was added as committer on 2019-09-05
> > > >> - Praveen Kumar was added as committer on 2019-07-14
> > > >>
> > > >> ## Project Activity:
> > > >>
> > > >> * The project has just made a 0.15.0 release.
> > > > >> * We are discussing ways to make the Arrow libraries as accessible as possible
> > > > >>   to downstream projects for minimal use cases while allowing the development
> > > > >>   of more comprehensive "standard libraries" with larger dependency stacks in
> > > > >>   the project
> > > > >> * We plan to make a 1.0.0 release as our next major release, at which time we
> > > > >>   will declare that the Arrow binary protocol is stable with forward and
> > > > >>   backward compatibility guarantees
> > > > >> * We are struggling with Continuous Integration scalability as the project has
> > > > >>   definitely outgrown what Travis CI and Appveyor can do for us. We are
> > > > >>   exploring alternative solutions such as Buildbot, Buildkite (see
> > > > >>   INFRA-19217), and GitHub Actions to provide a path to migrate away from
> > > > >>   Travis CI / Appveyor
> > > >>
> > > >> ## Community Health:
> > > >>
> > > > >> * The community is overall healthy, with the aforementioned concerns around CI
> > > > >>   scalability. New contributors frequently take notice of the long build queue
> > > > >>   times when submitting pull requests.
> > > >>
>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Wes McKinney <we...@gmail.com>.
New draft

## Description:
The mission of Apache Arrow is the creation and maintenance of software related
to columnar in-memory processing and data interchange

## Issues:

* We are struggling with Continuous Integration scalability as the project has
  definitely outgrown what Travis CI and Appveyor can do for us. Some
  contributors have shown reluctance to submit patches they aren't sure about
  because they don't want to pile on the build queue. We are exploring
  alternative solutions such as Buildbot, Buildkite, and GitHub Actions to
  provide a path to migrate away from Travis CI / Appveyor. In our request to
  Infrastructure (INFRA-19217), some of us were alarmed to find that a CI/CD
  service like Buildkite may not be able to connect to the @apache GitHub
  account because it requires admin access to repository webhooks, though not
  the ability to modify source code. There are workarounds (building custom
  OAuth bots) that could enable us to use Buildkite, but it would require extra
  development and result in a less refined experience for community members.

## Membership Data:
* Apache Arrow was founded 2016-01-19 (4 years ago)
* There are currently 48 committers and 28 PMC members in this project.
* The Committer-to-PMC ratio is roughly 3:2.

Community changes, past quarter:
- Micah Kornfield was added to the PMC on 2019-08-21
- Sebastien Binet was added to the PMC on 2019-08-21
- Ben Kietzman was added as committer on 2019-09-07
- David Li was added as committer on 2019-08-30
- Kenta Murata was added as committer on 2019-09-05
- Neal Richardson was added as committer on 2019-09-05
- Praveen Kumar was added as committer on 2019-07-14

## Project Activity:

* The project has just made a 0.15.0 release.
* We are discussing ways to make the Arrow libraries as accessible as possible
  to downstream projects for minimal use cases while allowing the development
  of more comprehensive "standard libraries" with larger dependency stacks in
  the project
* We plan to make a 1.0.0 release as our next major release, at which time we
  will declare that the Arrow binary protocol is stable with forward and
  backward compatibility guarantees

## Community Health:

* The community is overall healthy, with the aforementioned concerns around CI
  scalability. New contributors frequently take notice of the long build queue
  times when submitting pull requests.

On Tue, Oct 8, 2019 at 8:58 AM Wes McKinney <we...@gmail.com> wrote:
>
> Yes, I agree with raising the issue to the board.
>
> On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > I agree.  Especially given that the constraints imposed by Infra don't
> > help solve the problem.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 08/10/2019 at 15:02, Uwe L. Korn wrote:
> > > I'm not sure what qualifies for "board attention" but it seems that CI is a critical problem in Apache projects, not just Arrow. Should we raise that?
> > >
> > > Uwe
> > >
> > > On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> > >> Here is a start for our Q3 board report
> > >>
> > >> ## Description:
> > >> The mission of Apache Arrow is the creation and maintenance of software related
> > >> to columnar in-memory processing and data interchange
> > >>
> > >> ## Issues:
> > >> There are no issues requiring board attention at this time
> > >>
> > >> ## Membership Data:
> > >> * Apache Arrow was founded 2016-01-19 (4 years ago)
> > >> * There are currently 48 committers and 28 PMC members in this project.
> > >> * The Committer-to-PMC ratio is roughly 3:2.
> > >>
> > >> Community changes, past quarter:
> > >> - Micah Kornfield was added to the PMC on 2019-08-21
> > >> - Sebastien Binet was added to the PMC on 2019-08-21
> > >> - Ben Kietzman was added as committer on 2019-09-07
> > >> - David Li was added as committer on 2019-08-30
> > >> - Kenta Murata was added as committer on 2019-09-05
> > >> - Neal Richardson was added as committer on 2019-09-05
> > >> - Praveen Kumar was added as committer on 2019-07-14
> > >>
> > >> ## Project Activity:
> > >>
> > >> * The project has just made a 0.15.0 release.
> > >> * We are discussing ways to make the Arrow libraries as accessible as possible
> > >>   to downstream projects for minimal use cases while allowing the development
> > >>   of more comprehensive "standard libraries" with larger dependency stacks in
> > >>   the project
> > >> * We plan to make a 1.0.0 release as our next major release, at which time we
> > >>   will declare that the Arrow binary protocol is stable with forward and
> > >>   backward compatibility guarantees
> > >> * We are struggling with Continuous Integration scalability as the project has
> > >>   definitely outgrown what Travis CI and Appveyor can do for us. We are
> > >>   exploring alternative solutions such as Buildbot, Buildkite (see
> > >>   INFRA-19217), and GitHub Actions to provide a path to migrate away from
> > >>   Travis CI / Appveyor
> > >>
> > >> ## Community Health:
> > >>
> > >> * The community is overall healthy, with the aforementioned concerns around CI
> > >>   scalability. New contributors frequently take notice of the long build queue
> > >>   times when submitting pull requests.
> > >>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Wes McKinney <we...@gmail.com>.
Yes, I agree with raising the issue to the board.

On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> I agree.  Especially given that the constraints imposed by Infra don't
> help solve the problem.
>
> Regards
>
> Antoine.
>
>
> On 08/10/2019 at 15:02, Uwe L. Korn wrote:
> > I'm not sure what qualifies for "board attention" but it seems that CI is a critical problem in Apache projects, not just Arrow. Should we raise that?
> >
> > Uwe
> >
> > On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> >> Here is a start for our Q3 board report
> >>
> >> ## Description:
> >> The mission of Apache Arrow is the creation and maintenance of software related
> >> to columnar in-memory processing and data interchange
> >>
> >> ## Issues:
> >> There are no issues requiring board attention at this time
> >>
> >> ## Membership Data:
> >> * Apache Arrow was founded 2016-01-19 (4 years ago)
> >> * There are currently 48 committers and 28 PMC members in this project.
> >> * The Committer-to-PMC ratio is roughly 3:2.
> >>
> >> Community changes, past quarter:
> >> - Micah Kornfield was added to the PMC on 2019-08-21
> >> - Sebastien Binet was added to the PMC on 2019-08-21
> >> - Ben Kietzman was added as committer on 2019-09-07
> >> - David Li was added as committer on 2019-08-30
> >> - Kenta Murata was added as committer on 2019-09-05
> >> - Neal Richardson was added as committer on 2019-09-05
> >> - Praveen Kumar was added as committer on 2019-07-14
> >>
> >> ## Project Activity:
> >>
> >> * The project has just made a 0.15.0 release.
> >> * We are discussing ways to make the Arrow libraries as accessible as possible
> >>   to downstream projects for minimal use cases while allowing the development
> >>   of more comprehensive "standard libraries" with larger dependency stacks in
> >>   the project
> >> * We plan to make a 1.0.0 release as our next major release, at which time we
> >>   will declare that the Arrow binary protocol is stable with forward and
> >>   backward compatibility guarantees
> >> * We are struggling with Continuous Integration scalability as the project has
> >>   definitely outgrown what Travis CI and Appveyor can do for us. We are
> >>   exploring alternative solutions such as Buildbot, Buildkite (see
> >>   INFRA-19217), and GitHub Actions to provide a path to migrate away from
> >>   Travis CI / Appveyor
> >>
> >> ## Community Health:
> >>
> >> * The community is overall healthy, with the aforementioned concerns around CI
> >>   scalability. New contributors frequently take notice of the long build queue
> >>   times when submitting pull requests.
> >>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by Antoine Pitrou <an...@python.org>.
I agree.  Especially given that the constraints imposed by Infra don't
help solve the problem.

Regards

Antoine.


On 08/10/2019 at 15:02, Uwe L. Korn wrote:
> I'm not sure what qualifies for "board attention" but it seems that CI is a critical problem in Apache projects, not just Arrow. Should we raise that?
> 
> Uwe
> 
> On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
>> Here is a start for our Q3 board report
>>
>> ## Description:
>> The mission of Apache Arrow is the creation and maintenance of software related
>> to columnar in-memory processing and data interchange
>>
>> ## Issues:
>> There are no issues requiring board attention at this time
>>
>> ## Membership Data:
>> * Apache Arrow was founded 2016-01-19 (4 years ago)
>> * There are currently 48 committers and 28 PMC members in this project.
>> * The Committer-to-PMC ratio is roughly 3:2.
>>
>> Community changes, past quarter:
>> - Micah Kornfield was added to the PMC on 2019-08-21
>> - Sebastien Binet was added to the PMC on 2019-08-21
>> - Ben Kietzman was added as committer on 2019-09-07
>> - David Li was added as committer on 2019-08-30
>> - Kenta Murata was added as committer on 2019-09-05
>> - Neal Richardson was added as committer on 2019-09-05
>> - Praveen Kumar was added as committer on 2019-07-14
>>
>> ## Project Activity:
>>
>> * The project has just made a 0.15.0 release.
>> * We are discussing ways to make the Arrow libraries as accessible as possible
>>   to downstream projects for minimal use cases while allowing the development
>>   of more comprehensive "standard libraries" with larger dependency stacks in
>>   the project
>> * We plan to make a 1.0.0 release as our next major release, at which time we
>>   will declare that the Arrow binary protocol is stable with forward and
>>   backward compatibility guarantees
>> * We are struggling with Continuous Integration scalability as the project has
>>   definitely outgrown what Travis CI and Appveyor can do for us. We are
>>   exploring alternative solutions such as Buildbot, Buildkite (see
>>   INFRA-19217), and GitHub Actions to provide a path to migrate away from
>>   Travis CI / Appveyor
>>
>> ## Community Health:
>>
>> * The community is overall healthy, with the aforementioned concerns around CI
>>   scalability. New contributors frequently take notice of the long build queue
>>   times when submitting pull requests.
>>

Re: [DRAFT] Apache Arrow Board Report - October 2019

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
I'm not sure what qualifies for "board attention" but it seems that CI is a critical problem in Apache projects, not just Arrow. Should we raise that?

Uwe

On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote:
> Here is a start for our Q3 board report
> 
> ## Description:
> The mission of Apache Arrow is the creation and maintenance of software related
> to columnar in-memory processing and data interchange
> 
> ## Issues:
> There are no issues requiring board attention at this time
> 
> ## Membership Data:
> * Apache Arrow was founded 2016-01-19 (4 years ago)
> * There are currently 48 committers and 28 PMC members in this project.
> * The Committer-to-PMC ratio is roughly 3:2.
> 
> Community changes, past quarter:
> - Micah Kornfield was added to the PMC on 2019-08-21
> - Sebastien Binet was added to the PMC on 2019-08-21
> - Ben Kietzman was added as committer on 2019-09-07
> - David Li was added as committer on 2019-08-30
> - Kenta Murata was added as committer on 2019-09-05
> - Neal Richardson was added as committer on 2019-09-05
> - Praveen Kumar was added as committer on 2019-07-14
> 
> ## Project Activity:
> 
> * The project has just made a 0.15.0 release.
> * We are discussing ways to make the Arrow libraries as accessible as possible
>   to downstream projects for minimal use cases while allowing the development
>   of more comprehensive "standard libraries" with larger dependency stacks in
>   the project
> * We plan to make a 1.0.0 release as our next major release, at which time we
>   will declare that the Arrow binary protocol is stable with forward and
>   backward compatibility guarantees
> * We are struggling with Continuous Integration scalability as the project has
>   definitely outgrown what Travis CI and Appveyor can do for us. We are
>   exploring alternative solutions such as Buildbot, Buildkite (see
>   INFRA-19217), and GitHub Actions to provide a path to migrate away from
>   Travis CI / Appveyor
> 
> ## Community Health:
> 
> * The community is overall healthy, with the aforementioned concerns around CI
>   scalability. New contributors frequently take notice of the long build queue
>   times when submitting pull requests.
>