Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2018/02/01 17:09:18 UTC

Arrow PR backlog: please help

hi folks,

We've had a rough couple of weeks in our PR queue due to various CI
issues causing a high incidence of build failures:

* Package dependency upgrades (Thrift -- this has been fixed)
* Failures possibly due to VM setting changes in Travis CI (memory
thrashing / VM timeouts, see ARROW-2062, ARROW-2071)
* apt flakiness (this is still ongoing, see ARROW-2021)

Meanwhile, at the moment, we have 37 open PRs
(https://github.com/apache/arrow/pulls). Some of these are stale and
need to either be reviewed, updated, or closed. We have many other PRs
that need to be rebased (builds should mostly pass now if rebased on
master) and/or reviewed. I've been doing the best I can to keep up
with the PR queue (and others have been reviewing and merging PRs,
too), but it's currently not enough to keep up, and there's a lot of
development work for the 0.9.0 milestone that I'd like to also be
doing.

The project is growing fast -- both in users and new developers. Just
on a single install path for the Python libraries, Arrow is being
installed _over 1000 times per day_
(https://anaconda.org/conda-forge/pyarrow) -- when you add up all the
install paths it is likely to be much more than that.

Reviews and help with maintaining PRs from the community, and especially
from other committers and PMC members, would be very useful right now
to get the project operating smoothly, with a steady stream of
high-quality patches making their way into master.

If there's anything else we can do to improve developer and community
productivity in Arrow right now, I'm open to ideas.

Thanks,
Wes

Re: Arrow PR backlog: please help

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
I see the problem with the brotli feedstock (the lib name changed from "_static" to "-static"). I'm going to add some cmake magic for that.
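Accepting both spellings of the static library name is one way to cope with a rename like this. The following is a minimal Python sketch of that lookup logic only, not the actual CMake change; the helper `find_static_lib`, the base name `brotlicommon` and the Windows `.lib` extension are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional


# Hypothetical helper: the real fix would live in Arrow's CMake modules.
def find_static_lib(lib_dir: str, base_name: str, ext: str = ".lib") -> Optional[Path]:
    """Return the static library under either naming scheme, or None."""
    # Try the newer "-static" spelling first, then fall back to the older "_static" one.
    for tag in ("-static", "_static"):
        candidate = Path(lib_dir) / f"{base_name}{tag}{ext}"
        if candidate.is_file():
            return candidate
    return None
```

Trying both names keeps the build working against both older and newer conda-forge packages while the rename propagates.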

Uwe

On Fri, Feb 2, 2018, at 6:29 PM, Wes McKinney wrote:
> Our master branch builds are back to failing, from what I can tell:
> 
> * Python tests hanging seemingly due to VM resource pressure
> (ARROW-2062 resolved only temporarily)
> * Windows toolchain issue related to Brotli package on conda-forge
> 
> It would be great to get the Python issues resolved today if possible.
> The Brotli issue is a bit concerning (packages were updated in the last 24
> hours, https://github.com/conda-forge/brotli-feedstock/commits/master).
> Is that on its way to being fixed, or does it need to be
> investigated?
> 
> Thanks
> Wes

Re: Arrow PR backlog: please help

Posted by Wes McKinney <we...@gmail.com>.
@Antoine, see this build
https://travis-ci.org/apache/arrow/jobs/336412446. See also ARROW-2071
(the test slowness and VM timeouts are likely closely related)

On Fri, Feb 2, 2018 at 1:00 PM, Antoine Pitrou <an...@python.org> wrote:
>
>
> On 02/02/2018 at 18:29, Wes McKinney wrote:
>> Our master branch builds are back to failing, from what I can tell:
>>
>> * Python tests hanging seemingly due to VM resource pressure
>> (ARROW-2062 resolved only temporarily)
>
> Can you link to an example?
>
> Regards
>
> Antoine.

Re: Arrow PR backlog: please help

Posted by Antoine Pitrou <an...@python.org>.

On 02/02/2018 at 18:29, Wes McKinney wrote:
> Our master branch builds are back to failing, from what I can tell:
> 
> * Python tests hanging seemingly due to VM resource pressure
> (ARROW-2062 resolved only temporarily)

Can you link to an example?

Regards

Antoine.

Re: Arrow PR backlog: please help

Posted by Wes McKinney <we...@gmail.com>.
Our master branch builds are back to failing, from what I can tell:

* Python tests hanging seemingly due to VM resource pressure
(ARROW-2062 resolved only temporarily)
* Windows toolchain issue related to Brotli package on conda-forge

It would be great to get the Python issues resolved today if possible.
The Brotli issue is a bit concerning (packages were updated in the last 24
hours, https://github.com/conda-forge/brotli-feedstock/commits/master).
Is that on its way to being fixed, or does it need to be
investigated?

Thanks
Wes

On Thu, Feb 1, 2018 at 4:19 PM, Phillip Cloud <cp...@gmail.com> wrote:
> JIRA-ized: https://issues.apache.org/jira/browse/INFRA-15964

Re: Arrow PR backlog: please help

Posted by Phillip Cloud <cp...@gmail.com>.
JIRA-ized: https://issues.apache.org/jira/browse/INFRA-15964

On Thu, Feb 1, 2018 at 3:59 PM Phillip Cloud <cp...@gmail.com> wrote:

> Ok, will do.

Re: Arrow PR backlog: please help

Posted by Phillip Cloud <cp...@gmail.com>.
Ok, will do.

On Thu, Feb 1, 2018 at 3:56 PM Wes McKinney <we...@gmail.com> wrote:

> You'll have to open an INFRA ticket on JIRA

Re: Arrow PR backlog: please help

Posted by Wes McKinney <we...@gmail.com>.
You'll have to open an INFRA ticket on JIRA

On Thu, Feb 1, 2018 at 3:53 PM, Phillip Cloud <cp...@gmail.com> wrote:
> I'll follow up with them and shoot an email over to see if we can use
> circle with gitbox repos.

Re: Arrow PR backlog: please help

Posted by Phillip Cloud <cp...@gmail.com>.
I'll follow up with them and shoot an email over to see if we can use
CircleCI with GitBox repos.

On Thu, Feb 1, 2018 at 3:47 PM Wes McKinney <we...@gmail.com> wrote:

> Does someone want to ask Infra about it? I haven't asked them since we
> migrated to GitBox

Re: Arrow PR backlog: please help

Posted by Wes McKinney <we...@gmail.com>.
Does someone want to ask Infra about it? I haven't asked them since we
migrated to GitBox

On Thu, Feb 1, 2018 at 2:15 PM, Uwe L. Korn <uw...@xhochy.com> wrote:
> CircleCI requires more permissions than Travis, and Apache Infra doesn't want to grant them. This might be different now that we have the GitBox setup instead of the previous Apache git mirroring.
>
>> On 01.02.2018 at 20:08, Phillip Cloud <cp...@gmail.com> wrote:
>>
>> What is the main barrier to getting CircleCI to work with Apache projects?

Re: Arrow PR backlog: please help

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
CircleCI requires more permissions than Travis, and Apache Infra doesn't want to grant them. This might be different now that we have the GitBox setup instead of the previous Apache git mirroring.

> On 01.02.2018 at 20:08, Phillip Cloud <cp...@gmail.com> wrote:
> 
> What is the main barrier to getting CircleCI to work with Apache projects?


Re: Arrow PR backlog: please help

Posted by Li Jin <ic...@gmail.com>.
I just took a look at the Java issues.

I have reviewed all of them to the extent I can; however, most of them
need input from other Java committers before they can be merged or move
forward.

Please let me know if there is anything more I can help with in reviewing
Java patches.

Li

On Thu, Feb 1, 2018 at 2:08 PM, Phillip Cloud <cp...@gmail.com> wrote:

> What is the main barrier to getting CircleCI to work with Apache projects?

Re: Arrow PR backlog: please help

Posted by Phillip Cloud <cp...@gmail.com>.
What is the main barrier to getting CircleCI to work with Apache projects?

Re: Arrow PR backlog: please help

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
I just went over a lot of open PRs and sadly I wasn't able to reduce the number of open ones significantly. Some of them are making slow progress and it might be worthwhile to jump in in a week; for now I would rather wait and let the original authors finish them so they get more involved in the project. Currently the CI issues are a main bottleneck for all of us: besides the long-running Python tests, we also spend a lot of time on the environment setup. This is typically something a Docker setup can really improve; sadly, Travis takes quite some time to pull the image we currently use for the manylinux1 build. I'll first have a look at improving that, and if the download times get better, we might want to move some things into Docker (sadly, CircleCI and Apache projects still don't work together).

Also, I think one confusing thing is that we have separate documentation for Python and C++. This is something I'm going to work on once I have some time. The two implementations are very tightly bound together, and a lot of what applies to one language also applies to the other.

Uwe 