Posted to dev@arrow.apache.org by Joris Van den Bossche <jo...@gmail.com> on 2020/01/16 09:42:16 UTC

PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

The Spark integration build has started to fail, with the following test
error:

======================================================================
ERROR: test_toPandas_batch_order (pyspark.sql.tests.test_arrow.EncryptionArrowTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/spark/python/pyspark/sql/tests/test_arrow.py", line 422, in test_toPandas_batch_order
    run_test(*case)
  File "/spark/python/pyspark/sql/tests/test_arrow.py", line 409, in run_test
    pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
  File "/spark/python/pyspark/sql/tests/test_arrow.py", line 152, in _toPandas_arrow_toggle
    pdf_arrow = df.toPandas()
  File "/spark/python/pyspark/sql/pandas/conversion.py", line 115, in toPandas
    return _check_dataframe_localize_timestamps(pdf, timezone)
  File "/spark/python/pyspark/sql/pandas/types.py", line 180, in _check_dataframe_localize_timestamps
    pdf[column] = _check_series_localize_timestamps(series, timezone)
  File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py", line 3487, in __setitem__
    self._set_item(key, value)
  File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py", line 3565, in _set_item
    NDFrame._set_item(self, key, value)
  File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/generic.py", line 3381, in _set_item
    self._data.set(key, value)
  File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1090, in set
    blk.set(blk_locs, value_getitem(val_locs))
  File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 380, in set
    self.values[locs] = values
ValueError: assignment destination is read-only


The error comes from a test that converts from Spark to Arrow to pandas
(so it calls pyarrow.Table.to_pandas, here:
<https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/conversion.py#L111-L115>).
On the resulting DataFrame, it then iterates through all columns,
potentially fixing timezones, and writes each column back into the
DataFrame (here:
<https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/types.py#L179-L181>).

Since the error is about a read-only destination, it might be related to
the zero-copy behaviour of to_pandas, and thus to the refactor of the
arrow->pandas conversion that landed yesterday
(https://github.com/apache/arrow/pull/6067, which says it changed to do
zero-copy for 1-column blocks if possible).
I am not sure whether something should be fixed in pyarrow for this, but
the obvious thing that pyspark can do is specify that it doesn't want
zero-copy.
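
As a rough illustration of why a zero-copy result can be read-only, here
is a standard-library analogy. It is only a sketch under stated
assumptions, not pyarrow's code path: `bytes` stands in for an immutable
Arrow buffer, `memoryview` for a zero-copy view of it, and `try_write` is
a hypothetical helper. Taking a copy is the analogue of pyspark asking
for no zero-copy.

```python
# Analogy only: bytes/memoryview stand in for an immutable Arrow buffer
# and a zero-copy view of it. None of this is pyarrow or pandas API.

source = bytes([1, 2, 3, 4])   # immutable buffer (stand-in for Arrow memory)
view = memoryview(source)      # zero-copy: shares memory with `source`


def try_write(buf):
    """Attempt an in-place write and report whether it succeeded."""
    try:
        buf[0] = 99
        return True
    except TypeError:
        # memoryview raises TypeError for read-only memory; NumPy raises
        # ValueError for the same situation, as in the traceback above.
        return False


zero_copy_writable = try_write(view)   # write into the shared view
copied = bytearray(view)               # explicit copy that owns its memory
copy_writable = try_write(copied)      # write into the private copy
```

The view shares memory with the immutable source, so the write is
refused; the copy owns its memory, so the write goes through and the
source stays untouched.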

Joris

On Wed, 15 Jan 2020 at 14:32, Crossbow <cr...@ursalabs.org> wrote:

>
> Arrow Build Report for Job nightly-2020-01-15-0
>
> All tasks:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0
>
> Failed Tasks:
> - gandiva-jar-osx:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-osx
> - test-conda-python-3.7-spark-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-spark-master
> - wheel-manylinux2014-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp35m
>
> Succeeded Tasks:
> - centos-6:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-6
> - centos-7:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-7
> - centos-8:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-8
> - conda-linux-gcc-py27:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py27
> - conda-linux-gcc-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py37
> - conda-linux-gcc-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py38
> - conda-osx-clang-py27:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py27
> - conda-osx-clang-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py37
> - conda-osx-clang-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py38
> - conda-win-vs2015-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py37
> - conda-win-vs2015-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py38
> - debian-buster:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-buster
> - debian-stretch:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-stretch
> - gandiva-jar-trusty:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-trusty
> - homebrew-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-homebrew-cpp
> - macos-r-autobrew:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-macos-r-autobrew
> - test-conda-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-cpp
> - test-conda-python-2.7-pandas-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7-pandas-latest
> - test-conda-python-2.7:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7
> - test-conda-python-3.6:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.6
> - test-conda-python-3.7-dask-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-dask-latest
> - test-conda-python-3.7-hdfs-2.9.2:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-hdfs-2.9.2
> - test-conda-python-3.7-pandas-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-latest
> - test-conda-python-3.7-pandas-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-master
> - test-conda-python-3.7-turbodbc-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-latest
> - test-conda-python-3.7-turbodbc-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-master
> - test-conda-python-3.7:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7
> - test-conda-python-3.8-dask-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-dask-master
> - test-conda-python-3.8-pandas-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-pandas-latest
> - test-conda-r-3.6:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-r-3.6
> - test-debian-10-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-cpp
> - test-debian-10-go-1.12:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-go-1.12
> - test-debian-10-python-3:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-python-3
> - test-debian-c-glib:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-c-glib
> - test-debian-ruby:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-ruby
> - test-fedora-29-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-fedora-29-cpp
> - test-fedora-29-python-3:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-fedora-29-python-3
> - test-r-rhub-debian-gcc-devel:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rhub-debian-gcc-devel
> - test-r-rhub-ubuntu-gcc-release:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rhub-ubuntu-gcc-release
> - test-r-rstudio-r-base-3.6-bionic:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-bionic
> - test-r-rstudio-r-base-3.6-centos6:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-centos6
> - test-r-rstudio-r-base-3.6-opensuse15:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-opensuse15
> - test-r-rstudio-r-base-3.6-opensuse42:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-opensuse42
> - test-ubuntu-16.04-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-16.04-cpp
> - test-ubuntu-18.04-cpp-cmake32:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-cmake32
> - test-ubuntu-18.04-cpp-release:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-release
> - test-ubuntu-18.04-cpp-static:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-static
> - test-ubuntu-18.04-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp
> - test-ubuntu-18.04-docs:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-docs
> - test-ubuntu-18.04-python-3:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-python-3
> - test-ubuntu-18.04-r-sanitizer:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-r-sanitizer
> - test-ubuntu-c-glib:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-c-glib
> - test-ubuntu-fuzzit-fuzzing:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-fuzzit-fuzzing
> - test-ubuntu-fuzzit-regression:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-fuzzit-regression
> - test-ubuntu-ruby:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-ruby
> - ubuntu-bionic:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-bionic
> - ubuntu-disco:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-disco
> - ubuntu-xenial:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-xenial
> - wheel-manylinux1-cp27m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp27m
> - wheel-manylinux1-cp27mu:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp27mu
> - wheel-manylinux1-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp35m
> - wheel-manylinux1-cp36m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp36m
> - wheel-manylinux1-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp37m
> - wheel-manylinux1-cp38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp38
> - wheel-manylinux2010-cp27m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp27m
> - wheel-manylinux2010-cp27mu:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp27mu
> - wheel-manylinux2010-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp35m
> - wheel-manylinux2010-cp36m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp36m
> - wheel-manylinux2010-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp37m
> - wheel-manylinux2010-cp38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp38
> - wheel-manylinux2014-cp36m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp36m
> - wheel-manylinux2014-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp37m
> - wheel-manylinux2014-cp38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp38
> - wheel-osx-cp27m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp27m
> - wheel-osx-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp35m
> - wheel-osx-cp36m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp36m
> - wheel-osx-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp37m
> - wheel-osx-cp38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp38
> - wheel-win-cp36m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp36m
> - wheel-win-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp37m
> - wheel-win-cp38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp38
>

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

Posted by Bryan Cutler <cu...@gmail.com>.
Thanks Joris for clearing that up! It's correct that pyspark will allow the
user to do operations on the resulting DataFrame, so it doesn't sound like
I should set `split_blocks=True` in the conversion. You're right that the
unnecessary assignments can easily be avoided for non-timestamp columns, so
that will be a big help. I'll link this discussion to the JIRA in case it
could help others. Thanks again.

Bryan

On Fri, Jan 24, 2020 at 2:10 AM Joris Van den Bossche <
jorisvandenbossche@gmail.com> wrote:

> Hi Bryan,
>
> For the case where the column is not a timestamp and was not modified: I
> don't think assigning columns in a loop like that will copy the full
> dataframe. But it still does work (it copies the data for that column
> into the 2D block array that holds it), which I think can easily be
> avoided by only assigning back when the column was actually modified
> (e.g. by moving the is_datetime64tz_dtype check inline into the loop over
> all columns, so that you only write back when the column actually holds
> tz-aware data).
>
> Further, even if you do the above to avoid writing back to the dataframe
> when not needed, I am not sure you should directly try to use the new
> zero-copy feature of the Table.to_pandas conversion (with
> split_blocks=True). It depends very much on what happens next with the
> converted dataframe. Once you do some operations in pandas, those split
> blocks will get combined (resulting in a memory copy at that point), and
> it also means you can't modify the dataframe (if this dataframe is used
> in Python UDFs, that might limit what can be done in those UDFs. I'm just
> guessing here; I don't know the pyspark code well enough).
>
> Joris

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

Posted by Joris Van den Bossche <jo...@gmail.com>.
Hi Bryan,

For the case where the column is not a timestamp and was not modified: I
don't think assigning columns in a loop like that will copy the full
dataframe. But it still does work (it copies the data for that column
into the 2D block array that holds it), which I think can easily be
avoided by only assigning back when the column was actually modified
(e.g. by moving the is_datetime64tz_dtype check inline into the loop over
all columns, so that you only write back when the column actually holds
tz-aware data).
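
A minimal sketch of that idea in plain Python, with a dict of lists
standing in for the DataFrame so that it runs without pandas. The names
`is_tz_aware_column` and `localize_tz_columns` are made up for
illustration and are not pyspark functions; in pyspark the check would be
is_datetime64tz_dtype on the column's dtype, and the fixed UTC+1 offset
is just an example timezone.

```python
from datetime import datetime, timedelta, timezone


def is_tz_aware_column(values):
    """Hypothetical stand-in for pandas' is_datetime64tz_dtype check."""
    return bool(values) and all(
        isinstance(v, datetime) and v.tzinfo is not None for v in values
    )


def localize_tz_columns(columns, target_tz):
    """Convert tz-aware columns to naive local time, in place.

    `columns` maps column name -> list of values (a stand-in for a
    DataFrame). Returns the names of the columns that were written back.
    """
    written = []
    for name, values in columns.items():
        if not is_tz_aware_column(values):
            continue  # unmodified column: skip the write-back entirely
        columns[name] = [
            v.astimezone(target_tz).replace(tzinfo=None) for v in values
        ]
        written.append(name)
    return written


frame = {
    "id": [1, 2],
    "ts": [
        datetime(2020, 1, 16, 9, 42, 16, tzinfo=timezone.utc),
        datetime(2020, 1, 24, 10, 10, 0, tzinfo=timezone.utc),
    ],
}
cet = timezone(timedelta(hours=1))  # fixed UTC+1 offset, for illustration
written = localize_tz_columns(frame, cet)
```

Only the "ts" column is written back; "id" is never assigned, so a
read-only or zero-copy column of that kind would not be touched.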

Further, even if you do the above to avoid writing back to the dataframe
when not needed, I am not sure you should directly try to use the new
zero-copy feature of the Table.to_pandas conversion (with
split_blocks=True). It depends very much on what happens next with the
converted dataframe. Once you do some operations in pandas, those split
blocks will get combined (resulting in a memory copy at that point), and
it also means you can't modify the dataframe (if this dataframe is used
in Python UDFs, that might limit what can be done in those UDFs. I'm just
guessing here; I don't know the pyspark code well enough).

Joris


On Thu, 23 Jan 2020 at 21:03, Bryan Cutler <cu...@gmail.com> wrote:

> Thanks for investigating this and for the quick fix, Joris and Wes! I
> just have a couple of questions about the behavior observed here. The
> pyspark code either assigns the same series back to the pandas.DataFrame
> or makes some modifications if it is a timestamp. In the case where there
> are no timestamps, is this potentially making extra copies, or will it be
> unable to take advantage of the new zero-copy features in pyarrow? For
> the case of timestamp columns that need to be modified, is there a more
> efficient way to create a new dataframe with copies of only the modified
> series? Thanks!
>
> Bryan
>
> On Thu, Jan 16, 2020 at 11:48 PM Joris Van den Bossche <
> jorisvandenbossche@gmail.com> wrote:
>
> > That sounds like a good solution. Having the zero-copy behavior depend
> > on whether you have only one column of a certain type might lead to
> > surprising results. To avoid yet another keyword, only doing it when
> > split_blocks=True sounds good to me (in practice, that's also when it
> > will mostly happen, except for very narrow dataframes with only a few
> > columns).
> >
> > Joris
> >
> > On Thu, 16 Jan 2020 at 22:44, Wes McKinney <we...@gmail.com> wrote:
> >
> > > hi Joris,
> > >
> > > Thanks for investigating this. It seems there were some unintended
> > > consequences of the zero-copy optimizations from ARROW-3789. Another
> > > way forward might be to "opt in" to this behavior, or to only do the
> > > zero copy optimizations when split_blocks=True. What do you think?
> > >
> > > - Wes

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

Posted by Bryan Cutler <cu...@gmail.com>.
Thanks for investigating this and for the quick fix, Joris and Wes! I just
have a couple of questions about the behavior observed here. The pyspark
code either assigns the same series back to the pandas.DataFrame or makes
some modifications if it is a timestamp. In the case where there are no
timestamps, is this potentially making extra copies, or will it be unable
to take advantage of the new zero-copy features in pyarrow? For the case of
timestamp columns that need to be modified, is there a more efficient way
to create a new dataframe with copies of only the modified series? Thanks!

Bryan

On Thu, Jan 16, 2020 at 11:48 PM Joris Van den Bossche <
jorisvandenbossche@gmail.com> wrote:

> That sounds like a good solution. Having the zero-copy behavior depend
> on whether you have only one column of a certain type might lead to
> surprising results. To avoid yet another keyword, only doing it when
> split_blocks=True sounds good to me (in practice, that's also when it will
> mostly happen, except for very narrow dataframes with only a few columns).
>
> Joris
>
> On Thu, 16 Jan 2020 at 22:44, Wes McKinney <we...@gmail.com> wrote:
>
> > hi Joris,
> >
> > Thanks for investigating this. It seems there were some unintended
> > consequences of the zero-copy optimizations from ARROW-3789. Another
> > way forward might be to "opt in" to this behavior, or to only do the
> > zero copy optimizations when split_blocks=True. What do you think?
> >
> > - Wes
> >
> > On Thu, Jan 16, 2020 at 3:42 AM Joris Van den Bossche
> > <jo...@gmail.com> wrote:
> > >
> > > So the spark integration build started to fail, and with the following test
> > > error:
> > >
> > > ======================================================================
> > > ERROR: test_toPandas_batch_order
> > > (pyspark.sql.tests.test_arrow.EncryptionArrowTests)
> > > ----------------------------------------------------------------------
> > > Traceback (most recent call last):
> > >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 422, in
> > > test_toPandas_batch_order
> > >     run_test(*case)
> > >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 409, in run_test
> > >     pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
> > >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 152, in
> > > _toPandas_arrow_toggle
> > >     pdf_arrow = df.toPandas()
> > >   File "/spark/python/pyspark/sql/pandas/conversion.py", line 115, in toPandas
> > >     return _check_dataframe_localize_timestamps(pdf, timezone)
> > >   File "/spark/python/pyspark/sql/pandas/types.py", line 180, in
> > > _check_dataframe_localize_timestamps
> > >     pdf[column] = _check_series_localize_timestamps(series, timezone)
> > >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> > > line 3487, in __setitem__
> > >     self._set_item(key, value)
> > >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> > > line 3565, in _set_item
> > >     NDFrame._set_item(self, key, value)
> > >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/generic.py",
> > > line 3381, in _set_item
> > >     self._data.set(key, value)
> > >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/managers.py",
> > > line 1090, in set
> > >     blk.set(blk_locs, value_getitem(val_locs))
> > >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/blocks.py",
> > > line 380, in set
> > >     self.values[locs] = values
> > > ValueError: assignment destination is read-only
> > >
> > >
> > > It's from a test that is doing conversions from spark to arrow to pandas
> > > (so calling pyarrow.Table.to_pandas here
> > > <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/conversion.py#L111-L115>),
> > > and on the resulting DataFrame, it is iterating through all columns,
> > > potentially fixing timezones, and writing each column back into the
> > > DataFrame (here
> > > <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/types.py#L179-L181>
> > > ).
> > >
> > > Since it is giving an error about read-only, it might be related to
> > > zero-copy behaviour of to_pandas, and thus might be related to the refactor
> > > of the arrow->pandas conversion that landed yesterday (
> > > https://github.com/apache/arrow/pull/6067, it says it changed to do
> > > zero-copy for 1-column blocks if possible).
> > > I am not sure if something should be fixed in pyarrow for this, but the
> > > obvious thing that pyspark can do is specify they don't want zero-copy.
> > > Joris
> > >
> > > On Wed, 15 Jan 2020 at 14:32, Crossbow <cr...@ursalabs.org> wrote:
> > >
> >
>

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

Posted by Joris Van den Bossche <jo...@gmail.com>.
That sounds like a good solution. Having the zero-copy behavior depend
on whether you have only 1 column of a certain type or not might lead to
surprising results. To avoid yet another keyword, only doing it when
split_blocks=True sounds good to me (in practice, that's also when it will
mostly happen, except for very narrow dataframes with only a few columns).

Joris

On Thu, 16 Jan 2020 at 22:44, Wes McKinney <we...@gmail.com> wrote:

> hi Joris,
>
> Thanks for investigating this. It seems there were some unintended
> consequences of the zero-copy optimizations from ARROW-3789. Another
> way forward might be to "opt in" to this behavior, or to only do the
> zero copy optimizations when split_blocks=True. What do you think?
>
> - Wes
>
> On Thu, Jan 16, 2020 at 3:42 AM Joris Van den Bossche
> <jo...@gmail.com> wrote:
> >
> > So the spark integration build started to fail, and with the following test
> > error:
> >
> > ======================================================================
> > ERROR: test_toPandas_batch_order
> > (pyspark.sql.tests.test_arrow.EncryptionArrowTests)
> > ----------------------------------------------------------------------
> > Traceback (most recent call last):
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 422, in
> > test_toPandas_batch_order
> >     run_test(*case)
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 409, in run_test
> >     pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 152, in
> > _toPandas_arrow_toggle
> >     pdf_arrow = df.toPandas()
> >   File "/spark/python/pyspark/sql/pandas/conversion.py", line 115, in toPandas
> >     return _check_dataframe_localize_timestamps(pdf, timezone)
> >   File "/spark/python/pyspark/sql/pandas/types.py", line 180, in
> > _check_dataframe_localize_timestamps
> >     pdf[column] = _check_series_localize_timestamps(series, timezone)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> > line 3487, in __setitem__
> >     self._set_item(key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> > line 3565, in _set_item
> >     NDFrame._set_item(self, key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/generic.py",
> > line 3381, in _set_item
> >     self._data.set(key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/managers.py",
> > line 1090, in set
> >     blk.set(blk_locs, value_getitem(val_locs))
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/blocks.py",
> > line 380, in set
> >     self.values[locs] = values
> > ValueError: assignment destination is read-only
> >
> >
> > It's from a test that is doing conversions from spark to arrow to pandas
> > (so calling pyarrow.Table.to_pandas here
> > <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/conversion.py#L111-L115>),
> > and on the resulting DataFrame, it is iterating through all columns,
> > potentially fixing timezones, and writing each column back into the
> > DataFrame (here
> > <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/types.py#L179-L181>
> > ).
> >
> > Since it is giving an error about read-only, it might be related to
> > zero-copy behaviour of to_pandas, and thus might be related to the refactor
> > of the arrow->pandas conversion that landed yesterday (
> > https://github.com/apache/arrow/pull/6067, it says it changed to do
> > zero-copy for 1-column blocks if possible).
> > I am not sure if something should be fixed in pyarrow for this, but the
> > obvious thing that pyspark can do is specify they don't want zero-copy.
> >
> > Joris
> >
> > On Wed, 15 Jan 2020 at 14:32, Crossbow <cr...@ursalabs.org> wrote:
> >
>

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

Posted by Wes McKinney <we...@gmail.com>.
I created https://issues.apache.org/jira/browse/ARROW-7596 and made it
a blocker for 0.16.0 so this does not get lost in the shuffle.

On Thu, Jan 16, 2020 at 3:43 PM Wes McKinney <we...@gmail.com> wrote:
>
> hi Joris,
>
> Thanks for investigating this. It seems there were some unintended
> consequences of the zero-copy optimizations from ARROW-3789. Another
> way forward might be to "opt in" to this behavior, or to only do the
> zero copy optimizations when split_blocks=True. What do you think?
>
> - Wes
>
> On Thu, Jan 16, 2020 at 3:42 AM Joris Van den Bossche
> <jo...@gmail.com> wrote:
> >
> > So the spark integration build started to fail, and with the following test
> > error:
> >
> > ======================================================================
> > ERROR: test_toPandas_batch_order
> > (pyspark.sql.tests.test_arrow.EncryptionArrowTests)
> > ----------------------------------------------------------------------
> > Traceback (most recent call last):
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 422, in
> > test_toPandas_batch_order
> >     run_test(*case)
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 409, in run_test
> >     pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 152, in
> > _toPandas_arrow_toggle
> >     pdf_arrow = df.toPandas()
> >   File "/spark/python/pyspark/sql/pandas/conversion.py", line 115, in toPandas
> >     return _check_dataframe_localize_timestamps(pdf, timezone)
> >   File "/spark/python/pyspark/sql/pandas/types.py", line 180, in
> > _check_dataframe_localize_timestamps
> >     pdf[column] = _check_series_localize_timestamps(series, timezone)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> > line 3487, in __setitem__
> >     self._set_item(key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> > line 3565, in _set_item
> >     NDFrame._set_item(self, key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/generic.py",
> > line 3381, in _set_item
> >     self._data.set(key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/managers.py",
> > line 1090, in set
> >     blk.set(blk_locs, value_getitem(val_locs))
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/blocks.py",
> > line 380, in set
> >     self.values[locs] = values
> > ValueError: assignment destination is read-only
> >
> >
> > It's from a test that is doing conversions from spark to arrow to pandas
> > (so calling pyarrow.Table.to_pandas here
> > <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/conversion.py#L111-L115>),
> > and on the resulting DataFrame, it is iterating through all columns,
> > potentially fixing timezones, and writing each column back into the
> > DataFrame (here
> > <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/types.py#L179-L181>
> > ).
> >
> > Since it is giving an error about read-only, it might be related to
> > zero-copy behaviour of to_pandas, and thus might be related to the refactor
> > of the arrow->pandas conversion that landed yesterday (
> > https://github.com/apache/arrow/pull/6067, it says it changed to do
> > zero-copy for 1-column blocks if possible).
> > I am not sure if something should be fixed in pyarrow for this, but the
> > obvious thing that pyspark can do is specify they don't want zero-copy.
> >
> > Joris
> >
> > On Wed, 15 Jan 2020 at 14:32, Crossbow <cr...@ursalabs.org> wrote:
> >
> > >
> > > Arrow Build Report for Job nightly-2020-01-15-0
> > >
> > > All tasks:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0
> > >
> > > Failed Tasks:
> > > - gandiva-jar-osx:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-osx
> > > - test-conda-python-3.7-spark-master:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-spark-master
> > > - wheel-manylinux2014-cp35m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp35m
> > >
> > > Succeeded Tasks:
> > > - centos-6:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-6
> > > - centos-7:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-7
> > > - centos-8:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-8
> > > - conda-linux-gcc-py27:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py27
> > > - conda-linux-gcc-py36:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py36
> > > - conda-linux-gcc-py37:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py37
> > > - conda-linux-gcc-py38:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py38
> > > - conda-osx-clang-py27:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py27
> > > - conda-osx-clang-py36:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py36
> > > - conda-osx-clang-py37:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py37
> > > - conda-osx-clang-py38:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py38
> > > - conda-win-vs2015-py36:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py36
> > > - conda-win-vs2015-py37:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py37
> > > - conda-win-vs2015-py38:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py38
> > > - debian-buster:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-buster
> > > - debian-stretch:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-stretch
> > > - gandiva-jar-trusty:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-trusty
> > > - homebrew-cpp:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-homebrew-cpp
> > > - macos-r-autobrew:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-macos-r-autobrew
> > > - test-conda-cpp:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-cpp
> > > - test-conda-python-2.7-pandas-latest:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7-pandas-latest
> > > - test-conda-python-2.7:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7
> > > - test-conda-python-3.6:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.6
> > > - test-conda-python-3.7-dask-latest:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-dask-latest
> > > - test-conda-python-3.7-hdfs-2.9.2:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-hdfs-2.9.2
> > > - test-conda-python-3.7-pandas-latest:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-latest
> > > - test-conda-python-3.7-pandas-master:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-master
> > > - test-conda-python-3.7-turbodbc-latest:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-latest
> > > - test-conda-python-3.7-turbodbc-master:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-master
> > > - test-conda-python-3.7:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7
> > > - test-conda-python-3.8-dask-master:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-dask-master
> > > - test-conda-python-3.8-pandas-latest:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-pandas-latest
> > > - test-conda-r-3.6:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-r-3.6
> > > - test-debian-10-cpp:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-cpp
> > > - test-debian-10-go-1.12:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-go-1.12
> > > - test-debian-10-python-3:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-python-3
> > > - test-debian-c-glib:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-c-glib
> > > - test-debian-ruby:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-ruby
> > > - test-fedora-29-cpp:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-fedora-29-cpp
> > > - test-fedora-29-python-3:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-fedora-29-python-3
> > > - test-r-rhub-debian-gcc-devel:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rhub-debian-gcc-devel
> > > - test-r-rhub-ubuntu-gcc-release:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rhub-ubuntu-gcc-release
> > > - test-r-rstudio-r-base-3.6-bionic:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-bionic
> > > - test-r-rstudio-r-base-3.6-centos6:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-centos6
> > > - test-r-rstudio-r-base-3.6-opensuse15:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-opensuse15
> > > - test-r-rstudio-r-base-3.6-opensuse42:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-opensuse42
> > > - test-ubuntu-16.04-cpp:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-16.04-cpp
> > > - test-ubuntu-18.04-cpp-cmake32:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-cmake32
> > > - test-ubuntu-18.04-cpp-release:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-release
> > > - test-ubuntu-18.04-cpp-static:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-static
> > > - test-ubuntu-18.04-cpp:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp
> > > - test-ubuntu-18.04-docs:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-docs
> > > - test-ubuntu-18.04-python-3:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-python-3
> > > - test-ubuntu-18.04-r-sanitizer:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-r-sanitizer
> > > - test-ubuntu-c-glib:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-c-glib
> > > - test-ubuntu-fuzzit-fuzzing:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-fuzzit-fuzzing
> > > - test-ubuntu-fuzzit-regression:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-fuzzit-regression
> > > - test-ubuntu-ruby:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-ruby
> > > - ubuntu-bionic:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-bionic
> > > - ubuntu-disco:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-disco
> > > - ubuntu-xenial:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-xenial
> > > - wheel-manylinux1-cp27m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp27m
> > > - wheel-manylinux1-cp27mu:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp27mu
> > > - wheel-manylinux1-cp35m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp35m
> > > - wheel-manylinux1-cp36m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp36m
> > > - wheel-manylinux1-cp37m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp37m
> > > - wheel-manylinux1-cp38:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp38
> > > - wheel-manylinux2010-cp27m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp27m
> > > - wheel-manylinux2010-cp27mu:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp27mu
> > > - wheel-manylinux2010-cp35m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp35m
> > > - wheel-manylinux2010-cp36m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp36m
> > > - wheel-manylinux2010-cp37m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp37m
> > > - wheel-manylinux2010-cp38:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp38
> > > - wheel-manylinux2014-cp36m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp36m
> > > - wheel-manylinux2014-cp37m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp37m
> > > - wheel-manylinux2014-cp38:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp38
> > > - wheel-osx-cp27m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp27m
> > > - wheel-osx-cp35m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp35m
> > > - wheel-osx-cp36m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp36m
> > > - wheel-osx-cp37m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp37m
> > > - wheel-osx-cp38:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp38
> > > - wheel-win-cp36m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp36m
> > > - wheel-win-cp37m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp37m
> > > - wheel-win-cp38:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp38
> > >

Re: PySpark failure [RE: [NIGHTLY] Arrow Build Report for Job nightly-2020-01-15-0]

Posted by Wes McKinney <we...@gmail.com>.
hi Joris,

Thanks for investigating this. It seems there were some unintended
consequences of the zero-copy optimizations from ARROW-3789. Another
way forward might be to "opt in" to this behavior, or to only do the
zero-copy optimizations when split_blocks=True. What do you think?
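
For readers unfamiliar with why zero-copy can surface as a read-only error, the
effect can be reproduced with NumPy alone. This is only an analogy, not the
pyarrow conversion code path: an array that views immutable memory without
copying is flagged as not writeable, and assigning into it raises the same
ValueError seen in the traceback. The variable names are illustrative.

```python
import numpy as np

# A bytes object is immutable, so an array that views it without copying
# is marked read-only by NumPy.
raw = np.arange(4, dtype="int64").tobytes()
view = np.frombuffer(raw, dtype="int64")

try:
    view[0] = 10
    error = None
except ValueError as exc:
    # NumPy refuses the write because the array does not own writable memory.
    error = str(exc)

# An explicit copy owns its memory and can be modified freely,
# leaving the original view unchanged.
writable = view.copy()
writable[0] = 10
```

Making the zero-copy path opt-in means callers that mutate the result keep
getting writable data by default, at the cost of a copy.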

- Wes

On Thu, Jan 16, 2020 at 3:42 AM Joris Van den Bossche
<jo...@gmail.com> wrote:
>
> So the spark integration build started to fail, and with the following test
> error:
>
> ======================================================================
> ERROR: test_toPandas_batch_order
> (pyspark.sql.tests.test_arrow.EncryptionArrowTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 422, in
> test_toPandas_batch_order
>     run_test(*case)
>   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 409, in run_test
>     pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
>   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 152, in
> _toPandas_arrow_toggle
>     pdf_arrow = df.toPandas()
>   File "/spark/python/pyspark/sql/pandas/conversion.py", line 115, in toPandas
>     return _check_dataframe_localize_timestamps(pdf, timezone)
>   File "/spark/python/pyspark/sql/pandas/types.py", line 180, in
> _check_dataframe_localize_timestamps
>     pdf[column] = _check_series_localize_timestamps(series, timezone)
>   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> line 3487, in __setitem__
>     self._set_item(key, value)
>   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py",
> line 3565, in _set_item
>     NDFrame._set_item(self, key, value)
>   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/generic.py",
> line 3381, in _set_item
>     self._data.set(key, value)
>   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/managers.py",
> line 1090, in set
>     blk.set(blk_locs, value_getitem(val_locs))
>   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/blocks.py",
> line 380, in set
>     self.values[locs] = values
> ValueError: assignment destination is read-only
>
>
> It's from a test that is doing conversions from spark to arrow to pandas
> (so calling pyarrow.Table.to_pandas here
> <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/conversion.py#L111-L115>),
> and on the resulting DataFrame, it is iterating through all columns,
> potentially fixing timezones, and writing each column back into the
> DataFrame (here
> <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/types.py#L179-L181>
> ).
>
> Since it is giving an error about read-only, it might be related to
> zero-copy behaviour of to_pandas, and thus might be related to the refactor
> of the arrow->pandas conversion that landed yesterday (
> https://github.com/apache/arrow/pull/6067, it says it changed to do
> zero-copy for 1-column blocks if possible).
> I am not sure if something should be fixed in pyarrow for this, but the
> obvious thing that pyspark can do is specify they don't want zero-copy.
>
> Joris
>
> On Wed, 15 Jan 2020 at 14:32, Crossbow <cr...@ursalabs.org> wrote:
>
> >
> > Arrow Build Report for Job nightly-2020-01-15-0
> >
> > All tasks:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0
> >
> > Failed Tasks:
> > - gandiva-jar-osx:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-osx
> > - test-conda-python-3.7-spark-master:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-spark-master
> > - wheel-manylinux2014-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp35m
> >
> > Succeeded Tasks:
> > - centos-6:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-6
> > - centos-7:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-7
> > - centos-8:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-8
> > - conda-linux-gcc-py27:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py27
> > - conda-linux-gcc-py36:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py36
> > - conda-linux-gcc-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py37
> > - conda-linux-gcc-py38:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py38
> > - conda-osx-clang-py27:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py27
> > - conda-osx-clang-py36:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py36
> > - conda-osx-clang-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py37
> > - conda-osx-clang-py38:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py38
> > - conda-win-vs2015-py36:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py36
> > - conda-win-vs2015-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py37
> > - conda-win-vs2015-py38:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py38
> > - debian-buster:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-buster
> > - debian-stretch:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-stretch
> > - gandiva-jar-trusty:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-trusty
> > - homebrew-cpp:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-homebrew-cpp
> > - macos-r-autobrew:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-macos-r-autobrew
> > - test-conda-cpp:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-cpp
> > - test-conda-python-2.7-pandas-latest:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7-pandas-latest
> > - test-conda-python-2.7:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7
> > - test-conda-python-3.6:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.6
> > - test-conda-python-3.7-dask-latest:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-dask-latest
> > - test-conda-python-3.7-hdfs-2.9.2:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-hdfs-2.9.2
> > - test-conda-python-3.7-pandas-latest:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-latest
> > - test-conda-python-3.7-pandas-master:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-master
> > - test-conda-python-3.7-turbodbc-latest:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-latest
> > - test-conda-python-3.7-turbodbc-master:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-master
> > - test-conda-python-3.7:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7
> > - test-conda-python-3.8-dask-master:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-dask-master
> > - test-conda-python-3.8-pandas-latest:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-pandas-latest
> > - test-conda-r-3.6:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-r-3.6
> > - test-debian-10-cpp:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-cpp
> > - test-debian-10-go-1.12:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-go-1.12
> > - test-debian-10-python-3:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-python-3
> > - test-debian-c-glib:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-c-glib
> > - test-debian-ruby:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-ruby
> > - test-fedora-29-cpp:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-fedora-29-cpp
> > - test-fedora-29-python-3:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-fedora-29-python-3
> > - test-r-rhub-debian-gcc-devel:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rhub-debian-gcc-devel
> > - test-r-rhub-ubuntu-gcc-release:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rhub-ubuntu-gcc-release
> > - test-r-rstudio-r-base-3.6-bionic:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-bionic
> > - test-r-rstudio-r-base-3.6-centos6:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-centos6
> > - test-r-rstudio-r-base-3.6-opensuse15:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-opensuse15
> > - test-r-rstudio-r-base-3.6-opensuse42:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-opensuse42
> > - test-ubuntu-16.04-cpp:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-16.04-cpp
> > - test-ubuntu-18.04-cpp-cmake32:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-cmake32
> > - test-ubuntu-18.04-cpp-release:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-release
> > - test-ubuntu-18.04-cpp-static:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-static
> > - test-ubuntu-18.04-cpp:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp
> > - test-ubuntu-18.04-docs:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-docs
> > - test-ubuntu-18.04-python-3:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-python-3
> > - test-ubuntu-18.04-r-sanitizer:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-r-sanitizer
> > - test-ubuntu-c-glib:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-c-glib
> > - test-ubuntu-fuzzit-fuzzing:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-fuzzit-fuzzing
> > - test-ubuntu-fuzzit-regression:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-fuzzit-regression
> > - test-ubuntu-ruby:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-ruby
> > - ubuntu-bionic:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-bionic
> > - ubuntu-disco:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-disco
> > - ubuntu-xenial:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-xenial
> > - wheel-manylinux1-cp27m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp27m
> > - wheel-manylinux1-cp27mu:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp27mu
> > - wheel-manylinux1-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp35m
> > - wheel-manylinux1-cp36m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp36m
> > - wheel-manylinux1-cp37m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp37m
> > - wheel-manylinux1-cp38:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp38
> > - wheel-manylinux2010-cp27m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp27m
> > - wheel-manylinux2010-cp27mu:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp27mu
> > - wheel-manylinux2010-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp35m
> > - wheel-manylinux2010-cp36m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp36m
> > - wheel-manylinux2010-cp37m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp37m
> > - wheel-manylinux2010-cp38:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp38
> > - wheel-manylinux2014-cp36m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp36m
> > - wheel-manylinux2014-cp37m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp37m
> > - wheel-manylinux2014-cp38:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp38
> > - wheel-osx-cp27m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp27m
> > - wheel-osx-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp35m
> > - wheel-osx-cp36m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp36m
> > - wheel-osx-cp37m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp37m
> > - wheel-osx-cp38:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp38
> > - wheel-win-cp36m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp36m
> > - wheel-win-cp37m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp37m
> > - wheel-win-cp38:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp38
> >