Posted to commits@arrow.apache.org by we...@apache.org on 2018/08/06 19:40:15 UTC
[arrow] branch master updated (1e2a069 -> 551e9ce)
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.
omit 1e2a069 ARROW-2813: [CI] [Followup] Disable gcov output in Travis-CI logs
omit 4d682a6 ARROW-2988: Improve Windows release verification script to be more automated
omit 0b654ce ARROW-2061: [C++] Make tests a bit faster with Valgrind
omit d9d1f6b ARROW-2815: [CI] Skip Java tests and style checks on C++ job [skip appveyor]
omit 992b27f ARROW-2982: Ensure release verification script works with wget < 1.16, build ORC in C++ libraries
omit ad7bbbd ARROW-2951: [CI] Don't skip AppVeyor build on format-only changes
omit da7a48e ARROW-2990: [GLib] Support building with rpath-ed Arrow C++ on macOS
omit ae95780 ARROW-2985: [Ruby] Add support for verifying RC
omit f8ba33d ARROW-2869: [Python] Add documentation for Array.to_numpy
omit 8cbaf44 ARROW-2977: [Packaging] Release verification script should check rust too
omit 889e1e6 ARROW-2978: [Rust] Change argument to rust fmt to fix build
omit 9101292 ARROW-2480: [C++] Enable casting the value of a decimal to int32_t or int64_t
omit 41bb85b ARROW-2962: [Packaging] Bintray descriptor files are no longer needed
omit 7a6144e ARROW-2666: [Python] Add __array__ method to Array, ChunkedArray, Column
omit 5b45c66 ARROW-2813: [CI] Mute uninformative lcov warnings
add 446dd45 [Release] Update CHANGELOG.md for 0.10.0
add d38bc66 [Release] Update .deb/.rpm changelogs for 0.10.0
add 07f142d [maven-release-plugin] prepare release apache-arrow-0.10.0
new 0f5fb20 ARROW-2813: [CI] Mute uninformative lcov warnings
new ef933a6 ARROW-2666: [Python] Add __array__ method to Array, ChunkedArray, Column
new 0c29673 ARROW-2962: [Packaging] Bintray descriptor files are no longer needed
new 495bf36 ARROW-2480: [C++] Enable casting the value of a decimal to int32_t or int64_t
new 1b2a42e ARROW-2978: [Rust] Change argument to rust fmt to fix build
new 7c953a0 ARROW-2977: [Packaging] Release verification script should check rust too
new de50744 ARROW-2869: [Python] Add documentation for Array.to_numpy
new 072fa77 ARROW-2985: [Ruby] Add support for verifying RC
new 00aed05 ARROW-2990: [GLib] Support building with rpath-ed Arrow C++ on macOS
new 91eab98 ARROW-2951: [CI] Don't skip AppVeyor build on format-only changes
new ea9157a ARROW-2982: Ensure release verification script works with wget < 1.16, build ORC in C++ libraries
new e10f2b3 ARROW-2815: [CI] Skip Java tests and style checks on C++ job [skip appveyor]
new d3c9c1d ARROW-2061: [C++] Make tests a bit faster with Valgrind
new 71145cd ARROW-2988: Improve Windows release verification script to be more automated
new 551e9ce ARROW-2813: [CI] [Followup] Disable gcov output in Travis-CI logs
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (1e2a069)
            \
             N -- N -- N   refs/heads/master (551e9ce)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
The 15 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
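The O/N divergence described above can be reproduced in a throwaway repository. This is an illustrative sketch only (the repo, identity, and commit messages are made up and are not part of this notification):

```shell
# Reproduce the force-push history from the diagram in a scratch repo.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email dev@example.com   # hypothetical identity
git config user.name "Example Dev"
echo base > f && git add f && git commit -qm "B: common base"
git commit -q --allow-empty -m "O1"     # old revisions, later undone
git commit -q --allow-empty -m "O2"
old=$(git rev-parse HEAD)               # tip before the force-push
git reset -q --hard HEAD~2              # rewind to B, as a force-push does
git commit -q --allow-empty -m "N1"     # new revision on top of B
new=$(git rev-parse HEAD)
git merge-base "$old" "$new"            # prints the common base B
```

The "omit"/"new" lists in this email correspond to the O and N commits: both tips share the base B, which `git merge-base` recovers.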
Summary of changes:
CHANGELOG.md | 463 +++++++++++++++++++++
.../linux-packages/debian.ubuntu-trusty/changelog | 6 +
dev/tasks/linux-packages/debian/changelog | 6 +
dev/tasks/linux-packages/yum/arrow.spec.in | 3 +
java/adapter/jdbc/pom.xml | 2 +-
java/format/pom.xml | 2 +-
java/memory/pom.xml | 2 +-
java/plasma/pom.xml | 2 +-
java/pom.xml | 4 +-
java/tools/pom.xml | 2 +-
java/vector/pom.xml | 2 +-
11 files changed, 486 insertions(+), 8 deletions(-)
[arrow] 06/15: ARROW-2977: [Packaging] Release verification script should check rust too
Posted by we...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 7c953a01e84e14f23bbfd3f5bc649afc28c4b649
Author: Krisztián Szűcs <sz...@gmail.com>
AuthorDate: Sun Aug 5 16:06:04 2018 -0400
ARROW-2977: [Packaging] Release verification script should check rust too
I've found a couple of issues with the verification scripts:
1. The standalone js verification script seems obsolete
2. The windows script only checks arrow-cpp, parquet-cpp and pyarrow
3. The windows script doesn't create conda env
For the next release it'd be nice to have consistent scripts on each platform (c_glib requires additional configuration on OSX).
Author: Krisztián Szűcs <sz...@gmail.com>
Closes #2369 from kszucs/ARROW-2977 and squashes the following commits:
5e323c9b <Krisztián Szűcs> remove comments
a59ca508 <Krisztián Szűcs> setup rustup and test rust library
---
dev/release/verify-release-candidate.sh | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/dev/release/verify-release-candidate.sh b/dev/release/verify-release-candidate.sh
index 74ec61c..eedec46 100755
--- a/dev/release/verify-release-candidate.sh
+++ b/dev/release/verify-release-candidate.sh
@@ -225,6 +225,29 @@ test_js() {
popd
}
+test_rust() {
+ # install the rust toolchain in a similar fashion to test-miniconda
+ export RUSTUP_HOME=`pwd`/test-rustup
+ export CARGO_HOME=`pwd`/test-rustup
+
+ curl https://sh.rustup.rs -sSf | sh -s -- -y
+ source $RUSTUP_HOME/env
+
+ # build and test rust
+ pushd rust
+
+ # raises on any formatting errors (disabled, because RC1 has a couple)
+ # rustup component add rustfmt-preview
+ # cargo fmt --all -- --check
+ # raises on any warnings
+ cargo rustc -- -D warnings
+
+ cargo build
+ cargo test
+
+ popd
+}
+
# Build and test Java (Requires newer Maven -- I used 3.3.9)
test_package_java() {
@@ -286,6 +309,7 @@ test_integration
test_glib
install_parquet_cpp
test_python
+test_rust
echo 'Release candidate looks good!'
exit 0
[arrow] 02/15: ARROW-2666: [Python] Add __array__ method to Array, ChunkedArray, Column
Posted by we...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit ef933a642ecbd00591735acb353db4ea9f74060c
Author: Pedro M. Duarte <pm...@gmail.com>
AuthorDate: Sat Aug 4 16:00:40 2018 -0400
ARROW-2666: [Python] Add __array__ method to Array, ChunkedArray, Column
Implement `__array__` method on `pyarrow.Array`, `pyarrow.ChunkedArray` and `pyarrow.Column` so that the `to_pandas()` method is used when calling `numpy.asarray` on an instance of these classes.
Currently `numpy.asarray` falls back to the iterator interface, so we get NumPy object arrays of the underlying pyarrow scalar value type.
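The `__array__` hook this patch adds can be sketched with a toy class (a hypothetical stand-in for illustration, not pyarrow's actual implementation): `numpy.asarray` looks for an `__array__` method before falling back to element-by-element iteration.

```python
import numpy as np

class FakeArrowArray:
    """Toy stand-in for a pyarrow Array (hypothetical, illustration only)
    showing the __array__ protocol that this patch implements."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # numpy.asarray() calls this instead of iterating over scalars;
        # `copy` is accepted for NumPy 2.x compatibility and ignored here.
        arr = np.array(self._values)
        return arr if dtype is None else arr.astype(dtype)

np_arr = np.asarray(FakeArrowArray(range(4)))
print(np_arr.tolist())  # [0, 1, 2, 3]

# An optional dtype is forwarded, mirroring the tests in this patch
str_arr = np.asarray(FakeArrowArray(range(4)), dtype='str')
print(str_arr.tolist())  # ['0', '1', '2', '3']
```

With the hook in place, the result uses the array's native dtype rather than a dtype('O') array of scalar wrappers.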
Author: Pedro M. Duarte <pm...@gmail.com>
Closes #2365 from PedroMDuarte/asarray and squashes the following commits:
71f9e291 <Pedro M. Duarte> Improve inline comment
6eac2685 <Pedro M. Duarte> Add __array__ method to Array, ChunkedArray, Column
---
python/pyarrow/array.pxi | 5 ++++
python/pyarrow/table.pxi | 17 +++++++++---
python/pyarrow/tests/test_array.py | 29 ++++++++++++++++++++
python/pyarrow/tests/test_table.py | 56 +++++++++++++++++++++++++++++++++++---
4 files changed, 99 insertions(+), 8 deletions(-)
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index d59bb05..513fa86 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -652,6 +652,11 @@ cdef class Array:
self, &out))
return wrap_array_output(out)
+ def __array__(self, dtype=None):
+ if dtype is None:
+ return self.to_pandas()
+ return self.to_pandas().astype(dtype)
+
def to_numpy(self):
"""
EXPERIMENTAL: Construct a NumPy view of this array. Only supports
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index 9a8a875..e056843 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -147,11 +147,12 @@ cdef class ChunkedArray:
c_bool zero_copy_only=False,
c_bool integer_object_nulls=False):
"""
- Convert the arrow::Column to a pandas.Series
+ Convert the arrow::ChunkedArray to an array object suitable for use
+ in pandas
- Returns
- -------
- pandas.Series
+ See also
+ --------
+ Column.to_pandas
"""
cdef:
PyObject* out
@@ -171,6 +172,11 @@ cdef class ChunkedArray:
return wrap_array_output(out)
+ def __array__(self, dtype=None):
+ if dtype is None:
+ return self.to_pandas()
+ return self.to_pandas().astype(dtype)
+
def dictionary_encode(self):
"""
Compute dictionary-encoded representation of array
@@ -517,6 +523,9 @@ cdef class Column:
return result
+ def __array__(self, dtype=None):
+ return self.data.__array__(dtype=dtype)
+
def equals(self, Column other):
"""
Check if contents of two columns are equal
diff --git a/python/pyarrow/tests/test_array.py b/python/pyarrow/tests/test_array.py
index af2708f..425fe09 100644
--- a/python/pyarrow/tests/test_array.py
+++ b/python/pyarrow/tests/test_array.py
@@ -156,6 +156,35 @@ def test_to_pandas_zero_copy():
np_arr.sum()
+def test_asarray():
+ arr = pa.array(range(4))
+
+ # The iterator interface gives back an array of Int64Value's
+ np_arr = np.asarray([_ for _ in arr])
+ assert np_arr.tolist() == [0, 1, 2, 3]
+ assert np_arr.dtype == np.dtype('O')
+ assert type(np_arr[0]) == pa.lib.Int64Value
+
+ # Calling with the arrow array gives back an array with 'int64' dtype
+ np_arr = np.asarray(arr)
+ assert np_arr.tolist() == [0, 1, 2, 3]
+ assert np_arr.dtype == np.dtype('int64')
+
+ # An optional type can be specified when calling np.asarray
+ np_arr = np.asarray(arr, dtype='str')
+ assert np_arr.tolist() == ['0', '1', '2', '3']
+
+ # If PyArrow array has null values, numpy type will be changed as needed
+ # to support nulls.
+ arr = pa.array([0, 1, 2, None])
+ assert arr.type == pa.int64()
+ np_arr = np.asarray(arr)
+ elements = np_arr.tolist()
+ assert elements[:3] == [0., 1., 2.]
+ assert np.isnan(elements[3])
+ assert np_arr.dtype == np.dtype('float64')
+
+
def test_array_getitem():
arr = pa.array(range(10, 15))
lst = arr.to_pylist()
diff --git a/python/pyarrow/tests/test_table.py b/python/pyarrow/tests/test_table.py
index 69086e0..cc672fc 100644
--- a/python/pyarrow/tests/test_table.py
+++ b/python/pyarrow/tests/test_table.py
@@ -160,6 +160,48 @@ def test_chunked_array_pickle(data, typ):
assert result.equals(array)
+def test_chunked_array_to_pandas():
+ data = [
+ pa.array([-10, -5, 0, 5, 10])
+ ]
+ table = pa.Table.from_arrays(data, names=['a'])
+ chunked_arr = table.column(0).data
+ assert isinstance(chunked_arr, pa.ChunkedArray)
+ array = chunked_arr.to_pandas()
+ assert array.shape == (5,)
+ assert array[0] == -10
+
+
+def test_chunked_array_asarray():
+ data = [
+ pa.array([0]),
+ pa.array([1, 2, 3])
+ ]
+ chunked_arr = pa.chunked_array(data)
+
+ np_arr = np.asarray(chunked_arr)
+ assert np_arr.tolist() == [0, 1, 2, 3]
+ assert np_arr.dtype == np.dtype('int64')
+
+ # An optional type can be specified when calling np.asarray
+ np_arr = np.asarray(chunked_arr, dtype='str')
+ assert np_arr.tolist() == ['0', '1', '2', '3']
+
+ # Types are modified when there are nulls
+ data = [
+ pa.array([1, None]),
+ pa.array([1, 2, 3])
+ ]
+ chunked_arr = pa.chunked_array(data)
+
+ np_arr = np.asarray(chunked_arr)
+ elements = np_arr.tolist()
+ assert elements[0] == 1.
+ assert np.isnan(elements[1])
+ assert elements[2:] == [1., 2., 3.]
+ assert np_arr.dtype == np.dtype('float64')
+
+
def test_column_basics():
data = [
pa.array([-10, -5, 0, 5, 10])
@@ -219,14 +261,20 @@ def test_column_to_pandas():
assert series.iloc[0] == -10
-def test_chunked_array_to_pandas():
+def test_column_asarray():
data = [
pa.array([-10, -5, 0, 5, 10])
]
table = pa.Table.from_arrays(data, names=['a'])
- array = table.column(0).data.to_pandas()
- assert array.shape == (5,)
- assert array[0] == -10
+ column = table.column(0)
+
+ np_arr = np.asarray(column)
+ assert np_arr.tolist() == [-10, -5, 0, 5, 10]
+ assert np_arr.dtype == np.dtype('int64')
+
+ # An optional type can be specified when calling np.asarray
+ np_arr = np.asarray(column, dtype='str')
+ assert np_arr.tolist() == ['-10', '-5', '0', '5', '10']
def test_column_flatten():
[arrow] 14/15: ARROW-2988: Improve Windows release verification script to be more automated
Posted by we...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 71145cdbdc0c2d717ca3a6a4f8189c6cbcad38e5
Author: Wes McKinney <we...@apache.org>
AuthorDate: Mon Aug 6 14:42:29 2018 -0400
ARROW-2988: Improve Windows release verification script to be more automated
* Downloads tarball from SVN dist
* Creates ephemeral conda environment automatically
I am adding instructions to https://cwiki.apache.org/confluence/display/ARROW to help others verify releases on Windows.
Author: Wes McKinney <we...@apache.org>
Closes #2373 from wesm/ARROW-2988 and squashes the following commits:
1a52a48c <Wes McKinney> Revamp Windows release verification script
---
dev/release/verify-release-candidate.bat | 73 ++++++++++++++++++--------------
1 file changed, 42 insertions(+), 31 deletions(-)
diff --git a/dev/release/verify-release-candidate.bat b/dev/release/verify-release-candidate.bat
index bc05b23..86abbc6 100644
--- a/dev/release/verify-release-candidate.bat
+++ b/dev/release/verify-release-candidate.bat
@@ -15,24 +15,8 @@
@rem specific language governing permissions and limitations
@rem under the License.
-@rem To use this script, first create the following conda environment. Change
-@rem the Python version if so desired. You can also omit one or more of the
-@rem libray build rem dependencies if you want to build them from source as well
-@rem
-
-@rem set PYTHON=3.6
-@rem conda create -n arrow-verify-release -f -q -y python=%PYTHON%
-@rem conda install -y ^
-@rem six pytest setuptools numpy pandas cython ^
-@rem thrift-cpp flatbuffers rapidjson ^
-@rem cmake ^
-@rem git ^
-@rem boost-cpp ^
-@rem snappy zlib brotli gflags lz4-c zstd -c conda-forge || exit /B
-
-@rem Then run from the directory containing the RC tarball
-@rem
-@rem verify-release-candidate.bat apache-arrow-%VERSION%
+@rem To run the script:
+@rem verify-release-candidate.bat VERSION RC_NUM
@echo on
@@ -40,17 +24,40 @@ if not exist "C:\tmp\" mkdir C:\tmp
if exist "C:\tmp\arrow-verify-release" rd C:\tmp\arrow-verify-release /s /q
if not exist "C:\tmp\arrow-verify-release" mkdir C:\tmp\arrow-verify-release
-tar xvf %1.tar.gz -C "C:/tmp/"
+set _VERIFICATION_DIR=C:\tmp\arrow-verify-release
+set _VERIFICATION_DIR_UNIX=C:/tmp/arrow-verify-release
+set _VERIFICATION_CONDA_ENV=%_VERIFICATION_DIR%\conda-env
+set _DIST_URL=https://dist.apache.org/repos/dist/dev/arrow
+set _TARBALL=apache-arrow-%1.tar.gz
+set ARROW_SOURCE=%_VERIFICATION_DIR%\apache-arrow-%1
+set INSTALL_DIR=%_VERIFICATION_DIR%\install
+
+@rem Requires GNU Wget for Windows
+wget -O %_TARBALL% %_DIST_URL%/apache-arrow-%1-rc%2/%_TARBALL%
+
+tar xvf %_TARBALL% -C %_VERIFICATION_DIR_UNIX%
+
+set PYTHON=3.6
+
+@rem Using call with conda.bat seems necessary to avoid terminating the batch
+@rem script execution
+call conda create -p %_VERIFICATION_CONDA_ENV% -f -q -y python=%PYTHON% || exit /B
+
+call activate %_VERIFICATION_CONDA_ENV%
+
+call conda install -y ^
+ six pytest setuptools numpy pandas cython ^
+ thrift-cpp flatbuffers rapidjson ^
+ cmake ^
+ git ^
+ boost-cpp ^
+ snappy zlib brotli gflags lz4-c zstd -c conda-forge
set GENERATOR=Visual Studio 14 2015 Win64
set CONFIGURATION=release
-set ARROW_SOURCE=C:\tmp\%1
-set INSTALL_DIR=C:\tmp\%1\install
pushd %ARROW_SOURCE%
-call activate arrow-verify-release
-
set ARROW_BUILD_TOOLCHAIN=%CONDA_PREFIX%\Library
set PARQUET_BUILD_TOOLCHAIN=%CONDA_PREFIX%\Library
@@ -59,14 +66,17 @@ set PARQUET_HOME=%INSTALL_DIR%
set PATH=%INSTALL_DIR%\bin;%PATH%
@rem Build and test Arrow C++ libraries
-mkdir cpp\build
-pushd cpp\build
+mkdir %ARROW_SOURCE%\cpp\build
+pushd %ARROW_SOURCE%\cpp\build
+
+@rem This is the path for Visual Studio Community 2017
+call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\Tools\VsDevCmd.bat" -arch=amd64
cmake -G "%GENERATOR%" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DARROW_BOOST_USE_SHARED=OFF ^
-DCMAKE_BUILD_TYPE=%CONFIGURATION% ^
- -DARROW_CXXFLAGS="/WX /MP" ^
+ -DARROW_CXXFLAGS="/MP" ^
-DARROW_PYTHON=ON ^
.. || exit /B
cmake --build . --target INSTALL --config %CONFIGURATION% || exit /B
@@ -79,13 +89,13 @@ popd
@rem Build parquet-cpp
git clone https://github.com/apache/parquet-cpp.git || exit /B
-mkdir parquet-cpp\build
-pushd parquet-cpp\build
+mkdir %ARROW_SOURCE%\parquet-cpp\build
+pushd %ARROW_SOURCE%\parquet-cpp\build
cmake -G "%GENERATOR%" ^
-DCMAKE_INSTALL_PREFIX=%PARQUET_HOME% ^
-DCMAKE_BUILD_TYPE=%CONFIGURATION% ^
- -DPARQUET_BOOST_USE_SHARED=OFF ^
+ -DPARQUET_BOOST_USE_SHARED=OFF ^
-DPARQUET_BUILD_TESTS=off .. || exit /B
cmake --build . --target INSTALL --config %CONFIGURATION% || exit /B
popd
@@ -93,10 +103,11 @@ popd
@rem Build and import pyarrow
@rem parquet-cpp has some additional runtime dependencies that we need to figure out
@rem see PARQUET-1018
-pushd python
+pushd %ARROW_SOURCE%\python
-set PYARROW_CXXFLAGS=/WX
python setup.py build_ext --inplace --with-parquet --bundle-arrow-cpp bdist_wheel || exit /B
py.test pyarrow -v -s --parquet || exit /B
popd
+
+call deactivate
[arrow] 01/15: ARROW-2813: [CI] Mute uninformative lcov warnings
Posted by we...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 0f5fb20ca896b5b3aacfe7c67f8df0385acea6d6
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Fri Aug 3 22:23:10 2018 -0400
ARROW-2813: [CI] Mute uninformative lcov warnings
Author: Antoine Pitrou <an...@python.org>
Closes #2367 from pitrou/ARROW-2813-mute-lcov-output and squashes the following commits:
19a4f661 <Antoine Pitrou> ARROW-2813: Mute uninformative lcov warnings
---
ci/travis_script_cpp.sh | 3 ++-
ci/travis_script_python.sh | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/ci/travis_script_cpp.sh b/ci/travis_script_cpp.sh
index eedca98..3a6b2f7 100755
--- a/ci/travis_script_cpp.sh
+++ b/ci/travis_script_cpp.sh
@@ -30,6 +30,7 @@ popd
# Capture C++ coverage info (we wipe the build dir in travis_script_python.sh)
if [ "$ARROW_TRAVIS_COVERAGE" == "1" ]; then
pushd $TRAVIS_BUILD_DIR
- lcov --quiet --directory . --capture --no-external --output-file $ARROW_CPP_COVERAGE_FILE
+ lcov --quiet --directory . --capture --no-external --output-file $ARROW_CPP_COVERAGE_FILE \
+ 2>&1 | grep -v "WARNING: no data found for /usr/include"
popd
fi
diff --git a/ci/travis_script_python.sh b/ci/travis_script_python.sh
index 0743f86..53dd36c 100755
--- a/ci/travis_script_python.sh
+++ b/ci/travis_script_python.sh
@@ -155,7 +155,8 @@ if [ "$ARROW_TRAVIS_COVERAGE" == "1" ]; then
coverage xml -i -o $TRAVIS_BUILD_DIR/coverage.xml
# Capture C++ coverage info and combine with previous coverage file
pushd $TRAVIS_BUILD_DIR
- lcov --quiet --directory . --capture --no-external --output-file coverage-python-tests.info
+ lcov --quiet --directory . --capture --no-external --output-file coverage-python-tests.info \
+ 2>&1 | grep -v "WARNING: no data found for /usr/include"
lcov --add-tracefile coverage-python-tests.info \
--add-tracefile $ARROW_CPP_COVERAGE_FILE \
--output-file $ARROW_CPP_COVERAGE_FILE
[arrow] 15/15: ARROW-2813: [CI] [Followup] Disable gcov output in Travis-CI logs
Posted by we...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 551e9cec0f04c91963411c735f744346b1772ae1
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Mon Aug 6 15:38:38 2018 -0400
ARROW-2813: [CI] [Followup] Disable gcov output in Travis-CI logs
We don't actually need codecov's gcov discovery, since we gather coverage ourselves using `lcov` in the CI scripts. This suppresses hundreds of lines of logs in Travis-CI's output.
Author: Antoine Pitrou <an...@python.org>
Closes #2379 from pitrou/codecov-disable-gcov-discovery and squashes the following commits:
cc06becb <Antoine Pitrou> Disable gcov output in Travis-CI logs
---
ci/travis_upload_cpp_coverage.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ci/travis_upload_cpp_coverage.sh b/ci/travis_upload_cpp_coverage.sh
index 2b11c5e..38ea5d3 100755
--- a/ci/travis_upload_cpp_coverage.sh
+++ b/ci/travis_upload_cpp_coverage.sh
@@ -25,7 +25,7 @@ pushd $TRAVIS_BUILD_DIR
# Display C++ coverage summary
lcov --list $ARROW_CPP_COVERAGE_FILE
-# Upload report to CodeCov
-bash <(curl -s https://codecov.io/bash) || echo "Codecov did not collect coverage reports"
+# Upload report to CodeCov, disabling gcov discovery to save time and avoid warnings
+bash <(curl -s https://codecov.io/bash) -X gcov || echo "Codecov did not collect coverage reports"
popd
[arrow] 12/15: ARROW-2815: [CI] Skip Java tests and style checks on C++ job [skip appveyor]
Posted by we...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit e10f2b3c15c426c879924529ec944222b9e576f5
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Mon Aug 6 19:10:08 2018 +0200
ARROW-2815: [CI] Skip Java tests and style checks on C++ job [skip appveyor]
This omits all warning and debug logs from previous Maven output.
Author: Antoine Pitrou <an...@python.org>
Closes #2378 from pitrou/ARROW-2815-strip-java-logging and squashes the following commits:
603db64 <Antoine Pitrou> ARROW-2815: Skip Java tests and style checks on C++ job
---
.travis.yml | 4 ++++
ci/travis_script_java.sh | 7 ++++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/.travis.yml b/.travis.yml
index f14b86f..a1f5699 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -23,6 +23,9 @@ services:
cache:
ccache: true
+ directories:
+ - $HOME/.m2 # Maven
+
before_install:
# Common pre-install steps for all builds
@@ -57,6 +60,7 @@ matrix:
- ARROW_TRAVIS_PYTHON_DOCS=1
- ARROW_BUILD_WARNING_LEVEL=CHECKIN
- ARROW_TRAVIS_PYTHON_JVM=1
+ - ARROW_TRAVIS_JAVA_BUILD_ONLY=1
- CC="clang-6.0"
- CXX="clang++-6.0"
before_script:
diff --git a/ci/travis_script_java.sh b/ci/travis_script_java.sh
index a8ad94c..9553dd5 100755
--- a/ci/travis_script_java.sh
+++ b/ci/travis_script_java.sh
@@ -24,6 +24,11 @@ JAVA_DIR=${TRAVIS_BUILD_DIR}/java
pushd $JAVA_DIR
export MAVEN_OPTS="$MAVEN_OPTS -Dorg.slf4j.simpleLogger.defaultLogLevel=warn"
-mvn -B install
+if [ $ARROW_TRAVIS_JAVA_BUILD_ONLY == "1" ]; then
+ # Save time and make build less verbose by skipping tests and style checks
+ mvn -DskipTests=true -Dcheckstyle.skip=true -B install
+else
+ mvn -B install
+fi
popd
[arrow] 07/15: ARROW-2869: [Python] Add documentation for Array.to_numpy
Posted by we...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit de50744e207bd98ab8d775b5fca42d9a29a0dd1f
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Sun Aug 5 16:09:58 2018 -0400
ARROW-2869: [Python] Add documentation for Array.to_numpy
Author: Antoine Pitrou <an...@python.org>
Closes #2351 from pitrou/ARROW-2869-document-numpy and squashes the following commits:
2792dc84 <Antoine Pitrou> Fix renamed reference
8cb89989 <Antoine Pitrou> Revert "Capitalize Pandas"
34d8c36e <Antoine Pitrou> Capitalize Pandas
395231e0 <Antoine Pitrou> Address review comments
347ca4e7 <Antoine Pitrou> ARROW-2869: Add documentation for Array.to_numpy
---
python/doc/Makefile | 2 +-
python/doc/source/api.rst | 4 +--
python/doc/source/data.rst | 4 +--
python/doc/source/extending.rst | 2 +-
python/doc/source/index.rst | 5 +--
python/doc/source/numpy.rst | 75 +++++++++++++++++++++++++++++++++++++++++
python/doc/source/pandas.rst | 16 ++++++---
python/doc/source/plasma.rst | 2 +-
python/pyarrow/array.pxi | 17 ++++++----
9 files changed, 106 insertions(+), 21 deletions(-)
diff --git a/python/doc/Makefile b/python/doc/Makefile
index eacb124..5798f27 100644
--- a/python/doc/Makefile
+++ b/python/doc/Makefile
@@ -20,7 +20,7 @@
#
# You can set these variables from the command line.
-SPHINXOPTS = -j4
+SPHINXOPTS = -j8 -W
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
diff --git a/python/doc/source/api.rst b/python/doc/source/api.rst
index cb99933..23eae92 100644
--- a/python/doc/source/api.rst
+++ b/python/doc/source/api.rst
@@ -139,7 +139,7 @@ Scalar Value Types
.. _api.array:
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
Array Types
-----------
@@ -299,7 +299,7 @@ Memory Pools
.. _api.type_classes:
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
Type Classes
------------
diff --git a/python/doc/source/data.rst b/python/doc/source/data.rst
index 3f4169c..f54cba1 100644
--- a/python/doc/source/data.rst
+++ b/python/doc/source/data.rst
@@ -401,8 +401,8 @@ for one or more arrays of the same type.
c.data.num_chunks
c.data.chunk(0)
-As you'll see in the :ref:`pandas section <pandas>`, we can convert these
-objects to contiguous NumPy arrays for use in pandas:
+As you'll see in the :ref:`pandas section <pandas_interop>`, we can convert
+these objects to contiguous NumPy arrays for use in pandas:
.. ipython:: python
diff --git a/python/doc/source/extending.rst b/python/doc/source/extending.rst
index a471fb3..e3d8707 100644
--- a/python/doc/source/extending.rst
+++ b/python/doc/source/extending.rst
@@ -15,7 +15,7 @@
.. specific language governing permissions and limitations
.. under the License.
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
.. _extending:
Using pyarrow from C++ and Cython Code
diff --git a/python/doc/source/index.rst b/python/doc/source/index.rst
index c35f20b..8af795d 100644
--- a/python/doc/source/index.rst
+++ b/python/doc/source/index.rst
@@ -15,8 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.
-Apache Arrow (Python)
-=====================
+Python bindings for Apache Arrow
+================================
Apache Arrow is a cross-language development platform for in-memory data. It
specifies a standardized language-independent columnar memory format for flat
@@ -45,6 +45,7 @@ structures.
ipc
filesystems
plasma
+ numpy
pandas
parquet
extending
diff --git a/python/doc/source/numpy.rst b/python/doc/source/numpy.rst
new file mode 100644
index 0000000..303e182
--- /dev/null
+++ b/python/doc/source/numpy.rst
@@ -0,0 +1,75 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _numpy_interop:
+
+Using PyArrow with NumPy
+========================
+
+PyArrow allows converting back and forth from
+`NumPy <https://www.numpy.org/>`_ arrays to Arrow :ref:`Arrays <data.array>`.
+
+NumPy to Arrow
+--------------
+
+To convert a NumPy array to Arrow, one can simply call the :func:`pyarrow.array`
+factory function.
+
+.. code-block:: pycon
+
+ >>> import numpy as np
+ >>> import pyarrow as pa
+ >>> data = np.arange(10, dtype='int16')
+ >>> arr = pa.array(data)
+ >>> arr
+ <pyarrow.lib.Int16Array object at 0x7fb1d1e6ae58>
+ [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9
+ ]
+
+Converting from NumPy supports a wide range of input dtypes, including
+structured dtypes or strings.
+
+Arrow to NumPy
+--------------
+
+In the reverse direction, it is possible to produce a view of an Arrow Array
+for use with NumPy using the :meth:`~pyarrow.Array.to_numpy` method.
+This is limited to primitive types for which NumPy has the same physical
+representation as Arrow, and assuming the Arrow data has no nulls.
+
+.. code-block:: pycon
+
+ >>> import numpy as np
+ >>> import pyarrow as pa
+ >>> arr = pa.array([4, 5, 6], type=pa.int32())
+ >>> view = arr.to_numpy()
+ >>> view
+ array([4, 5, 6], dtype=int32)
+
+For more complex data types, you have to use the :meth:`~pyarrow.Array.to_pandas`
+method (which will construct a Numpy array with Pandas semantics for, e.g.,
+representation of null values).
diff --git a/python/doc/source/pandas.rst b/python/doc/source/pandas.rst
index 7699b13..be11b5b 100644
--- a/python/doc/source/pandas.rst
+++ b/python/doc/source/pandas.rst
@@ -15,24 +15,30 @@
.. specific language governing permissions and limitations
.. under the License.
-.. _pandas:
+.. _pandas_interop:
Using PyArrow with pandas
=========================
-To interface with pandas, PyArrow provides various conversion routines to
-consume pandas structures and convert back to them.
+To interface with `pandas <https://pandas.pydata.org/>`_, PyArrow provides
+various conversion routines to consume pandas structures and convert back
+to them.
+
+.. note::
+ While pandas uses NumPy as a backend, it has enough peculiarities
+ (such as a different type system, and support for null values) that this
+ is a separate topic from :ref:`numpy_interop`.
DataFrames
----------
-The equivalent to a pandas DataFrame in Arrow is a :class:`pyarrow.table.Table`.
+The equivalent to a pandas DataFrame in Arrow is a :ref:`Table <data.table>`.
Both consist of a set of named columns of equal length. While pandas only
supports flat columns, the Table also provides nested columns, thus it can
represent more data than a DataFrame, so a full conversion is not always possible.
Conversion from a Table to a DataFrame is done by calling
-:meth:`pyarrow.table.Table.to_pandas`. The inverse is then achieved by using
+:meth:`pyarrow.Table.to_pandas`. The inverse is then achieved by using
:meth:`pyarrow.Table.from_pandas`.
.. code-block:: python
diff --git a/python/doc/source/plasma.rst b/python/doc/source/plasma.rst
index b64b4c2..6adc470 100644
--- a/python/doc/source/plasma.rst
+++ b/python/doc/source/plasma.rst
@@ -291,7 +291,7 @@ process of storing an object in the Plasma store, however one cannot directly
write the ``DataFrame`` to Plasma with Pandas alone. Plasma also needs to know
the size of the ``DataFrame`` to allocate a buffer for.
-See :ref:`pandas` for more information on using Arrow with Pandas.
+See :ref:`pandas_interop` for more information on using Arrow with Pandas.
You can create the pyarrow equivalent of a Pandas ``DataFrame`` by using
``pyarrow.from_pandas`` to convert it to a ``RecordBatch``.
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index 513fa86..5906965 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -620,7 +620,7 @@ cdef class Array:
c_bool zero_copy_only=False,
c_bool integer_object_nulls=False):
"""
- Convert to an array object suitable for use in pandas
+ Convert to a NumPy array object suitable for use in pandas.
Parameters
----------
@@ -659,14 +659,13 @@ cdef class Array:
def to_numpy(self):
"""
- EXPERIMENTAL: Construct a NumPy view of this array. Only supports
- primitive arrays with the same memory layout as NumPy (i.e. integers,
- floating point) without any nulls.
+ Experimental: return a NumPy view of this array. Only primitive
+ arrays with the same memory layout as NumPy (i.e. integers,
+ floating point), without any nulls, are supported.
Returns
-------
- arr : numpy.ndarray
-
+ array : numpy.ndarray
"""
if self.null_count:
raise NotImplementedError('NumPy array view is only supported '
@@ -681,7 +680,11 @@ cdef class Array:
def to_pylist(self):
"""
- Convert to an list of native Python objects.
+ Convert to a list of native Python objects.
+
+ Returns
+ -------
+ lst : list
"""
return [x.as_py() for x in self]
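The contract the docstrings above describe can be mirrored with a toy
stand-in (this is not pyarrow; it only illustrates the null guard and the
`to_pylist` behavior documented in the diff):

```python
import numpy as np

class ToyArray:
    """Toy stand-in for pyarrow's Array, mirroring the documented contract:
    to_numpy only yields a view for primitive arrays without nulls."""

    def __init__(self, values, null_count=0):
        self._values = np.asarray(values)
        self.null_count = null_count

    def to_numpy(self):
        if self.null_count:
            # Same error condition as the real implementation above.
            raise NotImplementedError('NumPy array view is only supported '
                                      'for arrays without nulls')
        return self._values  # pyarrow returns a zero-copy ndarray view here

    def to_pylist(self):
        # Convert to a list of native Python objects.
        return [x.item() for x in self._values]
```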
[arrow] 11/15: ARROW-2982: Ensure release verification script works
with wget < 1.16, build ORC in C++ libraries
Posted by we...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit ea9157a6c6fb6da3516f1a53b80e3436a82cc2c1
Author: Wes McKinney <we...@apache.org>
AuthorDate: Mon Aug 6 08:18:59 2018 -0400
ARROW-2982: Ensure release verification script works with wget < 1.16, build ORC in C++ libraries
I also wrote a guide to setting up Ubuntu Linux (14.04 and higher) so the verification script can run from a cold start. I may have missed some steps for a brand new install; others can keep updating. Eventually we should Dockerize for Ubuntu 14.04, 16.04, and 18.04: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
Author: Wes McKinney <we...@apache.org>
Closes #2372 from wesm/ARROW-2982 and squashes the following commits:
0ffdd15b <Wes McKinney> Ensure script works with older wget
---
dev/release/verify-release-candidate.sh | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/dev/release/verify-release-candidate.sh b/dev/release/verify-release-candidate.sh
index 05b0a43..220a79b 100755
--- a/dev/release/verify-release-candidate.sh
+++ b/dev/release/verify-release-candidate.sh
@@ -76,13 +76,17 @@ fetch_archive() {
}
verify_binary_artifacts() {
+ # --show-progress not supported on wget < 1.16
+ wget --help | grep -q '\--show-progress' && \
+ _WGET_PROGRESS_OPT="-q --show-progress" || _WGET_PROGRESS_OPT=""
+
# download the binaries folder for the current RC
rcname=apache-arrow-${VERSION}-rc${RC_NUMBER}
wget -P "$rcname" \
--quiet \
--no-host-directories \
--cut-dirs=5 \
- --show-progress \
+ $_WGET_PROGRESS_OPT \
--no-parent \
--reject 'index.html*' \
--recursive "$ARROW_DIST_URL/$rcname/binaries/"
@@ -144,8 +148,9 @@ test_and_install_cpp() {
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=$ARROW_HOME/lib \
- -DARROW_PLASMA=on \
- -DARROW_PYTHON=on \
+ -DARROW_PLASMA=ON \
+ -DARROW_ORC=ON \
+ -DARROW_PYTHON=ON \
-DARROW_BOOST_USE_SHARED=on \
-DCMAKE_BUILD_TYPE=release \
-DARROW_BUILD_BENCHMARKS=on \
@@ -323,12 +328,12 @@ cd ${DIST_NAME}
test_package_java
setup_miniconda
test_and_install_cpp
-test_js
-test_integration
-test_glib
install_parquet_cpp
test_python
+test_glib
test_ruby
+test_js
+test_integration
test_rust
echo 'Release candidate looks good!'
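The wget capability probe in this patch (grep the `--help` output, then pick
the flag set) is a general pattern. A Python sketch of the same idea, with a
hypothetical helper name:

```python
def wget_progress_opts(help_text):
    """Return the progress flags only when this wget advertises
    --show-progress (added in wget 1.16); older versions get no
    extra flags and the script stays with its plain --quiet mode."""
    if "--show-progress" in help_text:
        return ["-q", "--show-progress"]
    return []
```

Probing the tool's own help output keeps the script working on Ubuntu 14.04
(wget 1.15) without pinning a minimum wget version.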
[arrow] 09/15: ARROW-2990: [GLib] Support building with rpath-ed
Arrow C++ on macOS
Posted by we...@apache.org.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 00aed053fd77e5c5e17d83e36d85a82a1b738fa0
Author: Kouhei Sutou <ko...@clear-code.com>
AuthorDate: Mon Aug 6 08:16:37 2018 -0400
ARROW-2990: [GLib] Support building with rpath-ed Arrow C++ on macOS
Author: Kouhei Sutou <ko...@clear-code.com>
Closes #2374 from kou/glib-macos and squashes the following commits:
c8b5c453 <Kouhei Sutou> Support building with rpath-ed Arrow C++ on macOS
---
c_glib/arrow-glib/Makefile.am | 19 ++++++++++---------
c_glib/arrow-gpu-glib/Makefile.am | 28 +++++++++++++++++-----------
c_glib/configure.ac | 2 ++
3 files changed, 29 insertions(+), 20 deletions(-)
diff --git a/c_glib/arrow-glib/Makefile.am b/c_glib/arrow-glib/Makefile.am
index 0eef0d4..e557964 100644
--- a/c_glib/arrow-glib/Makefile.am
+++ b/c_glib/arrow-glib/Makefile.am
@@ -242,14 +242,6 @@ if HAVE_INTROSPECTION
INTROSPECTION_GIRS =
INTROSPECTION_SCANNER_ARGS =
INTROSPECTION_SCANNER_ENV =
-if USE_ARROW_BUILD_DIR
-INTROSPECTION_SCANNER_ENV += \
- LD_LIBRARY_PATH=$(ARROW_LIB_DIR):$${LD_LIBRARY_PATH}
-endif
-if OS_MACOS
-INTROSPECTION_SCANNER_ENV += \
- ARCHFLAGS=
-endif
INTROSPECTION_COMPILER_ARGS =
Arrow-1.0.gir: libarrow-glib.la
@@ -261,12 +253,21 @@ Arrow_1_0_gir_INCLUDES = \
Gio-2.0
Arrow_1_0_gir_CFLAGS = \
$(AM_CPPFLAGS)
-Arrow_1_0_gir_LIBS = libarrow-glib.la
+Arrow_1_0_gir_LIBS =
Arrow_1_0_gir_FILES = $(libarrow_glib_la_sources)
Arrow_1_0_gir_SCANNERFLAGS = \
+ --library-path=$(ARROW_LIB_DIR) \
--warn-all \
--identifier-prefix=GArrow \
--symbol-prefix=garrow
+if OS_MACOS
+Arrow_1_0_gir_LIBS += arrow-glib
+Arrow_1_0_gir_SCANNERFLAGS += \
+ --no-libtool \
+ --library-path=$(abs_builddir)/.libs
+else
+Arrow_1_0_gir_LIBS += libarrow-glib.la
+endif
INTROSPECTION_GIRS += Arrow-1.0.gir
girdir = $(datadir)/gir-1.0
diff --git a/c_glib/arrow-gpu-glib/Makefile.am b/c_glib/arrow-gpu-glib/Makefile.am
index 1e1c02a..2ed9665 100644
--- a/c_glib/arrow-gpu-glib/Makefile.am
+++ b/c_glib/arrow-gpu-glib/Makefile.am
@@ -78,10 +78,6 @@ else
INTROSPECTION_SCANNER_ENV += \
PKG_CONFIG_PATH=${abs_builddir}/../arrow-glib:$${PKG_CONFIG_PATH}
endif
-if OS_MACOS
-INTROSPECTION_SCANNER_ENV += \
- ARCHFLAGS=
-endif
INTROSPECTION_COMPILER_ARGS = \
--includedir=$(abs_builddir)/../arrow-glib
@@ -95,20 +91,30 @@ ArrowGPU_1_0_gir_INCLUDES = \
ArrowGPU_1_0_gir_CFLAGS = \
$(AM_CPPFLAGS)
ArrowGPU_1_0_gir_LDFLAGS =
-if USE_ARROW_BUILD_DIR
-ArrowGPU_1_0_gir_LDFLAGS += \
- -L$(ARROW_LIB_DIR)
-endif
-ArrowGPU_1_0_gir_LIBS = \
- $(abs_builddir)/../arrow-glib/libarrow-glib.la \
- libarrow-gpu-glib.la
+ArrowGPU_1_0_gir_LIBS =
ArrowGPU_1_0_gir_FILES = \
$(libarrow_gpu_glib_la_sources)
ArrowGPU_1_0_gir_SCANNERFLAGS = \
+ --library-path=$(ARROW_LIB_DIR) \
--warn-all \
--add-include-path=$(abs_builddir)/../arrow-glib \
--identifier-prefix=GArrowGPU \
--symbol-prefix=garrow_gpu
+if OS_MACOS
+ArrowGPU_1_0_gir_LIBS += \
+ arrow-glib \
+ arrow-gpu-glib
+ArrowGPU_1_0_gir_SCANNERFLAGS += \
+ --no-libtool \
+ --library-path=$(abs_builddir)/../arrow-glib/.libs \
+ --library-path=$(abs_builddir)/.libs
+else
+ArrowGPU_1_0_gir_LIBS += \
+ $(abs_builddir)/../arrow-glib/libarrow-glib.la \
+ libarrow-gpu-glib.la
+endif
+
+ \
INTROSPECTION_GIRS += ArrowGPU-1.0.gir
girdir = $(datadir)/gir-1.0
diff --git a/c_glib/configure.ac b/c_glib/configure.ac
index 6692927..6368170 100644
--- a/c_glib/configure.ac
+++ b/c_glib/configure.ac
@@ -115,6 +115,8 @@ if test "x$GARROW_ARROW_CPP_BUILD_DIR" = "x"; then
USE_ARROW_BUILD_DIR=no
PKG_CHECK_MODULES([ARROW], [arrow arrow-compute])
+ _PKG_CONFIG(ARROW_LIB_DIR, [variable=libdir], [arrow])
+ ARROW_LIB_DIR="$pkg_cv_ARROW_LIB_DIR"
PKG_CHECK_MODULES([ARROW_ORC],
[arrow-orc],
[HAVE_ARROW_ORC=yes],
[arrow] 08/15: ARROW-2985: [Ruby] Add support for verifying RC
Posted by we...@apache.org.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 072fa775d8fd54d4f3b6aa185fb85f91b79a1876
Author: Kouhei Sutou <ko...@clear-code.com>
AuthorDate: Mon Aug 6 08:15:34 2018 -0400
ARROW-2985: [Ruby] Add support for verifying RC
Author: Kouhei Sutou <ko...@clear-code.com>
Closes #2376 from kou/verify-ruby and squashes the following commits:
f3e1fb7a <Kouhei Sutou> Add support for verifying RC
---
dev/release/verify-release-candidate.sh | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/dev/release/verify-release-candidate.sh b/dev/release/verify-release-candidate.sh
index eedec46..05b0a43 100755
--- a/dev/release/verify-release-candidate.sh
+++ b/dev/release/verify-release-candidate.sh
@@ -225,6 +225,25 @@ test_js() {
popd
}
+test_ruby() {
+ export GI_TYPELIB_PATH=$ARROW_HOME/lib/girepository-1.0:$GI_TYPELIB_PATH
+
+ pushd ruby
+
+ pushd red-arrow
+ bundle install --path vendor/bundle
+ bundle exec ruby test/run-test.rb
+ popd
+
+ # TODO: Arrow GPU related tests
+ # pushd red-arrow-gpu
+ # bundle install --path vendor/bundle
+ # bundle exec ruby test/run-test.rb
+ # popd
+
+ popd
+}
+
test_rust() {
# install rust toolchain in a similar fashion like test-miniconda
export RUSTUP_HOME=`pwd`/test-rustup
@@ -309,6 +328,7 @@ test_integration
test_glib
install_parquet_cpp
test_python
+test_ruby
test_rust
echo 'Release candidate looks good!'
[arrow] 03/15: ARROW-2962: [Packaging] Bintray descriptor files are
no longer needed
Posted by we...@apache.org.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 0c29673824bc388f51266174ca83a457f8820f79
Author: Krisztián Szűcs <sz...@gmail.com>
AuthorDate: Sat Aug 4 18:07:55 2018 -0400
ARROW-2962: [Packaging] Bintray descriptor files are no longer needed
Wait for [build-302](https://github.com/kszucs/crossbow/branches/all?utf8=%E2%9C%93&query=build-302) to pass
Author: Krisztián Szűcs <sz...@gmail.com>
Closes #2357 from kszucs/ARROW-2962 and squashes the following commits:
7445b8cb <Krisztián Szűcs> remove bintray descriptors
8baabdac <Krisztián Szűcs> don't update descriptor in rake task
---
dev/tasks/linux-packages/apt/descriptor.json | 45 ----------------------------
dev/tasks/linux-packages/package-task.rb | 10 -------
dev/tasks/linux-packages/yum/descriptor.json | 22 --------------
3 files changed, 77 deletions(-)
diff --git a/dev/tasks/linux-packages/apt/descriptor.json b/dev/tasks/linux-packages/apt/descriptor.json
deleted file mode 100644
index d45ed85..0000000
--- a/dev/tasks/linux-packages/apt/descriptor.json
+++ /dev/null
@@ -1,45 +0,0 @@
-{
- "package": {
- "name": "APT",
- "repo": "apache-arrow-apt",
- "subject": "kou",
- "licenses": ["Apache-2.0"],
- "vcs_url": "htttps://github.com/apache/arrow.git"
- },
- "version": {
- "name": "dev"
- },
- "files": [
- {
- "includePattern": "dev/tasks/linux-packages/apt/repositories/([^/]+)/pool/stretch/main/a/apache-arrow/([^/]+\\.deb)\\z",
- "uploadPattern": "pool/stretch/main/$2",
- "matrixParams": {
- "deb_distribution": "stretch",
- "deb_component": "main",
- "deb_architecture": "amd64",
- "override": 1
- }
- },
- {
- "includePattern": "dev/tasks/linux-packages/apt/repositories/([^/]+)/pool/trusty/universe/a/apache-arrow/([^/]+\\.deb)\\z",
- "uploadPattern": "pool/trusty/universe/$2",
- "matrixParams": {
- "deb_distribution": "trusty",
- "deb_component": "universe",
- "deb_architecture": "amd64",
- "override": 1
- }
- },
- {
- "includePattern": "dev/tasks/linux-packages/apt/repositories/([^/]+)/pool/xenial/universe/a/apache-arrow/([^/]+\\.deb)\\z",
- "uploadPattern": "pool/xenial/universe/$2",
- "matrixParams": {
- "deb_distribution": "xenial",
- "deb_component": "universe",
- "deb_architecture": "amd64",
- "override": 1
- }
- }
- ],
- "publish": true
-}
diff --git a/dev/tasks/linux-packages/package-task.rb b/dev/tasks/linux-packages/package-task.rb
index 29468a1..b8f25ae 100644
--- a/dev/tasks/linux-packages/package-task.rb
+++ b/dev/tasks/linux-packages/package-task.rb
@@ -266,7 +266,6 @@ VERSION=#{@deb_upstream_version}
task :update do
update_debian_changelog
update_spec
- update_descriptor
end
end
end
@@ -325,13 +324,4 @@ VERSION=#{@deb_upstream_version}
end
end
- def update_descriptor
- Dir.glob("**/descriptor.json") do |descriptor_json|
- update_content(descriptor_json) do |content|
- content = content.sub(/"name": "\d+\.\d+\.\d+.*?"/) do
- "\"name\": \"#{@version}\""
- end
- end
- end
- end
end
diff --git a/dev/tasks/linux-packages/yum/descriptor.json b/dev/tasks/linux-packages/yum/descriptor.json
deleted file mode 100644
index b025b17..0000000
--- a/dev/tasks/linux-packages/yum/descriptor.json
+++ /dev/null
@@ -1,22 +0,0 @@
-{
- "package": {
- "name": "Yum",
- "repo": "apache-arrow-yum",
- "subject": "kou",
- "licenses": ["Apache-2.0"],
- "vcs_url": "htttps://github.com/apache/arrow.git"
- },
- "version": {
- "name": "dev"
- },
- "files": [
- {
- "includePattern": "cpp-linux/yum/repositories/(centos)/([^/]+)/([^/]+)/[^/]+/([^/]+\\.rpm)",
- "uploadPattern": "$1/$2/$3/$4",
- "matrixParams": {
- "override": 1
- }
- }
- ],
- "publish": true
-}
[arrow] 05/15: ARROW-2978: [Rust] Change argument to rust fmt to
fix build
Posted by we...@apache.org.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 1b2a42e563c6b6f5e8e72144303e0dfcb168300f
Author: Andy Grove <an...@gmail.com>
AuthorDate: Sun Aug 5 16:01:31 2018 -0400
ARROW-2978: [Rust] Change argument to rust fmt to fix build
Work in progress... trying to fix the build.
Author: Andy Grove <an...@gmail.com>
Closes #2371 from andygrove/fix_rust_ci_failure and squashes the following commits:
94c12773 <Andy Grove> Update code formatting to keep latest version of rust fmt happy
1b3e72d9 <Andy Grove> Change argument to rust fmt to fix build
---
ci/travis_script_rust.sh | 2 +-
rust/src/array.rs | 7 ++-----
rust/src/buffer.rs | 15 ++++++++++-----
rust/src/datatypes.rs | 13 ++++++++-----
4 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/ci/travis_script_rust.sh b/ci/travis_script_rust.sh
index ff12483..f85820f 100755
--- a/ci/travis_script_rust.sh
+++ b/ci/travis_script_rust.sh
@@ -25,7 +25,7 @@ pushd $RUST_DIR
# raises on any formatting errors
rustup component add rustfmt-preview
-cargo fmt --all -- --write-mode=diff
+cargo fmt --all -- --check
# raises on any warnings
cargo rustc -- -D warnings
diff --git a/rust/src/array.rs b/rust/src/array.rs
index e418518..1c4322c 100644
--- a/rust/src/array.rs
+++ b/rust/src/array.rs
@@ -19,9 +19,9 @@
use std::any::Any;
use std::convert::From;
use std::ops::Add;
-use std::sync::Arc;
use std::str;
use std::string::String;
+use std::sync::Arc;
use super::bitmap::Bitmap;
use super::buffer::*;
@@ -453,12 +453,9 @@ mod tests {
fn test_access_array_concurrently() {
let a = PrimitiveArray::from(Buffer::from(vec![5, 6, 7, 8, 9]));
- let ret = thread::spawn(move || {
- a.iter().collect::<Vec<i32>>()
- }).join();
+ let ret = thread::spawn(move || a.iter().collect::<Vec<i32>>()).join();
assert!(ret.is_ok());
assert_eq!(vec![5, 6, 7, 8, 9], ret.ok().unwrap());
}
}
-
diff --git a/rust/src/buffer.rs b/rust/src/buffer.rs
index 0fdc2c5..bdc3601 100644
--- a/rust/src/buffer.rs
+++ b/rust/src/buffer.rs
@@ -190,7 +190,8 @@ mod tests {
fn test_buffer_eq() {
let a = Buffer::from(vec![1, 2, 3, 4, 5]);
let b = Buffer::from(vec![5, 4, 3, 2, 1]);
- let c = a.iter()
+ let c = a
+ .iter()
.zip(b.iter())
.map(|(a, b)| a == b)
.collect::<Vec<bool>>();
@@ -201,7 +202,8 @@ mod tests {
fn test_buffer_lt() {
let a = Buffer::from(vec![1, 2, 3, 4, 5]);
let b = Buffer::from(vec![5, 4, 3, 2, 1]);
- let c = a.iter()
+ let c = a
+ .iter()
.zip(b.iter())
.map(|(a, b)| a < b)
.collect::<Vec<bool>>();
@@ -212,7 +214,8 @@ mod tests {
fn test_buffer_gt() {
let a = Buffer::from(vec![1, 2, 3, 4, 5]);
let b = Buffer::from(vec![5, 4, 3, 2, 1]);
- let c = a.iter()
+ let c = a
+ .iter()
.zip(b.iter())
.map(|(a, b)| a > b)
.collect::<Vec<bool>>();
@@ -223,7 +226,8 @@ mod tests {
fn test_buffer_add() {
let a = Buffer::from(vec![1, 2, 3, 4, 5]);
let b = Buffer::from(vec![5, 4, 3, 2, 1]);
- let c = a.iter()
+ let c = a
+ .iter()
.zip(b.iter())
.map(|(a, b)| a + b)
.collect::<Vec<i32>>();
@@ -234,7 +238,8 @@ mod tests {
fn test_buffer_multiply() {
let a = Buffer::from(vec![1, 2, 3, 4, 5]);
let b = Buffer::from(vec![5, 4, 3, 2, 1]);
- let c = a.iter()
+ let c = a
+ .iter()
.zip(b.iter())
.map(|(a, b)| a * b)
.collect::<Vec<i32>>();
diff --git a/rust/src/datatypes.rs b/rust/src/datatypes.rs
index d4849da..2adec0b 100644
--- a/rust/src/datatypes.rs
+++ b/rust/src/datatypes.rs
@@ -278,11 +278,14 @@ impl Schema {
impl fmt::Display for Schema {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
- f.write_str(&self.columns
- .iter()
- .map(|c| c.to_string())
- .collect::<Vec<String>>()
- .join(", "))
+ f.write_str(
+ &self
+ .columns
+ .iter()
+ .map(|c| c.to_string())
+ .collect::<Vec<String>>()
+ .join(", "),
+ )
}
}
[arrow] 04/15: ARROW-2480: [C++] Enable casting the value of a
decimal to int32_t or int64_t
Posted by we...@apache.org.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 495bf36bedc8614dd49e309760b66f912987c800
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Sat Aug 4 18:37:08 2018 -0400
ARROW-2480: [C++] Enable casting the value of a decimal to int32_t or int64_t
Author: Antoine Pitrou <an...@python.org>
Author: Phillip Cloud <cp...@gmail.com>
Closes #1917 from cpcloud/ARROW-2480 and squashes the following commits:
456624e4 <Antoine Pitrou> Try to fix other compile error
d9c2955a <Antoine Pitrou> Try to fix gcc 4.8 failure
609efaec <Phillip Cloud> ARROW-2480: Enable casting the value of a decimal to int32_t or int64_t
---
cpp/src/arrow/util/decimal-test.cc | 26 ++++++++++++++++++++++++++
cpp/src/arrow/util/decimal.h | 18 ++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/cpp/src/arrow/util/decimal-test.cc b/cpp/src/arrow/util/decimal-test.cc
index 0877617..61884a1 100644
--- a/cpp/src/arrow/util/decimal-test.cc
+++ b/cpp/src/arrow/util/decimal-test.cc
@@ -436,4 +436,30 @@ TEST(Decimal128Test, TestFromBigEndianBadLength) {
ASSERT_RAISES(Invalid, Decimal128::FromBigEndian(0, 17, &out));
}
+TEST(Decimal128Test, TestToInteger) {
+ Decimal128 value1("1234");
+ int32_t out1;
+
+ Decimal128 value2("-1234");
+ int64_t out2;
+
+ ASSERT_OK(value1.ToInteger(&out1));
+ ASSERT_EQ(1234, out1);
+
+ ASSERT_OK(value1.ToInteger(&out2));
+ ASSERT_EQ(1234, out2);
+
+ ASSERT_OK(value2.ToInteger(&out1));
+ ASSERT_EQ(-1234, out1);
+
+ ASSERT_OK(value2.ToInteger(&out2));
+ ASSERT_EQ(-1234, out2);
+
+ Decimal128 invalid_int32(static_cast<int64_t>(std::pow(2, 31)));
+ ASSERT_RAISES(Invalid, invalid_int32.ToInteger(&out1));
+
+ Decimal128 invalid_int64("12345678912345678901");
+ ASSERT_RAISES(Invalid, invalid_int64.ToInteger(&out2));
+}
+
} // namespace arrow
diff --git a/cpp/src/arrow/util/decimal.h b/cpp/src/arrow/util/decimal.h
index b3180cb..7280362 100644
--- a/cpp/src/arrow/util/decimal.h
+++ b/cpp/src/arrow/util/decimal.h
@@ -20,11 +20,14 @@
#include <array>
#include <cstdint>
+#include <limits>
+#include <sstream>
#include <string>
#include <type_traits>
#include "arrow/status.h"
#include "arrow/util/macros.h"
+#include "arrow/util/type_traits.h"
#include "arrow/util/visibility.h"
namespace arrow {
@@ -134,6 +137,21 @@ class ARROW_EXPORT Decimal128 {
/// \brief Convert Decimal128 from one scale to another
Status Rescale(int32_t original_scale, int32_t new_scale, Decimal128* out) const;
+ /// \brief Convert to a signed integer
+ template <typename T, typename = EnableIfIsOneOf<T, int32_t, int64_t>>
+ Status ToInteger(T* out) const {
+ constexpr auto min_value = std::numeric_limits<T>::min();
+ constexpr auto max_value = std::numeric_limits<T>::max();
+ const auto& self = *this;
+ if (self < min_value || self > max_value) {
+ std::stringstream buf;
+ buf << "Invalid cast from Decimal128 to " << sizeof(T) << " byte integer";
+ return Status::Invalid(buf.str());
+ }
+ *out = static_cast<T>(low_bits_);
+ return Status::OK();
+ }
+
private:
int64_t high_bits_;
uint64_t low_bits_;
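The core of `Decimal128::ToInteger` above is a range check against the
target type's limits before narrowing. A pure-Python sketch of that check
(the function name and ValueError are illustrative, not the C++ API):

```python
def decimal_to_integer(value, bit_width):
    """Reject values outside the signed bit_width-bit range instead of
    silently truncating, mirroring Decimal128::ToInteger's guard."""
    min_value = -(1 << (bit_width - 1))   # numeric_limits<T>::min()
    max_value = (1 << (bit_width - 1)) - 1  # numeric_limits<T>::max()
    if not min_value <= value <= max_value:
        raise ValueError("Invalid cast from Decimal128 to "
                         f"{bit_width // 8} byte integer")
    return int(value)
```

This matches the new unit tests: 1234 fits both widths, while 2**31 fails
the 32-bit cast and 12345678912345678901 fails the 64-bit cast.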
[arrow] 10/15: ARROW-2951: [CI] Don't skip AppVeyor build on
format-only changes
Posted by we...@apache.org.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit 91eab98976124b27cae457c3852915d053ad6178
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Mon Aug 6 08:17:08 2018 -0400
ARROW-2951: [CI] Don't skip AppVeyor build on format-only changes
Author: Antoine Pitrou <an...@python.org>
Closes #2375 from pitrou/ARROW-2951-appveyor-builds-format and squashes the following commits:
8d813774 <Antoine Pitrou> ARROW-2951: Don't skip AppVeyor build on format-only changes
---
appveyor.yml | 1 +
1 file changed, 1 insertion(+)
diff --git a/appveyor.yml b/appveyor.yml
index d62baf7..0e37033 100644
--- a/appveyor.yml
+++ b/appveyor.yml
@@ -24,6 +24,7 @@ only_commits:
- appveyor.yml
- ci/
- cpp/
+ - format/
- python/
- rust/
[arrow] 13/15: ARROW-2061: [C++] Make tests a bit faster with
Valgrind
Posted by we...@apache.org.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
commit d3c9c1df257c991e04fdd3c10d328ec857d68f96
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Mon Aug 6 14:41:50 2018 -0400
ARROW-2061: [C++] Make tests a bit faster with Valgrind
Saves around 80 seconds on Travis-CI.
Author: Antoine Pitrou <an...@python.org>
Closes #2377 from pitrou/ARROW-2061-valgrind-test-speed and squashes the following commits:
43a1e0e1 <Antoine Pitrou> ARROW-2061: Make tests a bit faster with Valgrind
---
cpp/src/arrow/array-test.cc | 5 +++-
cpp/src/arrow/compute/compute-test.cc | 51 +++++++++++++++++++-------------
cpp/src/arrow/io/io-memory-test.cc | 11 +++++--
cpp/src/arrow/ipc/ipc-read-write-test.cc | 3 ++
4 files changed, 47 insertions(+), 23 deletions(-)
diff --git a/cpp/src/arrow/array-test.cc b/cpp/src/arrow/array-test.cc
index b7bad67..8b78762 100644
--- a/cpp/src/arrow/array-test.cc
+++ b/cpp/src/arrow/array-test.cc
@@ -247,10 +247,13 @@ TEST_F(TestArray, TestIsNullIsValidNoNulls) {
TEST_F(TestArray, BuildLargeInMemoryArray) {
#ifdef NDEBUG
const int64_t length = static_cast<int64_t>(std::numeric_limits<int32_t>::max()) + 1;
-#else
+#elif !defined(ARROW_VALGRIND)
// use a smaller size since the insert function isn't optimized properly on debug and
// the test takes a long time to complete
const int64_t length = 2 << 24;
+#else
+ // use an even smaller size with valgrind
+ const int64_t length = 2 << 20;
#endif
BooleanBuilder builder;
diff --git a/cpp/src/arrow/compute/compute-test.cc b/cpp/src/arrow/compute/compute-test.cc
index 6a92844..ba5c935 100644
--- a/cpp/src/arrow/compute/compute-test.cc
+++ b/cpp/src/arrow/compute/compute-test.cc
@@ -1034,24 +1034,29 @@ TEST_F(TestHashKernel, DictEncodeBinary) {
}
TEST_F(TestHashKernel, BinaryResizeTable) {
- const int64_t kTotalValues = 10000;
- const int64_t kRepeats = 10;
+ const int32_t kTotalValues = 10000;
+#if !defined(ARROW_VALGRIND)
+ const int32_t kRepeats = 10;
+#else
+ // Mitigate Valgrind's slowness
+ const int32_t kRepeats = 3;
+#endif
vector<std::string> values;
vector<std::string> uniques;
vector<int32_t> indices;
- for (int64_t i = 0; i < kTotalValues * kRepeats; i++) {
- int64_t index = i % kTotalValues;
- std::stringstream ss;
- ss << "test" << index;
- std::string val = ss.str();
+ char buf[20] = "test";
- values.push_back(val);
+ for (int32_t i = 0; i < kTotalValues * kRepeats; i++) {
+ int32_t index = i % kTotalValues;
+
+ ASSERT_GE(snprintf(buf + 4, sizeof(buf) - 4, "%d", index), 0);
+ values.emplace_back(buf);
if (i < kTotalValues) {
- uniques.push_back(val);
+ uniques.push_back(values.back());
}
- indices.push_back(static_cast<int32_t>(i % kTotalValues));
+ indices.push_back(index);
}
CheckUnique<BinaryType, std::string>(&this->ctx_, binary(), values, {}, uniques, {});
@@ -1076,24 +1081,30 @@ TEST_F(TestHashKernel, DictEncodeFixedSizeBinary) {
}
TEST_F(TestHashKernel, FixedSizeBinaryResizeTable) {
- const int64_t kTotalValues = 10000;
- const int64_t kRepeats = 10;
+ const int32_t kTotalValues = 10000;
+#if !defined(ARROW_VALGRIND)
+ const int32_t kRepeats = 10;
+#else
+ // Mitigate Valgrind's slowness
+ const int32_t kRepeats = 3;
+#endif
vector<std::string> values;
vector<std::string> uniques;
vector<int32_t> indices;
- for (int64_t i = 0; i < kTotalValues * kRepeats; i++) {
- int64_t index = i % kTotalValues;
- std::stringstream ss;
- ss << "test" << static_cast<char>(index / 128) << static_cast<char>(index % 128);
- std::string val = ss.str();
+ char buf[7] = "test..";
- values.push_back(val);
+ for (int32_t i = 0; i < kTotalValues * kRepeats; i++) {
+ int32_t index = i % kTotalValues;
+
+ buf[4] = static_cast<char>(index / 128);
+ buf[5] = static_cast<char>(index % 128);
+ values.emplace_back(buf, 6);
if (i < kTotalValues) {
- uniques.push_back(val);
+ uniques.push_back(values.back());
}
- indices.push_back(static_cast<int32_t>(i % kTotalValues));
+ indices.push_back(index);
}
auto type = fixed_size_binary(6);
diff --git a/cpp/src/arrow/io/io-memory-test.cc b/cpp/src/arrow/io/io-memory-test.cc
index d80aaec..62305a6 100644
--- a/cpp/src/arrow/io/io-memory-test.cc
+++ b/cpp/src/arrow/io/io-memory-test.cc
@@ -131,9 +131,16 @@ TEST(TestBufferReader, RetainParentReference) {
}
TEST(TestMemcopy, ParallelMemcopy) {
+#if defined(ARROW_VALGRIND)
+ // Compensate for Valgrind's slowness
+ constexpr int64_t THRESHOLD = 32 * 1024;
+#else
+ constexpr int64_t THRESHOLD = 1024 * 1024;
+#endif
+
for (int i = 0; i < 5; ++i) {
// randomize size so the memcopy alignment is tested
- int64_t total_size = 3 * 1024 * 1024 + std::rand() % 100;
+ int64_t total_size = 3 * THRESHOLD + std::rand() % 100;
std::shared_ptr<Buffer> buffer1, buffer2;
@@ -144,7 +151,7 @@ TEST(TestMemcopy, ParallelMemcopy) {
io::FixedSizeBufferWriter writer(buffer1);
writer.set_memcopy_threads(4);
- writer.set_memcopy_threshold(1024 * 1024);
+ writer.set_memcopy_threshold(THRESHOLD);
ASSERT_OK(writer.Write(buffer2->data(), buffer2->size()));
ASSERT_EQ(0, memcmp(buffer1->data(), buffer2->data(), buffer1->size()));
diff --git a/cpp/src/arrow/ipc/ipc-read-write-test.cc b/cpp/src/arrow/ipc/ipc-read-write-test.cc
index baf067e..f6e49ea 100644
--- a/cpp/src/arrow/ipc/ipc-read-write-test.cc
+++ b/cpp/src/arrow/ipc/ipc-read-write-test.cc
@@ -498,8 +498,11 @@ TEST_F(RecursionLimits, StressLimit) {
CheckDepth(100, &it_works);
ASSERT_TRUE(it_works);
+// Mitigate Valgrind's slowness
+#if !defined(ARROW_VALGRIND)
CheckDepth(500, &it_works);
ASSERT_TRUE(it_works);
+#endif
}
#endif // !defined(_WIN32) || defined(NDEBUG)
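The technique running through this patch is a compile-time ladder that
shrinks test sizes as the build gets slower. A Python sketch of the ladder
from `array-test.cc` (flag names are illustrative; the C++ uses `NDEBUG`
and `ARROW_VALGRIND` preprocessor symbols):

```python
def build_length(ndebug=False, valgrind=False):
    """Pick the BuildLargeInMemoryArray test length for the build mode:
    full >INT32_MAX size only in optimized builds, a reduced size for
    debug builds, and an even smaller one under Valgrind."""
    if ndebug:
        return 1 << 31   # int32 max + 1: exercises the 64-bit length path
    if not valgrind:
        return 2 << 24   # debug build: insert path is unoptimized
    return 2 << 20       # compensate for Valgrind's slowness
```

The same idea gates `kRepeats`, the memcopy threshold, and the recursion
depth in the other hunks: keep coverage, cut the constant factors.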