Posted to commits@arrow.apache.org by GitBox <gi...@apache.org> on 2020/01/27 20:53:37 UTC

[GitHub] [arrow-site] wesm commented on a change in pull request #41: ARROW-7580: [Website] 0.16 release post

wesm commented on a change in pull request #41: ARROW-7580: [Website] 0.16 release post 
URL: https://github.com/apache/arrow-site/pull/41#discussion_r371476426
 
 

 ##########
 File path: _posts/2020-01-25-0.16.0-release.md
 ##########
 @@ -0,0 +1,271 @@
+---
+layout: post
+title: "Apache Arrow 0.16.0 Release"
+date: "2020-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 0.16.0 release. This covers
+about 4 months of development work and includes [**XXX resolved issues**][1]
+from [**YY distinct contributors**][2].  See the Install Page to learn how to
+get the libraries for your platform.
+
+<!-- Another paragraph here -->
+
+The release notes below are not exhaustive and only cover selected highlights
+of the release.  Many other bug fixes and improvements have been made; see the
+[complete changelog][3] for details.
+
+## New committers
+
+Since the 0.15.0 release, we've added two new committers:
+
+* [Eric Erhardt][4]
+* [Joris Van den Bossche][5]
+
+Thank you for all your contributions!
+
+## Columnar Format Notes
+
+We still have work to do to complete comprehensive columnar format integration
+testing between the Java and C++ libraries. Once this work is completed, we
+intend to make a 1.0.0 release with [forward and backward compatibility
+guarantees][23].
+
+## Arrow Flight RPC notes
+
+Flight development work has recently focused on robustness and stability. If
+you are not yet familiar with Flight, read the [introductory blog post from
+October][24].
+
+We are also discussing adding a "bidirectional RPC", which would enable
+request-response workflows that require both client and server to send data
+streams to be performed in a single RPC request.
+
+## C++ notes
+
+Some work has been done to make the default build configuration of Arrow C++ as
+lean as possible. The Arrow C++ core can now be built without any external
+dependencies other than a new enough C++ compiler (gcc 4.9 or higher). Notably,
+Boost is no longer required. We invested effort to vendor some small essential
+dependencies: Flatbuffers, double-conversion, and uriparser. Many optional
+features requiring external libraries, like compression and GLog integration,
+are now disabled by default. Several subcomponents of the C++ project like the
+filesystem API, CSV, compute, dataset and JSON layers, as well as command-line
+utilities, are now disabled by default. The only toolchain dependency enabled
+by default is jemalloc, the recommended memory allocator, but this can also be
+disabled if desired.
+
+When enabled, the default jemalloc configuration has been tweaked to return
+memory more aggressively to the OS (ARROW-6910, ARROW-6994). We welcome
+feedback from users about our memory allocation configuration and performance
+in applications.
+
+The array validation facilities have been vastly expanded and now exist in
+two flavors: the `Validate` method does a light-weight validation that's
+O(1) in array size, while the potentially O(N) method `ValidateFull` does
+thorough data validation (ARROW-6157).
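To illustrate the difference between the two validation flavors, here is a minimal Python sketch. It is not Arrow's implementation: `ToyStringArray` and its fields are hypothetical names, and the checks shown are only assumed to be representative of what a cheap versus a thorough pass might cover.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyStringArray:
    """Hypothetical variable-length string array: offsets index into data."""
    length: int
    offsets: List[int]
    data: bytes

    def validate(self) -> None:
        """Cheap checks whose cost does not depend on the array length."""
        if self.length < 0:
            raise ValueError("negative length")
        if len(self.offsets) != self.length + 1:
            raise ValueError("offsets must hold length + 1 entries")
        if self.offsets[0] < 0 or self.offsets[-1] > len(self.data):
            raise ValueError("offsets fall outside the data buffer")

    def validate_full(self) -> None:
        """Thorough checks that visit every element, so O(N)."""
        self.validate()
        for i in range(self.length):
            if self.offsets[i] > self.offsets[i + 1]:
                raise ValueError(f"offsets decrease at index {i}")
```

An array such as `ToyStringArray(3, [0, 5, 2, 6], b"abcdef")` passes `validate()` because its first and last offsets are in bounds, but `validate_full()` rejects it because the offsets decrease in the middle.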
+
+The IO APIs now use `Result<T>` when returning both a Status
+and result value, rather than taking a pointer-out function parameter
+(ARROW-7235).
+
+### C++: CSV
+
+An option has been added to attempt automatic dictionary encoding of string
+columns while reading a CSV file, until a cardinality limit is reached. When
+successful, it can make reading faster and the resulting Arrow data is
+much more memory-efficient (ARROW-3408).
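The idea of dictionary encoding under a cardinality limit can be sketched in a few lines of Python. This is an illustration only, not the Arrow CSV reader: the function name and the `max_cardinality` parameter are hypothetical, and giving up by returning `None` is an assumption about what happens once the limit is exceeded.

```python
from typing import Dict, List, Optional, Tuple


def try_dictionary_encode(
    values: List[str], max_cardinality: int
) -> Optional[Tuple[List[str], List[int]]]:
    """Return (dictionary, indices), or None once the limit is exceeded."""
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        index = positions.get(value)
        if index is None:
            if len(positions) >= max_cardinality:
                return None  # too many distinct values: keep plain strings
            index = len(positions)
            positions[value] = index
        indices.append(index)
    # dicts preserve insertion order, so keys line up with their indices
    return list(positions), indices
```

Each distinct string is stored once in the dictionary and every row holds only a small integer index, which is where the memory saving for low-cardinality columns comes from.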
+
+The CSV APIs now use `Result<T>` when returning both a Status
+and result value, rather than taking a pointer-out function parameter
+(ARROW-7236).
+
+### C++: Datasets
+
+The 0.16 release introduces the Datasets API to the C++ library, along with
+bindings in Python and R.
+This API allows you to treat multiple files as a single logical dataset entity
+and make efficient selection queries against it.
+This release includes support for Parquet and Arrow IPC file formats.
+Factory objects allow you to discover files in a directory recursively, inspect the schemas in the files, and perform some basic schema unification.
+You may specify how file path segments map to partitions, and there is support for auto-detecting some partition information, including Hive-style partitioning.
+The Datasets API includes a filter expression syntax as well as column selection.
+These are evaluated with predicate pushdown, and for Parquet, evaluation is pushed down to row groups.
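As a rough illustration of Hive-style partitioning and why it helps filtering, the Python sketch below reads `key=value` directory segments from file paths and uses them to skip files without opening them. It does not use the Datasets API: the function names and example paths are hypothetical.

```python
from typing import Dict, List


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Collect key=value directory segments from a file path."""
    partitions: Dict[str, str] = {}
    for segment in path.split("/")[:-1]:  # last segment is the file name
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


def select_files(paths: List[str], key: str, wanted: str) -> List[str]:
    """Keep only files whose partition value matches, without opening them."""
    return [p for p in paths if parse_hive_partitions(p).get(key) == wanted]
```

Given the paths `data/year=2019/month=12/part-0.parquet` and `data/year=2020/month=1/part-0.parquet`, calling `select_files(paths, "year", "2020")` keeps only the second file, so a filter on `year` never has to read the first one.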
+
+### C++: Filesystem layer
+
+An HDFS implementation of the FileSystem class is available (ARROW-6720). We
+plan to deprecate the prior bespoke C++ HDFS class in favor of the standardized
+filesystem API.
+
+The filesystem APIs now use `Result<T>` when returning both a Status
+and result value, rather than taking a pointer-out function parameter
+(ARROW-7161).
+
+### C++: IPC
+
+The Arrow IPC reader is being fuzzed continuously by the [OSS-Fuzz][6]
+infrastructure, to detect undesirable behavior on invalid or malicious input.
+Several issues have already been found and fixed.
+
+### C++: Parquet
+
+[Modular encryption][10] is now supported (PARQUET-1300).
+
+A performance regression when reading a file with a large number of columns
+has been fixed (ARROW-6876, ARROW-7059), as well as several bugs (PARQUET-1766, ARROW-6895).
+
+### C++: Tensors
+
+CSC sparse matrices are supported (ARROW-4225).
+
+The Tensor APIs now use `Result<T>` when returning both a Status
+and result value, rather than taking a pointer-out function parameter
+(ARROW-7420).
+
+## C# Notes
+
+
+## Java notes
+
+
+## Python notes
+
 
 Review comment:
   yes
