Posted to commits@arrow.apache.org by GitBox <gi...@apache.org> on 2020/01/25 00:35:24 UTC
[GitHub] [arrow-site] paddyhoran commented on a change in pull request #41: ARROW-7580: [Website] 0.16 release post

URL: https://github.com/apache/arrow-site/pull/41#discussion_r370896853
 
 

 ##########
 File path: _posts/2020-01-25-0.16.0-release.md
 ##########
 @@ -0,0 +1,250 @@
+---
+layout: post
+title: "Apache Arrow 0.16.0 Release"
+date: "2020-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 0.16.0 release. This covers
+about 4 months of development work and includes [**XXX resolved issues**][1]
+from [**YY distinct contributors**][2].  See the Install Page to learn how to
+get the libraries for your platform.
+
+<!-- Another paragraph here -->
+
+The release notes below are not exhaustive and only cover selected highlights
+of the release.  Many other bugfixes and improvements have been made; we refer
+you to the [complete changelog][3].
+
+## New committers
+
+Since the 0.15.0 release, we've added two new committers:
+
+* [Eric Erhardt][4]
+* [Joris Van den Bossche][5]
+
+Thank you for all your contributions!
+
+## Columnar Format Notes
+
+
+## Arrow Flight notes
+
+## C++ notes
+
+Some work has been done to make the default build configuration of Arrow C++
+as lean as possible:
+
+* The Arrow C++ core can now be built without any dependency on Boost
+(ARROW-6613, ARROW-6782, ARROW-6743, ARROW-6742).
+
+* Flatbuffers and its generated files are vendored within the Arrow C++ source
+tree (ARROW-6634), as well as the double-conversion library (ARROW-6633) and the
+uriparser library (ARROW-7169).
+
+* Compression support (ARROW-6631) and GLog integration (ARROW-6635) are disabled
+by default.
+
+* The filesystem (ARROW-6610), CSV, compute, dataset and JSON layers (ARROW-6637),
+as well as command-line utilities (ARROW-6636), are disabled by default.
+
+When enabled, the default jemalloc configuration has been tweaked to return
+memory more aggressively to the OS (ARROW-6910, ARROW-6994).
+
+The array validation facilities have been vastly expanded and now exist in
+two flavors: the `Validate` method does a light-weight validation that's
+O(1) in array size, while the potentially O(N) method `ValidateFull` does
+thorough data validation (ARROW-6157).
+
+The IO APIs now use `Result<T>` when returning both a Status
+and result value, rather than taking a pointer-out function parameter
+(ARROW-7235).
+
+### C++: CSV
+
+An option was added to attempt automatic dictionary encoding of string columns
+while reading a CSV file, until a cardinality limit is reached. When
+successful, it can make reading faster and the resulting Arrow data
+much more memory-efficient (ARROW-3408).
+
+The CSV APIs now use `Result<T>` when returning both a Status
+and result value, rather than taking a pointer-out function parameter
+(ARROW-7236).
+
+### C++: Datasets
+
+
+Added support for Arrow IPC files via `IPCFileFormat` (ARROW-7415), and the
+`IN` and `IS_VALID` filter operators (ARROW-7185).
+
+Classes were renamed to avoid the repeated `Data` prefix. The core classes are
+now `Dataset`, `Source`, `Fragment`, and `Partitioning` (ARROW-7498). The dataset
+APIs now use `Result<T>` when returning both a Status and result value, rather
+than taking a pointer-out function parameter (ARROW-7148).
+
+A discovery facility was added to create a `Dataset` and/or `Source` via the
+`DatasetFactory` and `SourceFactory` interfaces. Notably, `FileSystemSourceFactory`
+can crawl directories to find the candidate files (`Fragment`) supported by a
+given `FileFormat`. The factories will also try to unify schemas, transparently
+supporting missing/added columns in the final unified schema (ARROW-6614,
+ARROW-7061, ARROW-843, ARROW-7380).
+
+A partitioning facility was added to support partition pruning with predicate
+pushdown (ARROW-6494). The partitioning extracts partitions from the `Fragment`'s
+path, e.g. `/data/year=2015/month=04/day=29` would yield the `year`, `month`
+and `day` partitions. Partitions can be injected in the schema and the RecordBatch
+as materialized columns (ARROW-6965). A `PartitioningFactory` discovery facility
+was added so that the types of the `Partitioning`'s schema are inferred if possible.
+
+The `ParquetFileFormat` transparently supports predicate pushdown by skipping
+RowGroups based on their statistics (ARROW-6952). It also supports column
+projection, effectively only reading data from columns of interest (ARROW-6951).
+
+The dataset layer now compiles and passes tests on Visual Studio (ARROW-7650).
+
+### C++: Filesystem layer
+
+An HDFS implementation of the FileSystem class is available (ARROW-6720).
+
+The filesystem APIs now use `Result<T>` when returning both a Status
+and result value, rather than taking a pointer-out function parameter
+(ARROW-7161).
+
+### C++: IPC
+
+The Arrow IPC reader is being fuzzed continuously by the [OSS-Fuzz][6]
+infrastructure, to detect undesirable behavior on invalid or malicious input.
+Several issues have already been found and fixed.
+
+### C++: Parquet
+
+[Modular encryption][10] is supported (PARQUET-1300).
+
+A performance regression when reading a file with a large number of columns
+has been fixed (ARROW-6876, ARROW-7059).
+
+A number of severe bugs were fixed (PARQUET-1766, ARROW-6895).
+
+### C++: Tensors
+
+CSC sparse matrices are supported (ARROW-4225).
+
+The Tensor APIs now use `Result<T>` when returning both a Status
+and result value, rather than taking a pointer-out function parameter
+(ARROW-7420).
+
+## C# Notes
+
+
+## Java notes
+
+
+## Python notes
+
+Arrow is now tested against Python 3.8 as well.
+
+Python now has bindings for the datasets API (ARROW-6341) as well as the S3
+(ARROW-6655) and HDFS (ARROW-7310) filesystem implementations.
+
+The Duration (ARROW-5855) and Fixed Size List (ARROW-7261) types are exposed
+in Python.
+
+Sparse tensors can be converted to dense tensors (ARROW-6624).  They are
+also interoperable with the `pydata/sparse` and `scipy.sparse` libraries
+(ARROW-4223, ARROW-4224).
+
+A memory leak when converting Arrow data to Pandas "object" data has been
+fixed (ARROW-6874).
+
+Pandas extension arrays now are able to roundtrip through Arrow conversion
+(ARROW-2428).
+
+We now build manylinux2014 wheels for Python 3 (ARROW-7344).
+
+## Ruby and C GLib notes
+
+
+## Rust notes
+
+Support for Arrow data types has been improved, with the following array types now supported (ARROW-3690):
+
+* Fixed Size List and Fixed Size Binary
+* String Array for UTF-8 strings, with the Binary Array kept for general binary data
+* Duration and Interval arrays
+
+IPC readers for files and streams have been implemented (ARROW-5180).
+
+### Rust: DataFusion
+
+Query execution has been reimplemented with an extensible physical query plan. This allows other projec ts to add other plans, such as for distributed computing or for specific database servers (ARROW-5227).
 
 Review comment:
   ```suggestion
   Query execution has been reimplemented with an extensible physical query plan. This allows other projects to add other plans, such as for distributed computing or for specific database servers (ARROW-5227).
   ```
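
For readers following the "C++: Datasets" passage in the diff above, here is a minimal Python sketch of how `key=value` partitions might be pulled out of a path such as `/data/year=2015/month=04/day=29`. It is only an illustration of the idea, not the Arrow implementation: the function name `parse_partitions` is hypothetical, values are kept as strings, and the type inference attributed to `PartitioningFactory` is not modeled.

```python
from typing import Dict


def parse_partitions(path: str) -> Dict[str, str]:
    """Collect key=value path segments (hypothetical helper, not an Arrow API)."""
    partitions: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        # str.partition splits on the first "=" only; sep is "" if none is found.
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value  # values stay as strings in this sketch
    return partitions


if __name__ == "__main__":
    print(parse_partitions("/data/year=2015/month=04/day=29"))
```

Segments without an `=` (such as `data`) are simply skipped, so a path with no partition directories yields an empty mapping.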
