You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/26 23:14:04 UTC

[GitHub] [arrow-site] nealrichardson opened a new pull request #92: 3.0 release blog post

nealrichardson opened a new pull request #92:
URL: https://github.com/apache/arrow-site/pull/92


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] eerhardt commented on a change in pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
eerhardt commented on a change in pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#discussion_r567932311



##########
File path: _posts/2021-01-25-3.0.0-release.md
##########
@@ -0,0 +1,114 @@
+---
+layout: post
+title: "Apache Arrow 3.0.0 Release"
+date: "2021-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 3.0.0 release. This covers
+over 3 months of development work and includes [**666 resolved issues**][1]
+from [**106 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+
+## Columnar Format Notes
+
+ARROW-9747 - [C++][Java][Format] Support Decimal256 Type
+
+## Arrow Flight RPC notes
+
+## C++ notes
+
+
+Critical fixes in writing/reading of nested data with parquet ... (ARROW-10493)
+
+Several new compute kernels were added (any/all boolean reduction,
+``is_nan``, sorting by multiple columns).
+String kernels (trimming, splitting, filling nulls)
+
+Fixed reading of compressed Feather files written with pyarrow 0.17.
+
+
+ARROW-6883 - [C++] Support sending delta DictionaryBatch or replacement DictionaryBatch in IPC stream writer class
+
+## C# notes
+

Review comment:
       ```suggestion
   
   The .NET package added initial support for Arrow Flight clients and servers. Support is enabled through two new NuGet packages [Apache.Arrow.Flight](https://www.nuget.org/packages/Apache.Arrow.Flight/) (client) and [Apache.Arrow.Flight.AspNetCore](https://www.nuget.org/packages/Apache.Arrow.Flight.AspNetCore/) (server).
   
   Also fixed an issue where ArrowStreamWriter wasn't writing schema metadata to Arrow streams.
   
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] nealrichardson merged pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
nealrichardson merged pull request #92:
URL: https://github.com/apache/arrow-site/pull/92


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] andygrove commented on pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#issuecomment-772627482


   > @andygrove you can or you can just push to my branch
   
   Done


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] bkietz commented on a change in pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#discussion_r568151444



##########
File path: _posts/2021-01-25-3.0.0-release.md
##########
@@ -0,0 +1,231 @@
+---
+layout: post
+title: "Apache Arrow 3.0.0 Release"
+date: "2021-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 3.0.0 release. This covers
+over 3 months of development work and includes [**666 resolved issues**][1]
+from [**106 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+
+## Columnar Format Notes
+
+The Decimal256 data type, which was already supported by the Arrow columnar
+format specification, is now implemented in C++ and Java (ARROW-9747).
+
+## Arrow Flight RPC notes
+
+Authentication in C++/Java/Python has been overhauled, allowing more flexible authentication methods and use of standard headers.
+Support for cookies has also been added.
+The C++/Java implementations are now more permissive when parsing messages in order to interoperate better with other Flight implementations.
+
+A basic Flight implementation for C#/.NET has been added.
+See the [implementation status matrix](https://arrow.apache.org/docs/status.html#flight-rpc) for details.
+## C++ notes
+
+The default memory pool can now be changed at runtime using the environment
+variable `ARROW_DEFAULT_MEMORY_POOL` (ARROW-11009).  The environment variable
+is inspected at process startup.  This is useful when trying to diagnose memory
+consumption issues with Arrow.
+
+STL-like iterators are now provided over concrete arrays. Those are useful for
+non-performance critical tasks, for example testing (ARROW-10776).
+
+It is now possible to concatenate dictionary arrays with unequal dictionaries.
+The dictionaries are unified when concatenating, for supported data types
+(ARROW-5336).
+
+Threads in a thread pool are now spawned lazily as needed for enqueued
+tasks, up to the configured capacity. They used to be spawned upfront on
+creation of the thread pool (ARROW-10038).
+
+### Compute layer
+
+Comprehensive documentation for compute functions is now available:
+https://arrow.apache.org/docs/cpp/compute.html
+
+Compute functions for string processing have been added for:
+* spliting on whitespace (ASCII and Unicode flavours) and splitting on a
+  pattern (ARROW-9991);
+* trimming characters (ARROW-9128).
+
+Behavior of the `index_in` and `is_in` compute functions with nulls has been
+changed for consistency (ARROW-10663).
+
+Multiple-column sort kernels are now available for tables and record batches
+(ARROW-8199, ARROW-10796, ARROW-10790).
+
+Performance of table filtering has been vastly improved (ARROW-10569).
+
+Scalar arguments are now accepted for more compute functions.
+
+Compute functions `quantile` (ARROW-10831) and `is_nan` (ARROW-11043) have been
+added for numeric data.
+
+Aggregation functions `any` (ARROW-1846) and `all` (ARROW-10301) have been
+added for boolean data.
+

Review comment:
       ```suggestion
   ### Dataset
   
   The `Expression` hierarchy has simplified to a wrapper around literals, field references,
   or calls to named functions. This enables usage of any compute function while filtering
   with no bolierplate.
   
   Parquet statistics are lazily parsed in `ParquetDatasetFactory` and
   `ParquetFileFragment` for shorter construction time.
   
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] github-actions[bot] commented on pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#issuecomment-767891486


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow-site/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] mrkn commented on a change in pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
mrkn commented on a change in pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#discussion_r567517552



##########
File path: _posts/2021-01-25-3.0.0-release.md
##########
@@ -0,0 +1,96 @@
+---
+layout: post
+title: "Apache Arrow 3.0.0 Release"
+date: "2021-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 3.0.0 release. This covers
+over 3 months of development work and includes [**666 resolved issues**][1]
+from [**106 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+
+## Columnar Format Notes
+
+ARROW-9747 - [C++][Java][Format] Support Decimal256 Type
+
+## Arrow Flight RPC notes
+
+## C++ notes
+
+
+Critical fixes in writing/reading of nested data with parquet ... (ARROW-10493)
+
+Several new compute kernels were added (any/all boolean reduction,
+``is_nan``, sorting by multiple columns).
+String kernels (trimming, splitting, filling nulls)
+
+Fixed reading of compressed Feather files written with pyarrow 0.17.
+
+
+ARROW-6883 - [C++] Support sending delta DictionaryBatch or replacement DictionaryBatch in IPC stream writer class
+
+## C# notes
+
+## Go notes
+
+## Java notes
+
+## JavaScript notes
+## Julia notes
+
+This is the first release to officially include [an implementation](https://github.com/apache/arrow/tree/master/julia/Arrow) for the Julia language. The pure Julia implementation includes support for [wide coverage of the format specification](https://arrow.apache.org/docs/status.html). Additional details can be found in the [julialang.org blog post](https://julialang.org/blog/2021/01/arrow/).
+
+## Python notes
+
+Support for Python 3.5 was dropped, and support for Python 3.9 added.
+The minimal required version for NumPy is now 1.16.6. Note that when upgrading
+NumPy to 1.20, you also need to upgrade pyarrow to 3.0.0 to ensure compatibility.
+
+
+## R notes
+
+This release contains new features for the Flight RPC wrapper, better
+support for saving R metadata (including `sf` spatial data) to Feather and
+Parquet files, several significant improvements to speed and memory management,
+and many other enhancements.
+
+For more on what’s in the 3.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+### C GLib

Review comment:
       ```suggestion
   ### Ruby
   
   In Ruby binding, 256-bit decimal support and `Arrow::FixedBinaryArrayBuilder` are added likewise C GLib below.
   
   ### C GLib
   
   In the version 3.0.0 of C GLib consists of many new features.
   
   A chunked array, a record batch, and a table support `sort_indices` function as well as an array.
   These functions including array's support to specify sorting option.
   `garrow_array_sort_to_indices` has been renamed to `garrow_array_sort_indices` and the previous name has been deprecated.
   
   `GArrowField` supports functions to handle metadata.
   `GArrowSchema` supports `garrow_schema_has_metadata()` function.
   
   `GArrowArrayBuilder` supports to add single null, multiple nulls, single empty value, and multiple empty values.
   `GArrowFixedSizedBinaryArrayBuilder` is newly supported.
   
   256-bit decimal and extension types are newly supported.
   Filesystem module supports Mock, HDFS, S3 file systems.
   Dataset module supports CSV, IPC, and Parquet file formats.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] andygrove commented on pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#issuecomment-771919056


   @nealrichardson should I PR against this PR to add the Rust content?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] lidavidm commented on a change in pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#discussion_r567245871



##########
File path: _posts/2021-01-25-3.0.0-release.md
##########
@@ -0,0 +1,96 @@
+---
+layout: post
+title: "Apache Arrow 3.0.0 Release"
+date: "2021-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 3.0.0 release. This covers
+over 3 months of development work and includes [**666 resolved issues**][1]
+from [**106 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+
+## Columnar Format Notes
+
+ARROW-9747 - [C++][Java][Format] Support Decimal256 Type
+
+## Arrow Flight RPC notes
+

Review comment:
       ```suggestion
   
   Authentication in C++/Java/Python has been overhauled, allowing more flexible authentication methods and use of standard headers.
   Support for cookies has also been added.
   The C++/Java implementations are now more permissive when parsing messages in order to interoperate better with other Flight implementations.
   
   A basic Flight implementation for C#/.NET has been added.
   See the [implementation status matrix](https://arrow.apache.org/docs/status.html#flight-rpc) for details.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] nealrichardson commented on pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#issuecomment-771200334


   I think this is complete enough, just needs resolution on #93, whether this post ends with a link to the other post or whether the content is merged in here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] andygrove commented on a change in pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#discussion_r569541258



##########
File path: _posts/2021-01-25-3.0.0-release.md
##########
@@ -0,0 +1,237 @@
+---
+layout: post
+title: "Apache Arrow 3.0.0 Release"
+date: "2021-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 3.0.0 release. This covers
+over 3 months of development work and includes [**666 resolved issues**][1]
+from [**106 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Columnar Format Notes
+
+The Decimal256 data type, which was already supported by the Arrow columnar
+format specification, is now implemented in C++ and Java (ARROW-9747).
+
+## Arrow Flight RPC notes
+
+Authentication in C++/Java/Python has been overhauled, allowing more flexible authentication methods and use of standard headers.
+Support for cookies has also been added.
+The C++/Java implementations are now more permissive when parsing messages in order to interoperate better with other Flight implementations.
+
+A basic Flight implementation for C#/.NET has been added.
+See the [implementation status matrix](https://arrow.apache.org/docs/status.html#flight-rpc) for details.
+
+## C++ notes
+
+The default memory pool can now be changed at runtime using the environment
+variable `ARROW_DEFAULT_MEMORY_POOL` (ARROW-11009).  The environment variable
+is inspected at process startup.  This is useful when trying to diagnose memory
+consumption issues with Arrow.
+
+STL-like iterators are now provided over concrete arrays. Those are useful for
+non-performance critical tasks, for example testing (ARROW-10776).
+
+It is now possible to concatenate dictionary arrays with unequal dictionaries.
+The dictionaries are unified when concatenating, for supported data types
+(ARROW-5336).
+
+Threads in a thread pool are now spawned lazily as needed for enqueued
+tasks, up to the configured capacity. They used to be spawned upfront on
+creation of the thread pool (ARROW-10038).
+
+### Compute layer
+
+Comprehensive documentation for compute functions is now available:
+https://arrow.apache.org/docs/cpp/compute.html
+
+Compute functions for string processing have been added for:
+
+* splitting on whitespace (ASCII and Unicode flavors) and splitting on a
+  pattern (ARROW-9991);
+* trimming characters (ARROW-9128).
+
+Behavior of the `index_in` and `is_in` compute functions with nulls has been
+changed for consistency (ARROW-10663).
+
+Multiple-column sort kernels are now available for tables and record batches
+(ARROW-8199, ARROW-10796, ARROW-10790).
+
+Performance of table filtering has been vastly improved (ARROW-10569).
+
+Scalar arguments are now accepted for more compute functions.
+
+Compute functions `quantile` (ARROW-10831) and `is_nan` (ARROW-11043) have been
+added for numeric data.
+
+Aggregation functions `any` (ARROW-1846) and `all` (ARROW-10301) have been
+added for boolean data.
+
+### Dataset
+
+The `Expression` hierarchy has simplified to a wrapper around literals, field references,
+or calls to named functions. This enables usage of any compute function while filtering
+with no boilerplate.
+
+Parquet statistics are lazily parsed in `ParquetDatasetFactory` and
+`ParquetFileFragment` for shorter construction time.
+
+### CSV
+
+Conversion of string columns is now faster thanks to faster UTF-8 validation
+of small strings (ARROW-10313).
+
+Conversion of floating-point columns is now faster thanks to optimized
+string-to-double conversion routines (ARROW-10328).
+
+Parsing of ISO8601 timestamps is now more liberal: trailing zeros can
+be omitted in the fractional part (ARROW-10337).
+
+Fixed a bug where null detection could give the wrong results on some platforms
+(ARROW-11067).
+
+Added type inference for Date32 columns for values in the form `YYYY-MM-DD`
+(ARROW-11247).
+
+### Feather
+
+Fixed reading of compressed Feather files written with Arrow 0.17 (ARROW-11163).
+
+### Filesystem layer
+
+S3 recursive tree walks now benefit from a parallel implementation, where reads
+of multiple child directories are now issued concurrently (ARROW-10788).
+
+Improved empty directory detection to be mindful of differences between Amazon
+and Minio S3 implementations (ARROW-10942).
+
+### Flight RPC
+
+IPv6 host addresses are now supported (ARROW-10475).
+
+### IPC
+
+It is now possible to emit dictionary deltas where possible using the IPC
+stream writer. This is governed by a new variable in the `IpcWriteOptions` class
+(ARROW-6883).
+
+It is now possible to read wider tables, which used to fail due to reaching a
+limit during Flatbuffers verification (ARROW-10056).
+
+### Parquet
+
+Fixed reading of LZ4-compressed Parquet columns emitted by the Java Parquet
+implementation (ARROW-11301).
+
+Fixed a bug where writing multiple batches of nullable nested strings to Parquet
+would not write any data in batches after the first one (ARROW-10493)
+
+The Decimal256 data type can be read from and written to Parquet (ARROW-10607).
+
+LargeString and LargeBinary data can now be written to Parquet (ARROW-10426).
+
+## C# notes
+
+The .NET package added initial support for Arrow Flight clients and servers. Support is enabled through two new NuGet packages [Apache.Arrow.Flight](https://www.nuget.org/packages/Apache.Arrow.Flight/) (client) and [Apache.Arrow.Flight.AspNetCore](https://www.nuget.org/packages/Apache.Arrow.Flight.AspNetCore/) (server).
+
+Also fixed an issue where ArrowStreamWriter wasn't writing schema metadata to Arrow streams.
+
+## Julia notes
+
+This is the first release to officially include
+[an implementation](https://github.com/apache/arrow/tree/master/julia/Arrow)
+for the Julia language. The pure Julia implementation includes support
+for [wide coverage of the format specification](https://arrow.apache.org/docs/status.html).
+Additional details can be found in the
+[julialang.org blog post](https://julialang.org/blog/2021/01/arrow/).
+
+## Python notes
+
+Support for Python 3.9 was added (ARROW-10224), and support for Python 3.5
+was removed (ARROW-5679).
+
+Support for building manylinux1 packages has been removed (ARROW-11212).
+PyArrow continues to be available as manylinux2010 and manylinux2014 wheels.
+
+The minimal required version for NumPy is now 1.16.6. Note that when upgrading
+NumPy to 1.20, you also need to upgrade pyarrow to 3.0.0 to ensure compatibility,
+as this pyarrow release fixed a compatibility issue with NumPy 1.20 (ARROW-10833).
+
+Compute functions are now automatically exported from C++ to the `pyarrow.compute`
+module, and they have docstrings matching their C++ definition.
+
+An `iter_batches()` method is now available for reading a Parquet file iteratively
+(ARROW-7800).
+
+Alternate memory pools (such as mimalloc, jemalloc or the C malloc-based memory
+pool) are now available from Python (ARROW-11049).
+
+Fixed a potential deadlock when importing pandas from several threads (ARROW-10519).
+
+See the C++ notes above for additional details.
+
+## R notes
+
+This release contains new features for the Flight RPC wrapper, better
+support for saving R metadata (including `sf` spatial data) to Feather and
+Parquet files, several significant improvements to speed and memory management,
+and many other enhancements.
+
+For more on what’s in the 3.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+In Ruby binding, 256-bit decimal support and `Arrow::FixedBinaryArrayBuilder` are added likewise C GLib below.
+
+### C GLib
+
+In the version 3.0.0 of C GLib consists of many new features.
+
+A chunked array, a record batch, and a table support `sort_indices` function as well as an array.
+These functions including array's support to specify sorting option.
+`garrow_array_sort_to_indices` has been renamed to `garrow_array_sort_indices` and the previous name has been deprecated.
+
+`GArrowField` supports functions to handle metadata.
+`GArrowSchema` supports `garrow_schema_has_metadata()` function.
+
+`GArrowArrayBuilder` supports to add single null, multiple nulls, single empty value, and multiple empty values.
+`GArrowFixedSizedBinaryArrayBuilder` is newly supported.
+
+256-bit decimal and extension types are newly supported.
+Filesystem module supports Mock, HDFS, S3 file systems.
+Dataset module supports CSV, IPC, and Parquet file formats.
+
+## Rust notes
+
+
+[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%203.0.0
+[2]: {{ site.baseurl }}/release/3.0.0.html#contributors
+[3]: {{ site.baseurl }}/release/3.0.0.html
+[4]: {{ site.baseurl }}/docs/r/news/

Review comment:
       ```suggestion
   [4]: {{ site.baseurl }}/docs/r/news/
   [5]: https://issues.apache.org/jira/browse/ARROW-11329?jql=project%20%3D%20ARROW%20AND%20status%20not%20in%20%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20and%20fixVersion%20%3D%203.0.0%20AND%20text%20~%20rust%20ORDER%20BY%20created%20DESC
   [6]: https://docs.google.com/document/d/1qspsOM_dknOxJKdGvKbC1aoVoO0M3i6x1CIo58mmN2Y/edit#heading=h.kstb571j5g5j
   
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] andygrove commented on a change in pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#discussion_r569540898



##########
File path: _posts/2021-01-25-3.0.0-release.md
##########
@@ -0,0 +1,237 @@
+---
+layout: post
+title: "Apache Arrow 3.0.0 Release"
+date: "2021-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 3.0.0 release. This covers
+over 3 months of development work and includes [**666 resolved issues**][1]
+from [**106 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Columnar Format Notes
+
+The Decimal256 data type, which was already supported by the Arrow columnar
+format specification, is now implemented in C++ and Java (ARROW-9747).
+
+## Arrow Flight RPC notes
+
+Authentication in C++/Java/Python has been overhauled, allowing more flexible authentication methods and use of standard headers.
+Support for cookies has also been added.
+The C++/Java implementations are now more permissive when parsing messages in order to interoperate better with other Flight implementations.
+
+A basic Flight implementation for C#/.NET has been added.
+See the [implementation status matrix](https://arrow.apache.org/docs/status.html#flight-rpc) for details.
+
+## C++ notes
+
+The default memory pool can now be changed at runtime using the environment
+variable `ARROW_DEFAULT_MEMORY_POOL` (ARROW-11009).  The environment variable
+is inspected at process startup.  This is useful when trying to diagnose memory
+consumption issues with Arrow.
+
+STL-like iterators are now provided over concrete arrays. Those are useful for
+non-performance critical tasks, for example testing (ARROW-10776).
+
+It is now possible to concatenate dictionary arrays with unequal dictionaries.
+The dictionaries are unified when concatenating, for supported data types
+(ARROW-5336).
+
+Threads in a thread pool are now spawned lazily as needed for enqueued
+tasks, up to the configured capacity. They used to be spawned upfront on
+creation of the thread pool (ARROW-10038).
+
+### Compute layer
+
+Comprehensive documentation for compute functions is now available:
+https://arrow.apache.org/docs/cpp/compute.html
+
+Compute functions for string processing have been added for:
+
+* splitting on whitespace (ASCII and Unicode flavors) and splitting on a
+  pattern (ARROW-9991);
+* trimming characters (ARROW-9128).
+
+Behavior of the `index_in` and `is_in` compute functions with nulls has been
+changed for consistency (ARROW-10663).
+
+Multiple-column sort kernels are now available for tables and record batches
+(ARROW-8199, ARROW-10796, ARROW-10790).
+
+Performance of table filtering has been vastly improved (ARROW-10569).
+
+Scalar arguments are now accepted for more compute functions.
+
+Compute functions `quantile` (ARROW-10831) and `is_nan` (ARROW-11043) have been
+added for numeric data.
+
+Aggregation functions `any` (ARROW-1846) and `all` (ARROW-10301) have been
+added for boolean data.
+
+### Dataset
+
+The `Expression` hierarchy has simplified to a wrapper around literals, field references,
+or calls to named functions. This enables usage of any compute function while filtering
+with no boilerplate.
+
+Parquet statistics are lazily parsed in `ParquetDatasetFactory` and
+`ParquetFileFragment` for shorter construction time.
+
+### CSV
+
+Conversion of string columns is now faster thanks to faster UTF-8 validation
+of small strings (ARROW-10313).
+
+Conversion of floating-point columns is now faster thanks to optimized
+string-to-double conversion routines (ARROW-10328).
+
+Parsing of ISO8601 timestamps is now more liberal: trailing zeros can
+be omitted in the fractional part (ARROW-10337).
+
+Fixed a bug where null detection could give the wrong results on some platforms
+(ARROW-11067).
+
+Added type inference for Date32 columns for values in the form `YYYY-MM-DD`
+(ARROW-11247).
+
+### Feather
+
+Fixed reading of compressed Feather files written with Arrow 0.17 (ARROW-11163).
+
+### Filesystem layer
+
+S3 recursive tree walks now benefit from a parallel implementation, where reads
+of multiple child directories are now issued concurrently (ARROW-10788).
+
+Improved empty directory detection to be mindful of differences between Amazon
+and Minio S3 implementations (ARROW-10942).
+
+### Flight RPC
+
+IPv6 host addresses are now supported (ARROW-10475).
+
+### IPC
+
+It is now possible to emit dictionary deltas where possible using the IPC
+stream writer. This is governed by a new variable in the `IpcWriteOptions` class
+(ARROW-6883).
+
+It is now possible to read wider tables, which used to fail due to reaching a
+limit during Flatbuffers verification (ARROW-10056).
+
+### Parquet
+
+Fixed reading of LZ4-compressed Parquet columns emitted by the Java Parquet
+implementation (ARROW-11301).
+
+Fixed a bug where writing multiple batches of nullable nested strings to Parquet
+would not write any data in batches after the first one (ARROW-10493)
+
+The Decimal256 data type can be read from and written to Parquet (ARROW-10607).
+
+LargeString and LargeBinary data can now be written to Parquet (ARROW-10426).
+
+## C# notes
+
+The .NET package added initial support for Arrow Flight clients and servers. Support is enabled through two new NuGet packages [Apache.Arrow.Flight](https://www.nuget.org/packages/Apache.Arrow.Flight/) (client) and [Apache.Arrow.Flight.AspNetCore](https://www.nuget.org/packages/Apache.Arrow.Flight.AspNetCore/) (server).
+
+Also fixed an issue where ArrowStreamWriter wasn't writing schema metadata to Arrow streams.
+
+## Julia notes
+
+This is the first release to officially include
+[an implementation](https://github.com/apache/arrow/tree/master/julia/Arrow)
+for the Julia language. The pure Julia implementation includes support
+for [wide coverage of the format specification](https://arrow.apache.org/docs/status.html).
+Additional details can be found in the
+[julialang.org blog post](https://julialang.org/blog/2021/01/arrow/).
+
+## Python notes
+
+Support for Python 3.9 was added (ARROW-10224), and support for Python 3.5
+was removed (ARROW-5679).
+
+Support for building manylinux1 packages has been removed (ARROW-11212).
+PyArrow continues to be available as manylinux2010 and manylinux2014 wheels.
+
+The minimal required version for NumPy is now 1.16.6. Note that when upgrading
+NumPy to 1.20, you also need to upgrade pyarrow to 3.0.0 to ensure compatibility,
+as this pyarrow release fixed a compatibility issue with NumPy 1.20 (ARROW-10833).
+
+Compute functions are now automatically exported from C++ to the `pyarrow.compute`
+module, and they have docstrings matching their C++ definition.
+
+An `iter_batches()` method is now available for reading a Parquet file iteratively
+(ARROW-7800).
+
+Alternate memory pools (such as mimalloc, jemalloc or the C malloc-based memory
+pool) are now available from Python (ARROW-11049).
+
+Fixed a potential deadlock when importing pandas from several threads (ARROW-10519).
+
+See the C++ notes above for additional details.
+
+## R notes
+
+This release contains new features for the Flight RPC wrapper, better
+support for saving R metadata (including `sf` spatial data) to Feather and
+Parquet files, several significant improvements to speed and memory management,
+and many other enhancements.
+
+For more on what’s in the 3.0.0 R package, see the [R changelog][4].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+In Ruby binding, 256-bit decimal support and `Arrow::FixedBinaryArrayBuilder` are added likewise C GLib below.
+
+### C GLib
+
+In the version 3.0.0 of C GLib consists of many new features.
+
+A chunked array, a record batch, and a table support `sort_indices` function as well as an array.
+These functions including array's support to specify sorting option.
+`garrow_array_sort_to_indices` has been renamed to `garrow_array_sort_indices` and the previous name has been deprecated.
+
+`GArrowField` supports functions to handle metadata.
+`GArrowSchema` supports `garrow_schema_has_metadata()` function.
+
+`GArrowArrayBuilder` supports to add single null, multiple nulls, single empty value, and multiple empty values.
+`GArrowFixedSizedBinaryArrayBuilder` is newly supported.
+
+256-bit decimal and extension types are newly supported.
+Filesystem module supports Mock, HDFS, S3 file systems.
+Dataset module supports CSV, IPC, and Parquet file formats.
+
+## Rust notes
+

Review comment:
       ```suggestion
   ### Core Arrow Crate
   
   The development of the arrow crate was focused on four main aspects:
   
   - Make the crate usable in stable Rust
   - Bug fixing and removal of `unsafe` code
   - Extend functionality to keep up with the specification
   - Increase performance of existing kernels
   
   #### Stable Rust
   
   Possibly the biggest news for this release is that all project crates, including arrow, parquet, and datafusion now
   build with stable Rust by default. Nightly / unstable Rust is still required when enabling the SIMD feature.
   
   #### Parquet Arrow writer
   
   The Parquet Writer for Arrow arrays is now available, allowing the Rust programs to easily read and write Parquet
   files and making it easier to integrate with the overall Arrow ecosystem. The reader and writer include both basic
   and nested type support (List, Dictionary, Struct)
   
   #### First Class Arrow Flight IPC Support
   
   This release the Arrow Flight IPC implementation in Rust became fully-featured enough to participate in the regular
   cross-language integration tests, thus ensuring Rust applications written using Arrow can interoperate with the rest
   of the ecosystem
   
   #### Performance
   
   There have been numerous performance improvements in this release across the board. This includes both kernel
   operations, such as `take`, `filter`, and `cast`, as well as more fundamental parts such as bitwise comparison
   and reading and writing to CSV.
   
   #### Increased Data Type Support
   
   New DataTypes:
   - Decimal data type for fixed-width decimal values
   
   Improved operation support for nested structures Dictionary, and Lists (filter, take, etc)
   
   #### Other improvements:
   
   - Added support for Date and time on FFI
   - Added support for Binary type on FFI
   - Added support for i64 sized arrays to “take” kernel
   - Support for the i128 Decimal Type
   - Added support to cast string to date
   - Added API to create arrays out of existing arrays (e.g. for join, merge-sort, concatenate)
   - The simd feature is now also available on aarch64
   
   #### API Changes
   
   - BooleanArray is no longer a PrimitiveArray
   - ArrowNativeType no longer includes bool since arrows boolean type is represented using bitpacking
   - Several Buffer methods are now infallible instead of returning a Result
   - DataType::List now contains a Field to track metadata about the contained elements
   - PrimitiveArray::raw_values, values_slice and values methods got replaced by a values method returning a slice
   - Buffer::data and raw_data were renamed to as_slice and as_ptr
   - MutableBuffer::data_mut and freeze were renamed to as_slice_mut and into to be more consistent with the stdlib naming conventions
   - The generic type parameter for BufferBuilder was changed from ArrowPrimitiveType to ArrowNativeType
   
   ### DataFusion
   
   #### SQL
   
   In this release, we clarified that DataFusion will standardize on the PostgreSQL SQL dialect.
   
   New SQL support:
   - JOIN, LEFT JOIN, RIGHT JOIN
   - COUNT DISTINCT
   - CASE WHEN
   - USING
   - BETWEEN
   - IS IN
   - Nested SELECT statements
   - Nested expressions in aggregations
   - LOWER(), UPPER(), TRIM()
   - NULLIF()
   - SHA224(), SHA256(), SHA384(), SHA512()
   - DATE_TRUNC()
   
   #### Performance
   
   There have been numerous performance improvements in this release:
   - Optimizations for JOINs such as using vectorized hashing.
   - We started with adding statistics and cost-based optimizations. We choose the smaller side of a join as the build
     side if possible.
   - Improved parallelism when reading partitioned Parquet data sources
   - Concurrent writes of CSV and Parquet partitions to file
   
   ### Parquet Crate
   
   The Parquet has the following improvements:
   
   - Nested reading
   - Support to write booleans
   - Add support to write temporal types
   
   ### Roadmap for 4.0.0
   
   We have also started building up a shared community roadmap for 4.0: [Apache Arrow: Crowd Sourced Rust Roadmap for
   Arrow 4.0, January 2021][6].
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] andygrove commented on pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#issuecomment-771919056


   @nealrichardson should I PR against this PR to add the Rust content?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] pitrou commented on pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#issuecomment-770972724


   I'd ping @bkietz for dataset changes and @jorisvandenbossche for the various fixes and improvements in Python-land.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] nealrichardson commented on pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#issuecomment-772162274


   @andygrove you can or you can just push to my branch


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] quinnj commented on a change in pull request #92: 3.0 release blog post

Posted by GitBox <gi...@apache.org>.
quinnj commented on a change in pull request #92:
URL: https://github.com/apache/arrow-site/pull/92#discussion_r566303578



##########
File path: _posts/2021-01-25-3.0.0-release.md
##########
@@ -0,0 +1,93 @@
+---
+layout: post
+title: "Apache Arrow 3.0.0 Release"
+date: "2021-01-25 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 3.0.0 release. This covers
+over 3 months of development work and includes [**666 resolved issues**][1]
+from [**106 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+
+## Columnar Format Notes
+
+ARROW-9747 - [C++][Java][Format] Support Decimal256 Type
+
+## Arrow Flight RPC notes
+
+## C++ notes
+
+
+Critical fixes in writing/reading of nested data with parquet ... (ARROW-10493)
+
+Several new compute kernels were added (any/all boolean reduction,
+``is_nan``, sorting by multiple columns).
+String kernels (trimming, splitting, filling nulls)
+
+Fixed reading of compressed Feather files written with pyarrow 0.17.
+
+
+ARROW-6883 - [C++] Support sending delta DictionaryBatch or replacement DictionaryBatch in IPC stream writer class
+
+## C# notes
+
+## Go notes
+
+## Java notes
+
+## JavaScript notes
+

Review comment:
       ```suggestion
   ## Julia notes
   
   This is the first release to officially include [an implementation](https://github.com/apache/arrow/tree/master/julia/Arrow) for the Julia language. The pure Julia implementation includes support for [wide coverage of the format specification](https://arrow.apache.org/docs/status.html). Additional details can be found in the [julialang.org blog post](https://julialang.org/blog/2021/01/arrow/).
   
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org