You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by am...@apache.org on 2022/02/07 15:15:09 UTC

[arrow-site] branch master updated: Version 7.0.0 release blog post (#178)

This is an automated email from the ASF dual-hosted git repository.

amolina pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/master by this push:
     new 8961dfb  Version 7.0.0 release blog post  (#178)
8961dfb is described below

commit 8961dfbcaea92cbbf95f30da23941c02ddf0bbed
Author: Alessandro Molina <am...@turbogears.org>
AuthorDate: Mon Feb 7 16:15:03 2022 +0100

    Version 7.0.0 release blog post  (#178)
    
    * v7.0.0 post skeleton
    
    * Add rust details
    
    * Import @bkmgit's suggestion for Ruby and C GLib
    
    Co-authored-by: Benson Muite <bk...@users.noreply.github.com>
    
    * Add C++ notes
    
    * Update Ruby part
    
    Co-authored-by: Kenta Murata <39...@users.noreply.github.com>
    
    * Update GLib part
    
    Co-authored-by: Kenta Murata <39...@users.noreply.github.com>
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    Co-authored-by: David Li <li...@gmail.com>
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    Co-authored-by: David Li <li...@gmail.com>
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    Co-authored-by: Ian Alexander Joiner <ia...@gmail.com>
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    Co-authored-by: Matt Topol <zo...@gmail.com>
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    Co-authored-by: Dominik Moritz <do...@gmail.com>
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    Co-authored-by: Ian Alexander Joiner <ia...@gmail.com>
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    * Update _posts/2022-01-19-7.0.0-release.md
    
    * Update 2022-01-19-7.0.0-release.md
    
    * Update 2022-01-19-7.0.0-release.md
    
    Co-authored-by: Andrew Lamb <an...@nerdnetworks.org>
    Co-authored-by: Kenta Murata <39...@users.noreply.github.com>
    Co-authored-by: Benson Muite <bk...@users.noreply.github.com>
    Co-authored-by: Antoine Pitrou <an...@python.org>
    Co-authored-by: Sutou Kouhei <ko...@cozmixng.org>
    Co-authored-by: David Li <li...@gmail.com>
    Co-authored-by: Ian Alexander Joiner <ia...@gmail.com>
    Co-authored-by: Matt Topol <zo...@gmail.com>
    Co-authored-by: Dominik Moritz <do...@gmail.com>
---
 _posts/2022-01-19-7.0.0-release.md | 273 +++++++++++++++++++++++++++++++++++++
 1 file changed, 273 insertions(+)

diff --git a/_posts/2022-01-19-7.0.0-release.md b/_posts/2022-01-19-7.0.0-release.md
new file mode 100644
index 0000000..9a28c4f
--- /dev/null
+++ b/_posts/2022-01-19-7.0.0-release.md
@@ -0,0 +1,273 @@
+---
+layout: post
+title: "Apache Arrow 7.0.0 Release"
+date: "2022-02-08 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 7.0.0 release. This covers
+over 3 months of development work and includes [**617 resolved issues**][1]
+from [**105 distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## Community
+
+Since the 6.0.1 release, Rémi Dattai and Alessandro Molina have been invited to be committers.
+Daniël Heres and Yibo Cai have joined the Project Management Committee (PMC).
+Thanks for your contributions and participation in the project!
+
+## Arrow Flight RPC notes
+
+The Flight specification has been clarified to note that schemas are expected to be IPC-encapsulated on the wire.
+
+Documentation has been generally improved; see the [Arrow Cookbook](https://arrow.apache.org/cookbook/) for recipes on how to use Flight in Python and R, and a new [example](https://github.com/apache/arrow/blob/master/cpp/examples/arrow/flight_grpc_example.cc) on how to use Flight and gRPC services on the same port.
+
+This release includes Arrow Flight SQL, a protocol for using Arrow Flight to execute queries against and fetch metadata from SQL databases. Support is included for C++ and Java (but *not* languages that bind to C++, like Python or R). A more detailed blog post is forthcoming. Note that development is ongoing and the specification is currently experimental.
+
+## C++ notes
+
+A set of CMake presets has been added to ease building Arrow in a number
+of cases (ARROW-14678, ARROW-14714).
+
+The `arrow::BitUtil` namespace has been renamed to `arrow::bit_util`
+(ARROW-13494).
+
+Concatenation of union arrays is now supported (ARROW-4975).
+
+`StructType` gained three convenience methods to add, change and remove
+a given field (ARROW-11424).
+
+The `Datum` kind `COLLECTION` has been removed as it was entirely unused
+in the codebase (ARROW-13598).
+
+### Compute Layer
+
+A number of compute functions have been added:
+
+- functions operating on strings: "binary_reverse" (ARROW-14306),
+  "string_repeat" (ARROW-12712), "utf8_normalize" (ARROW-14205);
+- "fill_null_forward", "fill_null_backward" (ARROW-1699);
+- "ceil_temporal", "floor_temporal", "round_temporal" to adjust temporal input
+  to an integral multiple of a given unit (ARROW-14822);
+- "year_month_day" to extract the calendar components of the input (ARROW-15032);
+- "random" to general random floating-point values between 0 and 1 (ARROW-12404);
+- "indices_nonzero" to return the indices in the input where there are
+  non-zero, non-null values (ARROW-13035).
+
+Decimal data is now supported as input of the arithmetic kernels
+(ARROW-13130).
+
+Dictionary data is now supported as input of the hash join execution node
+(ARROW-14181).
+
+Residual predicates have been implemented in the hash join node
+(ARROW-13643).
+
+The "list_parent_indices" function now always returns int64 data
+regardless of the input type (ARROW-14592).
+
+Month-day-nano interval data is now supported as input of the same functions
+as other interval types (ARROW-13989).
+
+### CSV
+
+The CSV writer got additional configuration options:
+- the string representation of null values (ARROW-14905);
+- the quoting strategy: always / never / as needed (ARROW-14905);
+- the end of line character(s) (ARROW-14907)
+
+### Dataset Layer
+
+[Skyhook]({% link _posts/2022-01-31-skyhook-bringing-computation-to-storage-with-apache-arrow.md %}),
+a dataset addition that offloads fragment scan operations to a
+Ceph distributed storage cluster, was contributed (ARROW-13607).
+
+The dataset writer now exposes options `min_rows_per_group` and
+`max_rows_per_group` to control the size of row groups created (ARROW-14426).
+
+### IO and Filesystem Layer
+
+A critical bug in the AWS SDK for C++ that risks losing data in S3 multipart
+uploads has been circumvented (ARROW-14523).
+
+The Google Cloud Storage filesystem is now featureful enough to pass all
+generic filesystem tests (ARROW-14924).
+
+The OpenAppendStream method of filesystems has been un-deprecated; however,
+it still cannot be implemented for all filesystem backends (ARROW-14969).
+
+A new function `arrow::fs::ResolveS3BucketRegion` allows resolving the
+region where a particular S3 bucket resides (ARROW-15165).
+
+The S3 filesystem now sets the Content-Type of output files to
+"application/octet-stream" (instead of "application/xml" previously)
+if not explicitly specified by the caller (ARROW-15306).
+
+### IPC
+
+Fine-grained I/O (coalescing) is now enabled in the synchronous (ARROW-12683)
+and asynchronous (ARROW-14577) IPC reader.
+
+It is now possible to set the compression level when using LZ4 compression
+(ARROW-9648).
+
+### ORC
+
+The ORC adapters have been significantly improved. A lot more properties of the ORC reader as well as ORC writer options are now available. Moreover API docs for both the ORC reader and the ORC writer have been generated.  (ARROW-11297)
+### Parquet
+
+DELTA_BYTE_ARRAY-encoded data can now be read from (but not written to)
+bytearray columns in Parquet files (PARQUET-492).
+
+## Go notes
+
+### Arrow
+
+#### Bug Fixes
+
+* License lifted up a level so that it is properly detected for the github.com/apache/arrow/go/v7 module for pkg.go.dev [ARROW-14728](https://github.com/apache/arrow/pull/11715). Documentation on pkg.go.dev will look correct with complete major version handling as of the v7.0.0 release.
+* Errors from `MessageReader.Message` get properly surfaced by `Reader.Read` [ARROW-14769](https://github.com/apache/arrow/pull/11739)
+* `ipc.Reader` properly uses the allocator it is initialized with instead of making native byte slices [ARROW-14717](https://github.com/apache/arrow/pull/11712)
+* Fixed a CI issue where the CGO tests were crashing on windows [ARROW-14589](https://github.com/apache/arrow/pull/11611)
+* Various fixes for internal usages of `Release` and `Retain` to maintain proper management of reference counting.
+
+#### Enhancements
+
+* Continuous Integration for Go library now uses Go1.16 as the version being tested [ARROW-14985](https://github.com/apache/arrow/pull/11860)
+* `ValueOffsets` function added to `array.String` to return the entire slice of offsets [ARROW-14645](https://github.com/apache/arrow/pull/11653)
+* `array.Interface` has been lifted to `arrow.Array`, `array.{Record,Column,Chunked,Table}` have been lifted to `arrow.{Record,Column,Chunked,Table}`. Interface `arrow.ArrayData` has been created to be used instead of `array.Data`. Aliases have been provided for the `array` package so existing code that doesn't directly use `array.Data` shouldn't be affected. The aliases will be removed in v8. [ARROW-5599](https://github.com/apache/arrow/pull/11832). The `Chunked.NewSlice` method has bee [...]
+* Arrays and Records now support marshalling to JSON via the `json.Marshaller` interface. Builders support adding values to them by unmarshalling from JSON via the `json.Unmarshaller` interface. `array.FromJSON` function added to create Arrays from JSON directly. [ARROW-9630](https://github.com/apache/arrow/pull/11359)
+* Basic handling of field referencing and expression building similar to the C++ Compute APIs added through the new `compute` package in preparation for adding compute interfaces. Does not yet allow *executing* expressions. [ARROW-14430](https://github.com/apache/arrow/pull/11514)
+
+### Parquet
+
+#### Enhancements
+
+* Updated dependency versions [ARROW-14462](https://github.com/apache/arrow/pull/11537)
+* `file` module added, Go Parquet library now supports full file reading and writing. [ARROW-13984](https://github.com/apache/arrow/pull/11146) [ARROW-13986](https://github.com/apache/arrow/pull/11538). Does not yet provide direct Parquet <--> Arrow conversions.
+* Internal min_max utility functions given Arm64 NEON SIMD optimized assembly, gaining a 4x - 6x performance improvement. [ARROW-15536](https://github.com/apache/arrow/pull/12163)
+
+## Java notes
+
+* Flight SQL support is now available in the Java library, with integration tests to verify it against the C++ reference implementation.
+* `GeneralOutOfPlaceVectorSorter` is now available for sorting any kind of vector. In general if dedicated sorters can be used (like `FixedWidthInPlaceVectorSorter`) they should preferred as they will generally perform better.
+* `log4j2` dependency was removed as it was unused and a possible vector for attacks 
+* `VectorSchemaRootAppender` now works with `BitVector`
+
+## JavaScript notes
+
+* Major simplifications to the API. There is only a single Vector class now. See the (also much improved) docs for details.
+* Dictionary vectors created with `vectorFromArray` are automatically cached for better performance.
+* Better tree shaking support. Some bundles can now be only a few kb.
+
+## Python notes
+
+* Official support for Python 3.6 has been dropped.
+* `random` and `indices_nonzero` compute functions are now supported in Python
+* `pyarrow.orc.read_table` is now provided to easily read the content of ORC files to a Table.
+* `pyarrow.orc.ORCFile` now has a lot more properties exposed.
+* `pyarrow.orc.ORCWriter` and `pyarrow.orc.write_table` now have the writer options available.
+* `pyarrow.orc` now has much better API documentation.
+* Support for compute functions arguments and options has been improved in general,  arguments are not position only, while options can be provided as keyword args or not, and error reporting for wrong arguments has been improved.
+* `Table` now has a `group_by` method that allows to perform aggregations on table data. The compute functions documentation has also been improved to better distinguish between standard compute functions and `HASH_AGGREGATE` compute functions that can only be using for aggregations.
+* Python documentation now provides interlinking for references to parameter types and return values, thus making far easier to navigate the documentation.
+
+## R notes
+
+This release adds additional improvements to the dyplr interface, to CSV support and to the C-Data interface to exchange data with other languages. For more details, see the [complete R changelog][4].
+
+## Ruby and C GLib notes
+
+### Ruby
+
+There are two new contributors @okadakk and @simpl1g .
+
+The updates of Red Arrow consists of the following improvements:
+
+- `Arrow::Function#execute` now accepts an instance of an `Arrow::Column` as its argument [(ARROW-14551)](https://issues.apache.org/jira/browse/ARROW-14551)
+- `Arrow::Table.load` now supports `.arrows` files to load [(ARROW-15356)](https://issues.apache.org/jira/browse/ARROW-15356)
+- Add support loading `Arrow::Table` by a `URI` in `Arrow::Table.load` [(ARROW-14562)](https://issues.apache.org/jira/browse/ARROW-14562)
+- `Arrow::Table` now supports to join two tables [(ARROW-14531)](https://issues.apache.org/jira/browse/ARROW-14531)
+- `Arrow::Function#execute` gets more easier to use than before [(ARROW-15274)](https://issues.apache.org/jira/browse/ARROW-15274)
+- `Arrow::SortKey#name` has been renamed to `Arrow::SortKey#target` [(ARROW-14784)](https://issues.apache.org/jira/browse/ARROW-14784)
+- Add Cookbook section to documentation [(ARROW-14636)](https://issues.apache.org/jira/browse/ARROW-14636)
+- Support the explicit initialization of S3 API by the `Arrow.s3_initialize` method [(ARROW-14637)](https://issues.apache.org/jira/browse/ARROW-14637)
+- On macOS, stop specifying the version of openssl package explicitly when building the extension library [(ARROW-14619)](https://issues.apache.org/jira/browse/ARROW-14619)
+
+### C GLib
+
+The updates of Arrow GLib etc. consists of the following improvements:
+
+- Add `garrow_execute_plan_build_hash_join_node` function, `GArrowHashJoinNodeOption`, and `GArrowJoinType` [(ARROW-15288)](https://issues.apache.org/jira/browse/ARROW-15288)
+- Add `garrow_function_get_options_type` function [(ARROW-15273)](https://issues.apache.org/jira/browse/ARROW-15273)
+- Add `garrow_function_get_default_options` function [(ARROW-15267)](https://issues.apache.org/jira/browse/ARROW-15267)
+- Add `GArrowRoundToMultipleOptions` to customize the `round_to_multiple` function [(ARROW-15216)](https://issues.apache.org/jira/browse/ARROW-15216)
+- Add `garrow_function_all` function to list up all the functions [(ARROW-15205)](https://issues.apache.org/jira/browse/ARROW-15205)
+    - In addition, add `garrow_function_get_name`, `garrow_function_equal`, and `garrow_function_to_string` functions for convenience
+- Add `GArrowRoundOptions` [(ARROW-15204)](https://issues.apache.org/jira/browse/ARROW-15204)
+- Add `garrow_struct_scalar_get_value` function for converting a C++ scalar value to a GLib value [(ARROW-15203)](https://issues.apache.org/jira/browse/ARROW-15203)
+- Add the following three interval data types [(ARROW-15134)](https://issues.apache.org/jira/browse/ARROW-15134)
+    - `GArrowMonthIntervalDataType` for the interval with the month component
+    - `GArrowDayTimeIntervalDataType` for the interval with the days and the milliseconds components
+    - `GArrowMonthDayNanoIntervalDataType` for the interval with the months, the days, and the nanoseconds components
+- Rename `GArrowSortKey::name` to `::target` [(ARROW-14784)](https://issues.apache.org/jira/browse/ARROW-14784)
+- Support the explicit initialization of S3 API by the `garrow_s3_initialize` function [(ARROW-14637)](https://issues.apache.org/jira/browse/ARROW-14637)
+- `garrow_decimal128_new_string` and `garrow_decimal256_new_string` now returns errors when they gets a invalid decimal string [(ARROW-14530)](https://issues.apache.org/jira/browse/ARROW-14530)
+- `garrow_decimal128_data_type_new` and `garrow_decimal256_data_type_new` functions now validates the given precision [(ARROW-14529)](https://issues.apache.org/jira/browse/ARROW-14529)
+
+## Rust notes
+
+Rust releases minor versions every 2 weeks in addition to a major
+version with the rest of the Arrow language implementations. Thus most
+enhancements have been incrementally released over the last 3 months
+as part of the 6.x.
+
+Going forward, the Rust implementation version will start deviating
+from the rest of the Arrow implementations, incrementing a major
+version if the changes to the crate require it. We still plan a
+release every other week. Please see issue [#1120](https://github.com/apache/arrow-rs/issues/1120)
+for more details
+
+Major changes in the 7.0.0 release include:
+1. Additional support for `Decimal`
+2. More ergonomic compute kernels that take `dyn Array`
+3. `Union` type now follows the latest Arrow standard
+4. Support for custom datetime format for inference and parsing CSV files
+
+Another highlight is that the community continues to improve the
+safety of the arrow crate. The 6.4.0 release included complete data
+validation and has resolved all outstanding RUSTSEC issues against the
+crate.
+
+For additional details on the 7.0.0
+Rust implementation, please see the [Arrow Rust CHANGELOG][5]
+
+[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%207.0.0
+[2]: {{ site.baseurl }}/release/7.0.0.html#contributors
+[3]: {{ site.baseurl }}/release/7.0.0.html#changelog
+[4]: {{ site.baseurl }}/docs/r/news/
+[5]: https://github.com/apache/arrow-rs/blob/7.0.0/CHANGELOG.md