Posted to commits@arrow.apache.org by al...@apache.org on 2022/02/04 14:15:07 UTC

[arrow-site] 01/01: Blog post for arrow 9 release

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch alamb/arrow-9-blog
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

commit 6249e8d840f37ee594826ba5944727f03878270b
Author: Andrew Lamb <an...@nerdnetworks.org>
AuthorDate: Fri Feb 4 09:03:55 2022 -0500

    Blog post for arrow 9 release
---
 _posts/2022-02-04-rust-9.0.0.md | 139 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)

diff --git a/_posts/2022-02-04-rust-9.0.0.md b/_posts/2022-02-04-rust-9.0.0.md
new file mode 100644
index 0000000..12c8c54
--- /dev/null
+++ b/_posts/2022-02-04-rust-9.0.0.md
@@ -0,0 +1,139 @@
+---
+layout: post
+title: "Recent Rust Apache Arrow and Parquet Highlights"
+date: "2022-02-04 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Rust implementation of [Apache Arrow] has just released version `9.0.0`.
+
+While a major version of this magnitude may shock some in the Rust
+community to whom it implies a slow-moving, 20-year-old piece of
+software, nothing could be further from the truth!
+
+With regular and predictable bi-weekly releases, the library continues
+to evolve, and `9.0.0` is no exception. Some recent highlights:
+
+
+# `parquet`: async, performance, safety and nested types
+
+The parquet `9.0.0` release includes an `async` reader (TODO link to rustdoc
+when published), a long-requested feature. Using the `async`
+reader it is now possible to read only the relevant parts of a parquet
+file from a networked source such as object storage. Previously the
+entire file had to be buffered locally. We are hoping to add an `async`
+writer in a future release and would love some
+[help](https://github.com/apache/arrow-rs/issues/1269).
+
+It is also significantly faster to read parquet data (up to
+[60x](https://github.com/apache/arrow-rs/pull/1180#issuecomment-1018518863)
+in some cases) than with previous versions of the `parquet`
+crate. Kudos to [tustvold](https://github.com/tustvold) and
+[yordan-pavlov](https://github.com/yordan-pavlov) for their
+contributions in these areas.
+
+With `8.0.0` and later, the code that reads and writes `RecordBatch`es
+to and from Parquet now supports all types, including deeply nested
+structs and lists. Thanks to [helgikrs](https://github.com/helgikrs) for
+cleaning up the last corner cases!
+
+Another notable recent addition to parquet is `UTF-8` validation on
+string data for improved security against malicious inputs.
+
+Planned upcoming work includes [pushing more
+filtering](https://github.com/apache/arrow-rs/issues/1191) directly
+into the parquet scan as well as an `async` writer.
+
+
+# `arrow`: performance, dyn kernels, and DecimalArray
+
+The [compute](https://docs.rs/arrow/latest/arrow/compute/index.html)
+kernels have been improved significantly in arrow `9.0.0`. Some [filter
+benchmarks](https://github.com/apache/arrow-rs/pull/1228#issue-1111889246)
+are twice as fast and the SIMD kernels are also [significantly
+faster](https://github.com/apache/arrow-rs/pull/1221). Many thanks to
+[tustvold](https://github.com/tustvold) and
+[jhorstmann](https://github.com/jhorstmann).
+
+We are working on a new set of "dynamic" `_dyn` kernels (for example,
+[`eq_dyn`](https://docs.rs/arrow/8.0.0/arrow/compute/kernels/comparison/fn.eq_dyn.html))
+that make it easier to invoke the heavily optimized kernels provided
+by the `arrow` crate. Work is underway to expand the breadth of types
+supported by these new kernels to make them even more useful. Thanks
+to [matthewmturner](https://github.com/matthewmturner) and
+[viirya](https://github.com/viirya) for their help in this
+effort.
+
+While `arrow` has had basic support for `DecimalArray` since version
+`3.0.0`, support has been expanded for the `Decimal` type in calculation
+kernels such as `sort`, `take` and `filter` thanks to some great
+contributions from [liukun4515](https://github.com/liukun4515). There
+is [ongoing work](https://github.com/apache/arrow-rs/pull/1223) to
+improve the API ergonomics and performance of `DecimalArray` as well.
+
+# Security
+
+The `6.4.0` release resolved the last outstanding
+[RUSTSEC](https://rustsec.org/)
+[advisory](https://github.com/rustsec/advisory-db/pull/1131) on the
+arrow crate and the `8.0.0` release resolved the last outstanding
+known security issues. While these security issues were mostly limited
+to misuse of the low-level "power user" APIs, which most users do not (and
+should not) use, it is good to tighten up that area.
+
+Now that `arrow-rs` is releasing major versions every other week, we
+are also able to update dependencies at the same pace, helping to
+ensure that security fixes upstream can flow more quickly to
+downstream projects.
+
+# Final shoutout
+It takes a community to build great software, and we would like to
+thank everyone who has contributed to the arrow-rs repository since
+the `7.0.0` release:
+
+```console
+git shortlog -sn 7.0.0..9.0.0
+    22  Raphael Taylor-Davies
+    18  Andrew Lamb
+     6  Helgi Kristvin Sigurbjarnarson
+     6  Remzi Yang
+     5  Jörn Horstmann
+     4  Liang-Chi Hsieh
+     3  Jiayu Liu
+     2  dependabot[bot]
+     2  Yijie Shen
+     1  Matthew Turner
+     1  Kun Liu
+     1  Yang
+     1  Edd Robinson
+     1  Patrick More
+```
+
+
+# How to Get Involved
+
+If you are interested in contributing to the Rust subproject in Apache Arrow, you can find a list of open issues
+suitable for beginners [here](https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
+and the full list [here](https://github.com/apache/arrow-rs/issues).
+
+Other ways to get involved include trying out Arrow on some of your data, filing bug reports, and helping to
+improve the documentation.