Posted to commits@arrow.apache.org by al...@apache.org on 2022/02/14 11:22:38 UTC

[arrow-site] branch master updated: ARROW-15675: [Website] Blog post for Rust arrow 9 release (in fork) (#192)

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/master by this push:
     new bc2bde1  ARROW-15675: [Website] Blog post for Rust arrow 9 release (in fork) (#192)
bc2bde1 is described below

commit bc2bde19e6dfa21fb66aacd19f2e3bcc55d14ea4
Author: Andrew Lamb <an...@nerdnetworks.org>
AuthorDate: Mon Feb 14 06:22:28 2022 -0500

    ARROW-15675: [Website] Blog post for Rust arrow 9 release (in fork) (#192)
    
    * Blog post for arrow 9 release
    
    * Add teaser for arrow 10.0.0
    
    * Update _posts/2022-02-04-rust-9.0.0.md
    
    Co-authored-by: Jonathan Keane <jk...@gmail.com>
    
    * Rename file to match date
    
    * Add final links, fix some typos
    
    Co-authored-by: Jonathan Keane <jk...@gmail.com>
---
 _posts/2022-02-13-rust-9.0.md | 140 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)

diff --git a/_posts/2022-02-13-rust-9.0.md b/_posts/2022-02-13-rust-9.0.md
new file mode 100644
index 0000000..447f2a6
--- /dev/null
+++ b/_posts/2022-02-13-rust-9.0.md
@@ -0,0 +1,140 @@
+---
+layout: post
+title: "February 2022 Rust Apache Arrow and Parquet Highlights"
+date: "2022-02-13 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Rust implementation of [Apache Arrow](https://arrow.apache.org/) has just released version `9.0.2`.
+
+While a major version of this magnitude may shock some in the Rust
+community to whom it implies a slow-moving, 20-year-old piece of
+software, nothing could be further from the truth!
+
+With regular and predictable bi-weekly releases, the library continues
+to evolve rapidly, and `9.0.2` is no exception. Some recent highlights:
+
+
+# `parquet`: async, performance, safety and nested types
+
+The [parquet `9.0.2`](https://crates.io/crates/parquet/9.0.2) release includes an [`async` reader](https://github.com/apache/arrow-rs/blob/9.0.2/parquet/src/arrow/async_reader.rs#L21-L75), a long-requested feature. Using the `async`
+reader it is now possible to read only the relevant parts of a parquet
+file from a networked source such as object storage. Previously the
+entire file had to be buffered locally. We are hoping to add an `async`
+writer in a future release and would love some
+[help](https://github.com/apache/arrow-rs/issues/1269).
+
+It is also significantly faster to read parquet data (up to
+[60x](https://github.com/apache/arrow-rs/pull/1180#issuecomment-1018518863)
+in some cases) than with previous versions of the `parquet`
+crate. Kudos to [tustvold](https://github.com/tustvold) and
+[yordan-pavlov](https://github.com/yordan-pavlov) for their
+contributions in these areas.
+
+With `8.0.0` and later, the code that reads and writes `RecordBatch`es
+to and from Parquet now supports all types, including deeply nested
+structs and lists. Thanks to [helgikrs](https://github.com/helgikrs) for
+cleaning up the last corner cases!
+
+Another notable recent addition to parquet is `UTF-8` validation of
+string data, which improves security against malicious inputs.
+
+Planned upcoming work includes [pushing more
+filtering](https://github.com/apache/arrow-rs/issues/1191) directly
+into the parquet scan as well as an `async` writer.
+
+
+# `arrow`: performance, dyn kernels, and DecimalArray
+
+The [compute](https://docs.rs/arrow/latest/arrow/compute/index.html)
+kernels have been improved significantly in [arrow `9.0.2`](https://crates.io/crates/arrow/9.0.2). Some [filter
+benchmarks](https://github.com/apache/arrow-rs/pull/1228#issue-1111889246)
+are twice as fast and the SIMD kernels are also [significantly
+faster](https://github.com/apache/arrow-rs/pull/1221). Many thanks to
+[tustvold](https://github.com/tustvold) and
+[jhorstmann](https://github.com/jhorstmann).
+[Additional substantial](https://github.com/apache/arrow-rs/pull/1248)
+improvements are likely to land in arrow `10.0.0`.
+
+We are working on a new set of "dynamic" `dyn_` kernels (for example,
+[`eq_dyn`](https://docs.rs/arrow/8.0.0/arrow/compute/kernels/comparison/fn.eq_dyn.html))
+that make it easier to invoke the heavily optimized kernels provided
+by the `arrow` crate. Work is underway to expand the breadth of types
+supported by these new kernels to make them even more useful. Thanks
+to [matthewmturner](https://github.com/matthewmturner) and
+[viirya](https://github.com/viirya) for their help in this
+effort.
+
+While `arrow` has had basic support for `DecimalArray` since version
+`3.0.0`, support for the `Decimal` type has been expanded in compute
+kernels such as `sort`, `take` and `filter` thanks to some great
+contributions from [liukun4515](https://github.com/liukun4515). There
+is [ongoing work](https://github.com/apache/arrow-rs/pull/1223) to
+improve the API ergonomics and performance of `DecimalArray` as well.
+
+# Security
+
+The `6.4.0` release resolved the last outstanding
+[RUSTSEC](https://rustsec.org/)
+[advisory](https://github.com/rustsec/advisory-db/pull/1131) on the
+arrow crate and the `8.0.0` release resolved the last outstanding
+known security issues. While these security issues were mostly limited
+to misuse of the low level "power user" APIs, which most users do not
+(and should not) use, it was good to tighten up that area.
+
+Now that `arrow-rs` is releasing major versions every other week, we
+are also able to update dependencies at the same pace, helping to
+ensure that security fixes upstream can flow more quickly to
+downstream projects.
+
+# Final shoutout
+It takes a community to build great software, and we would like to
+thank everyone who has contributed to the arrow-rs repository since
+the `7.0.0` release:
+
+```console
+git shortlog -sn 7.0.0..9.0.0
+    22  Raphael Taylor-Davies
+    18  Andrew Lamb
+     6  Helgi Kristvin Sigurbjarnarson
+     6  Remzi Yang
+     5  Jörn Horstmann
+     4  Liang-Chi Hsieh
+     3  Jiayu Liu
+     2  dependabot[bot]
+     2  Yijie Shen
+     1  Matthew Turner
+     1  Kun Liu
+     1  Yang
+     1  Edd Robinson
+     1  Patrick More
+```
+
+
+# How to Get Involved
+
+If you are interested in contributing to the Rust subproject in Apache Arrow, you can find a list of open issues
+suitable for beginners [here](https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
+and the full list [here](https://github.com/apache/arrow-rs/issues).
+
+Other ways to get involved include trying out Arrow on some of your data and filing bug reports, and helping to
+improve the documentation.