Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/10 14:59:44 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1840: Update arrow module docs

tustvold commented on code in PR #1840:
URL: https://github.com/apache/arrow-rs/pull/1840#discussion_r894611994


##########
arrow/src/lib.rs:
##########
@@ -15,114 +15,215 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! A native Rust implementation of [Apache Arrow](https://arrow.apache.org), a cross-language
+//! A complete, safe, native Rust implementation of [Apache Arrow](https://arrow.apache.org), a cross-language

Review Comment:
   Nothing like a good bit of marketeering :laughing: 



##########
arrow/src/array/mod.rs:
##########
@@ -15,40 +15,66 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! The central type in Apache Arrow are arrays, represented
-//! by the [`Array` trait](crate::array::Array).
-//! An array represents a known-length sequence of values all
-//! having the same type.
+//! The central type in Apache Arrow are arrays, which are a known-length sequence of values
+//! all having the same type. This module provides concrete implementations of each type, as
+//! well as an [`Array`] trait that can be used for type-erasure.
 //!
-//! Internally, those values are represented by one or several
-//! [buffers](crate::buffer::Buffer), the number and meaning
-//! of which depend on the array’s data type, as documented in
-//! [the Arrow data layout specification](https://arrow.apache.org/docs/format/Columnar.html).
-//! For example, the type `Int16Array` represents an Apache
-//! Arrow array of 16-bit integers.
+//! # Downcasting an Array

Review Comment:
   I feel it is important to highlight this from the outset, as it can be unclear to a new user, given an `Array`, how to actually do something with it :laughing: 
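To make the downcasting pattern concrete without depending on the `arrow` crate, here is a minimal std-only sketch. `Array`, `Int32Array`, `StringArray`, `ArrayRef` and `try_sum_int32` below are hypothetical stand-ins that only mimic the shape of the API discussed in the diff (an `as_any` method returning `&dyn Any`). Using `downcast_ref(...)?` instead of `.unwrap()` lets a caller handle a column of the wrong type without panicking.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical stand-in for a type-erased array trait (not arrow's real `Array`).
trait Array {
    fn len(&self) -> usize;
    /// Exposes the concrete type so callers can downcast via `Any`.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical concrete array of optional 32-bit integers.
struct Int32Array {
    values: Vec<Option<i32>>,
}

impl Array for Int32Array {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical concrete array of optional strings.
struct StringArray {
    values: Vec<Option<String>>,
}

impl Array for StringArray {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical alias mirroring the idea of a shared, type-erased array.
type ArrayRef = Arc<dyn Array>;

/// Sums an erased array if it really is an `Int32Array`, treating nulls as 0.
/// Returns `None` instead of panicking when the downcast fails.
fn try_sum_int32(array: &dyn Array) -> Option<i32> {
    let ints = array.as_any().downcast_ref::<Int32Array>()?;
    Some(ints.values.iter().map(|v| v.unwrap_or_default()).sum())
}

fn main() {
    let columns: Vec<ArrayRef> = vec![
        Arc::new(Int32Array { values: vec![Some(1), None, Some(3)] }),
        Arc::new(StringArray { values: vec![Some("a".to_string())] }),
    ];
    for col in &columns {
        match try_sum_int32(col.as_ref()) {
            Some(total) => println!("int32 column of len {}: sum = {}", col.len(), total),
            None => println!("not an int32 column (len {})", col.len()),
        }
    }
}
```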



##########
arrow/src/array/mod.rs:
##########
@@ -15,40 +15,66 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! The central type in Apache Arrow are arrays, represented
-//! by the [`Array` trait](crate::array::Array).
-//! An array represents a known-length sequence of values all
-//! having the same type.
+//! The central type in Apache Arrow are arrays, which are a known-length sequence of values
+//! all having the same type. This module provides concrete implementations of each type, as
+//! well as an [`Array`] trait that can be used for type-erasure.
 //!
-//! Internally, those values are represented by one or several
-//! [buffers](crate::buffer::Buffer), the number and meaning
-//! of which depend on the array’s data type, as documented in
-//! [the Arrow data layout specification](https://arrow.apache.org/docs/format/Columnar.html).
-//! For example, the type `Int16Array` represents an Apache
-//! Arrow array of 16-bit integers.
+//! # Downcasting an Array
 //!
-//! Those buffers consist of the value data itself and an
-//! optional [bitmap buffer](crate::bitmap::Bitmap) that
-//! indicates which array entries are null values.
-//! The bitmap buffer can be entirely omitted if the array is
-//! known to have zero null values.
+//! Arrays are often passed around as a dynamically typed [`&dyn Array`] or [`ArrayRef`].
+//! For example, [`RecordBatch`](`crate::record_batch::RecordBatch`) stores columns as [`ArrayRef`].
 //!
-//! There are concrete implementations of this trait for each
-//! data type, that help you access individual values of the
-//! array.
+//! Whilst these arrays can be passed directly to the [`compute`](crate::compute),
+//! [`csv`](crate::csv), [`json`](crate::json), etc... APIs, it is often the case that you wish
+//! to interact with the data directly. This requires downcasting to the concrete type of the array:
+//!
+//! ```
+//! # use arrow::array::{Array, Float32Array, Int32Array};
+//! #
+//! fn sum_int32(array: &dyn Array) -> i32 {
+//!     let integers: &Int32Array = array.as_any().downcast_ref().unwrap();
+//!     integers.iter().map(|val| val.unwrap_or_default()).sum()
+//! }
+//!
+//! // Note: the values for positions corresponding to nulls will be arbitrary
+//! fn as_f32_slice(array: &dyn Array) -> &[f32] {
+//!     array.as_any().downcast_ref::<Float32Array>().unwrap().values()
+//! }
+//! ```
 //!
 //! # Building an Array
 //!
-//! Arrow's `Arrays` are immutable, but there is the trait
-//! [`ArrayBuilder`](crate::array::ArrayBuilder)
-//! that helps you with constructing new `Arrays`. As with the
-//! `Array` trait, there are builder implementations for all
-//! concrete array types.
+//! Most [`Array`] implementations can be constructed directly from iterators or [`Vec`]
 //!
-//! # Example
 //! ```
-//! use arrow::array::Int16Array;
+//! # use arrow::array::Int32Array;
+//! # use arrow::array::StringArray;
+//! # use arrow::array::ListArray;
+//! # use arrow::datatypes::Int32Type;
+//! #
+//! Int32Array::from(vec![1, 2]);

Review Comment:
   These new APIs are :ok_hand: 
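As a rough illustration of what "constructed directly from iterators or `Vec`" can look like underneath, the sketch below implements `FromIterator<Option<i32>>` and `From<Vec<i32>>` for a hypothetical, simplified `Int32Array` that keeps a dense values buffer plus a validity bitmap. It is not arrow's implementation. It only shows why null slots still occupy a placeholder value, which matches the diff's note that values at null positions are arbitrary.

```rust
/// Hypothetical, simplified primitive array: a dense values buffer plus a
/// validity bitmap (bit set = valid), loosely modelled on the Arrow layout.
#[derive(Debug)]
struct Int32Array {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

impl Int32Array {
    /// Reads bit `i` of the validity bitmap (least-significant bit first).
    fn is_valid(&self, i: usize) -> bool {
        (self.validity[i / 8] & (1u8 << (i % 8))) != 0
    }

    fn value(&self, i: usize) -> Option<i32> {
        if self.is_valid(i) {
            Some(self.values[i])
        } else {
            None
        }
    }
}

impl FromIterator<Option<i32>> for Int32Array {
    fn from_iter<I: IntoIterator<Item = Option<i32>>>(iter: I) -> Self {
        let mut values = Vec::new();
        let mut validity = Vec::new();
        for (i, item) in iter.into_iter().enumerate() {
            // Start a new bitmap byte every 8 slots.
            if i % 8 == 0 {
                validity.push(0u8);
            }
            match item {
                Some(v) => {
                    values.push(v);
                    *validity.last_mut().unwrap() |= 1u8 << (i % 8);
                }
                // Null slots still occupy space; the stored value is a placeholder.
                None => values.push(0),
            }
        }
        let len = values.len();
        Int32Array { values, validity, len }
    }
}

impl From<Vec<i32>> for Int32Array {
    /// A plain `Vec` has no nulls, so every slot is marked valid.
    fn from(v: Vec<i32>) -> Self {
        v.into_iter().map(Some).collect()
    }
}

fn main() {
    let a: Int32Array = vec![Some(1), None, Some(3)].into_iter().collect();
    let b = Int32Array::from(vec![1, 2]);
    println!("{:?}", (0..a.len).map(|i| a.value(i)).collect::<Vec<_>>());
    println!("{:?}", b.values);
}
```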



##########
arrow/src/array/mod.rs:
##########
@@ -78,6 +104,43 @@
 //!     "Get slice of len 2 starting at idx 3"
 //! )
 //! ```
+//!
+//! # Zero-Copy Slicing
+//!
+//! Given an [`Array`] of arbitrary length, it is possible to create an owned slice of this
+//! data. Internally this just increments some ref-counts, and so is incredibly cheap
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! # use arrow::array::{Array, Int32Array, ArrayRef};
+//! let array = Arc::new(Int32Array::from_iter([1, 2, 3])) as ArrayRef;
+//!
+//! // Slice with offset 1 and length 2
+//! let sliced = array.slice(1, 2);

Review Comment:
   It is perhaps a little unfortunate that this returns `ArrayRef` even when called on a concrete type, but then again I'm not sure of many use-cases for slicing concretely typed arrays :laughing: 
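Why slicing is cheap can be sketched with `std::sync::Arc` alone. The hypothetical `Int32Array` below (not arrow's type) shares one immutable buffer and records an offset and length, so `slice` only clones the `Arc`, which increments a ref-count, and adjusts the window. Its `slice` returns `Self`, i.e. the concrete type, which is the alternative to returning `ArrayRef` raised in the comment above.

```rust
use std::sync::Arc;

/// Hypothetical array that shares an immutable buffer and records a window into it.
#[derive(Debug, Clone)]
struct Int32Array {
    buffer: Arc<[i32]>,
    offset: usize,
    len: usize,
}

impl Int32Array {
    fn from_vec(v: Vec<i32>) -> Self {
        let len = v.len();
        Int32Array { buffer: Arc::from(v), offset: 0, len }
    }

    /// Zero-copy slice: clones the `Arc` (a ref-count increment, no data copy)
    /// and narrows the window. Returns the concrete type, not a trait object.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int32Array {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// Borrows the visible window of the shared buffer.
    fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}

fn main() {
    let array = Int32Array::from_vec(vec![1, 2, 3]);
    // Slice with offset 1 and length 2, as in the diff above.
    let sliced = array.slice(1, 2);
    println!("{:?}", sliced.values()); // [2, 3]
    println!("{}", Arc::strong_count(&array.buffer)); // 2
}
```

Returning `Self` keeps the concrete type without a downcast. A method on a trait object cannot hand back the erased concrete type by value, which is one plausible reason a type-erased API returns an `ArrayRef` instead.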


