Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/01 15:49:37 UTC

[GitHub] [arrow] alamb commented on a change in pull request #8310: ARROW-10148: [Rust] Improved rust/lib.rs that is shown in docs.rs

alamb commented on a change in pull request #8310:
URL: https://github.com/apache/arrow/pull/8310#discussion_r498345323



##########
File path: rust/arrow/src/lib.rs
##########
@@ -18,8 +18,109 @@
 //! A native Rust implementation of [Apache Arrow](https://arrow.apache.org), a cross-language
 //! development platform for in-memory data.
 //!
-//! Currently the project is developed and tested against nightly Rust. To learn more
-//! about the status of Arrow in Rust, see `README.md`.
+//! ### DataType
+//!
+//! Every [`Array`](array::Array) in this crate has an associated [`DataType`](datatypes::DataType),
+//! that specifies how its data is layed in memory and represented.
+//! Thus, a central enum of this crate is [`DataType`](datatypes::DataType), that contains the set of valid
+//! DataTypes in the specification. For example, [`DataType::Utf8`](datatypes::DataType::Utf8).
+//!
+//! ## Array
+//!
+//! The central trait of this package is the dynamically-typed [`Array`](array::Array) that
+//! represents a fixed-sized, immutable, Send + Sync Array of nullable elements. An example of such an array is [`UInt32Array`](array::UInt32Array).
+//! One way to think about an arrow [`Array`](array::Array) isa `Arc<[Option<T>; len]>` where T can be anything ranging from an integer to a string, or even

Review comment:
       ```suggestion
   //! One way to think about an arrow [`Array`](array::Array) is a `Arc<[Option<T>; len]>` where T can be anything ranging from an integer to a string, or even
   ```
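
A minimal sketch of the mental model in the suggested line, using only the standard library. The helper name `model_array` is hypothetical, and this is an analogy rather than arrow's actual layout: the Arrow format keeps values and validity in separate buffers instead of storing `Option<T>` slots.

```rust
use std::sync::Arc;

/// Hypothetical helper: builds a shared, immutable slice of nullable u32 values.
/// This is a std-only stand-in for the `Arc<[Option<T>; len]>` analogy,
/// not an arrow `Array`.
fn model_array() -> Arc<[Option<u32>]> {
    Arc::from(vec![Some(1), None, Some(3)])
}

fn main() {
    let array = model_array();
    // Cloning the Arc shares the same allocation; the elements are not copied.
    let shared = Arc::clone(&array);
    assert_eq!(shared.len(), 3);
    assert_eq!(shared[0], Some(1));
    assert!(shared[1].is_none());
}
```

The length is fixed once the slice is built, the contents cannot be mutated through the `Arc`, and the handle is `Send + Sync`, which mirrors the properties the doc comment lists.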

##########
File path: rust/arrow/src/ipc/gen/SparseTensor.rs
##########
@@ -389,7 +389,7 @@ impl<'a> SparseMatrixIndexCSX<'a> {
     }
     /// indptrBuffer stores the location and size of indptr array that
     /// represents the range of the rows.
-    /// The i-th row spans from indptr[i] to indptr[i+1] in the data.
+    /// The i-th row spans from `indptr[i]` to `indptr[i+1]` in the data.

Review comment:
       I was seeing `cargo doc` warnings about these lines just this morning. 👍 for fixing them.
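
A minimal sketch of why the backticks matter, assuming the warnings came from rustdoc's intra-doc link resolution: in a doc comment, a bare `[i]` can be read as a link to an item named `i`, which does not exist, while the same text inside a code span is left alone. The function `row_span` is hypothetical and only gives the doc comment something to attach to.

```rust
use std::ops::Range;

/// Returns the range of positions covered by row `i`.
///
/// The i-th row spans from `indptr[i]` to `indptr[i+1]` in the data.
/// The brackets above sit inside code spans, so rustdoc does not try
/// to resolve them as links.
fn row_span(indptr: &[usize], i: usize) -> Range<usize> {
    indptr[i]..indptr[i + 1]
}

fn main() {
    // Two rows: the first covers positions 0..2, the second 2..5.
    let indptr = [0, 2, 5];
    assert_eq!(row_span(&indptr, 0), 0..2);
    assert_eq!(row_span(&indptr, 1), 2..5);
}
```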

##########
File path: rust/arrow/src/lib.rs
##########
@@ -18,8 +18,109 @@
 //! A native Rust implementation of [Apache Arrow](https://arrow.apache.org), a cross-language
 //! development platform for in-memory data.
 //!
-//! Currently the project is developed and tested against nightly Rust. To learn more
-//! about the status of Arrow in Rust, see `README.md`.
+//! ### DataType
+//!
+//! Every [`Array`](array::Array) in this crate has an associated [`DataType`](datatypes::DataType),
+//! that specifies how its data is layed in memory and represented.
+//! Thus, a central enum of this crate is [`DataType`](datatypes::DataType), that contains the set of valid
+//! DataTypes in the specification. For example, [`DataType::Utf8`](datatypes::DataType::Utf8).
+//!
+//! ## Array
+//!
+//! The central trait of this package is the dynamically-typed [`Array`](array::Array) that
+//! represents a fixed-sized, immutable, Send + Sync Array of nullable elements. An example of such an array is [`UInt32Array`](array::UInt32Array).
+//! One way to think about an arrow [`Array`](array::Array) isa `Arc<[Option<T>; len]>` where T can be anything ranging from an integer to a string, or even
+//! another [`Array`](array::Array).
+//!
+//! [`Arrays`](array::Array) have [`len()`](array::Array::len), [`data_type()`](array::Array::data_type), and the nullability of each of its elements,
+//! can be obtained via [`is_null(index)`](array::Array::is_null). To downcast an [`Array`](array::Array) to a specific implementation, you can use
+//!
+//! ```rust
+//! use arrow::array::{Array, PrimitiveArrayOps, UInt32Array};
+//! let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! assert_eq!(array.len(), 3);
+//! assert_eq!(array.value(0), 1);
+//! assert_eq!(array.is_null(1), true);
+//! ```
+//!
+//! To make the array dynamically typed, we wrap it in an [`Arc`](std::sync::Arc):
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! use arrow::datatypes::DataType;
+//! use arrow::array::{UInt32Array, ArrayRef};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! let array: ArrayRef = Arc::new(array);
+//! assert_eq!(array.len(), 3);
+//! // array.value() is not available in the dynamcally-typed version
+//! assert_eq!(array.is_null(1), true);
+//! assert_eq!(array.data_type(), &DataType::UInt32);
+//! ```
+//!
+//! to downcast, use `as_any()`:

Review comment:
       👍 

##########
File path: rust/arrow/src/lib.rs
##########
@@ -18,8 +18,109 @@
 //! A native Rust implementation of [Apache Arrow](https://arrow.apache.org), a cross-language
 //! development platform for in-memory data.
 //!
-//! Currently the project is developed and tested against nightly Rust. To learn more
-//! about the status of Arrow in Rust, see `README.md`.
+//! ### DataType
+//!
+//! Every [`Array`](array::Array) in this crate has an associated [`DataType`](datatypes::DataType),
+//! that specifies how its data is layed in memory and represented.
+//! Thus, a central enum of this crate is [`DataType`](datatypes::DataType), that contains the set of valid
+//! DataTypes in the specification. For example, [`DataType::Utf8`](datatypes::DataType::Utf8).
+//!
+//! ## Array
+//!
+//! The central trait of this package is the dynamically-typed [`Array`](array::Array) that
+//! represents a fixed-sized, immutable, Send + Sync Array of nullable elements. An example of such an array is [`UInt32Array`](array::UInt32Array).
+//! One way to think about an arrow [`Array`](array::Array) isa `Arc<[Option<T>; len]>` where T can be anything ranging from an integer to a string, or even
+//! another [`Array`](array::Array).
+//!
+//! [`Arrays`](array::Array) have [`len()`](array::Array::len), [`data_type()`](array::Array::data_type), and the nullability of each of its elements,
+//! can be obtained via [`is_null(index)`](array::Array::is_null). To downcast an [`Array`](array::Array) to a specific implementation, you can use
+//!
+//! ```rust
+//! use arrow::array::{Array, PrimitiveArrayOps, UInt32Array};
+//! let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! assert_eq!(array.len(), 3);
+//! assert_eq!(array.value(0), 1);
+//! assert_eq!(array.is_null(1), true);
+//! ```
+//!
+//! To make the array dynamically typed, we wrap it in an [`Arc`](std::sync::Arc):
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! use arrow::datatypes::DataType;
+//! use arrow::array::{UInt32Array, ArrayRef};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! let array: ArrayRef = Arc::new(array);
+//! assert_eq!(array.len(), 3);
+//! // array.value() is not available in the dynamcally-typed version
+//! assert_eq!(array.is_null(1), true);
+//! assert_eq!(array.data_type(), &DataType::UInt32);
+//! ```
+//!
+//! to downcast, use `as_any()`:
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! # use arrow::array::{UInt32Array, ArrayRef, PrimitiveArrayOps};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! # let array: ArrayRef = Arc::new(array);
+//! let array = array.as_any().downcast_ref::<UInt32Array>().unwrap();
+//! assert_eq!(array.value(0), 1);
+//! ```
+//!
+//! ## Memory and Buffers
+//!
+//! Data in [`Array`](array::Array) is stored in [`ArrayData`](array::data::ArrayData), that in turn
+//! is a collection of other [`ArrayData`](array::data::ArrayData) and [`Buffers`](buffer::Buffer).
+//! [`Buffers`](buffer::Buffer) is the central struct that array implementations use keep allocated memory and pointers.
+//! The [`MutableBuffer`](buffer::MutableBuffer) is the mutable counter-part of[`Buffer`](buffer::Buffer).
+//! These are the lowest abstractions of this crate, and are used throughout the crate to
+//! efficiently allocate, write, read and deallocate memory.
+//!
+//! ## Field, Schema and RecordBatch
+//!
+//! [`Field`](datatypes::Field) is a struct that contains an arrays' metadata (datatype and whether its values

Review comment:
       ```suggestion
   //! [`Field`](datatypes::Field) is a struct that contains an array's metadata (datatype and whether its values
   ```

##########
File path: rust/arrow/src/lib.rs
##########
@@ -18,8 +18,109 @@
 //! A native Rust implementation of [Apache Arrow](https://arrow.apache.org), a cross-language
 //! development platform for in-memory data.
 //!
-//! Currently the project is developed and tested against nightly Rust. To learn more
-//! about the status of Arrow in Rust, see `README.md`.
+//! ### DataType
+//!
+//! Every [`Array`](array::Array) in this crate has an associated [`DataType`](datatypes::DataType),
+//! that specifies how its data is layed in memory and represented.
+//! Thus, a central enum of this crate is [`DataType`](datatypes::DataType), that contains the set of valid
+//! DataTypes in the specification. For example, [`DataType::Utf8`](datatypes::DataType::Utf8).
+//!
+//! ## Array
+//!
+//! The central trait of this package is the dynamically-typed [`Array`](array::Array) that
+//! represents a fixed-sized, immutable, Send + Sync Array of nullable elements. An example of such an array is [`UInt32Array`](array::UInt32Array).
+//! One way to think about an arrow [`Array`](array::Array) isa `Arc<[Option<T>; len]>` where T can be anything ranging from an integer to a string, or even
+//! another [`Array`](array::Array).
+//!
+//! [`Arrays`](array::Array) have [`len()`](array::Array::len), [`data_type()`](array::Array::data_type), and the nullability of each of its elements,
+//! can be obtained via [`is_null(index)`](array::Array::is_null). To downcast an [`Array`](array::Array) to a specific implementation, you can use
+//!
+//! ```rust
+//! use arrow::array::{Array, PrimitiveArrayOps, UInt32Array};
+//! let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! assert_eq!(array.len(), 3);
+//! assert_eq!(array.value(0), 1);
+//! assert_eq!(array.is_null(1), true);
+//! ```
+//!
+//! To make the array dynamically typed, we wrap it in an [`Arc`](std::sync::Arc):
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! use arrow::datatypes::DataType;
+//! use arrow::array::{UInt32Array, ArrayRef};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! let array: ArrayRef = Arc::new(array);
+//! assert_eq!(array.len(), 3);
+//! // array.value() is not available in the dynamcally-typed version
+//! assert_eq!(array.is_null(1), true);
+//! assert_eq!(array.data_type(), &DataType::UInt32);
+//! ```
+//!
+//! to downcast, use `as_any()`:
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! # use arrow::array::{UInt32Array, ArrayRef, PrimitiveArrayOps};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! # let array: ArrayRef = Arc::new(array);
+//! let array = array.as_any().downcast_ref::<UInt32Array>().unwrap();
+//! assert_eq!(array.value(0), 1);
+//! ```
+//!
+//! ## Memory and Buffers
+//!
+//! Data in [`Array`](array::Array) is stored in [`ArrayData`](array::data::ArrayData), that in turn
+//! is a collection of other [`ArrayData`](array::data::ArrayData) and [`Buffers`](buffer::Buffer).
+//! [`Buffers`](buffer::Buffer) is the central struct that array implementations use keep allocated memory and pointers.
+//! The [`MutableBuffer`](buffer::MutableBuffer) is the mutable counter-part of[`Buffer`](buffer::Buffer).
+//! These are the lowest abstractions of this crate, and are used throughout the crate to
+//! efficiently allocate, write, read and deallocate memory.
+//!
+//! ## Field, Schema and RecordBatch
+//!
+//! [`Field`](datatypes::Field) is a struct that contains an arrays' metadata (datatype and whether its values
+//! can be null), and a name. [`Schema`](datatypes::Schema) is a vector of fields with optional metadata, and together with
+//! Together, they form the basis of a schematic representation of a group of [`Arrays`](array::Array).
+//!
+//! In fact, [`RecordBatch`](record_batch::RecordBatch) is a struct with a [`Schema`](datatypes::Schema) and a vector of
+//! [`Array`](array::Array)s, all with the same `len`. A record batch is the highest order struct that this crate currently offersm

Review comment:
       ```suggestion
   //! [`Array`](array::Array)s, all with the same `len`. A record batch is the highest order struct that this crate currently offers
   ```

##########
File path: rust/arrow/src/lib.rs
##########
@@ -18,8 +18,109 @@
 //! A native Rust implementation of [Apache Arrow](https://arrow.apache.org), a cross-language
 //! development platform for in-memory data.
 //!
-//! Currently the project is developed and tested against nightly Rust. To learn more
-//! about the status of Arrow in Rust, see `README.md`.
+//! ### DataType
+//!
+//! Every [`Array`](array::Array) in this crate has an associated [`DataType`](datatypes::DataType),
+//! that specifies how its data is layed in memory and represented.
+//! Thus, a central enum of this crate is [`DataType`](datatypes::DataType), that contains the set of valid
+//! DataTypes in the specification. For example, [`DataType::Utf8`](datatypes::DataType::Utf8).
+//!
+//! ## Array
+//!
+//! The central trait of this package is the dynamically-typed [`Array`](array::Array) that
+//! represents a fixed-sized, immutable, Send + Sync Array of nullable elements. An example of such an array is [`UInt32Array`](array::UInt32Array).
+//! One way to think about an arrow [`Array`](array::Array) isa `Arc<[Option<T>; len]>` where T can be anything ranging from an integer to a string, or even
+//! another [`Array`](array::Array).
+//!
+//! [`Arrays`](array::Array) have [`len()`](array::Array::len), [`data_type()`](array::Array::data_type), and the nullability of each of its elements,
+//! can be obtained via [`is_null(index)`](array::Array::is_null). To downcast an [`Array`](array::Array) to a specific implementation, you can use
+//!
+//! ```rust
+//! use arrow::array::{Array, PrimitiveArrayOps, UInt32Array};
+//! let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! assert_eq!(array.len(), 3);
+//! assert_eq!(array.value(0), 1);
+//! assert_eq!(array.is_null(1), true);
+//! ```
+//!
+//! To make the array dynamically typed, we wrap it in an [`Arc`](std::sync::Arc):
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! use arrow::datatypes::DataType;
+//! use arrow::array::{UInt32Array, ArrayRef};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! let array: ArrayRef = Arc::new(array);
+//! assert_eq!(array.len(), 3);
+//! // array.value() is not available in the dynamcally-typed version
+//! assert_eq!(array.is_null(1), true);
+//! assert_eq!(array.data_type(), &DataType::UInt32);
+//! ```
+//!
+//! to downcast, use `as_any()`:
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! # use arrow::array::{UInt32Array, ArrayRef, PrimitiveArrayOps};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! # let array: ArrayRef = Arc::new(array);
+//! let array = array.as_any().downcast_ref::<UInt32Array>().unwrap();
+//! assert_eq!(array.value(0), 1);
+//! ```
+//!
+//! ## Memory and Buffers
+//!
+//! Data in [`Array`](array::Array) is stored in [`ArrayData`](array::data::ArrayData), that in turn
+//! is a collection of other [`ArrayData`](array::data::ArrayData) and [`Buffers`](buffer::Buffer).
+//! [`Buffers`](buffer::Buffer) is the central struct that array implementations use keep allocated memory and pointers.
+//! The [`MutableBuffer`](buffer::MutableBuffer) is the mutable counter-part of[`Buffer`](buffer::Buffer).
+//! These are the lowest abstractions of this crate, and are used throughout the crate to
+//! efficiently allocate, write, read and deallocate memory.
+//!
+//! ## Field, Schema and RecordBatch
+//!
+//! [`Field`](datatypes::Field) is a struct that contains an arrays' metadata (datatype and whether its values
+//! can be null), and a name. [`Schema`](datatypes::Schema) is a vector of fields with optional metadata, and together with
+//! Together, they form the basis of a schematic representation of a group of [`Arrays`](array::Array).
+//!
+//! In fact, [`RecordBatch`](record_batch::RecordBatch) is a struct with a [`Schema`](datatypes::Schema) and a vector of
+//! [`Array`](array::Array)s, all with the same `len`. A record batch is the highest order struct that this crate currently offersm
+//! and is broadly used to represent a table where each column in an `Array`.
+//!
+//! ## Compute
+//!
+//! This crate offers many operations (called kernels) to operate on `Array`s, that you can find at [compute::kernels].
+//! It has both vertial and horizontal operations, and some of them have an SIMD implementation.
+//!
+//! ## Status
+//!
+//! This crate has most of the implementation of the arrow specification. Specifically, it supports the following types:
+//!
+//! * All arrow primitive types, such as [`Int32Array`](array::UInt8Array), [`BooleanArray`](array::BooleanArray) and [`Float64Array`](array::Float64Array).
+//! * All arrow variable length types, such as [`StringArray`](array::StringArray) and [`BinaryArray`](array::BinaryArray)
+//! * All composite types such as [`StructArray`](array::StructArray) and [`ListArray`](array::ListArray)

Review comment:
       ```suggestion
   //! * All composite types such as [`StructArray`](array::StructArray) and [`ListArray`](array::ListArray)
   //! * Dictionary types [`DictionaryArray`](array::DictionaryArray)
   
   ```

##########
File path: rust/arrow/src/lib.rs
##########
@@ -18,8 +18,109 @@
 //! A native Rust implementation of [Apache Arrow](https://arrow.apache.org), a cross-language
 //! development platform for in-memory data.
 //!
-//! Currently the project is developed and tested against nightly Rust. To learn more
-//! about the status of Arrow in Rust, see `README.md`.
+//! ### DataType
+//!
+//! Every [`Array`](array::Array) in this crate has an associated [`DataType`](datatypes::DataType),
+//! that specifies how its data is layed in memory and represented.
+//! Thus, a central enum of this crate is [`DataType`](datatypes::DataType), that contains the set of valid
+//! DataTypes in the specification. For example, [`DataType::Utf8`](datatypes::DataType::Utf8).
+//!
+//! ## Array
+//!
+//! The central trait of this package is the dynamically-typed [`Array`](array::Array) that
+//! represents a fixed-sized, immutable, Send + Sync Array of nullable elements. An example of such an array is [`UInt32Array`](array::UInt32Array).
+//! One way to think about an arrow [`Array`](array::Array) isa `Arc<[Option<T>; len]>` where T can be anything ranging from an integer to a string, or even
+//! another [`Array`](array::Array).
+//!
+//! [`Arrays`](array::Array) have [`len()`](array::Array::len), [`data_type()`](array::Array::data_type), and the nullability of each of its elements,
+//! can be obtained via [`is_null(index)`](array::Array::is_null). To downcast an [`Array`](array::Array) to a specific implementation, you can use
+//!
+//! ```rust
+//! use arrow::array::{Array, PrimitiveArrayOps, UInt32Array};
+//! let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! assert_eq!(array.len(), 3);
+//! assert_eq!(array.value(0), 1);
+//! assert_eq!(array.is_null(1), true);
+//! ```
+//!
+//! To make the array dynamically typed, we wrap it in an [`Arc`](std::sync::Arc):
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! use arrow::datatypes::DataType;
+//! use arrow::array::{UInt32Array, ArrayRef};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! let array: ArrayRef = Arc::new(array);
+//! assert_eq!(array.len(), 3);
+//! // array.value() is not available in the dynamcally-typed version
+//! assert_eq!(array.is_null(1), true);
+//! assert_eq!(array.data_type(), &DataType::UInt32);
+//! ```
+//!
+//! to downcast, use `as_any()`:
+//!
+//! ```rust
+//! # use std::sync::Arc;
+//! # use arrow::array::{UInt32Array, ArrayRef, PrimitiveArrayOps};
+//! # let array = UInt32Array::from(vec![Some(1), None, Some(3)]);
+//! # let array: ArrayRef = Arc::new(array);
+//! let array = array.as_any().downcast_ref::<UInt32Array>().unwrap();
+//! assert_eq!(array.value(0), 1);
+//! ```
+//!
+//! ## Memory and Buffers
+//!
+//! Data in [`Array`](array::Array) is stored in [`ArrayData`](array::data::ArrayData), that in turn
+//! is a collection of other [`ArrayData`](array::data::ArrayData) and [`Buffers`](buffer::Buffer).
+//! [`Buffers`](buffer::Buffer) is the central struct that array implementations use keep allocated memory and pointers.
+//! The [`MutableBuffer`](buffer::MutableBuffer) is the mutable counter-part of[`Buffer`](buffer::Buffer).
+//! These are the lowest abstractions of this crate, and are used throughout the crate to
+//! efficiently allocate, write, read and deallocate memory.
+//!
+//! ## Field, Schema and RecordBatch
+//!
+//! [`Field`](datatypes::Field) is a struct that contains an arrays' metadata (datatype and whether its values
+//! can be null), and a name. [`Schema`](datatypes::Schema) is a vector of fields with optional metadata, and together with

Review comment:
       ```suggestion
   //! can be null), and a name. [`Schema`](datatypes::Schema) is a vector of fields with optional metadata. 
   ```



