You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/02/22 17:58:52 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #3749: Array data struct

tustvold opened a new pull request, #3749:
URL: https://github.com/apache/arrow-rs/pull/3749

   _Draft as builds on #3743_
   
   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Part of #1799 
   Relates to #1176 
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   This starts the process of fleshing out what the elements of the `ArrayData` enumeration will consist of, with implementations for primitive and byte arrays. I wanted to get something up for review to get feedback on the general direction, and to also avoid a single massive PR.
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   This PR adds enumerations `ArrayDataPrimitive` and `ArrayDataBytes`, that consist of the enumerable variants of the generic `PrimitiveArrayData` and `BytesArrayData`.
   
   These provide for an idiomatic way to enumerate the possible array types, with the view that this would eventually replace the current trait object approach.
   
   Additionally `downcast` / `downcast_ref` / `From` are provided to facilitate going from the enumeration to/from the generic form. This is critical to allowing types such as `PrimitiveArray` to be contain `PrimitiveArrayData` whilst being constructable from an `ArrayData` enumeration containing `ArrayDataPrimitive`.
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118384534


##########
arrow-data/src/data/types.rs:
##########
@@ -0,0 +1,153 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_schema::{DataType, IntervalUnit};
+
+/// An enumeration of the primitive types implementing [`ArrowNativeType`](arrow_buffer::ArrowNativeType)
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PrimitiveType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    Int128,
+    Int256,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+    Float16,
+    Float32,
+    Float64,
+}
+
+/// An enumeration of the types of offsets for variable length encodings
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum OffsetType {
+    Small,
+    Large,
+}

Review Comment:
   Hmm, as this is for physical type, should it be something like Int32, Int64?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118730152


##########
arrow-data/src/data/bytes.rs:
##########
@@ -0,0 +1,288 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{BytesType, OffsetType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_schema::DataType;
+use std::marker::PhantomData;
+
+mod private {
+    use super::*;
+
+    pub trait BytesSealed {
+        /// Create from bytes without performing any validation
+        ///
+        /// # Safety
+        ///
+        /// If `str`, `b` must be a valid UTF-8 sequence
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Cast [`ArrayDataBytesOffset`] to [`ArrayDataBytes`]
+        fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes
+        where
+            Self: Bytes;
+    }
+
+    pub trait BytesOffsetSealed {
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast_ref<B: Bytes + ?Sized>(
+            data: &ArrayDataBytesOffset<B>,
+        ) -> Option<&BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast<B: Bytes + ?Sized>(
+            data: ArrayDataBytesOffset<B>,
+        ) -> Option<BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Cast [`BytesArrayData`] to [`ArrayDataBytesOffset`]
+        fn upcast<B: Bytes + ?Sized>(
+            v: BytesArrayData<Self, B>,
+        ) -> ArrayDataBytesOffset<B>
+        where
+            Self: BytesOffset;
+    }
+}
+
+/// Types backed by a variable length slice of bytes
+pub trait Bytes: private::BytesSealed {
+    const TYPE: BytesType;
+}
+
+impl Bytes for [u8] {
+    const TYPE: BytesType = BytesType::Binary;
+}
+
+impl private::BytesSealed for [u8] {
+    unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+        b
+    }
+
+    fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(v) => Some(v),
+            ArrayDataBytes::Utf8(_) => None,
+        }
+    }
+
+    fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(v) => Some(v),
+            ArrayDataBytes::Utf8(_) => None,
+        }
+    }
+
+    fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes {
+        ArrayDataBytes::Binary(v)
+    }
+}
+
+impl Bytes for str {
+    const TYPE: BytesType = BytesType::Utf8;
+}
+
+impl private::BytesSealed for str {
+    unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+        std::str::from_utf8_unchecked(b)
+    }
+
+    fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(_) => None,
+            ArrayDataBytes::Utf8(v) => Some(v),
+        }
+    }
+
+    fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(_) => None,
+            ArrayDataBytes::Utf8(v) => Some(v),
+        }
+    }
+
+    fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes {
+        ArrayDataBytes::Utf8(v)
+    }
+}
+
+/// Types of offset used by variable length byte arrays
+pub trait BytesOffset: private::BytesOffsetSealed + ArrowNativeType {
+    const TYPE: OffsetType;
+}
+
+impl BytesOffset for i32 {
+    const TYPE: OffsetType = OffsetType::Small;
+}
+
+impl private::BytesOffsetSealed for i32 {
+    fn downcast_ref<B: Bytes + ?Sized>(
+        data: &ArrayDataBytesOffset<B>,
+    ) -> Option<&BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(v) => Some(v),
+            ArrayDataBytesOffset::Large(_) => None,
+        }
+    }
+
+    fn downcast<B: Bytes + ?Sized>(
+        data: ArrayDataBytesOffset<B>,
+    ) -> Option<BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(v) => Some(v),
+            ArrayDataBytesOffset::Large(_) => None,
+        }
+    }
+
+    fn upcast<B: Bytes + ?Sized>(v: BytesArrayData<Self, B>) -> ArrayDataBytesOffset<B> {
+        ArrayDataBytesOffset::Small(v)
+    }
+}
+
+impl BytesOffset for i64 {
+    const TYPE: OffsetType = OffsetType::Large;
+}
+
+impl private::BytesOffsetSealed for i64 {
+    fn downcast_ref<B: Bytes + ?Sized>(
+        data: &ArrayDataBytesOffset<B>,
+    ) -> Option<&BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(_) => None,
+            ArrayDataBytesOffset::Large(v) => Some(v),
+        }
+    }
+
+    fn downcast<B: Bytes + ?Sized>(
+        data: ArrayDataBytesOffset<B>,
+    ) -> Option<BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(_) => None,
+            ArrayDataBytesOffset::Large(v) => Some(v),
+        }
+    }
+
+    fn upcast<B: Bytes + ?Sized>(v: BytesArrayData<Self, B>) -> ArrayDataBytesOffset<B> {
+        ArrayDataBytesOffset::Large(v)
+    }
+}
+
+/// An enumeration of the types of [`ArrayDataBytesOffset`]
+pub enum ArrayDataBytes {
+    Binary(ArrayDataBytesOffset<[u8]>),
+    Utf8(ArrayDataBytesOffset<str>),
+}
+
+impl ArrayDataBytes {
+    /// Downcast this [`ArrayDataBytes`] to the corresponding [`BytesArrayData`]
+    pub fn downcast_ref<O: BytesOffset, B: Bytes + ?Sized>(
+        &self,
+    ) -> Option<&BytesArrayData<O, B>> {
+        O::downcast_ref(B::downcast_ref(self)?)
+    }
+
+    /// Downcast this [`ArrayDataBytes`] to the corresponding [`BytesArrayData`]
+    pub fn downcast<O: BytesOffset, B: Bytes + ?Sized>(
+        self,
+    ) -> Option<BytesArrayData<O, B>> {
+        O::downcast(B::downcast(self)?)
+    }
+}
+
+/// An enumeration of the types of [`BytesArrayData`]
+pub enum ArrayDataBytesOffset<B: Bytes + ?Sized> {
+    Small(BytesArrayData<i32, B>),
+    Large(BytesArrayData<i64, B>),
+}

Review Comment:
   It is necessary for arrow2 compatibility (which has different arrays for strings and bytes) and for the downcasting logic



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118735470


##########
arrow-data/src/data/primitive.rs:
##########
@@ -0,0 +1,184 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{PhysicalType, PrimitiveType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{i256, ArrowNativeType};
+use arrow_schema::DataType;
+use half::f16;
+
+mod private {
+    use super::*;
+
+    pub trait PrimitiveSealed {
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast_ref(data: &ArrayDataPrimitive) -> Option<&PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast(data: ArrayDataPrimitive) -> Option<PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Cast [`ArrayDataPrimitive`] to [`ArrayDataPrimitive`]
+        fn upcast(v: PrimitiveArrayData<Self>) -> ArrayDataPrimitive
+        where
+            Self: Primitive;
+    }
+}
+
+pub trait Primitive: private::PrimitiveSealed + ArrowNativeType {
+    const VARIANT: PrimitiveType;
+}

Review Comment:
   I will add some high-level documentation of the ArrayData "pattern" when I switch ArrayData over (and make these public)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#issuecomment-1446391705

   Benchmark runs are scheduled for baseline = 96791ea47b032ce2ebcb07087d3b3007ea0df536 and contender = dae7a71cc2980d6778ae3226aa3adb240a41da3c. dae7a71cc2980d6778ae3226aa3adb240a41da3c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/9b6675961f1d4c9b915fcdad6c5440f1...c3cd6e61b0fb4d3eadccf4fa547905ab/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/6e5013740e894d3ba14818b4d0237ae2...03673823540e4b91933407c6dd708b1e/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/484ca48dfb3c4e1f8a4cfe63809aa84b...f1b010cb68b64f0b9964009cb019f9ec/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a2e29724050f4629871ac148e66c1264...4817653d7d3c45339de0549b3c6e57bd/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1117432885


##########
arrow-data/src/data/primitive.rs:
##########
@@ -0,0 +1,184 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{PhysicalType, PrimitiveType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{i256, ArrowNativeType};
+use arrow_schema::DataType;
+use half::f16;
+
+mod private {
+    use super::*;
+
+    pub trait PrimitiveSealed {
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast_ref(data: &ArrayDataPrimitive) -> Option<&PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast(data: ArrayDataPrimitive) -> Option<PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Cast [`ArrayDataPrimitive`] to [`ArrayDataPrimitive`]
+        fn upcast(v: PrimitiveArrayData<Self>) -> ArrayDataPrimitive
+        where
+            Self: Primitive;
+    }
+}
+
+pub trait Primitive: private::PrimitiveSealed + ArrowNativeType {
+    const VARIANT: PrimitiveType;
+}
+
+macro_rules! primitive {
+    ($t:ty,$v:ident) => {
+        impl Primitive for $t {
+            const VARIANT: PrimitiveType = PrimitiveType::$v;
+        }
+        impl private::PrimitiveSealed for $t {
+            fn downcast_ref(
+                data: &ArrayDataPrimitive,
+            ) -> Option<&PrimitiveArrayData<Self>> {
+                match data {
+                    ArrayDataPrimitive::$v(v) => Some(v),
+                    _ => None,
+                }
+            }
+
+            fn downcast(data: ArrayDataPrimitive) -> Option<PrimitiveArrayData<Self>> {
+                match data {
+                    ArrayDataPrimitive::$v(v) => Some(v),
+                    _ => None,
+                }
+            }
+
+            fn upcast(v: PrimitiveArrayData<Self>) -> ArrayDataPrimitive {
+                ArrayDataPrimitive::$v(v)
+            }
+        }
+    };
+}
+
+primitive!(i8, Int8);
+primitive!(i16, Int16);
+primitive!(i32, Int32);
+primitive!(i64, Int64);
+primitive!(i128, Int128);
+primitive!(i256, Int256);
+primitive!(u8, UInt8);
+primitive!(u16, UInt16);
+primitive!(u32, UInt32);
+primitive!(u64, UInt64);
+primitive!(f16, Float16);
+primitive!(f32, Float32);
+primitive!(f64, Float64);
+
+/// An enumeration of the types of [`PrimitiveArrayData`]
+pub enum ArrayDataPrimitive {
+    Int8(PrimitiveArrayData<i8>),
+    Int16(PrimitiveArrayData<i16>),
+    Int32(PrimitiveArrayData<i32>),
+    Int64(PrimitiveArrayData<i64>),
+    Int128(PrimitiveArrayData<i128>),
+    Int256(PrimitiveArrayData<i256>),
+    UInt8(PrimitiveArrayData<u8>),
+    UInt16(PrimitiveArrayData<u16>),
+    UInt32(PrimitiveArrayData<u32>),
+    UInt64(PrimitiveArrayData<u64>),
+    Float16(PrimitiveArrayData<f16>),
+    Float32(PrimitiveArrayData<f32>),
+    Float64(PrimitiveArrayData<f64>),
+}
+
+impl ArrayDataPrimitive {
+    /// Downcast this [`ArrayDataPrimitive`] to the corresponding [`PrimitiveArrayData`]
+    pub fn downcast_ref<P: Primitive>(&self) -> Option<&PrimitiveArrayData<P>> {
+        P::downcast_ref(self)
+    }
+
+    /// Downcast this [`ArrayDataPrimitive`] to the corresponding [`PrimitiveArrayData`]
+    pub fn downcast<P: Primitive>(self) -> Option<PrimitiveArrayData<P>> {
+        P::downcast(self)
+    }
+}
+
+/// ArrayData for arrays of [`Primitive`]
+#[derive(Debug, Clone)]
+pub struct PrimitiveArrayData<T: Primitive> {
+    data_type: DataType,
+    nulls: Option<NullBuffer>,
+    values: ScalarBuffer<T>,
+}

Review Comment:
   Will we remove current ArrayData after adding these? I think the arrays like primitive arrays will go to use these enumeration instead of current ArrayData?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118387280


##########
arrow-data/src/data/types.rs:
##########
@@ -0,0 +1,153 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_schema::{DataType, IntervalUnit};
+
+/// An enumeration of the primitive types implementing [`ArrowNativeType`](arrow_buffer::ArrowNativeType)
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PrimitiveType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    Int128,
+    Int256,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+    Float16,
+    Float32,
+    Float64,
+}
+
+/// An enumeration of the types of offsets for variable length encodings
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum OffsetType {
+    Small,
+    Large,
+}
+
+/// An enumeration of the types of variable length byte arrays
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum BytesType {
+    Binary,
+    Utf8,
+}
+
+/// An enumeration of the types of dictionary key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum DictionaryKeyType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+}
+
+/// An enumeration of the types of run key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum RunEndType {
+    Int16,
+    Int32,
+    Int64,
+}
+
+/// Describes the physical representation of a given [`DataType`]
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PhysicalType {

Review Comment:
   Is this only used for validating physical type when constructing typed ArrayData like `PrimitiveArrayData.new` does?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1116319209


##########
arrow-data/src/data/bytes.rs:
##########
@@ -0,0 +1,288 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{BytesType, OffsetType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_schema::DataType;
+use std::marker::PhantomData;
+
+mod private {
+    use super::*;
+
+    pub trait BytesSealed {
+        /// Create from bytes without performing any validation
+        ///
+        /// # Safety
+        ///
+        /// If `str`, `b` must be a valid UTF-8 sequence
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Cast [`ArrayDataBytesOffset`] to [`ArrayDataBytes`]
+        fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes
+        where
+            Self: Bytes;
+    }
+
+    pub trait BytesOffsetSealed {
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast_ref<B: Bytes + ?Sized>(
+            data: &ArrayDataBytesOffset<B>,
+        ) -> Option<&BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast<B: Bytes + ?Sized>(
+            data: ArrayDataBytesOffset<B>,
+        ) -> Option<BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Cast [`BytesArrayData`] to [`ArrayDataBytesOffset`]
+        fn upcast<B: Bytes + ?Sized>(
+            v: BytesArrayData<Self, B>,

Review Comment:
   What difference between `ArrayDataBytes` and `BytesArrayData`? Seems `ArrayDataBytes` is the data of bytes types (binary and str). But `BytesArrayData`?



##########
arrow-data/src/data/bytes.rs:
##########
@@ -0,0 +1,288 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{BytesType, OffsetType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_schema::DataType;
+use std::marker::PhantomData;
+
+mod private {
+    use super::*;
+
+    pub trait BytesSealed {
+        /// Create from bytes without performing any validation
+        ///
+        /// # Safety
+        ///
+        /// If `str`, `b` must be a valid UTF-8 sequence
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Cast [`ArrayDataBytesOffset`] to [`ArrayDataBytes`]
+        fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes
+        where
+            Self: Bytes;
+    }
+
+    pub trait BytesOffsetSealed {
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast_ref<B: Bytes + ?Sized>(
+            data: &ArrayDataBytesOffset<B>,
+        ) -> Option<&BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast<B: Bytes + ?Sized>(
+            data: ArrayDataBytesOffset<B>,
+        ) -> Option<BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Cast [`BytesArrayData`] to [`ArrayDataBytesOffset`]
+        fn upcast<B: Bytes + ?Sized>(
+            v: BytesArrayData<Self, B>,

Review Comment:
   I got it after reading the enums and struct `BytesArrayData` below. Maybe a descriptive comment at the top helpful.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118385917


##########
arrow-data/src/data/types.rs:
##########
@@ -0,0 +1,153 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_schema::{DataType, IntervalUnit};
+
+/// An enumeration of the primitive types implementing [`ArrowNativeType`](arrow_buffer::ArrowNativeType)
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PrimitiveType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    Int128,
+    Int256,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+    Float16,
+    Float32,
+    Float64,
+}
+
+/// An enumeration of the types of offsets for variable length encodings
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum OffsetType {
+    Small,
+    Large,
+}
+
+/// An enumeration of the types of variable length byte arrays
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum BytesType {
+    Binary,
+    Utf8,
+}
+
+/// An enumeration of the types of dictionary key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum DictionaryKeyType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+}
+
+/// An enumeration of the types of run key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum RunEndType {
+    Int16,
+    Int32,
+    Int64,
+}
+
+/// Describes the physical representation of a given [`DataType`]
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PhysicalType {
+    Null,
+    Boolean,
+    Primitive(PrimitiveType),
+    FixedSizeBinary,
+    Bytes(OffsetType, BytesType),
+    FixedSizeList,
+    List(OffsetType),
+    Map,
+    Struct,
+    Union,
+    Dictionary(DictionaryKeyType),
+    Run(RunEndType),
+}
+
+impl From<&DataType> for PhysicalType {
+    fn from(value: &DataType) -> Self {
+        match value {
+            DataType::Null => Self::Null,
+            DataType::Boolean => Self::Boolean,
+            DataType::Int8 => Self::Primitive(PrimitiveType::Int8),
+            DataType::Int16 => Self::Primitive(PrimitiveType::Int16),
+            DataType::Int32 => Self::Primitive(PrimitiveType::Int32),
+            DataType::Int64 => Self::Primitive(PrimitiveType::Int64),
+            DataType::UInt8 => Self::Primitive(PrimitiveType::UInt8),
+            DataType::UInt16 => Self::Primitive(PrimitiveType::UInt16),
+            DataType::UInt32 => Self::Primitive(PrimitiveType::UInt32),
+            DataType::UInt64 => Self::Primitive(PrimitiveType::UInt64),
+            DataType::Float16 => Self::Primitive(PrimitiveType::Float16),
+            DataType::Float32 => Self::Primitive(PrimitiveType::Float32),
+            DataType::Float64 => Self::Primitive(PrimitiveType::Float64),
+            DataType::Timestamp(_, _) => Self::Primitive(PrimitiveType::Int64),
+            DataType::Date32 => Self::Primitive(PrimitiveType::Int32),
+            DataType::Date64 => Self::Primitive(PrimitiveType::Int64),
+            DataType::Time32(_) => Self::Primitive(PrimitiveType::Int32),
+            DataType::Time64(_) => Self::Primitive(PrimitiveType::Int64),
+            DataType::Duration(_) => Self::Primitive(PrimitiveType::Int64),
+            DataType::Decimal128(_, _) => Self::Primitive(PrimitiveType::Int128),
+            DataType::Decimal256(_, _) => Self::Primitive(PrimitiveType::Int256),
+            DataType::Interval(IntervalUnit::YearMonth) => {
+                Self::Primitive(PrimitiveType::Int32)
+            }
+            DataType::Interval(IntervalUnit::DayTime) => {
+                Self::Primitive(PrimitiveType::Int64)
+            }
+            DataType::Interval(IntervalUnit::MonthDayNano) => {
+                Self::Primitive(PrimitiveType::Int128)
+            }
+            DataType::Binary => Self::Bytes(OffsetType::Small, BytesType::Binary),
+            DataType::FixedSizeBinary(_) => Self::FixedSizeBinary,
+            DataType::LargeBinary => Self::Bytes(OffsetType::Large, BytesType::Binary),

Review Comment:
   Just to keep variable length binary close.
   ```suggestion
               DataType::Binary => Self::Bytes(OffsetType::Small, BytesType::Binary),
               DataType::LargeBinary => Self::Bytes(OffsetType::Large, BytesType::Binary),
               DataType::FixedSizeBinary(_) => Self::FixedSizeBinary,
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118381311


##########
arrow-data/src/data/primitive.rs:
##########
@@ -0,0 +1,184 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{PhysicalType, PrimitiveType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{i256, ArrowNativeType};
+use arrow_schema::DataType;
+use half::f16;
+
+mod private {
+    use super::*;
+
+    pub trait PrimitiveSealed {
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast_ref(data: &ArrayDataPrimitive) -> Option<&PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast(data: ArrayDataPrimitive) -> Option<PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Cast [`ArrayDataPrimitive`] to [`ArrayDataPrimitive`]
+        fn upcast(v: PrimitiveArrayData<Self>) -> ArrayDataPrimitive
+        where
+            Self: Primitive;
+    }
+}
+
+pub trait Primitive: private::PrimitiveSealed + ArrowNativeType {
+    const VARIANT: PrimitiveType;
+}

Review Comment:
   Seems this trait and the const need document.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1114731523


##########
arrow-data/src/data/bytes.rs:
##########
@@ -0,0 +1,288 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{BytesType, OffsetType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_schema::DataType;
+use std::marker::PhantomData;
+
+mod private {
+    use super::*;
+
+    pub trait BytesSealed {

Review Comment:
   This two-level formulation is somewhat arcane, but allows `BytesArrayData` to be typed solely on actual physical types, without needing a logical `ByteArrayType`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118732466


##########
arrow-data/src/data/types.rs:
##########
@@ -0,0 +1,153 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_schema::{DataType, IntervalUnit};
+
+/// An enumeration of the primitive types implementing [`ArrowNativeType`](arrow_buffer::ArrowNativeType)
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PrimitiveType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    Int128,
+    Int256,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+    Float16,
+    Float32,
+    Float64,
+}
+
+/// An enumeration of the types of offsets for variable length encodings
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum OffsetType {
+    Small,
+    Large,
+}
+
+/// An enumeration of the types of variable length byte arrays
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum BytesType {
+    Binary,
+    Utf8,
+}
+
+/// An enumeration of the types of dictionary key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum DictionaryKeyType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+}
+
+/// An enumeration of the types of run key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum RunEndType {
+    Int16,
+    Int32,
+    Int64,
+}
+
+/// Describes the physical representation of a given [`DataType`]
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PhysicalType {

Review Comment:
   I anticipate it being used in other safe array data constructors, but yes this is the only current use-case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118729198


##########
arrow-data/src/data/bytes.rs:
##########
@@ -0,0 +1,288 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{BytesType, OffsetType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_schema::DataType;
+use std::marker::PhantomData;
+
+mod private {
+    use super::*;
+
+    pub trait BytesSealed {
+        /// Create from bytes without performing any validation
+        ///
+        /// # Safety
+        ///
+        /// If `str`, `b` must be a valid UTF-8 sequence
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Cast [`ArrayDataBytesOffset`] to [`ArrayDataBytes`]
+        fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes
+        where
+            Self: Bytes;
+    }
+
+    pub trait BytesOffsetSealed {
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast_ref<B: Bytes + ?Sized>(
+            data: &ArrayDataBytesOffset<B>,
+        ) -> Option<&BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast<B: Bytes + ?Sized>(
+            data: ArrayDataBytesOffset<B>,
+        ) -> Option<BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Cast [`BytesArrayData`] to [`ArrayDataBytesOffset`]
+        fn upcast<B: Bytes + ?Sized>(
+            v: BytesArrayData<Self, B>,
+        ) -> ArrayDataBytesOffset<B>
+        where
+            Self: BytesOffset;
+    }
+}
+
+/// Types backed by a variable length slice of bytes
+pub trait Bytes: private::BytesSealed {
+    const TYPE: BytesType;
+}
+
+impl Bytes for [u8] {
+    const TYPE: BytesType = BytesType::Binary;
+}
+
+impl private::BytesSealed for [u8] {
+    unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+        b
+    }
+
+    fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(v) => Some(v),
+            ArrayDataBytes::Utf8(_) => None,
+        }
+    }
+
+    fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(v) => Some(v),
+            ArrayDataBytes::Utf8(_) => None,
+        }
+    }
+
+    fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes {
+        ArrayDataBytes::Binary(v)
+    }
+}
+
+impl Bytes for str {
+    const TYPE: BytesType = BytesType::Utf8;
+}
+
+impl private::BytesSealed for str {
+    unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+        std::str::from_utf8_unchecked(b)
+    }
+
+    fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(_) => None,
+            ArrayDataBytes::Utf8(v) => Some(v),
+        }
+    }
+
+    fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(_) => None,
+            ArrayDataBytes::Utf8(v) => Some(v),
+        }
+    }
+
+    fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes {
+        ArrayDataBytes::Utf8(v)
+    }
+}
+
+/// Types of offset used by variable length byte arrays
+pub trait BytesOffset: private::BytesOffsetSealed + ArrowNativeType {
+    const TYPE: OffsetType;
+}
+
+impl BytesOffset for i32 {
+    const TYPE: OffsetType = OffsetType::Small;
+}
+
+impl private::BytesOffsetSealed for i32 {
+    fn downcast_ref<B: Bytes + ?Sized>(
+        data: &ArrayDataBytesOffset<B>,
+    ) -> Option<&BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(v) => Some(v),
+            ArrayDataBytesOffset::Large(_) => None,
+        }
+    }
+
+    fn downcast<B: Bytes + ?Sized>(
+        data: ArrayDataBytesOffset<B>,
+    ) -> Option<BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(v) => Some(v),
+            ArrayDataBytesOffset::Large(_) => None,
+        }
+    }
+
+    fn upcast<B: Bytes + ?Sized>(v: BytesArrayData<Self, B>) -> ArrayDataBytesOffset<B> {
+        ArrayDataBytesOffset::Small(v)
+    }
+}
+
+impl BytesOffset for i64 {
+    const TYPE: OffsetType = OffsetType::Large;
+}
+
+impl private::BytesOffsetSealed for i64 {
+    fn downcast_ref<B: Bytes + ?Sized>(
+        data: &ArrayDataBytesOffset<B>,
+    ) -> Option<&BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(_) => None,
+            ArrayDataBytesOffset::Large(v) => Some(v),
+        }
+    }
+
+    fn downcast<B: Bytes + ?Sized>(
+        data: ArrayDataBytesOffset<B>,
+    ) -> Option<BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(_) => None,
+            ArrayDataBytesOffset::Large(v) => Some(v),
+        }
+    }
+
+    fn upcast<B: Bytes + ?Sized>(v: BytesArrayData<Self, B>) -> ArrayDataBytesOffset<B> {
+        ArrayDataBytesOffset::Large(v)
+    }
+}
+
+/// An enumeration of the types of [`ArrayDataBytesOffset`]
+pub enum ArrayDataBytes {
+    Binary(ArrayDataBytesOffset<[u8]>),
+    Utf8(ArrayDataBytesOffset<str>),
+}
+
+impl ArrayDataBytes {
+    /// Downcast this [`ArrayDataBytes`] to the corresponding [`BytesArrayData`]
+    pub fn downcast_ref<O: BytesOffset, B: Bytes + ?Sized>(
+        &self,
+    ) -> Option<&BytesArrayData<O, B>> {
+        O::downcast_ref(B::downcast_ref(self)?)
+    }
+
+    /// Downcast this [`ArrayDataBytes`] to the corresponding [`BytesArrayData`]
+    pub fn downcast<O: BytesOffset, B: Bytes + ?Sized>(
+        self,
+    ) -> Option<BytesArrayData<O, B>> {
+        O::downcast(B::downcast(self)?)
+    }
+}
+
+/// An enumeration of the types of [`BytesArrayData`]
+pub enum ArrayDataBytesOffset<B: Bytes + ?Sized> {
+    Small(BytesArrayData<i32, B>),
+    Large(BytesArrayData<i64, B>),
+}
+
+impl<O: BytesOffset, B: Bytes + ?Sized> From<BytesArrayData<O, B>> for ArrayDataBytes {
+    fn from(value: BytesArrayData<O, B>) -> Self {
+        B::upcast(O::upcast(value))
+    }
+}
+
+/// ArrayData for arrays of [`Bytes`]
+pub struct BytesArrayData<O: BytesOffset, B: Bytes + ?Sized> {
+    data_type: DataType,
+    nulls: Option<NullBuffer>,
+    offsets: ScalarBuffer<O>,
+    values: Buffer,
+    phantom: PhantomData<B>,
+}
+
+impl<O: BytesOffset, B: Bytes> BytesArrayData<O, B> {
+    /// Creates a new [`BytesArrayData`]
+    ///
+    /// # Safety
+    ///
+    /// - Each consecutive window of `offsets` must identify a valid slice of `values`
+    /// - `nulls.len() == offsets.len() + 1`
+    /// - `data_type` must be valid for this layout
+    pub unsafe fn new_unchecked(

Review Comment:
   A future PR will likely add this, as part of replacing the validate logic currently in ArrayData



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1117500855


##########
arrow-data/src/data/primitive.rs:
##########
@@ -0,0 +1,184 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{PhysicalType, PrimitiveType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{i256, ArrowNativeType};
+use arrow_schema::DataType;
+use half::f16;
+
+mod private {
+    use super::*;
+
+    pub trait PrimitiveSealed {
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast_ref(data: &ArrayDataPrimitive) -> Option<&PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast(data: ArrayDataPrimitive) -> Option<PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Cast [`ArrayDataPrimitive`] to [`ArrayDataPrimitive`]
+        fn upcast(v: PrimitiveArrayData<Self>) -> ArrayDataPrimitive
+        where
+            Self: Primitive;
+    }
+}
+
+pub trait Primitive: private::PrimitiveSealed + ArrowNativeType {
+    const VARIANT: PrimitiveType;
+}
+
+macro_rules! primitive {
+    ($t:ty,$v:ident) => {
+        impl Primitive for $t {
+            const VARIANT: PrimitiveType = PrimitiveType::$v;
+        }
+        impl private::PrimitiveSealed for $t {
+            fn downcast_ref(
+                data: &ArrayDataPrimitive,
+            ) -> Option<&PrimitiveArrayData<Self>> {
+                match data {
+                    ArrayDataPrimitive::$v(v) => Some(v),
+                    _ => None,
+                }
+            }
+
+            fn downcast(data: ArrayDataPrimitive) -> Option<PrimitiveArrayData<Self>> {
+                match data {
+                    ArrayDataPrimitive::$v(v) => Some(v),
+                    _ => None,
+                }
+            }
+
+            fn upcast(v: PrimitiveArrayData<Self>) -> ArrayDataPrimitive {
+                ArrayDataPrimitive::$v(v)
+            }
+        }
+    };
+}
+
+primitive!(i8, Int8);
+primitive!(i16, Int16);
+primitive!(i32, Int32);
+primitive!(i64, Int64);
+primitive!(i128, Int128);
+primitive!(i256, Int256);
+primitive!(u8, UInt8);
+primitive!(u16, UInt16);
+primitive!(u32, UInt32);
+primitive!(u64, UInt64);
+primitive!(f16, Float16);
+primitive!(f32, Float32);
+primitive!(f64, Float64);
+
+/// An enumeration of the types of [`PrimitiveArrayData`]
+pub enum ArrayDataPrimitive {
+    Int8(PrimitiveArrayData<i8>),
+    Int16(PrimitiveArrayData<i16>),
+    Int32(PrimitiveArrayData<i32>),
+    Int64(PrimitiveArrayData<i64>),
+    Int128(PrimitiveArrayData<i128>),
+    Int256(PrimitiveArrayData<i256>),
+    UInt8(PrimitiveArrayData<u8>),
+    UInt16(PrimitiveArrayData<u16>),
+    UInt32(PrimitiveArrayData<u32>),
+    UInt64(PrimitiveArrayData<u64>),
+    Float16(PrimitiveArrayData<f16>),
+    Float32(PrimitiveArrayData<f32>),
+    Float64(PrimitiveArrayData<f64>),
+}
+
+impl ArrayDataPrimitive {
+    /// Downcast this [`ArrayDataPrimitive`] to the corresponding [`PrimitiveArrayData`]
+    pub fn downcast_ref<P: Primitive>(&self) -> Option<&PrimitiveArrayData<P>> {
+        P::downcast_ref(self)
+    }
+
+    /// Downcast this [`ArrayDataPrimitive`] to the corresponding [`PrimitiveArrayData`]
+    pub fn downcast<P: Primitive>(self) -> Option<PrimitiveArrayData<P>> {
+        P::downcast(self)
+    }
+}
+
+/// ArrayData for arrays of [`Primitive`]
+#[derive(Debug, Clone)]
+pub struct PrimitiveArrayData<T: Primitive> {
+    data_type: DataType,
+    nulls: Option<NullBuffer>,
+    values: ScalarBuffer<T>,
+}

Review Comment:
   Yes I anticipate arrays will move to using these abstractions directly. I suspect they will need to keep ArrayData at least for a time as Array::data returns a reference, but eventually I anticipate ArrayData only being used for generic dyn kernels



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118367094


##########
arrow-data/src/data/bytes.rs:
##########
@@ -0,0 +1,288 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{BytesType, OffsetType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_schema::DataType;
+use std::marker::PhantomData;
+
+mod private {
+    use super::*;
+
+    pub trait BytesSealed {
+        /// Create from bytes without performing any validation
+        ///
+        /// # Safety
+        ///
+        /// If `str`, `b` must be a valid UTF-8 sequence
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Cast [`ArrayDataBytesOffset`] to [`ArrayDataBytes`]
+        fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes
+        where
+            Self: Bytes;
+    }
+
+    pub trait BytesOffsetSealed {
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast_ref<B: Bytes + ?Sized>(
+            data: &ArrayDataBytesOffset<B>,
+        ) -> Option<&BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast<B: Bytes + ?Sized>(
+            data: ArrayDataBytesOffset<B>,
+        ) -> Option<BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Cast [`BytesArrayData`] to [`ArrayDataBytesOffset`]
+        fn upcast<B: Bytes + ?Sized>(
+            v: BytesArrayData<Self, B>,
+        ) -> ArrayDataBytesOffset<B>
+        where
+            Self: BytesOffset;
+    }
+}
+
+/// Types backed by a variable length slice of bytes
+pub trait Bytes: private::BytesSealed {
+    const TYPE: BytesType;
+}
+
+impl Bytes for [u8] {
+    const TYPE: BytesType = BytesType::Binary;
+}
+
+impl private::BytesSealed for [u8] {
+    unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+        b
+    }
+
+    fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(v) => Some(v),
+            ArrayDataBytes::Utf8(_) => None,
+        }
+    }
+
+    fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(v) => Some(v),
+            ArrayDataBytes::Utf8(_) => None,
+        }
+    }
+
+    fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes {
+        ArrayDataBytes::Binary(v)
+    }
+}
+
+impl Bytes for str {
+    const TYPE: BytesType = BytesType::Utf8;
+}
+
+impl private::BytesSealed for str {
+    unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+        std::str::from_utf8_unchecked(b)
+    }
+
+    fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(_) => None,
+            ArrayDataBytes::Utf8(v) => Some(v),
+        }
+    }
+
+    fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(_) => None,
+            ArrayDataBytes::Utf8(v) => Some(v),
+        }
+    }
+
+    fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes {
+        ArrayDataBytes::Utf8(v)
+    }
+}
+
+/// Types of offset used by variable length byte arrays
+pub trait BytesOffset: private::BytesOffsetSealed + ArrowNativeType {
+    const TYPE: OffsetType;
+}
+
+impl BytesOffset for i32 {
+    const TYPE: OffsetType = OffsetType::Small;
+}
+
+impl private::BytesOffsetSealed for i32 {
+    fn downcast_ref<B: Bytes + ?Sized>(
+        data: &ArrayDataBytesOffset<B>,
+    ) -> Option<&BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(v) => Some(v),
+            ArrayDataBytesOffset::Large(_) => None,
+        }
+    }
+
+    fn downcast<B: Bytes + ?Sized>(
+        data: ArrayDataBytesOffset<B>,
+    ) -> Option<BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(v) => Some(v),
+            ArrayDataBytesOffset::Large(_) => None,
+        }
+    }
+
+    fn upcast<B: Bytes + ?Sized>(v: BytesArrayData<Self, B>) -> ArrayDataBytesOffset<B> {
+        ArrayDataBytesOffset::Small(v)
+    }
+}
+
+impl BytesOffset for i64 {
+    const TYPE: OffsetType = OffsetType::Large;
+}
+
+impl private::BytesOffsetSealed for i64 {
+    fn downcast_ref<B: Bytes + ?Sized>(
+        data: &ArrayDataBytesOffset<B>,
+    ) -> Option<&BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(_) => None,
+            ArrayDataBytesOffset::Large(v) => Some(v),
+        }
+    }
+
+    fn downcast<B: Bytes + ?Sized>(
+        data: ArrayDataBytesOffset<B>,
+    ) -> Option<BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(_) => None,
+            ArrayDataBytesOffset::Large(v) => Some(v),
+        }
+    }
+
+    fn upcast<B: Bytes + ?Sized>(v: BytesArrayData<Self, B>) -> ArrayDataBytesOffset<B> {
+        ArrayDataBytesOffset::Large(v)
+    }
+}
+
+/// An enumeration of the types of [`ArrayDataBytesOffset`]
+pub enum ArrayDataBytes {
+    Binary(ArrayDataBytesOffset<[u8]>),
+    Utf8(ArrayDataBytesOffset<str>),
+}
+
+impl ArrayDataBytes {
+    /// Downcast this [`ArrayDataBytes`] to the corresponding [`BytesArrayData`]
+    pub fn downcast_ref<O: BytesOffset, B: Bytes + ?Sized>(
+        &self,
+    ) -> Option<&BytesArrayData<O, B>> {
+        O::downcast_ref(B::downcast_ref(self)?)
+    }
+
+    /// Downcast this [`ArrayDataBytes`] to the corresponding [`BytesArrayData`]
+    pub fn downcast<O: BytesOffset, B: Bytes + ?Sized>(
+        self,
+    ) -> Option<BytesArrayData<O, B>> {
+        O::downcast(B::downcast(self)?)
+    }
+}
+
+/// An enumeration of the types of [`BytesArrayData`]
+pub enum ArrayDataBytesOffset<B: Bytes + ?Sized> {
+    Small(BytesArrayData<i32, B>),
+    Large(BytesArrayData<i64, B>),
+}

Review Comment:
   Is this middle enum `ArrayDataBytesOffset` useful? Cannot we just have four `BytesArrayData` in `ArrayDataBytes`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1114732276


##########
arrow-data/src/data/types.rs:
##########
@@ -0,0 +1,153 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_schema::{DataType, IntervalUnit};
+
+/// An enumeration of the primitive types implementing [`ArrowNativeType`](arrow_buffer::ArrowNativeType)
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PrimitiveType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    Int128,
+    Int256,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+    Float16,
+    Float32,
+    Float64,
+}
+
+/// An enumeration of the types of offsets for variable length encodings
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum OffsetType {
+    Small,
+    Large,
+}
+
+/// An enumeration of the types of variable length byte arrays
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum BytesType {
+    Binary,
+    Utf8,
+}
+
+/// An enumeration of the types of dictionary key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum DictionaryKeyType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+}
+
+/// An enumeration of the types of run key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum RunEndType {
+    Int16,
+    Int32,
+    Int64,
+}
+
+/// Describes the physical representation of a given [`DataType`]
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PhysicalType {

Review Comment:
   This mirrors what the eventual `ArrayData` enumeration will look like, with each of these variants corresponding to an `ArrayData` variant



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1114732276


##########
arrow-data/src/data/types.rs:
##########
@@ -0,0 +1,153 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow_schema::{DataType, IntervalUnit};
+
+/// An enumeration of the primitive types implementing [`ArrowNativeType`](arrow_buffer::ArrowNativeType)
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PrimitiveType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    Int128,
+    Int256,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+    Float16,
+    Float32,
+    Float64,
+}
+
+/// An enumeration of the types of offsets for variable length encodings
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum OffsetType {
+    Small,
+    Large,
+}
+
+/// An enumeration of the types of variable length byte arrays
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum BytesType {
+    Binary,
+    Utf8,
+}
+
+/// An enumeration of the types of dictionary key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum DictionaryKeyType {
+    Int8,
+    Int16,
+    Int32,
+    Int64,
+    UInt8,
+    UInt16,
+    UInt32,
+    UInt64,
+}
+
+/// An enumeration of the types of run key
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum RunEndType {
+    Int16,
+    Int32,
+    Int64,
+}
+
+/// Describes the physical representation of a given [`DataType`]
+#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)]
+pub enum PhysicalType {

Review Comment:
   This mirrors what the eventual `ArrayData` enumeration will look like



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1114729693


##########
arrow-buffer/src/buffer/boolean.rs:
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::{bit_util, Buffer};
+
+/// A slice-able [`Buffer`] containing bit-packed booleans
+#[derive(Debug, Clone)]
+pub struct BooleanBuffer {

Review Comment:
   This closes #1802 as discussed in #3700 this needs to be a new type to avoid breaking existing workloads



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118368762


##########
arrow-data/src/data/bytes.rs:
##########
@@ -0,0 +1,288 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{BytesType, OffsetType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_schema::DataType;
+use std::marker::PhantomData;
+
+mod private {
+    use super::*;
+
+    pub trait BytesSealed {
+        /// Create from bytes without performing any validation
+        ///
+        /// # Safety
+        ///
+        /// If `str`, `b` must be a valid UTF-8 sequence
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Downcast [`ArrayDataBytes`] to `[ArrayDataBytesOffset`]
+        fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>>
+        where
+            Self: Bytes;
+
+        /// Cast [`ArrayDataBytesOffset`] to [`ArrayDataBytes`]
+        fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes
+        where
+            Self: Bytes;
+    }
+
+    pub trait BytesOffsetSealed {
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast_ref<B: Bytes + ?Sized>(
+            data: &ArrayDataBytesOffset<B>,
+        ) -> Option<&BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Downcast [`ArrayDataBytesOffset`] to `[BytesArrayData`]
+        fn downcast<B: Bytes + ?Sized>(
+            data: ArrayDataBytesOffset<B>,
+        ) -> Option<BytesArrayData<Self, B>>
+        where
+            Self: BytesOffset;
+
+        /// Cast [`BytesArrayData`] to [`ArrayDataBytesOffset`]
+        fn upcast<B: Bytes + ?Sized>(
+            v: BytesArrayData<Self, B>,
+        ) -> ArrayDataBytesOffset<B>
+        where
+            Self: BytesOffset;
+    }
+}
+
+/// Types backed by a variable length slice of bytes
+pub trait Bytes: private::BytesSealed {
+    const TYPE: BytesType;
+}
+
+impl Bytes for [u8] {
+    const TYPE: BytesType = BytesType::Binary;
+}
+
+impl private::BytesSealed for [u8] {
+    unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+        b
+    }
+
+    fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(v) => Some(v),
+            ArrayDataBytes::Utf8(_) => None,
+        }
+    }
+
+    fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(v) => Some(v),
+            ArrayDataBytes::Utf8(_) => None,
+        }
+    }
+
+    fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes {
+        ArrayDataBytes::Binary(v)
+    }
+}
+
+impl Bytes for str {
+    const TYPE: BytesType = BytesType::Utf8;
+}
+
+impl private::BytesSealed for str {
+    unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+        std::str::from_utf8_unchecked(b)
+    }
+
+    fn downcast_ref(data: &ArrayDataBytes) -> Option<&ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(_) => None,
+            ArrayDataBytes::Utf8(v) => Some(v),
+        }
+    }
+
+    fn downcast(data: ArrayDataBytes) -> Option<ArrayDataBytesOffset<Self>> {
+        match data {
+            ArrayDataBytes::Binary(_) => None,
+            ArrayDataBytes::Utf8(v) => Some(v),
+        }
+    }
+
+    fn upcast(v: ArrayDataBytesOffset<Self>) -> ArrayDataBytes {
+        ArrayDataBytes::Utf8(v)
+    }
+}
+
+/// Types of offset used by variable length byte arrays
+pub trait BytesOffset: private::BytesOffsetSealed + ArrowNativeType {
+    const TYPE: OffsetType;
+}
+
+impl BytesOffset for i32 {
+    const TYPE: OffsetType = OffsetType::Small;
+}
+
+impl private::BytesOffsetSealed for i32 {
+    fn downcast_ref<B: Bytes + ?Sized>(
+        data: &ArrayDataBytesOffset<B>,
+    ) -> Option<&BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(v) => Some(v),
+            ArrayDataBytesOffset::Large(_) => None,
+        }
+    }
+
+    fn downcast<B: Bytes + ?Sized>(
+        data: ArrayDataBytesOffset<B>,
+    ) -> Option<BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(v) => Some(v),
+            ArrayDataBytesOffset::Large(_) => None,
+        }
+    }
+
+    fn upcast<B: Bytes + ?Sized>(v: BytesArrayData<Self, B>) -> ArrayDataBytesOffset<B> {
+        ArrayDataBytesOffset::Small(v)
+    }
+}
+
+impl BytesOffset for i64 {
+    const TYPE: OffsetType = OffsetType::Large;
+}
+
+impl private::BytesOffsetSealed for i64 {
+    fn downcast_ref<B: Bytes + ?Sized>(
+        data: &ArrayDataBytesOffset<B>,
+    ) -> Option<&BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(_) => None,
+            ArrayDataBytesOffset::Large(v) => Some(v),
+        }
+    }
+
+    fn downcast<B: Bytes + ?Sized>(
+        data: ArrayDataBytesOffset<B>,
+    ) -> Option<BytesArrayData<Self, B>> {
+        match data {
+            ArrayDataBytesOffset::Small(_) => None,
+            ArrayDataBytesOffset::Large(v) => Some(v),
+        }
+    }
+
+    fn upcast<B: Bytes + ?Sized>(v: BytesArrayData<Self, B>) -> ArrayDataBytesOffset<B> {
+        ArrayDataBytesOffset::Large(v)
+    }
+}
+
+/// An enumeration of the types of [`ArrayDataBytesOffset`]
+pub enum ArrayDataBytes {
+    Binary(ArrayDataBytesOffset<[u8]>),
+    Utf8(ArrayDataBytesOffset<str>),
+}
+
+impl ArrayDataBytes {
+    /// Downcast this [`ArrayDataBytes`] to the corresponding [`BytesArrayData`]
+    pub fn downcast_ref<O: BytesOffset, B: Bytes + ?Sized>(
+        &self,
+    ) -> Option<&BytesArrayData<O, B>> {
+        O::downcast_ref(B::downcast_ref(self)?)
+    }
+
+    /// Downcast this [`ArrayDataBytes`] to the corresponding [`BytesArrayData`]
+    pub fn downcast<O: BytesOffset, B: Bytes + ?Sized>(
+        self,
+    ) -> Option<BytesArrayData<O, B>> {
+        O::downcast(B::downcast(self)?)
+    }
+}
+
+/// An enumeration of the types of [`BytesArrayData`]
+pub enum ArrayDataBytesOffset<B: Bytes + ?Sized> {
+    Small(BytesArrayData<i32, B>),
+    Large(BytesArrayData<i64, B>),
+}
+
+impl<O: BytesOffset, B: Bytes + ?Sized> From<BytesArrayData<O, B>> for ArrayDataBytes {
+    fn from(value: BytesArrayData<O, B>) -> Self {
+        B::upcast(O::upcast(value))
+    }
+}
+
+/// ArrayData for arrays of [`Bytes`]
+pub struct BytesArrayData<O: BytesOffset, B: Bytes + ?Sized> {
+    data_type: DataType,
+    nulls: Option<NullBuffer>,
+    offsets: ScalarBuffer<O>,
+    values: Buffer,
+    phantom: PhantomData<B>,
+}
+
+impl<O: BytesOffset, B: Bytes> BytesArrayData<O, B> {
+    /// Creates a new [`BytesArrayData`]
+    ///
+    /// # Safety
+    ///
+    /// - Each consecutive window of `offsets` must identify a valid slice of `values`
+    /// - `nulls.len() == offsets.len() + 1`
+    /// - `data_type` must be valid for this layout
+    pub unsafe fn new_unchecked(

Review Comment:
   Do we need a way to validate it like a `validate` function?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #3749: ArrayData Enumeration for Primitive, Binary and UTF8

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #3749:
URL: https://github.com/apache/arrow-rs/pull/3749#discussion_r1118382844


##########
arrow-data/src/data/primitive.rs:
##########
@@ -0,0 +1,184 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::data::types::{PhysicalType, PrimitiveType};
+use arrow_buffer::buffer::{NullBuffer, ScalarBuffer};
+use arrow_buffer::{i256, ArrowNativeType};
+use arrow_schema::DataType;
+use half::f16;
+
+mod private {
+    use super::*;
+
+    pub trait PrimitiveSealed {
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast_ref(data: &ArrayDataPrimitive) -> Option<&PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Downcast [`ArrayDataPrimitive`] to `[PrimitiveArrayData`]
+        fn downcast(data: ArrayDataPrimitive) -> Option<PrimitiveArrayData<Self>>
+        where
+            Self: Primitive;
+
+        /// Cast [`ArrayDataPrimitive`] to [`ArrayDataPrimitive`]
+        fn upcast(v: PrimitiveArrayData<Self>) -> ArrayDataPrimitive
+        where
+            Self: Primitive;
+    }
+}
+
+pub trait Primitive: private::PrimitiveSealed + ArrowNativeType {
+    const VARIANT: PrimitiveType;
+}
+
+macro_rules! primitive {
+    ($t:ty,$v:ident) => {
+        impl Primitive for $t {
+            const VARIANT: PrimitiveType = PrimitiveType::$v;
+        }
+        impl private::PrimitiveSealed for $t {
+            fn downcast_ref(
+                data: &ArrayDataPrimitive,
+            ) -> Option<&PrimitiveArrayData<Self>> {
+                match data {
+                    ArrayDataPrimitive::$v(v) => Some(v),
+                    _ => None,
+                }
+            }
+
+            fn downcast(data: ArrayDataPrimitive) -> Option<PrimitiveArrayData<Self>> {
+                match data {
+                    ArrayDataPrimitive::$v(v) => Some(v),
+                    _ => None,
+                }
+            }
+
+            fn upcast(v: PrimitiveArrayData<Self>) -> ArrayDataPrimitive {
+                ArrayDataPrimitive::$v(v)
+            }
+        }
+    };
+}
+
+primitive!(i8, Int8);
+primitive!(i16, Int16);
+primitive!(i32, Int32);
+primitive!(i64, Int64);
+primitive!(i128, Int128);
+primitive!(i256, Int256);
+primitive!(u8, UInt8);
+primitive!(u16, UInt16);
+primitive!(u32, UInt32);
+primitive!(u64, UInt64);
+primitive!(f16, Float16);
+primitive!(f32, Float32);
+primitive!(f64, Float64);
+
+/// An enumeration of the types of [`PrimitiveArrayData`]
+pub enum ArrayDataPrimitive {
+    Int8(PrimitiveArrayData<i8>),
+    Int16(PrimitiveArrayData<i16>),
+    Int32(PrimitiveArrayData<i32>),
+    Int64(PrimitiveArrayData<i64>),
+    Int128(PrimitiveArrayData<i128>),
+    Int256(PrimitiveArrayData<i256>),
+    UInt8(PrimitiveArrayData<u8>),
+    UInt16(PrimitiveArrayData<u16>),
+    UInt32(PrimitiveArrayData<u32>),
+    UInt64(PrimitiveArrayData<u64>),
+    Float16(PrimitiveArrayData<f16>),
+    Float32(PrimitiveArrayData<f32>),
+    Float64(PrimitiveArrayData<f64>),
+}
+
+impl ArrayDataPrimitive {
+    /// Downcast this [`ArrayDataPrimitive`] to the corresponding [`PrimitiveArrayData`]
+    pub fn downcast_ref<P: Primitive>(&self) -> Option<&PrimitiveArrayData<P>> {
+        P::downcast_ref(self)
+    }
+
+    /// Downcast this [`ArrayDataPrimitive`] to the corresponding [`PrimitiveArrayData`]
+    pub fn downcast<P: Primitive>(self) -> Option<PrimitiveArrayData<P>> {
+        P::downcast(self)
+    }
+}
+
+/// ArrayData for arrays of [`Primitive`]
+#[derive(Debug, Clone)]
+pub struct PrimitiveArrayData<T: Primitive> {
+    data_type: DataType,
+    nulls: Option<NullBuffer>,
+    values: ScalarBuffer<T>,
+}
+
+impl<P: Primitive> From<PrimitiveArrayData<P>> for ArrayDataPrimitive {
+    fn from(value: PrimitiveArrayData<P>) -> Self {
+        P::upcast(value)
+    }
+}
+
+impl<T: Primitive> PrimitiveArrayData<T> {
+    /// Create a new [`PrimitiveArrayData`]
+    ///
+    /// # Panics
+    ///
+    /// Panics if
+    /// - `nulls` and `values` are different lengths
+    /// - `data_type` is not compatible with `T`
+    pub fn new(
+        data_type: DataType,
+        values: ScalarBuffer<T>,
+        nulls: Option<NullBuffer>,
+    ) -> Self {
+        let physical = PhysicalType::from(&data_type);
+        assert!(
+            matches!(physical, PhysicalType::Primitive(p) if p == T::VARIANT),
+            "Illegal physical type for PrimitiveArrayData, expected {:?} got {:?}",
+            T::VARIANT,
+            physical
+        );

Review Comment:
   ```suggestion
           assert!(
               matches!(physical, PhysicalType::Primitive(p) if p == T::VARIANT),
               "Illegal physical type for PrimitiveArrayData of datatype {:?}, expected {:?} got {:?}",
               data_type,
               T::VARIANT,
               physical
           );
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org