You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/17 02:54:13 UTC

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1589: support from_slice for binary, string, and boolean array types

Jimexist opened a new pull request #1589:
URL: https://github.com/apache/arrow-datafusion/pull/1589


   # Which issue does this PR close?
   
   Closes #.
   
    # Rationale for this change
   
   - a follow up on #1588 
   - related to #1556 
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #1589: support from_slice for binary, string, and boolean array types

Posted by GitBox <gi...@apache.org>.
Jimexist commented on a change in pull request #1589:
URL: https://github.com/apache/arrow-datafusion/pull/1589#discussion_r785950014



##########
File path: datafusion/src/from_slice.rs
##########
@@ -19,27 +19,119 @@
 //!
 //! This file essentially exists to ease the transition onto arrow2
 
-use arrow::array::{ArrayData, PrimitiveArray};
-use arrow::buffer::Buffer;
-use arrow::datatypes::ArrowPrimitiveType;
+use arrow::array::{
+    ArrayData, BinaryOffsetSizeTrait, BooleanArray, GenericBinaryArray,
+    GenericStringArray, PrimitiveArray, StringOffsetSizeTrait,
+};
+use arrow::buffer::{Buffer, MutableBuffer};
+use arrow::datatypes::{ArrowPrimitiveType, DataType};
+use arrow::util::bit_util;
 
 /// A trait to define from_slice functions for arrow primitive array types
-pub trait FromSlice<T>
+pub trait FromSlice<S, E>
 where
-    T: ArrowPrimitiveType,
+    S: AsRef<[E]>,
 {
     /// convert a slice of native types into a primitive array (without nulls)
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T>;
+    fn from_slice(slice: S) -> Self;
 }
 
-/// default implementation for primitive types
-// #[cfg(test)]
-impl<T: ArrowPrimitiveType> FromSlice<T> for PrimitiveArray<T> {
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T> {
+/// default implementation for primitive array types, adapted from `From<Vec<_>>`
+impl<S, T> FromSlice<S, T::Native> for PrimitiveArray<T>
+where
+    T: ArrowPrimitiveType,
+    S: AsRef<[T::Native]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();
         let array_data = ArrayData::builder(T::DATA_TYPE)
             .len(slice.len())
             .add_buffer(Buffer::from_slice_ref(&slice));
         let array_data = unsafe { array_data.build_unchecked() };
-        PrimitiveArray::<T>::from(array_data)
+        Self::from(array_data)
+    }
+}
+
+/// default implementation for binary array types, adapted from `From<Vec<_>>`
+impl<S, I, OffsetSize> FromSlice<S, I> for GenericBinaryArray<OffsetSize>
+where
+    OffsetSize: BinaryOffsetSizeTrait,
+    S: AsRef<[I]>,
+    I: AsRef<[u8]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();
+        let mut offsets = Vec::with_capacity(slice.len() + 1);
+        let mut values = Vec::new();
+        let mut length_so_far: OffsetSize = OffsetSize::zero();
+        offsets.push(length_so_far);
+        for s in slice {
+            let s = s.as_ref();
+            length_so_far += OffsetSize::from_usize(s.len()).unwrap();
+            offsets.push(length_so_far);
+            values.extend_from_slice(s);
+        }
+        let array_data = ArrayData::builder(OffsetSize::DATA_TYPE)
+            .len(slice.len())
+            .add_buffer(Buffer::from_slice_ref(&offsets))
+            .add_buffer(Buffer::from_slice_ref(&values));
+        let array_data = unsafe { array_data.build_unchecked() };
+        Self::from(array_data)
+    }
+}
+
+/// default implementation for utf8 array types, adapted from `From<Vec<_>>`
+impl<S, I, OffsetSize> FromSlice<S, I> for GenericStringArray<OffsetSize>
+where
+    OffsetSize: StringOffsetSizeTrait,
+    S: AsRef<[I]>,
+    I: AsRef<str>,
+{
+    fn from_slice(slice: S) -> Self {

Review comment:
       @alamb done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #1589: support from_slice for binary, string, and boolean array types

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #1589:
URL: https://github.com/apache/arrow-datafusion/pull/1589#issuecomment-1014092242


   @houqp after this is merged i guess you can rebase arrow2 branch onto master and reduce some amount of diffs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #1589: support from_slice for binary, string, and boolean array types

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1589:
URL: https://github.com/apache/arrow-datafusion/pull/1589


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #1589: support from_slice for binary, string, and boolean array types

Posted by GitBox <gi...@apache.org>.
Jimexist commented on a change in pull request #1589:
URL: https://github.com/apache/arrow-datafusion/pull/1589#discussion_r785949640



##########
File path: datafusion/src/from_slice.rs
##########
@@ -19,27 +19,119 @@
 //!
 //! This file essentially exists to ease the transition onto arrow2
 
-use arrow::array::{ArrayData, PrimitiveArray};
-use arrow::buffer::Buffer;
-use arrow::datatypes::ArrowPrimitiveType;
+use arrow::array::{
+    ArrayData, BinaryOffsetSizeTrait, BooleanArray, GenericBinaryArray,
+    GenericStringArray, PrimitiveArray, StringOffsetSizeTrait,
+};
+use arrow::buffer::{Buffer, MutableBuffer};
+use arrow::datatypes::{ArrowPrimitiveType, DataType};
+use arrow::util::bit_util;
 
 /// A trait to define from_slice functions for arrow primitive array types
-pub trait FromSlice<T>
+pub trait FromSlice<S, E>
 where
-    T: ArrowPrimitiveType,
+    S: AsRef<[E]>,
 {
     /// convert a slice of native types into a primitive array (without nulls)
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T>;
+    fn from_slice(slice: S) -> Self;
 }
 
-/// default implementation for primitive types
-// #[cfg(test)]
-impl<T: ArrowPrimitiveType> FromSlice<T> for PrimitiveArray<T> {
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T> {
+/// default implementation for primitive array types, adapted from `From<Vec<_>>`
+impl<S, T> FromSlice<S, T::Native> for PrimitiveArray<T>
+where
+    T: ArrowPrimitiveType,
+    S: AsRef<[T::Native]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();
         let array_data = ArrayData::builder(T::DATA_TYPE)
             .len(slice.len())
             .add_buffer(Buffer::from_slice_ref(&slice));
         let array_data = unsafe { array_data.build_unchecked() };
-        PrimitiveArray::<T>::from(array_data)
+        Self::from(array_data)
+    }
+}
+
+/// default implementation for binary array types, adapted from `From<Vec<_>>`
+impl<S, I, OffsetSize> FromSlice<S, I> for GenericBinaryArray<OffsetSize>
+where
+    OffsetSize: BinaryOffsetSizeTrait,
+    S: AsRef<[I]>,
+    I: AsRef<[u8]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();

Review comment:
       @alamb unfortunately this can't be done for StringArray
   
   https://github.com/apache/arrow-datafusion/pull/1589/files/8ad8c0a9ed239d403d9788e63d4df29641ab06f3..e5d5558f51da92f541ca2faf884f4e8abefb00f1#diff-62cf85e721865a8a043ed6f7de87fbbf67a7447a3ba017e019367fa249c490f8R59

##########
File path: datafusion/src/from_slice.rs
##########
@@ -19,27 +19,119 @@
 //!
 //! This file essentially exists to ease the transition onto arrow2
 
-use arrow::array::{ArrayData, PrimitiveArray};
-use arrow::buffer::Buffer;
-use arrow::datatypes::ArrowPrimitiveType;
+use arrow::array::{
+    ArrayData, BinaryOffsetSizeTrait, BooleanArray, GenericBinaryArray,
+    GenericStringArray, PrimitiveArray, StringOffsetSizeTrait,
+};
+use arrow::buffer::{Buffer, MutableBuffer};
+use arrow::datatypes::{ArrowPrimitiveType, DataType};
+use arrow::util::bit_util;
 
 /// A trait to define from_slice functions for arrow primitive array types
-pub trait FromSlice<T>
+pub trait FromSlice<S, E>
 where
-    T: ArrowPrimitiveType,
+    S: AsRef<[E]>,
 {
     /// convert a slice of native types into a primitive array (without nulls)
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T>;
+    fn from_slice(slice: S) -> Self;
 }
 
-/// default implementation for primitive types
-// #[cfg(test)]
-impl<T: ArrowPrimitiveType> FromSlice<T> for PrimitiveArray<T> {
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T> {
+/// default implementation for primitive array types, adapted from `From<Vec<_>>`
+impl<S, T> FromSlice<S, T::Native> for PrimitiveArray<T>
+where
+    T: ArrowPrimitiveType,
+    S: AsRef<[T::Native]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();
         let array_data = ArrayData::builder(T::DATA_TYPE)
             .len(slice.len())
             .add_buffer(Buffer::from_slice_ref(&slice));
         let array_data = unsafe { array_data.build_unchecked() };
-        PrimitiveArray::<T>::from(array_data)
+        Self::from(array_data)
+    }
+}
+
+/// default implementation for binary array types, adapted from `From<Vec<_>>`
+impl<S, I, OffsetSize> FromSlice<S, I> for GenericBinaryArray<OffsetSize>
+where
+    OffsetSize: BinaryOffsetSizeTrait,
+    S: AsRef<[I]>,
+    I: AsRef<[u8]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();

Review comment:
       @alamb unfortunately this can't be done for BinaryArray
   
   https://github.com/apache/arrow-datafusion/pull/1589/files/8ad8c0a9ed239d403d9788e63d4df29641ab06f3..e5d5558f51da92f541ca2faf884f4e8abefb00f1#diff-62cf85e721865a8a043ed6f7de87fbbf67a7447a3ba017e019367fa249c490f8R59




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #1589: support from_slice for binary, string, and boolean array types

Posted by GitBox <gi...@apache.org>.
Jimexist commented on a change in pull request #1589:
URL: https://github.com/apache/arrow-datafusion/pull/1589#discussion_r785977125



##########
File path: datafusion/src/from_slice.rs
##########
@@ -19,27 +19,119 @@
 //!
 //! This file essentially exists to ease the transition onto arrow2
 
-use arrow::array::{ArrayData, PrimitiveArray};
-use arrow::buffer::Buffer;
-use arrow::datatypes::ArrowPrimitiveType;
+use arrow::array::{
+    ArrayData, BinaryOffsetSizeTrait, BooleanArray, GenericBinaryArray,
+    GenericStringArray, PrimitiveArray, StringOffsetSizeTrait,
+};
+use arrow::buffer::{Buffer, MutableBuffer};
+use arrow::datatypes::{ArrowPrimitiveType, DataType};
+use arrow::util::bit_util;
 
 /// A trait to define from_slice functions for arrow primitive array types
-pub trait FromSlice<T>
+pub trait FromSlice<S, E>
 where
-    T: ArrowPrimitiveType,
+    S: AsRef<[E]>,
 {
     /// convert a slice of native types into a primitive array (without nulls)
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T>;
+    fn from_slice(slice: S) -> Self;
 }
 
-/// default implementation for primitive types
-// #[cfg(test)]
-impl<T: ArrowPrimitiveType> FromSlice<T> for PrimitiveArray<T> {
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T> {
+/// default implementation for primitive array types, adapted from `From<Vec<_>>`
+impl<S, T> FromSlice<S, T::Native> for PrimitiveArray<T>
+where
+    T: ArrowPrimitiveType,
+    S: AsRef<[T::Native]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();
         let array_data = ArrayData::builder(T::DATA_TYPE)
             .len(slice.len())
             .add_buffer(Buffer::from_slice_ref(&slice));
         let array_data = unsafe { array_data.build_unchecked() };
-        PrimitiveArray::<T>::from(array_data)
+        Self::from(array_data)
+    }
+}
+
+/// default implementation for binary array types, adapted from `From<Vec<_>>`
+impl<S, I, OffsetSize> FromSlice<S, I> for GenericBinaryArray<OffsetSize>
+where
+    OffsetSize: BinaryOffsetSizeTrait,
+    S: AsRef<[I]>,
+    I: AsRef<[u8]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();

Review comment:
       i guess this change is necessary:
   - https://github.com/apache/arrow-rs/pull/1188




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1589: support from_slice for binary, string, and boolean array types

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1589:
URL: https://github.com/apache/arrow-datafusion/pull/1589#discussion_r785931197



##########
File path: datafusion/src/from_slice.rs
##########
@@ -19,27 +19,119 @@
 //!
 //! This file essentially exists to ease the transition onto arrow2
 
-use arrow::array::{ArrayData, PrimitiveArray};
-use arrow::buffer::Buffer;
-use arrow::datatypes::ArrowPrimitiveType;
+use arrow::array::{
+    ArrayData, BinaryOffsetSizeTrait, BooleanArray, GenericBinaryArray,
+    GenericStringArray, PrimitiveArray, StringOffsetSizeTrait,
+};
+use arrow::buffer::{Buffer, MutableBuffer};
+use arrow::datatypes::{ArrowPrimitiveType, DataType};
+use arrow::util::bit_util;
 
 /// A trait to define from_slice functions for arrow primitive array types
-pub trait FromSlice<T>
+pub trait FromSlice<S, E>
 where
-    T: ArrowPrimitiveType,
+    S: AsRef<[E]>,
 {
     /// convert a slice of native types into a primitive array (without nulls)
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T>;
+    fn from_slice(slice: S) -> Self;
 }
 
-/// default implementation for primitive types
-// #[cfg(test)]
-impl<T: ArrowPrimitiveType> FromSlice<T> for PrimitiveArray<T> {
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T> {
+/// default implementation for primitive array types, adapted from `From<Vec<_>>`
+impl<S, T> FromSlice<S, T::Native> for PrimitiveArray<T>
+where
+    T: ArrowPrimitiveType,
+    S: AsRef<[T::Native]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();
         let array_data = ArrayData::builder(T::DATA_TYPE)
             .len(slice.len())
             .add_buffer(Buffer::from_slice_ref(&slice));
         let array_data = unsafe { array_data.build_unchecked() };
-        PrimitiveArray::<T>::from(array_data)
+        Self::from(array_data)
+    }
+}
+
+/// default implementation for binary array types, adapted from `From<Vec<_>>`
+impl<S, I, OffsetSize> FromSlice<S, I> for GenericBinaryArray<OffsetSize>
+where
+    OffsetSize: BinaryOffsetSizeTrait,
+    S: AsRef<[I]>,
+    I: AsRef<[u8]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();
+        let mut offsets = Vec::with_capacity(slice.len() + 1);
+        let mut values = Vec::new();
+        let mut length_so_far: OffsetSize = OffsetSize::zero();
+        offsets.push(length_so_far);
+        for s in slice {
+            let s = s.as_ref();
+            length_so_far += OffsetSize::from_usize(s.len()).unwrap();
+            offsets.push(length_so_far);
+            values.extend_from_slice(s);
+        }
+        let array_data = ArrayData::builder(OffsetSize::DATA_TYPE)
+            .len(slice.len())
+            .add_buffer(Buffer::from_slice_ref(&offsets))
+            .add_buffer(Buffer::from_slice_ref(&values));
+        let array_data = unsafe { array_data.build_unchecked() };
+        Self::from(array_data)
+    }
+}
+
+/// default implementation for utf8 array types, adapted from `From<Vec<_>>`
+impl<S, I, OffsetSize> FromSlice<S, I> for GenericStringArray<OffsetSize>
+where
+    OffsetSize: StringOffsetSizeTrait,
+    S: AsRef<[I]>,
+    I: AsRef<str>,
+{
+    fn from_slice(slice: S) -> Self {

Review comment:
       Likewise perhaps something like
   
   ```rust
   fn from_slice(slice: S) -> Self {
     slice.iter().collect()
   }
   ```
   
   ?

##########
File path: datafusion/src/from_slice.rs
##########
@@ -19,27 +19,119 @@
 //!
 //! This file essentially exists to ease the transition onto arrow2
 
-use arrow::array::{ArrayData, PrimitiveArray};
-use arrow::buffer::Buffer;
-use arrow::datatypes::ArrowPrimitiveType;
+use arrow::array::{
+    ArrayData, BinaryOffsetSizeTrait, BooleanArray, GenericBinaryArray,
+    GenericStringArray, PrimitiveArray, StringOffsetSizeTrait,
+};
+use arrow::buffer::{Buffer, MutableBuffer};
+use arrow::datatypes::{ArrowPrimitiveType, DataType};
+use arrow::util::bit_util;
 
 /// A trait to define from_slice functions for arrow primitive array types
-pub trait FromSlice<T>
+pub trait FromSlice<S, E>
 where
-    T: ArrowPrimitiveType,
+    S: AsRef<[E]>,
 {
     /// convert a slice of native types into a primitive array (without nulls)
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T>;
+    fn from_slice(slice: S) -> Self;
 }
 
-/// default implementation for primitive types
-// #[cfg(test)]
-impl<T: ArrowPrimitiveType> FromSlice<T> for PrimitiveArray<T> {
-    fn from_slice(slice: &[T::Native]) -> PrimitiveArray<T> {
+/// default implementation for primitive array types, adapted from `From<Vec<_>>`
+impl<S, T> FromSlice<S, T::Native> for PrimitiveArray<T>
+where
+    T: ArrowPrimitiveType,
+    S: AsRef<[T::Native]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();
         let array_data = ArrayData::builder(T::DATA_TYPE)
             .len(slice.len())
             .add_buffer(Buffer::from_slice_ref(&slice));
         let array_data = unsafe { array_data.build_unchecked() };
-        PrimitiveArray::<T>::from(array_data)
+        Self::from(array_data)
+    }
+}
+
+/// default implementation for binary array types, adapted from `From<Vec<_>>`
+impl<S, I, OffsetSize> FromSlice<S, I> for GenericBinaryArray<OffsetSize>
+where
+    OffsetSize: BinaryOffsetSizeTrait,
+    S: AsRef<[I]>,
+    I: AsRef<[u8]>,
+{
+    fn from_slice(slice: S) -> Self {
+        let slice = slice.as_ref();

Review comment:
       I wonder if this could call `from_iter_values` rather than replicate the code in DataFuson?
   
   I haven't tried but perhaps something like
   ```rust
   fn from _slice(slice: S) -> Self {
     GenericBinaryArray::from_iter_values(slice.iter())
   ```
   ?
   
   https://docs.rs/arrow/7.0.0/arrow/array/struct.GenericStringArray.html#method.from_iter_values




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org