You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/21 17:17:22 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

alamb opened a new pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381


   # Which issue does this PR close?
   
   This is part of #363 where I am trying to evaluate pruning predicates on record batches that represent the min/max values of statistics.
   
    # Rationale for this change
   The proposed interface for providing min/max values is as `ScalarValue`s and I need a way to take a bunch of `ScalarValues` and turn them into an Array.
   
   You can see how it is used in https://github.com/apache/arrow-datafusion/pull/380
   
   
   # What changes are included in this PR?
   1. Add `ScalarValue::iter_to_array`, tests for same
   
   # Are there any user-facing changes?
   `ScalarValue::iter_to_array` is now available
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846123286


   ah, sorry, I misunderstand. The idea is to generalize over the parquet min/max. Yeap, makes total sense 👍; sorry for the noise  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846542352


   I'll plan to merge this when the CI is green


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#discussion_r637525780



##########
File path: datafusion/src/scalar.rs
##########
@@ -283,6 +283,155 @@ impl ScalarValue {
         self.to_array_of_size(1)
     }
 
+    /// Converts an iterator of references [`ScalarValue`] into an [`ArrayRef`]
+    /// corresponding to those values. For example,
+    ///
+    /// Returns an error if the iterator is empty or if the
+    /// [`ScalarValue`]s are not all the same type
+    ///
+    /// Example
+    /// ```
+    /// use datafusion::scalar::ScalarValue;
+    /// use arrow::array::{ArrayRef, BooleanArray};
+    ///
+    /// let scalars = vec![
+    ///   ScalarValue::Boolean(Some(true)),
+    ///   ScalarValue::Boolean(None),
+    ///   ScalarValue::Boolean(Some(false)),
+    /// ];
+    ///
+    /// // Build an Array from the list of ScalarValues
+    /// let array = ScalarValue::iter_to_array(scalars.iter())
+    ///   .unwrap();
+    ///
+    /// let expected: ArrayRef = std::sync::Arc::new(
+    ///   BooleanArray::from(vec![
+    ///     Some(true),
+    ///     None,
+    ///     Some(false)
+    ///   ]
+    /// ));
+    ///
+    /// assert_eq!(&array, &expected);
+    /// ```
+    pub fn iter_to_array<'a>(
+        scalars: impl IntoIterator<Item = &'a ScalarValue>,
+    ) -> Result<ArrayRef> {
+        let mut scalars = scalars.into_iter().peekable();
+
+        // figure out the type based on the first element
+        let data_type = match scalars.peek() {

Review comment:
       Yeah -- this confused me a little -- it is convenient that `ScalarValue` is always typed even when it is `None` 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
Jimexist commented on a change in pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#discussion_r637493533



##########
File path: datafusion/src/scalar.rs
##########
@@ -283,6 +283,155 @@ impl ScalarValue {
         self.to_array_of_size(1)
     }
 
+    /// Converts an iterator of references [`ScalarValue`] into an [`ArrayRef`]
+    /// corresponding to those values. For example,
+    ///
+    /// Returns an error if the iterator is empty or if the
+    /// [`ScalarValue`]s are not all the same type
+    ///
+    /// Example
+    /// ```
+    /// use datafusion::scalar::ScalarValue;
+    /// use arrow::array::{ArrayRef, BooleanArray};
+    ///
+    /// let scalars = vec![
+    ///   ScalarValue::Boolean(Some(true)),
+    ///   ScalarValue::Boolean(None),
+    ///   ScalarValue::Boolean(Some(false)),
+    /// ];
+    ///
+    /// // Build an Array from the list of ScalarValues
+    /// let array = ScalarValue::iter_to_array(scalars.iter())
+    ///   .unwrap();
+    ///
+    /// let expected: ArrayRef = std::sync::Arc::new(
+    ///   BooleanArray::from(vec![
+    ///     Some(true),
+    ///     None,
+    ///     Some(false)
+    ///   ]
+    /// ));
+    ///
+    /// assert_eq!(&array, &expected);
+    /// ```
+    pub fn iter_to_array<'a>(
+        scalars: impl IntoIterator<Item = &'a ScalarValue>,
+    ) -> Result<ArrayRef> {
+        let mut scalars = scalars.into_iter().peekable();
+
+        // figure out the type based on the first element
+        let data_type = match scalars.peek() {

Review comment:
       does it make sense to have an iter of a mixture of null and present values? in that case the first value can be null




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846116775


   Shouldn't we use the Builder API to create an array of the values from the statistics instead of `stats -> Vec<Scalar> -> Array`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#discussion_r637428470



##########
File path: datafusion/src/scalar.rs
##########
@@ -916,4 +1071,139 @@ mod tests {
         assert!(prim_array.is_null(1));
         assert_eq!(prim_array.value(2), 101);
     }
+
+    /// Creates array directly and via ScalarValue and ensures they are the same
+    macro_rules! check_scalar_iter {
+        ($SCALAR_T:ident, $ARRAYTYPE:ident, $INPUT:expr) => {{
+            let scalars: Vec<_> =
+                $INPUT.iter().map(|v| ScalarValue::$SCALAR_T(*v)).collect();
+
+            let array = ScalarValue::iter_to_array(scalars.iter()).unwrap();
+
+            let expected: ArrayRef = Arc::new($ARRAYTYPE::from($INPUT));
+
+            assert_eq!(&array, &expected);
+        }};
+    }
+
+    /// Creates array directly and via ScalarValue and ensures they
+    /// are the same, for string  arrays
+    macro_rules! check_scalar_iter_string {
+        ($SCALAR_T:ident, $ARRAYTYPE:ident, $INPUT:expr) => {{
+            let scalars: Vec<_> = $INPUT
+                .iter()
+                .map(|v| ScalarValue::$SCALAR_T(v.map(|v| v.to_string())))
+                .collect();
+
+            let array = ScalarValue::iter_to_array(scalars.iter()).unwrap();
+
+            let expected: ArrayRef = Arc::new($ARRAYTYPE::from($INPUT));
+
+            assert_eq!(&array, &expected);
+        }};
+    }
+
+    /// Creates array directly and via ScalarValue and ensures they
+    /// are the same, for binary arrays
+    macro_rules! check_scalar_iter_binary {
+        ($SCALAR_T:ident, $ARRAYTYPE:ident, $INPUT:expr) => {{
+            let scalars: Vec<_> = $INPUT
+                .iter()
+                .map(|v| ScalarValue::$SCALAR_T(v.map(|v| v.to_vec())))
+                .collect();
+
+            let array = ScalarValue::iter_to_array(scalars.iter()).unwrap();
+
+            let expected: $ARRAYTYPE =
+                $INPUT.iter().map(|v| v.map(|v| v.to_vec())).collect();
+
+            let expected: ArrayRef = Arc::new(expected);
+
+            assert_eq!(&array, &expected);
+        }};
+    }
+
+    #[test]
+    fn scalar_iter_to_array_boolean() {
+        check_scalar_iter!(Boolean, BooleanArray, vec![Some(true), None, Some(false)]);
+        check_scalar_iter!(Float32, Float32Array, vec![Some(1.9), None, Some(-2.1)]);
+        check_scalar_iter!(Float64, Float64Array, vec![Some(1.9), None, Some(-2.1)]);
+
+        check_scalar_iter!(Int8, Int8Array, vec![Some(1), None, Some(3)]);
+        check_scalar_iter!(Int16, Int16Array, vec![Some(1), None, Some(3)]);
+        check_scalar_iter!(Int32, Int32Array, vec![Some(1), None, Some(3)]);
+        check_scalar_iter!(Int64, Int64Array, vec![Some(1), None, Some(3)]);
+
+        check_scalar_iter!(UInt8, UInt8Array, vec![Some(1), None, Some(3)]);
+        check_scalar_iter!(UInt16, UInt16Array, vec![Some(1), None, Some(3)]);
+        check_scalar_iter!(UInt32, UInt32Array, vec![Some(1), None, Some(3)]);
+        check_scalar_iter!(UInt64, UInt64Array, vec![Some(1), None, Some(3)]);
+
+        check_scalar_iter!(
+            TimestampSecond,
+            TimestampSecondArray,
+            vec![Some(1), None, Some(3)]
+        );
+        check_scalar_iter!(
+            TimestampMillisecond,
+            TimestampMillisecondArray,
+            vec![Some(1), None, Some(3)]
+        );
+        check_scalar_iter!(
+            TimestampMicrosecond,
+            TimestampMicrosecondArray,
+            vec![Some(1), None, Some(3)]
+        );
+        check_scalar_iter!(
+            TimestampNanosecond,
+            TimestampNanosecondArray,
+            vec![Some(1), None, Some(3)]
+        );
+
+        check_scalar_iter_string!(
+            Utf8,
+            StringArray,
+            vec![Some("foo"), None, Some("bar")]
+        );
+        check_scalar_iter_string!(
+            LargeUtf8,
+            LargeStringArray,
+            vec![Some("foo"), None, Some("bar")]
+        );
+        check_scalar_iter_binary!(
+            Binary,
+            BinaryArray,
+            vec![Some(b"foo"), None, Some(b"bar")]
+        );
+        check_scalar_iter_binary!(
+            LargeBinary,
+            LargeBinaryArray,
+            vec![Some(b"foo"), None, Some(b"bar")]
+        );
+    }
+
+    #[test]
+    fn scalar_iter_to_array_empty() {
+        let scalars = vec![] as Vec<ScalarValue>;
+
+        let result = ScalarValue::iter_to_array(scalars.iter()).unwrap_err();
+        assert!(
+            result
+                .to_string()
+                .contains("empty iterator passed to ScalarValue::iter_to_array"),

Review comment:
       ```suggestion
                   .contains("Empty iterator passed to ScalarValue::iter_to_array"),
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846121385


   > Shouldn't we use the Builder API to create an array of the values from the statistics instead of stats -> Vec<Scalar> -> Array?
   
   @jorgecarleitao could you please elaborate on what you have in mind here? The reason `ScalarValue` got involved was that I proposed returning Min/Max values as `ScalarValue` (in a trait that I can implement for things outside of DataFusion) in https://github.com/apache/arrow-datafusion/pull/380. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846544946


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#381](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b2f771e) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/8b31714d69ff08517443a1f2da41e0dc0ec00302?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (8b31714) will **increase** coverage by `0.01%`.
   > The diff coverage is `72.56%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/381/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #381      +/-   ##
   ==========================================
   + Coverage   74.88%   74.90%   +0.01%     
   ==========================================
     Files         146      146              
     Lines       24368    24481     +113     
   ==========================================
   + Hits        18249    18338      +89     
   - Misses       6119     6143      +24     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/381/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvc2NhbGFyLnJz) | `58.66% <72.56%> (+4.95%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [8b31714...b2f771e](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846544946


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#381](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6dfe1fd) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/db4f098d38993b96ce1134c4bc7bf5c6579509cf?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (db4f098) will **increase** coverage by `0.02%`.
   > The diff coverage is `72.56%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/381/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #381      +/-   ##
   ==========================================
   + Coverage   74.94%   74.96%   +0.02%     
   ==========================================
     Files         146      146              
     Lines       24314    24427     +113     
   ==========================================
   + Hits        18221    18312      +91     
   - Misses       6093     6115      +22     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/381/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvc2NhbGFyLnJz) | `60.25% <72.56%> (+4.77%)` | :arrow_up: |
   | [datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/381/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2V4cHIucnM=) | `84.34% <0.00%> (ø)` | |
   | [datafusion/src/physical\_plan/hash\_aggregate.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/381/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2FnZ3JlZ2F0ZS5ycw==) | `85.21% <0.00%> (+0.36%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [db4f098...6dfe1fd](https://codecov.io/gh/apache/arrow-datafusion/pull/381?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#discussion_r637234206



##########
File path: datafusion/src/scalar.rs
##########
@@ -283,6 +283,155 @@ impl ScalarValue {
         self.to_array_of_size(1)
     }
 
+    /// Converts an iterator of references [`ScalarValue`] into an [`ArrayRef`]
+    /// corresponding to those values. For example,
+    ///
+    /// Returns an error if the iterator is empty or if the
+    /// [`ScalarValue`]s are not all the same type
+    ///
+    /// Example
+    /// ```
+    /// use datafusion::scalar::ScalarValue;
+    /// use arrow::array::{ArrayRef, BooleanArray};
+    ///
+    /// let scalars = vec![
+    ///   ScalarValue::Boolean(Some(true)),
+    ///   ScalarValue::Boolean(None),
+    ///   ScalarValue::Boolean(Some(false)),
+    /// ];
+    ///
+    /// // Build an Array from the list of ScalarValues
+    /// let array = ScalarValue::iter_to_array(scalars.iter())
+    ///   .unwrap();
+    ///
+    /// let expected: ArrayRef = std::sync::Arc::new(
+    ///   BooleanArray::from(vec![
+    ///     Some(true),
+    ///     None,
+    ///     Some(false)
+    ///   ]
+    /// ));
+    ///
+    /// assert_eq!(&array, &expected);
+    /// ```
+    pub fn iter_to_array<'a>(
+        scalars: impl IntoIterator<Item = &'a ScalarValue>,
+    ) -> Result<ArrayRef> {
+        let mut scalars = scalars.into_iter().peekable();
+
+        // figure out the type based on the first element
+        let data_type = match scalars.peek() {
+            None => {
+                return Err(DataFusionError::Internal(
+                    "empty iterator passed to ScalarValue::iter_to_array".to_string(),

Review comment:
       ```suggestion
                       "Empty iterator passed to ScalarValue::iter_to_array".to_string(),
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
Jimexist commented on a change in pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#discussion_r637493584



##########
File path: datafusion/src/scalar.rs
##########
@@ -283,6 +283,155 @@ impl ScalarValue {
         self.to_array_of_size(1)
     }
 
+    /// Converts an iterator of references [`ScalarValue`] into an [`ArrayRef`]
+    /// corresponding to those values. For example,
+    ///
+    /// Returns an error if the iterator is empty or if the
+    /// [`ScalarValue`]s are not all the same type
+    ///
+    /// Example
+    /// ```
+    /// use datafusion::scalar::ScalarValue;
+    /// use arrow::array::{ArrayRef, BooleanArray};
+    ///
+    /// let scalars = vec![
+    ///   ScalarValue::Boolean(Some(true)),
+    ///   ScalarValue::Boolean(None),
+    ///   ScalarValue::Boolean(Some(false)),
+    /// ];
+    ///
+    /// // Build an Array from the list of ScalarValues
+    /// let array = ScalarValue::iter_to_array(scalars.iter())
+    ///   .unwrap();
+    ///
+    /// let expected: ArrayRef = std::sync::Arc::new(
+    ///   BooleanArray::from(vec![
+    ///     Some(true),
+    ///     None,
+    ///     Some(false)
+    ///   ]
+    /// ));
+    ///
+    /// assert_eq!(&array, &expected);
+    /// ```
+    pub fn iter_to_array<'a>(
+        scalars: impl IntoIterator<Item = &'a ScalarValue>,
+    ) -> Result<ArrayRef> {
+        let mut scalars = scalars.into_iter().peekable();
+
+        // figure out the type based on the first element
+        let data_type = match scalars.peek() {

Review comment:
       never mind - even if for not the data type info is present.

##########
File path: datafusion/src/scalar.rs
##########
@@ -283,6 +283,155 @@ impl ScalarValue {
         self.to_array_of_size(1)
     }
 
+    /// Converts an iterator of references [`ScalarValue`] into an [`ArrayRef`]
+    /// corresponding to those values. For example,
+    ///
+    /// Returns an error if the iterator is empty or if the
+    /// [`ScalarValue`]s are not all the same type
+    ///
+    /// Example
+    /// ```
+    /// use datafusion::scalar::ScalarValue;
+    /// use arrow::array::{ArrayRef, BooleanArray};
+    ///
+    /// let scalars = vec![
+    ///   ScalarValue::Boolean(Some(true)),
+    ///   ScalarValue::Boolean(None),
+    ///   ScalarValue::Boolean(Some(false)),
+    /// ];
+    ///
+    /// // Build an Array from the list of ScalarValues
+    /// let array = ScalarValue::iter_to_array(scalars.iter())
+    ///   .unwrap();
+    ///
+    /// let expected: ArrayRef = std::sync::Arc::new(
+    ///   BooleanArray::from(vec![
+    ///     Some(true),
+    ///     None,
+    ///     Some(false)
+    ///   ]
+    /// ));
+    ///
+    /// assert_eq!(&array, &expected);
+    /// ```
+    pub fn iter_to_array<'a>(
+        scalars: impl IntoIterator<Item = &'a ScalarValue>,
+    ) -> Result<ArrayRef> {
+        let mut scalars = scalars.into_iter().peekable();
+
+        // figure out the type based on the first element
+        let data_type = match scalars.peek() {

Review comment:
       never mind - even if for null the data type info is present.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan edited a comment on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846549279


   @alamb there is still a typo in the code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846549279


   @alamb there is another typo in the code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#discussion_r637532899



##########
File path: datafusion/src/scalar.rs
##########
@@ -283,6 +283,155 @@ impl ScalarValue {
         self.to_array_of_size(1)
     }
 
+    /// Converts an iterator of references [`ScalarValue`] into an [`ArrayRef`]
+    /// corresponding to those values. For example,
+    ///
+    /// Returns an error if the iterator is empty or if the
+    /// [`ScalarValue`]s are not all the same type
+    ///
+    /// Example
+    /// ```
+    /// use datafusion::scalar::ScalarValue;
+    /// use arrow::array::{ArrayRef, BooleanArray};
+    ///
+    /// let scalars = vec![
+    ///   ScalarValue::Boolean(Some(true)),
+    ///   ScalarValue::Boolean(None),
+    ///   ScalarValue::Boolean(Some(false)),
+    /// ];
+    ///
+    /// // Build an Array from the list of ScalarValues
+    /// let array = ScalarValue::iter_to_array(scalars.iter())
+    ///   .unwrap();
+    ///
+    /// let expected: ArrayRef = std::sync::Arc::new(
+    ///   BooleanArray::from(vec![
+    ///     Some(true),
+    ///     None,
+    ///     Some(false)
+    ///   ]
+    /// ));
+    ///
+    /// assert_eq!(&array, &expected);
+    /// ```
+    pub fn iter_to_array<'a>(
+        scalars: impl IntoIterator<Item = &'a ScalarValue>,
+    ) -> Result<ArrayRef> {
+        let mut scalars = scalars.into_iter().peekable();
+
+        // figure out the type based on the first element
+        let data_type = match scalars.peek() {
+            None => {
+                return Err(DataFusionError::Internal(
+                    "Empty iterator passed to ScalarValue::iter_to_array".to_string(),
+                ))
+            }
+            Some(sv) => sv.get_datatype(),
+        };
+
+        /// Creates an array of $ARRAY_TY by unpacking values of
+        /// SCALAR_TY for primitive types
+        macro_rules! build_array_primative {

Review comment:
       ```suggestion
           macro_rules! build_array_primitive {
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #381: Function to create `ArrayRef` from an iterator of ScalarValues

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #381:
URL: https://github.com/apache/arrow-datafusion/pull/381#issuecomment-846510410


   this is definitely helpful while implementing #375 , I was this close of trying to implement this by myself :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org