You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "Weijun-H (via GitHub)" <gi...@apache.org> on 2023/11/10 13:06:19 UTC

[PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Weijun-H opened a new pull request, #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121

   ## Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
   ## Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   ## What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   ## Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   ## Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1402815790


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2427,6 +2469,18 @@ NULL 10
 NULL 10
 NULL 10
 
+query II
+select array_length(arrow_cast(array[array[1, 2], array[3, 4]], 'LargeList(List(Int64))'), column3), array_length(arrow_cast(column1, 'LargeList(Int64)'), 1) from arrays_values;

Review Comment:
   Mixing non-table with table test cases is quite confusing, not sure which line does the second one start from, can we separate them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1389381105


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -3206,6 +3261,16 @@ mod tests {
         make_array(&args).expect("failed to initialize function array")
     }
 
+    fn return_large_array() -> ArrayRef {
+        // Returns: [1, 2, 3, 4]
+        let capacity = i32::MAX as usize + 10;
+        let args = vec![Arc::new(Int64Array::from(vec![Some(1)])) as ArrayRef; capacity];
+
+        println!("args.len() = {}", args.len());
+
+        make_array(&args).expect("failed to initialize function array")
+    }
+

Review Comment:
   I don't know how to efficiently create a `LargeList` using `make_array` as it runs out of memory on my end.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Veeupup (via GitHub)" <gi...@apache.org>.
Veeupup commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398204034


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -3206,6 +3261,16 @@ mod tests {
         make_array(&args).expect("failed to initialize function array")
     }
 
+    fn return_large_array() -> ArrayRef {
+        // Returns: [1, 2, 3, 4]
+        let capacity = i32::MAX as usize + 10;
+        let args = vec![Arc::new(Int64Array::from(vec![Some(1)])) as ArrayRef; capacity];
+
+        println!("args.len() = {}", args.len());
+
+        make_array(&args).expect("failed to initialize function array")
+    }
+

Review Comment:
   maybe you should use below constructor to construct LargeList
   ```rust
   let list_array = Arc::new(LargeListArray::from_iter_primitive::<Int32Type, _, _>(
           vec![
               Some(vec![Some(0), Some(1), Some(2)]),
               None,
               Some(vec![Some(3), None, Some(5)]),
           ],
       )) as ArrayRef
   
   ```



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -3015,6 +3085,18 @@ mod tests {
         make_array(&args).expect("failed to initialize function array")
     }
 
+    fn return_large_array() -> ArrayRef {
+        // Returns: [1, 2, 3, 4]
+        let args = [
+            Arc::new(Int64Array::from(vec![Some(1)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(2)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(3)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(4)])) as ArrayRef,
+        ];
+        let data_type = DataType::Int64;
+        array_array::<i64>(&args, data_type).expect("failed to initialize function array")
+    }
+

Review Comment:
   maybe you should use below constructor to construct LargeList
   ```rust
   let list_array = Arc::new(LargeListArray::from_iter_primitive::<Int32Type, _, _>(
           vec![
               Some(vec![Some(0), Some(1), Some(2)]),
               None,
               Some(vec![Some(3), None, Some(5)]),
           ],
       )) as ArrayRef
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398237060


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2939,6 +3011,43 @@ mod tests {
             as_uint64_array(&arr).expect("failed to initialize function array_length");
 
         assert_eq!(result, &UInt64Array::from(vec![None]));
+

Review Comment:
   Yes, I do not know how we could construct the `LargeList` in sqllogictest, instead of an array with more `i32:MAX` by `make_array`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1402815271


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2427,6 +2469,18 @@ NULL 10
 NULL 10
 NULL 10
 
+query II
+select array_length(arrow_cast(array[array[1, 2], array[3, 4]], 'LargeList(List(Int64))'), column3), array_length(arrow_cast(column1, 'LargeList(Int64)'), 1) from arrays_values;

Review Comment:
   mixing these non-table with table test cases is quite confusing, maybe separate them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1402815271


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2427,6 +2469,18 @@ NULL 10
 NULL 10
 NULL 10
 
+query II
+select array_length(arrow_cast(array[array[1, 2], array[3, 4]], 'LargeList(List(Int64))'), column3), array_length(arrow_cast(column1, 'LargeList(Int64)'), 1) from arrays_values;

Review Comment:
   mixing these non-table with table test cases is quite confusing, maybe separate them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1393892134


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -3015,6 +3085,18 @@ mod tests {
         make_array(&args).expect("failed to initialize function array")
     }
 
+    fn return_large_array() -> ArrayRef {
+        // Returns: [1, 2, 3, 4]
+        let args = [
+            Arc::new(Int64Array::from(vec![Some(1)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(2)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(3)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(4)])) as ArrayRef,
+        ];
+        let data_type = DataType::Int64;
+        array_array::<i64>(&args, data_type).expect("failed to initialize function array")
+    }
+

Review Comment:
   Hack here to create a `LargeList` to avoid spending too much memory for testing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398189552


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2939,6 +3011,43 @@ mod tests {
             as_uint64_array(&arr).expect("failed to initialize function array_length");
 
         assert_eq!(result, &UInt64Array::from(vec![None]));
+

Review Comment:
   Rather than extending this module with more rust tests, can you perhaps add SQL level tests using sqllogictest?
   
   That will ensure the functions are usable end to end and that all the connections are in place
   
   Perhaps https://github.com/apache/arrow-datafusion/blob/main/datafusion/sqllogictest/test_files/array.slt



##########
datafusion/common/src/scalar.rs:
##########
@@ -1716,7 +1716,11 @@ impl ScalarValue {
         } else {
             Self::iter_to_array(values.iter().cloned()).unwrap()
         };
-        Arc::new(array_into_list_array(values))
+        if values.len() <= i32::MAX as usize {

Review Comment:
   I think the ScalarValue type needs to match the underyling array type -- so with this change, a `ScalarValue::List` might return `LargeListArray` or `ListArray`. This mismatch will likely cause issues downstream
   
   I think the standard pattern would be to add a new function `ScalarValue::new_large_list` and `ScalarValue::LargeList` if they don't already exist



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -3015,6 +3085,18 @@ mod tests {
         make_array(&args).expect("failed to initialize function array")
     }
 
+    fn return_large_array() -> ArrayRef {
+        // Returns: [1, 2, 3, 4]
+        let args = [
+            Arc::new(Int64Array::from(vec![Some(1)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(2)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(3)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(4)])) as ArrayRef,
+        ];
+        let data_type = DataType::Int64;
+        array_array::<i64>(&args, data_type).expect("failed to initialize function array")
+    }
+

Review Comment:
   What is the hack? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398198857


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -355,13 +364,13 @@ fn array_array(args: &[ArrayRef], data_type: DataType) -> Result<ArrayRef> {
                 mutable.extend_nulls(1);
             }
         }
-        offsets.push(mutable.len() as i32);
+        offsets.push(O::from_usize(mutable.len()).unwrap());
     }
-
     let data = mutable.freeze();
-    Ok(Arc::new(ListArray::try_new(
+
+    Ok(Arc::new(GenericListArray::<O>::try_new(
         Arc::new(Field::new("item", data_type, true)),
-        OffsetBuffer::new(offsets.into()),
+        OffsetBuffer::new(ScalarBuffer::from(offsets)),

Review Comment:
   no need this for `usize_as`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398199254


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -1707,7 +1731,20 @@ pub fn flatten(args: &[ArrayRef]) -> Result<ArrayRef> {
 
 /// Array_length SQL function
 pub fn array_length(args: &[ArrayRef]) -> Result<ArrayRef> {
-    let list_array = as_list_array(&args[0])?;
+    match &args[0].data_type() {
+        DataType::List(_) => _array_length_list::<i32>(args),
+        DataType::LargeList(_) => _array_length_list::<i64>(args),
+        _ => Err(DataFusionError::Internal(format!(

Review Comment:
   macro



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1402812033


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -355,11 +364,11 @@ fn array_array(args: &[ArrayRef], data_type: DataType) -> Result<ArrayRef> {
                 mutable.extend_nulls(1);
             }
         }
-        offsets.push(mutable.len() as i32);
+        offsets.push(O::from_usize(mutable.len()).unwrap());

Review Comment:
   I think we dont need unwrap if casting via `usize_as` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398198814


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -336,8 +344,9 @@ fn array_array(args: &[ArrayRef], data_type: DataType) -> Result<ArrayRef> {
         total_len += arg_data.len();
         data.push(arg_data);
     }
-    let mut offsets = Vec::with_capacity(total_len);
-    offsets.push(0);
+
+    let mut offsets: Vec<O> = Vec::with_capacity(total_len);
+    offsets.push(O::from_usize(0).unwrap());

Review Comment:
   `usize_as`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1412021717


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -379,13 +387,28 @@ pub fn make_array(arrays: &[ArrayRef]) -> Result<ArrayRef> {
         }
     }
 
+    let len = arrays.len();
     match data_type {
         // Either an empty array or all nulls:
         DataType::Null => {
             let array = new_null_array(&DataType::Null, arrays.len());
-            Ok(Arc::new(array_into_list_array(array)))
+            if len <= i32::MAX as usize {

Review Comment:
   I still don't think that `make_array` should ever return `NullArray` (aka `DataType::Null`) and instead it should always return `ListArray` (possibly with a null value)
   
   However, this PR doesn't make this situation worse, so 👍 



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -379,13 +387,28 @@ pub fn make_array(arrays: &[ArrayRef]) -> Result<ArrayRef> {
         }
     }
 
+    let len = arrays.len();
     match data_type {
         // Either an empty array or all nulls:
         DataType::Null => {
             let array = new_null_array(&DataType::Null, arrays.len());
-            Ok(Arc::new(array_into_list_array(array)))
+            if len <= i32::MAX as usize {
+                Ok(Arc::new(array_into_list_array(array)))
+            } else if len <= i64::MAX as usize {
+                Ok(Arc::new(array_into_large_list_array(array)))
+            } else {
+                exec_err!("The number of elements {} in the array exceed the maximum number of elements supported by DataFusion",len)
+            }
+        }
+        data_type => {

Review Comment:
   The output type should be what is declared, not based on the data
   
   So returning a LargeLIstArray when the function's return `DataType` was `DataType::List` is wrong
   
   
   So in other words,  I think this should be something like
   
   ```suggestion
           DataType::LIst(_) => array_array::<i32>(arrays, data_type)
           DataType::LargeList(_) => array_array::<i64>(arrays, data_type)
   ```
   
   (as you have written it in `array_length(args: &[ArrayRef]) -> Result<ArrayRef> {`)



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -379,13 +387,28 @@ pub fn make_array(arrays: &[ArrayRef]) -> Result<ArrayRef> {
         }
     }
 
+    let len = arrays.len();
     match data_type {
         // Either an empty array or all nulls:
         DataType::Null => {
             let array = new_null_array(&DataType::Null, arrays.len());
-            Ok(Arc::new(array_into_list_array(array)))
+            if len <= i32::MAX as usize {

Review Comment:
   I still don't think that `make_array` should ever return `NullArray` (aka `DataType::Null`) and instead it should always return `ListArray` (possibly with a null value)
   
   However, this PR doesn't make this situation worse, so 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#issuecomment-1817464426

   @alamb @jayzhan211 @Veeupup 
   PTAL :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398199405


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -1707,7 +1731,20 @@ pub fn flatten(args: &[ArrayRef]) -> Result<ArrayRef> {
 
 /// Array_length SQL function
 pub fn array_length(args: &[ArrayRef]) -> Result<ArrayRef> {
-    let list_array = as_list_array(&args[0])?;
+    match &args[0].data_type() {
+        DataType::List(_) => _array_length_list::<i32>(args),
+        DataType::LargeList(_) => _array_length_list::<i64>(args),
+        _ => Err(DataFusionError::Internal(format!(
+            "array_length does not support type '{:?}'",
+            args[0].data_type()
+        ))),
+    }
+}
+
+/// array_length for List and LargeList
+fn _array_length_list<O: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {

Review Comment:
   Any other name without _? In rust _xxx usually means unused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398199944


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2939,6 +3011,43 @@ mod tests {
             as_uint64_array(&arr).expect("failed to initialize function array_length");
 
         assert_eq!(result, &UInt64Array::from(vec![None]));
+

Review Comment:
   I'm not sure how to create an large list in sql file. I think it's why he done the test here? @Weijun-H 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#issuecomment-1823974605

   > Overall LGTM
   
   Thank you for review 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1400309359


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -2939,6 +3011,43 @@ mod tests {
             as_uint64_array(&arr).expect("failed to initialize function array_length");
 
         assert_eq!(result, &UInt64Array::from(vec![None]));
+

Review Comment:
   The current idea is to use `arrow_cast()` to create a LargeList instead of `make_array`. This pr is stalled by #8290 @alamb @jayzhan211 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1402815790


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2427,6 +2469,18 @@ NULL 10
 NULL 10
 NULL 10
 
+query II
+select array_length(arrow_cast(array[array[1, 2], array[3, 4]], 'LargeList(List(Int64))'), column3), array_length(arrow_cast(column1, 'LargeList(Int64)'), 1) from arrays_values;

Review Comment:
   Mixing non-table with table test cases is quite confusing, not sure which line does the second one start from, can we separate them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1413070868


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2371,36 +2371,66 @@ select array_length(make_array(1, 2, 3, 4, 5)), array_length(make_array(1, 2, 3)
 ----
 5 3 3
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)')), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)')), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'));
+----
+5 3 3
+
 # array_length scalar function #2
 query III
 select array_length(make_array(1, 2, 3, 4, 5), 1), array_length(make_array(1, 2, 3), 1), array_length(make_array([1, 2], [3, 4], [5, 6]), 1);
 ----
 5 3 3
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 1), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)'), 1), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'), 1);
+----
+5 3 3
+
 # array_length scalar function #3
 query III
 select array_length(make_array(1, 2, 3, 4, 5), 2), array_length(make_array(1, 2, 3), 2), array_length(make_array([1, 2], [3, 4], [5, 6]), 2);
 ----
 NULL NULL 2
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 2), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)'), 2), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'), 2);
+----
+NULL NULL 2
+
 # array_length scalar function #4
 query II
 select array_length(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 1), array_length(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 2);
 ----
 3 2
 
+query II
+select array_length(arrow_cast(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 'LargeList(List(List(Int64)))'), 1), array_length(arrow_cast(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 'LargeList(List(List(Int64)))'), 2);

Review Comment:
   Could it be possible to add a test that tries to cast `LargeList(List)` as a follow on PR? It would, of course, error initially, but then when we upgraded arrow to a version that did support that cast, we would have the test coverage



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#issuecomment-1837467877

   THanks for sticking with these PRs -- I know it has taken time, but now that we have the patterns down I feel like the code is really improving at a nice rate. This is what I think successful software development and incremental improvement looks like!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1402130704


##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2371,36 +2371,66 @@ select array_length(make_array(1, 2, 3, 4, 5)), array_length(make_array(1, 2, 3)
 ----
 5 3 3
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)')), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)')), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'));
+----
+5 3 3
+
 # array_length scalar function #2
 query III
 select array_length(make_array(1, 2, 3, 4, 5), 1), array_length(make_array(1, 2, 3), 1), array_length(make_array([1, 2], [3, 4], [5, 6]), 1);
 ----
 5 3 3
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 1), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)'), 1), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'), 1);
+----
+5 3 3
+
 # array_length scalar function #3
 query III
 select array_length(make_array(1, 2, 3, 4, 5), 2), array_length(make_array(1, 2, 3), 2), array_length(make_array([1, 2], [3, 4], [5, 6]), 2);
 ----
 NULL NULL 2
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 2), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)'), 2), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'), 2);
+----
+NULL NULL 2
+
 # array_length scalar function #4
 query II
 select array_length(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 1), array_length(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 2);
 ----
 3 2
 
+query II
+select array_length(arrow_cast(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 'LargeList(List(List(Int64)))'), 1), array_length(arrow_cast(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 'LargeList(List(List(Int64)))'), 2);

Review Comment:
   Can not test `LargeList(LargeList)` because arrow-rs does not support yet. #8305



##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -2371,36 +2371,66 @@ select array_length(make_array(1, 2, 3, 4, 5)), array_length(make_array(1, 2, 3)
 ----
 5 3 3
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)')), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)')), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'));
+----
+5 3 3
+
 # array_length scalar function #2
 query III
 select array_length(make_array(1, 2, 3, 4, 5), 1), array_length(make_array(1, 2, 3), 1), array_length(make_array([1, 2], [3, 4], [5, 6]), 1);
 ----
 5 3 3
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 1), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)'), 1), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'), 1);
+----
+5 3 3
+
 # array_length scalar function #3
 query III
 select array_length(make_array(1, 2, 3, 4, 5), 2), array_length(make_array(1, 2, 3), 2), array_length(make_array([1, 2], [3, 4], [5, 6]), 2);
 ----
 NULL NULL 2
 
+query III
+select array_length(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 2), array_length(arrow_cast(make_array(1, 2, 3), 'LargeList(Int64)'), 2), array_length(arrow_cast(make_array([1, 2], [3, 4], [5, 6]), 'LargeList(List(Int64))'), 2);
+----
+NULL NULL 2
+
 # array_length scalar function #4
 query II
 select array_length(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 1), array_length(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 2);
 ----
 3 2
 
+query II
+select array_length(arrow_cast(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 'LargeList(List(List(Int64)))'), 1), array_length(arrow_cast(array_repeat(array_repeat(array_repeat(3, 5), 2), 3), 'LargeList(List(List(Int64)))'), 2);

Review Comment:
   Can not test `LargeList(LargeList)` because arrow-rs does not support it yet. #8305



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1402811554


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -1707,7 +1731,20 @@ pub fn flatten(args: &[ArrayRef]) -> Result<ArrayRef> {
 
 /// Array_length SQL function
 pub fn array_length(args: &[ArrayRef]) -> Result<ArrayRef> {
-    let list_array = as_list_array(&args[0])?;
+    match &args[0].data_type() {
+        DataType::List(_) => _array_length_list::<i32>(args),
+        DataType::LargeList(_) => _array_length_list::<i64>(args),
+        _ => Err(DataFusionError::Internal(format!(

Review Comment:
   Unresolved



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "Veeupup (via GitHub)" <gi...@apache.org>.
Veeupup commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1398203826


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -3015,6 +3085,18 @@ mod tests {
         make_array(&args).expect("failed to initialize function array")
     }
 
+    fn return_large_array() -> ArrayRef {
+        // Returns: [1, 2, 3, 4]
+        let args = [
+            Arc::new(Int64Array::from(vec![Some(1)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(2)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(3)])) as ArrayRef,
+            Arc::new(Int64Array::from(vec![Some(4)])) as ArrayRef,
+        ];
+        let data_type = DataType::Int64;
+        array_array::<i64>(&args, data_type).expect("failed to initialize function array")
+    }
+

Review Comment:
   maybe you should use below constructor to construct LargeList
   ```rust
   let list_array = Arc::new(LargeListArray::from_iter_primitive::<Int32Type, _, _>(
           vec![
               Some(vec![Some(0), Some(1), Some(2)]),
               None,
               Some(vec![Some(3), None, Some(5)]),
           ],
       )) as ArrayRef
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org