You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/12/01 12:04:28 UTC

Re: [PR] feat: support `LargeList` in `make_array` and `array_length` [arrow-datafusion]

alamb commented on code in PR #8121:
URL: https://github.com/apache/arrow-datafusion/pull/8121#discussion_r1412021717


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -379,13 +387,28 @@ pub fn make_array(arrays: &[ArrayRef]) -> Result<ArrayRef> {
         }
     }
 
+    let len = arrays.len();
     match data_type {
         // Either an empty array or all nulls:
         DataType::Null => {
             let array = new_null_array(&DataType::Null, arrays.len());
-            Ok(Arc::new(array_into_list_array(array)))
+            if len <= i32::MAX as usize {

Review Comment:
   I still don't think that `make_array` should ever return `NullArray` (aka `DataType::Null`) and instead it should always return `ListArray` (possibly with a null value)
   
   However, this PR doesn't make this situation worse, so 👍 



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -379,13 +387,28 @@ pub fn make_array(arrays: &[ArrayRef]) -> Result<ArrayRef> {
         }
     }
 
+    let len = arrays.len();
     match data_type {
         // Either an empty array or all nulls:
         DataType::Null => {
             let array = new_null_array(&DataType::Null, arrays.len());
-            Ok(Arc::new(array_into_list_array(array)))
+            if len <= i32::MAX as usize {
+                Ok(Arc::new(array_into_list_array(array)))
+            } else if len <= i64::MAX as usize {
+                Ok(Arc::new(array_into_large_list_array(array)))
+            } else {
+                exec_err!("The number of elements {} in the array exceed the maximum number of elements supported by DataFusion",len)
+            }
+        }
+        data_type => {

Review Comment:
   The output type should be what is declared, not based on the data
   
   So returning a LargeLIstArray when the function's return `DataType` was `DataType::List` is wrong
   
   
   So in other words,  I think this should be something like
   
   ```suggestion
           DataType::LIst(_) => array_array::<i32>(arrays, data_type)
           DataType::LargeList(_) => array_array::<i64>(arrays, data_type)
   ```
   
   (as you have written it in `array_length(args: &[ArrayRef]) -> Result<ArrayRef> {`)



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -379,13 +387,28 @@ pub fn make_array(arrays: &[ArrayRef]) -> Result<ArrayRef> {
         }
     }
 
+    let len = arrays.len();
     match data_type {
         // Either an empty array or all nulls:
         DataType::Null => {
             let array = new_null_array(&DataType::Null, arrays.len());
-            Ok(Arc::new(array_into_list_array(array)))
+            if len <= i32::MAX as usize {

Review Comment:
   I still don't think that `make_array` should ever return `NullArray` (aka `DataType::Null`) and instead it should always return `ListArray` (possibly with a null value)
   
   However, this PR doesn't make this situation worse, so 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org