You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "izveigor (via GitHub)" <gi...@apache.org> on 2023/05/18 19:24:28 UTC

[GitHub] [arrow-datafusion] izveigor opened a new pull request, #6384: feat: New functions and operations for working with arrays

izveigor opened a new pull request, #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #6119 and #6075.
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are these changes tested?
   Yes
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   # Are there any user-facing changes?
   Yes
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1213009802


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -77,11 +81,40 @@ fn array_array(args: &[ArrayRef]) -> Result<ArrayRef> {
     // do not accept 0 arguments.
     if args.is_empty() {
         return Err(DataFusionError::Internal(
-            "array requires at least one argument".to_string(),
+            "Array requires at least one argument".to_string(),
         ));
     }
 
-    let res = match args[0].data_type() {
+    let data_type = args[0].data_type();
+    let res = match data_type {
+        DataType::List(..) => {
+            let arrays =

Review Comment:
   It would perhaps be nicer to use a combination of https://docs.rs/arrow-array/latest/arrow_array/array/struct.GenericListArray.html#method.try_new and https://docs.rs/arrow-select/latest/arrow_select/concat/index.html
   
   MutableArrayData is not the nicest API to use



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1582127713

   This PR exist bug, related with #6596 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1209545448


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -115,3 +149,1560 @@ pub fn array(values: &[ColumnarValue]) -> Result<ColumnarValue> {
         .collect();
     Ok(ColumnarValue::Array(array_array(arrays.as_slice())?))
 }
+
+macro_rules! downcast_arg {
+    ($ARG:expr, $ARRAY_TYPE:ident) => {{
+        $ARG.as_any().downcast_ref::<$ARRAY_TYPE>().ok_or_else(|| {
+            DataFusionError::Internal(format!(
+                "could not cast to {}",
+                type_name::<$ARRAY_TYPE>()
+            ))
+        })?
+    }};
+}
+
+macro_rules! append {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[child_array, element])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_append SQL function
+pub fn array_append(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_append function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let arr = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let element = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),

Review Comment:
   `ColumnarValue::Array` also makes sense in this situation?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1209544990


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -115,3 +149,1560 @@ pub fn array(values: &[ColumnarValue]) -> Result<ColumnarValue> {
         .collect();
     Ok(ColumnarValue::Array(array_array(arrays.as_slice())?))
 }
+
+macro_rules! downcast_arg {
+    ($ARG:expr, $ARRAY_TYPE:ident) => {{
+        $ARG.as_any().downcast_ref::<$ARRAY_TYPE>().ok_or_else(|| {
+            DataFusionError::Internal(format!(
+                "could not cast to {}",
+                type_name::<$ARRAY_TYPE>()
+            ))
+        })?
+    }};
+}
+
+macro_rules! append {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[child_array, element])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_append SQL function
+pub fn array_append(args: &[ColumnarValue]) -> Result<ColumnarValue> {

Review Comment:
   Should each function accept `&[ColumnarValue]` or `ArrayRef`? Is there a difference in these approaches?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1571011517

   I didn't make it to this today, but I plan to review it tomororw


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1576602838

   In my opinion, it would be better to merge this PR, I have some arguments:
   1) This PR is completed, and the next PRs will be only improve it.
   2) If we break this PR down, I think it will increase production time.
   The main reason why I don't want to continue this PR is because I want to take a closer look at some issues, but they are mostly related to ready-made functions.
   So, I understand your concerns, but i think this way is better.
   What do you think, @alamb?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1212990411


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -77,11 +81,40 @@ fn array_array(args: &[ArrayRef]) -> Result<ArrayRef> {
     // do not accept 0 arguments.
     if args.is_empty() {
         return Err(DataFusionError::Internal(
-            "array requires at least one argument".to_string(),
+            "Array requires at least one argument".to_string(),
         ));
     }
 
-    let res = match args[0].data_type() {
+    let data_type = args[0].data_type();
+    let res = match data_type {
+        DataType::List(..) => {
+            let arrays =

Review Comment:
   @tustvold  can you offer some suggestions on using the arrow-rs API to build list arrays? Is this the best way to use that API?



##########
datafusion/core/tests/sqllogictests/test_files/array.slt:
##########
@@ -0,0 +1,206 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+
+#   http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#############
+## Array expressions Tests
+#############
+
+# array scalar function #1

Review Comment:
   These are great @izveigor  -- thank you so much
   
   the only thing I recommend is adding some additional tests that have `null` in the lists. 



##########
datafusion/expr/src/function.rs:
##########
@@ -322,10 +404,21 @@ pub fn signature(fun: &BuiltinScalarFunction) -> Signature {
 
     // for now, the list is small, as we do not have many built-in functions.
     match fun {
-        BuiltinScalarFunction::MakeArray => Signature::variadic(
-            array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
-            fun.volatility(),
-        ),
+        BuiltinScalarFunction::ArrayAppend => Signature::any(2, fun.volatility()),

Review Comment:
   🤔  Given the element type of the list is part of its `DataType`, you probably can'y use the existing Signatures
   
   Perhaps you could add a new `Signature::any_list` or something that would only check that the datatype matched `DataType::list` 🤔 



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -115,3 +149,1560 @@ pub fn array(values: &[ColumnarValue]) -> Result<ColumnarValue> {
         .collect();
     Ok(ColumnarValue::Array(array_array(arrays.as_slice())?))
 }
+
+macro_rules! downcast_arg {
+    ($ARG:expr, $ARRAY_TYPE:ident) => {{
+        $ARG.as_any().downcast_ref::<$ARRAY_TYPE>().ok_or_else(|| {
+            DataFusionError::Internal(format!(
+                "could not cast to {}",
+                type_name::<$ARRAY_TYPE>()
+            ))
+        })?
+    }};
+}
+
+macro_rules! append {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[child_array, element])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_append SQL function
+pub fn array_append(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_append function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let arr = match &args[0] {

Review Comment:
   I think you can use https://docs.rs/datafusion/latest/datafusion/physical_plan/enum.ColumnarValue.html#method.into_array here



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -115,3 +149,1560 @@ pub fn array(values: &[ColumnarValue]) -> Result<ColumnarValue> {
         .collect();
     Ok(ColumnarValue::Array(array_array(arrays.as_slice())?))
 }
+
+macro_rules! downcast_arg {
+    ($ARG:expr, $ARRAY_TYPE:ident) => {{
+        $ARG.as_any().downcast_ref::<$ARRAY_TYPE>().ok_or_else(|| {
+            DataFusionError::Internal(format!(
+                "could not cast to {}",
+                type_name::<$ARRAY_TYPE>()
+            ))
+        })?
+    }};
+}
+
+macro_rules! append {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[child_array, element])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_append SQL function
+pub fn array_append(args: &[ColumnarValue]) -> Result<ColumnarValue> {

Review Comment:
   The difference is that if you take `ColumnarValue` we could specialize the kernels to do something faster with scalar (single) values rather than expanding them out to arrays (aka making copies).
   
   For the initial implementation I think converting them all to arrays is the best approach as it is simplest



##########
datafusion/physical-expr/src/functions.rs:
##########
@@ -2785,73 +2807,6 @@ mod tests {
         Ok(())
     }
 
-    fn generic_test_array(

Review Comment:
   I am not sure



##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -77,11 +81,40 @@ fn array_array(args: &[ArrayRef]) -> Result<ArrayRef> {
     // do not accept 0 arguments.
     if args.is_empty() {
         return Err(DataFusionError::Internal(
-            "array requires at least one argument".to_string(),
+            "Array requires at least one argument".to_string(),
         ));
     }
 
-    let res = match args[0].data_type() {
+    let data_type = args[0].data_type();
+    let res = match data_type {
+        DataType::List(..) => {

Review Comment:
   As FixedSizedList and List are different data types, if people have data that came from a Parquet file or something that is a FixedSizedList these functions likely wont work, 
   
   However, perhaps eventually we can add coercion rules to coerce (automatically cast) FixedSizeList to List 



##########
datafusion/expr/src/function.rs:
##########
@@ -322,10 +404,21 @@ pub fn signature(fun: &BuiltinScalarFunction) -> Signature {
 
     // for now, the list is small, as we do not have many built-in functions.
     match fun {
-        BuiltinScalarFunction::MakeArray => Signature::variadic(
-            array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
-            fun.volatility(),
-        ),
+        BuiltinScalarFunction::ArrayAppend => Signature::any(2, fun.volatility()),

Review Comment:
   🤔  Given the element type of the list is part of its `DataType`, you probably can'y use the existing Signatures
   
   Perhaps you could add a new `Signature::any_list` or something that would only check that the datatype matched `DataType::list` 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1213003167


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -115,3 +149,1560 @@ pub fn array(values: &[ColumnarValue]) -> Result<ColumnarValue> {
         .collect();
     Ok(ColumnarValue::Array(array_array(arrays.as_slice())?))
 }
+
+macro_rules! downcast_arg {
+    ($ARG:expr, $ARRAY_TYPE:ident) => {{
+        $ARG.as_any().downcast_ref::<$ARRAY_TYPE>().ok_or_else(|| {
+            DataFusionError::Internal(format!(
+                "could not cast to {}",
+                type_name::<$ARRAY_TYPE>()
+            ))
+        })?
+    }};
+}
+
+macro_rules! append {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[child_array, element])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_append SQL function
+pub fn array_append(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_append function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let arr = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let element = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        _ => {
+            return Err(DataFusionError::Internal(
+                "Array_append function requires scalar element".to_string(),
+            ))
+        }
+    };
+
+    let data_type = arr.data_type();
+    let arrays = match data_type {
+        DataType::List(field) => {
+            match (field.data_type(), element.data_type()) {
+                (DataType::Utf8, DataType::Utf8) => append!(arr, element, StringArray),
+                (DataType::LargeUtf8, DataType::LargeUtf8) => append!(arr, element, LargeStringArray),
+                (DataType::Boolean, DataType::Boolean) => append!(arr, element, BooleanArray),
+                (DataType::Float32, DataType::Float32) => append!(arr, element, Float32Array),
+                (DataType::Float64, DataType::Float64) => append!(arr, element, Float64Array),
+                (DataType::Int8, DataType::Int8) => append!(arr, element, Int8Array),
+                (DataType::Int16, DataType::Int16) => append!(arr, element, Int16Array),
+                (DataType::Int32, DataType::Int32) => append!(arr, element, Int32Array),
+                (DataType::Int64, DataType::Int64) => append!(arr, element, Int64Array),
+                (DataType::UInt8, DataType::UInt8) => append!(arr, element, UInt8Array),
+                (DataType::UInt16, DataType::UInt16) => append!(arr, element, UInt16Array),
+                (DataType::UInt32, DataType::UInt32) => append!(arr, element, UInt32Array),
+                (DataType::UInt64, DataType::UInt64) => append!(arr, element, UInt64Array),
+                (array_data_type, element_data_type) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Array_append is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
+                    )))
+                }
+            }
+        }
+        data_type => {
+            return Err(DataFusionError::Internal(format!(
+                "Array is not type '{data_type:?}'."
+            )))
+        }
+    };
+
+    array(arrays.as_slice())
+}
+
+macro_rules! prepend {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[element, child_array])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_prepend SQL function
+pub fn array_prepend(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_prepend function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let element = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        _ => {
+            return Err(DataFusionError::Internal(
+                "Array_prepend function requires scalar element".to_string(),
+            ))
+        }
+    };
+
+    let arr = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let data_type = arr.data_type();
+    let arrays = match data_type {
+        DataType::List(field) => {
+            match (field.data_type(), element.data_type()) {
+                (DataType::Utf8, DataType::Utf8) => prepend!(arr, element, StringArray),
+                (DataType::LargeUtf8, DataType::LargeUtf8) => prepend!(arr, element, LargeStringArray),
+                (DataType::Boolean, DataType::Boolean) => prepend!(arr, element, BooleanArray),
+                (DataType::Float32, DataType::Float32) => prepend!(arr, element, Float32Array),
+                (DataType::Float64, DataType::Float64) => prepend!(arr, element, Float64Array),
+                (DataType::Int8, DataType::Int8) => prepend!(arr, element, Int8Array),
+                (DataType::Int16, DataType::Int16) => prepend!(arr, element, Int16Array),
+                (DataType::Int32, DataType::Int32) => prepend!(arr, element, Int32Array),
+                (DataType::Int64, DataType::Int64) => prepend!(arr, element, Int64Array),
+                (DataType::UInt8, DataType::UInt8) => prepend!(arr, element, UInt8Array),
+                (DataType::UInt16, DataType::UInt16) => prepend!(arr, element, UInt16Array),
+                (DataType::UInt32, DataType::UInt32) => prepend!(arr, element, UInt32Array),
+                (DataType::UInt64, DataType::UInt64) => prepend!(arr, element, UInt64Array),
+                (array_data_type, element_data_type) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Array_prepend is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
+                    )))
+                }
+            }
+        }
+        data_type => {
+            return Err(DataFusionError::Internal(format!(
+                "Array is not type '{data_type:?}'."
+            )))
+        }
+    };
+
+    array(arrays.as_slice())
+}
+
+/// Array_concat/Array_cat SQL function
+pub fn array_concat(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    let arrays: Vec<ArrayRef> = args
+        .iter()
+        .map(|x| match x {
+            ColumnarValue::Array(array) => array.clone(),
+            ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        })
+        .collect();
+    let data_type = arrays[0].data_type();
+    match data_type {
+        DataType::List(..) => {
+            let list_arrays =
+                downcast_vec!(arrays, ListArray).collect::<Result<Vec<&ListArray>>>()?;
+            let len: usize = list_arrays.iter().map(|a| a.values().len()).sum();
+            let capacity = Capacities::Array(
+                list_arrays.iter().map(|a| a.get_buffer_memory_size()).sum(),

Review Comment:
   ```suggestion
                   list_arrays.iter().map(|a| a.len()).sum(),
   ```
   The buffer memory size is fairly significant over estimate



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1568449740

   Thank you @izveigor  -- I have put this on my review list but I likely won't have a chance to review until tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1574235933

   Hello, @alamb!
   I see your and @tustvold comments, thanks for your work!
   
   I analyzed all the comments and came to the conclusion that it is better to implement all other changes in subsequent PR, if the current changes do not contain critical errors. (Because it will be easier to analyze changes and implement their)
   So, I have made a list of issues for possible improvements to arrays:
   `arrow-rs`:
   1) Should some of the features be implemented in `arrow-rs` (for example, `position`)?
   `arrow-datafusion`:
   1) [Important] Implement `unnest` function (it would allow arrays to use aggregate functions `SELECT sum(a) AS total FROM (SELECT unnest(make_array(3, 5, 6) AS a) AS b;`
   2) Support NULLS in arrays (not only NullArray) (I think it would be nice to rewrite `make_array` function with using `try_new` method)
   3) `array_contains` function (LIKE `array[1, 2, 3] @> array[1, 1, 2, 3]`
   4) Write a `Signature` method for list datatypes.
   5) Cast between arrays elements.
   6) Support empty array?
   7) Maybe, refactoring some functions if anyone finds a better solution.
   8) `FixedSizeList` to `List`
   
   What do you think, @alamb?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1209543952


##########
datafusion/expr/src/function.rs:
##########
@@ -322,10 +404,21 @@ pub fn signature(fun: &BuiltinScalarFunction) -> Signature {
 
     // for now, the list is small, as we do not have many built-in functions.
     match fun {
-        BuiltinScalarFunction::MakeArray => Signature::variadic(
-            array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
-            fun.volatility(),
-        ),
+        BuiltinScalarFunction::ArrayAppend => Signature::any(2, fun.volatility()),

Review Comment:
   Are there ways to use List and ARRAY_DATATYPES?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1577365120

   > This PR is completed, and the next PRs will be only improve it.
   
   I agree this PR is complete (with tests) and is not missing anything major
   
   > If we break this PR down, I think it will increase production time.
   
   Yes, I agree breaking the PR down will require more effort on the author's (your) part. However, I do think if you have the time the effort would improve the overall quality of the DataFusion codebase. 
   
   > What do you think, @alamb?
   
   I think we can merge this PR as long as the work you have planned is tracked by some tickets (so that if you don't have a chance to get to them at least we will have some institutional knowledge) 
   
   Is that acceptable?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1213003167


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -115,3 +149,1560 @@ pub fn array(values: &[ColumnarValue]) -> Result<ColumnarValue> {
         .collect();
     Ok(ColumnarValue::Array(array_array(arrays.as_slice())?))
 }
+
+macro_rules! downcast_arg {
+    ($ARG:expr, $ARRAY_TYPE:ident) => {{
+        $ARG.as_any().downcast_ref::<$ARRAY_TYPE>().ok_or_else(|| {
+            DataFusionError::Internal(format!(
+                "could not cast to {}",
+                type_name::<$ARRAY_TYPE>()
+            ))
+        })?
+    }};
+}
+
+macro_rules! append {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[child_array, element])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_append SQL function
+pub fn array_append(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_append function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let arr = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let element = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        _ => {
+            return Err(DataFusionError::Internal(
+                "Array_append function requires scalar element".to_string(),
+            ))
+        }
+    };
+
+    let data_type = arr.data_type();
+    let arrays = match data_type {
+        DataType::List(field) => {
+            match (field.data_type(), element.data_type()) {
+                (DataType::Utf8, DataType::Utf8) => append!(arr, element, StringArray),
+                (DataType::LargeUtf8, DataType::LargeUtf8) => append!(arr, element, LargeStringArray),
+                (DataType::Boolean, DataType::Boolean) => append!(arr, element, BooleanArray),
+                (DataType::Float32, DataType::Float32) => append!(arr, element, Float32Array),
+                (DataType::Float64, DataType::Float64) => append!(arr, element, Float64Array),
+                (DataType::Int8, DataType::Int8) => append!(arr, element, Int8Array),
+                (DataType::Int16, DataType::Int16) => append!(arr, element, Int16Array),
+                (DataType::Int32, DataType::Int32) => append!(arr, element, Int32Array),
+                (DataType::Int64, DataType::Int64) => append!(arr, element, Int64Array),
+                (DataType::UInt8, DataType::UInt8) => append!(arr, element, UInt8Array),
+                (DataType::UInt16, DataType::UInt16) => append!(arr, element, UInt16Array),
+                (DataType::UInt32, DataType::UInt32) => append!(arr, element, UInt32Array),
+                (DataType::UInt64, DataType::UInt64) => append!(arr, element, UInt64Array),
+                (array_data_type, element_data_type) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Array_append is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
+                    )))
+                }
+            }
+        }
+        data_type => {
+            return Err(DataFusionError::Internal(format!(
+                "Array is not type '{data_type:?}'."
+            )))
+        }
+    };
+
+    array(arrays.as_slice())
+}
+
+macro_rules! prepend {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[element, child_array])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_prepend SQL function
+pub fn array_prepend(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_prepend function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let element = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        _ => {
+            return Err(DataFusionError::Internal(
+                "Array_prepend function requires scalar element".to_string(),
+            ))
+        }
+    };
+
+    let arr = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let data_type = arr.data_type();
+    let arrays = match data_type {
+        DataType::List(field) => {
+            match (field.data_type(), element.data_type()) {
+                (DataType::Utf8, DataType::Utf8) => prepend!(arr, element, StringArray),
+                (DataType::LargeUtf8, DataType::LargeUtf8) => prepend!(arr, element, LargeStringArray),
+                (DataType::Boolean, DataType::Boolean) => prepend!(arr, element, BooleanArray),
+                (DataType::Float32, DataType::Float32) => prepend!(arr, element, Float32Array),
+                (DataType::Float64, DataType::Float64) => prepend!(arr, element, Float64Array),
+                (DataType::Int8, DataType::Int8) => prepend!(arr, element, Int8Array),
+                (DataType::Int16, DataType::Int16) => prepend!(arr, element, Int16Array),
+                (DataType::Int32, DataType::Int32) => prepend!(arr, element, Int32Array),
+                (DataType::Int64, DataType::Int64) => prepend!(arr, element, Int64Array),
+                (DataType::UInt8, DataType::UInt8) => prepend!(arr, element, UInt8Array),
+                (DataType::UInt16, DataType::UInt16) => prepend!(arr, element, UInt16Array),
+                (DataType::UInt32, DataType::UInt32) => prepend!(arr, element, UInt32Array),
+                (DataType::UInt64, DataType::UInt64) => prepend!(arr, element, UInt64Array),
+                (array_data_type, element_data_type) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Array_prepend is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
+                    )))
+                }
+            }
+        }
+        data_type => {
+            return Err(DataFusionError::Internal(format!(
+                "Array is not type '{data_type:?}'."
+            )))
+        }
+    };
+
+    array(arrays.as_slice())
+}
+
+/// Array_concat/Array_cat SQL function
+pub fn array_concat(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    let arrays: Vec<ArrayRef> = args
+        .iter()
+        .map(|x| match x {
+            ColumnarValue::Array(array) => array.clone(),
+            ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        })
+        .collect();
+    let data_type = arrays[0].data_type();
+    match data_type {
+        DataType::List(..) => {
+            let list_arrays =
+                downcast_vec!(arrays, ListArray).collect::<Result<Vec<&ListArray>>>()?;
+            let len: usize = list_arrays.iter().map(|a| a.values().len()).sum();
+            let capacity = Capacities::Array(
+                list_arrays.iter().map(|a| a.get_buffer_memory_size()).sum(),

Review Comment:
   ```suggestion
                   list_arrays.iter().map(|a| a.len()).sum(),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1579295135

   > I have created issues regarding further improvements for working with arrays:
   
   
   Thanks @izveigor  -- I added them to https://github.com/apache/arrow-datafusion/issues/2326 as well. !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #6384: feat: New functions and operations for working with arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1209546666


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -77,11 +81,40 @@ fn array_array(args: &[ArrayRef]) -> Result<ArrayRef> {
     // do not accept 0 arguments.
     if args.is_empty() {
         return Err(DataFusionError::Internal(
-            "array requires at least one argument".to_string(),
+            "Array requires at least one argument".to_string(),
         ));
     }
 
-    let res = match args[0].data_type() {
+    let data_type = args[0].data_type();
+    let res = match data_type {
+        DataType::List(..) => {

Review Comment:
   I don't know the ways how to implement `FixedSizeList` in all the functions, so I preferred to use `List`. I think it does not affect anything.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1578635580

   Hello, @alamb!
   
   I have created issues regarding further improvements for working with arrays:
   #6555
   #6556
   #6557
   #6558
   #6559 
   #6560 
   #6561 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1577373410

   I think this option will suit me.
   Tomorrow I will try to describe in detail all the ideas in the tickets.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1576515347

   > What do you think, @alamb?
   
   I think this would be ok -- especially as you have a history of continued contribution. However, there are a few instances where engnaged committers committed in the start of promising features (such as the analysis framework from @isidentical) and then were not able to to finish the work for whatever reason. While this is fine, I think it would be better for datafusion to avoid it.
   
   Thus I would like to suggest an alternate approach which is to break this PR down into several smaller ones (perhaps one for each new function?) That way we can give each function the attention during review it deserves (and maybe even parallelize the work)
   
   We have a much better track record of being able to review and merge smaller PRs quickly than single large PRs. So when the functionality can be split up I think that is the best plan.
   
   What do you think @izveigor ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1209551667


##########
datafusion/physical-expr/src/functions.rs:
##########
@@ -2785,73 +2807,6 @@ mod tests {
         Ok(())
     }
 
-    fn generic_test_array(

Review Comment:
   It does not work (`FixedSizeList` replaced with `List`). With what it can be connected?
   Error:
   ```
   left: `List(Field { name: "item", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })`,
   right: `List(Field { name: "item", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })`'
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on pull request #6384: feat: New functions and operations for working with arrays

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#issuecomment-1567503406

   @alamb I wonder if you have time to review this PR.
   I left comments on which I would be pleased to hear your opinion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1213002787


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -115,3 +149,1560 @@ pub fn array(values: &[ColumnarValue]) -> Result<ColumnarValue> {
         .collect();
     Ok(ColumnarValue::Array(array_array(arrays.as_slice())?))
 }
+
+macro_rules! downcast_arg {
+    ($ARG:expr, $ARRAY_TYPE:ident) => {{
+        $ARG.as_any().downcast_ref::<$ARRAY_TYPE>().ok_or_else(|| {
+            DataFusionError::Internal(format!(
+                "could not cast to {}",
+                type_name::<$ARRAY_TYPE>()
+            ))
+        })?
+    }};
+}
+
+macro_rules! append {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[child_array, element])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_append SQL function
+pub fn array_append(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_append function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let arr = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let element = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        _ => {
+            return Err(DataFusionError::Internal(
+                "Array_append function requires scalar element".to_string(),
+            ))
+        }
+    };
+
+    let data_type = arr.data_type();
+    let arrays = match data_type {
+        DataType::List(field) => {
+            match (field.data_type(), element.data_type()) {
+                (DataType::Utf8, DataType::Utf8) => append!(arr, element, StringArray),
+                (DataType::LargeUtf8, DataType::LargeUtf8) => append!(arr, element, LargeStringArray),
+                (DataType::Boolean, DataType::Boolean) => append!(arr, element, BooleanArray),
+                (DataType::Float32, DataType::Float32) => append!(arr, element, Float32Array),
+                (DataType::Float64, DataType::Float64) => append!(arr, element, Float64Array),
+                (DataType::Int8, DataType::Int8) => append!(arr, element, Int8Array),
+                (DataType::Int16, DataType::Int16) => append!(arr, element, Int16Array),
+                (DataType::Int32, DataType::Int32) => append!(arr, element, Int32Array),
+                (DataType::Int64, DataType::Int64) => append!(arr, element, Int64Array),
+                (DataType::UInt8, DataType::UInt8) => append!(arr, element, UInt8Array),
+                (DataType::UInt16, DataType::UInt16) => append!(arr, element, UInt16Array),
+                (DataType::UInt32, DataType::UInt32) => append!(arr, element, UInt32Array),
+                (DataType::UInt64, DataType::UInt64) => append!(arr, element, UInt64Array),
+                (array_data_type, element_data_type) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Array_append is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
+                    )))
+                }
+            }
+        }
+        data_type => {
+            return Err(DataFusionError::Internal(format!(
+                "Array is not type '{data_type:?}'."
+            )))
+        }
+    };
+
+    array(arrays.as_slice())
+}
+
+macro_rules! prepend {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[element, child_array])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_prepend SQL function
+pub fn array_prepend(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_prepend function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let element = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        _ => {
+            return Err(DataFusionError::Internal(
+                "Array_prepend function requires scalar element".to_string(),
+            ))
+        }
+    };
+
+    let arr = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let data_type = arr.data_type();
+    let arrays = match data_type {
+        DataType::List(field) => {
+            match (field.data_type(), element.data_type()) {
+                (DataType::Utf8, DataType::Utf8) => prepend!(arr, element, StringArray),
+                (DataType::LargeUtf8, DataType::LargeUtf8) => prepend!(arr, element, LargeStringArray),
+                (DataType::Boolean, DataType::Boolean) => prepend!(arr, element, BooleanArray),
+                (DataType::Float32, DataType::Float32) => prepend!(arr, element, Float32Array),
+                (DataType::Float64, DataType::Float64) => prepend!(arr, element, Float64Array),
+                (DataType::Int8, DataType::Int8) => prepend!(arr, element, Int8Array),
+                (DataType::Int16, DataType::Int16) => prepend!(arr, element, Int16Array),
+                (DataType::Int32, DataType::Int32) => prepend!(arr, element, Int32Array),
+                (DataType::Int64, DataType::Int64) => prepend!(arr, element, Int64Array),
+                (DataType::UInt8, DataType::UInt8) => prepend!(arr, element, UInt8Array),
+                (DataType::UInt16, DataType::UInt16) => prepend!(arr, element, UInt16Array),
+                (DataType::UInt32, DataType::UInt32) => prepend!(arr, element, UInt32Array),
+                (DataType::UInt64, DataType::UInt64) => prepend!(arr, element, UInt64Array),
+                (array_data_type, element_data_type) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Array_prepend is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
+                    )))
+                }
+            }
+        }
+        data_type => {
+            return Err(DataFusionError::Internal(format!(
+                "Array is not type '{data_type:?}'."
+            )))
+        }
+    };
+
+    array(arrays.as_slice())
+}
+
+/// Array_concat/Array_cat SQL function
+pub fn array_concat(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    let arrays: Vec<ArrayRef> = args
+        .iter()
+        .map(|x| match x {
+            ColumnarValue::Array(array) => array.clone(),
+            ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        })
+        .collect();
+    let data_type = arrays[0].data_type();
+    match data_type {
+        DataType::List(..) => {
+            let list_arrays =

Review Comment:
   I think you could just call `to_data()` I'm not sure this needs to downcast to ListArray



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #6384: feat: New functions and operations for working with arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #6384:
URL: https://github.com/apache/arrow-datafusion/pull/6384#discussion_r1213005254


##########
datafusion/physical-expr/src/array_expressions.rs:
##########
@@ -115,3 +149,1560 @@ pub fn array(values: &[ColumnarValue]) -> Result<ColumnarValue> {
         .collect();
     Ok(ColumnarValue::Array(array_array(arrays.as_slice())?))
 }
+
+macro_rules! downcast_arg {
+    ($ARG:expr, $ARRAY_TYPE:ident) => {{
+        $ARG.as_any().downcast_ref::<$ARRAY_TYPE>().ok_or_else(|| {
+            DataFusionError::Internal(format!(
+                "could not cast to {}",
+                type_name::<$ARRAY_TYPE>()
+            ))
+        })?
+    }};
+}
+
+macro_rules! append {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[child_array, element])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_append SQL function
+pub fn array_append(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_append function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let arr = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let element = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        _ => {
+            return Err(DataFusionError::Internal(
+                "Array_append function requires scalar element".to_string(),
+            ))
+        }
+    };
+
+    let data_type = arr.data_type();
+    let arrays = match data_type {
+        DataType::List(field) => {
+            match (field.data_type(), element.data_type()) {
+                (DataType::Utf8, DataType::Utf8) => append!(arr, element, StringArray),
+                (DataType::LargeUtf8, DataType::LargeUtf8) => append!(arr, element, LargeStringArray),
+                (DataType::Boolean, DataType::Boolean) => append!(arr, element, BooleanArray),
+                (DataType::Float32, DataType::Float32) => append!(arr, element, Float32Array),
+                (DataType::Float64, DataType::Float64) => append!(arr, element, Float64Array),
+                (DataType::Int8, DataType::Int8) => append!(arr, element, Int8Array),
+                (DataType::Int16, DataType::Int16) => append!(arr, element, Int16Array),
+                (DataType::Int32, DataType::Int32) => append!(arr, element, Int32Array),
+                (DataType::Int64, DataType::Int64) => append!(arr, element, Int64Array),
+                (DataType::UInt8, DataType::UInt8) => append!(arr, element, UInt8Array),
+                (DataType::UInt16, DataType::UInt16) => append!(arr, element, UInt16Array),
+                (DataType::UInt32, DataType::UInt32) => append!(arr, element, UInt32Array),
+                (DataType::UInt64, DataType::UInt64) => append!(arr, element, UInt64Array),
+                (array_data_type, element_data_type) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Array_append is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
+                    )))
+                }
+            }
+        }
+        data_type => {
+            return Err(DataFusionError::Internal(format!(
+                "Array is not type '{data_type:?}'."
+            )))
+        }
+    };
+
+    array(arrays.as_slice())
+}
+
+macro_rules! prepend {
+    ($ARRAY:expr, $ELEMENT:expr, $ARRAY_TYPE:ident) => {{
+        let child_array =
+            downcast_arg!(downcast_arg!($ARRAY, ListArray).values(), $ARRAY_TYPE);
+        let element = downcast_arg!($ELEMENT, $ARRAY_TYPE);
+        let concat = compute::concat(&[element, child_array])?;
+        let mut scalars = vec![];
+        for i in 0..concat.len() {
+            scalars.push(ColumnarValue::Scalar(ScalarValue::try_from_array(
+                &concat, i,
+            )?));
+        }
+        scalars
+    }};
+}
+
+/// Array_prepend SQL function
+pub fn array_prepend(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Internal(format!(
+            "Array_prepend function requires two arguments, got {}",
+            args.len()
+        )));
+    }
+
+    let element = match &args[0] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        _ => {
+            return Err(DataFusionError::Internal(
+                "Array_prepend function requires scalar element".to_string(),
+            ))
+        }
+    };
+
+    let arr = match &args[1] {
+        ColumnarValue::Scalar(scalar) => scalar.to_array().clone(),
+        ColumnarValue::Array(arr) => arr.clone(),
+    };
+
+    let data_type = arr.data_type();
+    let arrays = match data_type {
+        DataType::List(field) => {
+            match (field.data_type(), element.data_type()) {
+                (DataType::Utf8, DataType::Utf8) => prepend!(arr, element, StringArray),
+                (DataType::LargeUtf8, DataType::LargeUtf8) => prepend!(arr, element, LargeStringArray),
+                (DataType::Boolean, DataType::Boolean) => prepend!(arr, element, BooleanArray),
+                (DataType::Float32, DataType::Float32) => prepend!(arr, element, Float32Array),
+                (DataType::Float64, DataType::Float64) => prepend!(arr, element, Float64Array),
+                (DataType::Int8, DataType::Int8) => prepend!(arr, element, Int8Array),
+                (DataType::Int16, DataType::Int16) => prepend!(arr, element, Int16Array),
+                (DataType::Int32, DataType::Int32) => prepend!(arr, element, Int32Array),
+                (DataType::Int64, DataType::Int64) => prepend!(arr, element, Int64Array),
+                (DataType::UInt8, DataType::UInt8) => prepend!(arr, element, UInt8Array),
+                (DataType::UInt16, DataType::UInt16) => prepend!(arr, element, UInt16Array),
+                (DataType::UInt32, DataType::UInt32) => prepend!(arr, element, UInt32Array),
+                (DataType::UInt64, DataType::UInt64) => prepend!(arr, element, UInt64Array),
+                (array_data_type, element_data_type) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Array_prepend is not implemented for types '{array_data_type:?}' and '{element_data_type:?}'."
+                    )))
+                }
+            }
+        }
+        data_type => {
+            return Err(DataFusionError::Internal(format!(
+                "Array is not type '{data_type:?}'."
+            )))
+        }
+    };
+
+    array(arrays.as_slice())
+}
+
+/// Array_concat/Array_cat SQL function
+pub fn array_concat(args: &[ColumnarValue]) -> Result<ColumnarValue> {

Review Comment:
   It would perhaps be nicer to use a combination of https://docs.rs/arrow-array/latest/arrow_array/array/struct.GenericListArray.html#method.try_new and https://docs.rs/arrow-select/latest/arrow_select/concat/index.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org