You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "Omega359 (via GitHub)" <gi...@apache.org> on 2024/03/21 18:46:17 UTC

[PR] Move trim functions (btrim, ltrim, rtrim) to datafusion_functions [arrow-datafusion]

Omega359 opened a new pull request, #9730:
URL: https://github.com/apache/arrow-datafusion/pull/9730

   ## Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #9703
   
   ## Rationale for this change
   
   Function migration.
    
   ## What changes are included in this PR?
   
   Code and tests.
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   Yes. The dataframe api now requires a vec![] parameter for all trim functions vs before where it seemed to be mixed between vec and not.
   
   There was a mismatch between the sql interface, the dataframe api and the documentation around these functions. As you can see in the expr.slt file all the trim functions could be provided with a second argument which would be the string to use when trimming. However, the documentation and the dataframe api only seems to have that for btrim, not the other two functions.
   
   This PR aligns the documentation and the dataframe with the sql so that all accept a second (optional) argument.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent [arrow-datafusion]

Posted by "Omega359 (via GitHub)" <gi...@apache.org>.
Omega359 commented on code in PR #9730:
URL: https://github.com/apache/arrow-datafusion/pull/9730#discussion_r1535772504


##########
datafusion/physical-expr/src/functions.rs:
##########
@@ -752,70 +725,6 @@ mod tests {
             Int32Array
         );
         test_function!(BitLength, &[lit("")], Ok(Some(0)), i32, Int32, Int32Array);
-        test_function!(

Review Comment:
   I'm actually migrating that test_function for another PR (#9702) so I think what I'd like to do is to have a followup PR after that lands to readd those tests into the code. I'm not a fan of relying strictly on the sql tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent [arrow-datafusion]

Posted by "Omega359 (via GitHub)" <gi...@apache.org>.
Omega359 commented on code in PR #9730:
URL: https://github.com/apache/arrow-datafusion/pull/9730#discussion_r1535769855


##########
datafusion/functions/src/string/btrim.rs:
##########
@@ -16,27 +16,27 @@
 // under the License.
 
 use arrow::array::{ArrayRef, OffsetSizeTrait};
-use arrow::datatypes::DataType;
-use datafusion_common::exec_err;
-use datafusion_common::Result;
-use datafusion_expr::ColumnarValue;
-use datafusion_expr::TypeSignature::*;
-use datafusion_expr::{ScalarUDFImpl, Signature, Volatility};
 use std::any::Any;
 
-use crate::string::{make_scalar_function, utf8_to_str_type};
+use arrow::datatypes::DataType;
 
-use super::{general_trim, TrimType};
+use datafusion_common::{exec_err, Result};
+use datafusion_expr::TypeSignature::*;
+use datafusion_expr::{ColumnarValue, Volatility};
+use datafusion_expr::{ScalarUDFImpl, Signature};
 
-/// Returns the longest string  with leading and trailing characters removed. If the characters are not specified, whitespace is removed.
+use crate::string::common::*;
+
+/// Returns the longest string with leading and trailing characters removed. If the characters are not specified, whitespace is removed.
 /// btrim('xyxtrimyyx', 'xyz') = 'trim'
-pub fn btrim<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
+fn btrim<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
     general_trim::<T>(args, TrimType::Both)
 }
 
 #[derive(Debug)]
 pub(super) struct TrimFunc {

Review Comment:
   Urgh, this was an oversight on my part when moving the Trim function here. I'll fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #9730:
URL: https://github.com/apache/arrow-datafusion/pull/9730


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #9730:
URL: https://github.com/apache/arrow-datafusion/pull/9730#discussion_r1535990270


##########
datafusion/physical-expr/src/functions.rs:
##########
@@ -752,70 +725,6 @@ mod tests {
             Int32Array
         );
         test_function!(BitLength, &[lit("")], Ok(Some(0)), i32, Int32, Int32Array);
-        test_function!(

Review Comment:
   ok. I actually think the sql coverage is the most important (as it ensures everything works end to end). That being said having additional unit test coverage is not a bad thing either



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Move trim functions (btrim, ltrim, rtrim) to datafusion_functions, make expr_fn API consistent [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #9730:
URL: https://github.com/apache/arrow-datafusion/pull/9730#discussion_r1535694427


##########
datafusion/functions/src/string/btrim.rs:
##########
@@ -16,27 +16,27 @@
 // under the License.
 
 use arrow::array::{ArrayRef, OffsetSizeTrait};
-use arrow::datatypes::DataType;
-use datafusion_common::exec_err;
-use datafusion_common::Result;
-use datafusion_expr::ColumnarValue;
-use datafusion_expr::TypeSignature::*;
-use datafusion_expr::{ScalarUDFImpl, Signature, Volatility};
 use std::any::Any;
 
-use crate::string::{make_scalar_function, utf8_to_str_type};
+use arrow::datatypes::DataType;
 
-use super::{general_trim, TrimType};
+use datafusion_common::{exec_err, Result};
+use datafusion_expr::TypeSignature::*;
+use datafusion_expr::{ColumnarValue, Volatility};
+use datafusion_expr::{ScalarUDFImpl, Signature};
 
-/// Returns the longest string  with leading and trailing characters removed. If the characters are not specified, whitespace is removed.
+use crate::string::common::*;
+
+/// Returns the longest string with leading and trailing characters removed. If the characters are not specified, whitespace is removed.
 /// btrim('xyxtrimyyx', 'xyz') = 'trim'
-pub fn btrim<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
+fn btrim<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
     general_trim::<T>(args, TrimType::Both)
 }
 
 #[derive(Debug)]
 pub(super) struct TrimFunc {

Review Comment:
   It isn't a huge deal, but I found it a little confusing that the function is called `TrimFunc` but has a name of `"btrim"` and an alias of `"trim"`
   
   I would expect either `BTrimFunc` or else the name of "trim" and an alias of `"btrim"`



##########
datafusion/physical-expr/src/functions.rs:
##########
@@ -752,70 +725,6 @@ mod tests {
             Int32Array
         );
         test_function!(BitLength, &[lit("")], Ok(Some(0)), i32, Int32, Int32Array);
-        test_function!(

Review Comment:
   I double checked and  the functions are already tested in expr.slt (e.g. https://github.com/apache/arrow-datafusion/blob/47f4b5a67ac3b327764cbd4c0f42da7ac44854e5/datafusion/sqllogictest/test_files/expr.slt#L558-L574)
   
   Thus removing these tests looks good to me 👍 
   



##########
datafusion/functions/src/string/mod.rs:
##########
@@ -15,278 +15,63 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use arrow::{
-    array::{Array, ArrayRef, GenericStringArray, OffsetSizeTrait},
-    datatypes::DataType,
-};
-use datafusion_common::{
-    cast::as_generic_string_array, exec_err, plan_err, Result, ScalarValue,
-};
-use datafusion_expr::{ColumnarValue, ScalarFunctionImplementation};
-use datafusion_physical_expr::functions::Hint;
-use std::{
-    fmt::{Display, Formatter},
-    sync::Arc,
-};
+//! "string" DataFusion functions
 
-/// Creates a function to identify the optimal return type of a string function given
-/// the type of its first argument.
-///
-/// If the input type is `LargeUtf8` or `LargeBinary` the return type is
-/// `$largeUtf8Type`,
-///
-/// If the input type is `Utf8` or `Binary` the return type is `$utf8Type`,
-macro_rules! get_optimal_return_type {

Review Comment:
   thank you for cleaning this up



##########
docs/source/user-guide/sql/scalar_functions.md:
##########
@@ -919,26 +922,25 @@ lpad(str, n[, padding_str])
 
 ### `ltrim`
 
-Removes leading spaces from a string.
+Trims the specified trim string from the beginning of a string.

Review Comment:
   Thank you for correcting and rationalizing the documentation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org