You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "waynexia (via GitHub)" <gi...@apache.org> on 2023/07/04 12:28:24 UTC

[GitHub] [arrow-datafusion] waynexia opened a new pull request, #6840: feat: implement substrait for LIKE/ILIKE expr

waynexia opened a new pull request, #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #6731.
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Support more expressions in substrait
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   
   Implement `Expr::Like` and `Expr::ILike` for datafusion-substrait
   
   # Are these changes tested?
    
   Yes
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waynexia commented on a diff in pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "waynexia (via GitHub)" <gi...@apache.org>.
waynexia commented on code in PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#discussion_r1260704266


##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -819,116 +840,38 @@ pub async fn from_substrait_rex(
                                 ),
                             })))
                         }
-                        Ok(ScalarFunctionType::Builtin(fun)) => {
-                            Ok(Arc::new(Expr::ScalarFunction(expr::ScalarFunction {
-                                fun,
-                                args: vec![
-                                    from_substrait_rex(l, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone(),
-                                    from_substrait_rex(r, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone(),
-                                ],
-                            })))
-                        }
-                        Ok(ScalarFunctionType::Not) => {
-                            Err(DataFusionError::NotImplemented(
-                                "Not expected function type: Not".to_string(),
-                            ))
-                        }
-                        Err(e) => Err(e),
-                    }
-                }
-                (l, r) => Err(DataFusionError::NotImplemented(format!(
-                    "Invalid arguments for binary expression: {l:?} and {r:?}"
-                ))),
-            },
-            // ScalarFunction or Expr::Not
-            1 => {
-                let fun = match extensions.get(&f.function_reference) {
-                    Some(fname) => scalar_function_or_not(fname),
-                    None => Err(DataFusionError::NotImplemented(format!(
-                        "Function not found: function reference = {:?}",
-                        f.function_reference
-                    ))),
-                };
-
-                match fun {
-                    Ok(ScalarFunctionType::Op(_)) => {
-                        Err(DataFusionError::NotImplemented(
-                            "Not expected function type: Op".to_string(),
-                        ))
-                    }
-                    Ok(scalar_function_type) => {
-                        match &f.arguments.first().unwrap().arg_type {
-                            Some(ArgType::Value(e)) => {
-                                let expr =
-                                    from_substrait_rex(e, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone();
-                                match scalar_function_type {
-                                    ScalarFunctionType::Builtin(fun) => Ok(Arc::new(
-                                        Expr::ScalarFunction(expr::ScalarFunction {
-                                            fun,
-                                            args: vec![expr],
-                                        }),
-                                    )),
-                                    ScalarFunctionType::Not => {
-                                        Ok(Arc::new(Expr::Not(Box::new(expr))))
-                                    }
-                                    _ => Err(DataFusionError::NotImplemented(
-                                        "Invalid arguments for Not expression"
-                                            .to_string(),
-                                    )),
-                                }
-                            }
-                            _ => Err(DataFusionError::NotImplemented(
-                                "Invalid arguments for Not expression".to_string(),
-                            )),
-                        }
+                        (l, r) => Err(DataFusionError::NotImplemented(format!(
+                            "Invalid arguments for binary expression: {l:?} and {r:?}"
+                        ))),
                     }
-                    Err(e) => Err(e),
                 }
-            }
-            // ScalarFunction
-            _ => {
-                let fun = match extensions.get(&f.function_reference) {
-                    Some(fname) => BuiltinScalarFunction::from_str(fname),
-                    None => Err(DataFusionError::NotImplemented(format!(
-                        "Aggregated function not found: function reference = {:?}",
-                        f.function_reference
-                    ))),
-                };
-
-                let mut args: Vec<Expr> = vec![];
-                for arg in f.arguments.iter() {
+                ScalarFunctionType::Not => {
+                    let arg = f.arguments.first().ok_or_else(|| {

Review Comment:
   I'm not sure about this. In the previous code (like deserializing ReadRel) we don't care about extra information from the proto. I.e., we only care about if we can get the data necessary to construct our plan, and just ignore other extra things. But on the other hand, redundant things sometimes indicate an error or unexpected behavior (like finding a remaining screw after recovering something 🫣).



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -819,116 +840,38 @@ pub async fn from_substrait_rex(
                                 ),
                             })))
                         }
-                        Ok(ScalarFunctionType::Builtin(fun)) => {
-                            Ok(Arc::new(Expr::ScalarFunction(expr::ScalarFunction {
-                                fun,
-                                args: vec![
-                                    from_substrait_rex(l, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone(),
-                                    from_substrait_rex(r, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone(),
-                                ],
-                            })))
-                        }
-                        Ok(ScalarFunctionType::Not) => {
-                            Err(DataFusionError::NotImplemented(
-                                "Not expected function type: Not".to_string(),
-                            ))
-                        }
-                        Err(e) => Err(e),
-                    }
-                }
-                (l, r) => Err(DataFusionError::NotImplemented(format!(
-                    "Invalid arguments for binary expression: {l:?} and {r:?}"
-                ))),
-            },
-            // ScalarFunction or Expr::Not
-            1 => {
-                let fun = match extensions.get(&f.function_reference) {
-                    Some(fname) => scalar_function_or_not(fname),
-                    None => Err(DataFusionError::NotImplemented(format!(
-                        "Function not found: function reference = {:?}",
-                        f.function_reference
-                    ))),
-                };
-
-                match fun {
-                    Ok(ScalarFunctionType::Op(_)) => {
-                        Err(DataFusionError::NotImplemented(
-                            "Not expected function type: Op".to_string(),
-                        ))
-                    }
-                    Ok(scalar_function_type) => {
-                        match &f.arguments.first().unwrap().arg_type {
-                            Some(ArgType::Value(e)) => {
-                                let expr =
-                                    from_substrait_rex(e, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone();
-                                match scalar_function_type {
-                                    ScalarFunctionType::Builtin(fun) => Ok(Arc::new(
-                                        Expr::ScalarFunction(expr::ScalarFunction {
-                                            fun,
-                                            args: vec![expr],
-                                        }),
-                                    )),
-                                    ScalarFunctionType::Not => {
-                                        Ok(Arc::new(Expr::Not(Box::new(expr))))
-                                    }
-                                    _ => Err(DataFusionError::NotImplemented(
-                                        "Invalid arguments for Not expression"
-                                            .to_string(),
-                                    )),
-                                }
-                            }
-                            _ => Err(DataFusionError::NotImplemented(
-                                "Invalid arguments for Not expression".to_string(),
-                            )),
-                        }
+                        (l, r) => Err(DataFusionError::NotImplemented(format!(
+                            "Invalid arguments for binary expression: {l:?} and {r:?}"
+                        ))),
                     }
-                    Err(e) => Err(e),
                 }
-            }
-            // ScalarFunction
-            _ => {
-                let fun = match extensions.get(&f.function_reference) {
-                    Some(fname) => BuiltinScalarFunction::from_str(fname),
-                    None => Err(DataFusionError::NotImplemented(format!(
-                        "Aggregated function not found: function reference = {:?}",
-                        f.function_reference
-                    ))),
-                };
-
-                let mut args: Vec<Expr> = vec![];
-                for arg in f.arguments.iter() {
+                ScalarFunctionType::Not => {
+                    let arg = f.arguments.first().ok_or_else(|| {

Review Comment:
   I'm not sure about this. In the previous code (like deserializing ReadRel) we don't care about extra information from the proto. I.e., we only care about if we can get the data necessary to construct our plan, and just ignore other extra things. But on the other hand, redundant things sometimes indicate an error or unexpected behavior (like finding a remaining screw after recovering something 🫣).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#issuecomment-1623821971

   @nseekhao  fyi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] crepererum commented on pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "crepererum (via GitHub)" <gi...@apache.org>.
crepererum commented on PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#issuecomment-1620221775

   > This patch doesn't use this [definition](https://substrait.io/extensions/functions_string/#like) as our `Expr::Like` has one more field `escape_char` (but it's not used/supported in the physical plan lol).
   
   IIRC I've ran into this issue before as well. I think we should just remove that field and bail out during the SQL->Logical lowering if the SQL parser encounters that (because it seems the logical expr. is modeled after SQL and the physical after whats possible in arrow at the moment). Or we decide that this is a feature we actually want and fix the logical->physical lowering.
   
   > Maybe we can combine `Expr::Like` and `Expr::ILike` by adding a field `ignore_case`?
   
   Yes please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] nseekhao commented on a diff in pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "nseekhao (via GitHub)" <gi...@apache.org>.
nseekhao commented on code in PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#discussion_r1258478050


##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -819,116 +840,38 @@ pub async fn from_substrait_rex(
                                 ),
                             })))
                         }
-                        Ok(ScalarFunctionType::Builtin(fun)) => {
-                            Ok(Arc::new(Expr::ScalarFunction(expr::ScalarFunction {
-                                fun,
-                                args: vec![
-                                    from_substrait_rex(l, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone(),
-                                    from_substrait_rex(r, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone(),
-                                ],
-                            })))
-                        }
-                        Ok(ScalarFunctionType::Not) => {
-                            Err(DataFusionError::NotImplemented(
-                                "Not expected function type: Not".to_string(),
-                            ))
-                        }
-                        Err(e) => Err(e),
-                    }
-                }
-                (l, r) => Err(DataFusionError::NotImplemented(format!(
-                    "Invalid arguments for binary expression: {l:?} and {r:?}"
-                ))),
-            },
-            // ScalarFunction or Expr::Not
-            1 => {
-                let fun = match extensions.get(&f.function_reference) {
-                    Some(fname) => scalar_function_or_not(fname),
-                    None => Err(DataFusionError::NotImplemented(format!(
-                        "Function not found: function reference = {:?}",
-                        f.function_reference
-                    ))),
-                };
-
-                match fun {
-                    Ok(ScalarFunctionType::Op(_)) => {
-                        Err(DataFusionError::NotImplemented(
-                            "Not expected function type: Op".to_string(),
-                        ))
-                    }
-                    Ok(scalar_function_type) => {
-                        match &f.arguments.first().unwrap().arg_type {
-                            Some(ArgType::Value(e)) => {
-                                let expr =
-                                    from_substrait_rex(e, input_schema, extensions)
-                                        .await?
-                                        .as_ref()
-                                        .clone();
-                                match scalar_function_type {
-                                    ScalarFunctionType::Builtin(fun) => Ok(Arc::new(
-                                        Expr::ScalarFunction(expr::ScalarFunction {
-                                            fun,
-                                            args: vec![expr],
-                                        }),
-                                    )),
-                                    ScalarFunctionType::Not => {
-                                        Ok(Arc::new(Expr::Not(Box::new(expr))))
-                                    }
-                                    _ => Err(DataFusionError::NotImplemented(
-                                        "Invalid arguments for Not expression"
-                                            .to_string(),
-                                    )),
-                                }
-                            }
-                            _ => Err(DataFusionError::NotImplemented(
-                                "Invalid arguments for Not expression".to_string(),
-                            )),
-                        }
+                        (l, r) => Err(DataFusionError::NotImplemented(format!(
+                            "Invalid arguments for binary expression: {l:?} and {r:?}"
+                        ))),
                     }
-                    Err(e) => Err(e),
                 }
-            }
-            // ScalarFunction
-            _ => {
-                let fun = match extensions.get(&f.function_reference) {
-                    Some(fname) => BuiltinScalarFunction::from_str(fname),
-                    None => Err(DataFusionError::NotImplemented(format!(
-                        "Aggregated function not found: function reference = {:?}",
-                        f.function_reference
-                    ))),
-                };
-
-                let mut args: Vec<Expr> = vec![];
-                for arg in f.arguments.iter() {
+                ScalarFunctionType::Not => {
+                    let arg = f.arguments.first().ok_or_else(|| {

Review Comment:
   Should we check that `f.arguments.len() == 1` is true, since `first().ok_or_else()` would only give us an error if the vector is empty? Or do we not care if there's more than one argument?



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1342,3 +1285,67 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(
+    case_insensitive: bool,
+    f: &ScalarFunction,
+    input_schema: &DFSchema,
+    extensions: &HashMap<u32, &String>,
+) -> Result<Arc<Expr>> {
+    let fn_name = if case_insensitive { "ILIKE" } else { "LIKE" };
+    if f.arguments.len() != 3 {
+        return Err(DataFusionError::NotImplemented(format!(
+            "Expect three arguments for `{fn_name}` expr"
+        )));
+    }
+
+    let Some(ArgType::Value(expr_substrait)) = &f.arguments[0].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            format!("Invalid arguments type for `{}` expr", fn_name)
+        ))
+    };

Review Comment:
   I'm not sure if
   ```Rust
   format!("Invalid arguments type for `{}` expr", fn_name)
   ```
   or 
   ```Rust
   format!("Invalid arguments type for `{fn_name}` expr")
   ```
   is preferred in Rust, but maybe we should choose one to be consistent here? It may be confusing to the reader if different syntaxes are used to implement the same semantic.



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1342,3 +1285,67 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(
+    case_insensitive: bool,
+    f: &ScalarFunction,
+    input_schema: &DFSchema,
+    extensions: &HashMap<u32, &String>,
+) -> Result<Arc<Expr>> {
+    let fn_name = if case_insensitive { "ILIKE" } else { "LIKE" };
+    if f.arguments.len() != 3 {
+        return Err(DataFusionError::NotImplemented(format!(
+            "Expect three arguments for `{fn_name}` expr"
+        )));
+    }
+
+    let Some(ArgType::Value(expr_substrait)) = &f.arguments[0].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            format!("Invalid arguments type for `{}` expr", fn_name)
+        ))
+    };
+    let expr = from_substrait_rex(expr_substrait, input_schema, extensions)
+        .await?
+        .as_ref()
+        .clone();
+    let Some(ArgType::Value(pattern_substrait)) = &f.arguments[1].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            "Invalid arguments type for `Like` expr".to_string()
+        ))
+    };
+    let pattern = from_substrait_rex(pattern_substrait, input_schema, extensions)
+        .await?
+        .as_ref()
+        .clone();
+    let Some(ArgType::Value(escape_char_substrait)) = &f.arguments[2].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            "Invalid arguments type for `Like` expr".to_string()

Review Comment:
   Same comment as above.



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1342,3 +1285,67 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(
+    case_insensitive: bool,
+    f: &ScalarFunction,
+    input_schema: &DFSchema,
+    extensions: &HashMap<u32, &String>,
+) -> Result<Arc<Expr>> {
+    let fn_name = if case_insensitive { "ILIKE" } else { "LIKE" };
+    if f.arguments.len() != 3 {
+        return Err(DataFusionError::NotImplemented(format!(
+            "Expect three arguments for `{fn_name}` expr"
+        )));
+    }
+
+    let Some(ArgType::Value(expr_substrait)) = &f.arguments[0].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            format!("Invalid arguments type for `{}` expr", fn_name)
+        ))
+    };
+    let expr = from_substrait_rex(expr_substrait, input_schema, extensions)
+        .await?
+        .as_ref()
+        .clone();
+    let Some(ArgType::Value(pattern_substrait)) = &f.arguments[1].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            "Invalid arguments type for `Like` expr".to_string()

Review Comment:
   Format with `fn_name` here in place of `Like` as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] nseekhao commented on a diff in pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "nseekhao (via GitHub)" <gi...@apache.org>.
nseekhao commented on code in PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#discussion_r1262760162


##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1342,3 +1285,67 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(
+    case_insensitive: bool,
+    f: &ScalarFunction,
+    input_schema: &DFSchema,
+    extensions: &HashMap<u32, &String>,
+) -> Result<Arc<Expr>> {
+    let fn_name = if case_insensitive { "ILIKE" } else { "LIKE" };
+    if f.arguments.len() != 3 {
+        return Err(DataFusionError::NotImplemented(format!(
+            "Expect three arguments for `{fn_name}` expr"
+        )));
+    }
+
+    let Some(ArgType::Value(expr_substrait)) = &f.arguments[0].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            format!("Invalid arguments type for `{}` expr", fn_name)
+        ))
+    };

Review Comment:
   > BTW, clippy has a lint about this style [uninlined_format_args](https://rust-lang.github.io/rust-clippy/master/index.html#/uninlined_format_args). (but maybe it's unnecessary to enable it?
   
   Oh thanks for the reference! Inlining the args make sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] nseekhao commented on a diff in pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "nseekhao (via GitHub)" <gi...@apache.org>.
nseekhao commented on code in PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#discussion_r1254887499


##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -67,8 +67,12 @@ use crate::variation_const::{
 enum ScalarFunctionType {
     Builtin(BuiltinScalarFunction),
     Op(Operator),
-    // logical negation
+    /// [Expr::Not]
     Not,
+    /// [Expr::Like]
+    Like,
+    /// [Expr::ILike]
+    ILike,

Review Comment:
   Might be more informative to add the something like:
   ```suggestion
       /// [Expr::Like] Used for filtering rows based on the given wildcard pattern. Case sensitive
       Like,
       /// [Expr::ILike] Case insensitive operator counterpart of `Like`
       ILike,
   ```



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -104,7 +108,7 @@ pub fn name_to_op(name: &str) -> Result<Operator> {
     }
 }
 
-fn name_to_op_or_scalar_function(name: &str) -> Result<ScalarFunctionType> {
+fn scalar_function_or_expr(name: &str) -> Result<ScalarFunctionType> {

Review Comment:
   Would a name like `scalar_function_type_from_str()` describe this function better?



##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -903,6 +904,36 @@ pub fn to_substrait_rex(
                 bounds,
             ))
         }
+        Expr::Like(Like {
+            negated,
+            expr,
+            pattern,
+            escape_char,
+        }) => make_substrait_like_expr(
+            false,
+            *negated,
+            expr,
+            pattern,
+            *escape_char,
+            schema,
+            col_ref_offset,
+            extension_info,
+        ),
+        Expr::ILike(Like {
+            negated,
+            expr,
+            pattern,
+            escape_char,
+        }) => make_substrait_like_expr(
+            true,
+            *negated,
+            expr,
+            pattern,
+            *escape_char,
+            schema,
+            col_ref_offset,

Review Comment:
   [For future improvement. **Not completely related to this PR**]
   
   I think having to carry around `col_ref_offset` for any expression-related functions unnecessarily overcrowds the code. Once we have `SubqueryAlias` support implemented, this should not be necessary anymore. I'll refactor the code when that happens.



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1329,3 +1272,66 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(
+    case_insensitive: bool,
+    f: &ScalarFunction,
+    input_schema: &DFSchema,
+    extensions: &HashMap<u32, &String>,
+) -> Result<Arc<Expr>> {
+    if f.arguments.len() != 3 {
+        return Err(DataFusionError::NotImplemented(
+            "Expect three arguments for `LIKE` expr".to_string(),
+        ));
+    }

Review Comment:
   ```suggestion
       let fn_name = if case_insensitive {"ILIKE"} else {"LIKE"};
       if f.arguments.len() != 3 {
           return Err(DataFusionError::NotImplemented(
               format!("Expect three arguments for `{fn_name}` expr")
           ));
       }
   ```



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1329,3 +1272,66 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(
+    case_insensitive: bool,
+    f: &ScalarFunction,
+    input_schema: &DFSchema,
+    extensions: &HashMap<u32, &String>,
+) -> Result<Arc<Expr>> {
+    if f.arguments.len() != 3 {
+        return Err(DataFusionError::NotImplemented(
+            "Expect three arguments for `LIKE` expr".to_string(),
+        ));
+    }
+
+    let Some(ArgType::Value(expr_substrait)) = &f.arguments[0].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            "Invalid arguments type for `Like` expr".to_string()

Review Comment:
   ```suggestion
               format!("Invalid arguments type for `{}` expr", fn_name)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waynexia commented on a diff in pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "waynexia (via GitHub)" <gi...@apache.org>.
waynexia commented on code in PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#discussion_r1251967011


##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -1120,6 +1151,70 @@ fn make_substrait_window_function(
     }
 }
 
+#[allow(deprecated)]
+fn make_substrait_like_expr(

Review Comment:
   Maybe we can combine `Expr::Like` and `Expr::ILike` by adding a field `ignore_case`? cc @alamb 



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1329,3 +1272,66 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(

Review Comment:
   Maybe it's time to refactor these large `producer.rs` and `consumer.rs` into several small files and organize them by functionality (e.g., produce `like` and consume `like`) 



##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -1120,6 +1151,70 @@ fn make_substrait_window_function(
     }
 }
 
+#[allow(deprecated)]
+fn make_substrait_like_expr(
+    ignore_case: bool,
+    negated: bool,
+    expr: &Box<Expr>,
+    pattern: &Box<Expr>,
+    escape_char: Option<char>,
+    schema: &DFSchemaRef,
+    col_ref_offset: usize,
+    extension_info: &mut (
+        Vec<extensions::SimpleExtensionDeclaration>,
+        HashMap<String, u32>,
+    ),
+) -> Result<Expression> {
+    let function_anchor = if ignore_case {
+        _register_function("ilike".to_string(), extension_info)
+    } else {
+        _register_function("like".to_string(), extension_info)
+    };
+    let expr = to_substrait_rex(expr, schema, col_ref_offset, extension_info)?;
+    let pattern = to_substrait_rex(pattern, schema, col_ref_offset, extension_info)?;
+    let escape_char =
+        to_substrait_literal(&ScalarValue::Utf8(escape_char.map(|c| c.to_string())))?;
+    let arguments = vec![
+        FunctionArgument {
+            arg_type: Some(ArgType::Value(expr)),
+        },
+        FunctionArgument {
+            arg_type: Some(ArgType::Value(pattern)),
+        },
+        FunctionArgument {
+            arg_type: Some(ArgType::Value(escape_char)),
+        },
+    ];
+
+    let substrait_like = Expression {
+        rex_type: Some(RexType::ScalarFunction(ScalarFunction {
+            function_reference: function_anchor,
+            arguments,
+            output_type: None,
+            args: vec![],
+            options: vec![],
+        })),
+    };
+
+    if negated {
+        let function_anchor = _register_function("not".to_string(), extension_info);
+
+        Ok(Expression {

Review Comment:
   This pattern is very common. I'm thinking of wrapping it into something like `LogicalPlanBuilder`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waynexia merged pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "waynexia (via GitHub)" <gi...@apache.org>.
waynexia merged PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waynexia commented on a diff in pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "waynexia (via GitHub)" <gi...@apache.org>.
waynexia commented on code in PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#discussion_r1260683571


##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1342,3 +1285,67 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(
+    case_insensitive: bool,
+    f: &ScalarFunction,
+    input_schema: &DFSchema,
+    extensions: &HashMap<u32, &String>,
+) -> Result<Arc<Expr>> {
+    let fn_name = if case_insensitive { "ILIKE" } else { "LIKE" };
+    if f.arguments.len() != 3 {
+        return Err(DataFusionError::NotImplemented(format!(
+            "Expect three arguments for `{fn_name}` expr"
+        )));
+    }
+
+    let Some(ArgType::Value(expr_substrait)) = &f.arguments[0].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            format!("Invalid arguments type for `{}` expr", fn_name)
+        ))
+    };

Review Comment:
   Okay 👍 
   
   BTW, clippy has a lint about this style [`uninlined_format_args`](https://rust-lang.github.io/rust-clippy/master/index.html#/uninlined_format_args). (but maybe it's unnecessary to enable it?



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1342,3 +1285,67 @@ fn from_substrait_null(null_type: &Type) -> Result<ScalarValue> {
         ))
     }
 }
+
+async fn make_datafusion_like(
+    case_insensitive: bool,
+    f: &ScalarFunction,
+    input_schema: &DFSchema,
+    extensions: &HashMap<u32, &String>,
+) -> Result<Arc<Expr>> {
+    let fn_name = if case_insensitive { "ILIKE" } else { "LIKE" };
+    if f.arguments.len() != 3 {
+        return Err(DataFusionError::NotImplemented(format!(
+            "Expect three arguments for `{fn_name}` expr"
+        )));
+    }
+
+    let Some(ArgType::Value(expr_substrait)) = &f.arguments[0].arg_type else {
+        return Err(DataFusionError::NotImplemented(
+            format!("Invalid arguments type for `{}` expr", fn_name)
+        ))
+    };

Review Comment:
   Okay 👍 
   
   BTW, clippy has a lint about this style [`uninlined_format_args`](https://rust-lang.github.io/rust-clippy/master/index.html#/uninlined_format_args). (but maybe it's unnecessary to enable it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waynexia commented on pull request #6840: feat: implement substrait for LIKE/ILIKE expr

Posted by "waynexia (via GitHub)" <gi...@apache.org>.
waynexia commented on PR #6840:
URL: https://github.com/apache/arrow-datafusion/pull/6840#issuecomment-1620299220

   > > This patch doesn't use this [definition](https://substrait.io/extensions/functions_string/#like) as our `Expr::Like` has one more field `escape_char` (but it's not used/supported in the physical plan lol).
   > 
   > IIRC I've ran into this issue before as well. I think we should just remove that field and bail out during the SQL->Logical lowering if the SQL parser encounters that (because it seems the logical expr. is modeled after SQL and the physical after whats possible in arrow at the moment). Or we decide that this is a feature we actually want and fix the logical->physical lowering.
   
   This is also a solution. I have a little background about this feature. But this is supported in PG ([doc](https://www.postgresql.org/docs/7.3/functions-matching.html)), so maybe we are going to implement it in the future?
   
   > > Maybe we can combine `Expr::Like` and `Expr::ILike` by adding a field `ignore_case`?
   > 
   > Yes please.
   
   #6841


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org