You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "MaxGekk (via GitHub)" <gi...@apache.org> on 2023/05/17 06:51:20 UTC

[GitHub] [spark] MaxGekk commented on a diff in pull request #41169: [SPARK-43493][SQL] Add a max distance argument to the levenshtein() function

MaxGekk commented on code in PR #41169:
URL: https://github.com/apache/spark/pull/41169#discussion_r1196017647


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##########
@@ -2134,30 +2134,145 @@ case class OctetLength(child: Expression)
  * A function that return the Levenshtein distance between the two given strings.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str1, str2) - Returns the Levenshtein distance between the two given strings.",
+  usage = """
+    _FUNC_(str1, str2) - Returns the Levenshtein distance between the two given strings.
+      If threshold is set and distance more than it, return -1.""",

Review Comment:
   Please, add the parameter that you mentioned: `_FUNC_(str1, str2[, threshold])`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##########
@@ -2134,30 +2134,145 @@ case class OctetLength(child: Expression)
  * A function that return the Levenshtein distance between the two given strings.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str1, str2) - Returns the Levenshtein distance between the two given strings.",
+  usage = """
+    _FUNC_(str1, str2) - Returns the Levenshtein distance between the two given strings.
+      If threshold is set and distance more than it, return -1.""",
   examples = """
     Examples:
       > SELECT _FUNC_('kitten', 'sitting');
        3
+      > SELECT _FUNC_('kitten', 'sitting', 2);
+       -1
   """,
   since = "1.5.0",
   group = "string_funcs")
-case class Levenshtein(left: Expression, right: Expression) extends BinaryExpression
-    with ImplicitCastInputTypes with NullIntolerant {
+case class Levenshtein(
+    left: Expression,
+    right: Expression,
+    threshold: Option[Expression] = None)
+  extends Expression

Review Comment:
   How about to extend `TernaryExpression`, see for instance:
   https://github.com/apache/spark/blob/e37c74b3ef40aebd33ca9338210300a1ed1164f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L1974-L1979
   
   Or there are some reasons to don't do that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org