You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by we...@apache.org on 2019/04/09 05:50:04 UTC

[spark] branch master updated: [SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new f16dfb9  [SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc
f16dfb9 is described below

commit f16dfb91296c49c4a3f82a77327d235b210f6226
Author: Hyukjin Kwon <gu...@apache.org>
AuthorDate: Tue Apr 9 13:49:42 2019 +0800

    [SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc
    
    ## What changes were proposed in this pull request?
    
    This PR proposes to two things:
    
    1. Add `deprecated` field to `ExpressionDescription` so that it can be shown in our SQL function documentation (https://spark.apache.org/docs/latest/api/sql/), and it can be shown via `DESCRIBE FUNCTION EXTENDED`.
    
    2. While I am here, add some more restrictions for `note()` and `since()`. Looks some documentations are broken due to malformed `note`:
    
        ![Screen Shot 2019-03-31 at 3 00 53 PM](https://user-images.githubusercontent.com/6477701/55285518-a3e88500-53c8-11e9-9e99-41d857794fbe.png)
    
        It should start with 4 spaces and end with a newline. I added some asserts, and fixed the instances together while I am here. This is technically a breaking change but I think it's too trivial to note somewhere (and we're in Spark 3.0.0).
    
    This PR adds `deprecated` property into `from_utc_timestamp` and `to_utc_timestamp` (it's deprecated as of #24195) as examples of using this field.
    
    Now it shows the deprecation information as below:
    
    - **SQL documentation is shown as below:**
    
        ![Screen Shot 2019-03-31 at 3 07 31 PM](https://user-images.githubusercontent.com/6477701/55285537-2113fa00-53c9-11e9-9932-f5693a03332d.png)
    
    - **`DESCRIBE FUNCTION EXTENDED from_utc_timestamp;`**:
    
        ```
        Function: from_utc_timestamp
        Class: org.apache.spark.sql.catalyst.expressions.FromUTCTimestamp
        Usage: from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
        Extended Usage:
            Examples:
              > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
               2016-08-31 09:00:00
    
            Since: 1.5.0
    
            Deprecated:
              Deprecated since 3.0.0. See SPARK-25496.
    
        ```
    
    ## How was this patch tested?
    
    Manually tested via:
    
    - For documentation verification:
    
        ```
        $ cd sql
        $ sh create-docs.sh
        ```
    
    - For checking description:
    
        ```
        $ ./bin/spark-sql
        ```
        ```
        spark-sql> DESCRIBE FUNCTION EXTENDED from_utc_timestamp;
        spark-sql> DESCRIBE FUNCTION EXTENDED to_utc_timestamp;
        ```
    
    Closes #24259 from HyukjinKwon/SPARK-27328.
    
    Authored-by: Hyukjin Kwon <gu...@apache.org>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 .../expressions/ExpressionDescription.java         | 64 +++++++++++++++-------
 .../sql/catalyst/expressions/ExpressionInfo.java   | 35 ++++++++++--
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  5 +-
 .../expressions/collectionOperations.scala         | 12 +++-
 .../catalyst/expressions/complexTypeCreator.scala  |  1 +
 .../catalyst/expressions/datetimeExpressions.scala | 10 +++-
 .../spark/sql/catalyst/expressions/misc.scala      |  4 +-
 .../catalyst/expressions/randomExpressions.scala   |  8 ++-
 sql/gen-sql-markdown.py                            | 33 ++++++++++-
 9 files changed, 136 insertions(+), 36 deletions(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
index ea6fffa..6a64cb1 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
@@ -31,35 +31,58 @@ import java.lang.annotation.RetentionPolicy;
  * `usage()` will be used for the function usage in brief way.
  *
  * These below are concatenated and used for the function usage in verbose way, suppose arguments,
- * examples, note and since will be provided.
+ * examples, note, since and deprecated will be provided.
  *
- * `arguments()` describes arguments for the expression. This should follow the format as below:
+ * `arguments()` describes arguments for the expression.
  *
- *   Arguments:
- *     * arg0 - ...
- *         ....
- *     * arg1 - ...
- *         ....
- *
- * `examples()` describes examples for the expression. This should follow the format as below:
- *
- *   Examples:
- *     > SELECT ...;
- *      ...
- *     > SELECT ...;
- *      ...
+ * `examples()` describes examples for the expression.
  *
  * `note()` contains some notes for the expression optionally.
  *
  * `since()` contains version information for the expression. Version is specified by,
  * for example, "2.2.0".
  *
- *  We can refer the function name by `_FUNC_`, in `usage`, `arguments` and `examples`, as it's
- *  registered in `FunctionRegistry`.
+ * `deprecated()` contains deprecation information for the expression optionally, for example,
+ * "Deprecated since 2.2.0. Use something else instead".
+ *
+ * The format, in particular for `arguments()`, `examples()`,`note()`, `since()` and
+ * `deprecated()`, should strictly be as follows.
+ *
+ * <pre>
+ * <code>@ExpressionDescription(
+ *   ...
+ *   arguments = """
+ *     Arguments:
+ *       * arg0 - ...
+ *           ....
+ *       * arg1 - ...
+ *           ....
+ *   """,
+ *   examples = """
+ *     Examples:
+ *       > SELECT ...;
+ *        ...
+ *       > SELECT ...;
+ *        ...
+ *   """,
+ *   note = """
+ *     ...
+ *   """,
+ *   since = "3.0.0",
+ *   deprecated = """
+ *     ...
+ *   """)
+ * </code>
+ * </pre>
+ *
+ *  We can refer the function name by `_FUNC_`, in `usage()`, `arguments()` and `examples()` as
+ *  it is registered in `FunctionRegistry`.
+ *
+ *  Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()`, `since()` and
+ *  `deprecated()` should be not defined together. `extended()` exists for backward compatibility.
  *
- *  Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()` and `since()` will
- *  be ignored and `extended()` will be used for the extended description for backward
- *  compatibility.
+ *  Note this contents are used in the SparkSQL documentation for built-in functions. The contents
+ *  here are considered as a Markdown text and then rendered.
  */
 @DeveloperApi
 @Retention(RetentionPolicy.RUNTIME)
@@ -70,4 +93,5 @@ public @interface ExpressionDescription {
     String examples() default "";
     String note() default "";
     String since() default "";
+    String deprecated() default "";
 }
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
index 20bdd87..0c69942 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
@@ -30,6 +30,7 @@ public class ExpressionInfo {
     private String examples;
     private String note;
     private String since;
+    private String deprecated;
 
     public String getClassName() {
         return className;
@@ -63,6 +64,10 @@ public class ExpressionInfo {
         return note;
     }
 
+    public String getDeprecated() {
+        return deprecated;
+    }
+
     public String getDb() {
         return db;
     }
@@ -75,13 +80,15 @@ public class ExpressionInfo {
             String arguments,
             String examples,
             String note,
-            String since) {
+            String since,
+            String deprecated) {
         assert name != null;
         assert arguments != null;
         assert examples != null;
         assert examples.isEmpty() || examples.contains("    Examples:");
         assert note != null;
         assert since != null;
+        assert deprecated != null;
 
         this.className = className;
         this.db = db;
@@ -91,6 +98,7 @@ public class ExpressionInfo {
         this.examples = examples;
         this.note = note;
         this.since = since;
+        this.deprecated = deprecated;
 
         // Make the extended description.
         this.extended = arguments + examples;
@@ -98,25 +106,44 @@ public class ExpressionInfo {
             this.extended = "\n    No example/argument for _FUNC_.\n";
         }
         if (!note.isEmpty()) {
+            if (!note.contains("    ") || !note.endsWith("  ")) {
+                throw new IllegalArgumentException("'note' is malformed in the expression [" +
+                    this.name + "]. It should start with a newline and 4 leading spaces; end " +
+                    "with a newline and two spaces; however, got [" + note + "].");
+            }
             this.extended += "\n    Note:\n      " + note.trim() + "\n";
         }
         if (!since.isEmpty()) {
+            if (Integer.parseInt(since.split("\\.")[0]) < 0) {
+                throw new IllegalArgumentException("'since' is malformed in the expression [" +
+                    this.name + "]. It should not start with a negative number; however, " +
+                    "got [" + since + "].");
+            }
             this.extended += "\n    Since: " + since + "\n";
         }
+        if (!deprecated.isEmpty()) {
+            if (!deprecated.contains("    ") || !deprecated.endsWith("  ")) {
+                throw new IllegalArgumentException("'deprecated' is malformed in the " +
+                    "expression [" + this.name + "]. It should start with a newline and 4 " +
+                    "leading spaces; end with a newline and two spaces; however, got [" +
+                    deprecated + "].");
+            }
+            this.extended += "\n    Deprecated:\n      " + deprecated.trim() + "\n";
+        }
     }
 
     public ExpressionInfo(String className, String name) {
-        this(className, null, name, null, "", "", "", "");
+        this(className, null, name, null, "", "", "", "", "");
     }
 
     public ExpressionInfo(String className, String db, String name) {
-        this(className, db, name, null, "", "", "", "");
+        this(className, db, name, null, "", "", "", "", "");
     }
 
     // This is to keep the original constructor just in case.
     public ExpressionInfo(String className, String db, String name, String usage, String extended) {
         // `arguments` and `examples` are concatenated for the extended description. So, here
         // simply pass the `extended` as `arguments` and an empty string for `examples`.
-        this(className, db, name, usage, extended, "", "", "");
+        this(className, db, name, usage, extended, "", "", "", "");
     }
 }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 46cf0f9..deb53cf 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -621,7 +621,7 @@ object FunctionRegistry {
     val clazz = scala.reflect.classTag[Cast].runtimeClass
     val usage = "_FUNC_(expr) - Casts the value `expr` to the target data type `_FUNC_`."
     val expressionInfo =
-      new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "")
+      new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "", "")
     (name, (expressionInfo, builder))
   }
 
@@ -641,7 +641,8 @@ object FunctionRegistry {
           df.arguments(),
           df.examples(),
           df.note(),
-          df.since())
+          df.since(),
+          df.deprecated())
       } else {
         // This exists for the backward compatibility with old `ExpressionDescription`s defining
         // the extended description in `extended()`.
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 2a7488a..41d9b06 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -958,7 +958,9 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLi
       > SELECT _FUNC_(array(1, 20, null, 3));
        [20,null,3,1]
   """,
-  note = "The function is non-deterministic.",
+  note = """
+    The function is non-deterministic.
+  """,
   since = "2.4.0")
 case class Shuffle(child: Expression, randomSeed: Option[Long] = None)
   extends UnaryExpression with ExpectsInputTypes with Stateful with ExpressionWithRandomSeed {
@@ -1042,7 +1044,9 @@ case class Shuffle(child: Expression, randomSeed: Option[Long] = None)
        [3,4,1,2]
   """,
   since = "1.5.0",
-  note = "Reverse logic for arrays is available since 2.4.0."
+  note = """
+    Reverse logic for arrays is available since 2.4.0.
+  """
 )
 case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
 
@@ -2056,7 +2060,9 @@ case class ElementAt(left: Expression, right: Expression)
       > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
        [1,2,3,4,5,6]
   """,
-  note = "Concat logic for arrays is available since 2.4.0.")
+  note = """
+    Concat logic for arrays is available since 2.4.0.
+  """)
 case class Concat(children: Seq[Expression]) extends ComplexTypeMergingExpression {
 
   private def allowedTypes: Seq[AbstractDataType] = Seq(StringType, BinaryType, ArrayType)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index 4e722c9..8d3a641 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -288,6 +288,7 @@ object CreateStruct extends FunctionBuilder {
       "",
       "",
       "",
+      "",
       "")
     ("struct", (info, this))
   }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 784425e..5fb0b85 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -1018,7 +1018,10 @@ case class TimeAdd(start: Expression, interval: Expression, timeZoneId: Option[S
       > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
        2016-08-31 09:00:00
   """,
-  since = "1.5.0")
+  since = "1.5.0",
+  deprecated = """
+    Deprecated since 3.0.0. See SPARK-25496.
+  """)
 // scalastyle:on line.size.limit
 case class FromUTCTimestamp(left: Expression, right: Expression)
   extends BinaryExpression with ImplicitCastInputTypes {
@@ -1229,7 +1232,10 @@ case class MonthsBetween(
       > SELECT _FUNC_('2016-08-31', 'Asia/Seoul');
        2016-08-30 15:00:00
   """,
-  since = "1.5.0")
+  since = "1.5.0",
+  deprecated = """
+    Deprecated since 3.0.0. See SPARK-25496.
+  """)
 // scalastyle:on line.size.limit
 case class ToUTCTimestamp(left: Expression, right: Expression)
   extends BinaryExpression with ImplicitCastInputTypes {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
index 1f1decc..2af2b13 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
@@ -125,7 +125,9 @@ case class CurrentDatabase() extends LeafExpression with Unevaluable {
       > SELECT _FUNC_();
        46707d92-02f4-4817-8116-a4c3b23e6266
   """,
-  note = "The function is non-deterministic.")
+  note = """
+    The function is non-deterministic.
+  """)
 // scalastyle:on line.size.limit
 case class Uuid(randomSeed: Option[Long] = None) extends LeafExpression with Stateful
     with ExpressionWithRandomSeed {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
index b70c341..91a8ac0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
@@ -78,7 +78,9 @@ trait ExpressionWithRandomSeed {
       > SELECT _FUNC_(null);
        0.8446490682263027
   """,
-  note = "The function is non-deterministic in general case.")
+  note = """
+    The function is non-deterministic in general case.
+  """)
 // scalastyle:on line.size.limit
 case class Rand(child: Expression) extends RDG with ExpressionWithRandomSeed {
 
@@ -118,7 +120,9 @@ object Rand {
       > SELECT _FUNC_(null);
        1.1164209726833079
   """,
-  note = "The function is non-deterministic in general case.")
+  note = """
+    The function is non-deterministic in general case.
+  """)
 // scalastyle:on line.size.limit
 case class Randn(child: Expression) extends RDG with ExpressionWithRandomSeed {
 
diff --git a/sql/gen-sql-markdown.py b/sql/gen-sql-markdown.py
index fa8124b..e0529f8 100644
--- a/sql/gen-sql-markdown.py
+++ b/sql/gen-sql-markdown.py
@@ -20,7 +20,7 @@ import os
 from collections import namedtuple
 
 ExpressionInfo = namedtuple(
-    "ExpressionInfo", "className name usage arguments examples note since")
+    "ExpressionInfo", "className name usage arguments examples note since deprecated")
 
 
 def _list_function_infos(jvm):
@@ -42,7 +42,8 @@ def _list_function_infos(jvm):
             arguments=jinfo.getArguments().replace("_FUNC_", name),
             examples=jinfo.getExamples().replace("_FUNC_", name),
             note=jinfo.getNote(),
-            since=jinfo.getSince()))
+            since=jinfo.getSince(),
+            deprecated=jinfo.getDeprecated()))
     return sorted(infos, key=lambda i: i.name)
 
 
@@ -136,6 +137,27 @@ def _make_pretty_note(note):
         return "**Note:**\n%s\n" % note
 
 
+def _make_pretty_deprecated(deprecated):
+    """
+    Makes the deprecated description pretty and returns a formatted string if `deprecated`
+    is not an empty string. Otherwise, returns None.
+
+    Expected input:
+
+        ...
+
+    Expected output:
+    **Deprecated:**
+
+    ...
+
+    """
+
+    if deprecated != "":
+        deprecated = "\n".join(map(lambda n: n[4:], deprecated.split("\n")))
+        return "**Deprecated:**\n%s\n" % deprecated
+
+
 def generate_sql_markdown(jvm, path):
     """
     Generates a markdown file after listing the function information. The output file
@@ -162,6 +184,10 @@ def generate_sql_markdown(jvm, path):
 
     **Since:** SINCE
 
+    **Deprecated:**
+
+    DEPRECATED
+
     <br/>
 
     """
@@ -174,6 +200,7 @@ def generate_sql_markdown(jvm, path):
             examples = _make_pretty_examples(info.examples)
             note = _make_pretty_note(info.note)
             since = info.since
+            deprecated = _make_pretty_deprecated(info.deprecated)
 
             mdfile.write("### %s\n\n" % name)
             if usage is not None:
@@ -186,6 +213,8 @@ def generate_sql_markdown(jvm, path):
                 mdfile.write(note)
             if since is not None and since != "":
                 mdfile.write("**Since:** %s\n\n" % since.strip())
+            if deprecated is not None:
+                mdfile.write(deprecated)
             mdfile.write("<br/>\n\n")
 
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org