Posted to commits@spark.apache.org by we...@apache.org on 2019/04/09 05:50:04 UTC
[spark] branch master updated: [SPARK-27328][SQL] Add 'deprecated'
in ExpressionDescription for extended usage and SQL doc
This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new f16dfb9 [SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc
f16dfb9 is described below
commit f16dfb91296c49c4a3f82a77327d235b210f6226
Author: Hyukjin Kwon <gu...@apache.org>
AuthorDate: Tue Apr 9 13:49:42 2019 +0800
[SPARK-27328][SQL] Add 'deprecated' in ExpressionDescription for extended usage and SQL doc
## What changes were proposed in this pull request?
This PR proposes two things:
1. Add a `deprecated` field to `ExpressionDescription` so that it can be shown in our SQL function documentation (https://spark.apache.org/docs/latest/api/sql/) and via `DESCRIBE FUNCTION EXTENDED`.
2. While I am here, add some more restrictions for `note()` and `since()`. It looks like some documentation is broken due to a malformed `note`:
![Screen Shot 2019-03-31 at 3 00 53 PM](https://user-images.githubusercontent.com/6477701/55285518-a3e88500-53c8-11e9-9e99-41d857794fbe.png)
It should start with 4 spaces and end with a newline. I added some asserts, and fixed the instances together while I am here. This is technically a breaking change but I think it's too trivial to note somewhere (and we're in Spark 3.0.0).
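As a rough illustration of the format rule above, the check can be sketched in Python. This is not the code in the commit (the real check lives in the Java `ExpressionInfo` constructor further down); the function name `check_block_field` is hypothetical, and the exact prefix and suffix tested here are an assumption taken from the wording of the error message in the diff.

```python
def check_block_field(field_name, value, func_name):
    """Raise ValueError if a non-empty `note`/`deprecated` value is malformed.

    Assumed rule (from the error message in the diff): the value starts with
    a newline plus 4 spaces and ends with a newline plus 2 spaces.
    """
    if value == "":
        return  # an empty string means "not provided", which is allowed
    if not value.startswith("\n    ") or not value.endswith("\n  "):
        raise ValueError(
            "'%s' is malformed in the expression [%s]; got [%r]"
            % (field_name, func_name, value))


# A triple-quoted value laid out like the ones fixed in this commit passes.
check_block_field("note", "\n    The function is non-deterministic.\n  ", "shuffle")
```

A single-line value such as `"The function is non-deterministic."` would be rejected by this sketch, which is the kind of instance this PR rewrites into the triple-quoted form.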
This PR adds the `deprecated` property to `from_utc_timestamp` and `to_utc_timestamp` (they are deprecated as of #24195) as examples of using this field.
Now it shows the deprecation information as below:
- **SQL documentation is shown as below:**
![Screen Shot 2019-03-31 at 3 07 31 PM](https://user-images.githubusercontent.com/6477701/55285537-2113fa00-53c9-11e9-9932-f5693a03332d.png)
- **`DESCRIBE FUNCTION EXTENDED from_utc_timestamp;`**:
```
Function: from_utc_timestamp
Class: org.apache.spark.sql.catalyst.expressions.FromUTCTimestamp
Usage: from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
Extended Usage:
Examples:
> SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-31 09:00:00
Since: 1.5.0
Deprecated:
Deprecated since 3.0.0. See SPARK-25496.
```
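The way the `Extended Usage` text above is put together can be sketched as follows. This is a simplified Python rendering of the concatenation done in the Java `ExpressionInfo` constructor, not the actual implementation: `build_extended` is a hypothetical name, the indentation widths inside the string literals are assumptions, and the fallback for functions with no arguments or examples is left out.

```python
def build_extended(arguments, examples, note, since, deprecated):
    """Concatenate the optional sections in the order the commit uses."""
    extended = arguments + examples
    if note != "":
        extended += "\n    Note:\n      " + note.strip() + "\n"
    if since != "":
        extended += "\n    Since: " + since + "\n"
    if deprecated != "":
        # New in this commit: the deprecation text is appended last.
        extended += "\n    Deprecated:\n      " + deprecated.strip() + "\n"
    return extended


text = build_extended(
    arguments="",
    examples="\n    Examples:\n"
             "      > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');\n"
             "       2016-08-31 09:00:00\n",
    note="",
    since="1.5.0",
    deprecated="\n    Deprecated since 3.0.0. See SPARK-25496.\n  ",
)
print(text)
```

Because `deprecated` is appended after `since`, the deprecation notice always appears at the end of the extended description, as in the output shown above.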
## How was this patch tested?
Manually tested via:
- For documentation verification:
```
$ cd sql
$ sh create-docs.sh
```
- For checking description:
```
$ ./bin/spark-sql
```
```
spark-sql> DESCRIBE FUNCTION EXTENDED from_utc_timestamp;
spark-sql> DESCRIBE FUNCTION EXTENDED to_utc_timestamp;
```
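For the documentation side, the formatting applied to the deprecation text can also be exercised in isolation. The sketch below mirrors the `_make_pretty_deprecated` helper added to `sql/gen-sql-markdown.py` in this commit; the name `format_deprecated` and the standalone usage are only for illustration.

```python
def format_deprecated(deprecated):
    """Return a Markdown block for a non-empty `deprecated` string, else None."""
    if deprecated != "":
        # Drop the 4-space indentation that every line is expected to carry.
        body = "\n".join(line[4:] for line in deprecated.split("\n"))
        return "**Deprecated:**\n%s\n" % body
    return None


rendered = format_deprecated("\n    Deprecated since 3.0.0. See SPARK-25496.\n  ")
print(repr(rendered))
# '**Deprecated:**\n\nDeprecated since 3.0.0. See SPARK-25496.\n\n'
```

Slicing off the first four characters of each line is only safe when the value follows the indentation rule, which is one reason the format is enforced when the expression is registered.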
Closes #24259 from HyukjinKwon/SPARK-27328.
Authored-by: Hyukjin Kwon <gu...@apache.org>
Signed-off-by: Wenchen Fan <we...@databricks.com>
---
.../expressions/ExpressionDescription.java | 64 +++++++++++++++-------
.../sql/catalyst/expressions/ExpressionInfo.java | 35 ++++++++++--
.../sql/catalyst/analysis/FunctionRegistry.scala | 5 +-
.../expressions/collectionOperations.scala | 12 +++-
.../catalyst/expressions/complexTypeCreator.scala | 1 +
.../catalyst/expressions/datetimeExpressions.scala | 10 +++-
.../spark/sql/catalyst/expressions/misc.scala | 4 +-
.../catalyst/expressions/randomExpressions.scala | 8 ++-
sql/gen-sql-markdown.py | 33 ++++++++++-
9 files changed, 136 insertions(+), 36 deletions(-)
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
index ea6fffa..6a64cb1 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java
@@ -31,35 +31,58 @@ import java.lang.annotation.RetentionPolicy;
* `usage()` will be used for the function usage in brief way.
*
* These below are concatenated and used for the function usage in verbose way, suppose arguments,
- * examples, note and since will be provided.
+ * examples, note, since and deprecated will be provided.
*
- * `arguments()` describes arguments for the expression. This should follow the format as below:
+ * `arguments()` describes arguments for the expression.
*
- * Arguments:
- * * arg0 - ...
- * ....
- * * arg1 - ...
- * ....
- *
- * `examples()` describes examples for the expression. This should follow the format as below:
- *
- * Examples:
- * > SELECT ...;
- * ...
- * > SELECT ...;
- * ...
+ * `examples()` describes examples for the expression.
*
* `note()` contains some notes for the expression optionally.
*
* `since()` contains version information for the expression. Version is specified by,
* for example, "2.2.0".
*
- * We can refer the function name by `_FUNC_`, in `usage`, `arguments` and `examples`, as it's
- * registered in `FunctionRegistry`.
+ * `deprecated()` contains deprecation information for the expression optionally, for example,
+ * "Deprecated since 2.2.0. Use something else instead".
+ *
+ * The format, in particular for `arguments()`, `examples()`,`note()`, `since()` and
+ * `deprecated()`, should strictly be as follows.
+ *
+ * <pre>
+ * <code>@ExpressionDescription(
+ * ...
+ * arguments = """
+ * Arguments:
+ * * arg0 - ...
+ * ....
+ * * arg1 - ...
+ * ....
+ * """,
+ * examples = """
+ * Examples:
+ * > SELECT ...;
+ * ...
+ * > SELECT ...;
+ * ...
+ * """,
+ * note = """
+ * ...
+ * """,
+ * since = "3.0.0",
+ * deprecated = """
+ * ...
+ * """)
+ * </code>
+ * </pre>
+ *
+ * We can refer the function name by `_FUNC_`, in `usage()`, `arguments()` and `examples()` as
+ * it is registered in `FunctionRegistry`.
+ *
+ * Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()`, `since()` and
+ * `deprecated()` should be not defined together. `extended()` exists for backward compatibility.
*
- * Note that, if `extended()` is defined, `arguments()`, `examples()`, `note()` and `since()` will
- * be ignored and `extended()` will be used for the extended description for backward
- * compatibility.
+ * Note this contents are used in the SparkSQL documentation for built-in functions. The contents
+ * here are considered as a Markdown text and then rendered.
*/
@DeveloperApi
@Retention(RetentionPolicy.RUNTIME)
@@ -70,4 +93,5 @@ public @interface ExpressionDescription {
String examples() default "";
String note() default "";
String since() default "";
+ String deprecated() default "";
}
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
index 20bdd87..0c69942 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java
@@ -30,6 +30,7 @@ public class ExpressionInfo {
private String examples;
private String note;
private String since;
+ private String deprecated;
public String getClassName() {
return className;
@@ -63,6 +64,10 @@ public class ExpressionInfo {
return note;
}
+ public String getDeprecated() {
+ return deprecated;
+ }
+
public String getDb() {
return db;
}
@@ -75,13 +80,15 @@ public class ExpressionInfo {
String arguments,
String examples,
String note,
- String since) {
+ String since,
+ String deprecated) {
assert name != null;
assert arguments != null;
assert examples != null;
assert examples.isEmpty() || examples.contains(" Examples:");
assert note != null;
assert since != null;
+ assert deprecated != null;
this.className = className;
this.db = db;
@@ -91,6 +98,7 @@ public class ExpressionInfo {
this.examples = examples;
this.note = note;
this.since = since;
+ this.deprecated = deprecated;
// Make the extended description.
this.extended = arguments + examples;
@@ -98,25 +106,44 @@ public class ExpressionInfo {
this.extended = "\n No example/argument for _FUNC_.\n";
}
if (!note.isEmpty()) {
+ if (!note.contains(" ") || !note.endsWith(" ")) {
+ throw new IllegalArgumentException("'note' is malformed in the expression [" +
+ this.name + "]. It should start with a newline and 4 leading spaces; end " +
+ "with a newline and two spaces; however, got [" + note + "].");
+ }
this.extended += "\n Note:\n " + note.trim() + "\n";
}
if (!since.isEmpty()) {
+ if (Integer.parseInt(since.split("\\.")[0]) < 0) {
+ throw new IllegalArgumentException("'since' is malformed in the expression [" +
+ this.name + "]. It should not start with a negative number; however, " +
+ "got [" + since + "].");
+ }
this.extended += "\n Since: " + since + "\n";
}
+ if (!deprecated.isEmpty()) {
+ if (!deprecated.contains(" ") || !deprecated.endsWith(" ")) {
+ throw new IllegalArgumentException("'deprecated' is malformed in the " +
+ "expression [" + this.name + "]. It should start with a newline and 4 " +
+ "leading spaces; end with a newline and two spaces; however, got [" +
+ deprecated + "].");
+ }
+ this.extended += "\n Deprecated:\n " + deprecated.trim() + "\n";
+ }
}
public ExpressionInfo(String className, String name) {
- this(className, null, name, null, "", "", "", "");
+ this(className, null, name, null, "", "", "", "", "");
}
public ExpressionInfo(String className, String db, String name) {
- this(className, db, name, null, "", "", "", "");
+ this(className, db, name, null, "", "", "", "", "");
}
// This is to keep the original constructor just in case.
public ExpressionInfo(String className, String db, String name, String usage, String extended) {
// `arguments` and `examples` are concatenated for the extended description. So, here
// simply pass the `extended` as `arguments` and an empty string for `examples`.
- this(className, db, name, usage, extended, "", "", "");
+ this(className, db, name, usage, extended, "", "", "", "");
}
}
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 46cf0f9..deb53cf 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -621,7 +621,7 @@ object FunctionRegistry {
val clazz = scala.reflect.classTag[Cast].runtimeClass
val usage = "_FUNC_(expr) - Casts the value `expr` to the target data type `_FUNC_`."
val expressionInfo =
- new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "")
+ new ExpressionInfo(clazz.getCanonicalName, null, name, usage, "", "", "", "", "")
(name, (expressionInfo, builder))
}
@@ -641,7 +641,8 @@ object FunctionRegistry {
df.arguments(),
df.examples(),
df.note(),
- df.since())
+ df.since(),
+ df.deprecated())
} else {
// This exists for the backward compatibility with old `ExpressionDescription`s defining
// the extended description in `extended()`.
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index 2a7488a..41d9b06 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -958,7 +958,9 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLi
> SELECT _FUNC_(array(1, 20, null, 3));
[20,null,3,1]
""",
- note = "The function is non-deterministic.",
+ note = """
+ The function is non-deterministic.
+ """,
since = "2.4.0")
case class Shuffle(child: Expression, randomSeed: Option[Long] = None)
extends UnaryExpression with ExpectsInputTypes with Stateful with ExpressionWithRandomSeed {
@@ -1042,7 +1044,9 @@ case class Shuffle(child: Expression, randomSeed: Option[Long] = None)
[3,4,1,2]
""",
since = "1.5.0",
- note = "Reverse logic for arrays is available since 2.4.0."
+ note = """
+ Reverse logic for arrays is available since 2.4.0.
+ """
)
case class Reverse(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
@@ -2056,7 +2060,9 @@ case class ElementAt(left: Expression, right: Expression)
> SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
[1,2,3,4,5,6]
""",
- note = "Concat logic for arrays is available since 2.4.0.")
+ note = """
+ Concat logic for arrays is available since 2.4.0.
+ """)
case class Concat(children: Seq[Expression]) extends ComplexTypeMergingExpression {
private def allowedTypes: Seq[AbstractDataType] = Seq(StringType, BinaryType, ArrayType)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
index 4e722c9..8d3a641 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
@@ -288,6 +288,7 @@ object CreateStruct extends FunctionBuilder {
"",
"",
"",
+ "",
"")
("struct", (info, this))
}
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 784425e..5fb0b85 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -1018,7 +1018,10 @@ case class TimeAdd(start: Expression, interval: Expression, timeZoneId: Option[S
> SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-31 09:00:00
""",
- since = "1.5.0")
+ since = "1.5.0",
+ deprecated = """
+ Deprecated since 3.0.0. See SPARK-25496.
+ """)
// scalastyle:on line.size.limit
case class FromUTCTimestamp(left: Expression, right: Expression)
extends BinaryExpression with ImplicitCastInputTypes {
@@ -1229,7 +1232,10 @@ case class MonthsBetween(
> SELECT _FUNC_('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
""",
- since = "1.5.0")
+ since = "1.5.0",
+ deprecated = """
+ Deprecated since 3.0.0. See SPARK-25496.
+ """)
// scalastyle:on line.size.limit
case class ToUTCTimestamp(left: Expression, right: Expression)
extends BinaryExpression with ImplicitCastInputTypes {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
index 1f1decc..2af2b13 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
@@ -125,7 +125,9 @@ case class CurrentDatabase() extends LeafExpression with Unevaluable {
> SELECT _FUNC_();
46707d92-02f4-4817-8116-a4c3b23e6266
""",
- note = "The function is non-deterministic.")
+ note = """
+ The function is non-deterministic.
+ """)
// scalastyle:on line.size.limit
case class Uuid(randomSeed: Option[Long] = None) extends LeafExpression with Stateful
with ExpressionWithRandomSeed {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
index b70c341..91a8ac0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
@@ -78,7 +78,9 @@ trait ExpressionWithRandomSeed {
> SELECT _FUNC_(null);
0.8446490682263027
""",
- note = "The function is non-deterministic in general case.")
+ note = """
+ The function is non-deterministic in general case.
+ """)
// scalastyle:on line.size.limit
case class Rand(child: Expression) extends RDG with ExpressionWithRandomSeed {
@@ -118,7 +120,9 @@ object Rand {
> SELECT _FUNC_(null);
1.1164209726833079
""",
- note = "The function is non-deterministic in general case.")
+ note = """
+ The function is non-deterministic in general case.
+ """)
// scalastyle:on line.size.limit
case class Randn(child: Expression) extends RDG with ExpressionWithRandomSeed {
diff --git a/sql/gen-sql-markdown.py b/sql/gen-sql-markdown.py
index fa8124b..e0529f8 100644
--- a/sql/gen-sql-markdown.py
+++ b/sql/gen-sql-markdown.py
@@ -20,7 +20,7 @@ import os
from collections import namedtuple
ExpressionInfo = namedtuple(
- "ExpressionInfo", "className name usage arguments examples note since")
+ "ExpressionInfo", "className name usage arguments examples note since deprecated")
def _list_function_infos(jvm):
@@ -42,7 +42,8 @@ def _list_function_infos(jvm):
arguments=jinfo.getArguments().replace("_FUNC_", name),
examples=jinfo.getExamples().replace("_FUNC_", name),
note=jinfo.getNote(),
- since=jinfo.getSince()))
+ since=jinfo.getSince(),
+ deprecated=jinfo.getDeprecated()))
return sorted(infos, key=lambda i: i.name)
@@ -136,6 +137,27 @@ def _make_pretty_note(note):
return "**Note:**\n%s\n" % note
+def _make_pretty_deprecated(deprecated):
+ """
+ Makes the deprecated description pretty and returns a formatted string if `deprecated`
+ is not an empty string. Otherwise, returns None.
+
+ Expected input:
+
+ ...
+
+ Expected output:
+ **Deprecated:**
+
+ ...
+
+ """
+
+ if deprecated != "":
+ deprecated = "\n".join(map(lambda n: n[4:], deprecated.split("\n")))
+ return "**Deprecated:**\n%s\n" % deprecated
+
+
def generate_sql_markdown(jvm, path):
"""
Generates a markdown file after listing the function information. The output file
@@ -162,6 +184,10 @@ def generate_sql_markdown(jvm, path):
**Since:** SINCE
+ **Deprecated:**
+
+ DEPRECATED
+
<br/>
"""
@@ -174,6 +200,7 @@ def generate_sql_markdown(jvm, path):
examples = _make_pretty_examples(info.examples)
note = _make_pretty_note(info.note)
since = info.since
+ deprecated = _make_pretty_deprecated(info.deprecated)
mdfile.write("### %s\n\n" % name)
if usage is not None:
@@ -186,6 +213,8 @@ def generate_sql_markdown(jvm, path):
mdfile.write(note)
if since is not None and since != "":
mdfile.write("**Since:** %s\n\n" % since.strip())
+ if deprecated is not None:
+ mdfile.write(deprecated)
mdfile.write("<br/>\n\n")