Posted to commits@spark.apache.org by we...@apache.org on 2020/03/30 04:37:33 UTC
[spark] branch branch-3.0 updated: [SPARK-31286][SQL][DOC] Specify
formats of time zone ID for JSON/CSV option and from/to_utc_timestamp
This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new 8d61a14 [SPARK-31286][SQL][DOC] Specify formats of time zone ID for JSON/CSV option and from/to_utc_timestamp
8d61a14 is described below
commit 8d61a141d9e899c521f37b79c78d3a29fd8401fe
Author: Maxim Gekk <ma...@gmail.com>
AuthorDate: Mon Mar 30 12:20:11 2020 +0800
[SPARK-31286][SQL][DOC] Specify formats of time zone ID for JSON/CSV option and from/to_utc_timestamp
### What changes were proposed in this pull request?
In the PR, I propose to update the doc for the `timeZone` option in JSON/CSV datasources and for the `tz` parameter of the `from_utc_timestamp()`/`to_utc_timestamp()` functions, and to restrict the format of the values to two forms:
1. Geographical regions, such as `America/Los_Angeles`.
2. Fixed offsets - a fully resolved offset from UTC. For example, `-08:00`.
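The difference between the two forms can be illustrated outside of Spark with Python's standard library (an illustrative sketch, not Spark code): a region-based ID tracks daylight saving transitions through the IANA tz database, while a fixed offset is fully resolved and never changes.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# Form 1: geographical region ID, resolved through the IANA tz database
la = ZoneInfo("America/Los_Angeles")
# Form 2: fixed offset from UTC, the equivalent of the string '-08:00'
fixed = timezone(timedelta(hours=-8))

winter = datetime(2020, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2020, 7, 15, 12, 0, tzinfo=timezone.utc)

# The region ID follows DST: UTC-8 in winter (PST), UTC-7 in summer (PDT).
print(winter.astimezone(la).utcoffset())     # timedelta for -08:00
print(summer.astimezone(la).utcoffset())     # timedelta for -07:00
# The fixed offset is fully resolved and never shifts.
print(summer.astimezone(fixed).utcoffset())  # timedelta for -08:00
```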
### Why are the changes needed?
Other formats, such as three-letter time zone IDs, are ambiguous and depend on the locale. For example, `CST` could mean either U.S. `Central Standard Time` or `China Standard Time`. Such formats have already been deprecated in the JDK, see [Three-letter time zone IDs](https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html).
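The documented contract can be approximated with a small validator (a hypothetical sketch; Spark itself delegates zone-ID parsing to `java.time.ZoneId` on the JVM side): only 'area/city' region IDs, '(+|-)HH:mm' offsets, and the 'UTC'/'Z' aliases pass, while ambiguous short names like 'CST' or 'GMT+5' are rejected.

```python
import re

def is_supported_zone_id(tz: str) -> bool:
    """Accept only the two documented forms (hypothetical helper)."""
    # Geographical region ID: 'area/city', e.g. 'America/Los_Angeles'
    if re.fullmatch(r"[A-Za-z_]+(?:/[A-Za-z_]+)+", tz):
        return True
    # Fixed offset '(+|-)HH:mm', plus the aliases 'UTC' and 'Z' for '+00:00'
    if tz in ("UTC", "Z") or re.fullmatch(r"[+-]\d{2}:\d{2}", tz):
        return True
    return False
```

A real check would simply attempt to resolve the string with `java.time.ZoneId.of`; the regex only mirrors the contract stated in the docs.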
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By running `./dev/scalastyle` and by manual testing.
Closes #28051 from MaxGekk/doc-time-zone-option.
Authored-by: Maxim Gekk <ma...@gmail.com>
Signed-off-by: Wenchen Fan <we...@databricks.com>
(cherry picked from commit d2ff5c5bfb29a7b22df3cb49b6584c3e2bec397c)
Signed-off-by: Wenchen Fan <we...@databricks.com>
---
R/pkg/R/functions.R | 8 ++-
python/pyspark/sql/functions.py | 14 ++++-
python/pyspark/sql/readwriter.py | 60 +++++++++++++++++-----
python/pyspark/sql/streaming.py | 60 +++++++++++++++++-----
.../org/apache/spark/sql/DataFrameReader.scala | 45 +++++++++++++---
.../org/apache/spark/sql/DataFrameWriter.scala | 45 +++++++++++++---
.../scala/org/apache/spark/sql/functions.scala | 16 ++++--
.../spark/sql/streaming/DataStreamReader.scala | 45 +++++++++++++---
.../spark/sql/streaming/DataStreamWriter.scala | 45 +++++++++++++---
9 files changed, 283 insertions(+), 55 deletions(-)
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index dd5dbbc..d8b0450 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -77,7 +77,13 @@ NULL
#' days to be added to or subtracted from \code{y}. For class \code{character}, it is
#' \itemize{
#' \item \code{date_format}: date format specification.
-#' \item \code{from_utc_timestamp}, \code{to_utc_timestamp}: time zone to use.
+#' \item \code{from_utc_timestamp}, \code{to_utc_timestamp}: A string detailing
+#' the time zone ID that the input should be adjusted to. It should be in the format
+#' of either region-based zone IDs or zone offsets. Region IDs must have the form
+#' 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in the format
+#' '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported
+#' as aliases of '+00:00'. Other short names are not recommended to use
+#' because they can be ambiguous.
#' \item \code{next_day}: day of the week string.
#' }
#' @param ... additional argument(s).
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 543bb7f..1ade21c 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1313,7 +1313,12 @@ def from_utc_timestamp(timestamp, tz):
timestamp to string according to the session local timezone.
:param timestamp: the column that contains timestamps
- :param tz: a string that has the ID of timezone, e.g. "GMT", "America/Los_Angeles", etc
+ :param tz: A string detailing the time zone ID that the input should be adjusted to. It should
+ be in the format of either region-based zone IDs or zone offsets. Region IDs must
+ have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in
+ the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are
+ supported as aliases of '+00:00'. Other short names are not recommended to use
+ because they can be ambiguous.
.. versionchanged:: 2.4
`tz` can take a :class:`Column` containing timezone ID strings.
@@ -1347,7 +1352,12 @@ def to_utc_timestamp(timestamp, tz):
timestamp to string according to the session local timezone.
:param timestamp: the column that contains timestamps
- :param tz: a string that has the ID of timezone, e.g. "GMT", "America/Los_Angeles", etc
+ :param tz: A string detailing the time zone ID that the input should be adjusted to. It should
+ be in the format of either region-based zone IDs or zone offsets. Region IDs must
+ have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in
+ the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are
+ supported as aliases of '+00:00'. Other short names are not recommended to use
+ because they can be ambiguous.
.. versionchanged:: 2.4
`tz` can take a :class:`Column` containing timezone ID strings.
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 8179784..92d36e7 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -105,9 +105,18 @@ class DataFrameReader(OptionUtils):
"""Adds an input option for the underlying data source.
You can set the following option(s) for reading files:
- * ``timeZone``: sets the string that indicates a timezone to be used to parse timestamps
- in the JSON/CSV datasources or partition values.
- If it isn't set, it uses the default value, session local timezone.
+ * ``timeZone``: sets the string that indicates a time zone ID to be used to parse
+ timestamps in the JSON/CSV datasources or partition values. The following
+ formats of `timeZone` are supported:
+
+ * Region-based zone ID: It should have the form 'area/city', such as \
+ 'America/Los_Angeles'.
+ * Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or \
+ '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.
+
+ Other short names like 'CST' are not recommended to use because they can be
+ ambiguous. If it isn't set, the current value of the SQL config
+ ``spark.sql.session.timeZone`` is used by default.
* ``pathGlobFilter``: an optional glob pattern to only include files with paths matching
the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
It does not change the behavior of partition discovery.
@@ -120,9 +129,18 @@ class DataFrameReader(OptionUtils):
"""Adds input options for the underlying data source.
You can set the following option(s) for reading files:
- * ``timeZone``: sets the string that indicates a timezone to be used to parse timestamps
- in the JSON/CSV datasources or partition values.
- If it isn't set, it uses the default value, session local timezone.
+ * ``timeZone``: sets the string that indicates a time zone ID to be used to parse
+ timestamps in the JSON/CSV datasources or partition values. The following
+ formats of `timeZone` are supported:
+
+ * Region-based zone ID: It should have the form 'area/city', such as \
+ 'America/Los_Angeles'.
+ * Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or \
+ '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.
+
+ Other short names like 'CST' are not recommended to use because they can be
+ ambiguous. If it isn't set, the current value of the SQL config
+ ``spark.sql.session.timeZone`` is used by default.
* ``pathGlobFilter``: an optional glob pattern to only include files with paths matching
the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
It does not change the behavior of partition discovery.
@@ -665,9 +683,18 @@ class DataFrameWriter(OptionUtils):
"""Adds an output option for the underlying data source.
You can set the following option(s) for writing files:
- * ``timeZone``: sets the string that indicates a timezone to be used to format
- timestamps in the JSON/CSV datasources or partition values.
- If it isn't set, it uses the default value, session local timezone.
+ * ``timeZone``: sets the string that indicates a time zone ID to be used to format
+ timestamps in the JSON/CSV datasources or partition values. The following
+ formats of `timeZone` are supported:
+
+ * Region-based zone ID: It should have the form 'area/city', such as \
+ 'America/Los_Angeles'.
+ * Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or \
+ '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.
+
+ Other short names like 'CST' are not recommended to use because they can be
+ ambiguous. If it isn't set, the current value of the SQL config
+ ``spark.sql.session.timeZone`` is used by default.
"""
self._jwrite = self._jwrite.option(key, to_str(value))
return self
@@ -677,9 +704,18 @@ class DataFrameWriter(OptionUtils):
"""Adds output options for the underlying data source.
You can set the following option(s) for writing files:
- * ``timeZone``: sets the string that indicates a timezone to be used to format
- timestamps in the JSON/CSV datasources or partition values.
- If it isn't set, it uses the default value, session local timezone.
+ * ``timeZone``: sets the string that indicates a time zone ID to be used to format
+ timestamps in the JSON/CSV datasources or partition values. The following
+ formats of `timeZone` are supported:
+
+ * Region-based zone ID: It should have the form 'area/city', such as \
+ 'America/Los_Angeles'.
+ * Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or \
+ '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.
+
+ Other short names like 'CST' are not recommended to use because they can be
+ ambiguous. If it isn't set, the current value of the SQL config
+ ``spark.sql.session.timeZone`` is used by default.
"""
for k in options:
self._jwrite = self._jwrite.option(k, to_str(options[k]))
diff --git a/python/pyspark/sql/streaming.py b/python/pyspark/sql/streaming.py
index a5e8646..4d36a04 100644
--- a/python/pyspark/sql/streaming.py
+++ b/python/pyspark/sql/streaming.py
@@ -339,9 +339,18 @@ class DataStreamReader(OptionUtils):
"""Adds an input option for the underlying data source.
You can set the following option(s) for reading files:
- * ``timeZone``: sets the string that indicates a timezone to be used to parse timestamps
- in the JSON/CSV datasources or partition values.
- If it isn't set, it uses the default value, session local timezone.
+ * ``timeZone``: sets the string that indicates a time zone ID to be used to parse
+ timestamps in the JSON/CSV datasources or partition values. The following
+ formats of `timeZone` are supported:
+
+ * Region-based zone ID: It should have the form 'area/city', such as \
+ 'America/Los_Angeles'.
+ * Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or \
+ '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.
+
+ Other short names like 'CST' are not recommended to use because they can be
+ ambiguous. If it isn't set, the current value of the SQL config
+ ``spark.sql.session.timeZone`` is used by default.
.. note:: Evolving.
@@ -355,9 +364,18 @@ class DataStreamReader(OptionUtils):
"""Adds input options for the underlying data source.
You can set the following option(s) for reading files:
- * ``timeZone``: sets the string that indicates a timezone to be used to parse timestamps
- in the JSON/CSV datasources or partition values.
- If it isn't set, it uses the default value, session local timezone.
+ * ``timeZone``: sets the string that indicates a time zone ID to be used to parse
+ timestamps in the JSON/CSV datasources or partition values. The following
+ formats of `timeZone` are supported:
+
+ * Region-based zone ID: It should have the form 'area/city', such as \
+ 'America/Los_Angeles'.
+ * Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or \
+ '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.
+
+ Other short names like 'CST' are not recommended to use because they can be
+ ambiguous. If it isn't set, the current value of the SQL config
+ ``spark.sql.session.timeZone`` is used by default.
.. note:: Evolving.
@@ -812,9 +830,18 @@ class DataStreamWriter(object):
"""Adds an output option for the underlying data source.
You can set the following option(s) for writing files:
- * ``timeZone``: sets the string that indicates a timezone to be used to format
- timestamps in the JSON/CSV datasources or partition values.
- If it isn't set, it uses the default value, session local timezone.
+ * ``timeZone``: sets the string that indicates a time zone ID to be used to format
+ timestamps in the JSON/CSV datasources or partition values. The following
+ formats of `timeZone` are supported:
+
+ * Region-based zone ID: It should have the form 'area/city', such as \
+ 'America/Los_Angeles'.
+ * Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or \
+ '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.
+
+ Other short names like 'CST' are not recommended to use because they can be
+ ambiguous. If it isn't set, the current value of the SQL config
+ ``spark.sql.session.timeZone`` is used by default.
.. note:: Evolving.
"""
@@ -826,9 +853,18 @@ class DataStreamWriter(object):
"""Adds output options for the underlying data source.
You can set the following option(s) for writing files:
- * ``timeZone``: sets the string that indicates a timezone to be used to format
- timestamps in the JSON/CSV datasources or partition values.
- If it isn't set, it uses the default value, session local timezone.
+ * ``timeZone``: sets the string that indicates a time zone ID to be used to format
+ timestamps in the JSON/CSV datasources or partition values. The following
+ formats of `timeZone` are supported:
+
+ * Region-based zone ID: It should have the form 'area/city', such as \
+ 'America/Los_Angeles'.
+ * Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or \
+ '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.
+
+ Other short names like 'CST' are not recommended to use because they can be
+ ambiguous. If it isn't set, the current value of the SQL config
+ ``spark.sql.session.timeZone`` is used by default.
.. note:: Evolving.
"""
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index a2a3518..83e5678 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -96,8 +96,19 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to parse timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to parse timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 1.4.0
@@ -133,8 +144,19 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to parse timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to parse timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 1.4.0
@@ -149,8 +171,19 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to parse timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to parse timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 1.4.0
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index 11feae9..7e669e0 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -107,8 +107,19 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to format timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to format timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 1.4.0
@@ -144,8 +155,19 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to format timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to format timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 1.4.0
@@ -160,8 +182,19 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to format timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to format timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 1.4.0
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index 0ca4238..8a89a3b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3136,8 +3136,12 @@ object functions {
*
* @param ts A date, timestamp or string. If a string, the data must be in a format that can be
* cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss.SSSS`
- * @param tz A string detailing the time zone that the input should be adjusted to, such as
- * `Europe/London`, `PST` or `GMT+5`
+ * @param tz A string detailing the time zone ID that the input should be adjusted to. It should
+ * be in the format of either region-based zone IDs or zone offsets. Region IDs must
+ * have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in
+ * the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are
+ * supported as aliases of '+00:00'. Other short names are not recommended to use
+ * because they can be ambiguous.
* @return A timestamp, or null if `ts` was a string that could not be cast to a timestamp or
* `tz` was an invalid value
* @group datetime_funcs
@@ -3165,8 +3169,12 @@ object functions {
*
* @param ts A date, timestamp or string. If a string, the data must be in a format that can be
* cast to a timestamp, such as `yyyy-MM-dd` or `yyyy-MM-dd HH:mm:ss.SSSS`
- * @param tz A string detailing the time zone that the input belongs to, such as `Europe/London`,
- * `PST` or `GMT+5`
+ * @param tz A string detailing the time zone ID that the input should be adjusted to. It should
+ * be in the format of either region-based zone IDs or zone offsets. Region IDs must
+ * have the form 'area/city', such as 'America/Los_Angeles'. Zone offsets must be in
+ * the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are
+ * supported as aliases of '+00:00'. Other short names are not recommended to use
+ * because they can be ambiguous.
* @return A timestamp, or null if `ts` was a string that could not be cast to a timestamp or
* `tz` was an invalid value
* @group datetime_funcs
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
index be7f021..a2eaed8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
@@ -81,8 +81,19 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to parse timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to parse timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 2.0.0
@@ -118,8 +129,19 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to parse timestamps in the JSON/CSV data sources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to parse timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 2.0.0
@@ -134,8 +156,19 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to parse timestamps in the JSON/CSV data sources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to parse timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 2.0.0
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
index 1c21a30..1d0ca4d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
@@ -161,8 +161,19 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to format timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to format timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 2.0.0
@@ -198,8 +209,19 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to format timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to format timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 2.0.0
@@ -214,8 +236,19 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
*
* You can set the following option(s):
* <ul>
- * <li>`timeZone` (default session local timezone): sets the string that indicates a timezone
- * to be used to format timestamps in the JSON/CSV datasources or partition values.</li>
+ * <li>`timeZone` (default session local timezone): sets the string that indicates a time zone ID
+ * to be used to format timestamps in the JSON/CSV datasources or partition values. The following
+ * formats of `timeZone` are supported:
+ * <ul>
+ * <li> Region-based zone ID: It should have the form 'area/city', such as
+ * 'America/Los_Angeles'.</li>
+ * <li> Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00'
+ * or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+ * </ul>
+ * Other short names like 'CST' are not recommended to use because they can be ambiguous.
+ * If it isn't set, the current value of the SQL config `spark.sql.session.timeZone` is
+ * used by default.
+ * </li>
* </ul>
*
* @since 2.0.0