Posted to reviews@spark.apache.org by "cloud-fan (via GitHub)" <gi...@apache.org> on 2023/10/17 15:29:47 UTC

[PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

cloud-fan opened a new pull request, #43403:
URL: https://github.com/apache/spark/pull/43403

   
   ### What changes were proposed in this pull request?
   We've added a time travel API in DS v2 and a dedicated SQL syntax for it. However, there is no way to do time travel with the DataFrame read API. This PR adds time travel options (`timestampAsOf` and `versionAsOf`) to support time travel with the DataFrame read API.
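   
   For example, a minimal usage sketch of the new options (the table name `t` and timestamp mirror the tests in this PR; the version value `"1"` is a placeholder):
   ```scala
   // Read the table as of a point in time.
   val atTimestamp = spark.read
     .option("timestampAsOf", "2019-01-29 00:37:58")
     .table("t")
   
   // Read a specific version of the table.
   val atVersion = spark.read
     .option("versionAsOf", "1")
     .table("t")
   ```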
   
   ### Why are the changes needed?
   Feature parity: time travel is already supported via the DS v2 API and SQL syntax, but not via the DataFrame read API.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, now people can specify time travel in read options.
   
   ### How was this patch tested?
   New tests were added.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1363176439


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2196,6 +2208,13 @@
     },
     "sqlState" : "42K0E"
   },
+  "INVALID_TIME_TRAVEL_TIMESTAMP_OPTION" : {

Review Comment:
   Duplicates INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.INPUT?





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1368672704


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   > use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'
   
   Which ISO standard do you mean? As per [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) :
   In ISO 8601:2004 it was permitted to omit the "T" character by mutual agreement as in "200704051430",[[36]](https://en.wikipedia.org/wiki/ISO_8601#cite_note-36) **but this provision was removed in ISO 8601-1:2019**. Separating date and time parts with other characters such as space is not allowed in ISO 8601, but allowed in its profile RFC 3339.
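   
   For illustration only (not part of the PR), java.time's default ISO parsers make the same distinction; this sketch assumes a plain JDK:
   ```scala
   import java.time.LocalDateTime
   
   // ISO 8601 requires the 'T' separator between date and time.
   LocalDateTime.parse("2021-01-29T00:00:00")    // parses fine
   
   // A space separator is not ISO 8601 (RFC 3339 allows it as a readability
   // profile); the default ISO formatter rejects it.
   // LocalDateTime.parse("2021-01-29 00:00:00") // throws DateTimeParseException
   ```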





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1364787948


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2188,6 +2194,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should be in the format of 'yyyy-MM-dd HH:mm:ss[.us][zone_id]'."

Review Comment:
   The given format might not cover all valid timestamps, and the `[.us][zone_id]` part differs from the datetime patterns documented at https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html.





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369803328


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   hmm, it's probably too much to put this BNF in the error message. `yyyy-MM-dd HH:mm:ss` from Java looks clearer.





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369831774


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   How about
   ```
   Timestamp string in the options should be able to cast to TIMESTAMP.
   ```
   
   Referring to INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.INPUT, it seems that we also do a cast operation here.





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1368307092


##########
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala:
##########
@@ -2928,29 +2932,34 @@ class DataSourceV2SQLSuiteV1Filter
       sql(s"INSERT INTO $t4 VALUES (7)")
       sql(s"INSERT INTO $t4 VALUES (8)")
 
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF '2019-01-29 00:37:58'").collect()
-        === Array(Row(5), Row(6)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF '2021-01-29 00:00:00'").collect()
-        === Array(Row(7), Row(8)))
-      assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts1InSeconds").collect()
-        === Array(Row(5), Row(6)))
-      assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts2InSeconds").collect()
-        === Array(Row(7), Row(8)))
-      assert(sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF $ts1InSeconds").collect()
-        === Array(Row(5), Row(6)))
-      assert(sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF $ts2InSeconds").collect()
-        === Array(Row(7), Row(8)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF make_date(2021, 1, 29)").collect()
-        === Array(Row(7), Row(8)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF to_timestamp('2021-01-29 00:00:00')").collect()
-        === Array(Row(7), Row(8)))
+      val res1_sql = sql("SELECT * FROM t TIMESTAMP AS OF '2019-01-29 00:37:58'").collect()
+      assert(res1_sql === Array(Row(5), Row(6)))
+      val res1_df = spark.read.option("timestampAsOf", "2019-01-29 00:37:58").table("t").collect()
+      assert(res1_df === Array(Row(5), Row(6)))
+      val res2_sql = sql("SELECT * FROM t TIMESTAMP AS OF '2021-01-29 00:00:00'").collect()
+      assert(res2_sql === Array(Row(7), Row(8)))
+      val res2_df = spark.read.option("timestampAsOf", "2021-01-29 00:00:00").table("t").collect()
+      assert(res2_df === Array(Row(7), Row(8)))
+
+      val res3 = sql(s"SELECT * FROM t TIMESTAMP AS OF $ts1InSeconds").collect()
+      assert(res3 === Array(Row(5), Row(6)))
+      val res4 = sql(s"SELECT * FROM t TIMESTAMP AS OF $ts2InSeconds").collect()
+      assert(res4 === Array(Row(7), Row(8)))
+      val res5 = sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF $ts1InSeconds").collect()
+      assert(res5 === Array(Row(5), Row(6)))
+      val res6 = sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF $ts2InSeconds").collect()
+      assert(res6 === Array(Row(7), Row(8)))
+      val res7 = sql("SELECT * FROM t TIMESTAMP AS OF make_date(2021, 1, 29)").collect()
+      assert(res7 === Array(Row(7), Row(8)))
+      val res8 = sql("SELECT * FROM t TIMESTAMP AS OF to_timestamp('2021-01-29 00:00:00')").collect()
+      assert(res8 === Array(Row(7), Row(8)))
       // Scalar subquery is also supported.
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF (SELECT make_date(2021, 1, 29))").collect()
-        === Array(Row(7), Row(8)))
+      val res9 = sql("SELECT * FROM t TIMESTAMP AS OF (SELECT make_date(2021, 1, 29))")
+      assert(res9.collect() === Array(Row(7), Row(8)))

Review Comment:
   ```suggestion
         val res9 = sql("SELECT * FROM t TIMESTAMP AS OF (SELECT make_date(2021, 1, 29))").collect()
         assert(res9 === Array(Row(7), Row(8)))
   ```





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369756236


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   does ANSI define timestamp format using these characters `yyyy-MM-dd HH:mm:ss`?





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369818233


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   I remember you once told me to try our best to avoid Java-specific notation in SQL docs and errors :)





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1368095915


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should be in the format of 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   ```suggestion
             "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."
   ```





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #43403:
URL: https://github.com/apache/spark/pull/43403#issuecomment-1782181913

   Thank you all, merged to master




Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1368813264


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   hmm, shall we say Java standard instead? `yyyy-MM-dd HH:mm:ss` is pretty common.





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "ryan-johnson-databricks (via GitHub)" <gi...@apache.org>.
ryan-johnson-databricks commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1362362661


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala:
##########
@@ -65,4 +66,36 @@ object TimeTravelSpec {
       None
     }
   }
+
+  def fromOptions(
+      options: CaseInsensitiveStringMap,
+      timestampKey: String,
+      versionKey: String,
+      sessionLocalTimeZone: String): Option[TimeTravelSpec] = {
+    val timestampStr = options.get(timestampKey)
+    val versionStr = options.get(versionKey)
+    if (timestampStr != null && versionStr != null) {

Review Comment:
   Seems like `null` is a Scala antipattern... would it be cleaner to do this instead?
   ```scala
   (Option(options.get(timestampKey)), Option(options.get(versionKey))) match {
     case (Some(_), Some(_)) =>
       throw QueryCompilationErrors.invalidTimeTravelSpecError()
     case (Some(timestampString), None) =>
         ...
       Some(AsOfTimestamp(...))
     case (None, Some(versionStr)) =>
       Some(AsOfVersion(versionStr))
     case (None, None) =>
       None
   }
   ```



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1122,7 +1122,10 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
 
       case r @ RelationTimeTravel(u: UnresolvedRelation, timestamp, version)
           if timestamp.forall(ts => ts.resolved && !SubqueryExpression.hasSubquery(ts)) =>
-        resolveRelation(u, TimeTravelSpec.create(timestamp, version, conf)).getOrElse(r)
+        resolveRelation(
+          u,
+          TimeTravelSpec.create(timestamp, version, conf.sessionLocalTimeZone)
+        ).getOrElse(r)

Review Comment:
   ```suggestion
           val timeTravelSpec = TimeTravelSpec.create(timestamp, version, conf.sessionLocalTimeZone)
           resolveRelation(u, timeTravelSpec).getOrElse(r)
   ```



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2196,6 +2208,13 @@
     },
     "sqlState" : "42K0E"
   },
+  "INVALID_TIME_TRAVEL_TIMESTAMP_OPTION" : {

Review Comment:
   Should this be a subclass of the existing `INVALID_TIME_TRAVEL_TIMESTAMP_EXPR`? 
   (unless `TIMESTAMP AS OF 'yyyy-MM-dd HH:mm:ss[.us][zone_id]'` is invalid and options are special?)



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1255,17 +1258,27 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
     private def resolveRelation(
         u: UnresolvedRelation,
         timeTravelSpec: Option[TimeTravelSpec] = None): Option[LogicalPlan] = {
-      resolveTempView(u.multipartIdentifier, u.isStreaming, timeTravelSpec.isDefined).orElse {
+      val timeTravelSpecFromOptions = TimeTravelSpec.fromOptions(
+        u.options,
+        conf.getConf(SQLConf.TIME_TRAVEL_TIMESTAMP_KEY),
+        conf.getConf(SQLConf.TIME_TRAVEL_VERSION_KEY),

Review Comment:
   If it's part of `SQLConf`, does that mean somebody might try to force time travel for all table accesses in their session? Wouldn't that be a weird conflict, if somebody used SQL `AS OF` or normal options?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala:
##########
@@ -65,4 +66,36 @@ object TimeTravelSpec {
       None
     }
   }
+
+  def fromOptions(
+      options: CaseInsensitiveStringMap,
+      timestampKey: String,
+      versionKey: String,
+      sessionLocalTimeZone: String): Option[TimeTravelSpec] = {
+    val timestampStr = options.get(timestampKey)
+    val versionStr = options.get(versionKey)
+    if (timestampStr != null && versionStr != null) {
+      throw QueryCompilationErrors.invalidTimeTravelSpecError()
+    }
+
+    if (timestampStr != null) {
+      val timestampValue = Cast(
+        Literal(timestampStr),
+        TimestampType,
+        Some(sessionLocalTimeZone),
+        ansiEnabled = false

Review Comment:
   Shouldn't ansi enablement normally come from the session conf? 
   Or are we relying specifically on non-ansi so that invalid timestamp gives back null instead of raising an exception?



##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -869,6 +869,12 @@
     ],
     "sqlState" : "42710"
   },
+  "DUPLICATED_TIME_TRAVEL_SPEC" : {

Review Comment:
   nit: 
   ```suggestion
     "MULTIPLE_TIME_TRAVEL_SPEC" : {
   ```
   (merely "duplicated" could in theory be ok, in the sense that both specs are identical and thus don't conflict -- the real problem comes if the provided specs disagree with each other... but it doesn't seem worth the trouble to actually compare the two in order to raise a `CONFLICTING_TIME_TRAVEL` error -- better to require a single spec and be done with it)



##########
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala:
##########
@@ -2928,28 +2932,34 @@ class DataSourceV2SQLSuiteV1Filter
       sql(s"INSERT INTO $t4 VALUES (7)")
       sql(s"INSERT INTO $t4 VALUES (8)")
 
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF '2019-01-29 00:37:58'").collect
-        === Array(Row(5), Row(6)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF '2021-01-29 00:00:00'").collect
-        === Array(Row(7), Row(8)))
-      assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts1InSeconds").collect
-        === Array(Row(5), Row(6)))
-      assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts2InSeconds").collect
-        === Array(Row(7), Row(8)))
-      assert(sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF $ts1InSeconds").collect
-        === Array(Row(5), Row(6)))
-      assert(sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF $ts2InSeconds").collect
-        === Array(Row(7), Row(8)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF make_date(2021, 1, 29)").collect
-        === Array(Row(7), Row(8)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF to_timestamp('2021-01-29 00:00:00')").collect
-        === Array(Row(7), Row(8)))

Review Comment:
   This seems like a spurious change/refactor? 
   Even if the refactor is important, it's arguably best isolated in its own PR to not mix feature+refactor.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1255,17 +1258,27 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
     private def resolveRelation(
         u: UnresolvedRelation,
         timeTravelSpec: Option[TimeTravelSpec] = None): Option[LogicalPlan] = {
-      resolveTempView(u.multipartIdentifier, u.isStreaming, timeTravelSpec.isDefined).orElse {
+      val timeTravelSpecFromOptions = TimeTravelSpec.fromOptions(
+        u.options,
+        conf.getConf(SQLConf.TIME_TRAVEL_TIMESTAMP_KEY),
+        conf.getConf(SQLConf.TIME_TRAVEL_VERSION_KEY),

Review Comment:
   Update: The conf lookup is just fetching the _names_ of the time travel keys, not their values.





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1364007094


##########
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala:
##########
@@ -2928,28 +2932,34 @@ class DataSourceV2SQLSuiteV1Filter
       sql(s"INSERT INTO $t4 VALUES (7)")
       sql(s"INSERT INTO $t4 VALUES (8)")
 
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF '2019-01-29 00:37:58'").collect
-        === Array(Row(5), Row(6)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF '2021-01-29 00:00:00'").collect
-        === Array(Row(7), Row(8)))
-      assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts1InSeconds").collect
-        === Array(Row(5), Row(6)))
-      assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts2InSeconds").collect
-        === Array(Row(7), Row(8)))
-      assert(sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF $ts1InSeconds").collect
-        === Array(Row(5), Row(6)))
-      assert(sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF $ts2InSeconds").collect
-        === Array(Row(7), Row(8)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF make_date(2021, 1, 29)").collect
-        === Array(Row(7), Row(8)))
-      assert(sql("SELECT * FROM t TIMESTAMP AS OF to_timestamp('2021-01-29 00:00:00')").collect
-        === Array(Row(7), Row(8)))

Review Comment:
   Yeah, it's better to have separate PRs, but this is just improving the code style of a test, and it should be OK to use a single PR to save test resources.





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1368337352


##########
docs/sql-error-conditions-invalid-time-travel-timestamp-expr-error-class.md:
##########
@@ -33,6 +33,10 @@ Cannot be casted to the "TIMESTAMP" type.
 
 Must be deterministic.
 
+## OPTION
+
+Timestamp string in the options should be in the format of 'yyyy-MM-dd HH:mm:ss[.us][zone_id]'.

Review Comment:
   ```suggestion
   Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'.
   ```





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369729340


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   Is it actually ANSI SQL Standard DATE/TIMESTAMP and TIMESTAMP WITH TIME ZONE formats?





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369786851


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   Yes, it's defined in the ANSI SQL Standard in BNF form:
   ```bnf
   <unquoted timestamp string> ::=
     <unquoted date string> <space> <unquoted time string>
   
   ```
   ![image](https://github.com/apache/spark/assets/8326978/fa5c026e-41c5-40d4-a32c-c76a0a17cb7d)
   





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1364005044


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala:
##########
@@ -65,4 +66,36 @@ object TimeTravelSpec {
       None
     }
   }
+
+  def fromOptions(
+      options: CaseInsensitiveStringMap,
+      timestampKey: String,
+      versionKey: String,
+      sessionLocalTimeZone: String): Option[TimeTravelSpec] = {
+    val timestampStr = options.get(timestampKey)
+    val versionStr = options.get(versionKey)
+    if (timestampStr != null && versionStr != null) {
+      throw QueryCompilationErrors.invalidTimeTravelSpecError()
+    }
+
+    if (timestampStr != null) {
+      val timestampValue = Cast(
+        Literal(timestampStr),
+        TimestampType,
+        Some(sessionLocalTimeZone),
+        ansiEnabled = false

Review Comment:
   We intentionally turn off ANSI here so the cast doesn't fail, and we can customize the error message later.
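   
   For context, a minimal sketch (not from this PR, assuming a SparkSession named `spark`, e.g. in spark-shell) of the behavior being relied on: with ANSI mode off, casting an unparseable string to TIMESTAMP yields NULL instead of throwing, so the caller can detect the NULL and raise its own, more specific error.
   ```scala
   // With ANSI off, the invalid cast produces a NULL value.
   spark.conf.set("spark.sql.ansi.enabled", "false")
   spark.sql("SELECT CAST('not-a-timestamp' AS TIMESTAMP)").show()
   
   // With ANSI on, the same cast fails at runtime instead of returning NULL.
   spark.conf.set("spark.sql.ansi.enabled", "true")
   // spark.sql("SELECT CAST('not-a-timestamp' AS TIMESTAMP)").show()
   ```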





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369660330


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   Yep





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369826782


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   I believe `T` is the valid delimiter to separate date and time in the ISO standard.





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1369825843


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2207,6 +2213,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should use ISO standard format such as 'yyyy-MM-dd HH:mm:ss'."

Review Comment:
   alright, how about `... use ISO standard format 'yyyy-MM-ddTHH:mm:ss'`





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn closed pull request #43403: [SPARK-45575][SQL] Support time travel options for df read API
URL: https://github.com/apache/spark/pull/43403




Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "ryan-johnson-databricks (via GitHub)" <gi...@apache.org>.
ryan-johnson-databricks commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1364001231


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala:
##########
@@ -65,4 +66,36 @@ object TimeTravelSpec {
       None
     }
   }
+
+  def fromOptions(
+      options: CaseInsensitiveStringMap,
+      timestampKey: String,
+      versionKey: String,
+      sessionLocalTimeZone: String): Option[TimeTravelSpec] = {
+    val timestampStr = options.get(timestampKey)
+    val versionStr = options.get(versionKey)
+    if (timestampStr != null && versionStr != null) {
+      throw QueryCompilationErrors.invalidTimeTravelSpecError()
+    }
+
+    if (timestampStr != null) {
+      val timestampValue = Cast(
+        Literal(timestampStr),
+        TimestampType,
+        Some(sessionLocalTimeZone),
+        ansiEnabled = false

Review Comment:
   (because it seems like we could easily get exceptions even in non-ansi mode, for sufficiently broken/illegal timestamp expressions?) 





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1368264275


##########
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala:
##########
@@ -2902,10 +2902,21 @@ class DataSourceV2SQLSuiteV1Filter
       sql(s"INSERT INTO $t2 VALUES (3)")
       sql(s"INSERT INTO $t2 VALUES (4)")
 
+<<<<<<< HEAD

Review Comment:
   @cloud-fan Could you resolve conflicts here, please.





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1366622868


##########
common/utils/src/main/resources/error/error-classes.json:
##########
@@ -2188,6 +2194,11 @@
           "Must be deterministic."
         ]
       },
+      "OPTION" : {
+        "message" : [
+          "Timestamp string in the options should be in the format of 'yyyy-MM-dd HH:mm:ss[.us][zone_id]'."

Review Comment:
   We use Cast, and the allowed input string is very flexible. But in the documentation, we should ask people to use the most standard timestamp format.
   
   BTW, this is not really Java timestamp format syntax; it's only for documentation purposes. I can remove the `[.us][zone_id]` part, as most people won't specify it in time travel.
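   
   For illustration only (assuming an existing SparkSession `spark`; not exhaustive and somewhat version-dependent), the non-ANSI cast accepts a range of timestamp strings beyond the documented pattern:
   ```scala
   // Each of these casts succeeds under the default (non-ANSI) cast behavior.
   Seq(
     "2021-01-29",                    // date only
     "2021-01-29 00:00:00",           // space separator
     "2021-01-29T00:00:00",           // ISO 'T' separator
     "2021-01-29 00:00:00.123",       // fractional seconds
     "2021-01-29 00:00:00+08:00"      // explicit zone offset
   ).foreach { s =>
     spark.sql(s"SELECT CAST('$s' AS TIMESTAMP)").show(truncate = false)
   }
   ```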





Re: [PR] [SPARK-45575][SQL] Support time travel options for df read API [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43403:
URL: https://github.com/apache/spark/pull/43403#discussion_r1364781656


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala:
##########
@@ -65,4 +66,35 @@ object TimeTravelSpec {
       None
     }
   }
+
+  def fromOptions(
+      options: CaseInsensitiveStringMap,
+      timestampKey: String,
+      versionKey: String,
+      sessionLocalTimeZone: String): Option[TimeTravelSpec] = {
+    (Option(options.get(timestampKey)), Option(options.get(versionKey))) match {
+      case (Some(_), Some(_)) =>
+        throw QueryCompilationErrors.invalidTimeTravelSpecError()
+
+      case (Some(timestampStr), None) =>
+        val timestampValue = Cast(
+          Literal(timestampStr),
+          TimestampType,
+          Some(sessionLocalTimeZone),
+          ansiEnabled = false
+        ).eval()
+        if (timestampValue == null) {
+          throw new AnalysisException(
+            "INVALID_TIME_TRAVEL_TIMESTAMP_EXPR.OPTION",
+            Map("expr" -> s"'$timestampStr''")

Review Comment:
   ```suggestion
               Map("expr" -> s"'$timestampStr'")
   ```


