Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/29 16:35:36 UTC

[GitHub] [iceberg] singhpk234 opened a new pull request, #5160: Spark 3.3: [FOLLOWUP] TimeTravel using SPARK SQL and DataFrameReaders

singhpk234 opened a new pull request, #5160:
URL: https://github.com/apache/iceberg/pull/5160

   ### About the changes
   
   Implements a workaround for https://issues.apache.org/jira/browse/SPARK-39633.
   Until that issue is fixed, we can explicitly convert the timestamp to a formatted date-time string before delegating to Spark.
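A minimal sketch of the proposed workaround follows. It assumes, as the patch does, that `timestampAsOf` holds epoch seconds. The sketch parses the value and formats the instant as a zone-less wall-clock string in a chosen zone. The class and method names (`TimestampWorkaroundSketch`, `secondsToZonedTimestamp`) are hypothetical. The `yyyy-MM-dd HH:mm:ss` pattern is an assumption and is not necessarily the `DATE_FORMAT` used by Iceberg's `DateTimeUtil`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

public class TimestampWorkaroundSketch {
  // Assumed output pattern, for illustration only.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  // Hypothetical helper: turns an epoch-seconds string into a wall-clock string
  // in the given zone. Returns empty for null input; throws NumberFormatException
  // for non-numeric input.
  public static Optional<String> secondsToZonedTimestamp(String timestampAsOf, ZoneId zone) {
    if (timestampAsOf == null) {
      return Optional.empty();
    }
    long seconds = Long.parseLong(timestampAsOf);
    // Equivalent to Instant.ofEpochMilli(seconds * 1000) in the patch.
    LocalDateTime local = LocalDateTime.ofInstant(Instant.ofEpochSecond(seconds), zone);
    return Optional.of(FORMAT.format(local));
  }

  public static void main(String[] args) {
    // The patch formats in the JVM default zone.
    System.out.println(secondsToZonedTimestamp("1656520536", ZoneId.systemDefault()));
  }
}
```

Spark interprets a zone-less timestamp string in the session time zone (`spark.sql.session.timeZone`). That setting defaults to the JVM default zone but can be set to something else. Formatting with the system default therefore only lines up with Spark when the two agree.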
   
   ### Testing Done
   
   Added unit tests for this change.
   
   cc @rdblue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] singhpk234 commented on pull request #5160: Spark 3.3: [FOLLOWUP] Add TODO for TimeTravel via DataframeReader using timestamp in seconds and CatalogOptions

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on PR #5160:
URL: https://github.com/apache/iceberg/pull/5160#issuecomment-1173454177

   Sure, I just thought it would be nice to at least document this, so that we don't miss it when we upgrade to 3.3.1, but I think we should be fine either way!
   
   Thanks @rdblue for the review :) I learned about timestamp handling in Iceberg / Spark from this.


[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5160: Spark 3.3: [FOLLOWUP] TimeTravel using SPARK SQL and DataFrameReaders

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #5160:
URL: https://github.com/apache/iceberg/pull/5160#discussion_r911616713


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java:
##########
@@ -173,7 +174,19 @@ public Optional<String> extractTimeTravelVersion(CaseInsensitiveStringMap option
 
   @Override
   public Optional<String> extractTimeTravelTimestamp(CaseInsensitiveStringMap options) {
-    return Optional.ofNullable(PropertyUtil.propertyAsString(options, "timestampAsOf", null));
+    String timestampAsOf = PropertyUtil.propertyAsString(options, "timestampAsOf", null);
+    if (timestampAsOf == null) {
+      return Optional.empty();
+    }
+
+    try {
+      // timestamp provided should be at a seconds precision.
+      // TODO: remove once https://issues.apache.org/jira/browse/SPARK-39633 is resolved
+      long timestampAsOfAsLong = Long.parseLong(timestampAsOf);
+      return Optional.of(DateTimeUtil.formatTimestampMillisWithLocalTime(timestampAsOfAsLong * 1000));

Review Comment:
   Makes sense. Then I think calling Spark APIs from Iceberg would not be OK either, since that indirectly moves the translation into Iceberg (by choosing which Spark API to use). Should I add a TODO in the code linking the issue, and mark the unit test added in this PR as ignored? As soon as we upgrade to 3.3.1 we will have the fix (the PR for [SPARK-39633](https://issues.apache.org/jira/browse/SPARK-39633) is merged upstream).
   
   Meanwhile, there are existing ways to time travel via DataFrame options; for example, we can specify [as-of-timestamp](https://iceberg.apache.org/docs/latest/spark-queries/#time-travel) in milliseconds.
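The existing route through the `as-of-timestamp` read option can be sketched as follows. The option takes epoch milliseconds, so a seconds value only needs scaling and no time-zone formatting. This is a sketch, not Iceberg code: the helper `asOfTimestampOptions` is hypothetical. The Spark call appears only in a comment because Spark is not on the classpath of this standalone example.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class AsOfTimestampSketch {
  // Hypothetical helper: builds reader options for the millisecond-based
  // "as-of-timestamp" option from an epoch-seconds value.
  // Instant.toEpochMilli throws ArithmeticException on overflow.
  public static Map<String, String> asOfTimestampOptions(long epochSeconds) {
    long millis = Instant.ofEpochSecond(epochSeconds).toEpochMilli();
    Map<String, String> options = new HashMap<>();
    options.put("as-of-timestamp", Long.toString(millis));
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> options = asOfTimestampOptions(1656520536L);
    System.out.println(options);
    // With Spark available, the options could be applied via DataFrameReader:
    // spark.read().format("iceberg").options(options).load("path/to/table");
  }
}
```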



[GitHub] [iceberg] rdblue commented on a diff in pull request #5160: Spark 3.3: [FOLLOWUP] TimeTravel using SPARK SQL and DataFrameReaders

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5160:
URL: https://github.com/apache/iceberg/pull/5160#discussion_r910271454


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java:
##########
@@ -173,7 +174,19 @@ public Optional<String> extractTimeTravelVersion(CaseInsensitiveStringMap option
 
   @Override
   public Optional<String> extractTimeTravelTimestamp(CaseInsensitiveStringMap options) {
-    return Optional.ofNullable(PropertyUtil.propertyAsString(options, "timestampAsOf", null));
+    String timestampAsOf = PropertyUtil.propertyAsString(options, "timestampAsOf", null);
+    if (timestampAsOf == null) {
+      return Optional.empty();
+    }
+
+    try {
+      // timestamp provided should be at a seconds precision.
+      // TODO: remove once https://issues.apache.org/jira/browse/SPARK-39633 is resolved
+      long timestampAsOfAsLong = Long.parseLong(timestampAsOf);
+      return Optional.of(DateTimeUtil.formatTimestampMillisWithLocalTime(timestampAsOfAsLong * 1000));

Review Comment:
   This needs to be handled by Spark. Iceberg should not translate to a zone-specific time just because that is what Spark expects.



[GitHub] [iceberg] rdblue commented on a diff in pull request #5160: Spark 3.3: [FOLLOWUP] TimeTravel using SPARK SQL and DataFrameReaders

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5160:
URL: https://github.com/apache/iceberg/pull/5160#discussion_r912066099


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java:
##########
@@ -173,7 +174,19 @@ public Optional<String> extractTimeTravelVersion(CaseInsensitiveStringMap option
 
   @Override
   public Optional<String> extractTimeTravelTimestamp(CaseInsensitiveStringMap options) {
-    return Optional.ofNullable(PropertyUtil.propertyAsString(options, "timestampAsOf", null));
+    String timestampAsOf = PropertyUtil.propertyAsString(options, "timestampAsOf", null);
+    if (timestampAsOf == null) {
+      return Optional.empty();
+    }
+
+    try {
+      // timestamp provided should be at a seconds precision.
+      // TODO: remove once https://issues.apache.org/jira/browse/SPARK-39633 is resolved
+      long timestampAsOfAsLong = Long.parseLong(timestampAsOf);
+      return Optional.of(DateTimeUtil.formatTimestampMillisWithLocalTime(timestampAsOfAsLong * 1000));

Review Comment:
   @singhpk234, sounds good to me.



[GitHub] [iceberg] rdblue commented on pull request #5160: Spark 3.3: [FOLLOWUP] Add TODO for TimeTravel via DataframeReader using timestamp in seconds and CatalogOptions

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5160:
URL: https://github.com/apache/iceberg/pull/5160#issuecomment-1173163926

   Should we close this since it won't be fixed in Iceberg?


[GitHub] [iceberg] singhpk234 closed pull request #5160: Spark 3.3: [FOLLOWUP] Add TODO for TimeTravel via DataframeReader using timestamp in seconds and CatalogOptions

Posted by GitBox <gi...@apache.org>.
singhpk234 closed pull request #5160: Spark 3.3: [FOLLOWUP] Add TODO for TimeTravel via DataframeReader using timestamp in seconds and CatalogOptions
URL: https://github.com/apache/iceberg/pull/5160


[GitHub] [iceberg] rdblue commented on a diff in pull request #5160: Spark 3.3: [FOLLOWUP] TimeTravel using SPARK SQL and DataFrameReaders

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5160:
URL: https://github.com/apache/iceberg/pull/5160#discussion_r910270845


##########
core/src/main/java/org/apache/iceberg/util/DateTimeUtil.java:
##########
@@ -88,4 +88,8 @@ public static long microsFromTimestamptz(OffsetDateTime dateTime) {
   public static String formatTimestampMillis(long millis) {
     return DATE_FORMAT.format(LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC));
   }
+
+  public static String formatTimestampMillisWithLocalTime(long millis) {
+    return DATE_FORMAT.format(LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.systemDefault()));

Review Comment:
   Iceberg does not produce zone-specific time formats. This should not go in a common class like `DateTimeUtil`.
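The concern can be illustrated by formatting one instant in two zones (this is our reading of the comment, not code from the PR). The resulting string carries no zone, so its meaning depends on which zone produced it. With `ZoneId.systemDefault()`, that zone is whatever the JVM happens to be configured with. The class name and the `yyyy-MM-dd HH:mm:ss` pattern are assumptions for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ZoneDependenceSketch {
  // Assumed pattern; the real DateTimeUtil.DATE_FORMAT may differ.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  // Formats epoch millis as a zone-less wall-clock string in the given zone.
  public static String format(long millis, ZoneId zone) {
    return FORMAT.format(LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), zone));
  }

  public static void main(String[] args) {
    long millis = 1_656_520_536_000L;
    // Same instant, different strings: the output depends on the zone argument.
    System.out.println("UTC:         " + format(millis, ZoneOffset.UTC));
    System.out.println("JVM default: " + format(millis, ZoneId.systemDefault()));
  }
}
```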



[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5160: Spark 3.3: [FOLLOWUP] TimeTravel using SPARK SQL and DataFrameReaders

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #5160:
URL: https://github.com/apache/iceberg/pull/5160#discussion_r911616713


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java:
##########
@@ -173,7 +174,19 @@ public Optional<String> extractTimeTravelVersion(CaseInsensitiveStringMap option
 
   @Override
   public Optional<String> extractTimeTravelTimestamp(CaseInsensitiveStringMap options) {
-    return Optional.ofNullable(PropertyUtil.propertyAsString(options, "timestampAsOf", null));
+    String timestampAsOf = PropertyUtil.propertyAsString(options, "timestampAsOf", null);
+    if (timestampAsOf == null) {
+      return Optional.empty();
+    }
+
+    try {
+      // timestamp provided should be at a seconds precision.
+      // TODO: remove once https://issues.apache.org/jira/browse/SPARK-39633 is resolved
+      long timestampAsOfAsLong = Long.parseLong(timestampAsOf);
+      return Optional.of(DateTimeUtil.formatTimestampMillisWithLocalTime(timestampAsOfAsLong * 1000));

Review Comment:
   Makes sense. Then I think calling Spark APIs from Iceberg would not be OK either, since that indirectly moves the translation into Iceberg (by choosing which Spark API to use, providing zone IDs, etc.). Should I then add a TODO in the code linking the issue, and mark the unit test added in this PR as ignored? As soon as we upgrade to 3.3.1 we will have the fix (the PR for [SPARK-39633](https://issues.apache.org/jira/browse/SPARK-39633) is merged upstream).
   
   Meanwhile, there are existing ways to time travel via DataFrame options; for example, we can specify [as-of-timestamp](https://iceberg.apache.org/docs/latest/spark-queries/#time-travel) in milliseconds.


