You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "JonasJ-ap (via GitHub)" <gi...@apache.org> on 2023/04/27 18:21:02 UTC

[GitHub] [iceberg] JonasJ-ap opened a new pull request, #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

JonasJ-ap opened a new pull request, #7450:
URL: https://github.com/apache/iceberg/pull/7450

   Fixes #6769 
   
   For each Iceberg transaction migrated, add `delta-version-xxx` and `delta-yyyy-MM-dd-HH:mm:ss.SSS` tag to the snapshot. `delta-version-xxx` represents the logical delta lake version. `delta-yyyy-MM-dd-HH:mm:ss.SSS` represents the commit time of the corresponding delta lake transaction


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #7450:
URL: https://github.com/apache/iceberg/pull/7450#discussion_r1179542476


##########
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##########
@@ -79,6 +83,10 @@ class BaseSnapshotDeltaLakeTableAction implements SnapshotDeltaLakeTable {
   private static final String DELTA_SOURCE_VALUE = "delta";
   private static final String ORIGINAL_LOCATION_PROP = "original_location";
   private static final String PARQUET_SUFFIX = ".parquet";
+  private static final String DELTA_VERSION_TAG_PREFIX = "delta-version-";
+  private static final String DELTA_TIMESTAMP_TAG_PREFIX = "delta-";

Review Comment:
   what about `delta-ts-`? just to be clear



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

Posted by "JonasJ-ap (via GitHub)" <gi...@apache.org>.
JonasJ-ap commented on code in PR #7450:
URL: https://github.com/apache/iceberg/pull/7450#discussion_r1179694860


##########
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##########
@@ -412,6 +426,25 @@ private Map<String, String> destTableProperties(
     return additionalPropertiesBuilder.build();
   }
 
+  private void tagCurrentSnapshot(long deltaVersion, Transaction transaction) {
+    long currentSnapshotId = transaction.table().currentSnapshot().snapshotId();
+
+    ManageSnapshots manageSnapshots = transaction.manageSnapshots();
+    manageSnapshots.createTag(DELTA_VERSION_TAG_PREFIX + deltaVersion, currentSnapshotId);
+
+    Timestamp deltaVersionTimestamp = deltaLog.getCommitInfoAt(deltaVersion).getTimestamp();

Review Comment:
   Based on its source code, it seems this method returns non-null result:
   https://github.com/delta-io/connectors/blob/f11c355649bf1182dd74b563bc399fa7f1b7fe97/standalone/src/main/scala/io/delta/standalone/internal/DeltaLogImpl.scala#L101-L104
   
   https://github.com/delta-io/connectors/blob/f11c355649bf1182dd74b563bc399fa7f1b7fe97/standalone/src/main/scala/io/delta/standalone/internal/util/ConversionUtils.scala#L126-L159
   
   So I think we are safe to exclude null-check for simplification. However, given that the interface of this method does not specify it as `nonnull`, I am not against adding a null check. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 merged pull request #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 merged PR #7450:
URL: https://github.com/apache/iceberg/pull/7450


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #7450:
URL: https://github.com/apache/iceberg/pull/7450#discussion_r1179761033


##########
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##########
@@ -412,6 +426,25 @@ private Map<String, String> destTableProperties(
     return additionalPropertiesBuilder.build();
   }
 
+  private void tagCurrentSnapshot(long deltaVersion, Transaction transaction) {
+    long currentSnapshotId = transaction.table().currentSnapshot().snapshotId();
+
+    ManageSnapshots manageSnapshots = transaction.manageSnapshots();
+    manageSnapshots.createTag(DELTA_VERSION_TAG_PREFIX + deltaVersion, currentSnapshotId);
+
+    Timestamp deltaVersionTimestamp = deltaLog.getCommitInfoAt(deltaVersion).getTimestamp();

Review Comment:
   cool, in that case I am fine as is!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #7450:
URL: https://github.com/apache/iceberg/pull/7450#discussion_r1179581773


##########
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##########
@@ -412,6 +426,25 @@ private Map<String, String> destTableProperties(
     return additionalPropertiesBuilder.build();
   }
 
+  private void tagCurrentSnapshot(long deltaVersion, Transaction transaction) {
+    long currentSnapshotId = transaction.table().currentSnapshot().snapshotId();
+
+    ManageSnapshots manageSnapshots = transaction.manageSnapshots();
+    manageSnapshots.createTag(DELTA_VERSION_TAG_PREFIX + deltaVersion, currentSnapshotId);
+
+    Timestamp deltaVersionTimestamp = deltaLog.getCommitInfoAt(deltaVersion).getTimestamp();
+    if (deltaVersionTimestamp != null) {
+      String formattedDeltaTimestamp =
+          deltaVersionTimestamp
+              .toInstant()
+              .atZone(ZoneId.of(DELTA_TIME_STAMP_ZONE))
+              .format(DateTimeFormatter.ofPattern(DELTA_TIME_STAMP_FORMAT));

Review Comment:
   Thought about this a bit more, I think I would +1 for just using raw millisecond timestamp. Dealing with timezone is complex and should be avoided if there is no significant benefit. Also it leaves a space character in the tag, that might cause issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #7450:
URL: https://github.com/apache/iceberg/pull/7450#discussion_r1179550048


##########
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##########
@@ -412,6 +426,25 @@ private Map<String, String> destTableProperties(
     return additionalPropertiesBuilder.build();
   }
 
+  private void tagCurrentSnapshot(long deltaVersion, Transaction transaction) {
+    long currentSnapshotId = transaction.table().currentSnapshot().snapshotId();
+
+    ManageSnapshots manageSnapshots = transaction.manageSnapshots();
+    manageSnapshots.createTag(DELTA_VERSION_TAG_PREFIX + deltaVersion, currentSnapshotId);
+
+    Timestamp deltaVersionTimestamp = deltaLog.getCommitInfoAt(deltaVersion).getTimestamp();
+    if (deltaVersionTimestamp != null) {
+      String formattedDeltaTimestamp =
+          deltaVersionTimestamp
+              .toInstant()
+              .atZone(ZoneId.of(DELTA_TIME_STAMP_ZONE))
+              .format(DateTimeFormatter.ofPattern(DELTA_TIME_STAMP_FORMAT));

Review Comment:
   I am wondering how much it is worth doing this conversion, instead of just using the millisecond timestamp value. Any thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on PR #7450:
URL: https://github.com/apache/iceberg/pull/7450#issuecomment-1526815771

   Thanks for adding the feature Jonas!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #7450: Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #7450:
URL: https://github.com/apache/iceberg/pull/7450#discussion_r1179543545


##########
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##########
@@ -412,6 +426,25 @@ private Map<String, String> destTableProperties(
     return additionalPropertiesBuilder.build();
   }
 
+  private void tagCurrentSnapshot(long deltaVersion, Transaction transaction) {
+    long currentSnapshotId = transaction.table().currentSnapshot().snapshotId();
+
+    ManageSnapshots manageSnapshots = transaction.manageSnapshots();
+    manageSnapshots.createTag(DELTA_VERSION_TAG_PREFIX + deltaVersion, currentSnapshotId);
+
+    Timestamp deltaVersionTimestamp = deltaLog.getCommitInfoAt(deltaVersion).getTimestamp();

Review Comment:
   should we also check `deltaLog.getCommitInfoAt(deltaVersion)` is not null?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org