You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "szehon-ho (via GitHub)" <gi...@apache.org> on 2023/05/22 23:07:08 UTC

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7581: Add last updated timestamp and snapshotId for partition table

szehon-ho commented on code in PR #7581:
URL: https://github.com/apache/iceberg/pull/7581#discussion_r1201254518


##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -47,6 +47,16 @@ public class PartitionsTable extends BaseMetadataTable {
         new Schema(
             Types.NestedField.required(1, "partition", Partitioning.partitionType(table)),
             Types.NestedField.required(4, "spec_id", Types.IntegerType.get()),
+            Types.NestedField.required(
+                9,
+                "last_updated_at",

Review Comment:
   How about 'last_updated' to be more precise, and match table metadata json?



##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -246,19 +268,27 @@ static class Partition {
     private int posDeleteFileCount;
     private long eqDeleteRecordCount;
     private int eqDeleteFileCount;
+    private long lastUpdatedAt;
+    private long lastUpdatedSnapshotId;
 
     Partition(StructLike key, Types.StructType keyType) {
       this.partitionData = toPartitionData(key, keyType);
       this.specId = 0;
-      this.dataRecordCount = 0;
+      this.dataRecordCount = 0L;
       this.dataFileCount = 0;
-      this.posDeleteRecordCount = 0;
+      this.posDeleteRecordCount = 0L;
       this.posDeleteFileCount = 0;
-      this.eqDeleteRecordCount = 0;
+      this.eqDeleteRecordCount = 0L;
       this.eqDeleteFileCount = 0;
+      this.lastUpdatedAt = 0L;
+      this.lastUpdatedSnapshotId = 0L;
     }
 
-    void update(ContentFile<?> file) {
+    void update(ContentFile<?> file, Snapshot snapshot) {
+      if (snapshot.timestampMillis() * 1000 > this.lastUpdatedAt) {

Review Comment:
   opt: maybe make a variable snapshotCommitTime = snapshot.timestampMillis * 1000?



##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -155,21 +172,26 @@ private static Iterable<Partition> partitions(Table table, StaticTableScan scan)
   }
 
   @VisibleForTesting
-  static CloseableIterable<ContentFile<?>> planFiles(StaticTableScan scan) {
+  static CloseableIterable<ManifestEntry<? extends ContentFile<?>>> planEntries(

Review Comment:
   Is it possible to remove `? extends ContentFile` here?



##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -155,21 +172,26 @@ private static Iterable<Partition> partitions(Table table, StaticTableScan scan)
   }
 
   @VisibleForTesting
-  static CloseableIterable<ContentFile<?>> planFiles(StaticTableScan scan) {
+  static CloseableIterable<ManifestEntry<? extends ContentFile<?>>> planEntries(
+      StaticTableScan scan) {
     Table table = scan.table();
 
     CloseableIterable<ManifestFile> filteredManifests =
         filteredManifests(scan, table, scan.snapshot().allManifests(table.io()));
 
-    Iterable<CloseableIterable<ContentFile<?>>> tasks =
+    Iterable<CloseableIterable<ManifestEntry<? extends ContentFile<?>>>> tasks =
         CloseableIterable.transform(
             filteredManifests,
             manifest ->
                 CloseableIterable.transform(
                     ManifestFiles.open(manifest, table.io(), table.specs())
                         .caseSensitive(scan.isCaseSensitive())
-                        .select(scanColumns(manifest.content())), // don't select stats columns
-                    t -> (ContentFile<?>) t));
+                        .select(scanColumns(manifest.content())) // don't select stats columns
+                        .entries(),
+                    t ->

Review Comment:
   maybe worth breaking into new method now?  ie, manifest -> entries(manifest, scan)



##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -139,13 +155,14 @@ private static StaticDataTask.Row convertPartition(Partition partition) {
   private static Iterable<Partition> partitions(Table table, StaticTableScan scan) {
     Types.StructType partitionType = Partitioning.partitionType(table);
     PartitionMap partitions = new PartitionMap(partitionType);
-
-    try (CloseableIterable<ContentFile<?>> files = planFiles(scan)) {
-      for (ContentFile<?> file : files) {
+    try (CloseableIterable<ManifestEntry<? extends ContentFile<?>>> entries = planEntries(scan)) {
+      for (ManifestEntry<? extends ContentFile<?>> entry : entries) {
+        Snapshot snapshot = table.snapshot(entry.snapshotId());

Review Comment:
   Hm.. i think it may be possible.  Eg, snapshot1 => add DataFile1, snapshot2 => add DataFile2, 
   
   ExpireSnapshot may move Snapshot1, just DataFile1 is not removed.  
   
   Maybe we could make a null here if we don't find a snapshot?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org