Posted to issues@paimon.apache.org by "tsreaper (via GitHub)" <gi...@apache.org> on 2024/03/07 08:57:38 UTC

[PR] [core] Data files with delete records should not be upgraded directly to max level [incubator-paimon]

tsreaper opened a new pull request, #2962:
URL: https://github.com/apache/incubator-paimon/pull/2962

   ### Purpose
   
   Data files on the max level should not contain delete records. However, when upgrading files to the max level, we currently do not perform this check.
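   The intended check can be sketched as follows. This is an illustrative example only; the class and method names (`UpgradeGuard`, `canUpgradeToMaxLevel`) are hypothetical and do not match Paimon's actual upgrade code path:

   ```java
   // Hypothetical sketch of the guard described above, not Paimon's real API.
   final class UpgradeGuard {
       /**
        * A data file may be moved directly to the max level only if it is known
        * to contain no delete records. A null deleteRowCount (a file written by
        * an older version) means "unknown", so the upgrade must be refused too.
        */
       static boolean canUpgradeToMaxLevel(Long deleteRowCount) {
           return deleteRowCount != null && deleteRowCount == 0L;
       }
   }
   ```

   Files that fail this check would have to go through a real compaction (which drops the delete records) instead of a pure level change.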
   
   ### Tests
   
   `TableWriteTest#testUpgradeToMaxLevel`.
   
   ### API and Format
   
   Yes. `DataFileMeta` is changed, and thus `ManifestEntry` is changed as well. However, as long as `PrimaryKeyFileStoreTableTest#testStreamingChangelogCompatibility02` passes, there should not be any compatibility issue.
   
   ### Documentation
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Data files with delete records should not be upgraded directly to max level [incubator-paimon]

Posted by "tsreaper (via GitHub)" <gi...@apache.org>.
tsreaper commented on PR #2962:
URL: https://github.com/apache/incubator-paimon/pull/2962#issuecomment-2002797530

   > Please manually test that the newly written table can be read by the old code.
   
   I've tested by hand that the old reader code can read this newly written table.




Re: [PR] [core] Data files with delete records should not be upgraded directly to max level [incubator-paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #2962:
URL: https://github.com/apache/incubator-paimon/pull/2962#discussion_r1525839508


##########
paimon-core/src/main/java/org/apache/paimon/io/DataFileMeta.java:
##########
@@ -58,7 +60,13 @@ public class DataFileMeta {
 
     private final String fileName;
     private final long fileSize;
+
+    // rowCount = addRowCount + deleteRowCount
+    // Why don't we keep addRowCount and deleteRowCount?
+    // Because in previous versions of DataFileMeta, we only keep rowCount.
+    // We have to keep the compatibility.
     private final long rowCount;
+    private final @Nullable Long deleteRowCount;

Review Comment:
   Put this field last among the class members.
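
   The comment in the diff above implies that `addRowCount` stays recoverable even though only `rowCount` and the nullable `deleteRowCount` are persisted. A hypothetical helper illustrating the arithmetic (names are illustrative, not Paimon's actual API):

   ```java
   // Hypothetical sketch: derives addRowCount from the two stored values.
   final class RowCounts {
       /**
        * rowCount is always stored; deleteRowCount is null for files written by
        * older versions, so the derived addRowCount is also unknown (null) there.
        */
       static Long addRowCount(long rowCount, Long deleteRowCount) {
           return deleteRowCount == null ? null : rowCount - deleteRowCount;
       }
   }
   ```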







Re: [PR] [core] Data files with delete records should not be upgraded directly to max level [incubator-paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #2962:
URL: https://github.com/apache/incubator-paimon/pull/2962#discussion_r1525839582


##########
paimon-core/src/main/java/org/apache/paimon/io/DataFileMeta.java:
##########
@@ -58,7 +60,13 @@ public class DataFileMeta {
 
     private final String fileName;
     private final long fileSize;
+
+    // rowCount = addRowCount + deleteRowCount
+    // Why don't we keep addRowCount and deleteRowCount?
+    // Because in previous versions of DataFileMeta, we only keep rowCount.
+    // We have to keep the compatibility.
     private final long rowCount;
+    private final @Nullable Long deleteRowCount;

Review Comment:
   Put this field last among the class members.





Re: [PR] [core] Data files with delete records should not be upgraded directly to max level [incubator-paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi merged PR #2962:
URL: https://github.com/apache/incubator-paimon/pull/2962




Re: [PR] [core] Data files with delete records should not be upgraded directly to max level [incubator-paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #2962:
URL: https://github.com/apache/incubator-paimon/pull/2962#discussion_r1519071424


##########
paimon-core/src/main/java/org/apache/paimon/io/DataFileMeta.java:
##########
@@ -58,7 +60,13 @@ public class DataFileMeta {
 
     private final String fileName;
     private final long fileSize;
-    private final long rowCount;
+
+    // totalRowCount = addRowCount + deleteRowCount
+    // Why don't we keep addRowCount and deleteRowCount?
+    // Because in previous versions of DataFileMeta, we only keep rowCount.
+    // We have to keep the compatibility.
+    private final long totalRowCount;
+    private final @Nullable Long addRowCount;

Review Comment:
   Would it be better to introduce `deleteRowCount`?





Re: [PR] [core] Data files with delete records should not be upgraded directly to max level [incubator-paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #2962:
URL: https://github.com/apache/incubator-paimon/pull/2962#discussion_r1516222025


##########
paimon-core/src/main/java/org/apache/paimon/io/DataFileMeta.java:
##########
@@ -58,7 +60,13 @@ public class DataFileMeta {
 
     private final String fileName;
     private final long fileSize;
-    private final long rowCount;
+
+    // totalRowCount = addRowCount + deleteRowCount
+    // Why don't we keep addRowCount and deleteRowCount?
+    // Because in previous versions of DataFileMeta, we only keep rowCount.
+    // We have to keep the compatibility.
+    private final long totalRowCount;

Review Comment:
   Can we keep the `rowCount` naming?
   Because in the schema it is still `_ROW_COUNT`...





Re: [PR] [core] Data files with delete records should not be upgraded directly to max level [paimon]

Posted by "nonggialiang (via GitHub)" <gi...@apache.org>.
nonggialiang commented on PR #2962:
URL: https://github.com/apache/paimon/pull/2962#issuecomment-2072014949

   > > Please manually test that the newly written table can be read by the old code.
   > 
   > I've tested by hand, that the old reader code can read this newly written table.
   
   When I use 0.5 code to read a table written by 0.8-SNAPSHOT, I get this exception. Is that expected? @tsreaper
   ```
   Caused by: java.lang.IndexOutOfBoundsException: Index: 13, Size: 13
           at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_201]
           at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_201]
           at org.apache.paimon.format.avro.FieldReaderFactory$RowReader.<init>(FieldReaderFactory.java:455) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.FieldReaderFactory$RowReader.<init>(FieldReaderFactory.java:447) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.FieldReaderFactory.visitRecord(FieldReaderFactory.java:147) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.FieldReaderFactory.visitRecord(FieldReaderFactory.java:44) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.AvroSchemaVisitor.visit(AvroSchemaVisitor.java:41) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.FieldReaderFactory.visitUnion(FieldReaderFactory.java:70) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.FieldReaderFactory.visitUnion(FieldReaderFactory.java:44) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.AvroSchemaVisitor.visit(AvroSchemaVisitor.java:44) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.FieldReaderFactory$RowReader.<init>(FieldReaderFactory.java:456) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.FieldReaderFactory.createRowReader(FieldReaderFactory.java:436) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.AvroRowDatumReader.setSchema(AvroRowDatumReader.java:54) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.shade.org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:145) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.shade.org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:143) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.shade.org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:134) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.shade.org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:74) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.AvroBulkFormat$AvroReader.createReaderFromPath(AvroBulkFormat.java:81) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.AvroBulkFormat$AvroReader.<init>(AvroBulkFormat.java:68) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.AvroBulkFormat$AvroReader.<init>(AvroBulkFormat.java:58) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.AvroBulkFormat.createReader(AvroBulkFormat.java:55) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.format.avro.AvroBulkFormat.createReader(AvroBulkFormat.java:41) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.utils.FileUtils.createFormatReader(FileUtils.java:128) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.utils.ObjectsFile.read(ObjectsFile.java:97) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.operation.AbstractFileStoreScan.readManifestFileMeta(AbstractFileStoreScan.java:341) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.operation.AbstractFileStoreScan.lambda$null$2(AbstractFileStoreScan.java:238) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) ~[?:1.8.0_201]
           at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_201]
           at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) ~[?:1.8.0_201]
           at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_201]
           at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_201]
           at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) ~[?:1.8.0_201]
           at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) ~[?:1.8.0_201]
           at java.util.stream.AbstractTask.compute(AbstractTask.java:316) ~[?:1.8.0_201]
           at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) ~[?:1.8.0_201]
           at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) ~[?:1.8.0_201]
           at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) ~[?:1.8.0_201]
           at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) ~[?:1.8.0_201]
           at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714) ~[?:1.8.0_201]
           at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:1.8.0_201]
           at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[?:1.8.0_201]
           at org.apache.paimon.operation.AbstractFileStoreScan.lambda$doPlan$3(AbstractFileStoreScan.java:240) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at org.apache.paimon.utils.ParallellyExecuteUtils$1.lambda$advanceIfNeeded$0(ParallellyExecuteUtils.java:78) ~[blob_p-96ee7771d44cf819eb6a119821982bdbeb1ca209-f036c42e0b082802409d42965f38fe86:0.5-SNAPSHOT]
           at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_201]
           at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582) ~[?:1.8.0_201]
           at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) ~[?:1.8.0_201]
           at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) ~[?:1.8.0_201]
           at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) ~[?:1.8.0_201]
           at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ~[?:1.8.0_201]
   ```
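
   The `IndexOutOfBoundsException: Index: 13, Size: 13` is consistent with a reader that builds one field reader per field of its own 13-field schema, then walks the 14 fields actually present in the file (the extra field presumably being the new one added by this PR). A toy sketch of that failure mode only, not Paimon's actual Avro reader code:

   ```java
   import java.util.List;

   // Toy reproduction of the suspected failure mode; not Paimon's real code.
   final class PositionalLookup {
       static void readRecord(int writerFieldCount, List<String> readerFieldReaders) {
           // An old reader holds one reader per field of ITS schema (13 entries)
           // but iterates the fields found in the file (14): index 13 overruns.
           for (int i = 0; i < writerFieldCount; i++) {
               readerFieldReaders.get(i); // IndexOutOfBoundsException at i == 13
           }
       }
   }
   ```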
   

