You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@paimon.apache.org by "JingsongLi (via GitHub)" <gi...@apache.org> on 2023/03/29 04:32:13 UTC

[GitHub] [incubator-paimon] JingsongLi opened a new pull request, #755: [core] AppendOnly table full reading order is wrong

JingsongLi opened a new pull request, #755:
URL: https://github.com/apache/incubator-paimon/pull/755

   ### Purpose
   
   When latest-full streaming reading.
   The records order of full reading is wrong. This is because:
   
   New files may be created during the compaction process, then the results of the compaction may be put after the new files, and this order will be disrupted.
   
   We need to ensure this order, so we force the order of files by sequence.
   
   ### Tests
   
   `AppendOnlyFileStoreTableTest.testBatchOrderWithCompaction`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] yuzelin commented on a diff in pull request #755: [core] AppendOnly table full reading order is wrong

Posted by "yuzelin (via GitHub)" <gi...@apache.org>.
yuzelin commented on code in PR #755:
URL: https://github.com/apache/incubator-paimon/pull/755#discussion_r1151559656


##########
paimon-core/src/main/java/org/apache/paimon/append/AppendOnlyCompactManager.java:
##########
@@ -301,4 +302,39 @@ public List<DataFileMeta> after() {
     public interface CompactRewriter {
         List<DataFileMeta> rewrite(List<DataFileMeta> compactBefore) throws Exception;
     }
+
+    /**
+     * New files may be created during the compaction process, then the results of the compaction
+     * may be put after the new files, and this order will be disrupted. We need to ensure this
+     * order, so we force the order by sequence.
+     */
+    public static List<DataFileMeta> sortFiles(Collection<DataFileMeta> input) {
+        List<DataFileMeta> files = new ArrayList<>(input);
+        files.sort(
+                (o1, o2) -> {
+                    if (isOverlap(o1, o2)) {
+                        throw new RuntimeException(
+                                String.format(
+                                        "There should no overlap in append files, there is a bug!"
+                                                + " Range1(%s, %s), Range2(%s, %s)",
+                                        o1.minSequenceNumber(),
+                                        o1.maxSequenceNumber(),
+                                        o2.minSequenceNumber(),
+                                        o2.maxSequenceNumber()));
+                    }
+
+                    return Long.compare(o1.minSequenceNumber(), o2.minSequenceNumber());
+                });
+        return files;
+    }
+
+    private static boolean isOverlap(DataFileMeta o1, DataFileMeta o2) {
+        if (o1.minSequenceNumber() <= o2.maxSequenceNumber()
+                && o1.maxSequenceNumber() >= o2.minSequenceNumber()) {
+            return true;
+        }

Review Comment:
   Duplicated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] yuzelin commented on a diff in pull request #755: [core] AppendOnly table full reading order is wrong

Posted by "yuzelin (via GitHub)" <gi...@apache.org>.
yuzelin commented on code in PR #755:
URL: https://github.com/apache/incubator-paimon/pull/755#discussion_r1151555168


##########
paimon-core/src/main/java/org/apache/paimon/append/AppendOnlyCompactManager.java:
##########
@@ -301,4 +302,39 @@ public List<DataFileMeta> after() {
     public interface CompactRewriter {
         List<DataFileMeta> rewrite(List<DataFileMeta> compactBefore) throws Exception;
     }
+
+    /**
+     * New files may be created during the compaction process, then the results of the compaction
+     * may be put after the new files, and this order will be disrupted. We need to ensure this
+     * order, so we force the order by sequence.
+     */
+    public static List<DataFileMeta> sortFiles(Collection<DataFileMeta> input) {
+        List<DataFileMeta> files = new ArrayList<>(input);
+        files.sort(
+                (o1, o2) -> {
+                    if (isOverlap(o1, o2)) {
+                        throw new RuntimeException(
+                                String.format(
+                                        "There should no overlap in append files, there is a bug!"
+                                                + " Range1(%s, %s), Range2(%s, %s)",
+                                        o1.minSequenceNumber(),
+                                        o1.maxSequenceNumber(),
+                                        o2.minSequenceNumber(),
+                                        o2.maxSequenceNumber()));
+                    }
+
+                    return Long.compare(o1.minSequenceNumber(), o2.minSequenceNumber());
+                });
+        return files;
+    }
+
+    private static boolean isOverlap(DataFileMeta o1, DataFileMeta o2) {
+        if (o1.minSequenceNumber() <= o2.maxSequenceNumber()
+                && o1.maxSequenceNumber() >= o2.minSequenceNumber()) {
+            return true;
+        }
+
+        return o2.minSequenceNumber() <= o1.maxSequenceNumber()
+                && o2.maxSequenceNumber() >= o1.minSequenceNumber();
+    }

Review Comment:
   Maybe it's more clear to check false first:
   ```
   if (o1.max() < o2.min || o1.min > o2.max) {
       return false;
   }
   return true;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi commented on a diff in pull request #755: [core] AppendOnly table full reading order is wrong

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #755:
URL: https://github.com/apache/incubator-paimon/pull/755#discussion_r1151581232


##########
paimon-core/src/main/java/org/apache/paimon/append/AppendOnlyCompactManager.java:
##########
@@ -301,4 +302,39 @@ public List<DataFileMeta> after() {
     public interface CompactRewriter {
         List<DataFileMeta> rewrite(List<DataFileMeta> compactBefore) throws Exception;
     }
+
+    /**
+     * New files may be created during the compaction process, then the results of the compaction
+     * may be put after the new files, and this order will be disrupted. We need to ensure this
+     * order, so we force the order by sequence.
+     */
+    public static List<DataFileMeta> sortFiles(Collection<DataFileMeta> input) {
+        List<DataFileMeta> files = new ArrayList<>(input);
+        files.sort(
+                (o1, o2) -> {
+                    if (isOverlap(o1, o2)) {
+                        throw new RuntimeException(
+                                String.format(
+                                        "There should no overlap in append files, there is a bug!"
+                                                + " Range1(%s, %s), Range2(%s, %s)",
+                                        o1.minSequenceNumber(),
+                                        o1.maxSequenceNumber(),
+                                        o2.minSequenceNumber(),
+                                        o2.maxSequenceNumber()));
+                    }
+
+                    return Long.compare(o1.minSequenceNumber(), o2.minSequenceNumber());
+                });
+        return files;
+    }
+
+    private static boolean isOverlap(DataFileMeta o1, DataFileMeta o2) {
+        if (o1.minSequenceNumber() <= o2.maxSequenceNumber()
+                && o1.maxSequenceNumber() >= o2.minSequenceNumber()) {
+            return true;
+        }

Review Comment:
   It is duplicated



##########
paimon-core/src/main/java/org/apache/paimon/append/AppendOnlyCompactManager.java:
##########
@@ -301,4 +302,39 @@ public List<DataFileMeta> after() {
     public interface CompactRewriter {
         List<DataFileMeta> rewrite(List<DataFileMeta> compactBefore) throws Exception;
     }
+
+    /**
+     * New files may be created during the compaction process, then the results of the compaction
+     * may be put after the new files, and this order will be disrupted. We need to ensure this
+     * order, so we force the order by sequence.
+     */
+    public static List<DataFileMeta> sortFiles(Collection<DataFileMeta> input) {
+        List<DataFileMeta> files = new ArrayList<>(input);
+        files.sort(
+                (o1, o2) -> {
+                    if (isOverlap(o1, o2)) {
+                        throw new RuntimeException(
+                                String.format(
+                                        "There should no overlap in append files, there is a bug!"
+                                                + " Range1(%s, %s), Range2(%s, %s)",
+                                        o1.minSequenceNumber(),
+                                        o1.maxSequenceNumber(),
+                                        o2.minSequenceNumber(),
+                                        o2.maxSequenceNumber()));
+                    }
+
+                    return Long.compare(o1.minSequenceNumber(), o2.minSequenceNumber());
+                });
+        return files;
+    }
+
+    private static boolean isOverlap(DataFileMeta o1, DataFileMeta o2) {
+        if (o1.minSequenceNumber() <= o2.maxSequenceNumber()
+                && o1.maxSequenceNumber() >= o2.minSequenceNumber()) {
+            return true;
+        }

Review Comment:
   True, It is duplicated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] yuzelin commented on a diff in pull request #755: [core] AppendOnly table full reading order is wrong

Posted by "yuzelin (via GitHub)" <gi...@apache.org>.
yuzelin commented on code in PR #755:
URL: https://github.com/apache/incubator-paimon/pull/755#discussion_r1151555168


##########
paimon-core/src/main/java/org/apache/paimon/append/AppendOnlyCompactManager.java:
##########
@@ -301,4 +302,39 @@ public List<DataFileMeta> after() {
     public interface CompactRewriter {
         List<DataFileMeta> rewrite(List<DataFileMeta> compactBefore) throws Exception;
     }
+
+    /**
+     * New files may be created during the compaction process, then the results of the compaction
+     * may be put after the new files, and this order will be disrupted. We need to ensure this
+     * order, so we force the order by sequence.
+     */
+    public static List<DataFileMeta> sortFiles(Collection<DataFileMeta> input) {
+        List<DataFileMeta> files = new ArrayList<>(input);
+        files.sort(
+                (o1, o2) -> {
+                    if (isOverlap(o1, o2)) {
+                        throw new RuntimeException(
+                                String.format(
+                                        "There should no overlap in append files, there is a bug!"
+                                                + " Range1(%s, %s), Range2(%s, %s)",
+                                        o1.minSequenceNumber(),
+                                        o1.maxSequenceNumber(),
+                                        o2.minSequenceNumber(),
+                                        o2.maxSequenceNumber()));
+                    }
+
+                    return Long.compare(o1.minSequenceNumber(), o2.minSequenceNumber());
+                });
+        return files;
+    }
+
+    private static boolean isOverlap(DataFileMeta o1, DataFileMeta o2) {
+        if (o1.minSequenceNumber() <= o2.maxSequenceNumber()
+                && o1.maxSequenceNumber() >= o2.minSequenceNumber()) {
+            return true;
+        }
+
+        return o2.minSequenceNumber() <= o1.maxSequenceNumber()
+                && o2.maxSequenceNumber() >= o1.minSequenceNumber();
+    }

Review Comment:
   Maybe it's more clear to check false first:
   ```
   if (o1.max() < o2.min || o1.min > o2.max) {
       return false;
   }
   return true;
   ```
   Or `return !(o1.max < o2.min || o1.min > o2.max)`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi commented on a diff in pull request #755: [core] AppendOnly table full reading order is wrong

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #755:
URL: https://github.com/apache/incubator-paimon/pull/755#discussion_r1151581232


##########
paimon-core/src/main/java/org/apache/paimon/append/AppendOnlyCompactManager.java:
##########
@@ -301,4 +302,39 @@ public List<DataFileMeta> after() {
     public interface CompactRewriter {
         List<DataFileMeta> rewrite(List<DataFileMeta> compactBefore) throws Exception;
     }
+
+    /**
+     * New files may be created during the compaction process, then the results of the compaction
+     * may be put after the new files, and this order will be disrupted. We need to ensure this
+     * order, so we force the order by sequence.
+     */
+    public static List<DataFileMeta> sortFiles(Collection<DataFileMeta> input) {
+        List<DataFileMeta> files = new ArrayList<>(input);
+        files.sort(
+                (o1, o2) -> {
+                    if (isOverlap(o1, o2)) {
+                        throw new RuntimeException(
+                                String.format(
+                                        "There should no overlap in append files, there is a bug!"
+                                                + " Range1(%s, %s), Range2(%s, %s)",
+                                        o1.minSequenceNumber(),
+                                        o1.maxSequenceNumber(),
+                                        o2.minSequenceNumber(),
+                                        o2.maxSequenceNumber()));
+                    }
+
+                    return Long.compare(o1.minSequenceNumber(), o2.minSequenceNumber());
+                });
+        return files;
+    }
+
+    private static boolean isOverlap(DataFileMeta o1, DataFileMeta o2) {
+        if (o1.minSequenceNumber() <= o2.maxSequenceNumber()
+                && o1.maxSequenceNumber() >= o2.minSequenceNumber()) {
+            return true;
+        }

Review Comment:
   It is not duplicated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi merged pull request #755: [core] AppendOnly table full reading order is wrong

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi merged PR #755:
URL: https://github.com/apache/incubator-paimon/pull/755


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org