You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@paimon.apache.org by "Zouxxyy (via GitHub)" <gi...@apache.org> on 2024/03/20 08:47:50 UTC

[PR] [core] Introduce SplitGroup for SplitGenerator [incubator-paimon]

Zouxxyy opened a new pull request, #3059:
URL: https://github.com/apache/incubator-paimon/pull/3059

   <!-- Please specify the module before the PR name: [core] ... or [flink] ... -->
   
   ### Purpose
   
   - make SplitGenerator return files + rawConvertible
   - optimizer split generate
   
   ### Tests
   
   <!-- List UT and IT cases to verify this change -->
   
   ### API and Format
   
   <!-- Does this change affect API or storage format -->
   
   ### Documentation
   
   <!-- Does this change introduce a new feature -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Introduce SplitGroup for SplitGenerator [paimon]

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on code in PR #3059:
URL: https://github.com/apache/paimon/pull/3059#discussion_r1537227164


##########
paimon-core/src/main/java/org/apache/paimon/table/source/MergeTreeSplitGenerator.java:
##########
@@ -61,11 +61,17 @@ public MergeTreeSplitGenerator(
     }
 
     @Override
-    public List<List<DataFileMeta>> splitForBatch(List<DataFileMeta> files) {
-        if (deletionVectorsEnabled || mergeEngine == FIRST_ROW) {
+    public List<SplitGroup> splitForBatch(List<DataFileMeta> files) {
+        boolean rawConvertible = files.stream().allMatch(DataFileMeta::rawConvertible);
+        boolean oneLevel =

Review Comment:
   Not zero level is tested by files.stream().allMatch(DataFileMeta::rawConvertible);



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Introduce SplitGroup for SplitGenerator [paimon]

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on PR #3059:
URL: https://github.com/apache/paimon/pull/3059#issuecomment-2017664661

   > Hi @Zouxxyy , can you add more tests for your modification?
   
   Added~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Introduce SplitGroup for SplitGenerator [paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #3059:
URL: https://github.com/apache/paimon/pull/3059#discussion_r1537097010


##########
paimon-core/src/main/java/org/apache/paimon/table/source/MergeTreeSplitGenerator.java:
##########
@@ -61,11 +61,17 @@ public MergeTreeSplitGenerator(
     }
 
     @Override
-    public List<List<DataFileMeta>> splitForBatch(List<DataFileMeta> files) {
-        if (deletionVectorsEnabled || mergeEngine == FIRST_ROW) {
+    public List<SplitGroup> splitForBatch(List<DataFileMeta> files) {
+        boolean rawConvertible = files.stream().allMatch(DataFileMeta::rawConvertible);
+        boolean oneLevel =

Review Comment:
   oneLevel and not zero level.
   
   You should add test for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Introduce SplitGroup for SplitGenerator [paimon]

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on code in PR #3059:
URL: https://github.com/apache/paimon/pull/3059#discussion_r1537227164


##########
paimon-core/src/main/java/org/apache/paimon/table/source/MergeTreeSplitGenerator.java:
##########
@@ -61,11 +61,17 @@ public MergeTreeSplitGenerator(
     }
 
     @Override
-    public List<List<DataFileMeta>> splitForBatch(List<DataFileMeta> files) {
-        if (deletionVectorsEnabled || mergeEngine == FIRST_ROW) {
+    public List<SplitGroup> splitForBatch(List<DataFileMeta> files) {
+        boolean rawConvertible = files.stream().allMatch(DataFileMeta::rawConvertible);
+        boolean oneLevel =

Review Comment:
   Not zero level is tested by files.stream().allMatch(DataFileMeta::rawConvertible);



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Introduce SplitGroup for SplitGenerator [paimon]

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on code in PR #3059:
URL: https://github.com/apache/paimon/pull/3059#discussion_r1538556886


##########
paimon-core/src/main/java/org/apache/paimon/table/source/MergeTreeSplitGenerator.java:
##########
@@ -61,11 +61,17 @@ public MergeTreeSplitGenerator(
     }
 
     @Override
-    public List<List<DataFileMeta>> splitForBatch(List<DataFileMeta> files) {
-        if (deletionVectorsEnabled || mergeEngine == FIRST_ROW) {
+    public List<SplitGroup> splitForBatch(List<DataFileMeta> files) {
+        boolean rawConvertible = files.stream().allMatch(DataFileMeta::rawConvertible);
+        boolean oneLevel =

Review Comment:
   move rawConvertible to avoid potential problems with the append table



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Introduce SplitGroup for SplitGenerator [paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on PR #3059:
URL: https://github.com/apache/paimon/pull/3059#issuecomment-2019485380

   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Introduce SplitGroup for SplitGenerator [paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on PR #3059:
URL: https://github.com/apache/paimon/pull/3059#issuecomment-2019274189

   any idea about above comment: `oneLevel and not zero level.`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [core] Introduce SplitGroup for SplitGenerator [paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi merged PR #3059:
URL: https://github.com/apache/paimon/pull/3059


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org