Posted to issues@paimon.apache.org by "liming30 (via GitHub)" <gi...@apache.org> on 2023/11/20 12:22:18 UTC

[I] [Bug] MultiplePartitionPredicate does not check all fields. [incubator-paimon]

liming30 opened a new issue, #2350:
URL: https://github.com/apache/incubator-paimon/issues/2350

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/incubator-paimon/issues) and found nothing similar.
   
   
   ### Paimon version
   
   master
   
   ### Compute Engine
   
   flink-1.17
   
   ### Minimal reproduce step
   
   Try the following test. The predicate is built for the single partition `<20231120, 16, 0>`, yet field stats whose first partition field is `20231115` still pass the filter.
   ```
           RowType type = DataTypes.ROW(DataTypes.STRING(), DataTypes.STRING(), DataTypes.STRING());
   
           BinaryRow binaryRow = new BinaryRow(3);
           BinaryRowWriter writer = new BinaryRowWriter(binaryRow);
           writer.writeString(0, BinaryString.fromString("20231120"));
           writer.writeString(1, BinaryString.fromString("16"));
           writer.writeString(2, BinaryString.fromString("0"));
           writer.complete();
   
           PartitionPredicate predicate =
                   PartitionPredicate.fromMultiple(type, Collections.singletonList(binaryRow));
   
           FieldStats[] fieldStats = new FieldStats[3];
           fieldStats[0] =
                   new FieldStats(
                           BinaryString.fromString("20231115"),
                           BinaryString.fromString("20231115"),
                           0L);
           fieldStats[1] =
                   new FieldStats(BinaryString.fromString("15"), BinaryString.fromString("20"), 0L);
           fieldStats[2] =
                   new FieldStats(BinaryString.fromString("0"), BinaryString.fromString("5"), 0L);
   
           // Returns true, although the first field's range [20231115, 20231115] excludes 20231120; expected false.
           boolean result = predicate.test(100L, fieldStats);
   ```
   
   ### What doesn't meet your expectations?
   
   `MultiplePartitionPredicate` lets through data that obviously does not belong to this partition, which increases the overhead of subsequent filtering.
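
For illustration, the expected behaviour can be sketched without any Paimon classes. This is a minimal model under stated assumptions, not Paimon's implementation: `PartitionStatsSketch`, `Range`, `mayContain` and `test` are hypothetical names, partition values are plain Java strings, and `String.compareTo` stands in for `BinaryString` ordering (the two agree for the ASCII values used here). The point is that a partition can only match if every field value lies inside that field's min/max range.

```java
import java.util.Collections;
import java.util.List;

/** Illustrative model only; not Paimon's PartitionPredicate. */
final class PartitionStatsSketch {

    /** Min/max of one partition field, as recorded in statistics (hypothetical type). */
    static final class Range {
        final String min;
        final String max;

        Range(String min, String max) {
            this.min = min;
            this.max = max;
        }

        /** True if value lies in [min, max] under String.compareTo ordering. */
        boolean contains(String value) {
            return min.compareTo(value) <= 0 && value.compareTo(max) <= 0;
        }
    }

    /** A partition may be present only if every field falls inside its range. */
    static boolean mayContain(String[] partition, Range[] stats) {
        for (int i = 0; i < partition.length; i++) {
            if (!stats[i].contains(partition[i])) {
                return false; // one field out of range rules the partition out
            }
        }
        return true;
    }

    /** True if at least one of the wanted partitions may be present. */
    static boolean test(List<String[]> partitions, Range[] stats) {
        for (String[] partition : partitions) {
            if (mayContain(partition, stats)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same values as the test in the report above.
        String[] wanted = {"20231120", "16", "0"};
        Range[] stats = {
            new Range("20231115", "20231115"),
            new Range("15", "20"),
            new Range("0", "5")
        };
        // Fields 1 and 2 are in range, but field 0 is not, so this prints false.
        System.out.println(test(Collections.singletonList(wanted), stats));
    }
}
```

In this model the second and third fields alone would accept the partition; only the check on the first field rejects it, which is why every field has to be examined.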
   
   ### Anything else?
   
   Currently, `AbstractFileStoreWrite#createWriterContainer` calls `scanExistingFileMetas`. Because `PartitionPredicate` misjudges which partitions match, a large number of `ManifestFile`s are read in parallel.
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug] MultiplePartitionPredicate does not check all fields. [incubator-paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi closed issue #2350: [Bug] MultiplePartitionPredicate does not check all fields.
URL: https://github.com/apache/incubator-paimon/issues/2350

