
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/10/10 03:23:25 UTC

[GitHub] [flink-table-store] zjureel opened a new pull request, #313: [FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader

zjureel opened a new pull request, #313:
URL: https://github.com/apache/flink-table-store/pull/313

   Currently, `SortMergeReader` compares and sorts its readers after reading each batch from them, to ensure that records come out in the correct order. The readers are created from a list of `SortedRun`s, and their key ranges may be disjoint. We can compare the batch minKey and maxKey of the readers over the files in the `SortedRun` list and divide them into multiple regions. When a region contains only one reader, that reader can read data directly without any comparing or sorting.
   
   The main changes are as follows:
   1. Add a `SortedRegionDataRecordReader` class, which creates a reader with a minKey and maxKey for each file in a `SortedRun`.
   2. Add a `RecordReaderSubRegion` class, which holds a list of `SortedRegionDataRecordReader`s and is created from one `SortedRun`.
   3. Add a `RecordReaderRegionManager` to divide the `RecordReaderSubRegion`s into multiple `RecordReaderRegion`s. Each `RecordReaderRegion` manages a list of `RecordReaderSubRegion`s, and the key ranges of different `RecordReaderRegion`s are disjoint.
   4. Create a `SortMergeReader` for each `RecordReaderRegion`, which avoids comparisons across different `RecordReaderRegion`s. If a `RecordReaderRegion` has only one reader, that reader is used directly.
   
   The test cases `RecordReaderRegionTest` and `RecordReaderRegionManagerTest` are added to test the new classes. `SortMergeReader` and the related classes are tested in `MergeTreeTest`.
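
As a rough illustration of the region division described above, here is a minimal sketch. It is not the code from this pull request: `RegionSketch`, `FileRange`, `divide` and `needsMerge` are hypothetical names, keys are simplified to `int`, and each file is assumed to expose its minKey and maxKey. Files are sorted by minKey and swept once; a new region starts whenever the next file's minKey is greater than the largest maxKey seen so far in the current region.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of region division; not the classes added in the PR. */
final class RegionSketch {

    /** Key range of one file, labelled with the index of the sorted run it came from. */
    static final class FileRange {
        final int run;
        final int minKey;
        final int maxKey;

        FileRange(int run, int minKey, int maxKey) {
            this.run = run;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /** Groups files into regions whose overall key ranges do not overlap each other. */
    static List<List<FileRange>> divide(List<FileRange> files) {
        List<FileRange> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt((FileRange f) -> f.minKey));

        List<List<FileRange>> regions = new ArrayList<>();
        List<FileRange> current = new ArrayList<>();
        int currentMax = 0; // only meaningful while 'current' is non-empty
        for (FileRange f : sorted) {
            // A gap between the region's largest key and this file's minKey closes the region.
            if (!current.isEmpty() && f.minKey > currentMax) {
                regions.add(current);
                current = new ArrayList<>();
            }
            currentMax = current.isEmpty() ? f.maxKey : Math.max(currentMax, f.maxKey);
            current.add(f);
        }
        if (!current.isEmpty()) {
            regions.add(current);
        }
        return regions;
    }

    /** A region with a single file can be read directly; otherwise a sort-merge is needed. */
    static boolean needsMerge(List<FileRange> region) {
        return region.size() > 1;
    }
}
```

For example, with run 0 holding files `[1, 5]` and `[20, 30]` and run 1 holding `[3, 8]` and `[40, 50]`, this sketch yields three regions: `[1, 5]` and `[3, 8]` overlap and would still need a sort-merge, while `[20, 30]` and `[40, 50]` each stand alone and could be read directly.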


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [flink-table-store] JingsongLi commented on pull request #313: [FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader

Posted by GitBox <gi...@apache.org>.

JingsongLi commented on PR #313:
URL: https://github.com/apache/flink-table-store/pull/313#issuecomment-1275570471

   Hi @zjureel can you do some benchmark to verify the improvement?



[GitHub] [flink-table-store] JingsongLi commented on pull request #313: [FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader

Posted by GitBox <gi...@apache.org>.

JingsongLi commented on PR #313:
URL: https://github.com/apache/flink-table-store/pull/313#issuecomment-1277440357

   +1 to `flink-table-store-micro-benchmarks`



[GitHub] [flink-table-store] zjureel commented on pull request #313: [FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader

Posted by GitBox <gi...@apache.org>.

zjureel commented on PR #313:
URL: https://github.com/apache/flink-table-store/pull/313#issuecomment-1272741579

   Hi @JingsongLi, I tried to fix [FLINK-27958](https://issues.apache.org/jira/browse/FLINK-27958), and the main changes are described above. Could you help review the implementation and code? Thanks.



[GitHub] [flink-table-store] FangYongs closed pull request #313: [FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader

Posted by "FangYongs (via GitHub)" <gi...@apache.org>.

FangYongs closed pull request #313: [FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader
URL: https://github.com/apache/flink-table-store/pull/313



[GitHub] [flink-table-store] JingsongLi commented on pull request #313: [FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader

Posted by GitBox <gi...@apache.org>.

JingsongLi commented on PR #313:
URL: https://github.com/apache/flink-table-store/pull/313#issuecomment-1275570123

   CC: @tsreaper 



[GitHub] [flink-table-store] zjureel commented on pull request #313: [FLINK-27958] Compare batch maxKey to reduce comparisons in SortMergeReader

Posted by GitBox <gi...@apache.org>.

zjureel commented on PR #313:
URL: https://github.com/apache/flink-table-store/pull/313#issuecomment-1276966383

   > Hi @zjureel can you do some benchmark to verify the improvement?
   
   Hi @JingsongLi, that's a good idea and I like it. I found that there is a `flink-table-store-benchmark` project in `flink-table-store` that sets up a Flink cluster, runs a query in the cluster and collects some metrics. I propose adding a new micro-benchmark project, `flink-table-store-micro-benchmarks`, to `flink-table-store`. We can then add micro benchmarks for the core operation steps there, such as the throughput of read, write and compaction. We can create a view for the micro benchmarks, and the `flink-table-store-micro-benchmarks` project would be similar to what `flink-benchmarks` is for Flink. What do you think? Hope to hear from you, thanks.

