Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/01/01 09:15:21 UTC

[GitHub] [incubator-doris] morningman opened a new pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets

morningman opened a new pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets
URL: https://github.com/apache/incubator-doris/pull/2632
 
 
   When a merge read spans multiple rowsets, or one rowset with multiple overlapping segments,
   I introduce a priority queue (a min-heap) for the multi-way merge sort,
   replacing the old O(N^2) algorithm.
   
   This can significantly improve read efficiency when merging a large amount of
   overlapping data.
   
   In my tests:
   1. Compaction with 187 segments dropped from 75 seconds to 42 seconds.
   2. Compaction with 3574 segments took 43 seconds; with the old version, I killed the
   process after waiting more than 10 minutes.
   
   This CL only changes the reads of alpha rowsets. Beta rowsets will be changed in another CL.
   
   ISSUE: #2631 
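
For readers unfamiliar with the technique, the heap-based multi-way merge described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `RunCursor`, `RunCursorGreater`, and `merge_runs` are hypothetical names, and plain sorted `int` vectors stand in for segments. Note that `std::priority_queue` is a max-heap by default, so the comparator is inverted to get min-heap behaviour.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical cursor over one sorted run (a stand-in for a segment reader).
struct RunCursor {
    const std::vector<int>* run;  // the sorted run being read
    std::size_t pos;              // index of the current element
    int current() const { return (*run)[pos]; }
};

// std::priority_queue keeps the "largest" element on top, so comparing with
// '>' makes the cursor with the smallest current value the top element.
struct RunCursorGreater {
    bool operator()(const RunCursor& x, const RunCursor& y) const {
        return x.current() > y.current();
    }
};

// Merges already-sorted runs into one sorted output.
std::vector<int> merge_runs(const std::vector<std::vector<int>>& runs) {
    std::priority_queue<RunCursor, std::vector<RunCursor>, RunCursorGreater> heap;
    for (const auto& run : runs) {
        if (!run.empty()) heap.push(RunCursor{&run, 0});
    }
    std::vector<int> out;
    while (!heap.empty()) {
        RunCursor top = heap.top();
        heap.pop();
        out.push_back(top.current());
        // Re-insert the cursor only if its run still has elements left.
        if (++top.pos < top.run->size()) heap.push(top);
    }
    return out;
}
```

With N runs and R rows in total, each emitted row costs O(log N) heap work, so the merge is O(R log N); picking the minimum by scanning every run for each row costs O(R * N) instead, which is where a large segment count hurts.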

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman merged pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets
URL: https://github.com/apache/incubator-doris/pull/2632
 
 
   



[GitHub] [incubator-doris] imay commented on a change in pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets

Posted by GitBox <gi...@apache.org>.
imay commented on a change in pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets
URL: https://github.com/apache/incubator-doris/pull/2632#discussion_r362320010
 
 

 ##########
 File path: be/src/olap/rowset/alpha_rowset_reader.h
 ##########
 @@ -103,6 +128,9 @@ class AlphaRowsetReader : public RowsetReader {
     RowsetReaderContext* _current_read_context;
     OlapReaderStatistics _owned_stats;
     OlapReaderStatistics* _stats = &_owned_stats;
+
+    // a priority queue for merging rowsets
+    std::priority_queue<RowCursorWithOrdinal, vector<RowCursorWithOrdinal>, RowCursorWithOrdinalComparator> _merge_queue;
 
 Review comment:
   Why not use **RowCursorWithOrdinal** as the element of the queue? Then there is no need to record the ordinal.



[GitHub] [incubator-doris] morningman commented on a change in pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets
URL: https://github.com/apache/incubator-doris/pull/2632#discussion_r362377473
 
 

 ##########
 File path: be/src/olap/rowset/alpha_rowset_reader.h
 ##########
 @@ -103,6 +128,9 @@ class AlphaRowsetReader : public RowsetReader {
     RowsetReaderContext* _current_read_context;
     OlapReaderStatistics _owned_stats;
     OlapReaderStatistics* _stats = &_owned_stats;
+
+    // a priority queue for merging rowsets
+    std::priority_queue<RowCursorWithOrdinal, vector<RowCursorWithOrdinal>, RowCursorWithOrdinalComparator> _merge_queue;
 
 Review comment:
   I removed RowCursorWithOrdinal and now use MergeContext directly.
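
As a rough illustration of queuing the per-reader context directly, the sketch below stores pointers to contexts in the heap, so no wrapper pairing a cursor with an ordinal is needed. Everything here is an assumption for illustration: this `MergeContext` is a hypothetical stand-in with invented fields, not the Doris struct, and `MergeContextGreater`, `MergeQueue`, and `next_row` are made-up names.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in; the real MergeContext in Doris has different fields.
struct MergeContext {
    std::vector<int> rows;  // sorted rows produced by one reader
    std::size_t next = 0;   // index of the current row
    bool has_row() const { return next < rows.size(); }
    int current() const { return rows[next]; }
};

// Orders context pointers so that the smallest current row is on top.
struct MergeContextGreater {
    bool operator()(const MergeContext* x, const MergeContext* y) const {
        return x->current() > y->current();
    }
};

using MergeQueue = std::priority_queue<MergeContext*, std::vector<MergeContext*>,
                                       MergeContextGreater>;

// Writes the smallest pending row to *row and advances its context.
// Returns false once every context is drained.
bool next_row(MergeQueue& queue, int* row) {
    if (queue.empty()) return false;
    MergeContext* ctx = queue.top();
    queue.pop();  // pop before mutating, so the heap order is never stale
    *row = ctx->current();
    ++ctx->next;
    if (ctx->has_row()) queue.push(ctx);
    return true;
}
```

The context is always popped before it is advanced and pushed back afterwards, because changing an element's key while it sits inside the heap would break the ordering.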



[GitHub] [incubator-doris] imay commented on a change in pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets

Posted by GitBox <gi...@apache.org>.
imay commented on a change in pull request #2632: [Rowset Reader] Improve the merge read efficiency of alpha rowsets
URL: https://github.com/apache/incubator-doris/pull/2632#discussion_r362319571
 
 

 ##########
 File path: be/src/olap/rowset/alpha_rowset_reader.cpp
 ##########
 @@ -326,4 +382,8 @@ RowsetSharedPtr AlphaRowsetReader::rowset() {
     return std::static_pointer_cast<Rowset>(_rowset);
 }
 
+bool RowCursorWithOrdinalComparator::operator () (const RowCursorWithOrdinal &x, const RowCursorWithOrdinal &y) const {
 
 Review comment:
   ```suggestion
   bool RowCursorWithOrdinalComparator::operator () (const RowCursorWithOrdinal& x, const RowCursorWithOrdinal& y) const {
   ```
