Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/24 04:29:41 UTC

[GitHub] [doris] englefly opened a new pull request, #10397: limit segment block size no more than 64M

englefly opened a new pull request, #10397:
URL: https://github.com/apache/doris/pull/10397

   # Proposed changes
   SegmentIterator adjusts the number of rows in a batch according to the row size in bytes and max_block_size (64M).
   The default number of rows in a batch is 4096, which is unfriendly to big, wide tables.
   
   With 5 concurrent executions of "insert into big_wide-table select...", peak memory decreased from 44G to 33G at the same speed.
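
The sizing rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `rows_per_batch` and its parameters are hypothetical, and it assumes the per-row byte estimate is already known.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): choose how many rows to read in one
// batch so that rows * row_bytes stays within max_block_bytes.
inline uint32_t rows_per_batch(uint64_t row_bytes, uint32_t default_rows,
                               uint64_t max_block_bytes) {
    assert(default_rows > 0);
    if (row_bytes == 0) {
        // No size information: keep the default batch size.
        return default_rows;
    }
    // Number of rows that fit under the byte cap (may be 0 for huge rows).
    const uint64_t rows_by_size = max_block_bytes / row_bytes;
    // Read at least one row, and never more than the default.
    const uint64_t bounded =
            std::min<uint64_t>(default_rows, std::max<uint64_t>(1, rows_by_size));
    return static_cast<uint32_t>(bounded);
}
```

With a 64 MiB cap and the default of 4096 rows, a 64 KiB row estimate would give 1024 rows per batch, while narrow rows would keep the default of 4096.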
   
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #10397: [hotfix](dev-1.0.1) limit segment block size no more than 64M

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10397:
URL: https://github.com/apache/doris/pull/10397#discussion_r905764746


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -935,13 +937,22 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
                 _current_return_columns[cid]->reserve(_opts.block_row_max);
             }
         }
+        //count non predicate column size
+        for (auto cid : _non_predicate_columns) {
+            return_column_row_length += _schema.column(cid)->length();
+        }
+        // the size of a block: no more than 64M
+        constexpr uint32_t max_block_size = 1024 * 64;

Review Comment:
   64k? or 64 MB?





[GitHub] [doris] englefly commented on a diff in pull request #10397: [hotfix](dev-1.0.1) limit segment block size no more than 64M

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #10397:
URL: https://github.com/apache/doris/pull/10397#discussion_r905815365


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -935,13 +937,22 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
                 _current_return_columns[cid]->reserve(_opts.block_row_max);
             }
         }
+        //count non predicate column size
+        for (auto cid : _non_predicate_columns) {
+            return_column_row_length += _schema.column(cid)->length();
+        }
+        // the size of a block: no more than 64M
+        constexpr uint32_t max_block_size = 1024 * 64;

Review Comment:
   It should be 64M; that was a typo.
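
For reference, the arithmetic behind this exchange can be checked at compile time. The constant names below are hypothetical and chosen only for this illustration; the first expression is the one quoted in the diff.

```cpp
#include <cstdint>

// Expression as written in the quoted diff: 64 KiB.
constexpr uint32_t kBytes64K = 1024 * 64;
// Value matching the "no more than 64M" comment: 64 MiB.
constexpr uint32_t kBytes64M = 64 * 1024 * 1024;

static_assert(kBytes64K == 65536, "1024 * 64 is 64 KiB");
static_assert(kBytes64M == 67108864, "64 * 1024 * 1024 is 64 MiB");
static_assert(kBytes64M / kBytes64K == 1024, "the two differ by a factor of 1024");
```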





[GitHub] [doris] yiguolei commented on a diff in pull request #10397: [hotfix](dev-1.0.1) limit segment block size no more than 64M

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10397:
URL: https://github.com/apache/doris/pull/10397#discussion_r905765702


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -935,13 +937,22 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
                 _current_return_columns[cid]->reserve(_opts.block_row_max);
             }
         }
+        //count non predicate column size

Review Comment:
   The length here is the defined size, not the actual data size. A user may define a column as varchar(1000) but store only a single character in it.





[GitHub] [doris] englefly commented on a diff in pull request #10397: [hotfix](dev-1.0.1) limit segment block size no more than 64M

Posted by GitBox <gi...@apache.org>.
englefly commented on code in PR #10397:
URL: https://github.com/apache/doris/pull/10397#discussion_r905817451


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -935,13 +937,22 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
                 _current_return_columns[cid]->reserve(_opts.block_row_max);
             }
         }
+        //count non predicate column size

Review Comment:
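   We don't have statistics on the actual lengths, so we can only use the value defined in the schema here. In the future, if FE sends down the actual average length from statistics, the batch size can be made more accurate.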
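
The trade-off discussed in this thread can be illustrated with a small sketch. Everything here is hypothetical: the function `rows_for_estimate`, the table shape (100 columns of varchar(1000)), and the assumed 10-byte average are made up for illustration; only the 64M cap and the 4096-row default come from the PR description.

```cpp
#include <cstdint>

constexpr uint64_t kMaxBlockBytes = 64ull * 1024 * 1024;  // 64 MiB cap
constexpr uint32_t kDefaultRows = 4096;                   // default rows per batch

// Hypothetical: rows that fit under the cap for a given per-row byte estimate,
// bounded to the range [1, kDefaultRows].
constexpr uint32_t rows_for_estimate(uint64_t est_row_bytes) {
    return est_row_bytes == 0                            ? kDefaultRows
           : kMaxBlockBytes / est_row_bytes == 0           ? 1u
           : kMaxBlockBytes / est_row_bytes > kDefaultRows ? kDefaultRows
                   : static_cast<uint32_t>(kMaxBlockBytes / est_row_bytes);
}

// Estimate from the schema definition: 100 columns * 1000 bytes each.
constexpr uint32_t kRowsFromSchema = rows_for_estimate(100 * 1000);
// Estimate from an assumed average of 10 bytes actually stored per column.
constexpr uint32_t kRowsFromAverage = rows_for_estimate(100 * 10);
```

Under these assumptions the schema-based estimate shrinks the batch to 671 rows, whereas an estimate based on average stored length would keep the default of 4096. The schema-based value is the conservative choice when no statistics are available.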





[GitHub] [doris] englefly closed pull request #10397: [hotfix](dev-1.0.1) limit segment block size no more than 64M

Posted by GitBox <gi...@apache.org>.
englefly closed pull request #10397: [hotfix](dev-1.0.1) limit segment block size no more than 64M
URL: https://github.com/apache/doris/pull/10397

