You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/27 06:20:56 UTC

[GitHub] [doris] HappenLee opened a new pull request, #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

HappenLee opened a new pull request, #10448:
URL: https://github.com/apache/doris/pull/10448

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   After do the opt of this
   
   unique_table 1000W data load time
   
   before:35s -> after:25s
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] spaces-X commented on pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
spaces-X commented on PR #10448:
URL: https://github.com/apache/doris/pull/10448#issuecomment-1168290266

   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10448:
URL: https://github.com/apache/doris/pull/10448#discussion_r908193811


##########
be/src/olap/memtable.cpp:
##########
@@ -161,11 +179,12 @@ void MemTable::_insert_one_row_from_block(RowInBlock* row_in_block) {
     if (is_exist) {
         _aggregate_two_row_in_block(row_in_block, _vec_hint.curr->key);
     } else {
-        row_in_block->init_agg_places(_agg_functions, _schema->num_key_columns());
+        row_in_block->init_agg_places(
+                (char*)_table_mem_pool->allocate(_total_size_of_aggregate_states),

Review Comment:
   This is suitable for MemPool~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #10448:
URL: https://github.com/apache/doris/pull/10448#discussion_r907043602


##########
be/src/olap/memtable.cpp:
##########
@@ -161,11 +179,12 @@ void MemTable::_insert_one_row_from_block(RowInBlock* row_in_block) {
     if (is_exist) {
         _aggregate_two_row_in_block(row_in_block, _vec_hint.curr->key);
     } else {
-        row_in_block->init_agg_places(_agg_functions, _schema->num_key_columns());
+        row_in_block->init_agg_places(
+                (char*)_table_mem_pool->allocate(_total_size_of_aggregate_states),

Review Comment:
   How much is _total_size_of_aggregate_states usually, thks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] spaces-X commented on a diff in pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #10448:
URL: https://github.com/apache/doris/pull/10448#discussion_r908074976


##########
be/src/olap/memtable.cpp:
##########
@@ -161,11 +179,12 @@ void MemTable::_insert_one_row_from_block(RowInBlock* row_in_block) {
     if (is_exist) {
         _aggregate_two_row_in_block(row_in_block, _vec_hint.curr->key);
     } else {
-        row_in_block->init_agg_places(_agg_functions, _schema->num_key_columns());
+        row_in_block->init_agg_places(
+                (char*)_table_mem_pool->allocate(_total_size_of_aggregate_states),

Review Comment:
   I think `_total_size_of_aggregate_states` will not be large, depending on the number and type of value columns. Each value column needs a buffer to store the agg result, which is independent of the number of lines.
   
   Usually the number of value column will not be very large, may be less than 10. And the size and alignment length of basic data types are not large.
   
   However, the cumulative value of memory allocated by mem pool will be relatively large.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on PR #10448:
URL: https://github.com/apache/doris/pull/10448#issuecomment-1168405577

   LGTM @HappenLee  rebase for p0 and UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] k-i-d-d commented on a diff in pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
k-i-d-d commented on code in PR #10448:
URL: https://github.com/apache/doris/pull/10448#discussion_r907042793


##########
be/src/olap/memtable.cpp:
##########
@@ -161,11 +179,12 @@ void MemTable::_insert_one_row_from_block(RowInBlock* row_in_block) {
     if (is_exist) {
         _aggregate_two_row_in_block(row_in_block, _vec_hint.curr->key);
     } else {
-        row_in_block->init_agg_places(_agg_functions, _schema->num_key_columns());
+        row_in_block->init_agg_places(
+                (char*)_table_mem_pool->allocate(_total_size_of_aggregate_states),

Review Comment:
   How much is `_total_size_of_aggregate_states` usually, thks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10448:
URL: https://github.com/apache/doris/pull/10448#issuecomment-1179660273

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10448:
URL: https://github.com/apache/doris/pull/10448#issuecomment-1179660279

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt merged pull request #10448: [Load][Vectorized] opt the mem use of aggregate function in load to speed up

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt merged PR #10448:
URL: https://github.com/apache/doris/pull/10448


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org