Posted to notifications@shardingsphere.apache.org by GitBox <gi...@apache.org> on 2021/04/22 02:54:36 UTC

[GitHub] [shardingsphere] cherrylzhao opened a new issue #10150: reduce memory used of GroupByStreamMergeResult

cherrylzhao opened a new issue #10150:
URL: https://github.com/apache/shardingsphere/issues/10150


   
   ### current logic
   
   Suppose we have a t_order table whose 3 records are distributed across 3 data sources as follows:
   ```
   +--------------------+---------+--------+
   | order_id           | user_id | status |
   +--------------------+---------+--------+
   | 591613421079212032 |   jerry | init   |  -> ds0
   | 591652652161937408 |   jerry | init   |  -> ds1
   | 591652696403456001 |   jerry | init   |  -> ds2
   +--------------------+---------+--------+
   ```
   
   The query SQL is:
   ```
   select user_id, count(user_id) from t_order group by user_id;
   ```
   
   Inside ShardingSphere, the rewritten SQL is sent to each backend data source for execution as follows:
   ```
   +-----------------+-------------------------------------------------------------------------------------+
   | datasource_name | sql                                                                                 |
   +-----------------+-------------------------------------------------------------------------------------+
   | ds_0            | select user_id,count(user_id) from t_order group by user_id ORDER BY user_id ASC    |
   | ds_1            | select user_id,count(user_id) from t_order group by user_id ORDER BY user_id ASC    |
   | ds_2            | select user_id,count(user_id) from t_order group by user_id ORDER BY user_id ASC    |
   +-----------------+-------------------------------------------------------------------------------------+
   ```
   The 3 query results are then loaded into GroupByStreamMergeResult.
   When MergeResult.next() is invoked, the core data flow looks like this:
   
   ```
   +-------------------------+
   |  currentRow             |
   +-------------------------+
   [jerry, 1]                           -> count:1
   [jerry, 1, jerry, 1]                 -> count:2
   [jerry, 1, jerry, 1, jerry, 1]       -> count:3
   [jerry, 3, jerry, 1, jerry, 1]       -> write count to currentRow
   ```
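
To make the flow above concrete, here is a minimal Java sketch of the appending behaviour. It is not the actual ShardingSphere code: the class `AppendingMergeSketch`, the method `mergeGroup` and the `Object[]` row representation are hypothetical stand-ins, and a plain running sum plays the role of the aggregation unit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not ShardingSphere source: models the flow described above.
class AppendingMergeSketch {

    // Merges the rows of ONE group (all rows share the same group-by value).
    // countIndex is the position of the count(...) column within a row.
    static List<Object> mergeGroup(List<Object[]> sortedRows, int countIndex) {
        List<Object> currentRow = new ArrayList<>();
        long count = 0;
        for (Object[] row : sortedRows) {
            // The aggregate is accumulated separately from currentRow.
            count += ((Number) row[countIndex]).longValue();
            // Every row of the group is appended, so currentRow keeps growing.
            currentRow.addAll(Arrays.asList(row));
        }
        if (!currentRow.isEmpty()) {
            // Only the cells of the first row are ever overwritten with the result.
            currentRow.set(countIndex, count);
        }
        return currentRow;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[]{"jerry", 1L},   // from ds0
                new Object[]{"jerry", 1L},   // from ds1
                new Object[]{"jerry", 1L});  // from ds2
        System.out.println(mergeGroup(rows, 1)); // prints [jerry, 3, jerry, 1, jerry, 1]
    }
}
```

In this sketch the cached row holds (columns × rows in the group) cells, although only the first `columns` cells are read afterwards.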
   
   ### optimization point
   
   The key point is that the count value is computed by `AggregationUnit` during iteration,
   and currentRow is only updated, at the parsed aggregation projection index, once the `group by` value changes.
   So we only need to cache the first row of each group in currentRow to **reduce memory usage**, like this:
   
   ```
   +-------------------------+
   |  currentRow             |
   +-------------------------+
   [jerry, 1]                   -> count:1
   [jerry, 1]                   -> count:2
   [jerry, 1]                   -> count:3
   [jerry, 3]                   -> write count to currentRow
   ```
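
A minimal Java sketch of the proposed behaviour, under the same caveats: `FirstRowCacheSketch`, `mergeGroup` and the `Object[]` row representation are hypothetical names used for illustration only, and a running sum stands in for the aggregation unit. It is one possible shape for the change, not the patch itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not ShardingSphere source: caches only the first row of a group.
class FirstRowCacheSketch {

    // Merges the rows of ONE group (all rows share the same group-by value).
    // countIndex is the position of the count(...) column within a row.
    static List<Object> mergeGroup(List<Object[]> sortedRows, int countIndex) {
        List<Object> currentRow = new ArrayList<>();
        long count = 0;
        for (Object[] row : sortedRows) {
            // The aggregate still sees every row of the group.
            count += ((Number) row[countIndex]).longValue();
            // Cache the row only once per group; later rows are not stored.
            if (currentRow.isEmpty()) {
                currentRow.addAll(Arrays.asList(row));
            }
        }
        if (!currentRow.isEmpty()) {
            // Write the aggregated value back at the aggregation column index.
            currentRow.set(countIndex, count);
        }
        return currentRow;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[]{"jerry", 1L},   // from ds0
                new Object[]{"jerry", 1L},   // from ds1
                new Object[]{"jerry", 1L});  // from ds2
        System.out.println(mergeGroup(rows, 1)); // prints [jerry, 3]
    }
}
```

Here the cached row stays at one row's worth of cells no matter how many rows the group contains, while the aggregated count is unchanged.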
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [shardingsphere] cherrylzhao closed issue #10150: reduce memory used of GroupByStreamMergeResult

Posted by GitBox <gi...@apache.org>.
cherrylzhao closed issue #10150:
URL: https://github.com/apache/shardingsphere/issues/10150


   

