Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/13 08:58:17 UTC

[GitHub] [incubator-doris] vagetablechicken opened a new issue #3307: UniqueRowsetIdGenerator may use a lot of memory

vagetablechicken opened a new issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307
 
 
   **Describe the bug**
   We tracked the size of valid_rowset_ids, and it can grow to 1G (about one billion) entries or more. Each RowsetId costs 25 B, so the total memory usage may exceed 25 GB.
   This wastes memory; we can use a bitmap instead to save memory.
   I'm working on it. Any advice?
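   A quick check of the arithmetic above, as a sketch: it models a RowsetId as a 1-byte version plus three 64-bit words (hi, mi, lo). That layout is an assumption for illustration, not copied from the Doris headers, and `RowsetIdModel`, `kPayloadBytes` and `estimate_bytes` are hypothetical names.

   ```cpp
   #include <cstddef>
   #include <cstdint>

   // Hypothetical model of a rowset id (assumed layout, for illustration only):
   // a 1-byte version followed by three 64-bit words.
   struct RowsetIdModel {
       std::int8_t version = 0;
       std::int64_t hi = 0;  // carries inc_id
       std::int64_t mi = 0;  // backend uid, high half
       std::int64_t lo = 0;  // backend uid, low half
   };

   // Bytes of real data per id, ignoring alignment padding: 1 + 3 * 8 = 25.
   constexpr std::size_t kPayloadBytes = sizeof(std::int8_t) + 3 * sizeof(std::int64_t);

   // Payload-only estimate for `count` ids.
   constexpr std::uint64_t estimate_bytes(std::uint64_t count) {
       return count * kPayloadBytes;
   }

   static_assert(kPayloadBytes == 25, "1 + 3 * 8 bytes");
   static_assert(sizeof(RowsetIdModel) >= kPayloadBytes, "padding can only add bytes");
   ```

   With one billion ids, `estimate_bytes(1000000000)` is 25,000,000,000 bytes, i.e. the 25 GB above. Because of alignment padding, `sizeof(RowsetIdModel)` is typically 32 rather than 25 on common 64-bit platforms, and a container adds its own per-element overhead, so 25 GB is closer to a lower bound.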
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] chaoyli commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory

Posted by GitBox <gi...@apache.org>.
chaoyli commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307#issuecomment-613256415
 
 
   I think storing only inc_id is fine; _version and _backend_uid only need to be stored once.
   I don't understand what std::map<int, bitset<SIZE>> bitsets is for.
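   A minimal sketch of this suggestion, assuming version and backend uid stay fixed for the life of the generator: keep them in one place, track only the live inc_ids, and rebuild a full id on demand. `CompactRowsetIdSet` and `FullRowsetId` are hypothetical names, not Doris APIs.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <unordered_set>

   // Hypothetical full id, rebuilt on demand from the shared fields.
   struct FullRowsetId {
       std::int8_t version;
       std::int64_t inc_id;
       std::int64_t backend_hi;
       std::int64_t backend_lo;
   };

   // Stores the never-changing fields once, plus one int64 per live id.
   class CompactRowsetIdSet {
   public:
       CompactRowsetIdSet(std::int8_t version, std::int64_t backend_hi, std::int64_t backend_lo)
           : _version(version), _backend_hi(backend_hi), _backend_lo(backend_lo) {}

       void add(std::int64_t inc_id) { _inc_ids.insert(inc_id); }
       void release(std::int64_t inc_id) { _inc_ids.erase(inc_id); }
       bool contains(std::int64_t inc_id) const { return _inc_ids.count(inc_id) != 0; }
       std::size_t size() const { return _inc_ids.size(); }

       // Rebuild the full id; only live ids may be expanded.
       FullRowsetId expand(std::int64_t inc_id) const {
           assert(contains(inc_id) && "inc_id is not live");
           return FullRowsetId{_version, inc_id, _backend_hi, _backend_lo};
       }

   private:
       std::int8_t _version;
       std::int64_t _backend_hi;
       std::int64_t _backend_lo;
       std::unordered_set<std::int64_t> _inc_ids;  // the only per-id state
   };
   ```

   The design choice is simply to factor the invariant fields out of every element; whether the set holds inc_id or the packed rowset.hi does not change the idea.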


[GitHub] [incubator-doris] vagetablechicken edited a comment on issue #3307: UniqueRowsetIdGenerator may use a lot of memory

vagetablechicken edited a comment on issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307#issuecomment-613198701
 
 
   https://github.com/apache/incubator-doris/blob/cc31bf9cf96ec0bef2ce55553b247df7bbe598e3/be/src/olap/rowset/unique_rowset_id_generator.cpp#L31
   rowset_id is composed of version, inc_id, be_uid hi and be_uid lo. The version and be_uid never change, **so each inc_id (or rowset.hi) corresponds to exactly one RowsetId.**
   We can therefore use a bitmap to store inc_id (or rowset.hi).
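   To illustrate why one inc_id (or rowset.hi) is enough, here is a sketch of one plausible packing: the version in the top 8 bits of `hi` and inc_id in the low 56 bits. The 8/56 split and the helper names `pack_hi`, `version_of` and `inc_id_of` are assumptions for illustration; check the RowsetId definition and `unique_rowset_id_generator.cpp` for the real layout.

   ```cpp
   #include <cstdint>

   // Assumed layout of `hi`: top 8 bits = version, low 56 bits = inc_id.
   constexpr int kIncIdBits = 56;
   constexpr std::uint64_t kLow56Mask = (std::uint64_t{1} << kIncIdBits) - 1;

   // Combine version and inc_id into one word (unsigned keeps the shifts well defined).
   constexpr std::uint64_t pack_hi(std::uint8_t version, std::uint64_t inc_id) {
       return (static_cast<std::uint64_t>(version) << kIncIdBits) | (inc_id & kLow56Mask);
   }

   constexpr std::uint8_t version_of(std::uint64_t hi) {
       return static_cast<std::uint8_t>(hi >> kIncIdBits);
   }

   constexpr std::uint64_t inc_id_of(std::uint64_t hi) {
       return hi & kLow56Mask;
   }

   // With version fixed, distinct inc_ids below 2^56 give distinct hi values.
   static_assert(inc_id_of(pack_hi(2, 12345)) == 12345, "round trip");
   static_assert(pack_hi(2, 1) != pack_hi(2, 2), "one hi per inc_id");
   ```

   Since be_uid (mi/lo in this model) never changes either, a set or bitmap keyed by inc_id loses no information about the full id.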
   


[GitHub] [incubator-doris] vagetablechicken commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory

vagetablechicken commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307#issuecomment-613198701
 
 
   https://github.com/apache/incubator-doris/blob/cc31bf9cf96ec0bef2ce55553b247df7bbe598e3/be/src/olap/rowset/unique_rowset_id_generator.cpp#L31
   rowset_id is composed of version, inc_id, be_uid hi and be_uid lo. The version and be_uid never change, **so each inc_id corresponds to exactly one RowsetId.**
   We can therefore use a bitmap to store inc_id.
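   The most direct form of "a bitmap of inc_id" is one bit per id, indexed by inc_id itself. The sketch below uses `std::vector<bool>`, which is typically bit-packed; `FlatIdBitmap` is a hypothetical name. Its size follows the largest inc_id ever added, not the number of live ids.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // One bit per inc_id: bit i is set while the rowset with inc_id i is live.
   class FlatIdBitmap {
   public:
       void add(std::uint64_t inc_id) {
           if (inc_id >= _bits.size()) {
               _bits.resize(inc_id + 1, false);  // grows to cover the largest id seen
           }
           _bits[inc_id] = true;
       }

       void release(std::uint64_t inc_id) {
           assert(contains(inc_id) && "releasing an id that is not live");
           if (inc_id < _bits.size()) {
               _bits[inc_id] = false;  // the bit stays allocated after release
           }
       }

       bool contains(std::uint64_t inc_id) const {
           return inc_id < _bits.size() && _bits[inc_id];
       }

       // Bits allocated, however many ids are still live.
       std::size_t capacity_bits() const { return _bits.size(); }

   private:
       std::vector<bool> _bits;
   };
   ```

   After `add(3)`, `add(10)` and `release(3)`, only id 10 is live but 11 bits remain allocated; released ids keep their bits, which is the waste discussed later in this thread.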
   


[GitHub] [incubator-doris] vagetablechicken commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory

vagetablechicken commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307#issuecomment-613275090
 
 
   > I think you store only inc_id is OK, and store _version and _backend_uid once is enough.
   > I am not understand that std::map<int, bitset<SIZE>> bitsets is for what?
   
   It was meant to minimize memory usage, but it has a bad worst case. So forget about it.
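   The worst case can be sketched with a little arithmetic, assuming a chunk width of 4096 bits (an arbitrary stand-in for SIZE): a map of bitsets keeps a whole chunk alive while any one of its bits is set, so live ids spread one per chunk pay for a full chunk each. The helper names are hypothetical, and map node overhead is left out of every figure.

   ```cpp
   #include <cstdint>

   // Assumed chunk width in bits (stand-in for SIZE; 4096 is arbitrary).
   constexpr std::uint64_t kChunkBits = 4096;

   // Best case: n live ids packed into as few chunks as possible.
   constexpr std::uint64_t best_case_bytes(std::uint64_t n) {
       return (n + kChunkBits - 1) / kChunkBits * kChunkBits / 8;
   }

   // Worst case: every live id sits alone in its own chunk,
   // so each one keeps a whole chunk allocated.
   constexpr std::uint64_t worst_case_bytes(std::uint64_t n) {
       return n * kChunkBits / 8;
   }

   // For comparison: a plain set of 8-byte inc_ids, payload only.
   constexpr std::uint64_t plain_set_bytes(std::uint64_t n) {
       return n * 8;
   }
   ```

   For 100 live ids this gives 512 bytes in the best case but 51,200 bytes in the worst case, against 800 bytes of payload for a plain set of inc_ids.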


[GitHub] [incubator-doris] vagetablechicken commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory

vagetablechicken commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307#issuecomment-613222420
 
 
   If you are worried about memory wasted on unused rowset ids: e.g. once we release a rowset_id (_inc_id == 1), we never use it again, but its bit still occupies space in the bitset.
   ### To conserve memory
   We can use a map of bitsets:
   ```
   std::map<int64_t, std::bitset<SIZE>> bitsets;  // key: id / SIZE
   ```
   Then `bitsets[id/SIZE][id%SIZE]` is the bit for that rowset_id.
   If a bitset becomes all zeros, we can delete it (this may require a GC task).
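   A minimal, self-contained sketch of this map-of-bitsets idea. `ChunkedIdBitmap` is a hypothetical name, the chunk width of 4096 bits stands in for SIZE and is arbitrary, and ids are taken as unsigned 64-bit values.

   ```cpp
   #include <bitset>
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <map>

   // Chunk width in bits (stand-in for SIZE; 4096 is an arbitrary choice).
   constexpr std::size_t kChunkBits = 4096;

   // Sparse bitmap over ids: chunk key = id / kChunkBits, bit = id % kChunkBits.
   class ChunkedIdBitmap {
   public:
       void add(std::uint64_t id) {
           _chunks[id / kChunkBits].set(id % kChunkBits);  // creates the chunk if absent
       }

       void release(std::uint64_t id) {
           auto it = _chunks.find(id / kChunkBits);
           assert(it != _chunks.end() && "releasing an id that was never added");
           if (it == _chunks.end()) return;
           it->second.reset(id % kChunkBits);
           if (it->second.none()) {
               _chunks.erase(it);  // drop the chunk once all its bits are zero
           }
       }

       bool contains(std::uint64_t id) const {
           auto it = _chunks.find(id / kChunkBits);
           return it != _chunks.end() && it->second.test(id % kChunkBits);
       }

       std::size_t chunk_count() const { return _chunks.size(); }

   private:
       std::map<std::uint64_t, std::bitset<kChunkBits>> _chunks;
   };
   ```

   This sketch erases an all-zero chunk immediately in `release()`. The comment above suggests a separate GC task instead, which would avoid repeatedly freeing and re-creating a chunk when ids near a chunk boundary come and go.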
   
   ### For cleaner code
   Just use a set of inc_id to store valid rowsets. This saves 17/25 of the memory per id (an 8-byte inc_id instead of a 25-byte RowsetId), which is more than 2/3.
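   The 17/25 figure follows from per-id payload sizes: 25 bytes for a full RowsetId (the estimate from the issue description) versus 8 bytes for a 64-bit inc_id. The compile-time checks below restate that arithmetic; they cover payload only, since node-based containers such as `std::set` add their own per-element overhead.

   ```cpp
   #include <cstddef>
   #include <cstdint>

   // Per-id payload in bytes: full RowsetId (estimate from this issue) vs. inc_id alone.
   constexpr std::size_t kFullIdBytes = 25;                         // 1 + 8 + 8 + 8
   constexpr std::size_t kIncIdBytes = sizeof(std::int64_t);        // 8
   constexpr std::size_t kSavedBytes = kFullIdBytes - kIncIdBytes;  // 17

   // 17/25 > 2/3 is checked as 17 * 3 > 2 * 25 (51 > 50), avoiding floating point.
   static_assert(kSavedBytes * 3 > 2 * kFullIdBytes, "saving exceeds two thirds");
   ```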
   


[GitHub] [incubator-doris] chaoyli commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory

chaoyli commented on issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307#issuecomment-612850706
 
 
   How would you use a bitmap to store rowset_ids?


[GitHub] [incubator-doris] vagetablechicken edited a comment on issue #3307: UniqueRowsetIdGenerator may use a lot of memory

vagetablechicken edited a comment on issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307#issuecomment-613222420
 
 
   If you are worried about memory wasted on unused rowset ids: e.g. once we release a rowset_id (_inc_id == 1), we never use it again, but its bit still occupies space in the bitset.
   ### To conserve memory
   We can use a map of bitsets:
   ```
   std::map<int64_t, std::bitset<SIZE>> bitsets;  // key: id / SIZE
   ```
   Then `bitsets[id/SIZE][id%SIZE]` is the bit for that rowset_id.
   If a bitset becomes all zeros, we can delete it (this may require a GC task).
   
   ### For cleaner code
   Just use a set of inc_id (or rowset.hi) to store valid rowsets. This saves 17/25 of the memory per id (an 8-byte value instead of a 25-byte RowsetId), which is more than 2/3.
   


[GitHub] [incubator-doris] vagetablechicken edited a comment on issue #3307: UniqueRowsetIdGenerator may use a lot of memory

vagetablechicken edited a comment on issue #3307: UniqueRowsetIdGenerator may use a lot of memory
URL: https://github.com/apache/incubator-doris/issues/3307#issuecomment-613198701
 
 
   https://github.com/apache/incubator-doris/blob/cc31bf9cf96ec0bef2ce55553b247df7bbe598e3/be/src/olap/rowset/unique_rowset_id_generator.cpp#L31
   rowset_id is composed of version, inc_id, be_uid hi and be_uid lo. The version and be_uid never change, **so each inc_id (or rowset.hi) corresponds to exactly one RowsetId.**
   We can therefore use a bitmap to store inc_id.
   


[GitHub] [incubator-doris] vagetablechicken closed issue #3307: UniqueRowsetIdGenerator may use a lot of memory

vagetablechicken closed issue #3307:
URL: https://github.com/apache/incubator-doris/issues/3307


   

