Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/17 01:38:13 UTC

[GitHub] [incubator-doris] ZhangYu0123 opened a new pull request #4366: Optimise coding bit operation in BE

ZhangYu0123 opened a new pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366


   ## Proposed changes
   
   Optimise the bit operations in variable-length coding by removing an unnecessary bit operation.
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [x] Bugfix (non-breaking change which fixes an issue)
   - [] New feature (non-breaking change which adds functionality)
   - [] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [] Documentation Update (if none of the other choices apply)
   - [] Code refactor (Modify the code structure, format the code, etc...)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [x] I have created an issue on (Fix #ISSUE), and have described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [] I have added tests that prove my fix is effective or that my feature works
   - [] If this change needs a document change, I have updated the document
   - [] Any dependent changes have been merged
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] imay commented on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
imay commented on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675370532


   > > Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?
   > 
   > On my development machine, with encode_varint64 executed 1 billion times, the `v | B` version takes 95095ms on average over 5 runs.
   > The `(v & (B - 1))` version takes 96103ms on average over 5 runs, so the change improves performance by about 0.5% ~ 1%. encode_varint64 is called frequently in many cases, such as bitmap_value and page_pointer encoding.
   
   Taking about 95s to execute 5 billion encode operations seems too slow.




[GitHub] [incubator-doris] yangzhg commented on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
yangzhg commented on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675954087


   @imay From an implementation point of view, our implementation was presumably copied from leveldb, but leveldb has since removed some unnecessary bit operations in this commit https://github.com/google/leveldb/commit/cf1b5f473259e46c667f3fb5a28bcd884ee3a102#diff-0f30ecb01c7888631c52ec595204f7f0 , so I think we can also change to the same implementation as leveldb, although the performance improvement from this change is not particularly large.




[GitHub] [incubator-doris] morningman merged pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366


   




[GitHub] [incubator-doris] ZhangYu0123 edited a comment on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 edited a comment on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675244849


   > Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?
   
   On my development machine, with encode_varint64 executed 1 billion times, the `v | B` version takes 95095ms on average over 5 runs.
   The `(v & (B - 1))` version takes 96103ms on average over 5 runs, so the change improves performance by about 0.5% ~ 1%. encode_varint64 is called frequently in many cases, such as bitmap_value and page_pointer encoding.
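
For scale, 95095ms for 1 billion calls works out to roughly 95ns per call. The following is a hypothetical sketch of how such a measurement could be set up with `std::chrono`; it is not the benchmark that produced the numbers above. The helper name `time_encode_ms`, the input pattern (consecutive integers), and the final-byte write after the loop are all assumptions, and results will depend heavily on the compiler optimisation level and the input distribution.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// The `v | B` variant from the diff. The two lines after the loop are
// assumed here, because the diff hunk shows only the loop.
inline uint8_t* encode_varint64(uint8_t* dst, uint64_t v) {
    static const unsigned int B = 128;
    while (v >= B) {
        // Narrowing to uint8_t keeps only the low eight bits.
        *(dst++) = static_cast<uint8_t>(v | B);
        v >>= 7;
    }
    *(dst++) = static_cast<uint8_t>(v);
    return dst;
}

// Hypothetical helper: encodes 0 .. iterations-1 and returns the elapsed
// wall-clock time in milliseconds. *sink receives a checksum so that the
// compiler cannot discard the loop as dead code.
inline double time_encode_ms(uint64_t iterations, uint64_t* sink) {
    assert(sink != nullptr);
    uint8_t buf[10];  // a 64-bit varint needs at most 10 bytes
    uint64_t total = 0;
    const auto start = std::chrono::steady_clock::now();
    for (uint64_t i = 0; i < iterations; ++i) {
        uint8_t* end = encode_varint64(buf, i);
        total += static_cast<uint64_t>(end - buf) + buf[0];
    }
    const auto stop = std::chrono::steady_clock::now();
    *sink = total;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_encode_ms(1000000000, &sink)` once per variant and averaging several runs would mirror the setup described in the comment.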
   
   




[GitHub] [incubator-doris] ZhangYu0123 commented on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 commented on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675405126


   > > > Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?
   > > 
   > > 
   > > On my development machine, with encode_varint64 executed 1 billion times, the `v | B` version takes 95095ms on average over 5 runs.
   > > The `(v & (B - 1))` version takes 96103ms on average over 5 runs, so the change improves performance by about 0.5% ~ 1%. encode_varint64 is called frequently in many cases, such as bitmap_value and page_pointer encoding.
   > 
   > Taking about 95s to execute 5 billion encode operations seems too slow.
   
   5 billion operations cost 95s * 5. Compression is time-consuming. encode_varint64 is mainly used to compress small integers into a variable-length encoding instead of storing a full int64_t. It is a trade-off between time and space.
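
To illustrate the space side of this trade-off, here is a small sketch that counts how many bytes a base-128 varint occupies for a given value. The helper name `varint64_length` is hypothetical and is not taken from the code under review; it only assumes the 7-payload-bits-per-byte layout used by the loop in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bytes a base-128 varint needs for v.
// Each output byte carries 7 payload bits plus one continuation bit.
inline int varint64_length(uint64_t v) {
    int len = 1;
    while (v >= 128) {
        v >>= 7;
        ++len;
    }
    assert(len <= 10);  // 64 bits / 7 bits per byte, rounded up
    return len;
}
```

Values below 128 take one byte and values below 2^56 take at most eight, so small integers are stored in less space than a fixed 8-byte int64_t, while values of 2^56 and above take nine or ten bytes.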




[GitHub] [incubator-doris] ZhangYu0123 edited a comment on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 edited a comment on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675244849


   > Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?
   
   On my development machine, with encode_varint64 executed 1 billion times, the `v | B` version takes 95095ms on average over 5 runs.
   The `(v & (B - 1))` version takes 96103ms on average over 5 runs, so the change improves performance by about 0.5% ~ 1%. encode_varint64 is called frequently in many cases, such as bitmap_value and page_pointer encoding.
   
   




[GitHub] [incubator-doris] ZhangYu0123 commented on a change in pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 commented on a change in pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#discussion_r471867279



##########
File path: be/src/util/coding.h
##########
@@ -143,7 +143,7 @@ extern uint8_t* encode_varint64(uint8_t* dst, uint64_t value);
 inline uint8_t* encode_varint64(uint8_t* dst, uint64_t v) {
     static const unsigned int B = 128;
     while (v >= B) {
-        *(dst++) = (v & (B - 1)) | B;
+        *(dst++) = v | B;

Review comment:
           v >>= 7;
   
   In each loop iteration, the low seven bits of v are written to dst and v is shifted right by seven bits; the eighth bit marks whether more bytes of the number follow. Only the low eight bits are assigned to *dst, which is a uint8_t.
   
   In the low eight bits, (v | B) & (11111111) = 11111111.
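
The argument above can be checked with a minimal sketch that runs both forms of the loop side by side. This is not the Doris or leveldb source: the names `encode_varint64_masked`, `encode_varint64_unmasked` and `variants_agree` are hypothetical, the explicit `static_cast` is added only to make the narrowing visible, and the final-byte write after the loop is assumed because the diff hunk shows only the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Loop body as it was before the change: mask to seven bits, then set bit 7.
inline uint8_t* encode_varint64_masked(uint8_t* dst, uint64_t v) {
    static const unsigned int B = 128;
    while (v >= B) {
        *(dst++) = static_cast<uint8_t>((v & (B - 1)) | B);
        v >>= 7;
    }
    *(dst++) = static_cast<uint8_t>(v);  // assumed: not shown in the diff
    return dst;
}

// Loop body after the change: no mask, the narrowing drops the high bits.
inline uint8_t* encode_varint64_unmasked(uint8_t* dst, uint64_t v) {
    static const unsigned int B = 128;
    while (v >= B) {
        *(dst++) = static_cast<uint8_t>(v | B);
        v >>= 7;
    }
    *(dst++) = static_cast<uint8_t>(v);  // assumed: not shown in the diff
    return dst;
}

// Returns true if both variants emit exactly the same bytes for v.
inline bool variants_agree(uint64_t v) {
    uint8_t a[10] = {0};
    uint8_t b[10] = {0};
    const std::size_t len_a =
        static_cast<std::size_t>(encode_varint64_masked(a, v) - a);
    const std::size_t len_b =
        static_cast<std::size_t>(encode_varint64_unmasked(b, v) - b);
    assert(len_a <= sizeof(a) && len_b <= sizeof(b));
    if (len_a != len_b) return false;
    for (std::size_t i = 0; i < len_a; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}
```

Because conversion to `uint8_t` is defined as reduction modulo 2^8, the bits above bit 7 that `v | B` leaves in place never reach the output byte.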






[GitHub] [incubator-doris] yangzhg edited a comment on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
yangzhg edited a comment on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675954087


   @imay From an implementation point of view, our implementation was presumably copied from leveldb, but leveldb has since removed some unnecessary bit operations in this commit https://github.com/google/leveldb/commit/cf1b5f473259e46c667f3fb5a28bcd884ee3a102#diff-0f30ecb01c7888631c52ec595204f7f0 , so I think we can accept this PR and change to the same implementation as leveldb, although the performance improvement from this change is not particularly large.




[GitHub] [incubator-doris] sduzh edited a comment on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
sduzh edited a comment on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675225333


    Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?




[GitHub] [incubator-doris] ZhangYu0123 commented on a change in pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 commented on a change in pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#discussion_r471867630



##########
File path: be/src/util/coding.h
##########
@@ -143,7 +143,7 @@ extern uint8_t* encode_varint64(uint8_t* dst, uint64_t value);
 inline uint8_t* encode_varint64(uint8_t* dst, uint64_t v) {
     static const unsigned int B = 128;
     while (v >= B) {
-        *(dst++) = (v & (B - 1)) | B;
+        *(dst++) = v | B;

Review comment:
       ![image](https://user-images.githubusercontent.com/67053339/90461205-42999100-e138-11ea-980f-5f6530b88ccf.png)
   
   This is the implementation in leveldb.






[GitHub] [incubator-doris] ZhangYu0123 edited a comment on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 edited a comment on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675244849


   > Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?
   
   On my development machine, with encode_varint64 executed 1 billion times, the `v | B` version takes 95095ms on average over 5 runs.
   The `(v & (B - 1))` version takes 96103ms on average over 5 runs, so the change improves performance by about 0.5% ~ 1%. encode_varint64 is called frequently in many cases, such as bitmap_value and page_pointer encoding.
   
   




[GitHub] [incubator-doris] imay commented on a change in pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
imay commented on a change in pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#discussion_r471368389



##########
File path: be/src/util/coding.h
##########
@@ -143,7 +143,7 @@ extern uint8_t* encode_varint64(uint8_t* dst, uint64_t value);
 inline uint8_t* encode_varint64(uint8_t* dst, uint64_t v) {
     static const unsigned int B = 128;
     while (v >= B) {
-        *(dst++) = (v & (B - 1)) | B;
+        *(dst++) = v | B;

Review comment:
       (v | B) & (11111111) != v | B
   
   If v = 1011111111
   (v | B) = 1011111111
   (v | B) &  (11111111) =  11111111
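
The example above can be checked directly. In this sketch, which uses hypothetical helper names (`masked`, `unmasked`, `stored`, `check_sweep`), the two expressions differ as 64-bit values for v = 1011111111 (0x2FF), but agree once the result is narrowed to the `uint8_t` that `*dst` holds.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned int B = 128;

// Full-width value of the masked expression: always fits in eight bits.
constexpr uint64_t masked(uint64_t v) { return (v & (B - 1)) | B; }

// Full-width value of the unmasked expression: may keep bits above bit 7.
constexpr uint64_t unmasked(uint64_t v) { return v | B; }

// What is actually written through a uint8_t*: the value modulo 2^8.
constexpr uint8_t stored(uint64_t x) { return static_cast<uint8_t>(x); }

// v = 1011111111 in binary is 0x2FF.
static_assert(masked(0x2FF) == 0xFF, "masked form yields 11111111");
static_assert(unmasked(0x2FF) == 0x2FF, "unmasked form keeps the high bits");
static_assert(stored(unmasked(0x2FF)) == stored(masked(0x2FF)),
              "both forms store the same byte");

// Runtime sweep over every v in [128, 65536): the stored byte always matches.
inline void check_sweep() {
    for (uint64_t v = B; v < 65536; ++v) {
        assert(stored(masked(v)) == stored(unmasked(v)));
    }
}
```

So the inequality holds for the full-width expressions, while the byte that ends up in the output buffer is the same in both cases.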
   






[GitHub] [incubator-doris] ZhangYu0123 commented on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 commented on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675244849


   > Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?
   
   On my development machine, with encode_varint64 executed 1 billion times, the `v | B` version takes 95095ms on average over 5 runs.
   The `(v & (B - 1))` version takes 96103ms on average over 5 runs, so the change improves performance by about 0.5% ~ 1%. encode_varint64 is called frequently in many cases, such as bitmap_value and page_pointer encoding.
   
   




[GitHub] [incubator-doris] ZhangYu0123 commented on a change in pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 commented on a change in pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#discussion_r471195742



##########
File path: be/src/util/coding.h
##########
@@ -143,7 +143,7 @@ extern uint8_t* encode_varint64(uint8_t* dst, uint64_t value);
 inline uint8_t* encode_varint64(uint8_t* dst, uint64_t v) {
     static const unsigned int B = 128;
     while (v >= B) {
-        *(dst++) = (v & (B - 1)) | B;
+        *(dst++) = v | B;

Review comment:
       (v & (B-1)) | B = (v | B) & ((B-1) | B ) = (v | B) & (11111111) = v | B
    
   e.g.:
   129 & (128 - 1)  | 128 = 129
   129 | 128 =  129






[GitHub] [incubator-doris] ZhangYu0123 edited a comment on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 edited a comment on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675244849


   > Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?
   
   On my development machine, with encode_varint64 executed 1 billion times, the `v | B` version takes 95095ms on average over 5 runs.
   The `(v & (B - 1))` version takes 96103ms on average over 5 runs, so the change improves performance by about 0.5% ~ 1%. encode_varint64 is called frequently in many cases, such as bitmap_value and page_pointer encoding.
   
   




[GitHub] [incubator-doris] chaoyli commented on a change in pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
chaoyli commented on a change in pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#discussion_r471193910



##########
File path: be/src/util/coding.h
##########
@@ -143,7 +143,7 @@ extern uint8_t* encode_varint64(uint8_t* dst, uint64_t value);
 inline uint8_t* encode_varint64(uint8_t* dst, uint64_t v) {
     static const unsigned int B = 128;
     while (v >= B) {
-        *(dst++) = (v & (B - 1)) | B;
+        *(dst++) = v | B;

Review comment:
       v is a uint64_t.
   If v > 128, (v & (B - 1)) | B is not equal to v | B.






[GitHub] [incubator-doris] ZhangYu0123 commented on a change in pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
ZhangYu0123 commented on a change in pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#discussion_r471195742



##########
File path: be/src/util/coding.h
##########
@@ -143,7 +143,7 @@ extern uint8_t* encode_varint64(uint8_t* dst, uint64_t value);
 inline uint8_t* encode_varint64(uint8_t* dst, uint64_t v) {
     static const unsigned int B = 128;
     while (v >= B) {
-        *(dst++) = (v & (B - 1)) | B;
+        *(dst++) = v | B;

Review comment:
       (v & (B-1)) | B = (v | B) & ((B-1) | B ) = (v | B) & (11111111) = v | B
    
   e.g.:
   129 & (128 - 1)  | 128 = 129
   129 | 128 =  129






[GitHub] [incubator-doris] sduzh commented on pull request #4366: Optimise coding bit operation in BE

Posted by GitBox <gi...@apache.org>.
sduzh commented on pull request #4366:
URL: https://github.com/apache/incubator-doris/pull/4366#issuecomment-675225333


    Not intended to interfere, just curious about how much improvement can be achieved from this PR. Are there any benchmarks?

