Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/02/21 09:08:46 UTC

[GitHub] [incubator-doris] kangpinghuang opened a new issue #2967: add a test for different encoding

kangpinghuang opened a new issue #2967: add a test for different encoding
URL: https://github.com/apache/incubator-doris/issues/2967
 
 
   I added a test that compares encodings under different data distributions.
   
   For each of four data distributions (sequence, random, small step, and large step, labelled `big_step` in the tables), I generated 100 million integers.
   
   The original data sizes are as follows (unit: MB):
   
   sequence | random | small_step | big_step
   -- | -- | -- | --
   848 | 1000 | 859 | 859
   
   - tests
   
   I tested four encoding methods: alpha, beta_bitshuffle, beta_for (frame of reference), and beta_rle. The results are as follows:
   
   1. Space
   
   The following table shows the encoded size of each 100-million-integer data set.
   
   Unit (KB) | sequence | random | small_step | big_step
   -- | -- | -- | -- | --
   alpha | 2865.152 | 104420.4 | 2108.416 | 2224.128
   beta_bitshuffle | 4094.976 | 143268.9 | 1682.432 | 2679.808
   beta_for | 4582.4 | 94251.01 | 797.3325 | 956.233728
   beta_rle | 818.0101 | 10342.4 | 778.4581 | 783.970304
   
   The graph is as follows:
   ![image](https://user-images.githubusercontent.com/40422952/75019415-c9441d00-54cb-11ea-92f7-a6ac0e2ae9f7.png)
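To put the space table on a common scale, each size can be converted into average bits per value. The Python sketch below copies the numbers from the table; it assumes 1 KB = 1024 bytes, exactly 100 million values per data set, and a 4-byte (32-bit) raw INT for the compression ratio. Only the value count comes from the issue; the other two are assumptions.

```python
# Encoded sizes in KB, copied from the space table (100 million ints per data set).
SIZES_KB = {
    "alpha":           {"sequence": 2865.152, "random": 104420.4, "small_step": 2108.416, "big_step": 2224.128},
    "beta_bitshuffle": {"sequence": 4094.976, "random": 143268.9, "small_step": 1682.432, "big_step": 2679.808},
    "beta_for":        {"sequence": 4582.4,   "random": 94251.01, "small_step": 797.3325, "big_step": 956.233728},
    "beta_rle":        {"sequence": 818.0101, "random": 10342.4,  "small_step": 778.4581, "big_step": 783.970304},
}

NUM_VALUES = 100_000_000
RAW_BITS = 32  # assumption: a 4-byte INT column before encoding


def bits_per_value(size_kb, num_values=NUM_VALUES, bytes_per_kb=1024):
    """Average encoded bits per value; assumes 1 KB = 1024 bytes."""
    return size_kb * bytes_per_kb * 8 / num_values


for encoding, row in SIZES_KB.items():
    cells = ", ".join(
        f"{name}={bits_per_value(kb):.3f} bits ({RAW_BITS / bits_per_value(kb):.1f}x)"
        for name, kb in row.items()
    )
    print(f"{encoding}: {cells}")
```

For example, beta_rle on the sequence data works out to about 0.067 bits per value, while alpha on the random data needs about 8.55 bits per value.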
   
   2. Query time for count(*)
   
   Times are 95th-percentile latencies, in ms.
   
     | sequence | random | small_step | big_step
   -- | -- | -- | -- | --
   alpha | 7399.1 | 5416.48 | 6231 | 5372.88
   beta_bitshuffle | 14342 | 12059.82 | 9186.91 | 8817.78
   beta_for | 8752.04 | 11379.43 | 12403.98 | 8415.49
   beta_rle | 8544.95 | 8614.29 | 9299.58 | 8295.44
   
   The graph is:
   ![image](https://user-images.githubusercontent.com/40422952/75019604-22ac4c00-54cc-11ea-8ad9-38d9bb419ace.png)
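The issue does not say how many runs each query had or how the 95th percentile was computed. One common definition is the nearest-rank method, sketched below in Python; other tools interpolate between neighboring samples and can give slightly different numbers. The latency list is made-up input for illustration, not data from this test.

```python
import math


def percentile_nearest_rank(samples, p):
    """p-th percentile (0 < p <= 100) by the nearest-rank method: the smallest
    sample such that at least p percent of all samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies (ms) from 20 runs of the same query.
latencies_ms = [5300 + 10 * i for i in range(20)]
print(percentile_nearest_rank(latencies_ms, 95))  # 5480
```

With 20 runs the 95th percentile is the 19th smallest sample, so a single slow outlier (the 20th) does not affect it.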
   
   3. Query time for a point query
   
   `select count(*) from table where id = xxx;`
   
   
     | sequence | random | small_step | big_step
   -- | -- | -- | -- | --
   alpha | 8.3 | 8.66 | 477.26 | 10.73
   beta_bitshuffle | 9.65 | 9.86 | 413.63 | 10.91
   beta_for | 25.3 | 29.98 | 401.32 | 30.13
   beta_rle | 8.65 | 9.06 | 398.92 | 10.86
   
   The graph is:
   
   ![image](https://user-images.githubusercontent.com/40422952/75019723-61420680-54cc-11ea-80ef-da9862433f4c.png)
   
   - conclusion
   
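   beta_rle achieves the best space efficiency on every data set, ahead of both the other beta encodings and the alpha encoding. Its query performance is also the best among the Segment V2 encodings in almost every case (the one exception is count(*) on small_step data, where beta_bitshuffle is slightly faster), but it is generally slower than alpha, most noticeably for count(*) on random data.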
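For readers less familiar with the encoding behind beta_rle: the core idea of run-length encoding is that a run of identical consecutive values is stored once together with its length, so the encoded size grows with the number of runs rather than the number of values. The Python sketch below shows only that idea. It is not the Segment V2 encoder, which may combine run-length and bit-packed groups, and the function names are hypothetical.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


data = [7, 7, 7, 7, 3, 3, 9]
runs = rle_encode(data)
print(runs)  # [(7, 4), (3, 2), (9, 1)]
assert rle_decode(runs) == data
```

A thousand copies of one value encode to a single pair, while input with no repeats produces one pair per value; hybrid schemes handle such stretches with bit-packing instead.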
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] gaodayue commented on issue #2967: add a test for different encoding

Posted by GitBox <gi...@apache.org>.
gaodayue commented on issue #2967: add a test for different encoding
URL: https://github.com/apache/incubator-doris/issues/2967#issuecomment-589928568
 
 
   To obtain accurate and reproducible test results, we should
   1. write benchmark code that directly tests the different `PageBuilder` and `PageDecoder` implementations
   2. open source it so that other people can review and run the benchmark
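Timing page encoders and decoders directly, rather than through SQL queries, could look roughly like the Python sketch below. The real `PageBuilder`/`PageDecoder` classes are part of the Doris C++ code base and their interfaces are not shown in this thread, so `bench_codec`, `plain_encode` and `plain_decode` are hypothetical stand-ins: any encode/decode pair with the same shape can be plugged in.

```python
import statistics
import struct
import time


def plain_encode(values):
    """Hypothetical baseline codec: little-endian 4-byte ints, no compression."""
    return struct.pack(f"<{len(values)}i", *values)


def plain_decode(data):
    return list(struct.unpack(f"<{len(data) // 4}i", data))


def bench_codec(encode, decode, values, repeats=5):
    """Time encode and decode separately, check the round trip, and return
    (encoded_size_bytes, median_encode_seconds, median_decode_seconds)."""
    expected = list(values)
    enc_times, dec_times = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        data = encode(values)
        t1 = time.perf_counter()
        decoded = decode(data)
        t2 = time.perf_counter()
        if decoded != expected:
            raise ValueError("decoded values differ from the input")
        enc_times.append(t1 - t0)
        dec_times.append(t2 - t1)
    return len(data), statistics.median(enc_times), statistics.median(dec_times)


size, enc_s, dec_s = bench_codec(plain_encode, plain_decode, list(range(100_000)))
print(f"plain: {size} bytes, encode {enc_s * 1e3:.2f} ms, decode {dec_s * 1e3:.2f} ms")
```

Keeping the data generator and the harness together in one published script is what lets others rerun the same comparison.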
   
   In addition, I think we should add a test case for seek read performance in segment_v2, because it significantly affects the performance of SegmentIterator.
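Seek cost depends on how an encoding maps a row ordinal to a position inside a page. As a sketch under assumptions (hypothetical function names, not the segment_v2 API): in a fixed-width bit-packed page such as a frame-of-reference block, the value for ordinal `i` starts at bit `i * width` and can be read directly, while in a run-length page the reader must first find the run that contains the ordinal, here by binary search over cumulative run lengths.

```python
import bisect
from itertools import accumulate


def for_encode(values):
    """Frame of reference (non-empty input): keep the minimum as the base and
    bit-pack each offset (value - base) at the smallest width that fits."""
    base = min(values)
    offsets = [v - base for v in values]
    width = max(offsets).bit_length()  # 0 when all values are equal
    packed = 0
    for i, off in enumerate(offsets):
        packed |= off << (i * width)
    return base, width, packed.to_bytes((len(values) * width + 7) // 8, "little")


def seek_for(base, width, data, ordinal):
    """Direct access: value `ordinal` occupies bits [ordinal*width, (ordinal+1)*width),
    so only the bytes covering that range are read. No bounds check."""
    start_bit = ordinal * width
    first, last = start_bit // 8, (start_bit + width - 1) // 8
    chunk = int.from_bytes(data[first:last + 1], "little")
    return base + ((chunk >> (start_bit % 8)) & ((1 << width) - 1))


def build_run_ends(runs):
    """Cumulative exclusive end ordinal of each (value, run_length) run."""
    return list(accumulate(n for _, n in runs))


def seek_rle(runs, run_ends, ordinal):
    """Find the run containing `ordinal` by binary search over the run ends."""
    if not 0 <= ordinal < (run_ends[-1] if run_ends else 0):
        raise IndexError(ordinal)
    return runs[bisect.bisect_right(run_ends, ordinal)][0]


base, width, data = for_encode([1000, 1003, 1001, 1007])
print([seek_for(base, width, data, i) for i in range(4)])

runs = [(7, 4), (3, 2), (9, 1)]
ends = build_run_ends(runs)
print([seek_rle(runs, ends, i) for i in range(7)])
```

Without an index such as `run_ends`, an RLE seek falls back to walking the runs one by one, which is one reason seek cost is worth measuring separately from full-scan cost.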
   



[GitHub] [incubator-doris] kangpinghuang commented on issue #2967: add a test for different encoding

Posted by GitBox <gi...@apache.org>.
kangpinghuang commented on issue #2967: add a test for different encoding
URL: https://github.com/apache/incubator-doris/issues/2967#issuecomment-589564958
 
 
   parent issue: #2886
