You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/14 05:49:28 UTC

[GitHub] [incubator-doris] xiaokang opened a new pull request, #9566: optimize lz4 compress and decompress speed by reusing context

xiaokang opened a new pull request, #9566:
URL: https://github.com/apache/incubator-doris/pull/9566

   # Proposed changes
   
   optimize lz4 compress and decompress speed by reusing context
   
   ## Problem Summary:
   
   Currently, context object is create and free for each lz4 compress and decompress function call. It's not efficient since memory alloc and free is too frequent for once per compression/decompression block, usually a page of less than 64KB. 
   
   In practice, we see hot pot in LZ4_decompress_safe, memcpy, SpinLock::SpinLoop as in the following screenshot.
   
   ![image](https://user-images.githubusercontent.com/680838/168412575-00840f08-20c5-4d33-8522-5938ce4e0527.png)
   
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (No)
   2. Has unit tests been added: (No Need)
   3. Has document been added or modified: (No Need)
   4. Does it need to update dependencies: (No)
   5. Are there any changes that cannot be rolled back: (No)
   
   ## Further comments
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xiaokang commented on pull request #9566: optimize lz4 compress and decompress speed by reusing context

Posted by GitBox <gi...@apache.org>.
xiaokang commented on PR #9566:
URL: https://github.com/apache/incubator-doris/pull/9566#issuecomment-1126649753

   We see x6.75 speedup in the following comparison test.
   
   test env
   - test data: 10GB text log, 44 million rows
   - test table: test_table(ts varchar(30), log string)
   - test SQL: set enable_vectorized_engine=1; select sum(length(log)) from test_table
   - be.conf: disable_storage_page_cache = true   
   set this config to disable doris page cache to avoid all data cached in memory for test real decompression speed.
   
   test result 
   - master branch runtime: 5.2 seconds
   - lz4_compress_opt branch runtime: 0.77 seconds
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9566: optimize lz4 compress and decompress speed by reusing context

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9566:
URL: https://github.com/apache/incubator-doris/pull/9566#issuecomment-1126845498

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9566: optimize lz4 compress and decompress speed by reusing context

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9566:
URL: https://github.com/apache/incubator-doris/pull/9566#issuecomment-1126845494

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9566: optimize lz4 compress and decompress speed by reusing context

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9566:
URL: https://github.com/apache/incubator-doris/pull/9566#discussion_r872938862


##########
be/src/util/block_compression.cpp:
##########
@@ -370,27 +388,39 @@ class ZlibBlockCompression : public BlockCompressionCodec {
 };
 
 Status get_block_compression_codec(segment_v2::CompressionTypePB type,
-                                   const BlockCompressionCodec** codec) {
+                                   std::unique_ptr<BlockCompressionCodec>& codec) {
+    BlockCompressionCodec* ptr = nullptr;
     switch (type) {
     case segment_v2::CompressionTypePB::NO_COMPRESSION:
-        *codec = nullptr;
-        break;
+        codec.reset(nullptr);
+        return Status::OK();
     case segment_v2::CompressionTypePB::SNAPPY:
-        *codec = SnappyBlockCompression::instance();
+        ptr = new SnappyBlockCompression();

Review Comment:
   could write this way:  codec.reset(new SnappyBlockCompression());



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9566: optimize lz4 compress and decompress speed by reusing context

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9566:
URL: https://github.com/apache/incubator-doris/pull/9566#discussion_r872938912


##########
be/src/util/block_compression.cpp:
##########
@@ -370,27 +388,39 @@ class ZlibBlockCompression : public BlockCompressionCodec {
 };
 
 Status get_block_compression_codec(segment_v2::CompressionTypePB type,
-                                   const BlockCompressionCodec** codec) {
+                                   std::unique_ptr<BlockCompressionCodec>& codec) {
+    BlockCompressionCodec* ptr = nullptr;
     switch (type) {
     case segment_v2::CompressionTypePB::NO_COMPRESSION:
-        *codec = nullptr;
-        break;
+        codec.reset(nullptr);
+        return Status::OK();
     case segment_v2::CompressionTypePB::SNAPPY:
-        *codec = SnappyBlockCompression::instance();
+        ptr = new SnappyBlockCompression();
         break;
     case segment_v2::CompressionTypePB::LZ4:
-        *codec = Lz4BlockCompression::instance();
+        ptr = new Lz4BlockCompression();
         break;
     case segment_v2::CompressionTypePB::LZ4F:
-        *codec = Lz4fBlockCompression::instance();
+        ptr = new Lz4fBlockCompression(); 
         break;
     case segment_v2::CompressionTypePB::ZLIB:
-        *codec = ZlibBlockCompression::instance();
+        ptr = new ZlibBlockCompression();
         break;
     default:
         return Status::NotFound(strings::Substitute("unknown compression type($0)", type));
     }
-    return Status::OK();
+
+    if (!ptr)
+        return Status::NotFound("Failed to create compression codec");
+    

Review Comment:
   just write: return codec->init();



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xiaokang commented on pull request #9566: optimize lz4 compress and decompress speed by reusing context

Posted by GitBox <gi...@apache.org>.
xiaokang commented on PR #9566:
URL: https://github.com/apache/incubator-doris/pull/9566#issuecomment-1127101991

   Add some information about correctness validation. We import the test data and then export data (select * from test_table into outfile ...) using master and lz4_compress_opt branch. The results of two branches are the same after sort.
   
   btw. add a comment for NOTICE of NOT thread safe.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xiaokang commented on a diff in pull request #9566: optimize lz4 compress and decompress speed by reusing context

Posted by GitBox <gi...@apache.org>.
xiaokang commented on code in PR #9566:
URL: https://github.com/apache/incubator-doris/pull/9566#discussion_r872940972


##########
be/src/util/block_compression.cpp:
##########
@@ -370,27 +388,39 @@ class ZlibBlockCompression : public BlockCompressionCodec {
 };
 
 Status get_block_compression_codec(segment_v2::CompressionTypePB type,
-                                   const BlockCompressionCodec** codec) {
+                                   std::unique_ptr<BlockCompressionCodec>& codec) {
+    BlockCompressionCodec* ptr = nullptr;
     switch (type) {
     case segment_v2::CompressionTypePB::NO_COMPRESSION:
-        *codec = nullptr;
-        break;
+        codec.reset(nullptr);
+        return Status::OK();
     case segment_v2::CompressionTypePB::SNAPPY:
-        *codec = SnappyBlockCompression::instance();
+        ptr = new SnappyBlockCompression();

Review Comment:
   Yes, it's an easier way to use codec.reset(new SnappyBlockCompression()). 
   
   But I changed to the current more complex way, new a ptr first and then call ptr->init() and call codec.reset(ptr) finally, to avoid reset codec to a not properly initialized prt.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei merged pull request #9566: optimize lz4 compress and decompress speed by reusing context

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #9566:
URL: https://github.com/apache/incubator-doris/pull/9566


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org