You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/04/08 09:40:56 UTC

[GitHub] [incubator-doris] Lchangliang opened a new pull request, #8923: Support compression prop

Lchangliang opened a new pull request, #8923:
URL: https://github.com/apache/incubator-doris/pull/8923

   # Proposed changes
   
   Issue Number: close #8427
   
   ## Problem Summary:
   
   The PR just support to specify compression algorithms when creating tables like HBase. But Doris just support LZ4. More compression algorithms should be supported in the future.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1139331654

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r847880569


##########
gensrc/thrift/AgentService.thrift:
##########
@@ -84,6 +84,12 @@ struct TStorageParam {
     3: optional TS3StorageParam s3_storage_param
 }
 
+enum TCompressionType {
+    DEFAULT = 0,
+    LZ4 = 1,
+    LZ0 = 2

Review Comment:
   maybe `LZO`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
yangzhg commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1095851195

    `compression type`   you mean used to compress page? 
   page compress use LZ4F not LZ4, and compress type is defined in `CompressionTypePB` in  `gensrc/proto/segment_v2.proto` no need to create new define in thrift,  or thrift should keep consistent with proto,
   the actual compress algorithm is hard code in `be/src/olap/rowset/segment_v2/segment_writer.cpp`
   so I think we need more discuss for this pr 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r847878329


##########
be/src/olap/tablet_meta.cpp:
##########
@@ -89,7 +90,21 @@ TabletMeta::TabletMeta(int64_t table_id, int64_t partition_id, int64_t tablet_id
         LOG(WARNING) << "unknown tablet keys type";
         break;
     }
-    schema->set_compress_kind(COMPRESS_LZ4);
+
+    switch (compression_type)
+    {
+    case TCompressionType::LZ4:
+        schema->set_compress_kind(COMPRESS_LZ4);
+        break;
+    #ifdef DORIS_WITH_LZO
+    case TCompressionType::LZ0:
+        schema->set_compress_kind(COMPRESS_LZ0);

Review Comment:
   ```suggestion
           schema->set_compress_kind(COMPRESS_LZO);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Lchangliang commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1097631872

   > `compression type` you mean used to compress page? page compress use LZ4F not LZ4, and compress type is defined in `CompressionTypePB` in `gensrc/proto/segment_v2.proto` no need to create new define in thrift, or thrift should keep consistent with proto, the actual compress algorithm is hard code in `be/src/olap/rowset/segment_v2/segment_writer.cpp` so I think we need more discuss for this pr
   
   I'm sorry. I misunderstood the code.
   As you say, the issue want to specify different algorithms to compress page. Maybe i can create thrift that keeps consistent with proto and persist on tabletSchema. In segment_writer.cpp, "meta->set_compression(LZ4F);" change to "meta->set_compression(_tablet_schema->compression_type());".
   BTW, I want to know that what the enum 'CompressKind' use to do ?
   Thank you for your correction


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Lchangliang commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r880313916


##########
be/src/olap/tablet_meta.cpp:
##########
@@ -90,8 +91,34 @@ TabletMeta::TabletMeta(int64_t table_id, int64_t partition_id, int64_t tablet_id
         LOG(WARNING) << "unknown tablet keys type";
         break;
     }
+
     schema->set_compress_kind(COMPRESS_LZ4);
 
+    switch (compression_type)
+    {
+    case TCompressionType::NO_COMPRESSION:
+        schema->set_compression_type(NO_COMPRESSION);

Review Comment:
   I add some annotation to explain their differences.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Lchangliang commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1097659081

   > CompressKind used to compress segment files, but it is hard coded none for now
   
   Thanks for your answer.
   And 
   
   > Maybe i can create thrift that keeps consistent with proto and persist on tabletSchema. In segment_writer.cpp, "meta->set_compression(LZ4F);" change to "meta->set_compression(_tablet_schema->compression_type());".
   
   It's the right way to solve this issue ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r850314406


##########
be/src/olap/tablet_schema.h:
##########
@@ -169,6 +172,7 @@ class TabletSchema {
     size_t _num_short_key_columns = 0;
     size_t _num_rows_per_row_block = 0;
     CompressKind _compress_kind = COMPRESS_NONE;
+    segment_v2::CompressionTypePB _compression_type = segment_v2::CompressionTypePB::NO_COMPRESSION;

Review Comment:
   why defaut is NO_COMPRESSION



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1138354649

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Lchangliang commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r851190905


##########
be/src/olap/tablet_schema.h:
##########
@@ -169,6 +172,7 @@ class TabletSchema {
     size_t _num_short_key_columns = 0;
     size_t _num_rows_per_row_block = 0;
     CompressKind _compress_kind = COMPRESS_NONE;
+    segment_v2::CompressionTypePB _compression_type = segment_v2::CompressionTypePB::NO_COMPRESSION;

Review Comment:
   I think it is just a zero value. It will be set after creation like _compress_kind. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r847880000


##########
gensrc/thrift/AgentService.thrift:
##########
@@ -84,6 +84,12 @@ struct TStorageParam {
     3: optional TS3StorageParam s3_storage_param
 }
 
+enum TCompressionType {

Review Comment:
   use `DEFAULT` is not a good idea,  it is better to specify an explicit type



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xiaokang commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
xiaokang commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1135352894

   There are still some hardcoded LZ4F in the following files.
   
   be/src/olap/rowset/segment_v2/column_writer.cpp:            length_options.meta->set_compression(LZ4F);
   be/src/olap/rowset/segment_v2/column_writer.cpp:                null_options.meta->set_compression(LZ4F);
   be/src/olap/rowset/segment_v2/bitmap_index_writer.cpp:            options.compression = LZ4F;
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xiaokang commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
xiaokang commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r880008300


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/PropertyAnalyzer.java:
##########
@@ -434,6 +436,31 @@ public static long analyzeTimeout(Map<String, String> properties, long defaultTi
         return timeout;
     }
 
+    // analyzeCompressionType will parse the compression type from properties
+    public static TCompressionType analyzeCompressionType(Map<String, String> properties) throws  AnalysisException {
+        String compressionType = "";
+        if (properties != null && properties.containsKey(PROPERTIES_COMPRESSION)) {
+            compressionType = properties.get(PROPERTIES_COMPRESSION);
+            properties.remove(PROPERTIES_COMPRESSION);
+        } else {
+            return TCompressionType.LZ4F;
+        }
+
+        if (compressionType.equalsIgnoreCase("no_compression")) {
+            return TCompressionType.NO_COMPRESSION;
+        } else if (compressionType.equalsIgnoreCase("lz4")) {
+            return TCompressionType.LZ4;
+        } else if (compressionType.equalsIgnoreCase("lz4f")) {
+            return TCompressionType.LZ4F;
+        } else if (compressionType.equalsIgnoreCase("zlib")) {

Review Comment:
   snappy is missed here



##########
be/src/olap/rowset/segment_v2/segment_writer.h:
##########
@@ -92,6 +89,7 @@ class SegmentWriter {
     Status _write_short_key_index();
     Status _write_footer();
     Status _write_raw_data(const std::vector<Slice>& slices);
+    void init_column_meta(ColumnMetaPB* meta, uint32_t* column_id, const TabletColumn& column) const;

Review Comment:
   why change init_column_meta from public to private?



##########
gensrc/thrift/AgentService.thrift:
##########
@@ -84,6 +84,16 @@ struct TStorageParam {
     3: optional TS3StorageParam s3_storage_param
 }
 
+enum TCompressionType {

Review Comment:
   is it better to be consistent with CompressionTypePB?



##########
be/src/olap/tablet_meta.cpp:
##########
@@ -90,8 +91,34 @@ TabletMeta::TabletMeta(int64_t table_id, int64_t partition_id, int64_t tablet_id
         LOG(WARNING) << "unknown tablet keys type";
         break;
     }
+
     schema->set_compress_kind(COMPRESS_LZ4);
 
+    switch (compression_type)
+    {
+    case TCompressionType::NO_COMPRESSION:
+        schema->set_compression_type(NO_COMPRESSION);

Review Comment:
   It's confusing to keep both set_compress_kink and set_compression_type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1138354682

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Lchangliang commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1097631588

   > LZ4F
   
   I'm sorry. I misunderstood the code.  
   As you say, the issue want to specify different algorithms to compress page. Maybe i can create thrift that keeps consistent with proto and persist on tabletSchema. In segment_writer.cpp, "meta->set_compression(LZ4F);" change to "meta->set_compression(_tablet_schema->compression_type());". 
   BTW, I want to know that what the enum 'CompressKind' use to do ? 
   Thank you for your correction
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
yangzhg commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1097644501

   CompressKind used to compress segment files, but it is hard coded none for now
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
yangzhg commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r850268897


##########
gensrc/thrift/AgentService.thrift:
##########
@@ -105,6 +115,7 @@ struct TCreateTabletReq {
     13: optional TStorageFormat storage_format
     14: optional TTabletType tablet_type
     15: optional TStorageParam storage_param
+    16: optional TCompressionType compression_type

Review Comment:
   better to set a default value of LZ4F



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
yangzhg commented on PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#issuecomment-1098934453

   > It's the right way to solve this issue ?
   
   I think that a good idea


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Lchangliang commented on a diff in pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
Lchangliang commented on code in PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923#discussion_r851191311


##########
gensrc/thrift/AgentService.thrift:
##########
@@ -105,6 +115,7 @@ struct TCreateTabletReq {
     13: optional TStorageFormat storage_format
     14: optional TTabletType tablet_type
     15: optional TStorageParam storage_param
+    16: optional TCompressionType compression_type

Review Comment:
   Thank you for your advice. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman merged pull request #8923: [Feature] Support compression prop

Posted by GitBox <gi...@apache.org>.
morningman merged PR #8923:
URL: https://github.com/apache/incubator-doris/pull/8923


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org