Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/02/11 11:27:18 UTC

[GitHub] [carbondata] Pickupolddriver opened a new pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

Pickupolddriver opened a new pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611
 
 
   ### Why is this PR needed?
   
   In some cases, data needs to stay uncompressed after it is loaded into a CarbonData file.
   The current version does not support loading data without compression.
   
   ### What changes were proposed in this PR?
   
   Add a new compressor, NoneCompressor, which extends AbstractCompressor and performs no compression.
   It can be enabled by calling:
   CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR,"none");
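Conceptually, the property value is a name that is looked up to select a compressor. The sketch below shows that idea in isolation, assuming a simple name-keyed map similar in spirit to the `allSupportedCompressors` field in `CompressorFactory`. The `Codec` interface, `NoneCodec`, and `forName` are hypothetical and are not CarbonData's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CodecRegistryDemo {

  /** Hypothetical minimal codec contract (not CarbonData's Compressor interface). */
  interface Codec {
    String name();
    byte[] compress(byte[] input);
    byte[] decompress(byte[] input);
  }

  /** Pass-through codec: both directions return an unmodified copy of the input. */
  static final class NoneCodec implements Codec {
    public String name() { return "none"; }
    public byte[] compress(byte[] input) { return Arrays.copyOf(input, input.length); }
    public byte[] decompress(byte[] input) { return Arrays.copyOf(input, input.length); }
  }

  // Name-keyed registry; real compressors would be registered here as well.
  private static final Map<String, Codec> REGISTRY = new HashMap<>();

  static {
    register(new NoneCodec());
  }

  static void register(Codec codec) {
    REGISTRY.put(codec.name(), codec);
  }

  /** Resolves a configured name case-insensitively and fails fast on unknown names. */
  static Codec forName(String configured) {
    Codec codec = REGISTRY.get(configured.trim().toLowerCase(Locale.ROOT));
    if (codec == null) {
      throw new IllegalArgumentException("Unsupported compressor: " + configured);
    }
    return codec;
  }

  public static void main(String[] args) {
    byte[] page = "id,city\n1,shanghai\n".getBytes(StandardCharsets.UTF_8);
    Codec codec = forName("none");
    byte[] stored = codec.compress(page);
    // With the pass-through codec the stored bytes are identical to the input.
    System.out.println(Arrays.equals(page, stored)); // true
  }
}
```

Failing fast on an unknown name keeps a typo in the property from silently falling back to some default codec.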
   
   ### Does this PR introduce any user interface change?
   Yes
   
   ### Is any new testcase added?
   Yes
   
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#issuecomment-584591430
 
 
   Can one of the admins verify this patch?


[GitHub] [carbondata] ajantha-bhat commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

ajantha-bhat commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#issuecomment-585024947
 
 
   @Pickupolddriver : Agree that it can improve the loading speed. But data will be 3x bigger. So, storage cost on OBS will be 3x more!
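How much larger uncompressed data is depends on the data itself, so the ratio is worth measuring on a representative sample rather than assuming. The sketch below uses the JDK's `java.util.zip.Deflater` purely as a stand-in codec (CarbonData's own compressors, such as Snappy, will give different numbers), and the sample rows are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionRatioDemo {

  /** Returns the number of bytes Deflater produces for the input at the given level. */
  static int deflatedSize(byte[] input, int level) {
    Deflater deflater = new Deflater(level);
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[4096];
      int total = 0;
      // Drain the compressor until it reports that all output has been produced.
      while (!deflater.finished()) {
        total += deflater.deflate(buf);
      }
      return total;
    } finally {
      deflater.end();
    }
  }

  /** Raw size divided by compressed size; values above 1 mean compression saved space. */
  static double ratio(byte[] raw) {
    return (double) raw.length / deflatedSize(raw, Deflater.DEFAULT_COMPRESSION);
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 10_000; i++) {
      sb.append("2020-02-11,shanghai,").append(i % 100).append('\n');
    }
    byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
    System.out.printf("raw=%d bytes, deflated=%d bytes, ratio=%.1fx%n",
        raw.length, deflatedSize(raw, Deflater.DEFAULT_COMPRESSION), ratio(raw));
  }
}
```

The printed ratio is specific to this synthetic, highly repetitive sample and says nothing about real tables.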


[GitHub] [carbondata] kunal642 commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

kunal642 commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379266803
 
 

 ##########
 File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ##########
 @@ -272,6 +272,79 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
     }
   }
 
+  test("test current none compressor on legacy store with snappy") {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "snappy")
+    createTable()
+    loadData()
+
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none")
+    loadData()
+    checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16)))
 
 Review comment:
   Can you change select count(*) to select * so that actual data is validated


[GitHub] [carbondata] Pickupolddriver commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

Pickupolddriver commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#issuecomment-586201467
 
 
   > @Pickupolddriver : Agree that it can improve the loading speed. But data will be 3x bigger. So, storage cost on OBS will be 3x more!
   
   The data is processed after it is loaded to OBS. Providing a NoneCompressor avoids compressing the data only to decompress it again, and the uncompressed data is deleted once it has been processed in OBS.


[GitHub] [carbondata] kunal642 commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

kunal642 commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r382578699
 
 

 ##########
 File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ##########
 @@ -272,6 +272,79 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
     }
   }
 
+  test("test current none compressor on legacy store with snappy") {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "snappy")
+    createTable()
+    loadData()
+
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none")
+    loadData()
+    checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16)))
 
 Review comment:
   No, just change the newly added test cases to select *
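The reviewer's point is that a matching row count cannot detect corrupted values, which matters for a codec that rewrites every page. The plain-Java sketch below illustrates the difference; the helper names and sample rows are hypothetical. In the actual Scala test, the equivalent would be passing the expected rows of a `SELECT *` query to `checkAnswer`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RowValidationDemo {

  /** Weak check: only compares how many rows came back. */
  static boolean countMatches(List<String> expected, List<String> actual) {
    return expected.size() == actual.size();
  }

  /** Strong check: compares every row, ignoring row order. */
  static boolean contentMatches(List<String> expected, List<String> actual) {
    List<String> e = new ArrayList<>(expected);
    List<String> a = new ArrayList<>(actual);
    Collections.sort(e);
    Collections.sort(a);
    return e.equals(a);
  }

  public static void main(String[] args) {
    List<String> expected = Arrays.asList("1,shanghai", "2,beijing");
    // Same number of rows, but one value was corrupted on the write/read path.
    List<String> actual = Arrays.asList("2,beijing", "1,shangha?");
    System.out.println("count check passes:   " + countMatches(expected, actual));   // true
    System.out.println("content check passes: " + contentMatches(expected, actual)); // false
  }
}
```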


[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r381642255
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
 ##########
 @@ -35,6 +35,7 @@
   private final Map<String, Compressor> allSupportedCompressors = new HashMap<>();
 
   public enum NativeSupportedCompressor {
+    NONE("none",NoneCompressor.class),
 
 Review comment:
   Suggest changing this to NONE("nocompress", NoneCompressor.class),
   since `nocompress` will be appended to the data file name.
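The suggestion matters because the compressor name becomes visible to anyone listing the data files, and `nocompress` reads more clearly there than `none`. A minimal sketch, assuming a hypothetical `<prefix>.<compressorName>.carbondata` file-name layout (check CarbonData's real naming logic before relying on it); the enum below is an illustration, not the project's `NativeSupportedCompressor`.

```java
public class CompressorFileNameDemo {

  /** Illustrative enum: each constant carries the name that appears in file names. */
  enum SupportedCompressor {
    NONE("nocompress"),
    SNAPPY("snappy"),
    ZSTD("zstd");

    private final String compressorName;

    SupportedCompressor(String compressorName) {
      this.compressorName = compressorName;
    }

    String getName() {
      return compressorName;
    }
  }

  /** Builds a file name under the assumed <prefix>.<compressorName>.carbondata layout. */
  static String dataFileName(String prefix, SupportedCompressor compressor) {
    return prefix + "." + compressor.getName() + ".carbondata";
  }

  /** Recovers the compressor from a file name built by dataFileName. */
  static SupportedCompressor fromFileName(String fileName) {
    String suffix = ".carbondata";
    if (!fileName.endsWith(suffix)) {
      throw new IllegalArgumentException("Not a data file: " + fileName);
    }
    String stem = fileName.substring(0, fileName.length() - suffix.length());
    String codec = stem.substring(stem.lastIndexOf('.') + 1);
    for (SupportedCompressor c : SupportedCompressor.values()) {
      if (c.getName().equals(codec)) {
        return c;
      }
    }
    throw new IllegalArgumentException("Unknown compressor in file name: " + fileName);
  }

  public static void main(String[] args) {
    String name = dataFileName("part-0-0", SupportedCompressor.NONE);
    System.out.println(name);               // part-0-0.nocompress.carbondata
    System.out.println(fromFileName(name)); // NONE
  }
}
```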


[GitHub] [carbondata] kunal642 commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

kunal642 commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#issuecomment-584611062
 
 
   Can you please explain the scenario where no-compression would be beneficial?


[GitHub] [carbondata] kunal642 commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

kunal642 commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379267345
 
 

 ##########
 File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ##########
 @@ -272,6 +272,79 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
     }
   }
 
+  test("test current none compressor on legacy store with snappy") {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "snappy")
+    createTable()
+    loadData()
+
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none")
+    loadData()
+    checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16)))
 
 Review comment:
   Can you change select count(*) to select * so that actual data is validated


[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379335042
 
 

 ##########
 File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ##########
 @@ -272,6 +272,79 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
     }
   }
 
+  test("test current none compressor on legacy store with snappy") {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "snappy")
+    createTable()
+    loadData()
+
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none")
+    loadData()
+    checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16)))
 
 Review comment:
   Sure


[GitHub] [carbondata] QiangCai commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

QiangCai commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#issuecomment-590644147
 
 
   please rebase 


[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408170
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
 ##########
 @@ -35,6 +35,7 @@
   private final Map<String, Compressor> allSupportedCompressors = new HashMap<>();
 
   public enum NativeSupportedCompressor {
+    NONE("none",NoneCompressor.class),
 
 Review comment:
   add space after `,`


[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408450
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/compression/NoneCompressor.java
 ##########
 @@ -0,0 +1,51 @@
+package org.apache.carbondata.core.datastore.compression;
 
 Review comment:
   Please add the license header, as in the other source files.


[GitHub] [carbondata] Pickupolddriver commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

Pickupolddriver commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#issuecomment-585017763
 
 
   > Can you please explain the scenario where no-compression would be beneficial?
   
   In some cases, NoneCompressor improves the speed of loading data from Flink to OBS files, at the cost of more storage space and I/O.
   
   For example, when loading data from Flink to OBS, Flink first compresses the data into temporary files, which OBS then decompresses.
   With NoneCompressor, users can load the data without compressing it first, so the temporary files no longer need to be decompressed.
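The two loading paths described above can be compared with a small in-memory model. This is a sketch under stated assumptions: byte arrays stand in for the temporary files, JDK Deflate stands in for whatever codec the writer would use, and `codecCalls` is a hypothetical counter that exists only to make the avoided work visible.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class StagingPipelineDemo {

  /** Counts compress/decompress passes so the avoided work is observable. */
  static int codecCalls = 0;

  static byte[] deflate(byte[] input) {
    codecCalls++;
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  static byte[] inflate(byte[] input) {
    codecCalls++;
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(input);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && inflater.needsInput()) {
          throw new IllegalStateException("Truncated compressed data");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (DataFormatException e) {
      throw new IllegalStateException("Corrupt compressed data", e);
    } finally {
      inflater.end();
    }
  }

  /** Path A: writer compresses into a temp file, consumer decompresses it again. */
  static byte[] viaCompressedStaging(byte[] rows) {
    byte[] tempFile = deflate(rows);
    return inflate(tempFile);
  }

  /** Path B: writer stores the rows as-is, so the consumer reads them directly. */
  static byte[] viaUncompressedStaging(byte[] rows) {
    return Arrays.copyOf(rows, rows.length);
  }

  public static void main(String[] args) {
    byte[] rows = "1,shanghai\n2,beijing\n".getBytes(StandardCharsets.UTF_8);
    codecCalls = 0;
    byte[] a = viaCompressedStaging(rows);
    int callsA = codecCalls;
    codecCalls = 0;
    byte[] b = viaUncompressedStaging(rows);
    int callsB = codecCalls;
    System.out.println("same bytes: " + Arrays.equals(a, b));                // true
    System.out.println("codec passes, compressed path:   " + callsA);       // 2
    System.out.println("codec passes, uncompressed path: " + callsB);       // 0
  }
}
```

Both paths deliver identical bytes; the difference is only the two codec passes that path B skips, paid for with a larger temporary file.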
   
   


[GitHub] [carbondata] Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

Pickupolddriver commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379815554
 
 

 ##########
 File path: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataWithCompression.scala
 ##########
 @@ -272,6 +272,79 @@ class TestLoadDataWithCompression extends QueryTest with BeforeAndAfterEach with
     }
   }
 
+  test("test current none compressor on legacy store with snappy") {
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "snappy")
+    createTable()
+    loadData()
+
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_OFFHEAP_SORT, "true")
+    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none")
+    loadData()
+    checkAnswer(sql(s"SELECT count(*) FROM $tableName"), Seq(Row(16)))
 
 Review comment:
   So you want me to change all the test cases in this class from select count(*) to select *?


[GitHub] [carbondata] ajantha-bhat commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

ajantha-bhat commented on issue #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#issuecomment-589525751
 
 
   add to whitelist


[GitHub] [carbondata] jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.

jackylk commented on a change in pull request #3611: [CARBONDATA-3692] Support NoneCompression during loading data.
URL: https://github.com/apache/carbondata/pull/3611#discussion_r379408696
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/compression/NoneCompressor.java
 ##########
 @@ -0,0 +1,51 @@
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.IOException;
+
+public class NoneCompressor extends AbstractCompressor {
 
 Review comment:
   You can add a description to this class stating that it does not perform any compression.
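Combined with the review request for a license header, the new file might look roughly like the sketch below. The header is the standard ASF source header (compare it with an existing CarbonData file before committing); the class name `PassThroughCompressorSketch` and its method names are hypothetical and do not reproduce CarbonData's `AbstractCompressor` contract. The package declaration is omitted so the sketch compiles on its own.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.util.Arrays;

/**
 * A compressor that performs no compression.
 *
 * <p>{@link #compress(byte[])} and {@link #decompress(byte[])} both return an
 * unmodified copy of their input, so pages are stored exactly as written. This
 * trades larger files for skipping the compress/decompress work, which can help
 * when a downstream consumer would decompress the data right after loading.
 */
public class PassThroughCompressorSketch {

  /** Returns a copy of the input; copying avoids aliasing the caller's buffer. */
  public byte[] compress(byte[] input) {
    return Arrays.copyOf(input, input.length);
  }

  /** Returns a copy of the input, since nothing was compressed. */
  public byte[] decompress(byte[] input) {
    return Arrays.copyOf(input, input.length);
  }

  /** Output is never larger than the input, so the bound is the input size itself. */
  public long maxOutputLength(long inputSize) {
    return inputSize;
  }

  public static void main(String[] args) {
    PassThroughCompressorSketch c = new PassThroughCompressorSketch();
    byte[] data = {10, 20, 30};
    System.out.println(Arrays.equals(data, c.decompress(c.compress(data)))); // true
  }
}
```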
