Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/03/26 11:15:46 UTC

[GitHub] [carbondata] ajantha-bhat opened a new pull request #3682: [CARBONDATA-3753] optimize double/float stats collector

ajantha-bhat opened a new pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682
 
 
    ### Why is this PR needed?
   For every value of a double/float column, we call
   `PrimitivePageStatsCollector.getDecimalCount(double value)`.
   The problem is that this method creates a new `BigDecimal` object and a plain string object on every call,
   which leads to huge memory usage during insert.
   
    ### What changes were proposed in this PR?
   Create only the `BigDecimal` object and read the scale from it, instead of also converting it to a plain string.
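
A minimal sketch of the two approaches, for illustration only. The class and method names below are hypothetical and not taken from the PR, and clamping a negative scale to 0 is an assumption made here so that both approaches agree for large values such as `1.0E10`.

```java
import java.math.BigDecimal;

// Hypothetical illustration class; not part of CarbonData.
class DecimalCountSketch {

  // Previous approach: allocates a BigDecimal and a plain string per value,
  // then counts the characters after the decimal point.
  static int viaPlainString(double value) {
    String plain = BigDecimal.valueOf(Math.abs(value)).toPlainString();
    int dot = plain.indexOf('.');
    return dot == -1 ? 0 : plain.length() - dot - 1;
  }

  // Proposed approach: allocates only the BigDecimal and reads its scale.
  // Assumption: a negative scale (e.g. for 1.0E10) is treated as 0 digits.
  static int viaScale(double value) {
    return Math.max(0, BigDecimal.valueOf(value).scale());
  }

  public static void main(String[] args) {
    double[] samples = {12.345, -0.5, 1.0, 1.0E-5, 1.0E10};
    for (double v : samples) {
      System.out.println(v + " -> " + viaPlainString(v) + " / " + viaScale(v));
    }
  }
}
```

Note that `BigDecimal.valueOf` throws `NumberFormatException` for NaN and infinite inputs, so a caller would still need to handle those values separately.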
       
    ### Does this PR introduce any user interface change?
    - No
    
    ### Is any new testcase added?
    - No
    
   Before the change:
   ![Screenshot from 2020-03-26 16-45-12](https://user-images.githubusercontent.com/5889404/77640947-380c0e80-6f81-11ea-97ff-f1b8942d99c6.png)
   
   
   After the change:
   ![Screenshot from 2020-03-26 16-30-27](https://user-images.githubusercontent.com/5889404/77640863-16128c00-6f81-11ea-8af6-1b60cc7a4ab8.png)
   
   There is about a 5% improvement in insert performance for the TPCH lineitem table with 10 GB of data, without any change in store size.
   
     
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604438852
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2567/
   


[GitHub] [carbondata] kunal642 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
kunal642 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-610756333
 
 
   LGTM


[GitHub] [carbondata] QiangCai commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406015043
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/PrimitivePageStatsCollector.java
 ##########
 @@ -233,20 +233,18 @@ public void update(long value) {
 
   /**
    * Return number of digit after decimal point
-   * TODO: it operation is costly, optimize for performance
    */
   private int getDecimalCount(double value) {
     int decimalPlaces = 0;
     try {
-      String strValue = BigDecimal.valueOf(Math.abs(value)).toPlainString();
-      int integerPlaces = strValue.indexOf('.');
-      if (-1 != integerPlaces) {
-        decimalPlaces = strValue.length() - integerPlaces - 1;
+      BigDecimal decimalValue = BigDecimal.valueOf(value);
 
 Review comment:
   It would be better to implement this logic in our own code;
   using BigDecimal is not required.


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406024143
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/PrimitivePageStatsCollector.java
 ##########
 @@ -233,20 +233,18 @@ public void update(long value) {
 
   /**
    * Return number of digit after decimal point
-   * TODO: it operation is costly, optimize for performance
    */
   private int getDecimalCount(double value) {
     int decimalPlaces = 0;
     try {
-      String strValue = BigDecimal.valueOf(Math.abs(value)).toPlainString();
-      int integerPlaces = strValue.indexOf('.');
-      if (-1 != integerPlaces) {
-        decimalPlaces = strValue.length() - integerPlaces - 1;
+      BigDecimal decimalValue = BigDecimal.valueOf(value);
 
 Review comment:
   Actually, a double will not always look like `xx.yyy`; it can also have an `exponent`.
   BigDecimal already converts the value to a string and handles that logic.
   
   Maybe in the next version we can reduce the cost further by removing BigDecimal and copying the relevant logic from inside BigDecimal.
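
A small sketch of the exponent issue, under stated assumptions: `naiveCount` is a hypothetical hand-written counter (not code from this PR) that only looks for the `.` in `Double.toString` output, and the class name is illustrative. It matches the scale-based count for values like `12.345` but goes wrong once the string form carries an exponent.

```java
import java.math.BigDecimal;

// Hypothetical illustration class; not part of CarbonData.
class ExponentFormSketch {

  // Naive counter: assumes the string always looks like xx.yyy.
  static int naiveCount(double value) {
    String s = Double.toString(value);
    int dot = s.indexOf('.');
    return dot == -1 ? 0 : s.length() - dot - 1;
  }

  // Scale-based counter; BigDecimal parses the exponent for us.
  // Assumption: a negative scale is treated as 0 digits.
  static int scaleCount(double value) {
    return Math.max(0, BigDecimal.valueOf(value).scale());
  }

  public static void main(String[] args) {
    double plainForm = 12.345;
    double exponentForm = 1.0E-5; // Double.toString gives "1.0E-5"
    System.out.println(naiveCount(plainForm) + " vs " + scaleCount(plainForm));
    // The naive counter also counts the characters "E-5" after the dot.
    System.out.println(naiveCount(exponentForm) + " vs " + scaleCount(exponentForm));
  }
}
```

Any replacement for BigDecimal would therefore have to parse the exponent part as well, which is the logic referred to above.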
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604992269
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/864/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604995090
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2572/
   


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604426644
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/859/
   


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#discussion_r406045145
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/PrimitivePageStatsCollector.java
 ##########
 @@ -233,20 +233,18 @@ public void update(long value) {
 
   /**
    * Return number of digit after decimal point
-   * TODO: it operation is costly, optimize for performance
    */
   private int getDecimalCount(double value) {
     int decimalPlaces = 0;
     try {
-      String strValue = BigDecimal.valueOf(Math.abs(value)).toPlainString();
-      int integerPlaces = strValue.indexOf('.');
-      if (-1 != integerPlaces) {
-        decimalPlaces = strValue.length() - integerPlaces - 1;
+      BigDecimal decimalValue = BigDecimal.valueOf(value);
 
 Review comment:
   Also, we need an approach that does not convert the value to a string.


[GitHub] [carbondata] ajantha-bhat commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604464981
 
 
   @jackylk , @ravipesala : please check


[GitHub] [carbondata] asfgit closed pull request #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682
 
 
   


[GitHub] [carbondata] brijoobopanna commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
brijoobopanna commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-604935581
 
 
   retest this please
   


[GitHub] [carbondata] QiangCai commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector

Posted by GitBox <gi...@apache.org>.
QiangCai commented on issue #3682: [CARBONDATA-3753] optimize double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682#issuecomment-611421173
 
 
   LGTM
