You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/07/08 16:13:56 UTC

[GitHub] [orc] belugabehr opened a new pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

belugabehr opened a new pull request #743:
URL: https://github.com/apache/orc/pull/743


   ### What changes were proposed in this pull request?
   When converting Timestamp to Decimal type, do not use String operations.
   
   
   ### Why are the changes needed?
   Performance.
   
   
   ### How was this patch tested?
   No changes to functionality. Use existing unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #743:
URL: https://github.com/apache/orc/pull/743#issuecomment-899598389


   Thank you for updating, @belugabehr .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on a change in pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

Posted by GitBox <gi...@apache.org>.
belugabehr commented on a change in pull request #743:
URL: https://github.com/apache/orc/pull/743#discussion_r689661020



##########
File path: java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java
##########
@@ -962,7 +963,10 @@ public void setConvertVectorElement(int elementNum) throws IOException {
         seconds += 1;
         nanos = 1_000_000_000 - nanos;
       }
-      HiveDecimal value = HiveDecimal.create(String.format("%d.%09d", seconds, nanos));
+      BigDecimal secondsBd = new BigDecimal(seconds);
+      BigDecimal nanosBd = new BigDecimal(nanos).movePointLeft(9);
+      BigDecimal resultBd = (seconds >= 0L) ? secondsBd.add(nanosBd) : secondsBd.subtract(nanosBd);

Review comment:
       If I break this code intentionally (add instead of subtract), there are failures in:
   
   ```
   [ERROR] Failures: 
   [ERROR]   TestProlepticConversions.testSchemaEvolutionTimestamp:375 row 0 ==> expected: <-6212894324588> but was: <-6212894324412>
   [ERROR]   TestProlepticConversions.testSchemaEvolutionTimestamp:375 row 0 ==> expected: <-6212877044588> but was: <-6212877044412>
   [ERROR]   TestProlepticConversions.testSchemaEvolutionTimestamp:375 row 0 ==> expected: <-6212894324588> but was: <-6212894324412>
   [ERROR]   TestProlepticConversions.testSchemaEvolutionTimestamp:375 row 0 ==> expected: <-6212877044588> but was: <-6212877044412>
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #743:
URL: https://github.com/apache/orc/pull/743#discussion_r689663716



##########
File path: java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java
##########
@@ -962,7 +963,10 @@ public void setConvertVectorElement(int elementNum) throws IOException {
         seconds += 1;
         nanos = 1_000_000_000 - nanos;
       }
-      HiveDecimal value = HiveDecimal.create(String.format("%d.%09d", seconds, nanos));
+      BigDecimal secondsBd = new BigDecimal(seconds);
+      BigDecimal nanosBd = new BigDecimal(nanos).movePointLeft(9);
+      BigDecimal resultBd = (seconds >= 0L) ? secondsBd.add(nanosBd) : secondsBd.subtract(nanosBd);

Review comment:
       Thank you for checking. Got it. That looks like an enough test coverage.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #743:
URL: https://github.com/apache/orc/pull/743


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #743:
URL: https://github.com/apache/orc/pull/743#issuecomment-899566628


   @dongjoon-hyun Good to go!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr edited a comment on pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

Posted by GitBox <gi...@apache.org>.
belugabehr edited a comment on pull request #743:
URL: https://github.com/apache/orc/pull/743#issuecomment-877200933


   A quick test (hardly definitive, but indicative) shows an order of magnitude (10x) performance improvement:
   
   ```java
       long t1 = System.nanoTime();
       long r = 0;
       for (long i = -100_000L; i < 100_000; i++) {
         BigDecimal secondsBd = new BigDecimal(i);
         BigDecimal nanosBd = new BigDecimal(i).movePointLeft(9);
         BigDecimal resultBd = (i >= 0L) ? secondsBd.add(nanosBd) : secondsBd.subtract(nanosBd);
         r += resultBd.longValue();
       }
       long t2 = System.nanoTime();
   
       System.out.println(r);
       System.out.println(t2 - t1);
   ```
   
   This can be done even faster by more fully utilizing the Hive BigDecimalWritable stuff, but this is a good start.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] belugabehr commented on pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #743:
URL: https://github.com/apache/orc/pull/743#issuecomment-877200933


   A quick test (hardly definitive, but indicative) shows an order of magnitude performance improvement:
   
   ```java
       long t1 = System.nanoTime();
       long r = 0;
       for (long i = -100_000L; i < 100_000; i++) {
         BigDecimal secondsBd = new BigDecimal(i);
         BigDecimal nanosBd = new BigDecimal(i).movePointLeft(9);
         BigDecimal resultBd = (i >= 0L) ? secondsBd.add(nanosBd) : secondsBd.subtract(nanosBd);
         r += resultBd.longValue();
       }
       long t2 = System.nanoTime();
   
       System.out.println(r);
       System.out.println(t2 - t1);
   ```
   
   This can be done even faster by more fully utilizing the Hive BigDecimalWritable stuff, but this is a good start.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #743: ORC-834: Do Not Convert to String in DecimalFromTimestampTreeReader

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #743:
URL: https://github.com/apache/orc/pull/743#discussion_r689639185



##########
File path: java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java
##########
@@ -962,7 +963,10 @@ public void setConvertVectorElement(int elementNum) throws IOException {
         seconds += 1;
         nanos = 1_000_000_000 - nanos;
       }
-      HiveDecimal value = HiveDecimal.create(String.format("%d.%09d", seconds, nanos));
+      BigDecimal secondsBd = new BigDecimal(seconds);
+      BigDecimal nanosBd = new BigDecimal(nanos).movePointLeft(9);
+      BigDecimal resultBd = (seconds >= 0L) ? secondsBd.add(nanosBd) : secondsBd.subtract(nanosBd);

Review comment:
       Do you have a test coverage for this negative second case? If we don't have, could you make one please?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org