You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@orc.apache.org by "ffacs (via GitHub)" <gi...@apache.org> on 2023/09/27 16:38:53 UTC

[GitHub] [orc] ffacs opened a new pull request, #1629: ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal

ffacs opened a new pull request, #1629:
URL: https://github.com/apache/orc/pull/1629

   ### What changes were proposed in this pull request?
   Support schema evolution from decimal to {boolean, byte, short, int, long, float, double, decimal}
   
   
   ### Why are the changes needed?
   to support schema evolution
   
   ### How was this patch tested?
   UT passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal [orc]

Posted by "ffacs (via GitHub)" <gi...@apache.org>.

ffacs commented on code in PR #1629:
URL: https://github.com/apache/orc/pull/1629#discussion_r1349736930


##########
c++/src/Int128.cc:
##########
@@ -435,6 +435,13 @@ namespace orc {
     return buf.str();
   }
 
+  double Int128::toDouble() const {
+    if (fitsInLong()) {
+      return static_cast<double>(toLong());
+    }
+    return static_cast<double>(lowbits) + std::ldexp(static_cast<double>(highbits), 64);

Review Comment:
   > Does the java code do the same thing?
   
   The java side uses BigDecimal.doubleValue() to convert a decimal to a double.This operation would lose information, but it seems that the java ORC didn't do any checks too. 
   ```java
       if(intCompact != -9223372036854775808) {
               if (scale == 0) {
                   return (double)intCompact;
               } else {
                   /*
                    * If both intCompact and the scale can be exactly
                    * represented as double values, perform a single
                    * double multiply or divide to compute the (properly
                    * rounded) result.
                    */
                   if (Math.abs(intCompact) < 1L<<52 ) {
                       // Don't have too guard against
                       // Math.abs(MIN_VALUE) because of outer check
                       // against INFLATED.
                       if (scale > 0 && scale < double10pow.length) {
                           return (double)intCompact / double10pow[scale];
                       } else if (scale < 0 && scale > -double10pow.length) {
                           return (double)intCompact * double10pow[-scale];
                       }
                   }
               }
           }
           // Somewhat inefficient, but guaranteed to work.
           return Double.parseDouble(this.toString());
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal [orc]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.

wgtmac commented on code in PR #1629:
URL: https://github.com/apache/orc/pull/1629#discussion_r1349611023


##########
c++/src/Int128.cc:
##########
@@ -435,6 +435,13 @@ namespace orc {
     return buf.str();
   }
 
+  double Int128::toDouble() const {
+    if (fitsInLong()) {
+      return static_cast<double>(toLong());
+    }
+    return static_cast<double>(lowbits) + std::ldexp(static_cast<double>(highbits), 64);

Review Comment:
   Should we do some error handling? What if the double value does not precisely store the original value?



##########
c++/include/orc/Int128.hh:
##########
@@ -300,6 +300,8 @@ namespace orc {
       throw std::range_error("Int128 too large to convert to long");
     }
 
+    double toDouble() const;

Review Comment:
   Please add a comment for what to expect from this function. Also please help fix the comment in line 294 above.



##########
c++/test/TestInt128.cc:
##########
@@ -1201,4 +1201,20 @@ namespace orc {
         << pair.second.toString();
   }
 
+  TEST(Int128, testConverToDouble) {

Review Comment:
   ```suggestion
     TEST(Int128, testConvertToDouble) {
   ```



##########
c++/src/ConvertColumnReader.cc:
##########
@@ -466,6 +466,136 @@ namespace orc {
     }
   }
 
+  template <typename FileTypeBatch, typename ReadTypeBatch, typename ReadType>
+  class DecimalToNumericColumnReader : public ConvertColumnReader {
+   public:
+    DecimalToNumericColumnReader(const Type& _readType, const Type& fileType, StripeStreams& stripe,
+                                 bool _throwOnOverflow)
+        : ConvertColumnReader(_readType, fileType, stripe, _throwOnOverflow) {
+      precision = fileType.getPrecision();
+      scale = fileType.getScale();
+      factor = 1;
+      for (int i = 0; i < scale; i++) {
+        factor *= 10;
+      }
+    }
+
+    void next(ColumnVectorBatch& rowBatch, uint64_t numValues, char* notNull) override {
+      ConvertColumnReader::next(rowBatch, numValues, notNull);
+
+      const auto& srcBatch = *SafeCastBatchTo<const FileTypeBatch*>(data.get());
+      auto& dstBatch = *SafeCastBatchTo<ReadTypeBatch*>(&rowBatch);
+      for (uint64_t i = 0; i < numValues; ++i) {
+        if (!rowBatch.hasNulls || rowBatch.notNull[i]) {
+          if constexpr (std::is_floating_point_v<ReadType>) {
+            convertDecimalToDouble(dstBatch, i, srcBatch);
+          } else {
+            convertDecimalToInteger(dstBatch, i, srcBatch);
+          }
+        }
+      }
+    }
+
+   private:
+    void convertDecimalToInteger(ReadTypeBatch& dstBatch, uint64_t idx,
+                                 const FileTypeBatch& srcBatch) {
+      using FileType = decltype(srcBatch.values[idx]);
+      Int128 result = scaleDownInt128ByPowerOfTen(srcBatch.values[idx], scale);
+      int64_t longValue = result.toLong();

Review Comment:
   This would be better to move to below line 508.



##########
c++/src/ConvertColumnReader.cc:
##########
@@ -466,6 +466,136 @@ namespace orc {
     }
   }
 
+  template <typename FileTypeBatch, typename ReadTypeBatch, typename ReadType>
+  class DecimalToNumericColumnReader : public ConvertColumnReader {
+   public:
+    DecimalToNumericColumnReader(const Type& _readType, const Type& fileType, StripeStreams& stripe,
+                                 bool _throwOnOverflow)
+        : ConvertColumnReader(_readType, fileType, stripe, _throwOnOverflow) {
+      precision = fileType.getPrecision();
+      scale = fileType.getScale();
+      factor = 1;
+      for (int i = 0; i < scale; i++) {
+        factor *= 10;
+      }
+    }
+
+    void next(ColumnVectorBatch& rowBatch, uint64_t numValues, char* notNull) override {
+      ConvertColumnReader::next(rowBatch, numValues, notNull);
+
+      const auto& srcBatch = *SafeCastBatchTo<const FileTypeBatch*>(data.get());
+      auto& dstBatch = *SafeCastBatchTo<ReadTypeBatch*>(&rowBatch);
+      for (uint64_t i = 0; i < numValues; ++i) {
+        if (!rowBatch.hasNulls || rowBatch.notNull[i]) {
+          if constexpr (std::is_floating_point_v<ReadType>) {
+            convertDecimalToDouble(dstBatch, i, srcBatch);
+          } else {
+            convertDecimalToInteger(dstBatch, i, srcBatch);
+          }
+        }
+      }
+    }
+
+   private:
+    void convertDecimalToInteger(ReadTypeBatch& dstBatch, uint64_t idx,
+                                 const FileTypeBatch& srcBatch) {
+      using FileType = decltype(srcBatch.values[idx]);
+      Int128 result = scaleDownInt128ByPowerOfTen(srcBatch.values[idx], scale);
+      int64_t longValue = result.toLong();
+      if (!result.fitsInLong()) {
+        handleOverflow<FileType, ReadType>(dstBatch, idx, throwOnOverflow);
+        return;
+      }
+      convertNumericElement<ReadType, int64_t>(longValue, dstBatch.data[idx], dstBatch, idx,
+                                               throwOnOverflow);
+    }
+
+    void convertDecimalToDouble(ReadTypeBatch& dstBatch, uint64_t idx,
+                                const FileTypeBatch& srcBatch) {
+      double doubleValue = 0;
+      Int128 i128 = srcBatch.values[idx];
+      doubleValue = i128.toDouble();

Review Comment:
   ```suggestion
         double doubleValue = srcBatch.values[idx].toDouble();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal [orc]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.

wgtmac commented on code in PR #1629:
URL: https://github.com/apache/orc/pull/1629#discussion_r1349813237


##########
c++/src/Int128.cc:
##########
@@ -435,6 +435,13 @@ namespace orc {
     return buf.str();
   }
 
+  double Int128::toDouble() const {
+    if (fitsInLong()) {
+      return static_cast<double>(toLong());
+    }
+    return static_cast<double>(lowbits) + std::ldexp(static_cast<double>(highbits), 64);

Review Comment:
   It is OK to lose precision as IEEE754 is unable to store all integers as is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal [orc]

Posted by "ffacs (via GitHub)" <gi...@apache.org>.

ffacs commented on code in PR #1629:
URL: https://github.com/apache/orc/pull/1629#discussion_r1349663649


##########
c++/src/Int128.cc:
##########
@@ -435,6 +435,13 @@ namespace orc {
     return buf.str();
   }
 
+  double Int128::toDouble() const {
+    if (fitsInLong()) {
+      return static_cast<double>(toLong());
+    }
+    return static_cast<double>(lowbits) + std::ldexp(static_cast<double>(highbits), 64);

Review Comment:
   > 
   I didn't check the precision because I want Int128::toDouble() to act like 
   static_csat<double>(__int128_t). Do you think is that OK?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal [orc]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.

wgtmac closed pull request #1629: ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal
URL: https://github.com/apache/orc/pull/1629


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] ORC-1387: [C++] Support schema evolution from decimal to numeric/decimal [orc]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.

wgtmac commented on code in PR #1629:
URL: https://github.com/apache/orc/pull/1629#discussion_r1349704267


##########
c++/src/Int128.cc:
##########
@@ -435,6 +435,13 @@ namespace orc {
     return buf.str();
   }
 
+  double Int128::toDouble() const {
+    if (fitsInLong()) {
+      return static_cast<double>(toLong());
+    }
+    return static_cast<double>(lowbits) + std::ldexp(static_cast<double>(highbits), 64);

Review Comment:
   Does the java code do the same thing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org