You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/26 06:23:09 UTC

[GitHub] [arrow] kiszk opened a new pull request #8056: ARROW-9861: [java] Support big-endian in DecimalVector

kiszk opened a new pull request #8056:
URL: https://github.com/apache/arrow/pull/8056


   This PR fixes failures in DecimalVectorTest on a big-endian platform


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-721897680


   merged to master, thanks @kiszk !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-722570942


   I did not see anything that looks like it would affect performance here, but I agree we should get some benchmarks going to be sure. I will look at you other PR next.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-721893960


   The test error with Java JNI looks unrelated and seems to be an env issue with ORC, I'll go ahead with merging this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-717661326


   @kiszk you'll probably need to make some fixes for Decimal256 as well, I think.
   
   @BryanCutler you mentioned you could devote some time to reviewing Big Endian changes?  Would you mind taking a look through this one and  @kiszk other Java changes?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kiszk commented on a change in pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#discussion_r515768974



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/util/DecimalUtility.java
##########
@@ -131,9 +135,14 @@ public static void writeBigDecimalToArrowBuf(BigDecimal value, ArrowBuf bytebuf,
    */
   public static void writeLongToArrowBuf(long value, ArrowBuf bytebuf, int index) {
     final long addressOfValue = bytebuf.memoryAddress() + (long) index * DECIMAL_BYTE_LENGTH;
-    PlatformDependent.putLong(addressOfValue, value);
     final long padValue = Long.signum(value) == -1 ? -1L : 0L;
-    PlatformDependent.putLong(addressOfValue + Long.BYTES, padValue);
+    if (LITTLE_ENDIAN) {
+      PlatformDependent.putLong(addressOfValue, value);
+      PlatformDependent.putLong(addressOfValue + Long.BYTES, padValue);

Review comment:
       This is used for `DecialVector` that is a fixed-width (16-byte) vector. This routine extends the signed bit to a new long. I will write a document in the comment.

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/DecimalVector.java
##########
@@ -218,25 +220,37 @@ public void setBigEndian(int index, byte[] value) {
     valueBuffer.checkBytes((long) index * TYPE_WIDTH, (long) (index + 1) * TYPE_WIDTH);
 
     long outAddress = valueBuffer.memoryAddress() + (long) index * TYPE_WIDTH;
-    // swap bytes to convert BE to LE
-    for (int byteIdx = 0; byteIdx < length; ++byteIdx) {
-      PlatformDependent.putByte(outAddress + byteIdx, value[length - 1 - byteIdx]);
-    }
-
-    if (length == TYPE_WIDTH) {
-      return;
-    }
-
     if (length == 0) {
       PlatformDependent.setMemory(outAddress, DecimalVector.TYPE_WIDTH, (byte) 0);
-    } else if (length < TYPE_WIDTH) {
-      // sign extend
-      final byte pad = (byte) (value[0] < 0 ? 0xFF : 0x00);
-      PlatformDependent.setMemory(outAddress + length, DecimalVector.TYPE_WIDTH - length, pad);
+      return;
+    }
+    if (LITTLE_ENDIAN) {
+      // swap bytes to convert BE to LE
+      for (int byteIdx = 0; byteIdx < length; ++byteIdx) {
+        PlatformDependent.putByte(outAddress + byteIdx, value[length - 1 - byteIdx]);
+      }
+
+      if (length == TYPE_WIDTH) {
+        return;
+      }
+
+      if (length < TYPE_WIDTH) {
+        // sign extend
+        final byte pad = (byte) (value[0] < 0 ? 0xFF : 0x00);
+        PlatformDependent.setMemory(outAddress + length, DecimalVector.TYPE_WIDTH - length, pad);
+        return;
+      }
     } else {
-      throw new IllegalArgumentException(
-          "Invalid decimal value length. Valid length in [1 - 16], got " + length);
+      if (length <= TYPE_WIDTH) {

Review comment:
       If we add `if (length == TYPE_WIDTH) return` before line 244, no data is copied from `value` to `outAddress`. I will add a comment `copy data from value to outAddress` after line 244.

##########
File path: java/vector/src/test/java/org/apache/arrow/vector/util/DecimalUtilityTest.java
##########
@@ -32,6 +32,31 @@
   private static final BigInteger[] MIN_BIG_INT = new BigInteger[]{MAX_BIG_INT[0].multiply(BigInteger.valueOf(-1)),
      MAX_BIG_INT[1].multiply(BigInteger.valueOf(-1))};
 
+  @Test
+  public void testSetLongInDecimalArrowBuf() {
+    try (BufferAllocator allocator = new RootAllocator(128);
+         ArrowBuf buf = allocator.buffer(16);
+    ) {
+      int [] intValues = new int [] {Integer.MAX_VALUE, Integer.MIN_VALUE, 0};
+      for (int val : intValues) {
+        buf.clear();
+        DecimalUtility.writeLongToArrowBuf((long) val, buf, 0);
+        BigDecimal actual = DecimalUtility.getBigDecimalFromArrowBuf(buf, 0, 0);
+        BigDecimal expected = BigDecimal.valueOf(val);
+        Assert.assertEquals(expected, actual);
+      }
+
+      long [] longValues = new long[] {Long.MIN_VALUE, 0 , Long.MAX_VALUE};

Review comment:
       Sure, I will drop lines 49-57.

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/util/DecimalUtility.java
##########
@@ -131,9 +135,14 @@ public static void writeBigDecimalToArrowBuf(BigDecimal value, ArrowBuf bytebuf,
    */
   public static void writeLongToArrowBuf(long value, ArrowBuf bytebuf, int index) {
     final long addressOfValue = bytebuf.memoryAddress() + (long) index * DECIMAL_BYTE_LENGTH;
-    PlatformDependent.putLong(addressOfValue, value);
     final long padValue = Long.signum(value) == -1 ? -1L : 0L;
-    PlatformDependent.putLong(addressOfValue + Long.BYTES, padValue);
+    if (LITTLE_ENDIAN) {
+      PlatformDependent.putLong(addressOfValue, value);
+      PlatformDependent.putLong(addressOfValue + Long.BYTES, padValue);

Review comment:
       In addition, this PR will support to write it to a 256-bit entry for the future.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on a change in pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on a change in pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#discussion_r516276482



##########
File path: java/vector/src/main/java/org/apache/arrow/vector/util/DecimalUtility.java
##########
@@ -128,12 +131,26 @@ public static void writeBigDecimalToArrowBuf(BigDecimal value, ArrowBuf bytebuf,
 
   /**
    * Write the given long to the ArrowBuf at the given value index.
+   * This routine extends the original sign bit to a new upper area in 128-bit or 256-bit.
    */
-  public static void writeLongToArrowBuf(long value, ArrowBuf bytebuf, int index) {
-    final long addressOfValue = bytebuf.memoryAddress() + (long) index * DECIMAL_BYTE_LENGTH;
-    PlatformDependent.putLong(addressOfValue, value);
+  public static void writeLongToArrowBuf(long value, ArrowBuf bytebuf, int index, int byteWidth) {
+    if (byteWidth != 16 && byteWidth != 32) {
+      throw new UnsupportedOperationException("DeciimalUtility.writeLongToArrowBuf() currently supports " +

Review comment:
       typo: `DeciimalUtility` -> `DecimalUtility`

##########
File path: java/vector/src/test/java/org/apache/arrow/vector/complex/impl/TestComplexCopier.java
##########
@@ -526,7 +526,7 @@ public void testCopyFixedSizedListOfDecimalsVector() {
       to.addOrGetVector(FieldType.nullable(new ArrowType.Decimal(3, 0, 128)));
 
       DecimalHolder holder = new DecimalHolder();
-      holder.buffer = allocator.buffer(DecimalUtility.DECIMAL_BYTE_LENGTH);
+      holder.buffer = allocator.buffer(16);

Review comment:
       Could you use `DecimalVector.TYPE_WIDTH` here?

##########
File path: java/vector/src/test/java/org/apache/arrow/vector/complex/writer/TestComplexWriter.java
##########
@@ -310,7 +310,7 @@ public void listDecimalType() {
       listVector.allocateNew();
       UnionListWriter listWriter = new UnionListWriter(listVector);
       DecimalHolder holder = new DecimalHolder();
-      holder.buffer = allocator.buffer(DecimalUtility.DECIMAL_BYTE_LENGTH);
+      holder.buffer = allocator.buffer(16);

Review comment:
       Same here

##########
File path: java/vector/src/main/java/org/apache/arrow/vector/DecimalVector.java
##########
@@ -218,25 +220,37 @@ public void setBigEndian(int index, byte[] value) {
     valueBuffer.checkBytes((long) index * TYPE_WIDTH, (long) (index + 1) * TYPE_WIDTH);
 
     long outAddress = valueBuffer.memoryAddress() + (long) index * TYPE_WIDTH;
-    // swap bytes to convert BE to LE
-    for (int byteIdx = 0; byteIdx < length; ++byteIdx) {
-      PlatformDependent.putByte(outAddress + byteIdx, value[length - 1 - byteIdx]);
-    }
-
-    if (length == TYPE_WIDTH) {
-      return;
-    }
-
     if (length == 0) {
       PlatformDependent.setMemory(outAddress, DecimalVector.TYPE_WIDTH, (byte) 0);
-    } else if (length < TYPE_WIDTH) {
-      // sign extend
-      final byte pad = (byte) (value[0] < 0 ? 0xFF : 0x00);
-      PlatformDependent.setMemory(outAddress + length, DecimalVector.TYPE_WIDTH - length, pad);
+      return;
+    }
+    if (LITTLE_ENDIAN) {
+      // swap bytes to convert BE to LE
+      for (int byteIdx = 0; byteIdx < length; ++byteIdx) {
+        PlatformDependent.putByte(outAddress + byteIdx, value[length - 1 - byteIdx]);
+      }
+
+      if (length == TYPE_WIDTH) {
+        return;
+      }
+
+      if (length < TYPE_WIDTH) {
+        // sign extend
+        final byte pad = (byte) (value[0] < 0 ? 0xFF : 0x00);
+        PlatformDependent.setMemory(outAddress + length, DecimalVector.TYPE_WIDTH - length, pad);
+        return;
+      }
     } else {
-      throw new IllegalArgumentException(
-          "Invalid decimal value length. Valid length in [1 - 16], got " + length);
+      if (length <= TYPE_WIDTH) {

Review comment:
       ok, I see above it is already set during the swap. I guess there is no harm in calling `PlatformDependent.setMemory(outAddress, DecimalVector.TYPE_WIDTH - length, pad)` if `length == TYPE_WIDTH`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-680684161


   https://issues.apache.org/jira/browse/ARROW-9861


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kiszk commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
kiszk commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-720369676


   I will update the Decimal256Vector class late today or tomorrow.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kiszk commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
kiszk commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-722146327


   @BryanCutler Thank you. One comment.   
   According to [the document](https://github.com/apache/arrow/blob/master/docs/source/developers/contributing.rst#endianness), benchmarking is necessary before merging features (I think that the test code does not affect the performance)
   > Benchmarks for performance critical parts of the code to demonstrate no regression.
   
   I am working for benchmark bot for Java [here](https://github.com/apache/arrow/pull/8210). It would be good to merge new features after this bot will be available.
   
   cc @emkornfield 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-717671086


   Sure, I can take a look. It might a day or two before I can though @kiszk .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler closed pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
BryanCutler closed pull request #8056:
URL: https://github.com/apache/arrow/pull/8056


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kiszk commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector

Posted by GitBox <gi...@apache.org>.
kiszk commented on pull request #8056:
URL: https://github.com/apache/arrow/pull/8056#issuecomment-717679501


   @emkornfield Sure, I will work for supporting Decimal256.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org