You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/27 10:48:15 UTC

[GitHub] [spark] mundaym opened a new pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

mundaym opened a new pull request #29259:
URL: https://github.com/apache/spark/pull/29259


   ### What changes were proposed in this pull request?
   PR #26548 means that RecordBinaryComparator now uses big endian
   byte order for long comparisons. However, this means that some of
   the constants in the regression tests no longer map to the same
   values in the comparison that they used to.
   
   For example, one of the tests does a comparison between
   Long.MIN_VALUE and 1 in order to trigger an overflow condition that
   existed in the past (i.e. Long.MIN_VALUE - 1). These constants
   correspond to the values 0x80..00 and 0x00..01. However on a
   little-endian machine the bytes in these values are now swapped
   before they are compared. This means that we will now be comparing
   0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow
   therefore missing the original purpose of the test.
   
   To fix this the constants are now explicitly written out in big
   endian byte order to match the byte order used in the comparison.
   This also fixes the tests on big endian machines (which would
   otherwise get a different comparison result to the little-endian
   machines).
   
   ### Why are the changes needed?
   The regression tests no longer serve their initial purposes and also fail on big-endian systems.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Tests run on big-endian system (s390x).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669138276


   **[Test build #127095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127095/testReport)** for PR 29259 at commit [`5d29560`](https://github.com/apache/spark/commit/5d2956015cbe2d0f73650fd2f1f369eaebbee114).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r465779254



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,74 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    // Place the following bytes (hex) into UnsafeRows for the comparison:
+    //
+    //   index | 00 01 02 03 04 05 06 07
+    //   ------+------------------------
+    //   row1  | 00 00 00 00 00 00 00 0b
+    //   row2  | 00 00 00 00 80 00 00 0a
+    //
+    // The byte layout needs to be identical on all platforms regardless of
+    // of endianness. To achieve this the bytes in each value are reversed
+    // on little-endian platforms.
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {

Review comment:
       My worry is that: this changes what we were testing before. When we wrote the test, we assume the test is run in little-endian machines. Now we reverse the bytes for little-endian machines, and as a result the testing result also changes (`result > 0` -> `result < 0`).
   
   I'm not very sure about the intention of these 2 tests. Maybe they were wrong at the very beginning and we should reverse the bytes for testing. I'll leave it for others to decide.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r461302879



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
+      row1Data = Long.reverseBytes(row1Data);
+      row2Data = Long.reverseBytes(row2Data);
+    }
+
     UnsafeRow row1 = new UnsafeRow(numFields);
     byte[] data1 = new byte[100];
     row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-    row1.setLong(0, 11);
+    row1.setLong(0, row1Data);
 
     UnsafeRow row2 = new UnsafeRow(numFields);
     byte[] data2 = new byte[100];
     row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-    row2.setLong(0, 11L + Integer.MAX_VALUE);
+    row2.setLong(0, row2Data);
 
     insertRow(row1);
     insertRow(row2);
 
-    Assert.assertTrue(compare(0, 1) > 0);
+    Assert.assertTrue(compare(0, 1) < 0);
   }
 
   @Test
   public void testBinaryComparatorWhenSubtractionCanOverflowLongValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = Long.MIN_VALUE;
+    long row2Data = 1;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {

Review comment:
       > For example, one of the tests does a comparison between
   Long.MIN_VALUE and 1 in order to trigger an overflow condition that
   existed in the past (i.e. Long.MIN_VALUE - 1). These constants
   correspond to the values 0x80..00 and 0x00..01. However on a
   little-endian machine the bytes in these values are now swapped
   before they are compared. This means that we will now be comparing
   0x00..80 with 0x01..00. 0x00..80 - 0x01..00 does not overflow
   therefore missing the original purpose of the test.
   
   I'm a bit confused by the PR description; I checked the original PR that added this test case and it seems like the overflow in the test title comes from the old code: https://github.com/apache/spark/pull/22101/files#diff-4ec35a60ad6a3f3f60f4d5ce91f59933L61-L63 To keep the original intention, why do you think we need to update the existing test in little-endian cases?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r465003006



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
+      row1Data = Long.reverseBytes(row1Data);
+      row2Data = Long.reverseBytes(row2Data);
+    }
+
     UnsafeRow row1 = new UnsafeRow(numFields);
     byte[] data1 = new byte[100];
     row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-    row1.setLong(0, 11);
+    row1.setLong(0, row1Data);
 
     UnsafeRow row2 = new UnsafeRow(numFields);
     byte[] data2 = new byte[100];
     row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-    row2.setLong(0, 11L + Integer.MAX_VALUE);
+    row2.setLong(0, row2Data);
 
     insertRow(row1);
     insertRow(row2);
 
-    Assert.assertTrue(compare(0, 1) > 0);
+    Assert.assertTrue(compare(0, 1) < 0);

Review comment:
       > then you are running a different test on little- vs big-endian
   
   Yea it's true. But it's also true that this is what happens in the reality: the comparator compares bytes from left to right, no matter it's little- or big-endian.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-665700938


   Maybe just leave it as is and write a comment about what the resulting byte order is, for readers


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mundaym commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
mundaym commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-665598880


   Does anyone know if there is a clean(er) way to copy an 8 byte array into a single element in an UnsafeRow? I'm trying to avoid the byte reversal but as far as I can tell the interface requires me to go through `setLong` to set an 8 byte value which means I need to take into account the endianness of the platform to get a particular byte layout in memory (i.e. use reverseBytes or go through a ByteBuffer). The other option is to modify the byte array that backs the UnsafeRow directly but I don't think that would necessarily be robust against any future UnsafeRow changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r465840296



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,74 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    // Place the following bytes (hex) into UnsafeRows for the comparison:
+    //
+    //   index | 00 01 02 03 04 05 06 07
+    //   ------+------------------------
+    //   row1  | 00 00 00 00 00 00 00 0b
+    //   row2  | 00 00 00 00 80 00 00 0a
+    //
+    // The byte layout needs to be identical on all platforms regardless of
+    // of endianness. To achieve this the bytes in each value are reversed
+    // on little-endian platforms.
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {

Review comment:
       ah I see, LGTM




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r461114124



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.

Review comment:
       Would you like to add few more words? It is a bit hard to reason about this with this only sentence.
   
   E.g. "... In little-endian sysyem, this bytes comparison does not overflow. We swap it to make sure overflow happened."

##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.

Review comment:
       I think the overflow only happened when comparing longs? @mundaym




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669264009






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669262903


   **[Test build #127095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127095/testReport)** for PR 29259 at commit [`5d29560`](https://github.com/apache/spark/commit/5d2956015cbe2d0f73650fd2f1f369eaebbee114).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669285985


   thanks, merging to master!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669138797






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669138797






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-664584038






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-664313746


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mundaym commented on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
mundaym commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669137810


   Thanks everyone. Sorry for the delay. I've updated the PR commit description and the comments in the tests. I've also opened and linked against a new JIRA issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r461119322



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.

Review comment:
       I don't even think that's quite true. The comparison isn't endian at all  as it is byte-by-byte. But the point here is to write bytes in a certain order for the test, for sure.

##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.

Review comment:
       I don't think overflow was the issue per se; signed vs unsigned bytes were, for sure, in the original issue. But not so much here in this test case.

##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
+      row1Data = Long.reverseBytes(row1Data);
+      row2Data = Long.reverseBytes(row2Data);
+    }
+
     UnsafeRow row1 = new UnsafeRow(numFields);
     byte[] data1 = new byte[100];
     row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-    row1.setLong(0, 11);
+    row1.setLong(0, row1Data);
 
     UnsafeRow row2 = new UnsafeRow(numFields);
     byte[] data2 = new byte[100];
     row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-    row2.setLong(0, 11L + Integer.MAX_VALUE);
+    row2.setLong(0, row2Data);
 
     insertRow(row1);
     insertRow(row2);
 
-    Assert.assertTrue(compare(0, 1) > 0);
+    Assert.assertTrue(compare(0, 1) < 0);
   }
 
   @Test
   public void testBinaryComparatorWhenSubtractionCanOverflowLongValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = Long.MIN_VALUE;
+    long row2Data = 1;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {

Review comment:
       If you mean, this is kind of a different issue -- yes. Should be a new JIRA. I'd summarize this as: the bytes that this test sets up and asserts about are different on big-endian. It creates the wrong test.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #29259:
URL: https://github.com/apache/spark/pull/29259


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-664585916


   @viirya but this test asserts a particular ordering -- for test purposes, sure. The ordering won't be consistent, so the test fails based on arch. Because this is a special case, asserting the ordering, the endian-ness has to be fixed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r465007841



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
+      row1Data = Long.reverseBytes(row1Data);
+      row2Data = Long.reverseBytes(row2Data);
+    }
+
     UnsafeRow row1 = new UnsafeRow(numFields);
     byte[] data1 = new byte[100];
     row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-    row1.setLong(0, 11);
+    row1.setLong(0, row1Data);
 
     UnsafeRow row2 = new UnsafeRow(numFields);
     byte[] data2 = new byte[100];
     row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-    row2.setLong(0, 11L + Integer.MAX_VALUE);
+    row2.setLong(0, row2Data);
 
     insertRow(row1);
     insertRow(row2);
 
-    Assert.assertTrue(compare(0, 1) > 0);
+    Assert.assertTrue(compare(0, 1) < 0);

Review comment:
       That's right. There is no endian-ness issue in the logic being tested. There is endian-ness in how this test constructs the bytes to compare, because it writes a long into memory. I doubt this test's purpose holds _both_ ways? It "works" but either little- or big-endian machines aren't running the test as intended. It feels like there's one intended test to run here, not either of two.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-664314873


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669264009






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-664313746


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r461797707



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
+      row1Data = Long.reverseBytes(row1Data);
+      row2Data = Long.reverseBytes(row2Data);
+    }
+
     UnsafeRow row1 = new UnsafeRow(numFields);
     byte[] data1 = new byte[100];
     row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-    row1.setLong(0, 11);
+    row1.setLong(0, row1Data);
 
     UnsafeRow row2 = new UnsafeRow(numFields);
     byte[] data2 = new byte[100];
     row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-    row2.setLong(0, 11L + Integer.MAX_VALUE);
+    row2.setLong(0, row2Data);
 
     insertRow(row1);
     insertRow(row2);
 
-    Assert.assertTrue(compare(0, 1) > 0);
+    Assert.assertTrue(compare(0, 1) < 0);
   }
 
   @Test
   public void testBinaryComparatorWhenSubtractionCanOverflowLongValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = Long.MIN_VALUE;
+    long row2Data = 1;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {

Review comment:
       IIUC, the intention of this test is to perform the comparison at `compare()` regardless endianness.   
    
   For people, would it be good to add byte pattern to be tested.
   ```
   80 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 01
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r464995489



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
+      row1Data = Long.reverseBytes(row1Data);
+      row2Data = Long.reverseBytes(row2Data);
+    }
+
     UnsafeRow row1 = new UnsafeRow(numFields);
     byte[] data1 = new byte[100];
     row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-    row1.setLong(0, 11);
+    row1.setLong(0, row1Data);
 
     UnsafeRow row2 = new UnsafeRow(numFields);
     byte[] data2 = new byte[100];
     row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-    row2.setLong(0, 11L + Integer.MAX_VALUE);
+    row2.setLong(0, row2Data);
 
     insertRow(row1);
     insertRow(row2);
 
-    Assert.assertTrue(compare(0, 1) > 0);
+    Assert.assertTrue(compare(0, 1) < 0);

Review comment:
       That would also work here. I think it's a little less ideal, because then you are running a different test on little- vs big-endian (and those are the correct answers to the two different tests). I think only one of those tests was the intended one AFAICT, and I buy the argument that the proposed change _restores_ it as well as addresses endianness.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r465800031



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,74 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    // Place the following bytes (hex) into UnsafeRows for the comparison:
+    //
+    //   index | 00 01 02 03 04 05 06 07
+    //   ------+------------------------
+    //   row1  | 00 00 00 00 00 00 00 0b
+    //   row2  | 00 00 00 00 80 00 00 0a
+    //
+    // The byte layout needs to be identical on all platforms regardless of
+    // of endianness. To achieve this the bytes in each value are reversed
+    // on little-endian platforms.
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {

Review comment:
       That is perhaps the only remaining issue. We should fix the test that this sets up of course, but, which was the intended test? I'm also not 100% sure, but @mundaym had a reasonable argument at https://github.com/apache/spark/pull/29259#issuecomment-664480885




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r464816729



##########
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##########
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;
 
+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
+      row1Data = Long.reverseBytes(row1Data);
+      row2Data = Long.reverseBytes(row2Data);
+    }
+
     UnsafeRow row1 = new UnsafeRow(numFields);
     byte[] data1 = new byte[100];
     row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-    row1.setLong(0, 11);
+    row1.setLong(0, row1Data);
 
     UnsafeRow row2 = new UnsafeRow(numFields);
     byte[] data2 = new byte[100];
     row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-    row2.setLong(0, 11L + Integer.MAX_VALUE);
+    row2.setLong(0, row2Data);
 
     insertRow(row1);
     insertRow(row2);
 
-    Assert.assertTrue(compare(0, 1) > 0);
+    Assert.assertTrue(compare(0, 1) < 0);

Review comment:
       My understanding is: the comparator compares bytes from left to right. It doesn't assume the byte ordering of the data. It's expected that different byte order leads to different comparison results.
   
   I think a simple way to fix the test is:
   ```
   if (LITTLE_ENDIAN) {
     Assert.assertTrue(compare(0, 1) < 0);
   } else {
     Assert.assertTrue(compare(0, 1) > 0);
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-667187750


   I think we can move forward on this if we just change the comments, to show the desired bytes that are being written. Functionally this seems OK.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29259: [SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-669138276


   **[Test build #127095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127095/testReport)** for PR 29259 at commit [`5d29560`](https://github.com/apache/spark/commit/5d2956015cbe2d0f73650fd2f1f369eaebbee114).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org