Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/01 16:35:17 UTC

[GitHub] [arrow] lwhite1 opened a new pull request, #14289: ARROW-17585: [Java] Update GenerateSampleData.java

lwhite1 opened a new pull request, #14289:
URL: https://github.com/apache/arrow/pull/14289

   Adds support for generating sample data for the four UInt FieldVectors
   (UInt1Vector, UInt2Vector, UInt4Vector, UInt8Vector).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lwhite1 commented on pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
lwhite1 commented on PR #14289:
URL: https://github.com/apache/arrow/pull/14289#issuecomment-1265698266

   I think it's supposed to be an aid to users who want to create test data or
   example data. I intend to use it for Table testing, not to check that the
   vectors work correctly, but to ensure that the Table methods wrapping them
   are hooked up right.
   
   It's been sitting there for five years. I didn't expect it to be
   controversial.
   
   
   On Mon, Oct 3, 2022 at 9:50 AM David Li ***@***.***> wrote:
   
   > Well as far as I see it's completely unused, hence my question
   




[GitHub] [arrow] github-actions[bot] commented on pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14289:
URL: https://github.com/apache/arrow/pull/14289#issuecomment-1264415100

   https://issues.apache.org/jira/browse/ARROW-17585




[GitHub] [arrow] lidavidm commented on a diff in pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #14289:
URL: https://github.com/apache/arrow/pull/14289#discussion_r985694473


##########
java/vector/src/main/java/org/apache/arrow/vector/GenerateSampleData.java:
##########
@@ -267,6 +275,58 @@ private static void writeTinyIntData(TinyIntVector vector, int valueCount) {
     vector.setValueCount(valueCount);
   }
 
+  private static void writeUInt1Data(UInt1Vector vector, int valueCount) {
+    final byte even = 1;
+    final byte odd = 2;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt2Data(UInt2Vector vector, int valueCount) {
+    final short even = 10;
+    final short odd = 20;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt4Data(UInt4Vector vector, int valueCount) {
+    final int even = 1000;
+    final int odd = 2000;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt8Data(UInt8Vector vector, int valueCount) {
+    final long even = 1000000000;
+    final long odd = 2000000000;

Review Comment:
   Is it possible to use a value that wouldn't be representable in the signed equivalent of the vectors? (Not sure how you write such a value though)
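One way to write such a value, sketched with JDK-only calls (Java 8+): Java has no unsigned `byte`, `short`, `int`, or `long`, but a cast or a hex literal can supply a bit pattern whose unsigned reading exceeds the signed maximum. The class and constant names below are hypothetical, and the sketch does not assume anything about how the Arrow UInt vectors store or return these values.

```java
public class UnsignedSampleValues {
  // Bit patterns whose unsigned reading is above the signed maximum of each width.
  // Read as signed, every one of these is -1.
  static final byte U8_MAX = (byte) 0xFF;            // unsigned 255
  static final short U16_MAX = (short) 0xFFFF;       // unsigned 65535
  static final int U32_MAX = 0xFFFFFFFF;             // unsigned 4294967295
  static final long U64_MAX = 0xFFFFFFFFFFFFFFFFL;   // unsigned 18446744073709551615

  public static void main(String[] args) {
    // The JDK helpers below reinterpret the same bits as unsigned values.
    System.out.println(Byte.toUnsignedInt(U8_MAX));
    System.out.println(Short.toUnsignedInt(U16_MAX));
    System.out.println(Integer.toUnsignedLong(U32_MAX));
    System.out.println(Long.toUnsignedString(U64_MAX));
  }
}
```

Values like these could replace the small `even`/`odd` constants in the diff if the sample data is meant to exercise the unsigned range.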



##########
java/vector/src/main/java/org/apache/arrow/vector/GenerateSampleData.java:
##########
@@ -267,6 +275,58 @@ private static void writeTinyIntData(TinyIntVector vector, int valueCount) {
     vector.setValueCount(valueCount);
   }
 
+  private static void writeUInt1Data(UInt1Vector vector, int valueCount) {
+    final byte even = 1;
+    final byte odd = 2;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt2Data(UInt2Vector vector, int valueCount) {
+    final short even = 10;
+    final short odd = 20;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt4Data(UInt4Vector vector, int valueCount) {
+    final int even = 1000;
+    final int odd = 2000;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt8Data(UInt8Vector vector, int valueCount) {
+    final long even = 1000000000;
+    final long odd = 2000000000;

Review Comment:
   Well, the value has to be representable somehow (else how would IPC work? I'm guessing you'd just take a negative int and it would be reinterpreted as an unsigned integer), and I'd want us to test a range of values that may occur. The C++ unit tests for kernels, for instance, try to test the extremes (0, negatives, etc.) to catch corner cases, and frameworks like Hypothesis generalize this.
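A rough sketch of what a set of "extremes" might look like for the 32-bit and 64-bit unsigned types, using only the JDK. The class and method names are hypothetical; the idea is simply that the signed minimum and -1 are the bit patterns for the first value above the signed range and for the unsigned maximum.

```java
public class UnsignedEdgeCases {
  /** uint32 edge cases as int bit patterns: 0, 1, 2^31 - 1, 2^31, 2^32 - 1. */
  static int[] uint32Edges() {
    return new int[] {0, 1, Integer.MAX_VALUE, Integer.MIN_VALUE, -1};
  }

  /** uint64 edge cases as long bit patterns: 0, 1, 2^63 - 1, 2^63, 2^64 - 1. */
  static long[] uint64Edges() {
    return new long[] {0L, 1L, Long.MAX_VALUE, Long.MIN_VALUE, -1L};
  }

  public static void main(String[] args) {
    // Print each bit pattern using its unsigned interpretation.
    for (int v : uint32Edges()) {
      System.out.println(Integer.toUnsignedString(v));
    }
    for (long v : uint64Edges()) {
      System.out.println(Long.toUnsignedString(v));
    }
  }
}
```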





[GitHub] [arrow] github-actions[bot] commented on pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14289:
URL: https://github.com/apache/arrow/pull/14289#issuecomment-1264415104

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.




[GitHub] [arrow] lidavidm commented on pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #14289:
URL: https://github.com/apache/arrow/pull/14289#issuecomment-1265746984

   Sorry, knowing that it was going to be used to test the table implementation was all I really wanted to understand in the first place. I'll let a different reviewer step in.




[GitHub] [arrow] lwhite1 commented on pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
lwhite1 commented on PR #14289:
URL: https://github.com/apache/arrow/pull/14289#issuecomment-1265376017

   I don't think the rest of the values generated by this class test corner
   cases. It's just filler data.
   
   On Mon, Oct 3, 2022 at 8:32 AM David Li ***@***.***> wrote:
   
   > Well, the value has to be representable somehow (else how would IPC work?
   > I'm guessing you'd just take a negative int and it would reinterpret it as
   > an unsigned integer) and I'd want us to test a range of values that may
   > occur (the C++ unit tests for kernels, for instance, try to test the
   > extremes, 0, negatives, etc. to catch corner cases, and frameworks like
   > Hypothesis generalize this)
   




[GitHub] [arrow] lwhite1 commented on pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
lwhite1 commented on PR #14289:
URL: https://github.com/apache/arrow/pull/14289#issuecomment-1265395144

   > Same question as above then: what's the intended purpose of this PR?
   
   This class generates _sample data_ for ValueVectors. It doesn't currently create sample data for the four UInt vector types; the purpose of this PR is to add support for those four types.




[GitHub] [arrow] lidavidm commented on pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #14289:
URL: https://github.com/apache/arrow/pull/14289#issuecomment-1265377617

   Same question as above then: what's the intended purpose of this PR?




[GitHub] [arrow] lidavidm commented on pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #14289:
URL: https://github.com/apache/arrow/pull/14289#issuecomment-1265470864

   Well, as far as I can see it's completely unused, hence my question.




[GitHub] [arrow] lwhite1 commented on a diff in pull request #14289: ARROW-17585: [Java] Update GenerateSampleData.java

Posted by GitBox <gi...@apache.org>.
lwhite1 commented on code in PR #14289:
URL: https://github.com/apache/arrow/pull/14289#discussion_r985726660


##########
java/vector/src/main/java/org/apache/arrow/vector/GenerateSampleData.java:
##########
@@ -267,6 +275,58 @@ private static void writeTinyIntData(TinyIntVector vector, int valueCount) {
     vector.setValueCount(valueCount);
   }
 
+  private static void writeUInt1Data(UInt1Vector vector, int valueCount) {
+    final byte even = 1;
+    final byte odd = 2;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt2Data(UInt2Vector vector, int valueCount) {
+    final short even = 10;
+    final short odd = 20;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt4Data(UInt4Vector vector, int valueCount) {
+    final int even = 1000;
+    final int odd = 2000;
+    for (int i = 0; i < valueCount; i++) {
+      if (i % 2 == 0) {
+        vector.setSafe(i, even);
+      } else {
+        vector.setSafe(i, odd);
+      }
+    }
+    vector.setValueCount(valueCount);
+  }
+
+  private static void writeUInt8Data(UInt8Vector vector, int valueCount) {
+    final long even = 1000000000;
+    final long odd = 2000000000;

Review Comment:
   The UInts are all represented as ints under the covers in Java (except UInt2, which may be a char). I don't know of any way to create a value like you're suggesting, and I think it would error if I did, so it couldn't be used for test data.
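For what it's worth, plain Java can hold such values without an error, at least at the language level. The sketch below uses only JDK calls (Java 8+) with hypothetical names; it says nothing about whether the Arrow vector setters accept or display these values as unsigned, which would need to be checked against the vector classes themselves.

```java
public class UnsignedRoundTrip {
  /** Parses a decimal uint64 string into the long that carries the same 64 bits. */
  static long parseUint64(String decimal) {
    return Long.parseUnsignedLong(decimal);
  }

  public static void main(String[] args) {
    // char is Java's only unsigned primitive: 16 bits, range 0..65535.
    char c = 0xFFFF;
    System.out.println((int) c);

    // 2^64 - 1 does not fit in a signed long, but its bit pattern does.
    long big = parseUint64("18446744073709551615");
    System.out.println(big);                               // signed view of the bits
    System.out.println(Long.toUnsignedString(big));        // unsigned view of the bits
    System.out.println(Long.compareUnsigned(big, 1L) > 0); // unsigned comparison
  }
}
```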


