Posted to dev@parquet.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/09/18 11:39:00 UTC

[jira] [Commented] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again

    [ https://issues.apache.org/jira/browse/PARQUET-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618979#comment-16618979 ] 

ASF GitHub Bot commented on PARQUET-1353:
-----------------------------------------

zivanfi closed pull request #504: PARQUET-1353: Fix random data generator.
URL: https://github.com/apache/parquet-mr/pull/504

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/parquet-hadoop/src/test/java/org/apache/parquet/statistics/RandomValues.java b/parquet-hadoop/src/test/java/org/apache/parquet/statistics/RandomValues.java
index 16db5cbf0..a3f41e924 100644
--- a/parquet-hadoop/src/test/java/org/apache/parquet/statistics/RandomValues.java
+++ b/parquet-hadoop/src/test/java/org/apache/parquet/statistics/RandomValues.java
@@ -84,19 +84,18 @@ public String randomFixedLengthString(int length) {
 
   static abstract class RandomBinaryBase<T extends Comparable<T>> extends RandomValueGenerator<T> {
     protected final int bufferLength;
-    protected final byte[] buffer;
 
     public RandomBinaryBase(long seed, int bufferLength) {
       super(seed);
 
       this.bufferLength = bufferLength;
-      this.buffer = new byte[bufferLength];
     }
 
     public abstract Binary nextBinaryValue();
 
     public Binary asReusedBinary(byte[] data) {
       int length = Math.min(data.length, bufferLength);
+      byte[] buffer = new byte[length];
       System.arraycopy(data, 0, buffer, 0, length);
       return Binary.fromReusedByteArray(data, 0, length);
     }
@@ -287,7 +286,8 @@ public BinaryGenerator(long seed) {
     @Override
     public Binary nextValue() {
       // use a random length, but ensure it is at least a few bytes
-      int length = 5 + randomPositiveInt(buffer.length - 5);
+      int length = 5 + randomPositiveInt(bufferLength - 5);
+      byte[] buffer = new byte[length];
       for (int index = 0; index < length; index++) {
         buffer[index] = (byte) randomInt();
       }
@@ -308,6 +308,7 @@ public FixedGenerator(long seed, int length) {
 
     @Override
     public Binary nextValue() {
+      byte[] buffer = new byte[bufferLength];
       for (int index = 0; index < buffer.length; index++) {
         buffer[index] = (byte) randomInt();
       }

> The random data generator used for tests repeats the same value over and over again
> -----------------------------------------------------------------------------------
>
>                 Key: PARQUET-1353
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1353
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Zoltan Ivanfi
>            Assignee: Zoltan Ivanfi
>            Priority: Minor
>              Labels: pull-request-available
>
> The RandomValues class returns references to its internal buffer as random values. This buffer is overwritten with a new random value every time a new value is requested, and since earlier values reference the same buffer, they change to that value as well. So even though successive calls each return a different value, a list that accumulates them always consists of a single value repeated multiple times. For example:
> ||n-th call||returned value||expected accumulated list||actual accumulated list||
> |1|6C|6C|6C|
> |2|8F|6C 8F|8F 8F|
> |3|52|6C 8F 52|52 52 52|
> |4|B8|6C 8F 52 B8|B8 B8 B8 B8|
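To make the aliasing concrete, here is a minimal, self-contained Java sketch. It does not use the actual RandomValues class or Parquet's Binary type: SharedBufferGenerator, FreshBufferGenerator, collect and distinctValues are hypothetical names, and java.util.Random stands in for the real random source. The first generator reuses one array, as RandomValues did before the fix; the second allocates a new array on every call, which is the approach taken in pull request #504.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

public class BufferAliasingDemo {

    // Hypothetical stand-in for the pre-fix behaviour: one buffer, reused on every call.
    static class SharedBufferGenerator implements Supplier<byte[]> {
        private final Random random;
        private final byte[] buffer;

        SharedBufferGenerator(long seed, int length) {
            this.random = new Random(seed);
            this.buffer = new byte[length];
        }

        @Override
        public byte[] get() {
            random.nextBytes(buffer); // overwrites the array every earlier caller still holds
            return buffer;            // hands out a reference, not a copy
        }
    }

    // Hypothetical stand-in for the fixed behaviour: a new array per call.
    static class FreshBufferGenerator implements Supplier<byte[]> {
        private final Random random;
        private final int length;

        FreshBufferGenerator(long seed, int length) {
            this.random = new Random(seed);
            this.length = length;
        }

        @Override
        public byte[] get() {
            byte[] buffer = new byte[length]; // earlier results keep their own arrays
            random.nextBytes(buffer);
            return buffer;
        }
    }

    // Calls the generator `count` times and keeps every returned array, like a test accumulating values.
    static List<byte[]> collect(Supplier<byte[]> generator, int count) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            values.add(generator.get());
        }
        return values;
    }

    // Number of distinct byte sequences in the list, compared by content rather than by reference.
    static int distinctValues(List<byte[]> values) {
        Set<String> seen = new HashSet<>();
        for (byte[] value : values) {
            seen.add(Arrays.toString(value));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<byte[]> shared = collect(new SharedBufferGenerator(42L, 4), 4);
        List<byte[]> fresh = collect(new FreshBufferGenerator(42L, 4), 4);
        System.out.println("shared buffer: " + distinctValues(shared) + " distinct value(s)");
        System.out.println("fresh buffers: " + distinctValues(fresh) + " distinct value(s)");
    }
}
```

With the shared buffer, all four list entries are the same array object, so the list holds exactly one distinct value, mirroring the "actual accumulated list" column above; with fresh buffers each entry keeps the bytes it was generated with. Allocating per call produces a little garbage for every value, which is usually an acceptable price in a test data generator.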



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)