Posted to dev@avro.apache.org by "Simon Klakegg (Jira)" <ji...@apache.org> on 2022/08/18 17:11:00 UTC

[jira] [Created] (AVRO-3611) org.apache.avro.util.RandomData generates invalid test data

Simon Klakegg created AVRO-3611:
-----------------------------------

             Summary: org.apache.avro.util.RandomData generates invalid test data
                 Key: AVRO-3611
                 URL: https://issues.apache.org/jira/browse/AVRO-3611
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.11.1
            Reporter: Simon Klakegg
             Fix For: 1.11.2
         Attachments: image-2022-08-18-19-05-37-323.png

When RandomData.java generates data, it does not take logical types into account. Logical types are described here: [Specification | Apache Avro|https://avro.apache.org/docs/1.11.1/specification/_print/]




For instance, the generate method returns this for INT fields:
{code:java}
    case INT:      return random.nextInt(); {code}
 

However, an int field could be of logical type date:
!image-2022-08-18-19-05-37-323.png|width=1052,height=266!

 

In many cases this produces an int that is out of range for the date logical type, which then breaks when creating records in, for instance, Kafka.
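
To illustrate the range problem, the sketch below uses only the JDK and interprets arbitrary int values as days since the Unix epoch, which is how the date logical type is encoded. The class name DateRangeSketch and the four-digit-year check are assumptions made for illustration; which values a downstream system actually rejects depends on that system.

```java
import java.time.LocalDate;
import java.util.Random;

public class DateRangeSketch {

  // Hypothetical check: assumes a consumer that only handles years 0001 to 9999.
  static boolean hasFourDigitYear(int daysSinceEpoch) {
    int year = LocalDate.ofEpochDay(daysSinceEpoch).getYear();
    return year >= 1 && year <= 9999;
  }

  // Counts how many of `samples` unrestricted ints map to a four-digit year.
  static int countFourDigitYears(long seed, int samples) {
    Random random = new Random(seed);
    int count = 0;
    for (int i = 0; i < samples; i++) {
      if (hasFourDigitYear(random.nextInt())) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Largest possible int interpreted as a date
    System.out.println(LocalDate.ofEpochDay(Integer.MAX_VALUE));
    System.out.println(countFourDigitYears(42L, 1000) + " of 1000 samples have a four-digit year");
  }
}
```

Only a few million of the roughly 4.29 billion possible int values fall inside the four-digit-year window, so an unrestricted random.nextInt() will almost always land outside it.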

My suggestion is to generate data that is valid for logical types. Here is an example I made for int and long:
{code:java}
case INT:
    switch (logicalTypeName) {
      case "date":
        // Random number of days between Unix Epoch start day (0) and end day (24855)
        int maxDaysInEpoch = (int) Duration.ofSeconds(Integer.MAX_VALUE).toDays();
        return ThreadLocalRandom.current().nextInt(0, maxDaysInEpoch);
      case "time-millis":
        // Random number of milliseconds between midnight 00:00:00.000 (0) and 23:59:59.999 (86399999)
        int millisecondsInDay = (int) Duration.ofDays(1).toMillis();
        // The bound is exclusive, so the largest value returned is 86399999
        return random.nextInt(millisecondsInDay);
      default: return random.nextInt();
    }
case LONG:
  switch (logicalTypeName) {
    case "time-micros":
      // Random number of microseconds between midnight 00:00:00.000000 (0) and 23:59:59.999999 (86399999999)
      long microsecondsInDay = TimeUnit.DAYS.toMicros(1);
      // The upper bound is exclusive, so the largest value returned is 86399999999
      return random.nextLong(0, microsecondsInDay);
    case "timestamp-millis":
      // Random milliseconds between Unix Epoch (0) start and end (2147483647000)
      long maxMillisecondsInEpoch = TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE);
      return ThreadLocalRandom.current().nextLong(0, maxMillisecondsInEpoch);
    case "timestamp-micros":
      // Random microseconds between Unix Epoch (0) start and end (2147483647000000)
      long maxMicrosecondsInEpoch = TimeUnit.SECONDS.toMicros(Integer.MAX_VALUE);
      return ThreadLocalRandom.current().nextLong(0, maxMicrosecondsInEpoch);
    case "local-timestamp-millis": {
      // Random number of milliseconds between the Unix Epoch (0) and 100 years from now
      ZonedDateTime hundredYearsFromNow = ZonedDateTime.now().plusYears(100);
      long hundredYearsEpochMillis = ChronoUnit.MILLIS.between(Instant.EPOCH, hundredYearsFromNow);
      return random.nextLong(0, hundredYearsEpochMillis);
    }
    case "local-timestamp-micros": {
      // Random number of microseconds between the Unix Epoch (0) and 100 years from now
      // Recomputed here: a local assigned in another case is not definitely assigned in this one
      ZonedDateTime hundredYearsFromNow = ZonedDateTime.now().plusYears(100);
      long hundredYearsEpochMicros = ChronoUnit.MICROS.between(Instant.EPOCH, hundredYearsFromNow);
      return random.nextLong(0, hundredYearsEpochMicros);
    }
    default: return random.nextLong();
  } {code}
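
The snippet above is a fragment of a larger switch. As a self-contained sketch of the same idea, the class below maps a logical type name to a bounded random value using a single seeded java.util.Random, so that a fixed seed keeps producing the same data. The class name LogicalTypeValues, its method names, and the use of Random.longs(1, origin, bound) for bounded longs are assumptions made for illustration and are not part of RandomData. The ranges follow the ones proposed above, except that the local-timestamp types reuse the timestamp range instead of "now plus 100 years" to keep the output deterministic.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class LogicalTypeValues {

  private final Random random;

  public LogicalTypeValues(long seed) {
    this.random = new Random(seed);
  }

  // Bounded long in [0, boundExclusive); Random.longs is available since Java 8.
  private long nextLongBelow(long boundExclusive) {
    return random.longs(1, 0, boundExclusive).findFirst().getAsLong();
  }

  // Value for an int-backed logical type; unknown or missing names get any int.
  public int nextInt(String logicalTypeName) {
    String name = logicalTypeName == null ? "" : logicalTypeName;
    switch (name) {
      case "date":
        // Days from the epoch up to the day containing Integer.MAX_VALUE seconds
        return random.nextInt((int) TimeUnit.SECONDS.toDays(Integer.MAX_VALUE) + 1);
      case "time-millis":
        // Milliseconds since midnight, 0 to 86399999
        return random.nextInt((int) TimeUnit.DAYS.toMillis(1));
      default:
        return random.nextInt();
    }
  }

  // Value for a long-backed logical type; unknown or missing names get any long.
  public long nextLong(String logicalTypeName) {
    String name = logicalTypeName == null ? "" : logicalTypeName;
    switch (name) {
      case "time-micros":
        // Microseconds since midnight, 0 to 86399999999
        return nextLongBelow(TimeUnit.DAYS.toMicros(1));
      case "timestamp-millis":
      case "local-timestamp-millis":
        return nextLongBelow(TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE));
      case "timestamp-micros":
      case "local-timestamp-micros":
        return nextLongBelow(TimeUnit.SECONDS.toMicros(Integer.MAX_VALUE));
      default:
        return random.nextLong();
    }
  }
}
```

Two instances created with the same seed and called in the same order return the same values, which mixing in ThreadLocalRandom would not guarantee.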

--
This message was sent by Atlassian Jira
(v8.20.10#820010)