Posted to dev@avro.apache.org by "Simon Klakegg (Jira)" <ji...@apache.org> on 2022/08/18 17:13:00 UTC

[jira] [Updated] (AVRO-3611) org.apache.avro.util.RandomData generates invalid test data

     [ https://issues.apache.org/jira/browse/AVRO-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Klakegg updated AVRO-3611:
--------------------------------
    Description: 
When RandomData.java generates data it does not check for Logical Types, which are described here: [Specification | Apache Avro|https://avro.apache.org/docs/1.11.1/specification/_print/]

*For instance, the generate method returns the following for INT fields:*
{code:java}
    case INT:      return random.nextInt(); {code}
 

*However, an int field could be of logical type date:*
!image-2022-08-18-19-05-37-323.png|width=1052,height=266!

In many cases this could produce an int that is out of range for the date logical type and thus break record creation in, for instance, Kafka.
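
To make the range problem concrete, here is a minimal sketch that reads an unconstrained int as a day count since 1970-01-01, which is how the date logical type is defined. The class and method names (DateRangeDemo, toDate, countFourDigitYears) are hypothetical and exist only for this illustration. Whether a given consumer rejects such values depends on that consumer, and the four-digit-year check is only an assumed stand-in for a typical downstream limit.

```java
import java.time.LocalDate;
import java.util.Random;

public class DateRangeDemo {
  /** Interprets an Avro date value (days since 1970-01-01) as a LocalDate. */
  public static LocalDate toDate(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch);
  }

  /** Counts how many of n unconstrained random ints land in years 1 to 9999. */
  public static int countFourDigitYears(long seed, int n) {
    Random random = new Random(seed);
    int count = 0;
    for (int i = 0; i < n; i++) {
      int year = toDate(random.nextInt()).getYear();
      if (year >= 1 && year <= 9999) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // The largest possible int, read as a day count, is millions of years away.
    System.out.println("Integer.MAX_VALUE as a date: " + toDate(Integer.MAX_VALUE));
    // Day 24855 is the day on which Integer.MAX_VALUE seconds since the epoch falls.
    System.out.println("Day 24855 as a date: " + toDate(24855));
    // Only a small fraction of all ints maps to a four-digit year.
    int hits = countFourDigitYears(42L, 1_000_000);
    System.out.println(hits + " of 1000000 random ints fall in years 1 to 9999");
  }
}
```

Years 1 to 9999 cover roughly 3.65 million days out of about 4.29 billion possible int values, so almost every value from random.nextInt() ends up far outside that window.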

My suggestion is to generate data that is valid for logical types. Here is an example I made for int and long:
{code:java}
// Bounded draws use ThreadLocalRandom; java.util.Random has no nextInt/nextLong(origin, bound) before Java 17
case INT:
  switch (logicalTypeName) {
    case "date":
      // Random number of days between Unix Epoch start day (0) and end day (24855)
      int maxDaysInEpoch = (int) Duration.ofSeconds(Integer.MAX_VALUE).toDays();
      return ThreadLocalRandom.current().nextInt(0, maxDaysInEpoch);
    case "time-millis":
      // Random number of milliseconds between midnight 00:00:00.000 (0) and 23:59:59.999 (86399999)
      int maxMillisecondsInDay = (int) Duration.ofDays(1).toMillis() - 1;
      return ThreadLocalRandom.current().nextInt(0, maxMillisecondsInDay);
    default: return random.nextInt();
  }
case LONG:
  switch (logicalTypeName) {
    case "time-micros":
      // Random number of microseconds between midnight 00:00:00.000000 (0) and 23:59:59.999999 (86399999999)
      long maxMicrosecondsInDay = (Duration.ofDays(1).toNanos() - 1) / 1000;
      return ThreadLocalRandom.current().nextLong(0, maxMicrosecondsInDay);
    case "timestamp-millis":
      // Random milliseconds between Unix Epoch (0) start and end (2147483647000)
      long maxMillisecondsInEpoch = TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE);
      return ThreadLocalRandom.current().nextLong(0, maxMillisecondsInEpoch);
    case "timestamp-micros":
      // Random microseconds between Unix Epoch (0) start and end (2147483647000000)
      long maxMicrosecondsInEpoch = TimeUnit.SECONDS.toMicros(Integer.MAX_VALUE);
      return ThreadLocalRandom.current().nextLong(0, maxMicrosecondsInEpoch);
    case "local-timestamp-millis":
      // Random number of milliseconds between Unix Epoch start (0) and 100 years from now (now() + 100)
      long hundredYearsEpochMillis = ChronoUnit.MILLIS.between(Instant.EPOCH, ZonedDateTime.now().plusYears(100));
      return ThreadLocalRandom.current().nextLong(0, hundredYearsEpochMillis);
    case "local-timestamp-micros":
      // Random number of microseconds between Unix Epoch start (0) and 100 years from now (now() + 100)
      // Computed here again: a local declared in another case label is not initialized when this case is entered directly
      long hundredYearsEpochMicros = ChronoUnit.MICROS.between(Instant.EPOCH, ZonedDateTime.now().plusYears(100));
      return ThreadLocalRandom.current().nextLong(0, hundredYearsEpochMicros);
    default: return random.nextLong();
  }
{code}
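
The same idea can be sketched as a standalone class that compiles on its own, without the surrounding generate method. This is only an illustration under a few assumptions: the names LogicalTypeRandom, randomInt, randomLong and nextLongBelow are hypothetical and not part of Avro, only a subset of the logical types is covered, and the logical type name is passed in as a plain string that may be null when the schema has no logical type. It draws every value from a caller-supplied java.util.Random, so a fixed seed keeps yielding the same data, and it sticks to calls available since Java 8.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class LogicalTypeRandom {
  // Day on which Integer.MAX_VALUE seconds since the epoch falls (24855).
  static final int MAX_EPOCH_DAY = (int) TimeUnit.SECONDS.toDays(Integer.MAX_VALUE);
  static final int MILLIS_PER_DAY = (int) TimeUnit.DAYS.toMillis(1);
  static final long MICROS_PER_DAY = TimeUnit.DAYS.toMicros(1);

  /** Draws a long in [0, bound) from a java.util.Random via the Java 8 stream API. */
  public static long nextLongBelow(Random random, long bound) {
    return random.longs(1, 0, bound).findFirst().getAsLong();
  }

  /** Returns an int that fits the given logical type; null means no logical type. */
  public static int randomInt(Random random, String logicalTypeName) {
    if (logicalTypeName == null) {
      return random.nextInt(); // switching on a null String would throw
    }
    switch (logicalTypeName) {
      case "date":
        return random.nextInt(MAX_EPOCH_DAY + 1); // 0..24855 inclusive
      case "time-millis":
        return random.nextInt(MILLIS_PER_DAY); // 0..86399999 inclusive
      default:
        return random.nextInt();
    }
  }

  /** Returns a long that fits the given logical type; null means no logical type. */
  public static long randomLong(Random random, String logicalTypeName) {
    if (logicalTypeName == null) {
      return random.nextLong();
    }
    switch (logicalTypeName) {
      case "time-micros":
        return nextLongBelow(random, MICROS_PER_DAY); // 0..86399999999 inclusive
      case "timestamp-millis":
        return nextLongBelow(random, TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE));
      default:
        return random.nextLong();
    }
  }

  public static void main(String[] args) {
    Random random = new Random(42L); // fixed seed, so repeated runs print the same values
    System.out.println("date:             " + randomInt(random, "date"));
    System.out.println("time-millis:      " + randomInt(random, "time-millis"));
    System.out.println("time-micros:      " + randomLong(random, "time-micros"));
    System.out.println("timestamp-millis: " + randomLong(random, "timestamp-millis"));
  }
}
```

In RandomData itself the name would presumably come from the field's schema, for example via schema.getLogicalType(), which returns null for fields without a logical type. That is why the sketch checks for null before the switch.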
 

  was:
When RandomData.java generates data it does not check for Logical Types, which are described here: [Specification | Apache Avro|https://avro.apache.org/docs/1.11.1/specification/_print/]




For instance the following the generate method would return this for INT fields:
{code:java}
    case INT:      return random.nextInt(); {code}
 

However, an int field could be of logical type date:
!image-2022-08-18-19-05-37-323.png|width=1052,height=266!

 

Which in make cases could create an int that is out of range for logicalType Date, and thus break when creating records in for instance kafka.

My suggestion is to generated data that is valid for logicalTypes, here is an example I made for int and long:
{code:java}
case INT:
    switch (logicalTypeName) {
      case "date":
        // Random number of days between Unix Epoch start day (0) and end day (24855)
        int maxDaysInEpoch = (int) Duration.ofSeconds(Integer.MAX_VALUE).toDays();
        return ThreadLocalRandom.current().nextInt(0, maxDaysInEpoch);
      case "time-millis":
        // Random number of milliseconds between midnight 00:00:00.000 (0) and 23:59:59:999 (86399999)
        int maxMillisecondsInDay = (int) Duration.ofDays(1).toMillis() - 1;
        return random.nextInt(0, maxMillisecondsInDay);
      default: return random.nextInt();
    }
case LONG:
  switch (logicalTypeName) {
    case "time-micros":
      // Random number of microseconds between midnight 00:00:00.000000 (0) and 23:59:59:999999 (86399999999)
      long maxMicrosecondsInDay = (Duration.ofDays(1).toNanos() - 1) / 1000;
      return random.nextLong(0, maxMicrosecondsInDay);
    case "timestamp-millis":
      // Random milliseconds between Unix Epoch (0) start and end (2147483647000)
      long maxMillisecondsInEpoch = TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE);
      return ThreadLocalRandom.current().nextLong(0, maxMillisecondsInEpoch);
    case "timestamp-micros":
      // Random microseconds between Unix Epoch (0) start and end (2147483647000000)
      long maxMicrosecondsInEpoch = TimeUnit.SECONDS.toMicros(Integer.MAX_VALUE);
      return ThreadLocalRandom.current().nextLong(0, maxMicrosecondsInEpoch);
    case "local-timestamp-millis":
      // Random number of milliseconds between Unix Epoch start (0) and 100 years from now (now() + 100)
      ZonedDateTime hundredYearsFromNow = ZonedDateTime.now().plusYears(100);
      long hundredYearsEpochMillis = ChronoUnit.MILLIS.between(Instant.EPOCH, hundredYearsFromNow);
      return random.nextLong(0, hundredYearsEpochMillis);
    case "local-timestamp-micros":
      // Random number of microseconds between Unix Epoch start (0) and 100 years from now (now() + 100)
      long hundredYearsEpochMicros = ChronoUnit.MICROS.between(Instant.EPOCH, hundredYearsFromNow);
      return random.nextLong(0, hundredYearsEpochMicros);
    default: return random.nextLong();
  } {code}






 


> org.apache.avro.util.RandomData generates invalid test data
> -----------------------------------------------------------
>
>                 Key: AVRO-3611
>                 URL: https://issues.apache.org/jira/browse/AVRO-3611
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.1
>            Reporter: Simon Klakegg
>            Priority: Minor
>              Labels: features
>             Fix For: 1.11.2
>
>         Attachments: image-2022-08-18-19-05-37-323.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)