Posted to issues@avro.apache.org by "Christophe Le Saec (Jira)" <ji...@apache.org> on 2022/09/14 14:17:00 UTC

[jira] [Commented] (AVRO-3611) org.apache.avro.util.RandomData generates invalid test data

    [ https://issues.apache.org/jira/browse/AVRO-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604726#comment-17604726 ] 

Christophe Le Saec commented on AVRO-3611:
------------------------------------------

I wrote this [PR|https://github.com/apache/avro/pull/1867] to fix the issue.
Some logical types, such as LogicalTypes.TimestampMillis, don't need special handling, because every long value can be converted: Long.MAX_VALUE is smaller than the value Instant.MAX.toEpochMilli() would have to return, which is why that call fails. For this kind of logical type, it might have been better to use a Fixed of size 10 so that all possible values could be stored, but that's another story.
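
The range argument above can be checked with a minimal sketch. The class and method names (TimestampMillisRange, convertible, maxInstantOverflows) are hypothetical and are not part of Avro or of the PR; only java.time.Instant from the JDK is used.

```java
import java.time.Instant;

class TimestampMillisRange {
    // True if the epoch-millisecond value converts to an Instant without error.
    static boolean convertible(long epochMillis) {
        try {
            Instant.ofEpochMilli(epochMillis);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    // True if Instant.MAX cannot be expressed as epoch milliseconds in a long.
    static boolean maxInstantOverflows() {
        try {
            Instant.MAX.toEpochMilli();
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(convertible(Long.MIN_VALUE));
        System.out.println(convertible(Long.MAX_VALUE));
        System.out.println(maxInstantOverflows());
    }
}
```

If all three checks hold, any long produced by random.nextLong() is a usable timestamp-millis value, while the reverse conversion from an arbitrary Instant is not always possible.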

> org.apache.avro.util.RandomData generates invalid test data
> -----------------------------------------------------------
>
>                 Key: AVRO-3611
>                 URL: https://issues.apache.org/jira/browse/AVRO-3611
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.1
>            Reporter: Simon Klakegg
>            Priority: Minor
>              Labels: features, pull-request-available
>             Fix For: 1.11.2
>
>         Attachments: image-2022-08-18-19-05-37-323.png
>
>   Original Estimate: 48h
>          Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> When RandomData.java generates data it does not check for Logical Types, which are described here: [Specification | Apache Avro|https://avro.apache.org/docs/1.11.1/specification/_print/]
> *For instance the following generate method would return this for INT fields:*
> {code:java}
>     case INT:      return random.nextInt(); {code}
>  
> {*}However, an int field could be of logical type date:{*}!image-2022-08-18-19-05-37-323.png|width=1052,height=266!
>  
> In many cases this produces an int that is out of range for the logical type date, which then breaks record creation in, for instance, Kafka.
> My suggestion is to generate data that is valid for logical types. Here is an example I made for int and long:
> {code:java}
> case INT:
>     switch (logicalTypeName) {
>       case "date":
>         // Random number of days between Unix Epoch start day (0) and end day (24855)
>         int maxDaysInEpoch = (int) Duration.ofSeconds(Integer.MAX_VALUE).toDays();
>         return ThreadLocalRandom.current().nextInt(0, maxDaysInEpoch);
>       case "time-millis":
>         // Random number of milliseconds between midnight 00:00:00.000 (0) and 23:59:59.999 (86399999)
>         int maxMillisecondsInDay = (int) Duration.ofDays(1).toMillis() - 1;
>         return random.nextInt(0, maxMillisecondsInDay);
>       default: return random.nextInt();
>     }
> case LONG:
>   switch (logicalTypeName) {
>     case "time-micros":
>       // Random number of microseconds between midnight 00:00:00.000000 (0) and 23:59:59.999999 (86399999999)
>       long maxMicrosecondsInDay = (Duration.ofDays(1).toNanos() - 1) / 1000;
>       return random.nextLong(0, maxMicrosecondsInDay);
>     case "timestamp-millis":
>       // Random milliseconds between Unix Epoch (0) start and end (2147483647000)
>       long maxMillisecondsInEpoch = TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE);
>       return ThreadLocalRandom.current().nextLong(0, maxMillisecondsInEpoch);
>     case "timestamp-micros":
>       // Random microseconds between Unix Epoch (0) start and end (2147483647000000)
>       long maxMicrosecondsInEpoch = TimeUnit.SECONDS.toMicros(Integer.MAX_VALUE);
>       return ThreadLocalRandom.current().nextLong(0, maxMicrosecondsInEpoch);
>     case "local-timestamp-millis":
>       // Random number of milliseconds between Unix Epoch start (0) and 100 years from now (now() + 100)
>       ZonedDateTime hundredYearsFromNow = ZonedDateTime.now().plusYears(100);
>       long hundredYearsEpochMillis = ChronoUnit.MILLIS.between(Instant.EPOCH, hundredYearsFromNow);
>       return random.nextLong(0, hundredYearsEpochMillis);
>     case "local-timestamp-micros":
>       // Random number of microseconds between Unix Epoch start (0) and 100 years from now (now() + 100)
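>       // Computed inline: a local variable assigned in another case is not definitely assigned here
>       long hundredYearsEpochMicros = ChronoUnit.MICROS.between(Instant.EPOCH, ZonedDateTime.now().plusYears(100));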
>       return random.nextLong(0, hundredYearsEpochMicros);
>     default: return random.nextLong();
>   } {code}
>  

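The bounded generation proposed in the quoted description can be sketched in a self-contained form. This is only an illustration under stated assumptions: the class and method names (LogicalTypeBounds, nextIntInclusive, randomDate, randomTimeMillis) are hypothetical, the upper limit for date is the 24855-day figure from the quoted proposal rather than a limit taken from the Avro specification, and only Random.nextInt(int) is used so that the sketch also compiles on Java 8.

```java
import java.time.LocalDate;
import java.util.Random;

class LogicalTypeBounds {
    // Upper limit for "date" used in the quoted proposal: whole days in Integer.MAX_VALUE seconds.
    static final int MAX_DAYS = (int) (Integer.MAX_VALUE / 86_400L);

    // Last millisecond of a day, the upper limit for "time-millis".
    static final int MAX_MILLIS_OF_DAY = 86_400_000 - 1;

    // Uniform int in [0, maxInclusive]. Random.nextInt(bound) excludes bound, hence the + 1.
    // Assumes maxInclusive < Integer.MAX_VALUE, which holds for both limits above.
    static int nextIntInclusive(Random random, int maxInclusive) {
        return random.nextInt(maxInclusive + 1);
    }

    static int randomDate(Random random) {
        return nextIntInclusive(random, MAX_DAYS);
    }

    static int randomTimeMillis(Random random) {
        return nextIntInclusive(random, MAX_MILLIS_OF_DAY);
    }

    public static void main(String[] args) {
        Random random = new Random(42L); // a seeded generator keeps runs reproducible
        int days = randomDate(random);
        System.out.println(days + " -> " + LocalDate.ofEpochDay(days));
        System.out.println(randomTimeMillis(random));
    }
}
```

Passing the Random instance in, instead of calling ThreadLocalRandom.current(), keeps the output reproducible for a given seed.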

