Posted to issues@avro.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/09/14 14:10:00 UTC
[jira] [Updated] (AVRO-3611) org.apache.avro.util.RandomData generates invalid test data
[ https://issues.apache.org/jira/browse/AVRO-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated AVRO-3611:
---------------------------------
Labels: features pull-request-available (was: features)
> org.apache.avro.util.RandomData generates invalid test data
> -----------------------------------------------------------
>
> Key: AVRO-3611
> URL: https://issues.apache.org/jira/browse/AVRO-3611
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.11.1
> Reporter: Simon Klakegg
> Priority: Minor
> Labels: features, pull-request-available
> Fix For: 1.11.2
>
> Attachments: image-2022-08-18-19-05-37-323.png
>
> Original Estimate: 48h
> Time Spent: 10m
> Remaining Estimate: 47h 50m
>
> When RandomData.java generates data, it does not take logical types into account. Logical types are described in the specification: [Specification | Apache Avro|https://avro.apache.org/docs/1.11.1/specification/_print/]
> *For instance, the generate method returns the following for INT fields:*
> {code:java}
> case INT: return random.nextInt(); {code}
>
> {*}However, an int field could be of logical type date:{*}!image-2022-08-18-19-05-37-323.png|width=1052,height=266!
>
> In many cases this produces an int that is out of range for the logical type date, which then breaks record creation in, for instance, Kafka.
> My suggestion is to generate data that is valid for logical types. Here is an example I made for int and long:
> {code:java}
> case INT:
>   switch (logicalTypeName) {
>     case "date":
>       // Random number of days between Unix Epoch start day (0) and end day (24855)
>       int maxDaysInEpoch = (int) Duration.ofSeconds(Integer.MAX_VALUE).toDays();
>       return ThreadLocalRandom.current().nextInt(0, maxDaysInEpoch);
>     case "time-millis":
>       // Random number of milliseconds between midnight 00:00:00.000 (0) and 23:59:59.999 (86399999).
>       // The upper bound is exclusive, so pass the full number of milliseconds in a day.
>       int millisecondsInDay = (int) Duration.ofDays(1).toMillis();
>       return random.nextInt(0, millisecondsInDay);
>     default: return random.nextInt();
>   }
> case LONG:
>   switch (logicalTypeName) {
>     case "time-micros":
>       // Random number of microseconds between midnight 00:00:00.000000 (0) and 23:59:59.999999 (86399999999).
>       // The upper bound is exclusive, so pass the full number of microseconds in a day.
>       long microsecondsInDay = Duration.ofDays(1).toNanos() / 1000;
>       return random.nextLong(0, microsecondsInDay);
>     case "timestamp-millis":
>       // Random milliseconds between Unix Epoch (0) start and end (2147483647000)
>       long maxMillisecondsInEpoch = TimeUnit.SECONDS.toMillis(Integer.MAX_VALUE);
>       return ThreadLocalRandom.current().nextLong(0, maxMillisecondsInEpoch);
>     case "timestamp-micros":
>       // Random microseconds between Unix Epoch (0) start and end (2147483647000000)
>       long maxMicrosecondsInEpoch = TimeUnit.SECONDS.toMicros(Integer.MAX_VALUE);
>       return ThreadLocalRandom.current().nextLong(0, maxMicrosecondsInEpoch);
>     case "local-timestamp-millis": {
>       // Random number of milliseconds between Unix Epoch start (0) and 100 years from now (now() + 100 years)
>       ZonedDateTime hundredYearsFromNow = ZonedDateTime.now().plusYears(100);
>       long hundredYearsEpochMillis = ChronoUnit.MILLIS.between(Instant.EPOCH, hundredYearsFromNow);
>       return random.nextLong(0, hundredYearsEpochMillis);
>     }
>     case "local-timestamp-micros": {
>       // Random number of microseconds between Unix Epoch start (0) and 100 years from now (now() + 100 years).
>       // Declared per case: a local assigned in another case is not definitely assigned here.
>       ZonedDateTime hundredYearsFromNow = ZonedDateTime.now().plusYears(100);
>       long hundredYearsEpochMicros = ChronoUnit.MICROS.between(Instant.EPOCH, hundredYearsFromNow);
>       return random.nextLong(0, hundredYearsEpochMicros);
>     }
>     default: return random.nextLong();
>   } {code}
>
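The bounded ranges proposed in the description can also be sketched in a self-contained form, assuming a seeded java.util.Random so that generated data stays reproducible. The class LogicalTypeRanges and its method names are hypothetical and are not part of Avro. The sketch sticks to calls available since Java 8 (Random.nextInt(bound) and Random.longs(streamSize, origin, bound)), because Random.nextInt(origin, bound) and Random.nextLong(origin, bound) only exist from Java 17 onward.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.util.Random;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Avro: bounded generators for a few logical types.
final class LogicalTypeRanges {
  // Last epoch day whose epoch seconds still fit in a signed 32-bit int (24855).
  static final int MAX_EPOCH_DAY = (int) Duration.ofSeconds(Integer.MAX_VALUE).toDays();
  static final int MILLIS_PER_DAY = (int) TimeUnit.DAYS.toMillis(1);
  static final long MICROS_PER_DAY = TimeUnit.DAYS.toMicros(1);

  private LogicalTypeRanges() {}

  // "date": days since 1970-01-01, from 0 up to and including MAX_EPOCH_DAY.
  static int randomDate(Random random) {
    return random.nextInt(MAX_EPOCH_DAY + 1);
  }

  // "time-millis": milliseconds after midnight, from 0 up to 86399999.
  static int randomTimeMillis(Random random) {
    return random.nextInt(MILLIS_PER_DAY);
  }

  // "time-micros": microseconds after midnight, from 0 up to 86399999999.
  static long randomTimeMicros(Random random) {
    return boundedLong(random, MICROS_PER_DAY);
  }

  // A long in [0, bound), drawn through Random.longs, which exists since Java 8.
  static long boundedLong(Random random, long bound) {
    return random.longs(1, 0, bound).findFirst().getAsLong();
  }

  public static void main(String[] args) {
    Random random = new Random(42L);
    // An unbounded int read as a date can land millions of years from the epoch.
    System.out.println("unbounded int as date: " + LocalDate.ofEpochDay(random.nextInt()));
    System.out.println("bounded date:          " + LocalDate.ofEpochDay(randomDate(random)));
    System.out.println("time-millis:           " + randomTimeMillis(random));
    System.out.println("time-micros:           " + randomTimeMicros(random));
  }
}
```

Passing the Random instance in, rather than calling ThreadLocalRandom.current(), is a design choice of this sketch: a fixed seed then yields the same values on every run.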
--
This message was sent by Atlassian Jira
(v8.20.10#820010)