Posted to dev@avro.apache.org by "Ryan Skraba (Jira)" <ji...@apache.org> on 2020/09/30 16:44:00 UTC

[jira] [Commented] (AVRO-2934) Initialise all fields in a nested schema

    [ https://issues.apache.org/jira/browse/AVRO-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204876#comment-17204876 ] 

Ryan Skraba commented on AVRO-2934:
-----------------------------------

Hello! There's a [RandomData|https://github.com/apache/avro/blob/e208f4b2d442bc14aaba3dad86e8122b83a0873c/lang/java/avro/src/main/java/org/apache/avro/util/RandomData.java] class that can be used to create pseudo-random data.

{{RandomData}} is an {{Iterable}}, so it's easy to use for creating large collections, and the output is deterministic if you give it a seed.

{code}
import org.apache.avro.util.RandomData;

// Create 5000 records that correspond to the given schema using the seed 0
for (Object datum : new RandomData(myRecordSchema, 5000, 0L)) {
    // e.g., datum will be a GenericRecord if myRecordSchema is a Schema.Type.RECORD
    // ... use datum in your test here ...
}
{code}

The rules for generating the data are hard-coded in the generating class, and it's _OK_ but inflexible. If you have any proposals to improve the generating functions via annotations, it could be an interesting improvement!
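
For comparison, the reflection-based approach mentioned in the issue description quoted below could look roughly like the following. This is only a minimal sketch under stated assumptions, not the reporter's implementation and not part of Avro: {{FieldInitializer}}, {{Person}} and {{Address}} are hypothetical names, the classes are plain Java objects rather than Avro-generated ones, only a handful of field types are handled, and there is no protection against recursive types.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Random;

// Hypothetical helper (not an Avro API): walks an object graph and fills in fields.
final class FieldInitializer {
    private FieldInitializer() {}

    // Assigns pseudo-random values to the instance fields of target, recursing into nested objects.
    static <T> T fill(T target, Random random) {
        try {
            for (Field field : target.getClass().getDeclaredFields()) {
                int mods = field.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isFinal(mods) || field.isSynthetic()) {
                    continue; // leave constants and compiler-generated fields alone
                }
                field.setAccessible(true);
                Class<?> type = field.getType();
                if (type == int.class) {
                    field.setInt(target, random.nextInt());       // primitives are always overwritten
                } else if (type == long.class) {
                    field.setLong(target, random.nextLong());
                } else if (type == double.class) {
                    field.setDouble(target, random.nextDouble());
                } else if (type == boolean.class) {
                    field.setBoolean(target, random.nextBoolean());
                } else if (type == String.class) {
                    if (field.get(target) == null) {              // keep existing non-null values
                        field.set(target, "s" + random.nextInt(1000));
                    }
                } else if (!type.isPrimitive() && !type.isArray() && !type.isEnum()
                        && !type.isInterface() && !type.getName().startsWith("java.")) {
                    // Treat anything else as a nested record with a no-arg constructor.
                    Object nested = field.get(target);
                    if (nested == null) {
                        Constructor<?> ctor = type.getDeclaredConstructor();
                        ctor.setAccessible(true);
                        nested = ctor.newInstance();
                        field.set(target, nested);
                    }
                    fill(nested, random); // no cycle detection: recursive types would loop forever
                }
                // Other field types (arrays, enums, collections, ...) are skipped in this sketch.
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not initialise " + target.getClass(), e);
        }
    }
}

// Hypothetical nested message classes used only for illustration.
class Address {
    String street;
    int number;
}

class Person {
    String name;
    long id;
    Address address;
}
```

With these classes, {{FieldInitializer.fill(new Person(), new Random(0L))}} returns a {{Person}} whose {{name}}, {{address}} and {{address.street}} are all non-null. Passing a seeded {{Random}} mirrors the seed argument of {{RandomData}}; an annotation-driven version would presumably replace the hard-coded type checks with per-field options.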

> Initialise all fields in a nested schema
> ----------------------------------------
>
>                 Key: AVRO-2934
>                 URL: https://issues.apache.org/jira/browse/AVRO-2934
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Biliuta
>            Priority: Minor
>
> For testing purposes it would be nice to have a way to initialise all fields to some value even if there is no default value specified in the schema (the value is required). I noticed that for schemas that are large and have a few levels of nesting it can get quite ugly (creating all the required sub classes) when you want to instantiate a random message to do some tests.
> All the possible data types in an Avro schema can be initialised to some default or random value, and if this is not the desired value, it can be changed at any time.
> I did a short implementation using reflection that recursively goes through all the fields of a message, but maybe an annotation included in the Avro schema (using javaAnnotation) would make more sense so that it is available only if needed. The annotation could also include some options like default or random value, whether to overwrite existing non-null members, and whether to ignore specific members or types.


