You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "James Baldassari (JIRA)" <ji...@apache.org> on 2011/07/05 04:15:22 UTC

[jira] [Commented] (AVRO-839) Implement builder pattern in generated record classes that sets default values when omitted

    [ https://issues.apache.org/jira/browse/AVRO-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059659#comment-13059659 ] 

James Baldassari commented on AVRO-839:
---------------------------------------

I finally had some time to work on this, and I have a patch ready now.  The functionality is basically as described in this issue.  There are a few implementation details to note:
* Implemented both generic and specific Builders
* Builders can also be initialized from other Builders or from record types.  For example:
** Person.newBuilder(anotherPersonBuilder)
** Person.newBuilder(anotherPerson)
* In addition to the accessor method in the record type I also added a mutator method.  I'm not sure if this is best; I just thought it was more flexible.  We could instead force all mutations to go through a Builder.  Thoughts?
* I introduced a new set of reserved words for error types which includes all the standard reserved words in addition to "message" and "cause".  The reason these two new words need to be reserved (for error types only) is that otherwise the generated accessor methods would conflict with those in java.lang.Throwable.
* I found that determining the default Java value for a field is fairly expensive because it requires serializing the JSON field and then deserializing it (is there a better way to do this?).  For that reason I introduced a static cache of default values in RecordBuilderBase.  We only determine the default value once per schema and field, at which point it is added to the static cache.  After a default value has been added to the cache, whenever we need use it we first obtain a deep copy of it so that subsequent changes will not affect the values stored in the cache.
* Even though the @java_class annotation is used in the IDL documentation, I couldn't find it anywhere in the Avro source.  Maybe that should be a future enhancement instead?
* Updated all unit tests so that they no longer directly access record fields.  They now use the new accessor/mutator methods.  As a result, a lot of the changes in this patch are just updates to unit tests.

While I was working on the Velocity templates I also made the following changes:
* Implemented changes requested in AVRO-846
* Added constructor to generated Fixed instances that allows setting the byte[] in the constructor.  This change makes it easier to used Fixed values with the builder interface.

All existing unit tests pass, and I added unit tests to cover all new functionality.

> Implement builder pattern in generated record classes that sets default values when omitted
> -------------------------------------------------------------------------------------------
>
>                 Key: AVRO-839
>                 URL: https://issues.apache.org/jira/browse/AVRO-839
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: James Baldassari
>            Assignee: James Baldassari
>         Attachments: AVRO-839.patch
>
>
> This is an idea for an improvement to the SpecificCompiler-generated record classes.  There are two main issues to address:
> # Default values specified in schemas are only used at read time, not when writing/serializing records.  For example, a NullPointerException is thrown when attempting to write a record that has an uninitialized array or string type.  I'm sure this was done for good reasons, like giving users maximum control and preventing unnecessary garbage collection, but I think it's also somewhat confusing and unintuitive for new users (myself included).
> # Users have to create their own factory classes/methods for every record type, both to ensure that all non-primitive members are initialized and to facilitate the construction and initialization of record instances (i.e. constructing and setting values in a single statement).
> These issues have been discussed previously here:
> * [http://search-hadoop.com/m/iDVTn1JVeSR1]
> * AVRO-726
> * AVRO-770
> * [http://search-hadoop.com/m/JuY1V16pwxh1]
> I'd like to propose a solution that is used by at least one other messaging framework.  For each generated record class there will be a public static inner class called Builder.  The Builder inner class has the same fields as the record class, as well as accessors and mutators for each of these fields.  Whenever a mutator method is called, the Builder sets a boolean flag indicating that the field has been set.  All mutators return a reference to 'this', so it's possible to chain a series of setter invocations, which makes it really easy to construct records in a single statement.  The Builder also has a build() method which constructs a record instance using the values that were set in the Builder.  When the build() method is invoked, if there are any fields that have not been set but have default values as defined in the schema, the Builder will set the values of these fields using their defaults.
> One nice thing about implementing the builder pattern in a static inner Builder class rather than in the record itself is that this enhancement will be completely backwards-compatible with existing code.  The record class itself would not change, and the public fields would still be there, so existing code would still work.  Users would have the option to use the Builder or continue constructing records manually.  Eventually the public fields could be phased out, and the record would be made immutable.  All changes would have to be done through the Builder.
> Here is an example of what this might look like:
> {code}
> // Person.newBuilder() returns a new Person.Builder instance
> // All Person.Builder setters return 'this' allowing us to chain set calls together for convenience
> // Person.Builder.build() returns a Person instance after setting any uninitialized values that have defaults
> Person me = Person.newBuilder().setName("James").setCountry("US").setState("MA").build();
> // We still have direct access to Person's members, so the records are backwards-compatible
> me.state = "CA";
> // Person has accessor methods now so that the public fields can be phased out later
> System.out.println(me.getState());
> // No NPE here because the array<Person> field that stores this person's friends has been automatically 
> // initialized by the Builder to a new java.util.ArrayList<Person> due to a @java_class annotation in the IDL
> System.out.println(me.getFriends().size());
> {code}
> What do people think about this approach?  Any other ideas?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira