Posted to dev@avro.apache.org by "Sachin Goyal (JIRA)" <ji...@apache.org> on 2014/06/13 00:13:02 UTC

[jira] [Updated] (AVRO-695) Cycle Reference Support

     [ https://issues.apache.org/jira/browse/AVRO-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sachin Goyal updated AVRO-695:
------------------------------

    Affects Version/s:     (was: 1.4.1)
                       1.7.6
               Status: Patch Available  (was: Open)

Support for circular references without modifying the grammar.
[~cutting], this patch takes the above recommendations into account.

Example of a simple circular reference (very common in the Java and Hibernate world):
-------------------
class Parent
{
  String name;
  Child child;
}

class Child
{
  Parent parent;
  String city;
}
-------------------
Without this fix, Avro correctly generates a schema for the above but fails with a StackOverflowError when serializing actual objects that contain a circular reference.
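
To illustrate the failure mode, here is a minimal sketch that does not use Avro at all. It assumes a hand-written recursive walker standing in for a reflection-based serializer; the names CycleOverflowDemo, naiveWalk, naiveWalkOverflows and countDistinct are made up for this example. It shows that blindly following references through a Parent/Child cycle exhausts the stack, while tracking visited objects by identity terminates.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative only: a hand-written object walker, not Avro's writer.
class CycleOverflowDemo {
  static class Parent { String name; Child child; }
  static class Child { Parent parent; String city; }

  // Naive walk: follows every reference, so a cycle recurses forever.
  static int naiveWalk(Object o) {
    if (o instanceof Parent) return 1 + naiveWalk(((Parent) o).child);
    if (o instanceof Child) return 1 + naiveWalk(((Child) o).parent);
    return 0; // null or unknown type
  }

  // Returns true if the naive walk exhausts the stack on this graph.
  static boolean naiveWalkOverflows(Object root) {
    try {
      naiveWalk(root);
      return false;
    } catch (StackOverflowError e) {
      return true;
    }
  }

  // Cycle-aware walk: remembers visited objects by identity, stops on a revisit.
  static int countDistinct(Object o, Map<Object, Boolean> seen) {
    if (o == null || seen.put(o, Boolean.TRUE) != null) return 0;
    if (o instanceof Parent) return 1 + countDistinct(((Parent) o).child, seen);
    if (o instanceof Child) return 1 + countDistinct(((Child) o).parent, seen);
    return 1;
  }

  public static void main(String[] args) {
    Parent p = new Parent();
    Child c = new Child();
    p.child = c;
    c.parent = p; // closes the cycle

    System.out.println("naive walk overflows: " + naiveWalkOverflows(p));
    System.out.println("distinct objects: "
        + countDistinct(p, new IdentityHashMap<>()));
  }
}
```

The identity-based map matters here: two distinct objects that happen to be equal() must still be treated as separate nodes, so an IdentityHashMap is a safer choice than a HashMap for cycle detection.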

Using this fix requires the following steps:
1) Choose the name of the extra field in which every entity stores its ID.
        ReflectData rdata = ReflectData.AllowNull.get();
        rdata.setCircularRefIdPrefix("__crefId");

2) Pass the same rdata when serializing objects.
        DatumWriter<T> datumWriter = new ReflectDatumWriter<T>(T.class, rdata);

3) You MUST use the same field name when reading objects with circular references.
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord> ();
        GenericData gdata = datumReader.getData();
        gdata.setCircularRefIdPrefix("__crefId");
  If the field name is different, there may be no error, but circular references will not be resolved.

4) Choose whether circular references should be fully resolved.
        datumReader.setResolveCircularRefs (false);
   If this is set to true, circular references point to the actual objects that form the cycle.
   If this is set to false, circular references point to dummy objects that hold only the circular-reference ID. This option lets the rest of Avro work without running into infinite loops (for example in GenericRecord.toString()) while still giving users enough information to form true circular references themselves if required.
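
The general idea behind the steps above can be sketched without Avro. This is not the code in the patch; it is a rough illustration under stated assumptions: a plain Map stands in for a GenericRecord, and the names CycleIdSketch, write, stub and resolve are hypothetical. Only the field name "__crefId" is taken from the steps above. On write, each object gets an ID the first time it is seen, and any later occurrence is written as a stub holding only that ID. On read, the stubs are either left as they are (the "false" case) or replaced by the real record (the "true" case).

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of ID-based cycle handling; Maps stand in for records.
class CycleIdSketch {
  static final String ID_FIELD = "__crefId"; // must match on write and read

  static class Parent { String name; Child child; }
  static class Child { Parent parent; String city; }

  // A stub carries nothing but the ID of an object written earlier.
  static Map<String, Object> stub(long id) {
    Map<String, Object> m = new LinkedHashMap<>();
    m.put(ID_FIELD, id);
    return m;
  }

  // Write side: first visit emits a full record, revisits emit a stub.
  static Map<String, Object> write(Object o, Map<Object, Long> ids) {
    if (o == null) return null;
    Long known = ids.get(o);
    if (known != null) return stub(known);
    long id = ids.size() + 1;
    ids.put(o, id);
    Map<String, Object> rec = new LinkedHashMap<>();
    rec.put(ID_FIELD, id);
    if (o instanceof Parent) {
      Parent p = (Parent) o;
      rec.put("name", p.name);
      rec.put("child", write(p.child, ids));
    } else if (o instanceof Child) {
      Child c = (Child) o;
      rec.put("parent", write(c.parent, ids));
      rec.put("city", c.city);
    }
    return rec;
  }

  // Read side, "resolve = true": swap each stub for the record with that ID.
  @SuppressWarnings("unchecked")
  static void resolve(Map<String, Object> rec, Map<Long, Map<String, Object>> byId) {
    byId.put((Long) rec.get(ID_FIELD), rec);
    for (Map.Entry<String, Object> e : rec.entrySet()) {
      if (!(e.getValue() instanceof Map)) continue;
      Map<String, Object> nested = (Map<String, Object>) e.getValue();
      Map<String, Object> target = byId.get(nested.get(ID_FIELD));
      if (nested.size() == 1 && target != null) {
        e.setValue(target);    // stub becomes a true back-reference
      } else {
        resolve(nested, byId); // full record: register it and descend
      }
    }
  }

  @SuppressWarnings("unchecked")
  public static void main(String[] args) {
    Parent p = new Parent();
    Child c = new Child();
    p.name = "Alice"; p.child = c;
    c.city = "Paris"; c.parent = p;

    Map<String, Object> rec = write(p, new IdentityHashMap<>());
    System.out.println(rec); // safe to print: the back-reference is only a stub

    resolve(rec, new HashMap<>());
    Map<String, Object> child = (Map<String, Object>) rec.get("child");
    System.out.println("cycle restored: " + (child.get("parent") == rec));
    // Printing rec now would recurse without end, which is the trade-off
    // the "false" setting in step 4 is meant to avoid.
  }
}
```

If the reader looked for the ID under a different field name, the stub lookup in this sketch would simply find nothing and leave the stubs in place, which mirrors the "no error but not resolved" behaviour described in step 3.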

Test cases are provided for circular references in Lists and Maps as well.
Happy to make modifications if any are needed.

> Cycle Reference Support
> -----------------------
>
>                 Key: AVRO-695
>                 URL: https://issues.apache.org/jira/browse/AVRO-695
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>    Affects Versions: 1.7.6
>            Reporter: Moustapha Cherri
>         Attachments: avro-1.4.1-cycle.patch.gz, avro-1.4.1-cycle.patch.gz
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> This is a proposed implementation to add cycle reference support to Avro. It basically introduces a new type named Cycle. A Cycle contains a string representing the path to the other reference.
> For example, suppose we have an object of type Message that has a member named previous, also of type Message, and we have this hierarchy:
> message
>   previous : message2
> message2
>   previous : message2
> When serializing, the cycle path for "message2.previous" will be "previous".
> The implementation depends on ANTLR to evaluate those cycles at read time in order to resolve them. I used ANTLR 3.2. This dependency is not mandatory; I just used ANTLR to speed things up. I kept the ANTLR-generated code in this implementation, though it should instead be generated during the build. I only updated the Java code.
> I did not write full unit tests, but you can find the "avrotest.Main" class, which can be used as a preliminary test.
> Please do not hesitate to contact me for further clarification if this seems interesting.
> Best regards,
> Moustapha Cherri



--
This message was sent by Atlassian JIRA
(v6.2#6252)