You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by S G <sg...@gmail.com> on 2016/06/01 20:51:00 UTC

Re: Need review/merges for couple of pull requests

Hey RB,

I was trying out the circular refs with latest 1.8.1 version of Avro and it
doesn't seem to be working out of the box for me.
Perhaps I am missing something and would appreciate your help.

Thanks,
SG


Here is my test code:

public class CircularRefsTest
{
    @Test
    public void testSerialization() throws Exception
    {
        // Create a circular linked-list
        ListNode first = new ListNode("first");
        ListNode second = new ListNode("second");
        second.next = first;
        first.next = second;

        ReflectData rdata = ReflectData.AllowNull.get();

        // Print Schema
        Schema schema = rdata.getSchema(ListNode.class);
        System.out.println(schema);

        // Serialize
        DatumWriter<ListNode> datumWriter = new
ReflectDatumWriter<ListNode>(ListNode.class);
        ByteArrayOutputStream byteArrayOutputStream = new
ByteArrayOutputStream();
        Encoder encoder =
EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
        datumWriter.write(first, encoder);
        encoder.flush();
        byte[] bytes = byteArrayOutputStream.toByteArray();

        assertTrue( bytes != null );
    }
}

class ListNode
{
    String data;
    ListNode next;

    public ListNode() {
    }
    public ListNode(String data) {
        this.data = data;
    }
}



On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com> wrote:

> SG,
>
> The data ends up looking like this:
>
> {"id":1,"p":"parent data!","child":{"c":"child data!","parent":{"long":1}}}
>
> I produced that with avro-tools 1.7.6 and tojson.
>
> Here's the schema: {
>   "type" : "record",
>   "name" : "Parent",
>   "fields" : [ {
>     "name" : "id",
>     "type" : "long"
>   }, {
>     "name" : "p",
>     "type" : "string"
>   }, {
>     "name" : "child",
>     "type" : {
>       "type" : "record",
>       "name" : "Child",
>       "fields" : [ {
>         "name" : "c",
>         "type" : "string"
>       }, {
>         "name" : "parent",
>         "type" : [ "null", "long", "Parent" ]
>       } ],
>       "logicalType" : "reference",
>       "ref-field-name" : "parent"
>     }
>   } ],
>   "logicalType" : "referenceable",
>   "id-field-name" : "id"
> }
>
> rb
>
>
> On 05/28/2015 09:50 AM, S G wrote:
>
>> RB,
>>
>> Could you please attach the schema and the JSON serialized output from
>> your
>> test-code as well?
>> My build environment is currently broken as I am grappling with some Java
>> 8
>> update issues.
>>
>> Thanks
>> SG
>>
>>
>> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com> wrote:
>>
>> SG,
>>>
>>> Now that logical types are in, I had some time to look at this issue.
>>> Thanks for your patience on this.
>>>
>>> When I started looking at the use case, this began to look very much like
>>> a logical type issue. (I know, I've been saying that a lot.) When you
>>> write, you replace any referenced object with its id. When you read, you
>>> replace those ids with the correct object. I went ahead and implemented
>>> this using 2 logical types: Referenceable for an object with an id, and
>>> Reference for a record with a field that references another object by id.
>>>
>>> There were some trade-offs to this approach. For example, I had to use a
>>> logical type for the object containing the reference rather than for the
>>> reference itself because I used a callback to set the object once it is
>>> found. That happens because children with references to a parent are
>>> deserialized completely first. The parent is the last object to be
>>> assembled and passed to the logical type conversion.
>>>
>>> A bigger issue was that logical types are currently conservative and
>>> don't
>>> overlap with reflect types or anything that sets java-class. That means
>>> that this currently only works with generic types. But, I'd rather make
>>> logical types work for reflect than add more custom code to support this.
>>> Does that sound reasonable?
>>>
>>> I'm attaching a diff with the working test code so you can take a look at
>>> the approach. Let me know what you are thinking.
>>>
>>> rb
>>>
>>> On 05/20/2015 12:05 PM, S G wrote:
>>>
>>> I am requesting some help with AVRO-695.
>>>> Here are some pieces from the last conversation.
>>>>
>>>>
>>>> Doug Cutting
>>>> <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting>
>>>> added
>>>> a comment - 02/Oct/14 21:19
>>>>
>>>> Here's a modified version of the patch. It moves all significant changes
>>>> to
>>>> reflect. Since reflect is the only implementation that can generate an
>>>> appropriate schema, changes should be confined to there.
>>>>
>>>> The tests need to be updated, as they still assume generic.
>>>> <https://issues.apache.org/jira/browse/AVRO-695#>
>>>> <
>>>>
>>>> https://issues.apache.org/jira/browse/AVRO-695?focusedCommentId=14286370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
>>>>
>>>>>
>>>>> Sachin Goyal
>>>> <
>>>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sachingoyal
>>>> >
>>>> added
>>>> a comment - 21/Jan/15 21:56
>>>>
>>>> Here is a patch with the updated test-cases.
>>>> I also confirm that all my changes are there in the patch.
>>>>
>>>>
>>>> The patch was submitted in June 2014 and was very hot till October 2014.
>>>> Since then, there has been no action on this even though I have sent
>>>> many
>>>> reminders in this group.
>>>>
>>>> I understand that everyone is very busy with their own stuff but I would
>>>> really appreciate if someone could help a fellow engineer in getting his
>>>> patch accepted.
>>>> It would encourage more participation as well as help people wanting to
>>>> use
>>>> Avro for circular references.
>>>>
>>>> Regards
>>>> SG
>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Cloudera, Inc.
>>>
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
>

Re: Need review/merges for couple of pull requests

Posted by S G <sg...@gmail.com>.
Hey Ryan,

Any chance you were able to look into this (My email on Jun 6)?

Thanks
SG

On Thu, Jun 16, 2016 at 2:05 PM, S G <sg...@gmail.com> wrote:

> Doug Cutting, Ryan Blue,
>
> I have worked with both of you on this ticket and would appreciate your
> thoughts regarding circular references.
>
> If you think the current circular references' implementation in Avro is
> indeed easy for clients to use (i.e. they do not have to write all those
> extra classes), then please help me by providing some test-case/code
> reference.
>
> Thanks
> Sachin
>
>
> On Mon, Jun 6, 2016 at 9:18 AM, S G <sg...@gmail.com> wrote:
>
>>
>> As far as I have understood the attached testcase, I think the solution
>> in https://github.com/apache/avro/pull/23/files (JIRA:
>> https://issues.apache.org/jira/browse/AVRO-695) is more seamless from a
>> user's perspective because the user does not have to write any classes
>> extending LogicalType or Conversion.
>>
>> Writing those classes could become a bit of a challenge:
>> 1) If there are several circular references in the code (something very
>> common in the Hibernate world).
>> 2) It is a pain to change these classes to reflect any changes in the
>> classes being serialized.
>> 3) When the user decides to write a conversion/logical-type class for
>> some record, he has to handle serialization for all fields of the class
>> even though all he wanted to handle was circular references (This is what I
>> have understood but I could be wrong here).
>>
>> In my pull request for AVRO-695, the only thing a user need to do is to
>> call setResolvingCircularRefs(true) on ReflectData. Then every record is
>> converted into a union of the record-type and the newly added class
>> CircularRef. This new class stores the actual reference ID on encountering
>> a circular ref. This is really convenient for the user and there is no
>> penalty on performance. Even if there is a problem, the user can always go
>> back to writing LogicalType and Conversion classes.
>>
>> My request would be to consider this solution for handling circular
>> references instead of asking the user to write a bunch of other classes.
>> If the solution sounds acceptable, then I can bring the patch up-to-date
>> with the latest master branch and reopen the pull request.
>>
>> Cheers,
>> SG
>>
>> On Wed, Jun 1, 2016 at 5:36 PM, S G <sg...@gmail.com> wrote:
>>
>>> Thanks RB,
>>>
>>> But I am not sure I follow that example (Probably because there is no
>>> actual datum class there?).
>>>
>>> Consider a complex code running in production with lots of circular
>>> references.
>>> Something like:
>>>
>>> class Home {
>>>    String address;
>>>    Integer zipCode;
>>>    List <Person> residents;
>>> }
>>>
>>> class Person {
>>>     String name;
>>>     Person spouse;
>>>     House home;
>>>     Map <Car, Dealer> vehiclesOwned;
>>> }
>>>
>>> class Car {
>>>      String make;
>>>      Integer year;
>>>      Dealer soldBy;
>>>      Person ownedBy;
>>> }
>>>
>>> class Dealer {
>>>     String name;
>>>     String address;
>>>     List<Car> vehiclesSold;
>>> }
>>>
>>> The above is a made-up example, intentionally using easy-to-understand
>>> references.
>>> But it does show that circular references can become very complex and
>>> across multiple chains.
>>>
>>> If users have to write Conversion classes for all of the above along
>>> with hash-map(s) that finds out IDs for the objects seen already,  writes
>>> them out as separate fields and then reads them back to replace them with
>>> actual objects, then it would be expecting too much from our clients IMO.
>>>
>>> Perhaps it would be more helpful if there was a more seamless way of
>>> getting this done automatically.
>>>
>>> SG
>>>
>>>
>>>
>>> On Wed, Jun 1, 2016 at 5:22 PM, Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>
>>>> Those aren't the datum classes, those are logical types that are added
>>>> to
>>>> the schema for your datum classes. The "referenceable" logical type is
>>>> applied to the class that gets replaced with an ID reference and points
>>>> to
>>>> the field to use for that ID. The "reference" logical type is applied to
>>>> the class that contains a referencable datum. Then there are conversions
>>>> that replace a referenceable with its ID.
>>>>
>>>> On Wed, Jun 1, 2016 at 4:52 PM, S G <sg...@gmail.com> wrote:
>>>>
>>>> > RB,
>>>> >
>>>> > The test TestCircularReferences shows that the classes having circular
>>>> > reference need to extend LogicalType.
>>>> > If every circular reference class has to do this, then I think this
>>>> would
>>>> > be a big limitation for actual classes used in production code
>>>> because they
>>>> > would be already extending other classes plus asking several other
>>>> teams to
>>>> > change their base-class would be difficult.
>>>> > Is there a way to do this without making the classes extend the
>>>> LogicalType
>>>> > ?
>>>> >
>>>> > SG
>>>> >
>>>> >
>>>> > On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rb...@netflix.com.invalid>
>>>> > wrote:
>>>> >
>>>> > > SG,
>>>> > >
>>>> > > The example/test I built uses logical types to remove the circular
>>>> > > reference when writing and restore it when reading. It doesn't look
>>>> like
>>>> > > your test adds logical types, so that's probably the issue.
>>>> > >
>>>> > > rb
>>>> > >
>>>> > > On Wed, Jun 1, 2016 at 1:51 PM, S G <sg...@gmail.com>
>>>> wrote:
>>>> > >
>>>> > > > Hey RB,
>>>> > > >
>>>> > > > I was trying out the circular refs with latest 1.8.1 version of
>>>> Avro
>>>> > and
>>>> > > it
>>>> > > > doesn't seem to be working out of the box for me.
>>>> > > > Perhaps I am missing something and would appreciate your help.
>>>> > > >
>>>> > > > Thanks,
>>>> > > > SG
>>>> > > >
>>>> > > >
>>>> > > > Here is my test code:
>>>> > > >
>>>> > > > public class CircularRefsTest
>>>> > > > {
>>>> > > >     @Test
>>>> > > >     public void testSerialization() throws Exception
>>>> > > >     {
>>>> > > >         // Create a circular linked-list
>>>> > > >         ListNode first = new ListNode("first");
>>>> > > >         ListNode second = new ListNode("second");
>>>> > > >         second.next = first;
>>>> > > >         first.next = second;
>>>> > > >
>>>> > > >         ReflectData rdata = ReflectData.AllowNull.get();
>>>> > > >
>>>> > > >         // Print Schema
>>>> > > >         Schema schema = rdata.getSchema(ListNode.class);
>>>> > > >         System.out.println(schema);
>>>> > > >
>>>> > > >         // Serialize
>>>> > > >         DatumWriter<ListNode> datumWriter = new
>>>> > > > ReflectDatumWriter<ListNode>(ListNode.class);
>>>> > > >         ByteArrayOutputStream byteArrayOutputStream = new
>>>> > > > ByteArrayOutputStream();
>>>> > > >         Encoder encoder =
>>>> > > > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
>>>> > > >         datumWriter.write(first, encoder);
>>>> > > >         encoder.flush();
>>>> > > >         byte[] bytes = byteArrayOutputStream.toByteArray();
>>>> > > >
>>>> > > >         assertTrue( bytes != null );
>>>> > > >     }
>>>> > > > }
>>>> > > >
>>>> > > > class ListNode
>>>> > > > {
>>>> > > >     String data;
>>>> > > >     ListNode next;
>>>> > > >
>>>> > > >     public ListNode() {
>>>> > > >     }
>>>> > > >     public ListNode(String data) {
>>>> > > >         this.data = data;
>>>> > > >     }
>>>> > > > }
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com>
>>>> wrote:
>>>> > > >
>>>> > > > > SG,
>>>> > > > >
>>>> > > > > The data ends up looking like this:
>>>> > > > >
>>>> > > > > {"id":1,"p":"parent data!","child":{"c":"child
>>>> > > > data!","parent":{"long":1}}}
>>>> > > > >
>>>> > > > > I produced that with avro-tools 1.7.6 and tojson.
>>>> > > > >
>>>> > > > > Here's the schema: {
>>>> > > > >   "type" : "record",
>>>> > > > >   "name" : "Parent",
>>>> > > > >   "fields" : [ {
>>>> > > > >     "name" : "id",
>>>> > > > >     "type" : "long"
>>>> > > > >   }, {
>>>> > > > >     "name" : "p",
>>>> > > > >     "type" : "string"
>>>> > > > >   }, {
>>>> > > > >     "name" : "child",
>>>> > > > >     "type" : {
>>>> > > > >       "type" : "record",
>>>> > > > >       "name" : "Child",
>>>> > > > >       "fields" : [ {
>>>> > > > >         "name" : "c",
>>>> > > > >         "type" : "string"
>>>> > > > >       }, {
>>>> > > > >         "name" : "parent",
>>>> > > > >         "type" : [ "null", "long", "Parent" ]
>>>> > > > >       } ],
>>>> > > > >       "logicalType" : "reference",
>>>> > > > >       "ref-field-name" : "parent"
>>>> > > > >     }
>>>> > > > >   } ],
>>>> > > > >   "logicalType" : "referenceable",
>>>> > > > >   "id-field-name" : "id"
>>>> > > > > }
>>>> > > > >
>>>> > > > > rb
>>>> > > > >
>>>> > > > >
>>>> > > > > On 05/28/2015 09:50 AM, S G wrote:
>>>> > > > >
>>>> > > > >> RB,
>>>> > > > >>
>>>> > > > >> Could you please attach the schema and the JSON serialized
>>>> output
>>>> > from
>>>> > > > >> your
>>>> > > > >> test-code as well?
>>>> > > > >> My build environment is currently broken as I am grappling
>>>> with some
>>>> > > > Java
>>>> > > > >> 8
>>>> > > > >> update issues.
>>>> > > > >>
>>>> > > > >> Thanks
>>>> > > > >> SG
>>>> > > > >>
>>>> > > > >>
>>>> > > > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com>
>>>> > wrote:
>>>> > > > >>
>>>> > > > >> SG,
>>>> > > > >>>
>>>> > > > >>> Now that logical types are in, I had some time to look at this
>>>> > issue.
>>>> > > > >>> Thanks for your patience on this.
>>>> > > > >>>
>>>> > > > >>> When I started looking at the use case, this began to look
>>>> very
>>>> > much
>>>> > > > like
>>>> > > > >>> a logical type issue. (I know, I've been saying that a lot.)
>>>> When
>>>> > you
>>>> > > > >>> write, you replace any referenced object with its id. When you
>>>> > read,
>>>> > > > you
>>>> > > > >>> replace those ids with the correct object. I went ahead and
>>>> > > implemented
>>>> > > > >>> this using 2 logical types: Referenceable for an object with
>>>> an id,
>>>> > > and
>>>> > > > >>> Reference for a record with a field that references another
>>>> object
>>>> > by
>>>> > > > id.
>>>> > > > >>>
>>>> > > > >>> There were some trade-offs to this approach. For example, I
>>>> had to
>>>> > > use
>>>> > > > a
>>>> > > > >>> logical type for the object containing the reference rather
>>>> than
>>>> > for
>>>> > > > the
>>>> > > > >>> reference itself because I used a callback to set the object
>>>> once
>>>> > it
>>>> > > is
>>>> > > > >>> found. That happens because children with references to a
>>>> parent
>>>> > are
>>>> > > > >>> deserialized completely first. The parent is the last object
>>>> to be
>>>> > > > >>> assembled and passed to the logical type conversion.
>>>> > > > >>>
>>>> > > > >>> A bigger issue was that logical types are currently
>>>> conservative
>>>> > and
>>>> > > > >>> don't
>>>> > > > >>> overlap with reflect types or anything that sets java-class.
>>>> That
>>>> > > means
>>>> > > > >>> that this currently only works with generic types. But, I'd
>>>> rather
>>>> > > make
>>>> > > > >>> logical types work for reflect than add more custom code to
>>>> support
>>>> > > > this.
>>>> > > > >>> Does that sound reasonable?
>>>> > > > >>>
>>>> > > > >>> I'm attaching a diff with the working test code so you can
>>>> take a
>>>> > > look
>>>> > > > at
>>>> > > > >>> the approach. Let me know what you are thinking.
>>>> > > > >>>
>>>> > > > >>> rb
>>>> > > > >>>
>>>> > > > >>> On 05/20/2015 12:05 PM, S G wrote:
>>>> > > > >>>
>>>> > > > >>> I am requesting some help with AVRO-695.
>>>> > > > >>>> Here are some pieces from the last conversation.
>>>> > > > >>>>
>>>> > > > >>>>
>>>> > > > >>>> Doug Cutting
>>>> > > > >>>> <
>>>> > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting
>>>> >
>>>> > > > >>>> added
>>>> > > > >>>> a comment - 02/Oct/14 21:19
>>>> > > > >>>>
>>>> > > > >>>> Here's a modified version of the patch. It moves all
>>>> significant
>>>> > > > changes
>>>> > > > >>>> to
>>>> > > > >>>> reflect. Since reflect is the only implementation that can
>>>> > generate
>>>> > > an
>>>> > > > >>>> appropriate schema, changes should be confined to there.
>>>> > > > >>>>
>>>> > > > >>>> The tests need to be updated, as they still assume generic.
>>>> > > > >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
>>>> > > > >>>> <
>>>> > > > >>>>
>>>> > > > >>>>
>>>> > > >
>>>> > >
>>>> > https://issues.apache.org/jira/browse/AVRO-695?
>>>> focusedCommentId=14286370&page=com.atlassian.jira.
>>>> plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
>>>> > > > >>>>
>>>> > > > >>>>>
>>>> > > > >>>>> Sachin Goyal
>>>> > > > >>>> <
>>>> > > > >>>>
>>>> > > >
>>>> > https://issues.apache.org/jira/secure/ViewProfile.jspa?
>>>> name=sachingoyal
>>>> > > > >>>> >
>>>> > > > >>>> added
>>>> > > > >>>> a comment - 21/Jan/15 21:56
>>>> > > > >>>>
>>>> > > > >>>> Here is a patch with the updated test-cases.
>>>> > > > >>>> I also confirm that all my changes are there in the patch.
>>>> > > > >>>>
>>>> > > > >>>>
>>>> > > > >>>> The patch was submitted in June 2014 and was very hot till
>>>> October
>>>> > > > 2014.
>>>> > > > >>>> Since then, there has been no action on this even though I
>>>> have
>>>> > sent
>>>> > > > >>>> many
>>>> > > > >>>> reminders in this group.
>>>> > > > >>>>
>>>> > > > >>>> I understand that everyone is very busy with their own stuff
>>>> but I
>>>> > > > would
>>>> > > > >>>> really appreciate if someone could help a fellow engineer in
>>>> > getting
>>>> > > > his
>>>> > > > >>>> patch accepted.
>>>> > > > >>>> It would encourage more participation as well as help people
>>>> > wanting
>>>> > > > to
>>>> > > > >>>> use
>>>> > > > >>>> Avro for circular references.
>>>> > > > >>>>
>>>> > > > >>>> Regards
>>>> > > > >>>> SG
>>>> > > > >>>>
>>>> > > > >>>>
>>>> > > > >>>
>>>> > > > >>> --
>>>> > > > >>> Ryan Blue
>>>> > > > >>> Software Engineer
>>>> > > > >>> Cloudera, Inc.
>>>> > > > >>>
>>>> > > > >>>
>>>> > > > >>
>>>> > > > >
>>>> > > > > --
>>>> > > > > Ryan Blue
>>>> > > > > Software Engineer
>>>> > > > > Cloudera, Inc.
>>>> > > > >
>>>> > > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Ryan Blue
>>>> > > Software Engineer
>>>> > > Netflix
>>>> > >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>>
>>
>

Re: Need review/merges for couple of pull requests

Posted by S G <sg...@gmail.com>.
Doug Cutting, Ryan Blue,

I have worked with both of you on this ticket and would appreciate your
thoughts regarding circular references.

If you think the current circular references' implementation in Avro is
indeed easy for clients to use (i.e. they do not have to write all those
extra classes), then please help me by providing some test-case/code
reference.

Thanks
Sachin


On Mon, Jun 6, 2016 at 9:18 AM, S G <sg...@gmail.com> wrote:

>
> As far as I have understood the attached testcase, I think the solution in
> https://github.com/apache/avro/pull/23/files (JIRA:
> https://issues.apache.org/jira/browse/AVRO-695) is more seamless from a
> user's perspective because the user does not have to write any classes
> extending LogicalType or Conversion.
>
> Writing those classes could become a bit of a challenge:
> 1) If there are several circular references in the code (something very
> common in the Hibernate world).
> 2) It is a pain to change these classes to reflect any changes in the
> classes being serialized.
> 3) When the user decides to write a conversion/logical-type class for some
> record, he has to handle serialization for all fields of the class even
> though all he wanted to handle was circular references (This is what I have
> understood but I could be wrong here).
>
> In my pull request for AVRO-695, the only thing a user need to do is to
> call setResolvingCircularRefs(true) on ReflectData. Then every record is
> converted into a union of the record-type and the newly added class
> CircularRef. This new class stores the actual reference ID on encountering
> a circular ref. This is really convenient for the user and there is no
> penalty on performance. Even if there is a problem, the user can always go
> back to writing LogicalType and Conversion classes.
>
> My request would be to consider this solution for handling circular
> references instead of asking the user to write a bunch of other classes.
> If the solution sounds acceptable, then I can bring the patch up-to-date
> with the latest master branch and reopen the pull request.
>
> Cheers,
> SG
>
> On Wed, Jun 1, 2016 at 5:36 PM, S G <sg...@gmail.com> wrote:
>
>> Thanks RB,
>>
>> But I am not sure I follow that example (Probably because there is no
>> actual datum class there?).
>>
>> Consider a complex code running in production with lots of circular
>> references.
>> Something like:
>>
>> class Home {
>>    String address;
>>    Integer zipCode;
>>    List <Person> residents;
>> }
>>
>> class Person {
>>     String name;
>>     Person spouse;
>>     House home;
>>     Map <Car, Dealer> vehiclesOwned;
>> }
>>
>> class Car {
>>      String make;
>>      Integer year;
>>      Dealer soldBy;
>>      Person ownedBy;
>> }
>>
>> class Dealer {
>>     String name;
>>     String address;
>>     List<Car> vehiclesSold;
>> }
>>
>> The above is a made-up example, intentionally using easy-to-understand
>> references.
>> But it does show that circular references can become very complex and
>> across multiple chains.
>>
>> If users have to write Conversion classes for all of the above along with
>> hash-map(s) that finds out IDs for the objects seen already,  writes them
>> out as separate fields and then reads them back to replace them with actual
>> objects, then it would be expecting too much from our clients IMO.
>>
>> Perhaps it would be more helpful if there was a more seamless way of
>> getting this done automatically.
>>
>> SG
>>
>>
>>
>> On Wed, Jun 1, 2016 at 5:22 PM, Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> Those aren't the datum classes, those are logical types that are added to
>>> the schema for your datum classes. The "referenceable" logical type is
>>> applied to the class that gets replaced with an ID reference and points
>>> to
>>> the field to use for that ID. The "reference" logical type is applied to
>>> the class that contains a referencable datum. Then there are conversions
>>> that replace a referenceable with its ID.
>>>
>>> On Wed, Jun 1, 2016 at 4:52 PM, S G <sg...@gmail.com> wrote:
>>>
>>> > RB,
>>> >
>>> > The test TestCircularReferences shows that the classes having circular
>>> > reference need to extend LogicalType.
>>> > If every circular reference class has to do this, then I think this
>>> would
>>> > be a big limitation for actual classes used in production code because
>>> they
>>> > would be already extending other classes plus asking several other
>>> teams to
>>> > change their base-class would be difficult.
>>> > Is there a way to do this without making the classes extend the
>>> LogicalType
>>> > ?
>>> >
>>> > SG
>>> >
>>> >
>>> > On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rb...@netflix.com.invalid>
>>> > wrote:
>>> >
>>> > > SG,
>>> > >
>>> > > The example/test I built uses logical types to remove the circular
>>> > > reference when writing and restore it when reading. It doesn't look
>>> like
>>> > > your test adds logical types, so that's probably the issue.
>>> > >
>>> > > rb
>>> > >
>>> > > On Wed, Jun 1, 2016 at 1:51 PM, S G <sg...@gmail.com>
>>> wrote:
>>> > >
>>> > > > Hey RB,
>>> > > >
>>> > > > I was trying out the circular refs with latest 1.8.1 version of
>>> Avro
>>> > and
>>> > > it
>>> > > > doesn't seem to be working out of the box for me.
>>> > > > Perhaps I am missing something and would appreciate your help.
>>> > > >
>>> > > > Thanks,
>>> > > > SG
>>> > > >
>>> > > >
>>> > > > Here is my test code:
>>> > > >
>>> > > > public class CircularRefsTest
>>> > > > {
>>> > > >     @Test
>>> > > >     public void testSerialization() throws Exception
>>> > > >     {
>>> > > >         // Create a circular linked-list
>>> > > >         ListNode first = new ListNode("first");
>>> > > >         ListNode second = new ListNode("second");
>>> > > >         second.next = first;
>>> > > >         first.next = second;
>>> > > >
>>> > > >         ReflectData rdata = ReflectData.AllowNull.get();
>>> > > >
>>> > > >         // Print Schema
>>> > > >         Schema schema = rdata.getSchema(ListNode.class);
>>> > > >         System.out.println(schema);
>>> > > >
>>> > > >         // Serialize
>>> > > >         DatumWriter<ListNode> datumWriter = new
>>> > > > ReflectDatumWriter<ListNode>(ListNode.class);
>>> > > >         ByteArrayOutputStream byteArrayOutputStream = new
>>> > > > ByteArrayOutputStream();
>>> > > >         Encoder encoder =
>>> > > > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
>>> > > >         datumWriter.write(first, encoder);
>>> > > >         encoder.flush();
>>> > > >         byte[] bytes = byteArrayOutputStream.toByteArray();
>>> > > >
>>> > > >         assertTrue( bytes != null );
>>> > > >     }
>>> > > > }
>>> > > >
>>> > > > class ListNode
>>> > > > {
>>> > > >     String data;
>>> > > >     ListNode next;
>>> > > >
>>> > > >     public ListNode() {
>>> > > >     }
>>> > > >     public ListNode(String data) {
>>> > > >         this.data = data;
>>> > > >     }
>>> > > > }
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com>
>>> wrote:
>>> > > >
>>> > > > > SG,
>>> > > > >
>>> > > > > The data ends up looking like this:
>>> > > > >
>>> > > > > {"id":1,"p":"parent data!","child":{"c":"child
>>> > > > data!","parent":{"long":1}}}
>>> > > > >
>>> > > > > I produced that with avro-tools 1.7.6 and tojson.
>>> > > > >
>>> > > > > Here's the schema: {
>>> > > > >   "type" : "record",
>>> > > > >   "name" : "Parent",
>>> > > > >   "fields" : [ {
>>> > > > >     "name" : "id",
>>> > > > >     "type" : "long"
>>> > > > >   }, {
>>> > > > >     "name" : "p",
>>> > > > >     "type" : "string"
>>> > > > >   }, {
>>> > > > >     "name" : "child",
>>> > > > >     "type" : {
>>> > > > >       "type" : "record",
>>> > > > >       "name" : "Child",
>>> > > > >       "fields" : [ {
>>> > > > >         "name" : "c",
>>> > > > >         "type" : "string"
>>> > > > >       }, {
>>> > > > >         "name" : "parent",
>>> > > > >         "type" : [ "null", "long", "Parent" ]
>>> > > > >       } ],
>>> > > > >       "logicalType" : "reference",
>>> > > > >       "ref-field-name" : "parent"
>>> > > > >     }
>>> > > > >   } ],
>>> > > > >   "logicalType" : "referenceable",
>>> > > > >   "id-field-name" : "id"
>>> > > > > }
>>> > > > >
>>> > > > > rb
>>> > > > >
>>> > > > >
>>> > > > > On 05/28/2015 09:50 AM, S G wrote:
>>> > > > >
>>> > > > >> RB,
>>> > > > >>
>>> > > > >> Could you please attach the schema and the JSON serialized
>>> output
>>> > from
>>> > > > >> your
>>> > > > >> test-code as well?
>>> > > > >> My build environment is currently broken as I am grappling with
>>> some
>>> > > > Java
>>> > > > >> 8
>>> > > > >> update issues.
>>> > > > >>
>>> > > > >> Thanks
>>> > > > >> SG
>>> > > > >>
>>> > > > >>
>>> > > > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com>
>>> > wrote:
>>> > > > >>
>>> > > > >> SG,
>>> > > > >>>
>>> > > > >>> Now that logical types are in, I had some time to look at this
>>> > issue.
>>> > > > >>> Thanks for your patience on this.
>>> > > > >>>
>>> > > > >>> When I started looking at the use case, this began to look very
>>> > much
>>> > > > like
>>> > > > >>> a logical type issue. (I know, I've been saying that a lot.)
>>> When
>>> > you
>>> > > > >>> write, you replace any referenced object with its id. When you
>>> > read,
>>> > > > you
>>> > > > >>> replace those ids with the correct object. I went ahead and
>>> > > implemented
>>> > > > >>> this using 2 logical types: Referenceable for an object with
>>> an id,
>>> > > and
>>> > > > >>> Reference for a record with a field that references another
>>> object
>>> > by
>>> > > > id.
>>> > > > >>>
>>> > > > >>> There were some trade-offs to this approach. For example, I
>>> had to
>>> > > use
>>> > > > a
>>> > > > >>> logical type for the object containing the reference rather
>>> than
>>> > for
>>> > > > the
>>> > > > >>> reference itself because I used a callback to set the object
>>> once
>>> > it
>>> > > is
>>> > > > >>> found. That happens because children with references to a
>>> parent
>>> > are
>>> > > > >>> deserialized completely first. The parent is the last object
>>> to be
>>> > > > >>> assembled and passed to the logical type conversion.
>>> > > > >>>
>>> > > > >>> A bigger issue was that logical types are currently
>>> conservative
>>> > and
>>> > > > >>> don't
>>> > > > >>> overlap with reflect types or anything that sets java-class.
>>> That
>>> > > means
>>> > > > >>> that this currently only works with generic types. But, I'd
>>> rather
>>> > > make
>>> > > > >>> logical types work for reflect than add more custom code to
>>> support
>>> > > > this.
>>> > > > >>> Does that sound reasonable?
>>> > > > >>>
>>> > > > >>> I'm attaching a diff with the working test code so you can
>>> take a
>>> > > look
>>> > > > at
>>> > > > >>> the approach. Let me know what you are thinking.
>>> > > > >>>
>>> > > > >>> rb
>>> > > > >>>
>>> > > > >>> On 05/20/2015 12:05 PM, S G wrote:
>>> > > > >>>
>>> > > > >>> I am requesting some help with AVRO-695.
>>> > > > >>>> Here are some pieces from the last conversation.
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>> Doug Cutting
>>> > > > >>>> <
>>> > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting>
>>> > > > >>>> added
>>> > > > >>>> a comment - 02/Oct/14 21:19
>>> > > > >>>>
>>> > > > >>>> Here's a modified version of the patch. It moves all
>>> significant
>>> > > > changes
>>> > > > >>>> to
>>> > > > >>>> reflect. Since reflect is the only implementation that can
>>> > generate
>>> > > an
>>> > > > >>>> appropriate schema, changes should be confined to there.
>>> > > > >>>>
>>> > > > >>>> The tests need to be updated, as they still assume generic.
>>> > > > >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
>>> > > > >>>> <
>>> > > > >>>>
>>> > > > >>>>
>>> > > >
>>> > >
>>> >
>>> https://issues.apache.org/jira/browse/AVRO-695?focusedCommentId=14286370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
>>> > > > >>>>
>>> > > > >>>>>
>>> > > > >>>>> Sachin Goyal
>>> > > > >>>> <
>>> > > > >>>>
>>> > > >
>>> >
>>> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sachingoyal
>>> > > > >>>> >
>>> > > > >>>> added
>>> > > > >>>> a comment - 21/Jan/15 21:56
>>> > > > >>>>
>>> > > > >>>> Here is a patch with the updated test-cases.
>>> > > > >>>> I also confirm that all my changes are there in the patch.
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>> The patch was submitted in June 2014 and was very hot till
>>> October
>>> > > > 2014.
>>> > > > >>>> Since then, there has been no action on this even though I
>>> have
>>> > sent
>>> > > > >>>> many
>>> > > > >>>> reminders in this group.
>>> > > > >>>>
>>> > > > >>>> I understand that everyone is very busy with their own stuff
>>> but I
>>> > > > would
>>> > > > >>>> really appreciate if someone could help a fellow engineer in
>>> > getting
>>> > > > his
>>> > > > >>>> patch accepted.
>>> > > > >>>> It would encourage more participation as well as help people
>>> > wanting
>>> > > > to
>>> > > > >>>> use
>>> > > > >>>> Avro for circular references.
>>> > > > >>>>
>>> > > > >>>> Regards
>>> > > > >>>> SG
>>> > > > >>>>
>>> > > > >>>>
>>> > > > >>>
>>> > > > >>> --
>>> > > > >>> Ryan Blue
>>> > > > >>> Software Engineer
>>> > > > >>> Cloudera, Inc.
>>> > > > >>>
>>> > > > >>>
>>> > > > >>
>>> > > > >
>>> > > > > --
>>> > > > > Ryan Blue
>>> > > > > Software Engineer
>>> > > > > Cloudera, Inc.
>>> > > > >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Ryan Blue
>>> > > Software Engineer
>>> > > Netflix
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>

Re: Need review/merges for couple of pull requests

Posted by S G <sg...@gmail.com>.
As far as I have understood the attached testcase, I think the solution in
https://github.com/apache/avro/pull/23/files (JIRA:
https://issues.apache.org/jira/browse/AVRO-695) is more seamless from a
user's perspective because the user does not have to write any classes
extending LogicalType or Conversion.

Writing those classes could become a bit of a challenge:
1) If there are several circular references in the code (something very
common in the Hibernate world).
2) It is a pain to change these classes to reflect any changes in the
classes being serialized.
3) When the user decides to write a conversion/logical-type class for some
record, he has to handle serialization for all fields of the class even
though all he wanted to handle was circular references (This is what I have
understood but I could be wrong here).

In my pull request for AVRO-695, the only thing a user need to do is to
call setResolvingCircularRefs(true) on ReflectData. Then every record is
converted into a union of the record-type and the newly added class
CircularRef. This new class stores the actual reference ID on encountering
a circular ref. This is really convenient for the user and there is no
penalty on performance. Even if there is a problem, the user can always go
back to writing LogicalType and Conversion classes.

My request would be to consider this solution for handling circular
references instead of asking the user to write a bunch of other classes.
If the solution sounds acceptable, then I can bring the patch up-to-date
with the latest master branch and reopen the pull request.

Cheers,
SG

On Wed, Jun 1, 2016 at 5:36 PM, S G <sg...@gmail.com> wrote:

> Thanks RB,
>
> But I am not sure I follow that example (Probably because there is no
> actual datum class there?).
>
> Consider a complex code running in production with lots of circular
> references.
> Something like:
>
> class Home {
>    String address;
>    Integer zipCode;
>    List <Person> residents;
> }
>
> class Person {
>     String name;
>     Person spouse;
>     House home;
>     Map <Car, Dealer> vehiclesOwned;
> }
>
> class Car {
>      String make;
>      Integer year;
>      Dealer soldBy;
>      Person ownedBy;
> }
>
> class Dealer {
>     String name;
>     String address;
>     List<Car> vehiclesSold;
> }
>
> The above is a made-up example, intentionally using easy-to-understand
> references.
> But it does show that circular references can become very complex and
> across multiple chains.
>
> If users have to write Conversion classes for all of the above along with
> hash-map(s) that finds out IDs for the objects seen already,  writes them
> out as separate fields and then reads them back to replace them with actual
> objects, then it would be expecting too much from our clients IMO.
>
> Perhaps it would be more helpful if there was a more seamless way of
> getting this done automatically.
>
> SG
>
>
>
> On Wed, Jun 1, 2016 at 5:22 PM, Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Those aren't the datum classes, those are logical types that are added to
>> the schema for your datum classes. The "referenceable" logical type is
>> applied to the class that gets replaced with an ID reference and points to
>> the field to use for that ID. The "reference" logical type is applied to
>> the class that contains a referencable datum. Then there are conversions
>> that replace a referenceable with its ID.
>>
>> On Wed, Jun 1, 2016 at 4:52 PM, S G <sg...@gmail.com> wrote:
>>
>> > RB,
>> >
>> > The test TestCircularReferences shows that the classes having circular
>> > reference need to extend LogicalType.
>> > If every circular reference class has to do this, then I think this
>> would
>> > be a big limitation for actual classes used in production code because
>> they
>> > would be already extending other classes plus asking several other
>> teams to
>> > change their base-class would be difficult.
>> > Is there a way to do this without making the classes extend the
>> LogicalType
>> > ?
>> >
>> > SG
>> >
>> >
>> > On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rb...@netflix.com.invalid>
>> > wrote:
>> >
>> > > SG,
>> > >
>> > > The example/test I built uses logical types to remove the circular
>> > > reference when writing and restore it when reading. It doesn't look
>> like
>> > > your test adds logical types, so that's probably the issue.
>> > >
>> > > rb
>> > >
>> > > On Wed, Jun 1, 2016 at 1:51 PM, S G <sg...@gmail.com>
>> wrote:
>> > >
>> > > > Hey RB,
>> > > >
>> > > > I was trying out the circular refs with latest 1.8.1 version of Avro
>> > and
>> > > it
>> > > > doesn't seem to be working out of the box for me.
>> > > > Perhaps I am missing something and would appreciate your help.
>> > > >
>> > > > Thanks,
>> > > > SG
>> > > >
>> > > >
>> > > > Here is my test code:
>> > > >
>> > > > public class CircularRefsTest
>> > > > {
>> > > >     @Test
>> > > >     public void testSerialization() throws Exception
>> > > >     {
>> > > >         // Create a circular linked-list
>> > > >         ListNode first = new ListNode("first");
>> > > >         ListNode second = new ListNode("second");
>> > > >         second.next = first;
>> > > >         first.next = second;
>> > > >
>> > > >         ReflectData rdata = ReflectData.AllowNull.get();
>> > > >
>> > > >         // Print Schema
>> > > >         Schema schema = rdata.getSchema(ListNode.class);
>> > > >         System.out.println(schema);
>> > > >
>> > > >         // Serialize
>> > > >         DatumWriter<ListNode> datumWriter = new
>> > > > ReflectDatumWriter<ListNode>(ListNode.class);
>> > > >         ByteArrayOutputStream byteArrayOutputStream = new
>> > > > ByteArrayOutputStream();
>> > > >         Encoder encoder =
>> > > > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
>> > > >         datumWriter.write(first, encoder);
>> > > >         encoder.flush();
>> > > >         byte[] bytes = byteArrayOutputStream.toByteArray();
>> > > >
>> > > >         assertTrue( bytes != null );
>> > > >     }
>> > > > }
>> > > >
>> > > > class ListNode
>> > > > {
>> > > >     String data;
>> > > >     ListNode next;
>> > > >
>> > > >     public ListNode() {
>> > > >     }
>> > > >     public ListNode(String data) {
>> > > >         this.data = data;
>> > > >     }
>> > > > }
>> > > >
>> > > >
>> > > >
>> > > > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com>
>> wrote:
>> > > >
>> > > > > SG,
>> > > > >
>> > > > > The data ends up looking like this:
>> > > > >
>> > > > > {"id":1,"p":"parent data!","child":{"c":"child
>> > > > data!","parent":{"long":1}}}
>> > > > >
>> > > > > I produced that with avro-tools 1.7.6 and tojson.
>> > > > >
>> > > > > Here's the schema: {
>> > > > >   "type" : "record",
>> > > > >   "name" : "Parent",
>> > > > >   "fields" : [ {
>> > > > >     "name" : "id",
>> > > > >     "type" : "long"
>> > > > >   }, {
>> > > > >     "name" : "p",
>> > > > >     "type" : "string"
>> > > > >   }, {
>> > > > >     "name" : "child",
>> > > > >     "type" : {
>> > > > >       "type" : "record",
>> > > > >       "name" : "Child",
>> > > > >       "fields" : [ {
>> > > > >         "name" : "c",
>> > > > >         "type" : "string"
>> > > > >       }, {
>> > > > >         "name" : "parent",
>> > > > >         "type" : [ "null", "long", "Parent" ]
>> > > > >       } ],
>> > > > >       "logicalType" : "reference",
>> > > > >       "ref-field-name" : "parent"
>> > > > >     }
>> > > > >   } ],
>> > > > >   "logicalType" : "referenceable",
>> > > > >   "id-field-name" : "id"
>> > > > > }
>> > > > >
>> > > > > rb
>> > > > >
>> > > > >
>> > > > > On 05/28/2015 09:50 AM, S G wrote:
>> > > > >
>> > > > >> RB,
>> > > > >>
>> > > > >> Could you please attach the schema and the JSON serialized output
>> > from
>> > > > >> your
>> > > > >> test-code as well?
>> > > > >> My build environment is currently broken as I am grappling with
>> some
>> > > > Java
>> > > > >> 8
>> > > > >> update issues.
>> > > > >>
>> > > > >> Thanks
>> > > > >> SG
>> > > > >>
>> > > > >>
>> > > > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com>
>> > wrote:
>> > > > >>
>> > > > >> SG,
>> > > > >>>
>> > > > >>> Now that logical types are in, I had some time to look at this
>> > issue.
>> > > > >>> Thanks for your patience on this.
>> > > > >>>
>> > > > >>> When I started looking at the use case, this began to look very
>> > much
>> > > > like
>> > > > >>> a logical type issue. (I know, I've been saying that a lot.)
>> When
>> > you
>> > > > >>> write, you replace any referenced object with its id. When you
>> > read,
>> > > > you
>> > > > >>> replace those ids with the correct object. I went ahead and
>> > > implemented
>> > > > >>> this using 2 logical types: Referenceable for an object with an
>> id,
>> > > and
>> > > > >>> Reference for a record with a field that references another
>> object
>> > by
>> > > > id.
>> > > > >>>
>> > > > >>> There were some trade-offs to this approach. For example, I had
>> to
>> > > use
>> > > > a
>> > > > >>> logical type for the object containing the reference rather than
>> > for
>> > > > the
>> > > > >>> reference itself because I used a callback to set the object
>> once
>> > it
>> > > is
>> > > > >>> found. That happens because children with references to a parent
>> > are
>> > > > >>> deserialized completely first. The parent is the last object to
>> be
>> > > > >>> assembled and passed to the logical type conversion.
>> > > > >>>
>> > > > >>> A bigger issue was that logical types are currently conservative
>> > and
>> > > > >>> don't
>> > > > >>> overlap with reflect types or anything that sets java-class.
>> That
>> > > means
>> > > > >>> that this currently only works with generic types. But, I'd
>> rather
>> > > make
>> > > > >>> logical types work for reflect than add more custom code to
>> support
>> > > > this.
>> > > > >>> Does that sound reasonable?
>> > > > >>>
>> > > > >>> I'm attaching a diff with the working test code so you can take
>> a
>> > > look
>> > > > at
>> > > > >>> the approach. Let me know what you are thinking.
>> > > > >>>
>> > > > >>> rb
>> > > > >>>
>> > > > >>> On 05/20/2015 12:05 PM, S G wrote:
>> > > > >>>
>> > > > >>> I am requesting some help with AVRO-695.
>> > > > >>>> Here are some pieces from the last conversation.
>> > > > >>>>
>> > > > >>>>
>> > > > >>>> Doug Cutting
>> > > > >>>> <
>> > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting>
>> > > > >>>> added
>> > > > >>>> a comment - 02/Oct/14 21:19
>> > > > >>>>
>> > > > >>>> Here's a modified version of the patch. It moves all
>> significant
>> > > > changes
>> > > > >>>> to
>> > > > >>>> reflect. Since reflect is the only implementation that can
>> > generate
>> > > an
>> > > > >>>> appropriate schema, changes should be confined to there.
>> > > > >>>>
>> > > > >>>> The tests need to be updated, as they still assume generic.
>> > > > >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
>> > > > >>>> <
>> > > > >>>>
>> > > > >>>>
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/browse/AVRO-695?focusedCommentId=14286370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
>> > > > >>>>
>> > > > >>>>>
>> > > > >>>>> Sachin Goyal
>> > > > >>>> <
>> > > > >>>>
>> > > >
>> > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sachingoyal
>> > > > >>>> >
>> > > > >>>> added
>> > > > >>>> a comment - 21/Jan/15 21:56
>> > > > >>>>
>> > > > >>>> Here is a patch with the updated test-cases.
>> > > > >>>> I also confirm that all my changes are there in the patch.
>> > > > >>>>
>> > > > >>>>
>> > > > >>>> The patch was submitted in June 2014 and was very hot till
>> October
>> > > > 2014.
>> > > > >>>> Since then, there has been no action on this even though I have
>> > sent
>> > > > >>>> many
>> > > > >>>> reminders in this group.
>> > > > >>>>
>> > > > >>>> I understand that everyone is very busy with their own stuff
>> but I
>> > > > would
>> > > > >>>> really appreciate if someone could help a fellow engineer in
>> > getting
>> > > > his
>> > > > >>>> patch accepted.
>> > > > >>>> It would encourage more participation as well as help people
>> > wanting
>> > > > to
>> > > > >>>> use
>> > > > >>>> Avro for circular references.
>> > > > >>>>
>> > > > >>>> Regards
>> > > > >>>> SG
>> > > > >>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>> --
>> > > > >>> Ryan Blue
>> > > > >>> Software Engineer
>> > > > >>> Cloudera, Inc.
>> > > > >>>
>> > > > >>>
>> > > > >>
>> > > > >
>> > > > > --
>> > > > > Ryan Blue
>> > > > > Software Engineer
>> > > > > Cloudera, Inc.
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Ryan Blue
>> > > Software Engineer
>> > > Netflix
>> > >
>> >
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>

Re: Need review/merges for couple of pull requests

Posted by S G <sg...@gmail.com>.
Thanks RB,

But I am not sure I follow that example (Probably because there is no
actual datum class there?).

Consider a complex code running in production with lots of circular
references.
Something like:

class Home {
   String address;
   Integer zipCode;
   List <Person> residents;
}

class Person {
    String name;
    Person spouse;
    House home;
    Map <Car, Dealer> vehiclesOwned;
}

class Car {
     String make;
     Integer year;
     Dealer soldBy;
     Person ownedBy;
}

class Dealer {
    String name;
    String address;
    List<Car> vehiclesSold;
}

The above is a made-up example, intentionally using easy-to-understand
references.
But it does show that circular references can become very complex and
across multiple chains.

If users have to write Conversion classes for all of the above along with
hash-map(s) that finds out IDs for the objects seen already,  writes them
out as separate fields and then reads them back to replace them with actual
objects, then it would be expecting too much from our clients IMO.

Perhaps it would be more helpful if there was a more seamless way of
getting this done automatically.

SG



On Wed, Jun 1, 2016 at 5:22 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:

> Those aren't the datum classes, those are logical types that are added to
> the schema for your datum classes. The "referenceable" logical type is
> applied to the class that gets replaced with an ID reference and points to
> the field to use for that ID. The "reference" logical type is applied to
> the class that contains a referencable datum. Then there are conversions
> that replace a referenceable with its ID.
>
> On Wed, Jun 1, 2016 at 4:52 PM, S G <sg...@gmail.com> wrote:
>
> > RB,
> >
> > The test TestCircularReferences shows that the classes having circular
> > reference need to extend LogicalType.
> > If every circular reference class has to do this, then I think this would
> > be a big limitation for actual classes used in production code because
> they
> > would be already extending other classes plus asking several other teams
> to
> > change their base-class would be difficult.
> > Is there a way to do this without making the classes extend the
> LogicalType
> > ?
> >
> > SG
> >
> >
> > On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rb...@netflix.com.invalid>
> > wrote:
> >
> > > SG,
> > >
> > > The example/test I built uses logical types to remove the circular
> > > reference when writing and restore it when reading. It doesn't look
> like
> > > your test adds logical types, so that's probably the issue.
> > >
> > > rb
> > >
> > > On Wed, Jun 1, 2016 at 1:51 PM, S G <sg...@gmail.com> wrote:
> > >
> > > > Hey RB,
> > > >
> > > > I was trying out the circular refs with latest 1.8.1 version of Avro
> > and
> > > it
> > > > doesn't seem to be working out of the box for me.
> > > > Perhaps I am missing something and would appreciate your help.
> > > >
> > > > Thanks,
> > > > SG
> > > >
> > > >
> > > > Here is my test code:
> > > >
> > > > public class CircularRefsTest
> > > > {
> > > >     @Test
> > > >     public void testSerialization() throws Exception
> > > >     {
> > > >         // Create a circular linked-list
> > > >         ListNode first = new ListNode("first");
> > > >         ListNode second = new ListNode("second");
> > > >         second.next = first;
> > > >         first.next = second;
> > > >
> > > >         ReflectData rdata = ReflectData.AllowNull.get();
> > > >
> > > >         // Print Schema
> > > >         Schema schema = rdata.getSchema(ListNode.class);
> > > >         System.out.println(schema);
> > > >
> > > >         // Serialize
> > > >         DatumWriter<ListNode> datumWriter = new
> > > > ReflectDatumWriter<ListNode>(ListNode.class);
> > > >         ByteArrayOutputStream byteArrayOutputStream = new
> > > > ByteArrayOutputStream();
> > > >         Encoder encoder =
> > > > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
> > > >         datumWriter.write(first, encoder);
> > > >         encoder.flush();
> > > >         byte[] bytes = byteArrayOutputStream.toByteArray();
> > > >
> > > >         assertTrue( bytes != null );
> > > >     }
> > > > }
> > > >
> > > > class ListNode
> > > > {
> > > >     String data;
> > > >     ListNode next;
> > > >
> > > >     public ListNode() {
> > > >     }
> > > >     public ListNode(String data) {
> > > >         this.data = data;
> > > >     }
> > > > }
> > > >
> > > >
> > > >
> > > > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com>
> wrote:
> > > >
> > > > > SG,
> > > > >
> > > > > The data ends up looking like this:
> > > > >
> > > > > {"id":1,"p":"parent data!","child":{"c":"child
> > > > data!","parent":{"long":1}}}
> > > > >
> > > > > I produced that with avro-tools 1.7.6 and tojson.
> > > > >
> > > > > Here's the schema: {
> > > > >   "type" : "record",
> > > > >   "name" : "Parent",
> > > > >   "fields" : [ {
> > > > >     "name" : "id",
> > > > >     "type" : "long"
> > > > >   }, {
> > > > >     "name" : "p",
> > > > >     "type" : "string"
> > > > >   }, {
> > > > >     "name" : "child",
> > > > >     "type" : {
> > > > >       "type" : "record",
> > > > >       "name" : "Child",
> > > > >       "fields" : [ {
> > > > >         "name" : "c",
> > > > >         "type" : "string"
> > > > >       }, {
> > > > >         "name" : "parent",
> > > > >         "type" : [ "null", "long", "Parent" ]
> > > > >       } ],
> > > > >       "logicalType" : "reference",
> > > > >       "ref-field-name" : "parent"
> > > > >     }
> > > > >   } ],
> > > > >   "logicalType" : "referenceable",
> > > > >   "id-field-name" : "id"
> > > > > }
> > > > >
> > > > > rb
> > > > >
> > > > >
> > > > > On 05/28/2015 09:50 AM, S G wrote:
> > > > >
> > > > >> RB,
> > > > >>
> > > > >> Could you please attach the schema and the JSON serialized output
> > from
> > > > >> your
> > > > >> test-code as well?
> > > > >> My build environment is currently broken as I am grappling with
> some
> > > > Java
> > > > >> 8
> > > > >> update issues.
> > > > >>
> > > > >> Thanks
> > > > >> SG
> > > > >>
> > > > >>
> > > > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com>
> > wrote:
> > > > >>
> > > > >> SG,
> > > > >>>
> > > > >>> Now that logical types are in, I had some time to look at this
> > issue.
> > > > >>> Thanks for your patience on this.
> > > > >>>
> > > > >>> When I started looking at the use case, this began to look very
> > much
> > > > like
> > > > >>> a logical type issue. (I know, I've been saying that a lot.) When
> > you
> > > > >>> write, you replace any referenced object with its id. When you
> > read,
> > > > you
> > > > >>> replace those ids with the correct object. I went ahead and
> > > implemented
> > > > >>> this using 2 logical types: Referenceable for an object with an
> id,
> > > and
> > > > >>> Reference for a record with a field that references another
> object
> > by
> > > > id.
> > > > >>>
> > > > >>> There were some trade-offs to this approach. For example, I had
> to
> > > use
> > > > a
> > > > >>> logical type for the object containing the reference rather than
> > for
> > > > the
> > > > >>> reference itself because I used a callback to set the object once
> > it
> > > is
> > > > >>> found. That happens because children with references to a parent
> > are
> > > > >>> deserialized completely first. The parent is the last object to
> be
> > > > >>> assembled and passed to the logical type conversion.
> > > > >>>
> > > > >>> A bigger issue was that logical types are currently conservative
> > and
> > > > >>> don't
> > > > >>> overlap with reflect types or anything that sets java-class. That
> > > means
> > > > >>> that this currently only works with generic types. But, I'd
> rather
> > > make
> > > > >>> logical types work for reflect than add more custom code to
> support
> > > > this.
> > > > >>> Does that sound reasonable?
> > > > >>>
> > > > >>> I'm attaching a diff with the working test code so you can take a
> > > look
> > > > at
> > > > >>> the approach. Let me know what you are thinking.
> > > > >>>
> > > > >>> rb
> > > > >>>
> > > > >>> On 05/20/2015 12:05 PM, S G wrote:
> > > > >>>
> > > > >>> I am requesting some help with AVRO-695.
> > > > >>>> Here are some pieces from the last conversation.
> > > > >>>>
> > > > >>>>
> > > > >>>> Doug Cutting
> > > > >>>> <
> > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting>
> > > > >>>> added
> > > > >>>> a comment - 02/Oct/14 21:19
> > > > >>>>
> > > > >>>> Here's a modified version of the patch. It moves all significant
> > > > changes
> > > > >>>> to
> > > > >>>> reflect. Since reflect is the only implementation that can
> > generate
> > > an
> > > > >>>> appropriate schema, changes should be confined to there.
> > > > >>>>
> > > > >>>> The tests need to be updated, as they still assume generic.
> > > > >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
> > > > >>>> <
> > > > >>>>
> > > > >>>>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/AVRO-695?focusedCommentId=14286370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
> > > > >>>>
> > > > >>>>>
> > > > >>>>> Sachin Goyal
> > > > >>>> <
> > > > >>>>
> > > >
> > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sachingoyal
> > > > >>>> >
> > > > >>>> added
> > > > >>>> a comment - 21/Jan/15 21:56
> > > > >>>>
> > > > >>>> Here is a patch with the updated test-cases.
> > > > >>>> I also confirm that all my changes are there in the patch.
> > > > >>>>
> > > > >>>>
> > > > >>>> The patch was submitted in June 2014 and was very hot till
> October
> > > > 2014.
> > > > >>>> Since then, there has been no action on this even though I have
> > sent
> > > > >>>> many
> > > > >>>> reminders in this group.
> > > > >>>>
> > > > >>>> I understand that everyone is very busy with their own stuff
> but I
> > > > would
> > > > >>>> really appreciate if someone could help a fellow engineer in
> > getting
> > > > his
> > > > >>>> patch accepted.
> > > > >>>> It would encourage more participation as well as help people
> > wanting
> > > > to
> > > > >>>> use
> > > > >>>> Avro for circular references.
> > > > >>>>
> > > > >>>> Regards
> > > > >>>> SG
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>> --
> > > > >>> Ryan Blue
> > > > >>> Software Engineer
> > > > >>> Cloudera, Inc.
> > > > >>>
> > > > >>>
> > > > >>
> > > > >
> > > > > --
> > > > > Ryan Blue
> > > > > Software Engineer
> > > > > Cloudera, Inc.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> > >
> >
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Need review/merges for couple of pull requests

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Those aren't the datum classes, those are logical types that are added to
the schema for your datum classes. The "referenceable" logical type is
applied to the class that gets replaced with an ID reference and points to
the field to use for that ID. The "reference" logical type is applied to
the class that contains a referencable datum. Then there are conversions
that replace a referenceable with its ID.

On Wed, Jun 1, 2016 at 4:52 PM, S G <sg...@gmail.com> wrote:

> RB,
>
> The test TestCircularReferences shows that the classes having circular
> reference need to extend LogicalType.
> If every circular reference class has to do this, then I think this would
> be a big limitation for actual classes used in production code because they
> would be already extending other classes plus asking several other teams to
> change their base-class would be difficult.
> Is there a way to do this without making the classes extend the LogicalType
> ?
>
> SG
>
>
> On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
> > SG,
> >
> > The example/test I built uses logical types to remove the circular
> > reference when writing and restore it when reading. It doesn't look like
> > your test adds logical types, so that's probably the issue.
> >
> > rb
> >
> > On Wed, Jun 1, 2016 at 1:51 PM, S G <sg...@gmail.com> wrote:
> >
> > > Hey RB,
> > >
> > > I was trying out the circular refs with latest 1.8.1 version of Avro
> and
> > it
> > > doesn't seem to be working out of the box for me.
> > > Perhaps I am missing something and would appreciate your help.
> > >
> > > Thanks,
> > > SG
> > >
> > >
> > > Here is my test code:
> > >
> > > public class CircularRefsTest
> > > {
> > >     @Test
> > >     public void testSerialization() throws Exception
> > >     {
> > >         // Create a circular linked-list
> > >         ListNode first = new ListNode("first");
> > >         ListNode second = new ListNode("second");
> > >         second.next = first;
> > >         first.next = second;
> > >
> > >         ReflectData rdata = ReflectData.AllowNull.get();
> > >
> > >         // Print Schema
> > >         Schema schema = rdata.getSchema(ListNode.class);
> > >         System.out.println(schema);
> > >
> > >         // Serialize
> > >         DatumWriter<ListNode> datumWriter = new
> > > ReflectDatumWriter<ListNode>(ListNode.class);
> > >         ByteArrayOutputStream byteArrayOutputStream = new
> > > ByteArrayOutputStream();
> > >         Encoder encoder =
> > > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
> > >         datumWriter.write(first, encoder);
> > >         encoder.flush();
> > >         byte[] bytes = byteArrayOutputStream.toByteArray();
> > >
> > >         assertTrue( bytes != null );
> > >     }
> > > }
> > >
> > > class ListNode
> > > {
> > >     String data;
> > >     ListNode next;
> > >
> > >     public ListNode() {
> > >     }
> > >     public ListNode(String data) {
> > >         this.data = data;
> > >     }
> > > }
> > >
> > >
> > >
> > > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com> wrote:
> > >
> > > > SG,
> > > >
> > > > The data ends up looking like this:
> > > >
> > > > {"id":1,"p":"parent data!","child":{"c":"child
> > > data!","parent":{"long":1}}}
> > > >
> > > > I produced that with avro-tools 1.7.6 and tojson.
> > > >
> > > > Here's the schema: {
> > > >   "type" : "record",
> > > >   "name" : "Parent",
> > > >   "fields" : [ {
> > > >     "name" : "id",
> > > >     "type" : "long"
> > > >   }, {
> > > >     "name" : "p",
> > > >     "type" : "string"
> > > >   }, {
> > > >     "name" : "child",
> > > >     "type" : {
> > > >       "type" : "record",
> > > >       "name" : "Child",
> > > >       "fields" : [ {
> > > >         "name" : "c",
> > > >         "type" : "string"
> > > >       }, {
> > > >         "name" : "parent",
> > > >         "type" : [ "null", "long", "Parent" ]
> > > >       } ],
> > > >       "logicalType" : "reference",
> > > >       "ref-field-name" : "parent"
> > > >     }
> > > >   } ],
> > > >   "logicalType" : "referenceable",
> > > >   "id-field-name" : "id"
> > > > }
> > > >
> > > > rb
> > > >
> > > >
> > > > On 05/28/2015 09:50 AM, S G wrote:
> > > >
> > > >> RB,
> > > >>
> > > >> Could you please attach the schema and the JSON serialized output
> from
> > > >> your
> > > >> test-code as well?
> > > >> My build environment is currently broken as I am grappling with some
> > > Java
> > > >> 8
> > > >> update issues.
> > > >>
> > > >> Thanks
> > > >> SG
> > > >>
> > > >>
> > > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com>
> wrote:
> > > >>
> > > >> SG,
> > > >>>
> > > >>> Now that logical types are in, I had some time to look at this
> issue.
> > > >>> Thanks for your patience on this.
> > > >>>
> > > >>> When I started looking at the use case, this began to look very
> much
> > > like
> > > >>> a logical type issue. (I know, I've been saying that a lot.) When
> you
> > > >>> write, you replace any referenced object with its id. When you
> read,
> > > you
> > > >>> replace those ids with the correct object. I went ahead and
> > implemented
> > > >>> this using 2 logical types: Referenceable for an object with an id,
> > and
> > > >>> Reference for a record with a field that references another object
> by
> > > id.
> > > >>>
> > > >>> There were some trade-offs to this approach. For example, I had to
> > use
> > > a
> > > >>> logical type for the object containing the reference rather than
> for
> > > the
> > > >>> reference itself because I used a callback to set the object once
> it
> > is
> > > >>> found. That happens because children with references to a parent
> are
> > > >>> deserialized completely first. The parent is the last object to be
> > > >>> assembled and passed to the logical type conversion.
> > > >>>
> > > >>> A bigger issue was that logical types are currently conservative
> and
> > > >>> don't
> > > >>> overlap with reflect types or anything that sets java-class. That
> > means
> > > >>> that this currently only works with generic types. But, I'd rather
> > make
> > > >>> logical types work for reflect than add more custom code to support
> > > this.
> > > >>> Does that sound reasonable?
> > > >>>
> > > >>> I'm attaching a diff with the working test code so you can take a
> > look
> > > at
> > > >>> the approach. Let me know what you are thinking.
> > > >>>
> > > >>> rb
> > > >>>
> > > >>> On 05/20/2015 12:05 PM, S G wrote:
> > > >>>
> > > >>> I am requesting some help with AVRO-695.
> > > >>>> Here are some pieces from the last conversation.
> > > >>>>
> > > >>>>
> > > >>>> Doug Cutting
> > > >>>> <
> > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting>
> > > >>>> added
> > > >>>> a comment - 02/Oct/14 21:19
> > > >>>>
> > > >>>> Here's a modified version of the patch. It moves all significant
> > > changes
> > > >>>> to
> > > >>>> reflect. Since reflect is the only implementation that can
> generate
> > an
> > > >>>> appropriate schema, changes should be confined to there.
> > > >>>>
> > > >>>> The tests need to be updated, as they still assume generic.
> > > >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
> > > >>>> <
> > > >>>>
> > > >>>>
> > >
> >
> https://issues.apache.org/jira/browse/AVRO-695?focusedCommentId=14286370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
> > > >>>>
> > > >>>>>
> > > >>>>> Sachin Goyal
> > > >>>> <
> > > >>>>
> > >
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sachingoyal
> > > >>>> >
> > > >>>> added
> > > >>>> a comment - 21/Jan/15 21:56
> > > >>>>
> > > >>>> Here is a patch with the updated test-cases.
> > > >>>> I also confirm that all my changes are there in the patch.
> > > >>>>
> > > >>>>
> > > >>>> The patch was submitted in June 2014 and was very hot till October
> > > 2014.
> > > >>>> Since then, there has been no action on this even though I have
> sent
> > > >>>> many
> > > >>>> reminders in this group.
> > > >>>>
> > > >>>> I understand that everyone is very busy with their own stuff but I
> > > would
> > > >>>> really appreciate if someone could help a fellow engineer in
> getting
> > > his
> > > >>>> patch accepted.
> > > >>>> It would encourage more participation as well as help people
> wanting
> > > to
> > > >>>> use
> > > >>>> Avro for circular references.
> > > >>>>
> > > >>>> Regards
> > > >>>> SG
> > > >>>>
> > > >>>>
> > > >>>
> > > >>> --
> > > >>> Ryan Blue
> > > >>> Software Engineer
> > > >>> Cloudera, Inc.
> > > >>>
> > > >>>
> > > >>
> > > >
> > > > --
> > > > Ryan Blue
> > > > Software Engineer
> > > > Cloudera, Inc.
> > > >
> > >
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
>



-- 
Ryan Blue
Software Engineer
Netflix

Re: Need review/merges for couple of pull requests

Posted by S G <sg...@gmail.com>.
RB,

The test TestCircularReferences shows that the classes having circular
reference need to extend LogicalType.
If every circular reference class has to do this, then I think this would
be a big limitation for actual classes used in production code because they
would be already extending other classes plus asking several other teams to
change their base-class would be difficult.
Is there a way to do this without making the classes extend the LogicalType
?

SG


On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:

> SG,
>
> The example/test I built uses logical types to remove the circular
> reference when writing and restore it when reading. It doesn't look like
> your test adds logical types, so that's probably the issue.
>
> rb
>
> On Wed, Jun 1, 2016 at 1:51 PM, S G <sg...@gmail.com> wrote:
>
> > Hey RB,
> >
> > I was trying out the circular refs with latest 1.8.1 version of Avro and
> it
> > doesn't seem to be working out of the box for me.
> > Perhaps I am missing something and would appreciate your help.
> >
> > Thanks,
> > SG
> >
> >
> > Here is my test code:
> >
> > public class CircularRefsTest
> > {
> >     @Test
> >     public void testSerialization() throws Exception
> >     {
> >         // Create a circular linked-list
> >         ListNode first = new ListNode("first");
> >         ListNode second = new ListNode("second");
> >         second.next = first;
> >         first.next = second;
> >
> >         ReflectData rdata = ReflectData.AllowNull.get();
> >
> >         // Print Schema
> >         Schema schema = rdata.getSchema(ListNode.class);
> >         System.out.println(schema);
> >
> >         // Serialize
> >         DatumWriter<ListNode> datumWriter = new
> > ReflectDatumWriter<ListNode>(ListNode.class);
> >         ByteArrayOutputStream byteArrayOutputStream = new
> > ByteArrayOutputStream();
> >         Encoder encoder =
> > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
> >         datumWriter.write(first, encoder);
> >         encoder.flush();
> >         byte[] bytes = byteArrayOutputStream.toByteArray();
> >
> >         assertTrue( bytes != null );
> >     }
> > }
> >
> > class ListNode
> > {
> >     String data;
> >     ListNode next;
> >
> >     public ListNode() {
> >     }
> >     public ListNode(String data) {
> >         this.data = data;
> >     }
> > }
> >
> >
> >
> > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com> wrote:
> >
> > > SG,
> > >
> > > The data ends up looking like this:
> > >
> > > {"id":1,"p":"parent data!","child":{"c":"child
> > data!","parent":{"long":1}}}
> > >
> > > I produced that with avro-tools 1.7.6 and tojson.
> > >
> > > Here's the schema: {
> > >   "type" : "record",
> > >   "name" : "Parent",
> > >   "fields" : [ {
> > >     "name" : "id",
> > >     "type" : "long"
> > >   }, {
> > >     "name" : "p",
> > >     "type" : "string"
> > >   }, {
> > >     "name" : "child",
> > >     "type" : {
> > >       "type" : "record",
> > >       "name" : "Child",
> > >       "fields" : [ {
> > >         "name" : "c",
> > >         "type" : "string"
> > >       }, {
> > >         "name" : "parent",
> > >         "type" : [ "null", "long", "Parent" ]
> > >       } ],
> > >       "logicalType" : "reference",
> > >       "ref-field-name" : "parent"
> > >     }
> > >   } ],
> > >   "logicalType" : "referenceable",
> > >   "id-field-name" : "id"
> > > }
> > >
> > > rb
> > >
> > >
> > > On 05/28/2015 09:50 AM, S G wrote:
> > >
> > >> RB,
> > >>
> > >> Could you please attach the schema and the JSON serialized output from
> > >> your
> > >> test-code as well?
> > >> My build environment is currently broken as I am grappling with some
> > Java
> > >> 8
> > >> update issues.
> > >>
> > >> Thanks
> > >> SG
> > >>
> > >>
> > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com> wrote:
> > >>
> > >> SG,
> > >>>
> > >>> Now that logical types are in, I had some time to look at this issue.
> > >>> Thanks for your patience on this.
> > >>>
> > >>> When I started looking at the use case, this began to look very much
> > like
> > >>> a logical type issue. (I know, I've been saying that a lot.) When you
> > >>> write, you replace any referenced object with its id. When you read,
> > you
> > >>> replace those ids with the correct object. I went ahead and
> implemented
> > >>> this using 2 logical types: Referenceable for an object with an id,
> and
> > >>> Reference for a record with a field that references another object by
> > id.
> > >>>
> > >>> There were some trade-offs to this approach. For example, I had to
> use
> > a
> > >>> logical type for the object containing the reference rather than for
> > the
> > >>> reference itself because I used a callback to set the object once it
> is
> > >>> found. That happens because children with references to a parent are
> > >>> deserialized completely first. The parent is the last object to be
> > >>> assembled and passed to the logical type conversion.
> > >>>
> > >>> A bigger issue was that logical types are currently conservative and
> > >>> don't
> > >>> overlap with reflect types or anything that sets java-class. That
> means
> > >>> that this currently only works with generic types. But, I'd rather
> make
> > >>> logical types work for reflect than add more custom code to support
> > this.
> > >>> Does that sound reasonable?
> > >>>
> > >>> I'm attaching a diff with the working test code so you can take a
> look
> > at
> > >>> the approach. Let me know what you are thinking.
> > >>>
> > >>> rb
> > >>>
> > >>> On 05/20/2015 12:05 PM, S G wrote:
> > >>>
> > >>> I am requesting some help with AVRO-695.
> > >>>> Here are some pieces from the last conversation.
> > >>>>
> > >>>>
> > >>>> Doug Cutting
> > >>>> <
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting>
> > >>>> added
> > >>>> a comment - 02/Oct/14 21:19
> > >>>>
> > >>>> Here's a modified version of the patch. It moves all significant
> > changes
> > >>>> to
> > >>>> reflect. Since reflect is the only implementation that can generate
> an
> > >>>> appropriate schema, changes should be confined to there.
> > >>>>
> > >>>> The tests need to be updated, as they still assume generic.
> > >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
> > >>>> <
> > >>>>
> > >>>>
> >
> https://issues.apache.org/jira/browse/AVRO-695?focusedCommentId=14286370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
> > >>>>
> > >>>>>
> > >>>>> Sachin Goyal
> > >>>> <
> > >>>>
> > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sachingoyal
> > >>>> >
> > >>>> added
> > >>>> a comment - 21/Jan/15 21:56
> > >>>>
> > >>>> Here is a patch with the updated test-cases.
> > >>>> I also confirm that all my changes are there in the patch.
> > >>>>
> > >>>>
> > >>>> The patch was submitted in June 2014 and was very hot till October
> > 2014.
> > >>>> Since then, there has been no action on this even though I have sent
> > >>>> many
> > >>>> reminders in this group.
> > >>>>
> > >>>> I understand that everyone is very busy with their own stuff but I
> > would
> > >>>> really appreciate if someone could help a fellow engineer in getting
> > his
> > >>>> patch accepted.
> > >>>> It would encourage more participation as well as help people wanting
> > to
> > >>>> use
> > >>>> Avro for circular references.
> > >>>>
> > >>>> Regards
> > >>>> SG
> > >>>>
> > >>>>
> > >>>
> > >>> --
> > >>> Ryan Blue
> > >>> Software Engineer
> > >>> Cloudera, Inc.
> > >>>
> > >>>
> > >>
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Cloudera, Inc.
> > >
> >
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Need review/merges for couple of pull requests

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
SG,

The example/test I built uses logical types to remove the circular
reference when writing and restore it when reading. It doesn't look like
your test adds logical types, so that's probably the issue.

rb

On Wed, Jun 1, 2016 at 1:51 PM, S G <sg...@gmail.com> wrote:

> Hey RB,
>
> I was trying out the circular refs with latest 1.8.1 version of Avro and it
> doesn't seem to be working out of the box for me.
> Perhaps I am missing something and would appreciate your help.
>
> Thanks,
> SG
>
>
> Here is my test code:
>
> public class CircularRefsTest
> {
>     @Test
>     public void testSerialization() throws Exception
>     {
>         // Create a circular linked-list
>         ListNode first = new ListNode("first");
>         ListNode second = new ListNode("second");
>         second.next = first;
>         first.next = second;
>
>         ReflectData rdata = ReflectData.AllowNull.get();
>
>         // Print Schema
>         Schema schema = rdata.getSchema(ListNode.class);
>         System.out.println(schema);
>
>         // Serialize
>         DatumWriter<ListNode> datumWriter = new
> ReflectDatumWriter<ListNode>(ListNode.class);
>         ByteArrayOutputStream byteArrayOutputStream = new
> ByteArrayOutputStream();
>         Encoder encoder =
> EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
>         datumWriter.write(first, encoder);
>         encoder.flush();
>         byte[] bytes = byteArrayOutputStream.toByteArray();
>
>         assertTrue( bytes != null );
>     }
> }
>
> class ListNode
> {
>     String data;
>     ListNode next;
>
>     public ListNode() {
>     }
>     public ListNode(String data) {
>         this.data = data;
>     }
> }
>
>
>
> On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com> wrote:
>
> > SG,
> >
> > The data ends up looking like this:
> >
> > {"id":1,"p":"parent data!","child":{"c":"child
> data!","parent":{"long":1}}}
> >
> > I produced that with avro-tools 1.7.6 and tojson.
> >
> > Here's the schema: {
> >   "type" : "record",
> >   "name" : "Parent",
> >   "fields" : [ {
> >     "name" : "id",
> >     "type" : "long"
> >   }, {
> >     "name" : "p",
> >     "type" : "string"
> >   }, {
> >     "name" : "child",
> >     "type" : {
> >       "type" : "record",
> >       "name" : "Child",
> >       "fields" : [ {
> >         "name" : "c",
> >         "type" : "string"
> >       }, {
> >         "name" : "parent",
> >         "type" : [ "null", "long", "Parent" ]
> >       } ],
> >       "logicalType" : "reference",
> >       "ref-field-name" : "parent"
> >     }
> >   } ],
> >   "logicalType" : "referenceable",
> >   "id-field-name" : "id"
> > }
> >
> > rb
> >
> >
> > On 05/28/2015 09:50 AM, S G wrote:
> >
> >> RB,
> >>
> >> Could you please attach the schema and the JSON serialized output from
> >> your
> >> test-code as well?
> >> My build environment is currently broken as I am grappling with some
> Java
> >> 8
> >> update issues.
> >>
> >> Thanks
> >> SG
> >>
> >>
> >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com> wrote:
> >>
> >> SG,
> >>>
> >>> Now that logical types are in, I had some time to look at this issue.
> >>> Thanks for your patience on this.
> >>>
> >>> When I started looking at the use case, this began to look very much
> like
> >>> a logical type issue. (I know, I've been saying that a lot.) When you
> >>> write, you replace any referenced object with its id. When you read,
> you
> >>> replace those ids with the correct object. I went ahead and implemented
> >>> this using 2 logical types: Referenceable for an object with an id, and
> >>> Reference for a record with a field that references another object by
> id.
> >>>
> >>> There were some trade-offs to this approach. For example, I had to use
> a
> >>> logical type for the object containing the reference rather than for
> the
> >>> reference itself because I used a callback to set the object once it is
> >>> found. That happens because children with references to a parent are
> >>> deserialized completely first. The parent is the last object to be
> >>> assembled and passed to the logical type conversion.
> >>>
> >>> A bigger issue was that logical types are currently conservative and
> >>> don't
> >>> overlap with reflect types or anything that sets java-class. That means
> >>> that this currently only works with generic types. But, I'd rather make
> >>> logical types work for reflect than add more custom code to support
> this.
> >>> Does that sound reasonable?
> >>>
> >>> I'm attaching a diff with the working test code so you can take a look
> at
> >>> the approach. Let me know what you are thinking.
> >>>
> >>> rb
> >>>
> >>> On 05/20/2015 12:05 PM, S G wrote:
> >>>
> >>> I am requesting some help with AVRO-695.
> >>>> Here are some pieces from the last conversation.
> >>>>
> >>>>
> >>>> Doug Cutting
> >>>> <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting>
> >>>> added
> >>>> a comment - 02/Oct/14 21:19
> >>>>
> >>>> Here's a modified version of the patch. It moves all significant
> changes
> >>>> to
> >>>> reflect. Since reflect is the only implementation that can generate an
> >>>> appropriate schema, changes should be confined to there.
> >>>>
> >>>> The tests need to be updated, as they still assume generic.
> >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
> >>>> <
> >>>>
> >>>>
> https://issues.apache.org/jira/browse/AVRO-695?focusedCommentId=14286370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
> >>>>
> >>>>>
> >>>>> Sachin Goyal
> >>>> <
> >>>>
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sachingoyal
> >>>> >
> >>>> added
> >>>> a comment - 21/Jan/15 21:56
> >>>>
> >>>> Here is a patch with the updated test-cases.
> >>>> I also confirm that all my changes are there in the patch.
> >>>>
> >>>>
> >>>> The patch was submitted in June 2014 and was very hot till October
> 2014.
> >>>> Since then, there has been no action on this even though I have sent
> >>>> many
> >>>> reminders in this group.
> >>>>
> >>>> I understand that everyone is very busy with their own stuff but I
> would
> >>>> really appreciate if someone could help a fellow engineer in getting
> his
> >>>> patch accepted.
> >>>> It would encourage more participation as well as help people wanting
> to
> >>>> use
> >>>> Avro for circular references.
> >>>>
> >>>> Regards
> >>>> SG
> >>>>
> >>>>
> >>>
> >>> --
> >>> Ryan Blue
> >>> Software Engineer
> >>> Cloudera, Inc.
> >>>
> >>>
> >>
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Cloudera, Inc.
> >
>



-- 
Ryan Blue
Software Engineer
Netflix