You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by S G <sg...@gmail.com> on 2016/08/29 04:31:40 UTC

Re: Need review/merges for couple of pull requests

Hey Ryan,

Any chance you were able to look into this (My email on Jun 6)?

Thanks
SG

On Thu, Jun 16, 2016 at 2:05 PM, S G <sg...@gmail.com> wrote:

> Doug Cutting, Ryan Blue,
>
> I have worked with both of you on this ticket and would appreciate your
> thoughts regarding circular references.
>
> If you think the current circular references' implementation in Avro is
> indeed easy for clients to use (i.e. they do not have to write all those
> extra classes), then please help me by providing some test-case/code
> reference.
>
> Thanks
> Sachin
>
>
> On Mon, Jun 6, 2016 at 9:18 AM, S G <sg...@gmail.com> wrote:
>
>>
>> As far as I have understood the attached testcase, I think the solution
>> in https://github.com/apache/avro/pull/23/files (JIRA:
>> https://issues.apache.org/jira/browse/AVRO-695) is more seamless from a
>> user's perspective because the user does not have to write any classes
>> extending LogicalType or Conversion.
>>
>> Writing those classes could become a bit of a challenge:
>> 1) If there are several circular references in the code (something very
>> common in the Hibernate world).
>> 2) It is a pain to change these classes to reflect any changes in the
>> classes being serialized.
>> 3) When the user decides to write a conversion/logical-type class for
>> some record, he has to handle serialization for all fields of the class
>> even though all he wanted to handle was circular references (This is what I
>> have understood but I could be wrong here).
>>
>> In my pull request for AVRO-695, the only thing a user need to do is to
>> call setResolvingCircularRefs(true) on ReflectData. Then every record is
>> converted into a union of the record-type and the newly added class
>> CircularRef. This new class stores the actual reference ID on encountering
>> a circular ref. This is really convenient for the user and there is no
>> penalty on performance. Even if there is a problem, the user can always go
>> back to writing LogicalType and Conversion classes.
>>
>> My request would be to consider this solution for handling circular
>> references instead of asking the user to write a bunch of other classes.
>> If the solution sounds acceptable, then I can bring the patch up-to-date
>> with the latest master branch and reopen the pull request.
>>
>> Cheers,
>> SG
>>
>> On Wed, Jun 1, 2016 at 5:36 PM, S G <sg...@gmail.com> wrote:
>>
>>> Thanks RB,
>>>
>>> But I am not sure I follow that example (Probably because there is no
>>> actual datum class there?).
>>>
>>> Consider a complex code running in production with lots of circular
>>> references.
>>> Something like:
>>>
>>> class Home {
>>>    String address;
>>>    Integer zipCode;
>>>    List <Person> residents;
>>> }
>>>
>>> class Person {
>>>     String name;
>>>     Person spouse;
>>>     House home;
>>>     Map <Car, Dealer> vehiclesOwned;
>>> }
>>>
>>> class Car {
>>>      String make;
>>>      Integer year;
>>>      Dealer soldBy;
>>>      Person ownedBy;
>>> }
>>>
>>> class Dealer {
>>>     String name;
>>>     String address;
>>>     List<Car> vehiclesSold;
>>> }
>>>
>>> The above is a made-up example, intentionally using easy-to-understand
>>> references.
>>> But it does show that circular references can become very complex and
>>> across multiple chains.
>>>
>>> If users have to write Conversion classes for all of the above along
>>> with hash-map(s) that finds out IDs for the objects seen already,  writes
>>> them out as separate fields and then reads them back to replace them with
>>> actual objects, then it would be expecting too much from our clients IMO.
>>>
>>> Perhaps it would be more helpful if there was a more seamless way of
>>> getting this done automatically.
>>>
>>> SG
>>>
>>>
>>>
>>> On Wed, Jun 1, 2016 at 5:22 PM, Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>
>>>> Those aren't the datum classes, those are logical types that are added
>>>> to
>>>> the schema for your datum classes. The "referenceable" logical type is
>>>> applied to the class that gets replaced with an ID reference and points
>>>> to
>>>> the field to use for that ID. The "reference" logical type is applied to
>>>> the class that contains a referencable datum. Then there are conversions
>>>> that replace a referenceable with its ID.
>>>>
>>>> On Wed, Jun 1, 2016 at 4:52 PM, S G <sg...@gmail.com> wrote:
>>>>
>>>> > RB,
>>>> >
>>>> > The test TestCircularReferences shows that the classes having circular
>>>> > reference need to extend LogicalType.
>>>> > If every circular reference class has to do this, then I think this
>>>> would
>>>> > be a big limitation for actual classes used in production code
>>>> because they
>>>> > would be already extending other classes plus asking several other
>>>> teams to
>>>> > change their base-class would be difficult.
>>>> > Is there a way to do this without making the classes extend the
>>>> LogicalType
>>>> > ?
>>>> >
>>>> > SG
>>>> >
>>>> >
>>>> > On Wed, Jun 1, 2016 at 3:29 PM, Ryan Blue <rb...@netflix.com.invalid>
>>>> > wrote:
>>>> >
>>>> > > SG,
>>>> > >
>>>> > > The example/test I built uses logical types to remove the circular
>>>> > > reference when writing and restore it when reading. It doesn't look
>>>> like
>>>> > > your test adds logical types, so that's probably the issue.
>>>> > >
>>>> > > rb
>>>> > >
>>>> > > On Wed, Jun 1, 2016 at 1:51 PM, S G <sg...@gmail.com>
>>>> wrote:
>>>> > >
>>>> > > > Hey RB,
>>>> > > >
>>>> > > > I was trying out the circular refs with latest 1.8.1 version of
>>>> Avro
>>>> > and
>>>> > > it
>>>> > > > doesn't seem to be working out of the box for me.
>>>> > > > Perhaps I am missing something and would appreciate your help.
>>>> > > >
>>>> > > > Thanks,
>>>> > > > SG
>>>> > > >
>>>> > > >
>>>> > > > Here is my test code:
>>>> > > >
>>>> > > > public class CircularRefsTest
>>>> > > > {
>>>> > > >     @Test
>>>> > > >     public void testSerialization() throws Exception
>>>> > > >     {
>>>> > > >         // Create a circular linked-list
>>>> > > >         ListNode first = new ListNode("first");
>>>> > > >         ListNode second = new ListNode("second");
>>>> > > >         second.next = first;
>>>> > > >         first.next = second;
>>>> > > >
>>>> > > >         ReflectData rdata = ReflectData.AllowNull.get();
>>>> > > >
>>>> > > >         // Print Schema
>>>> > > >         Schema schema = rdata.getSchema(ListNode.class);
>>>> > > >         System.out.println(schema);
>>>> > > >
>>>> > > >         // Serialize
>>>> > > >         DatumWriter<ListNode> datumWriter = new
>>>> > > > ReflectDatumWriter<ListNode>(ListNode.class);
>>>> > > >         ByteArrayOutputStream byteArrayOutputStream = new
>>>> > > > ByteArrayOutputStream();
>>>> > > >         Encoder encoder =
>>>> > > > EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
>>>> > > >         datumWriter.write(first, encoder);
>>>> > > >         encoder.flush();
>>>> > > >         byte[] bytes = byteArrayOutputStream.toByteArray();
>>>> > > >
>>>> > > >         assertTrue( bytes != null );
>>>> > > >     }
>>>> > > > }
>>>> > > >
>>>> > > > class ListNode
>>>> > > > {
>>>> > > >     String data;
>>>> > > >     ListNode next;
>>>> > > >
>>>> > > >     public ListNode() {
>>>> > > >     }
>>>> > > >     public ListNode(String data) {
>>>> > > >         this.data = data;
>>>> > > >     }
>>>> > > > }
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > On Thu, May 28, 2015 at 11:00 AM, Ryan Blue <bl...@cloudera.com>
>>>> wrote:
>>>> > > >
>>>> > > > > SG,
>>>> > > > >
>>>> > > > > The data ends up looking like this:
>>>> > > > >
>>>> > > > > {"id":1,"p":"parent data!","child":{"c":"child
>>>> > > > data!","parent":{"long":1}}}
>>>> > > > >
>>>> > > > > I produced that with avro-tools 1.7.6 and tojson.
>>>> > > > >
>>>> > > > > Here's the schema: {
>>>> > > > >   "type" : "record",
>>>> > > > >   "name" : "Parent",
>>>> > > > >   "fields" : [ {
>>>> > > > >     "name" : "id",
>>>> > > > >     "type" : "long"
>>>> > > > >   }, {
>>>> > > > >     "name" : "p",
>>>> > > > >     "type" : "string"
>>>> > > > >   }, {
>>>> > > > >     "name" : "child",
>>>> > > > >     "type" : {
>>>> > > > >       "type" : "record",
>>>> > > > >       "name" : "Child",
>>>> > > > >       "fields" : [ {
>>>> > > > >         "name" : "c",
>>>> > > > >         "type" : "string"
>>>> > > > >       }, {
>>>> > > > >         "name" : "parent",
>>>> > > > >         "type" : [ "null", "long", "Parent" ]
>>>> > > > >       } ],
>>>> > > > >       "logicalType" : "reference",
>>>> > > > >       "ref-field-name" : "parent"
>>>> > > > >     }
>>>> > > > >   } ],
>>>> > > > >   "logicalType" : "referenceable",
>>>> > > > >   "id-field-name" : "id"
>>>> > > > > }
>>>> > > > >
>>>> > > > > rb
>>>> > > > >
>>>> > > > >
>>>> > > > > On 05/28/2015 09:50 AM, S G wrote:
>>>> > > > >
>>>> > > > >> RB,
>>>> > > > >>
>>>> > > > >> Could you please attach the schema and the JSON serialized
>>>> output
>>>> > from
>>>> > > > >> your
>>>> > > > >> test-code as well?
>>>> > > > >> My build environment is currently broken as I am grappling
>>>> with some
>>>> > > > Java
>>>> > > > >> 8
>>>> > > > >> update issues.
>>>> > > > >>
>>>> > > > >> Thanks
>>>> > > > >> SG
>>>> > > > >>
>>>> > > > >>
>>>> > > > >> On Wed, May 27, 2015 at 3:28 PM, Ryan Blue <bl...@cloudera.com>
>>>> > wrote:
>>>> > > > >>
>>>> > > > >> SG,
>>>> > > > >>>
>>>> > > > >>> Now that logical types are in, I had some time to look at this
>>>> > issue.
>>>> > > > >>> Thanks for your patience on this.
>>>> > > > >>>
>>>> > > > >>> When I started looking at the use case, this began to look
>>>> very
>>>> > much
>>>> > > > like
>>>> > > > >>> a logical type issue. (I know, I've been saying that a lot.)
>>>> When
>>>> > you
>>>> > > > >>> write, you replace any referenced object with its id. When you
>>>> > read,
>>>> > > > you
>>>> > > > >>> replace those ids with the correct object. I went ahead and
>>>> > > implemented
>>>> > > > >>> this using 2 logical types: Referenceable for an object with
>>>> an id,
>>>> > > and
>>>> > > > >>> Reference for a record with a field that references another
>>>> object
>>>> > by
>>>> > > > id.
>>>> > > > >>>
>>>> > > > >>> There were some trade-offs to this approach. For example, I
>>>> had to
>>>> > > use
>>>> > > > a
>>>> > > > >>> logical type for the object containing the reference rather
>>>> than
>>>> > for
>>>> > > > the
>>>> > > > >>> reference itself because I used a callback to set the object
>>>> once
>>>> > it
>>>> > > is
>>>> > > > >>> found. That happens because children with references to a
>>>> parent
>>>> > are
>>>> > > > >>> deserialized completely first. The parent is the last object
>>>> to be
>>>> > > > >>> assembled and passed to the logical type conversion.
>>>> > > > >>>
>>>> > > > >>> A bigger issue was that logical types are currently
>>>> conservative
>>>> > and
>>>> > > > >>> don't
>>>> > > > >>> overlap with reflect types or anything that sets java-class.
>>>> That
>>>> > > means
>>>> > > > >>> that this currently only works with generic types. But, I'd
>>>> rather
>>>> > > make
>>>> > > > >>> logical types work for reflect than add more custom code to
>>>> support
>>>> > > > this.
>>>> > > > >>> Does that sound reasonable?
>>>> > > > >>>
>>>> > > > >>> I'm attaching a diff with the working test code so you can
>>>> take a
>>>> > > look
>>>> > > > at
>>>> > > > >>> the approach. Let me know what you are thinking.
>>>> > > > >>>
>>>> > > > >>> rb
>>>> > > > >>>
>>>> > > > >>> On 05/20/2015 12:05 PM, S G wrote:
>>>> > > > >>>
>>>> > > > >>> I am requesting some help with AVRO-695.
>>>> > > > >>>> Here are some pieces from the last conversation.
>>>> > > > >>>>
>>>> > > > >>>>
>>>> > > > >>>> Doug Cutting
>>>> > > > >>>> <
>>>> > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cutting
>>>> >
>>>> > > > >>>> added
>>>> > > > >>>> a comment - 02/Oct/14 21:19
>>>> > > > >>>>
>>>> > > > >>>> Here's a modified version of the patch. It moves all
>>>> significant
>>>> > > > changes
>>>> > > > >>>> to
>>>> > > > >>>> reflect. Since reflect is the only implementation that can
>>>> > generate
>>>> > > an
>>>> > > > >>>> appropriate schema, changes should be confined to there.
>>>> > > > >>>>
>>>> > > > >>>> The tests need to be updated, as they still assume generic.
>>>> > > > >>>> <https://issues.apache.org/jira/browse/AVRO-695#>
>>>> > > > >>>> <
>>>> > > > >>>>
>>>> > > > >>>>
>>>> > > >
>>>> > >
>>>> > https://issues.apache.org/jira/browse/AVRO-695?
>>>> focusedCommentId=14286370&page=com.atlassian.jira.
>>>> plugin.system.issuetabpanels:comment-tabpanel#comment-14286370
>>>> > > > >>>>
>>>> > > > >>>>>
>>>> > > > >>>>> Sachin Goyal
>>>> > > > >>>> <
>>>> > > > >>>>
>>>> > > >
>>>> > https://issues.apache.org/jira/secure/ViewProfile.jspa?
>>>> name=sachingoyal
>>>> > > > >>>> >
>>>> > > > >>>> added
>>>> > > > >>>> a comment - 21/Jan/15 21:56
>>>> > > > >>>>
>>>> > > > >>>> Here is a patch with the updated test-cases.
>>>> > > > >>>> I also confirm that all my changes are there in the patch.
>>>> > > > >>>>
>>>> > > > >>>>
>>>> > > > >>>> The patch was submitted in June 2014 and was very hot till
>>>> October
>>>> > > > 2014.
>>>> > > > >>>> Since then, there has been no action on this even though I
>>>> have
>>>> > sent
>>>> > > > >>>> many
>>>> > > > >>>> reminders in this group.
>>>> > > > >>>>
>>>> > > > >>>> I understand that everyone is very busy with their own stuff
>>>> but I
>>>> > > > would
>>>> > > > >>>> really appreciate if someone could help a fellow engineer in
>>>> > getting
>>>> > > > his
>>>> > > > >>>> patch accepted.
>>>> > > > >>>> It would encourage more participation as well as help people
>>>> > wanting
>>>> > > > to
>>>> > > > >>>> use
>>>> > > > >>>> Avro for circular references.
>>>> > > > >>>>
>>>> > > > >>>> Regards
>>>> > > > >>>> SG
>>>> > > > >>>>
>>>> > > > >>>>
>>>> > > > >>>
>>>> > > > >>> --
>>>> > > > >>> Ryan Blue
>>>> > > > >>> Software Engineer
>>>> > > > >>> Cloudera, Inc.
>>>> > > > >>>
>>>> > > > >>>
>>>> > > > >>
>>>> > > > >
>>>> > > > > --
>>>> > > > > Ryan Blue
>>>> > > > > Software Engineer
>>>> > > > > Cloudera, Inc.
>>>> > > > >
>>>> > > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Ryan Blue
>>>> > > Software Engineer
>>>> > > Netflix
>>>> > >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>>
>>
>