Posted to user@avro.apache.org by Software Dev <st...@gmail.com> on 2014/02/20 19:58:19 UTC

Schema inheritance

Is there any way to include the fields of another schema in our schema
WITHOUT it creating a nested record?


{
    "type": "record",
    "name": "Parent",
    "fields" : [
        {
            "name": "foo",
            "type": "string"
        }
    ]
}

{
    "type": "record",
    "name": "Child",
    "fields" : [
        {
            "name": "bar",
            "type": "string"
        }
    ]
}

I don't want Parent nested inside Child by adding a field like this:

{
    "name": "parent",
    "type": "Parent"
}

So in this example, is there a way to have Child include both the "bar"
field and the "foo" field without nesting them under Parent?

Thanks
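As far as I know, plain AVSC has no include or inheritance construct, so a record's fields have to be listed in the record itself. One workaround (an assumption here, not something proposed in this thread) is to keep Parent and Child as separate schema files and generate the flat Child schema with a small preprocessing step before code generation. A minimal Python sketch; the helper name `include_fields` is hypothetical:

```python
import copy
import json


def include_fields(child, *parents):
    """Return a copy of `child` whose field list is extended with the fields
    of each parent record schema, appended at the top level (no nesting).

    Raises ValueError on a duplicate field name, because Avro requires
    field names to be unique within a record.
    """
    merged = copy.deepcopy(child)
    seen = {field["name"] for field in merged["fields"]}
    for parent in parents:
        for field in parent["fields"]:
            if field["name"] in seen:
                raise ValueError("duplicate field name: " + field["name"])
            seen.add(field["name"])
            merged["fields"].append(copy.deepcopy(field))
    return merged


parent = {"type": "record", "name": "Parent",
          "fields": [{"name": "foo", "type": "string"}]}
child = {"type": "record", "name": "Child",
         "fields": [{"name": "bar", "type": "string"}]}

flat = include_fields(child, parent)
print(json.dumps(flat, indent=4))
```

Running it prints a Child schema with "bar" followed by "foo" and no nested record; the input schemas are left unmodified. The duplicate-name check matters once several parents are merged, since silently keeping one of two clashing fields would hide a modeling error.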

Re: Schema inheritance

Posted by Gary Steelman <ga...@gmail.com>.
Hey all, thought I'd chime in here. My understanding is that the OP means
aggregation rather than inheritance, and you can definitely do that in
JSON schemas without using IDL.

Please see an example below:

{
  "name" : "MyObj",
  "namespace" : "common.datatypes.generated.avro",
  "type" : "record",
  "fields" : [ {
    "name" : "header",
    "type" : [ "null", "common.datatypes.generated.avro.Header" ],
    "default" : null
  }, {
    "name" : "Data",
    "type" : "common.datatypes.generated.avro.Data"
  }, {
    "name" : "Metadata",
    "type" : "common.datatypes.generated.avro.Metadata"
  } ]
}

The schemas for the Data and Metadata records are in the same directory
as the MyObj schema. When you run code generation with the Maven plugin,
the Java classes come out as expected and look just fine :)
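The MyObj example refers to the other records by full name, i.e. namespace plus name. A minimal Python sketch (helper names are hypothetical, and this is not the Maven plugin's actual logic) that lists the named types a record refers to, qualifying a short name with the record's namespace as the Avro spec describes. The Metadata reference is shortened to "Metadata" here purely to show that qualification:

```python
import json

# Avro primitive type names; any other type name refers to a named type.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(name, namespace):
    """Qualify a type name: a name containing a dot is already a full name;
    otherwise it takes the namespace of the enclosing definition."""
    if "." in name or not namespace:
        return name
    return namespace + "." + name


def referenced_names(record):
    """Return the full names of named types referenced by a record's fields.

    Sketch only: handles a field type that is a type name or a union of
    type names, not inline definitions, arrays, or maps.
    """
    namespace = record.get("namespace")
    refs = set()
    for field in record["fields"]:
        ftype = field["type"]
        branches = ftype if isinstance(ftype, list) else [ftype]
        for branch in branches:
            if isinstance(branch, str) and branch not in PRIMITIVES:
                refs.add(full_name(branch, namespace))
    return refs


my_obj = json.loads("""
{
  "name": "MyObj",
  "namespace": "common.datatypes.generated.avro",
  "type": "record",
  "fields": [
    {"name": "header", "type": ["null", "common.datatypes.generated.avro.Header"], "default": null},
    {"name": "Data", "type": "common.datatypes.generated.avro.Data"},
    {"name": "Metadata", "type": "Metadata"}
  ]
}
""")

for name in sorted(referenced_names(my_obj)):
    print(name)
```

Each printed name has to be defined by one of the other schema files for MyObj to resolve. Note that this is still aggregation: Header, Data and Metadata end up as nested fields of MyObj, not as top-level fields.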

Thanks,
Gary


On Fri, Feb 21, 2014 at 6:02 AM, Harsh J <ha...@cloudera.com> wrote:

> Use of protocol schemas in IDLs is optional.

Re: Schema inheritance

Posted by Harsh J <ha...@cloudera.com>.
Use of protocol schemas in IDLs is optional.

On Fri, Feb 21, 2014 at 5:52 AM, Software Dev <st...@gmail.com> wrote:
> I should also note that the IDL introduced a concept we weren't previously
> dealing with: protocol. Is a protocol just a grouping of related records?
>
> FYI, we are using Avro strictly for serialization/deserialization, not for
> any RPC features.



-- 
Harsh J

Re: Schema inheritance

Posted by Software Dev <st...@gmail.com>.
I should also note that the IDL introduced a concept we weren't previously
dealing with: protocol. Is a protocol just a grouping of related records?

FYI, we are using Avro strictly for serialization/deserialization, not for
any RPC features.


On Thu, Feb 20, 2014 at 4:19 PM, Software Dev <st...@gmail.com> wrote:

> We have a similar use case and I would like to just "flatten" out the
> schema by including the fields from the parent into the child without
> nesting. The reason I don't want to nest is that it doesn't play well
> with some of our tools (Impala, Pig, etc.).
>
> Now back to your initial response. I started playing around with the IDL
> but I still can't seem to figure out how to inherit all the fields from a
> parent into a child record.
>
> I would like the final "Child" record to look like this:
>
> {
>     "type": "record",
>     "name": "Child",
>     "fields" : [
>         {
>             "name": "bar",
>             "type": "string"
>         },
>         {
>             "name": "foo",
>             "type": "string"
>         }
>     ]
> }

Re: Schema inheritance

Posted by Software Dev <st...@gmail.com>.
We have a similar use case and I would like to just "flatten" out the
schema by including the fields from the parent into the child without
nesting. The reason I don't want to nest is that it doesn't play well
with some of our tools (Impala, Pig, etc.).

Now back to your initial response. I started playing around with the IDL
but I still can't seem to figure out how to inherit all the fields from a
parent into a child record.

I would like the final "Child" record to look like this:

{
    "type": "record",
    "name": "Child",
    "fields" : [
        {
            "name": "bar",
            "type": "string"
        },
        {
            "name": "foo",
            "type": "string"
        }
    ]
}
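If the data is already modeled with a nested "parent" record, Avro schema resolution alone will not hoist "foo" to the top level, because reader and writer fields are matched by name within the same record. A different option, not proposed in this thread, is to flatten both the schema and each datum at write time, before handing them to tools like Impala or Pig. A minimal Python sketch with hypothetical helper names, handling only inline nested record definitions (not named references, unions, arrays or maps):

```python
import json


def is_inline_record(ftype):
    # True when a field's type is an inline record definition (a dict),
    # as opposed to a primitive name, a named reference, or a union.
    return isinstance(ftype, dict) and ftype.get("type") == "record"


def flatten_schema(record):
    """Hoist the fields of inline nested records into the top-level record."""
    fields = []
    for field in record["fields"]:
        if is_inline_record(field["type"]):
            fields.extend(flatten_schema(field["type"])["fields"])
        else:
            fields.append(field)
    names = [f["name"] for f in fields]
    if len(names) != len(set(names)):
        # Avro requires field names to be unique within a record.
        raise ValueError("flattening produced duplicate field names")
    return {**record, "fields": fields}


def flatten_datum(datum, record):
    """Apply the same hoisting to one datum (a dict keyed by field name)."""
    out = {}
    for field in record["fields"]:
        value = datum[field["name"]]
        if is_inline_record(field["type"]):
            out.update(flatten_datum(value, field["type"]))
        else:
            out[field["name"]] = value
    return out


nested_child = {
    "type": "record",
    "name": "Child",
    "fields": [
        {"name": "bar", "type": "string"},
        {"name": "parent", "type": {"type": "record", "name": "Parent",
                                    "fields": [{"name": "foo", "type": "string"}]}},
    ],
}

print(json.dumps(flatten_schema(nested_child), indent=4))
print(flatten_datum({"bar": "b", "parent": {"foo": "f"}}, nested_child))
```

The flattened schema has the same field order as the flat Child above: "bar" then "foo". Flattening can produce name clashes between levels, so the sketch refuses duplicate field names rather than silently overwriting one of them.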


On Thu, Feb 20, 2014 at 12:55 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Going back to your initial question... why don't you wish to include "foo"
> within nested "parent"?
> I am not quite getting it here.
> In my case, the nested records were of substantial size including dozens
> of fields, which then had nested records. It did not scale to write out
> AVSC definitions for the data model.

Re: Schema inheritance

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Going back to your initial question... why don't you wish to include "foo"
within nested "parent"?
I am not quite getting it here.
In my case, the nested records were of substantial size including dozens of
fields, which then had nested records. It did not scale to write out AVSC
definitions for the data model.


On Thu, Feb 20, 2014 at 8:28 PM, Software Dev <st...@gmail.com> wrote:

> Thanks for the input. I'm guessing, then, that the above problem can only
> be solved with IDL and not with AVSC?


-- 
*Lewis*

Re: Schema inheritance

Posted by Software Dev <st...@gmail.com>.
Thanks for the input. I'm guessing, then, that the above problem can only
be solved with IDL and not with AVSC?


On Thu, Feb 20, 2014 at 11:45 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hey,
> Did you check out the IDL documentation?
> http://avro.apache.org/docs/current/idl.html
> I had similar data modeling issues a while back and this helped out A LOT.
> hth

Re: Schema inheritance

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hey,
Did you check out the IDL documentation?
http://avro.apache.org/docs/current/idl.html
I had similar data modeling issues a while back and this helped out A LOT.
hth


On Thu, Feb 20, 2014 at 6:58 PM, Software Dev <st...@gmail.com> wrote:

> Is there any way to include the fields of another schema in our schema
> WITHOUT it creating a nested record?
>
>
> {
>     "type": "record",
>     "name": "Parent",
>     "fields" : [
>         {
>             "name": "foo",
>             "type": "string"
>         }
>     ]
> }
>
> {
>     "type": "record",
>     "name": "Child",
>     "fields" : [
>         {
>             "name": "bar",
>             "type": "string"
>         }
>     ]
> }
>
> I don't want Parent nested inside Child by adding a field like this:
>
> {
>     "name": "parent",
>     "type": "Parent"
> }
>
> So in this example, is there a way to have Child include both the "bar"
> field and the "foo" field without nesting them under Parent?
>
> Thanks
>
>


-- 
*Lewis*