Posted to dev@avro.apache.org by hiteshpahuja <hi...@gmail.com> on 2014/03/19 04:18:29 UTC

Field level reference across Avro Schema

I am new to Avro and am using it purely for serialization and
deserialization. I have defined a data model:

{"namespace": "recordData",
 "type": "record",
 "name": "CommonData",
 "fields": [
     {"name": "recordId", "type": "string"},
     {"name": "recordDate",  "type": ["string", "null"]},
     {"name": "recordPrice", "type": ["int", "null"]},
     {"name": "customer", "type": "string"}
 ]
}

I can easily import CommonData into another schema, CustomizedSchema:

{"namespace": "recordData",
 "type": "record",
 "name": "CustomizedRecordData",
 "fields": [
     {"name": "custRecordId", "type": "recordData.CommonData"}
     ]
}

This works fine for me. But what I want to do is something like this:

{"namespace": "recordData",
 "type": "record",
 "name": "CustomizedRecordData",
 "fields": [
     {"name": "custRecordId", "type": "recordData.CommonData.custRecordId"},
     {"name": "recordDate",  "type": ["string", "null"]}
     ]
}


When I do this, the Avro Maven plugin throws an exception during code
generation, reporting "recordData.CommonData.custRecordId" as an undefined
field. How can I import definitions at the field level rather than the
record level? Any suggestions?



--
View this message in context: http://apache-avro.679487.n3.nabble.com/Field-level-reference-across-Avro-Schema-tp4029668.html
Sent from the Avro - Developers mailing list archive at Nabble.com.

Re: Field level reference across Avro Schema

Posted by Sean Busbey <bu...@cloudera.com>.
On Tue, Mar 18, 2014 at 11:31 PM, hiteshpahuja <hi...@gmail.com> wrote:

> My intention is to use CommonData's fields in CustomizedRecordData. While
> defining CustomizedRecordData, I want to use a few of the fields defined in
> CommonData, not the whole CommonData record.
>
> Avro allows me to use or import the whole CommonData record, but not
> specific fields from it.

Ah, yes. Currently the Avro specification only allows the type of a field
to be a schema or a reference to a named type. At the moment, the only
named types are record, enum, and fixed[1].

That does mean that if a particular field of your CommonData is itself of
a named type, you could reference that type, but the usage is awkward.
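A minimal sketch of that awkward workaround, written as two schema files. It assumes a hypothetical wrapper record named RecordDate, with a hypothetical field named value; neither appears in the original schemas. CommonData declares the wrapper inline, and CustomizedRecordData refers to it by its full name. This only works if CommonData is parsed first, so that recordData.RecordDate is already defined when CustomizedRecordData is read.

```json
{"namespace": "recordData",
 "type": "record",
 "name": "CommonData",
 "fields": [
     {"name": "recordId", "type": "string"},
     {"name": "recordDate",
      "type": {"type": "record",
               "name": "RecordDate",
               "doc": "Hypothetical wrapper so that this field's type has a name",
               "fields": [{"name": "value", "type": ["string", "null"]}]}},
     {"name": "recordPrice", "type": ["int", "null"]},
     {"name": "customer", "type": "string"}
 ]
}

{"namespace": "recordData",
 "type": "record",
 "name": "CustomizedRecordData",
 "fields": [
     {"name": "recordDate", "type": "recordData.RecordDate",
      "doc": "Reuses the named type declared inside CommonData"}
 ]
}
```

The cost is that code generated from these schemas carries recordDate as a RecordDate object wrapping the string, rather than as the string itself.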

Expanding named types to include record fields would be an incompatible
change, because it could break existing schemas. Specifically, if a schema
had a field with the same name as some other named type in the same
namespace, the collision would result in an error. If this is something you
want to work out the details on, you should file a JIRA issue.

There are a few things you could do now, but the one I'd recommend is to
rely on alias support.

For example, given these customized records:

{"namespace": "recordData",
 "type": "record",
 "name": "CustomizedRecordDataFoo",
 "fields": [
     {"name": "recordId", "type": "string"},
     {"name":  "foo",  "type": ["string", "null"]}
     ]
}

{"namespace": "recordData",
 "type": "record",
 "name": "CustomizedRecordDataBar",
 "fields": [
     {"name": "bar", "type": "string"},
     {"name": "recordDate",  "type": "string"}
     ]
}

and then, when you want to work with the common fields, you define a
reader schema:

{"namespace": "recordData",
  "type": "record",
  "name": "CommonData",
  "aliases": ["CustomizedRecordDataFoo", "CustomizedRecordDataBar"],
  "fields" : [
     {"name": "recordId", "type": ["null", "string"], "default": null},
     {"name": "recordDate",  "type": ["null", "string"], "default": null},
     {"name": "recordPrice", "type": ["null", "int"], "default": null},
     {"name": "customer", "type": ["null", "string"], "default": null}
  ]
}

Reading with that schema should let you iterate over records written with
either customized schema: whichever fields are present in the written data
are set, and the rest take their null defaults.
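To illustrate with hypothetical values, records written with each customized schema should come back from the CommonData reader roughly as below. The records are shown as plain JSON objects, not in Avro's own JSON encoding, and the written_as, data, and read_as_CommonData keys are labels for this illustration only. The foo and bar fields have no counterpart in the reader, so they are skipped.

```json
[
 {"written_as": "CustomizedRecordDataFoo",
  "data": {"recordId": "r-1", "foo": "x"},
  "read_as_CommonData": {"recordId": "r-1", "recordDate": null,
                         "recordPrice": null, "customer": null}},
 {"written_as": "CustomizedRecordDataBar",
  "data": {"bar": "b", "recordDate": "2014-03-18"},
  "read_as_CommonData": {"recordId": null, "recordDate": "2014-03-18",
                         "recordPrice": null, "customer": null}}
]
```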

Issues to consider with this approach:

1) You have to make sure the schemas of the individual fields resolve
according to the spec's rules[2]. Put simply, make sure each field has the
same type in both schemas (string, int, or whatever), with the one in
CommonData nullable.

2) If the field in the customized record is nullable, you won't be able to
tell the difference between the field being absent and the field being
present but null. You can mitigate this by using a known placeholder
default instead of null.
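One way to apply the placeholder mitigation from item 2, sketched under assumptions: the reader schema above is changed so that recordDate defaults to a hypothetical sentinel string, __ABSENT__. Because Avro requires a union field's default to match the first branch of the union, the union is reordered to put "string" first. A record that explicitly wrote null should then read back as null, while one that lacks the field reads back as __ABSENT__. The sentinel must be a value your data never legitimately contains.

```json
{"namespace": "recordData",
  "type": "record",
  "name": "CommonData",
  "aliases": ["CustomizedRecordDataFoo", "CustomizedRecordDataBar"],
  "fields" : [
     {"name": "recordId", "type": ["null", "string"], "default": null},
     {"name": "recordDate", "type": ["string", "null"], "default": "__ABSENT__",
      "doc": "Hypothetical sentinel: a written null stays null, a missing field reads as __ABSENT__"},
     {"name": "recordPrice", "type": ["null", "int"], "default": null},
     {"name": "customer", "type": ["null", "string"], "default": null}
  ]
}
```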

If you can tolerate some storage overhead, you can avoid the first issue
by using the all-nullable CommonData record in every customized record and
then setting only the fields you actually want to use.
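A sketch of that variant for the Foo record. It assumes the all-nullable CommonData from the reader schema above has already been parsed (its aliases are not needed for this approach) and uses a hypothetical field name, common, to hold it:

```json
{"namespace": "recordData",
 "type": "record",
 "name": "CustomizedRecordDataFoo",
 "fields": [
     {"name": "common", "type": "recordData.CommonData",
      "doc": "All-nullable CommonData; set only the fields this record uses"},
     {"name": "foo", "type": ["string", "null"]}
 ]
}
```

Writers fill in only the CommonData fields they care about and leave the rest null. Because every customized record carries the same CommonData type, there is no per-field type mismatch left to resolve; the overhead is the extra nulls stored in each record.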

-Sean

[1]: http://avro.apache.org/docs/1.7.6/spec.html#Names
[2]: http://avro.apache.org/docs/1.7.6/spec.html#Schema+Resolution

Re: Field level reference across Avro Schema

Posted by hiteshpahuja <hi...@gmail.com>.
My intention is to use CommonData's fields in CustomizedRecordData. While
defining CustomizedRecordData, I want to use a few of the fields defined in
CommonData, not the whole CommonData record.

Avro allows me to use or import the whole CommonData record, but not
specific fields from it.




Re: Field level reference across Avro Schema

Posted by Sean Busbey <bu...@cloudera.com>.
Hi!

Could you clarify your intention a bit?

Is the goal of the second CustomizedRecordData to have a field that
matches one in CommonData?

Or is the goal to have the first field be CommonData (as in the first
CustomizedRecordData), but followed by additional fields?

-Sean

