Posted to user@avro.apache.org by David Rosenstrauch <da...@darose.net> on 2010/07/27 22:33:59 UTC

Adding arbitrary property on record field

It looks as though it's not possible to add an arbitrary property on a 
record field.  e.g., in the following example, although the schema 
parses fine, the "alias" property gets thrown away:

{
	"name": "KVPair",
	"type": "record",
	"fields" : [
		{"name": "key", "type": "int", "alias": "EventTime"},
		{"name": "values", "type": "bytes"}
	]
}


I had read the Avro spec and thought this was actually allowed. 
("Attributes not defined in this document are permitted as metadata, but 
must not affect the format of serialized data.")

Am I wrong, or is this a bug/omission?  Seems like it would be a really 
useful feature to have.  And near as I can tell, the only other way to 
achieve the same behavior would be to do some kind of hack using the 
field's "doc" attribute.

Thanks,

DR

Re: java specific implementation uses GenericArray ?

Posted by Sharad Agarwal <sh...@yahoo-inc.com>.
Yang wrote:
> one reason I see is that SpecificArray attaches to itself a schema,
> this is needed for serialization etc.
>
>
The generated code could use java.util.List<Type> or Type[] instead.

Re: java specific implementation uses GenericArray ?

Posted by Yang <te...@gmail.com>.
one reason I see is that SpecificArray attaches to itself a schema,
this is needed for serialization etc.

On Tue, Jul 27, 2010 at 3:15 PM, Sharad Agarwal <sh...@yahoo-inc.com> wrote:
> Is there a reason for SpecificCompiler to use GenericArray rather than
> java's native array or list ?
>
>

Re: java specific implementation uses GenericArray ?

Posted by Patrick Linehan <pl...@plinehan.com>.
i'm sorry to say i haven't had time to work on this yet, and it may be quite
a while before i get a chance to get back to it.  other issues have,
unfortunately, taken priority.

please keep me in the loop if you figure it out!

PAt

On Fri, Aug 27, 2010 at 10:19 AM, Stu Hood <st...@rackspace.com> wrote:

> Hey Patrick,
>
> Have you gotten any time to work on this?
>
> Thanks,
> Stu
>
> -----Original Message-----
> From: "Patrick Linehan" <pl...@plinehan.com>
> Sent: Wednesday, August 18, 2010 5:37pm
> To: user@avro.apache.org
> Subject: Re: java specific implementation uses GenericArray ?
>
> On Wed, Aug 18, 2010 at 11:23 AM, Doug Cutting <cu...@apache.org> wrote:
>
> > On 08/16/2010 02:46 PM, Patrick Linehan wrote:
> >
> >> does anyone have any suggestions for dealing with large lists/arrays of
> >> primitive values in avro?
> >>
> >> in my case (numerical algorithms), my naive mapping of a vector type
> >> (mathematical vectors, not java Vectors) to an avro specific type
> >> generates a GenericArray<Double>.  needless to say, i would prefer to
> >> avoid the cost of boxing up all the individual floating point numbers.
> >>
> >> is it possible to coerce avro into using raw java primitive arrays, e.g.
> >> "double[]"?
> >>
> >
> > It should be possible to subclass things to effect this.  In particular:
> >
> >  - extend GenericData, overriding isArray()
> >  - extend GenericDatumReader, overriding readArray()
> >  - extend GenericDatumWriter, overriding writeArray()
> >
> > or s/Generic/Specific/ if you're using generated classes.
> >
> > Note that the reflect implementation already supports java primitive
> > arrays, but values are boxed on read and write.  However if you write
> > type-specific loops for int, float, long and double arrays you should be
> > able to avoid any boxing.
> >
> > Please tell me how this works for you.
> >
> > Thanks,
> >
> > Doug
> >
>
> this is fantastic, thanks for the pointers doug.  i'm currently working on
> some other stuff, but i'll let you know how this works out when i get back
> to it.
>
> PAt
>
>
>

Re: java specific implementation uses GenericArray ?

Posted by Stu Hood <st...@rackspace.com>.
Hey Patrick,

Have you gotten any time to work on this?

Thanks,
Stu

-----Original Message-----
From: "Patrick Linehan" <pl...@plinehan.com>
Sent: Wednesday, August 18, 2010 5:37pm
To: user@avro.apache.org
Subject: Re: java specific implementation uses GenericArray ?

On Wed, Aug 18, 2010 at 11:23 AM, Doug Cutting <cu...@apache.org> wrote:

> On 08/16/2010 02:46 PM, Patrick Linehan wrote:
>
>> does anyone have any suggestions for dealing with large lists/arrays of
>> primitive values in avro?
>>
>> in my case (numerical algorithms), my naive mapping of a vector type
>> (mathematical vectors, not java Vectors) to an avro specific type
>> generates a GenericArray<Double>.  needless to say, i would prefer to
>> avoid the cost of boxing up all the individual floating point numbers.
>>
>> is it possible to coerce avro into using raw java primitive arrays, e.g.
>> "double[]"?
>>
>
> It should be possible to subclass things to effect this.  In particular:
>
>  - extend GenericData, overriding isArray()
>  - extend GenericDatumReader, overriding readArray()
>  - extend GenericDatumWriter, overriding writeArray()
>
> or s/Generic/Specific/ if you're using generated classes.
>
> Note that the reflect implementation already supports java primitive
> arrays, but values are boxed on read and write.  However if you write
> type-specific loops for int, float, long and double arrays you should be
> able to avoid any boxing.
>
> Please tell me how this works for you.
>
> Thanks,
>
> Doug
>

this is fantastic, thanks for the pointers doug.  i'm currently working on
some other stuff, but i'll let you know how this works out when i get back
to it.

PAt



Re: java specific implementation uses GenericArray ?

Posted by Patrick Linehan <pl...@plinehan.com>.
On Wed, Aug 18, 2010 at 11:23 AM, Doug Cutting <cu...@apache.org> wrote:

> On 08/16/2010 02:46 PM, Patrick Linehan wrote:
>
>> does anyone have any suggestions for dealing with large lists/arrays of
>> primitive values in avro?
>>
>> in my case (numerical algorithms), my naive mapping of a vector type
>> (mathematical vectors, not java Vectors) to an avro specific type
>> generates a GenericArray<Double>.  needless to say, i would prefer to
>> avoid the cost of boxing up all the individual floating point numbers.
>>
>> is it possible to coerce avro into using raw java primitive arrays, e.g.
>> "double[]"?
>>
>
> It should be possible to subclass things to effect this.  In particular:
>
>  - extend GenericData, overriding isArray()
>  - extend GenericDatumReader, overriding readArray()
>  - extend GenericDatumWriter, overriding writeArray()
>
> or s/Generic/Specific/ if you're using generated classes.
>
> Note that the reflect implementation already supports java primitive
> arrays, but values are boxed on read and write.  However if you write
> type-specific loops for int, float, long and double arrays you should be
> able to avoid any boxing.
>
> Please tell me how this works for you.
>
> Thanks,
>
> Doug
>

this is fantastic, thanks for the pointers doug.  i'm currently working on
some other stuff, but i'll let you know how this works out when i get back
to it.

PAt

Re: java specific implementation uses GenericArray ?

Posted by Doug Cutting <cu...@apache.org>.
On 08/16/2010 02:46 PM, Patrick Linehan wrote:
> does anyone have any suggestions for dealing with large lists/arrays of
> primitive values in avro?
>
> in my case (numerical algorithms), my naive mapping of a vector type
> (mathematical vectors, not java Vectors) to an avro specific type
> generates a GenericArray<Double>.  needless to say, i would prefer to
> avoid the cost of boxing up all the individual floating point numbers.
>
> is it possible to coerce avro into using raw java primitive arrays, e.g.
> "double[]"?

It should be possible to subclass things to effect this.  In particular:

  - extend GenericData, overriding isArray()
  - extend GenericDatumReader, overriding readArray()
  - extend GenericDatumWriter, overriding writeArray()

or s/Generic/Specific/ if you're using generated classes.

Note that the reflect implementation already supports java primitive 
arrays, but values are boxed on read and write.  However if you write 
type-specific loops for int, float, long and double arrays you should be 
able to avoid any boxing.

Please tell me how this works for you.

Thanks,

Doug
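
A minimal sketch of the type-specific loop idea from the message above, for double[] only. It is standalone Java and does not use or subclass any Avro class; PrimitiveDoubleArrayCodec and its methods are hypothetical names. It assumes the Avro binary encoding as described in the spec: an array is a series of blocks, each a zig-zag varint count followed by that many items, ended by a zero count, and a double is 8 little-endian bytes. In a real GenericDatumWriter/GenericDatumReader subclass the same loops would presumably sit inside the overridden writeArray()/readArray() and go through Avro's Encoder/Decoder rather than raw buffers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical, standalone illustration; not part of Avro.
final class PrimitiveDoubleArrayCodec {

    // Avro "long": zig-zag encode, then base-128 varint, low-order group first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Writes the array as one block: count, items, then the 0 terminator.
    static byte[] encode(double[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(values.length * 8 + 11);
        if (values.length > 0) {
            writeLong(out, values.length);
            ByteBuffer item = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
            for (double v : values) {      // primitive loop: no Double objects are created
                item.clear();
                item.putDouble(v);
                out.write(item.array(), 0, 8);
            }
        }
        writeLong(out, 0);
        return out.toByteArray();
    }

    // Reads blocks until the 0 terminator, filling a double[] directly.
    static double[] decode(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        double[] result = new double[0];
        int size = 0;
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {               // negative count: a block byte size follows
                count = -count;
                readLong(in);              // the byte size is not needed here
            }
            result = Arrays.copyOf(result, size + (int) count);
            for (long i = 0; i < count; i++) {
                result[size++] = in.getDouble();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] vector = {1.5, -2.0, 3.25};
        byte[] encoded = encode(vector);
        System.out.println(encoded.length + " bytes, round trip ok: "
                + Arrays.equals(vector, decode(encoded)));
    }
}
```

The same pattern would need to be repeated for int, float and long arrays, since each needs its own loop to stay unboxed.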

Re: java specific implementation uses GenericArray ?

Posted by Patrick Linehan <pl...@plinehan.com>.
does anyone have any suggestions for dealing with large lists/arrays of
primitive values in avro?

in my case (numerical algorithms), my naive mapping of a vector type
(mathematical vectors, not java Vectors) to an avro specific type generates
a GenericArray<Double>.  needless to say, i would prefer to avoid the cost
of boxing up all the individual floating point numbers.

is it possible to coerce avro into using raw java primitive arrays, e.g.
"double[]"?

On Wed, Jul 28, 2010 at 9:10 AM, Doug Cutting <cu...@apache.org> wrote:

> On 07/28/2010 02:07 AM, Nick Palmer wrote:
>
>> It would be very nice if GenericArray implemented List. I need get,
>> set, and remove in GenericData.Array for my application and have
>> already added these to my Avro code so I can continue developing. I
>> was planning to file a patch in JIRA for this change.
>>
>
> This would be a great patch to have!
>
>
>> The trouble with making GenericArray implement List is that
>> List.size() returns an int and GenericArray.size() returns a long. Is
>> there a reason for this?
>>
>
> Avro arrays can be arbitrarily long, written as blocks.  The thinking was
> that the interface should expose the length as a long, permitting
> implementations that might page values from disk as you iterate.  The
> collision with List#size() is unfortunate.
>
> We could either:
>  a. unilaterally change GenericArray#size() to return int; or
>  b. rename GenericArray#size() to be something else, like arraySize() or
> somesuch, so that someone could still implement a version that's paged.
>
> My instinct is towards (a).  If/when someone ever implements a paged
> representation for GenericArray they can perhaps add a method with the full
> size then.
>
> Doug
>

Re: java specific implementation uses GenericArray ?

Posted by Doug Cutting <cu...@apache.org>.
On 07/28/2010 02:07 AM, Nick Palmer wrote:
> It would be very nice if GenericArray implemented List. I need get,
> set, and remove in GenericData.Array for my application and have
> already added these to my Avro code so I can continue developing. I
> was planning to file a patch in JIRA for this change.

This would be a great patch to have!

> The trouble with making GenericArray implement List is that
> List.size() returns an int and GenericArray.size() returns a long. Is
> there a reason for this?

Avro arrays can be arbitrarily long, written as blocks.  The thinking 
was that the interface should expose the length as a long, permitting 
implementations that might page values from disk as you iterate.  The 
collision with List#size() is unfortunate.

We could either:
  a. unilaterally change GenericArray#size() to return int; or
  b. rename GenericArray#size() to be something else, like arraySize() 
or somesuch, so that someone could still implement a version that's paged.

My instinct is towards (a).  If/when someone ever implements a paged 
representation for GenericArray they can perhaps add a method with the 
full size then.

Doug
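
A minimal sketch of option (a): an array type whose size() returns int, so that it can implement java.util.List while still allowing element reuse. ReusableArray is a hypothetical class written for illustration, not Avro's GenericData.Array; its peek() is modeled on the reuse idea discussed in this thread (hand back the object that previously occupied the next slot, so a reader could refill it instead of allocating a new one).

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical class for illustration; it is not Avro's GenericData.Array.
final class ReusableArray<T> extends AbstractList<T> {
    private Object[] elements;
    private int size;                       // int, so it matches List#size()

    ReusableArray(int capacity) {
        elements = new Object[Math.max(capacity, 1)];
    }

    private void rangeCheck(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T get(int index) {
        rangeCheck(index);
        return (T) elements[index];
    }

    @Override
    @SuppressWarnings("unchecked")
    public T set(int index, T value) {
        rangeCheck(index);
        T old = (T) elements[index];
        elements[index] = value;
        return old;
    }

    @Override
    public void add(int index, T value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        System.arraycopy(elements, index, elements, index + 1, size - index);
        elements[index] = value;
        size++;
        modCount++;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T remove(int index) {
        rangeCheck(index);
        T old = (T) elements[index];
        System.arraycopy(elements, index + 1, elements, index, size - index - 1);
        elements[--size] = null;            // drop the duplicate reference left by the shift
        modCount++;
        return old;
    }

    // Resets the logical size but keeps the stored objects so they can be reused.
    @Override
    public void clear() {
        size = 0;
        modCount++;
    }

    // Returns whatever object was last stored in the next free slot, or null.
    @SuppressWarnings("unchecked")
    T peek() {
        return size < elements.length ? (T) elements[size] : null;
    }

    public static void main(String[] args) {
        ReusableArray<StringBuilder> array = new ReusableArray<>(2);
        array.add(new StringBuilder("first"));
        array.clear();
        StringBuilder recycled = array.peek();   // the same StringBuilder, ready to refill
        recycled.setLength(0);
        array.add(recycled.append("second"));
        System.out.println(array);
    }
}
```

Extending AbstractList keeps the sketch short: get, set, add, remove and size are enough for the rest of the List interface to work. A paged implementation with more than Integer.MAX_VALUE elements would not fit this shape, which is the trade-off option (a) accepts.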

Re: java specific implementation uses GenericArray ?

Posted by Nick Palmer <pa...@cs.vu.nl>.
It would be very nice if GenericArray implemented List. I need get, set, and remove in GenericData.Array for my application and have already added these to my Avro code so I can continue developing. I was planning to file a patch in JIRA for this change.

The trouble with making GenericArray implement List is that List.size() returns an int and GenericArray.size() returns a long. Is there a reason for this? If size() can be changed to return an int then I could probably come up with a patch. GenericData.Array already stores size as an int internally and takes an int as the capacity argument to the constructor so I don't understand the long unless it is for supporting streaming APIs. If this is the case then I am not sure how it would be handled but am open to suggestions.

~ Nick

Re: java specific implementation uses GenericArray ?

Posted by Sharad Agarwal <sh...@yahoo-inc.com>.
> GenericArray is designed to support object reuse, while native arrays 
> and List make reuse difficult.  Probably GenericArray should be made to 
> implement List, and perhaps the reader/writer code could be reworked to 
> reuse objects when a GenericArray is used and not to when any other List 
> is used.
>
>

Using GenericArray is not very symmetric; for maps we do use java.util.Map.
OK, the reason this is a problem for me is the following.
I am using specific to generate the classes and want to try 
AvroRpcEngine (the Avro tunnel in Hadoop). Everything was working fine until 
I had arrays. The problem is that AvroRpcEngine uses reflect. I think the 
correct way would be to write my own Avro tunnel in Hadoop that uses the 
specific APIs instead of the reflect APIs.

Sharad

Re: java specific implementation uses GenericArray ?

Posted by Doug Cutting <cu...@apache.org>.
On 07/27/2010 03:15 PM, Sharad Agarwal wrote:
> Is there a reason for SpecificCompiler to use GenericArray rather than
> java's native array or list ?

GenericArray is designed to support object reuse, while native arrays 
and List make reuse difficult.  Probably GenericArray should be made to 
implement List, and perhaps the reader/writer code could be reworked to 
reuse objects when a GenericArray is used and not to when any other List 
is used.

Doug

java specific implementation uses GenericArray ?

Posted by Sharad Agarwal <sh...@yahoo-inc.com>.
Is there a reason for SpecificCompiler to use GenericArray rather than 
java's native array or list ?


Re: Adding arbitrary property on record field

Posted by Doug Cutting <cu...@apache.org>.
David,

The Java implementation currently preserves extra JSON values at the 
schema level, but not at the field level.  It wouldn't be hard to change 
this.  Please file an issue in Jira if this would be useful to you.

Thanks,

Doug

On 07/27/2010 01:33 PM, David Rosenstrauch wrote:
> It looks as though it's not possible to add an arbitrary property on a
> record field. e.g., in the following example, although the schema parses
> fine, the "alias" property gets thrown away:
>
> {
> "name": "KVPair",
> "type": "record",
> "fields" : [
> {"name": "key", "type": "int", "alias": "EventTime"},
> {"name": "values", "type": "bytes"}
> ]
> }
>
>
> I had read the Avro spec and thought this was actually allowed.
> ("Attributes not defined in this document are permitted as metadata, but
> must not affect the format of serialized data.")
>
> Am I wrong, or is this a bug/omission? Seems like it would be a really
> useful feature to have. And near as I can tell, the only other way to
> achieve the same behavior would be to do some kind of hack using the
> field's "doc" attribute.
>
> Thanks,
>
> DR