Posted to user@avro.apache.org by Robey Pointer <ro...@twitter.com> on 2010/04/08 06:20:47 UTC

how to fill in an array (list) from java

As part of some performance testing I started doing (here: http://github.com/robey/avrotest), I needed to build a list/array of structs from inside java. I ended up writing this:

  results.edges = new GenericData.Array[Edge](2, results.getSchema().getField("edges").schema())

which is obviously the wrong way to do it. :) It's just the only way I could figure out from scanning the java source.

What's the right way to do that?

robey


Re: how to fill in an array (list) from java

Posted by Doug Cutting <cu...@apache.org>.
Robey Pointer wrote:
> On 7 Apr 2010, at 21:24, Bruce Mitchener wrote:
>> http://code.google.com/p/thrift-protobuf-compare/source/browse/trunk/tpc/src/serializers/avro/AvroGenericSerializer.java
> 
> Is the idea that I can pass null as the schema? 

There's an updated version of that for Avro 1.3 in:

http://code.google.com/p/thrift-protobuf-compare/issues/detail?id=23

Doug

Re: how to fill in an array (list) from java

Posted by Robey Pointer <ro...@twitter.com>.
On 7 Apr 2010, at 21:24, Bruce Mitchener wrote:

> Is this useful to you?
> 
> http://code.google.com/p/thrift-protobuf-compare/source/browse/trunk/tpc/src/serializers/avro/AvroGenericSerializer.java

Is the idea that I can pass null as the schema? I did try that and it blows up:

Exception in thread "main" org.apache.avro.AvroRuntimeException: Not an array schema: null
	at org.apache.avro.generic.GenericData$Array.<init>(GenericData.java:93)
	at com.twitter.avrotest.AvroFoo.<init>(AvroTest.scala:15)

How did you get that to work? :) Is this maybe a recent change in avro? (I'm using 1.3.1.)

robey


Re: how to fill in an array (list) from java

Posted by Bruce Mitchener <br...@gmail.com>.
Is this useful to you?

http://code.google.com/p/thrift-protobuf-compare/source/browse/trunk/tpc/src/serializers/avro/AvroGenericSerializer.java

That's part of the benchmark discussed here:

http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

 - Bruce

On Wed, Apr 7, 2010 at 10:20 PM, Robey Pointer <ro...@twitter.com> wrote:

> As part of some performance testing I started doing (here:
> http://github.com/robey/avrotest), I needed to build a list/array of
> structs from inside java. I ended up writing this:
>
>  results.edges = new GenericData.Array[Edge](2,
> results.getSchema().getField("edges").schema())
>
> which is obviously the wrong way to do it. :) It's just the only way I
> could figure out from scanning the java source.
>
> What's the right way to do that?
>
> robey
>
>

Re: how to fill in an array (list) from java

Posted by Robey Pointer <ro...@twitter.com>.
On 12 Apr 2010, at 09:40, Doug Cutting wrote:

> Robey Pointer wrote:
>> Maybe we should add a type of Array that implements the avro array interface but doesn't require a schema?
> 
> Instances must know their schema in order to implement #equals(), #compareTo() and #hashCode() consistently with their serialized form. This is because of unions.  Since different branches of a union are not directly comparable, unions are ordered by branch.

It looks like those would mostly be useful for arrays on the read side (which will always have the schema close at hand). Maybe the problem is for the write side, where the schema isn't handy, and wouldn't be useful anyway unless you planned to keep a copy of the array around after you wrote it.

Would it make sense to allow the schema arg to be null or missing for the write case, and just throw an exception if you try to hashCode or compare a schema-less array?
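
To make the idea concrete, here is a rough sketch in plain Java of what such a write-only array might look like. Everything in it is an assumption for illustration: SchemalessArraySketch is a made-up name, it is not part of Avro, and it does not implement Avro's array interface. It only shows the "works for filling in, throws on hashCode or compare" behavior.

```java
import java.util.ArrayList;

// Hypothetical sketch, NOT an Avro class: a list that can be filled in
// without a schema, but refuses any operation that would need one.
class SchemalessArraySketch<T> extends ArrayList<T>
        implements Comparable<SchemalessArraySketch<T>> {
    private static final long serialVersionUID = 1L;

    SchemalessArraySketch(int capacity) {
        super(capacity);
    }

    // Builds the exception thrown by every schema-dependent operation.
    private static UnsupportedOperationException noSchema(String operation) {
        return new UnsupportedOperationException(
            operation + " needs a schema, and this array has none");
    }

    @Override
    public int hashCode() {
        throw noSchema("hashCode()");
    }

    @Override
    public boolean equals(Object other) {
        throw noSchema("equals()");
    }

    @Override
    public int compareTo(SchemalessArraySketch<T> other) {
        throw noSchema("compareTo()");
    }

    public static void main(String[] args) {
        SchemalessArraySketch<String> edges = new SchemalessArraySketch<String>(2);
        edges.add("a");
        edges.add("b");
        System.out.println(edges.size()); // adding and reading elements still works
        try {
            edges.hashCode();
        } catch (UnsupportedOperationException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

The cost of this design is that such an array can't safely be used as a map key or in a set, which is probably fine if it's thrown away right after being written.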

robey


Re: how to fill in an array (list) from java

Posted by Jeff Hodges <jh...@twitter.com>.
+1 here, too.

On Mon, Apr 12, 2010 at 1:17 PM, Kevin Oliver <KO...@salesforce.com> wrote:
> +1 for the helper factories.
>
> Also, I think better javadocs on the various Schema.createXXX methods as to what is expected would help reduce the learning curve. There was a lot of trial and error for me.
>
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@apache.org]
> Sent: Monday, April 12, 2010 9:41 AM
> To: avro-user@hadoop.apache.org
> Subject: Re: how to fill in an array (list) from java
>
> Robey Pointer wrote:
>> Maybe we should add a type of Array that implements the avro array interface but doesn't require a schema?
>
> Instances must know their schema in order to implement #equals(),
> #compareTo() and #hashCode() consistently with their serialized form.
> This is because of unions.  Since different branches of a union are not
> directly comparable, unions are ordered by branch.
>
>> I'm pushing on this because if you look at my sample code, it's by far the ugliest part of assembling a reply.
>
> Perhaps we can instead work to simplify schema constructors?  For
> example, we might support something like:
>
>   Schema.arrayOf(Type.INTEGER)
>
> Similarly, we could add a unionOf that uses varargs, e.g.:
>
>   Schema.unionOf(Type.NULL, Type.STRING);
>
> Could such things help?
>
> Doug
>
>
>

RE: how to fill in an array (list) from java

Posted by Kevin Oliver <KO...@salesforce.com>.
+1 for the helper factories.

Also, I think better javadocs on the various Schema.createXXX methods as to what is expected would help reduce the learning curve. There was a lot of trial and error for me.

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Monday, April 12, 2010 9:41 AM
To: avro-user@hadoop.apache.org
Subject: Re: how to fill in an array (list) from java

Robey Pointer wrote:
> Maybe we should add a type of Array that implements the avro array interface but doesn't require a schema?

Instances must know their schema in order to implement #equals(), 
#compareTo() and #hashCode() consistently with their serialized form. 
This is because of unions.  Since different branches of a union are not 
directly comparable, unions are ordered by branch.

> I'm pushing on this because if you look at my sample code, it's by far the ugliest part of assembling a reply.

Perhaps we can instead work to simplify schema constructors?  For 
example, we might support something like:

   Schema.arrayOf(Type.INTEGER)

Similarly, we could add a unionOf that uses varargs, e.g.:

   Schema.unionOf(Type.NULL, Type.STRING);

Could such things help?

Doug



Re: how to fill in an array (list) from java

Posted by Doug Cutting <cu...@apache.org>.
Robey Pointer wrote:
> Maybe we should add a type of Array that implements the avro array interface but doesn't require a schema?

Instances must know their schema in order to implement #equals(), 
#compareTo() and #hashCode() consistently with their serialized form. 
This is because of unions.  Since different branches of a union are not 
directly comparable, unions are ordered by branch.
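
As an illustration of the ordering-by-branch point, here is a minimal sketch in plain Java. It is not Avro's implementation: the BRANCHES list is a hypothetical stand-in for a union schema of int and string, and the class and method names are made up for this example. The comparator can only decide how an Integer relates to a String by looking up each value's branch position, which is exactly the information that lives in the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, NOT Avro code: ordering values of a union by branch.
class UnionOrderSketch {
    // Stand-in for a union schema: the branch types, in declaration order.
    static final List<Class<?>> BRANCHES =
        Arrays.<Class<?>>asList(Integer.class, String.class);

    // Returns the position of the branch that the value belongs to.
    static int branchOf(Object value) {
        for (int i = 0; i < BRANCHES.size(); i++) {
            if (BRANCHES.get(i).isInstance(value)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Not in union: " + value);
    }

    // Compares by branch position first, then by value within a branch.
    static final Comparator<Object> UNION_ORDER = new Comparator<Object>() {
        @Override
        @SuppressWarnings("unchecked")
        public int compare(Object a, Object b) {
            int branchA = branchOf(a);
            int branchB = branchOf(b);
            if (branchA != branchB) {
                return Integer.compare(branchA, branchB);
            }
            // Same branch, so both values have the same comparable type.
            return ((Comparable<Object>) a).compareTo(b);
        }
    };

    // Returns a sorted copy of the given union values.
    static List<Object> sorted(Object... values) {
        List<Object> copy = new ArrayList<Object>(Arrays.asList(values));
        Collections.sort(copy, UNION_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted("b", 2, "a", 1)); // prints [1, 2, a, b]
    }
}
```

Without the branch list there is no sensible answer to whether 2 sorts before "a", so an instance that doesn't know its schema can't produce an ordering that agrees with the serialized form.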

> I'm pushing on this because if you look at my sample code, it's by far the ugliest part of assembling a reply.

Perhaps we can instead work to simplify schema constructors?  For 
example, we might support something like:

   Schema.arrayOf(Type.INTEGER)

Similarly, we could add a unionOf that uses varargs, e.g.:

   Schema.unionOf(Type.NULL, Type.STRING);

Could such things help?

Doug



Re: how to fill in an array (list) from java

Posted by Robey Pointer <ro...@twitter.com>.
On 9 Apr 2010, at 17:03, Scott Carey wrote:

> The schema required by new GenericData.Array is the schema of the array, not the schema of its elements.  
> 
> Try:
> Schema.createArray(YOUR_ELEMENT_SCHEMA_HERE).

I see from Doug's email that the schema is required now, but it's pretty annoying to fetch. Maybe we should add a type of Array that implements the avro array interface but doesn't require a schema? Would such a patch be welcome? :)

I'm pushing on this because if you look at my sample code, it's by far the ugliest part of assembling a reply.

robey


Re: how to fill in an array (list) from java

Posted by Scott Carey <sc...@richrelevance.com>.
The schema required by new GenericData.Array is the schema of the array, not the schema of its elements.  

Try:
Schema.createArray(YOUR_ELEMENT_SCHEMA_HERE).


On Apr 7, 2010, at 9:20 PM, Robey Pointer wrote:

> As part of some performance testing I started doing (here: http://github.com/robey/avrotest), I needed to build a list/array of structs from inside java. I ended up writing this:
> 
>  results.edges = new GenericData.Array[Edge](2, results.getSchema().getField("edges").schema())
> 
> which is obviously the wrong way to do it. :) It's just the only way I could figure out from scanning the java source.
> 
> What's the right way to do that?
> 
> robey
>