Posted to user@avro.apache.org by Stuart White <st...@gmail.com> on 2009/09/18 22:02:09 UTC

How to get started with avro?

I *think* avro is applicable to a problem I'm working on, but I'm
having difficulty getting started with it.  Is there a "getting
started" guide available?  Or an example "hello world" that I can look
at?  Can someone point me in the right direction for how to start using
avro?

Thanks!

Re: How to get started with avro?

Posted by Doug Cutting <cu...@apache.org>.
Stuart White wrote:
> So I guess I'm (1) looking for "hello world" in avro, and (2)
> attempting to determine the level of integration between avro and
> Hadoop.  Do avro InputFormat/OutputFormat classes exist?

This is not yet a mature area.  I wish integration with Hadoop was 
further along.

In Hadoop 0.21 (the next release) it should be possible to use 
SequenceFile{Input,Output}Format with Avro specific and reflect data.

This is due to the changes in:

https://issues.apache.org/jira/browse/HADOOP-6120

and

https://issues.apache.org/jira/browse/HADOOP-6165

(Note, however, that the patch did not add end-to-end MapReduce tests, so 
there may still be some issues.)

For Avro generic data, perhaps the most useful with MapReduce, you'd 
need to somehow get the schema to the Serializer and Deserializer that 
are used by the shuffle, since I think it still uses the deprecated 
SerializationFactory#getSerialization(Class).  This could be done by 
having the application or InputFormat add the schema to the job's 
Configuration, and then have (a subclass of) AvroGenericDeserializer look 
for it there.  (The Deserializer is Configurable, so it should have a 
copy of the Configuration available to it.)  You'd use the class name 
passed in (metadata.get(CLASS_KEY)) as the key to help look up the schema 
in the config.  Does that make any sense?
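The store-then-look-up pattern described above can be sketched without Hadoop on the classpath. This is a minimal illustration under stated assumptions: a plain Map<String, String> stands in for the job's Configuration, the key prefix "avro.generic.schema." and the class name com.example.Hello are hypothetical, and no real AvroGenericDeserializer subclass is shown, only the pattern of keying the schema by class name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for Hadoop's Configuration,
// and the key prefix below is an invented naming convention.
public class SchemaLookupSketch {
    static final String KEY_PREFIX = "avro.generic.schema.";

    // Job-setup side: the application (or InputFormat) stores the schema
    // JSON under a key derived from the class name.
    static void registerSchema(Map<String, String> conf, String className, String schemaJson) {
        conf.put(KEY_PREFIX + className, schemaJson);
    }

    // Deserializer side: given the class name passed in the metadata
    // (what metadata.get(CLASS_KEY) would return), find the schema.
    static String lookupSchema(Map<String, String> conf, String className) {
        String schema = conf.get(KEY_PREFIX + className);
        if (schema == null) {
            throw new IllegalStateException("No schema configured for " + className);
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String json = "{\"type\":\"record\",\"name\":\"Hello\","
                + "\"fields\":[{\"name\":\"greeting\",\"type\":\"string\"}]}";
        registerSchema(conf, "com.example.Hello", json);
        System.out.println(lookupSchema(conf, "com.example.Hello").equals(json));
    }
}
```

In a real job the registration would presumably happen during job setup, and the lookup inside the deserializer once it has its Configuration (for example when setConf(...) is called). Keeping the schema as JSON text lets it travel as an ordinary string property.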

There's also an open issue to define an InputFormat/OutputFormat for 
Avro's container file format:

https://issues.apache.org/jira/browse/MAPREDUCE-815

If you're interested in helping push this forward I'll help too.

Doug


Re: How to get started with avro?

Posted by Stuart White <st...@gmail.com>.
I am using Hadoop to process large data files whose formats are
dynamic.  In order to support dynamic data without embedding the
metadata with the data on every record (such as with MapWritable), I
decided to write my values as BytesWritable "blobs", with an external
schema file that describes the name, type, and ordering of the fields
written into the BytesWritable.  I wrote 2 methods
"BytesWritableToMap" and "MapToBytesWritable" that, using the schema,
convert the data from the BytesWritable "blob" to a Map<String,
Writable> and vice versa.  The data is stored as BytesWritable,
converted to Map<String, Writable> when I'm dealing with it, and
converted back to BytesWritable to output.  The schema file is a
separate text file that looks something like this:

Field1 : org.apache.hadoop.io.Text;
Field2 : org.apache.hadoop.io.Text;
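A minimal stdlib-only sketch of the schema-driven blob scheme described above. The names parseSchema, toBytes and fromBytes are hypothetical stand-ins for the schema reader, MapToBytesWritable and BytesWritableToMap. byte[] replaces BytesWritable and String replaces Text, and DataOutputStream.writeUTF is used for convenience rather than Hadoop's actual Text wire format. Only Text fields are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: byte[] stands in for BytesWritable, String for Text.
// writeUTF is NOT Hadoop's Text encoding; it just keeps the example stdlib-only.
public class BlobSketch {
    static final String TEXT_TYPE = "org.apache.hadoop.io.Text";

    // Parse "Name : type;" lines into an ordered list of field names.
    static List<String> parseSchema(String schemaText) {
        List<String> fields = new ArrayList<>();
        for (String raw : schemaText.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            int colon = line.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("Bad schema line: " + line);
            String name = line.substring(0, colon).trim();
            String type = line.substring(colon + 1).trim();
            if (type.endsWith(";")) type = type.substring(0, type.length() - 1).trim();
            if (!type.equals(TEXT_TYPE)) {
                throw new UnsupportedOperationException("Unsupported type: " + type);
            }
            fields.add(name);
        }
        return fields;
    }

    // MapToBytesWritable analogue: values in schema order, no field names.
    static byte[] toBytes(List<String> schema, Map<String, String> record) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (String field : schema) {
                String value = record.get(field);
                if (value == null) throw new IllegalArgumentException("Missing field: " + field);
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // BytesWritableToMap analogue: read values back in the same order.
    static Map<String, String> fromBytes(List<String> schema, byte[] blob) {
        Map<String, String> record = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            for (String field : schema) {
                record.put(field, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

Because the field names live only in the external schema file, each record carries just its values. The writer and the reader must agree on the same schema and the same field order.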

When I read the description of Avro, it sounded exactly like what I
had done, except with a much broader scope.  If it turns out to be a
replacement for what I've written, then it only makes sense for me to
adopt it.
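At the wire level the resemblance is close. Per the Avro specification, a record is binary-encoded as its field values in the order the schema declares them, with no field names. A string is encoded as its byte length, written as a zig-zag variable-length long, followed by its UTF-8 bytes. The stdlib-only sketch below applies those rules by hand to a record shaped like the Field1/Field2 schema above, with both fields treated as Avro strings. It illustrates the encoding only and is not the Avro library's API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of Avro binary encoding rules; NOT the Avro library.
public class AvroEncodingSketch {
    // Avro int/long: zig-zag encode, then emit 7 bits per byte,
    // setting the high bit on every byte except the last.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes -> small values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string: length as a long, then that many UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Record {Field1: string, Field2: string}: values in declaration order, no names.
    static byte[] encodeRecord(String field1, String field2) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, field1);
        writeString(out, field2);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encodeRecord("hi", "avro")) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

Encoding ("hi", "avro") this way yields the eight bytes 04 68 69 08 61 76 72 6f: the length 2 zig-zags to 4 and the length 4 zig-zags to 8. As with the BytesWritable approach, a reader needs the schema to make sense of them.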

So I guess I'm (1) looking for "hello world" in avro, and (2)
attempting to determine the level of integration between avro and
Hadoop.  Do avro InputFormat/OutputFormat classes exist?  (I'm not
even sure if that question makes sense yet... don't know enough about
avro yet...)

I'll take a look at the JUnit tests.  Thanks!


On Fri, Sep 18, 2009 at 3:16 PM, Doug Cutting <cu...@apache.org> wrote:
> Unfortunately Avro does not yet have good introductory documentation or
> examples.  The closest thing to examples are the unit tests.
>
> Can you tell more about what you want to do?
>
> Doug

Re: How to get started with avro?

Posted by Doug Cutting <cu...@apache.org>.
Unfortunately Avro does not yet have good introductory documentation or 
examples.  The closest things to examples are the unit tests.

Can you tell more about what you want to do?

Doug


Re: How to get started with avro?

Posted by Eelco Hillenius <ee...@gmail.com>.
Looking at the unit tests and source code was a good way for me to get started.

Eelco
