Posted to user@avro.apache.org by Stuart White <st...@gmail.com> on 2009/09/18 22:02:09 UTC
How to get started with avro?
I *think* avro is applicable to a problem I'm working on, but I'm
having difficulty getting started with it. Is there a "getting
started" guide available? Or an example "hello world" that I can look
at? Can someone point me in the direction for how to start using
avro?
Thanks!
Re: How to get started with avro?
Posted by Doug Cutting <cu...@apache.org>.
Stuart White wrote:
> So I guess I'm (1) looking for "hello world" in avro, and (2)
> attempting to determine the level of integration between avro and
> Hadoop. Do avro InputFormat/OutputFormat classes exist?
This is not yet a mature area. I wish integration with Hadoop was
further along.
In Hadoop 0.21 (the next release) it should be possible to use
SequenceFile{Input,Output}Format with Avro specific and reflect data.
This is due to the changes in:
https://issues.apache.org/jira/browse/HADOOP-6120
and
https://issues.apache.org/jira/browse/HADOOP-6165
(Note, however, that the patch did not add tests for end-to-end
MapReduce, so there may still be some issues.)
For Avro generic data, perhaps the most useful with MapReduce, you'd
need to somehow get the schema to the Serializer and Deserializer that
are used by the shuffle, since I think it still uses the deprecated
SerializationFactory#getSerialization(Class). This could be done by
having the application or InputFormat add the schema to the job's
Configuration, then have (a subclass of) AvroGenericDeserializer look
for it there. (The Deserializer is Configurable, so it should have a
copy of the Configuration available to it.) You'd use the class name
passed in (metadata.get(CLASS_KEY)) as the key to help look up the schema
in the config. Does that make any sense?
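A minimal sketch of that lookup, under stated assumptions: a plain
Map<String, String> stands in for the job's Configuration so the example
runs without Hadoop on the classpath, and the key prefix, the class name
com.example.Event, and the helper names are all hypothetical (nothing in
Hadoop or Avro defines them). With a real Configuration, the put/get calls
would presumably become conf.set(key, value) and conf.get(key).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Map stands in for Hadoop's Configuration.
class SchemaLookupSketch {
    // Hypothetical key prefix; not defined by Hadoop or Avro.
    static final String SCHEMA_KEY_PREFIX = "example.avro.schema.";

    // Job-setup side: the application or InputFormat stores the schema
    // (as JSON text) under a key derived from the class name.
    static void putSchema(Map<String, String> conf, String className, String schemaJson) {
        conf.put(SCHEMA_KEY_PREFIX + className, schemaJson);
    }

    // Deserializer side: given the class name it was handed, find the
    // schema text that the job setup stored earlier.
    static String getSchema(Map<String, String> conf, String className) {
        String schemaJson = conf.get(SCHEMA_KEY_PREFIX + className);
        if (schemaJson == null) {
            throw new IllegalStateException("No schema configured for " + className);
        }
        return schemaJson;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String schemaJson =
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":"
            + "[{\"name\":\"Field1\",\"type\":\"string\"}]}";
        putSchema(conf, "com.example.Event", schemaJson);
        System.out.println(getSchema(conf, "com.example.Event"));
    }
}
```

Failing loudly when the key is missing is a deliberate choice here: a
deserializer that silently proceeds without a schema would misread the
shuffle data rather than report the misconfiguration.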
There's also an open issue to define an InputFormat/OutputFormat for
Avro's container file format:
https://issues.apache.org/jira/browse/MAPREDUCE-815
If you're interested in helping push this forward I'll help too.
Doug
Re: How to get started with avro?
Posted by Stuart White <st...@gmail.com>.
I am using Hadoop to process large data files whose formats are
dynamic. In order to support dynamic data without embedding the
metadata with the data on every record (such as with MapWritable), I
decided to write my values as BytesWritable "blobs", with an external
schema file that describes the name, type, and ordering of the fields
written into the BytesWritable. I wrote two methods,
"BytesWritableToMap" and "MapToBytesWritable", that, using the schema,
convert the data from the BytesWritable "blob" to a Map<String,
Writable> and vice versa. The data is stored as BytesWritable,
converted to Map<String, Writable> when I'm dealing with it, and
converted back to BytesWritable to output. The schema file is a
separate text file that looks something like this:
Field1 : org.apache.hadoop.io.Text;
Field2 : org.apache.hadoop.io.Text;
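For illustration, a schema file in this "name : type;" layout could be
read roughly as follows. This is a sketch, not the code described above:
the class and method names are hypothetical, and it assumes one field per
line with a single colon separating name and type. A LinkedHashMap is used
so that iteration order matches the field ordering the schema describes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; none of these names come from the original code.
class SchemaFile {
    // Parses lines of the form "FieldName : fully.qualified.Type;" into an
    // insertion-ordered map from field name to type name.
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue; // skip blank lines
            }
            if (entry.endsWith(";")) {
                entry = entry.substring(0, entry.length() - 1);
            }
            int colon = entry.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected 'name : type;' but got: " + line);
            }
            String name = entry.substring(0, colon).trim();
            String type = entry.substring(colon + 1).trim();
            if (name.isEmpty() || type.isEmpty()) {
                throw new IllegalArgumentException("Missing name or type in: " + line);
            }
            if (fields.put(name, type) != null) {
                throw new IllegalArgumentException("Duplicate field: " + name);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String schema = "Field1 : org.apache.hadoop.io.Text;\n"
                      + "Field2 : org.apache.hadoop.io.Text;\n";
        // Print each field in schema order.
        parse(schema).forEach((name, type) -> System.out.println(name + " -> " + type));
    }
}
```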
When I read the description of Avro, it sounded exactly like what I
had done, except with a much broader scope. If it turns out to be a
replacement for what I've written, then it only makes sense for me to
adopt it.
So I guess I'm (1) looking for "hello world" in avro, and (2)
attempting to determine the level of integration between avro and
Hadoop. Do avro InputFormat/OutputFormat classes exist? (I'm not
even sure if that question makes sense yet... don't know enough about
avro yet...)
I'll take a look at the junit tests. Thanks!
On Fri, Sep 18, 2009 at 3:16 PM, Doug Cutting <cu...@apache.org> wrote:
> Unfortunately Avro does not yet have good introductory documentation or
> examples. The closest thing to examples are the unit tests.
>
> Can you tell me more about what you want to do?
>
> Doug
>
> Stuart White wrote:
>>
>> I *think* avro is applicable to a problem I'm working on, but I'm
>> having difficulty getting started with it. Is there a "getting
>> started" guide available? Or an example "hello world" that I can look
>> at? Can someone point me in the direction for how to start using
>> avro?
>>
>> Thanks!
>
Re: How to get started with avro?
Posted by Doug Cutting <cu...@apache.org>.
Unfortunately Avro does not yet have good introductory documentation or
examples. The closest thing to examples are the unit tests.
Can you tell me more about what you want to do?
Doug
Stuart White wrote:
> I *think* avro is applicable to a problem I'm working on, but I'm
> having difficulty getting started with it. Is there a "getting
> started" guide available? Or an example "hello world" that I can look
> at? Can someone point me in the direction for how to start using
> avro?
>
> Thanks!
Re: How to get started with avro?
Posted by Eelco Hillenius <ee...@gmail.com>.
Looking at the unit tests and source code was good to get me started.
Eelco
On Friday, September 18, 2009, Stuart White <st...@gmail.com> wrote:
> I *think* avro is applicable to a problem I'm working on, but I'm
> having difficulty getting started with it. Is there a "getting
> started" guide available? Or an example "hello world" that I can look
> at? Can someone point me in the direction for how to start using
> avro?
>
> Thanks!
>