Posted to dev@avro.apache.org by Scott Banachowski <sb...@yahoo-inc.com> on 2010/04/01 02:23:29 UTC

clarifications on file format

Hi, 

I'm looking at the spec for the container file, and have 2 questions:

The map of metadata key/value pairs begins with a long, then a number of
string-key/bytes-value pairs.  To be consistent with avro maps, should this
be followed by a long of 0?  The spec doesn't say explicitly, but if the
header is described by an avro schema I would suspect yes.

Are the longs that describe the file block varint longs?  Or 64-bit longs?
I assume avro varints.  But if so, if you ever wanted to expand the size of a
block by writing more objects to it, you'd be in trouble because you'd
potentially be unable to fit the new size in the varint's location.

Also, I looked around the repo for some example container files, but didn't
see any.  Are there any examples checked in that we can use to examine their
layout and test our readers?

thanks,
Scott


Re: clarifications on file format

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
> The map of metadata key/value pairs begins with a long, then a number of
> string-key/bytes-value pairs.  To be consistent with avro maps, should this
> be followed by a long of 0?  The spec doesn't say explicitly, but if the
> header is described by an avro schema I would suspect yes.
>

Not sure if this is what you are talking about, but in the Python
implementation (datafile.py) we define an Avro schema for the header:

"""

ETA_SCHEMA =
schema.parse("""\

{"type": "record", "name":
"org.apache.avro.file.Header",

 "fields" :
[

   {"name": "magic", "type": {"type": "fixed", "name": "magic", "size":
%d}},

   {"name": "meta", "type": {"type": "map", "values":
"bytes"}},

   {"name": "sync", "type": {"type": "fixed", "name": "sync", "size":
%d}}]}

""" % (MAGIC_SIZE, SYNC_SIZE))

"""

Also, some written container files should show up in
https://issues.apache.org/jira/browse/AVRO-230 real soon now.

Thanks,
Jeff

Re: clarifications on file format

Posted by Scott Carey <sc...@richrelevance.com>.
On Mar 31, 2010, at 5:23 PM, Scott Banachowski wrote:

> Hi, 
> 
> I'm looking at the spec for the container file, and have 2 questions:
> 
> The map of metadata key/value pairs begins with a long, then a number of
> string-key/bytes-value pairs.  To be consistent with avro maps, should this
> be followed by a long of 0?  The spec doesn't say explicitly, but if the
> header is described by an avro schema I would suspect yes.
> 

The Java code for the file uses the avro binary encoder for the map, so it could be defined by an avro schema.

----------
    vout.writeMapStart();                         // write metadata
    vout.setItemCount(meta.size());
    for (Map.Entry<String,byte[]> entry : meta.entrySet()) {
      vout.startItem();
      vout.writeString(entry.getKey());
      vout.writeBytes(entry.getValue());
    }
    vout.writeMapEnd();
    vout.flush(); //vout may be buffered, flush before writing to out
----------
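
To see what that means on the reader side, here's a minimal sketch (mine, not
the Avro reader code; it assumes the DecoderFactory API) that decodes the
metadata map back.  Because "meta" is a standard Avro map, decoding ends at the
trailing block count of 0 that writeMapEnd() emits, so the answer to the first
question is yes:

----------
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.util.Utf8;

public class MetaReader {
  // readMapStart() returns the count of the first map block; a standard Avro
  // map is a sequence of counted blocks terminated by a count of 0, so the
  // outer loop stops exactly where writeMapEnd() wrote that trailing zero.
  static Map<String, byte[]> readMeta(byte[] encoded) throws IOException {
    BinaryDecoder in = DecoderFactory.get()
        .binaryDecoder(new ByteArrayInputStream(encoded), null);
    Map<String, byte[]> meta = new HashMap<String, byte[]>();
    for (long n = in.readMapStart(); n != 0; n = in.mapNext()) {
      for (long i = 0; i < n; i++) {
        Utf8 key = in.readString(null);       // string key
        ByteBuffer val = in.readBytes(null);  // bytes value
        byte[] copy = new byte[val.remaining()];
        val.get(copy);
        meta.put(key.toString(), copy);
      }
    }
    return meta;
  }
}
----------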

> Are the longs that describe the file block varint longs?  Or 64-bit longs?
> I assume avro varints.  But if so, if you ever wanted to expand the size of a
> block by writing more objects to it, you'd be in trouble because you'd
> potentially be unable to fit the new size in the varint's location.
> 

This uses avro encoded longs.  
A block cannot be lengthened in place; one has to know the number of objects and the size of the block before writing it to the file.  However, since HDFS is write-once, resizing a block is not possible for a key use case no matter how the format is designed.  Also, anything other than sequential writes is dangerous for data integrity without great care.
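
For concreteness, here is a sketch of the variable-length zig-zag encoding the
spec defines for longs (written from the spec, not lifted from the Avro code).
The byte count depends on the value, which is why a block's length prefix
cannot simply be overwritten with a larger value in place:

----------
import java.io.ByteArrayOutputStream;

public class VarLong {
  // Encode a long as Avro's zig-zag varint: the sign is folded into the low
  // bit, then the value is emitted 7 bits at a time, low-order first, with
  // the high bit of each byte marking "more bytes follow".
  static byte[] encodeLong(long n) {
    long z = (n << 1) ^ (n >> 63);            // zig-zag fold
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));   // 7 payload bits + continuation
      z >>>= 7;
    }
    out.write((int) z);                       // final byte, high bit clear
    return out.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(encodeLong(63).length);     // 1 byte
    System.out.println(encodeLong(16000).length);  // 3 bytes: a grown block
                                                   // needs a wider prefix
  }
}
----------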

In the Java code, objects are encoded into a byte-array block buffer before the block's bytes are copied to the file.  The file format's default block size is 16000 bytes, and it is probably most efficient between 1k and 64k.
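
A hedged sketch of that write path (illustrative names; it assumes the
EncoderFactory API and is not the actual DataFileWriter code):

----------
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class BlockSketch {
  // Records are first serialized into an in-memory buffer; only when the
  // block is flushed are its object count and byte size known, so the two
  // varint longs can be written, followed by the data and the sync marker.
  static void writeBlock(OutputStream out, long objectCount,
                         ByteArrayOutputStream buffered, byte[] sync16)
      throws IOException {
    BinaryEncoder e = EncoderFactory.get().binaryEncoder(out, null);
    e.writeLong(objectCount);     // varint: number of objects in the block
    e.writeLong(buffered.size()); // varint: size in bytes of the block data
    e.flush();                    // the encoder may buffer; flush first
    buffered.writeTo(out);        // the pre-encoded records
    out.write(sync16);            // 16-byte sync marker from the header
  }
}
----------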

A file format optimized for very large blocks would differ, as would any format designed for random access or in-place modification.
This one best matches streams of small (< 100 bytes) to medium (< 4k) records, and is built to fit the Hadoop use case.


> Also, I looked around the repo for some example container files, but didn't
> see any.  Are there any examples checked in that we can use to examine their
> layout and test our readers?
> 
> thanks,
> Scott
>