Posted to user@avro.apache.org by Andrew Hammond <an...@gmail.com> on 2011/04/22 01:28:05 UTC

Concatenating avro data files?

Suppose I have two avro data files containing a number of records. Can I
simply concatenate them together to have a single avro data file without
losing any records, or do I need to actually read them and then write them?

Re: Concatenating avro data files?

Posted by Douglas Creager <dc...@dcreager.net>.
> If the schemas are identical, you can append:

You can't, however, use the Unix "cat" command.  That will give you a second Avro file header halfway through your new file, which is invalid.
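
To illustrate: per the Avro spec, every data file starts with the four magic bytes 'O', 'b', 'j', 1, followed by the file metadata and a sync marker. Here is a minimal sketch of what a byte-for-byte concatenation does to that layout. The class and method names are made up for illustration, and the byte arrays are placeholders, not real Avro content.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class CatCheck {
    // Magic bytes that open every Avro data file: 'O', 'b', 'j', 1.
    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // True if the four bytes starting at offset equal the Avro magic.
    static boolean hasMagicAt(byte[] data, int offset) {
        if (offset < 0 || offset + MAGIC.length > data.length) {
            return false;
        }
        byte[] window = Arrays.copyOfRange(data, offset, offset + MAGIC.length);
        return Arrays.equals(window, MAGIC);
    }

    // Byte-for-byte concatenation, which is what "cat a.avro b.avro" does.
    static byte[] cat(byte[] first, byte[] second) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(first, 0, first.length);
        out.write(second, 0, second.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-ins for two files: the magic plus placeholder bytes.
        byte[] a = {'O', 'b', 'j', 1, 10, 20, 30};
        byte[] b = {'O', 'b', 'j', 1, 40, 50};
        byte[] joined = cat(a, b);
        // The header of the first file is where a reader expects it ...
        System.out.println(hasMagicAt(joined, 0));
        // ... but the header of the second file now sits where a data block should be.
        System.out.println(hasMagicAt(joined, a.length));
    }
}
```

A reader that has finished the first file's blocks expects either another block or end of file at that position, not a new header.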

Re: Concatenating avro data files?

Posted by Scott Carey <sc...@richrelevance.com>.
If the schemas are identical, you can append:

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendAllFrom%28org.apache.avro.file.DataFileStream,%20boolean%29

If the compression codec is the same, it will just append block by block without re-serialization or re-compression (very fast).  You can also force it to re-compress if you wish.
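
A rough sketch of how that could look, assuming the Avro Java library is on the classpath and the generic (not specific) record API is used. The class name AvroConcat and the method concat are made up for illustration. The output file takes its schema and codec from the first input, so blocks from inputs with the same codec are copied as-is.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileConstants;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroConcat {
    // Merges the input data files into out. All inputs must share one schema;
    // appendAllFrom throws an IOException if a schema does not match.
    public static void concat(File out, File... inputs) throws IOException {
        if (inputs.length == 0) {
            throw new IllegalArgumentException("no input files");
        }
        DataFileWriter<GenericRecord> writer = null;
        try {
            for (File in : inputs) {
                try (DataFileReader<GenericRecord> reader =
                         new DataFileReader<>(in, new GenericDatumReader<GenericRecord>())) {
                    if (writer == null) {
                        // Take the schema and codec from the first input file.
                        Schema schema = reader.getSchema();
                        String codec = reader.getMetaString(DataFileConstants.CODEC);
                        writer = new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
                        if (codec != null) {
                            // The codec must be set before create().
                            writer.setCodec(CodecFactory.fromString(codec));
                        }
                        writer.create(schema, out);
                    }
                    // false = do not force re-compression of blocks.
                    writer.appendAllFrom(reader, false);
                }
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
    }
}
```

Passing true as the second argument to appendAllFrom would force re-compression with the writer's codec.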


On 4/21/11 4:28 PM, "Andrew Hammond" <an...@gmail.com> wrote:

Suppose I have two avro data files containing a number of records. Can I simply concatenate them together to have a single avro data file without losing any records, or do I need to actually read them and then write them?