Posted to user@avro.apache.org by Andrew Hammond <an...@gmail.com> on 2011/04/22 01:28:05 UTC
Concatenating avro data files?
Suppose I have two avro data files containing a number of records. Can I
simply concatenate them together to have a single avro data file without
losing any records, or do I need to actually read them and then write them?
Re: Concatenating avro data files?
Posted by Douglas Creager <dc...@dcreager.net>.
> If the schemas are identical, you can append:
You can't, however, use the Unix "cat" command. That will give you a second Avro file header halfway through your new file, which is invalid.
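To illustrate the point about "cat": per the Avro specification, every object container file begins with the four magic bytes 'O', 'b', 'j', 1, followed by the file metadata and a 16-byte sync marker. The sketch below uses toy byte arrays (magic plus placeholder text, NOT valid Avro files) to show where the second header ends up after a byte-level concatenation. The class and method names are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AvroMagicDemo {
    // Magic bytes that open every Avro object container file: 'O', 'b', 'j', 1.
    public static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the four bytes starting at offset equal the Avro magic. */
    public static boolean hasMagicAt(byte[] data, int offset) {
        if (offset < 0 || offset + MAGIC.length > data.length) {
            return false;
        }
        byte[] window = Arrays.copyOfRange(data, offset, offset + MAGIC.length);
        return Arrays.equals(window, MAGIC);
    }

    /** Toy stand-in for a data file: magic plus placeholder bytes (not real Avro). */
    public static byte[] toyFile(String placeholder) {
        byte[] body = placeholder.getBytes(StandardCharsets.US_ASCII);
        byte[] out = Arrays.copyOf(MAGIC, MAGIC.length + body.length);
        System.arraycopy(body, 0, out, MAGIC.length, body.length);
        return out;
    }

    /** Byte-level concatenation, which is what "cat a b" produces. */
    public static byte[] cat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] first = toyFile("first-file-blocks");
        byte[] second = toyFile("second-file-blocks");
        byte[] joined = cat(first, second);

        // The header belongs only at offset 0, but the second file's header
        // now also sits at offset first.length, in the middle of the block data.
        System.out.println("header at offset 0: " + hasMagicAt(joined, 0));
        System.out.println("header at offset " + first.length + ": "
                + hasMagicAt(joined, first.length));
    }
}
```

A reader that has consumed the first file's blocks expects another data block at that offset, not a new header, which is why the concatenated file is rejected.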
Re: Concatenating avro data files?
Posted by Scott Carey <sc...@richrelevance.com>.
If the schemas are identical, you can append:
http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendAllFrom%28org.apache.avro.file.DataFileStream,%20boolean%29
If the compression codec is the same, it will just append block by block without re-serialization or re-compression (very fast). You can also force it to re-compress if you wish.
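A minimal sketch of what that could look like with the generic API, assuming both files already exist and share the same schema. The class name AvroAppend and the helper appendFile are made up for this example; appendTo and appendAllFrom are the DataFileWriter methods.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroAppend {

    /**
     * Appends all blocks of source to the existing Avro data file target.
     * With recompress == false and matching codecs, blocks are copied as-is.
     */
    public static void appendFile(File target, File source, boolean recompress)
            throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<GenericRecord>(
                             new GenericDatumWriter<GenericRecord>()).appendTo(target);
             DataFileStream<GenericRecord> in =
                     new DataFileStream<GenericRecord>(
                             new FileInputStream(source),
                             new GenericDatumReader<GenericRecord>())) {
            // appendTo() picks up the schema, codec and sync marker from target;
            // appendAllFrom() throws if the schema of source does not match.
            writer.appendAllFrom(in, recompress);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: AvroAppend <target.avro> <source.avro>");
            System.exit(1);
        }
        appendFile(new File(args[0]), new File(args[1]), false);
    }
}
```

Note that this modifies the target file in place. To leave both inputs untouched, you could instead create a new output file with the shared schema and call appendAllFrom once per input.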
On 4/21/11 4:28 PM, "Andrew Hammond" <an...@gmail.com> wrote:
Suppose I have two avro data files containing a number of records. Can I simply concatenate them together to have a single avro data file without losing any records, or do I need to actually read them and then write them?