Posted to user@avro.apache.org by Michael Malak <mi...@yahoo.com> on 2013/02/01 20:32:23 UTC

Re: Is it possible to append to an already existing avro file

Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?

What is the status of such a capability, a year out from when the issue below was raised?

On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vy...@gmail.com> wrote:

> Thanks for your reply, I suspected this. 
>
> I will create a JIRA ticket.
>
> Vyacheslav
> 
> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> 
>> 
>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vy...@gmail.com>
>> wrote:
>> 
>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>>> interested in how to append to a file on an arbitrary file system, not
>>> only on the local one.
>>> 
>>> I want to get an OutputStream based on the Path and the FileSystem
>>> implementation and then pass it for appending to avro methods.
>>> 
>>> Is that possible?
>> 
>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> ticket.  
>> 
>> It could not simply append to an OutputStream, since it must either:
>> * Seek to the start to validate the schemas match and find the sync
>> marker, or
>> * Trust that the schemas match and find the sync marker from the last
>> block
>> 
>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> could add something to the mapred module that takes a Path and
>> FileSystem and returns something that implements an interface that
>> DataFileWriter can append to.  This would be something that is both a
>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> and an OutputStream, or has both an InputStream from the start of the
>> existing file and an OutputStream at the end.
>> 
>>> Thanks,
>>> Vyacheslav
>>> 
>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Use the appendTo feature of the DataFileWriter. See
>>>> 
>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> 
>>>> For a quick setup example, read also:
>>>> 
>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>> 
>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>>> <vy...@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> is it possible to append to an already existing avro file when it was
>>>>> written and closed before?
>>>>> 
>>>>> If I use
>>>>> outputStream = fs.append(avroFilePath);
>>>>> 
>>>>> then later on I get: java.io.IOException: Invalid sync!
>>>>> 
>>>>> Probably because the schema is written twice and some other issues.
>>>>> 
>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>>>> gets
>>>>> overwritten.
>>>>> 
>>>>> Thanks,
>>>>> Vyacheslav
>>>> 
>>>> -- 
>>>> Harsh J
>>>> Customer Ops. Engineer
>>>> Cloudera | http://tiny.cloudera.com/about
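
The two requirements Scott lists above (check that the schemas match, and find the existing sync marker) can be illustrated with a toy container format. This is a minimal sketch under stated assumptions, not Avro's actual file layout: the class ToyContainer, its magic number, and its header and block encoding are all hypothetical. It only shows why a writer that treats an existing file as new (second header, fresh sync marker) leaves the file unreadable, while a writer that first reads the existing header and reuses its sync marker does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: a header (magic, schema, sync marker) followed by blocks
// that each end with the same sync marker. This is NOT Avro's real layout.
final class ToyContainer {
    static final int MAGIC = 0x544F5931; // hypothetical magic number ("TOY1")
    static final int SYNC_SIZE = 16;

    static final class Header {
        final String schema;
        final byte[] sync;
        Header(String schema, byte[] sync) { this.schema = schema; this.sync = sync; }
    }

    static byte[] newSync() {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        return sync;
    }

    static Header readHeader(DataInputStream in) throws IOException {
        if (in.readInt() != MAGIC) throw new IOException("Not a toy container");
        String schema = in.readUTF();
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        return new Header(schema, sync);
    }

    static void writeBlock(DataOutputStream out, byte[] sync, List<String> records) throws IOException {
        out.writeInt(records.size());
        for (String r : records) out.writeUTF(r);
        out.write(sync); // every block ends with the file's sync marker
    }

    // Creates a new file image: header plus one block.
    static byte[] create(String schema, List<String> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] sync = newSync();
        out.writeInt(MAGIC);
        out.writeUTF(schema);
        out.write(sync);
        writeBlock(out, sync, records);
        return bytes.toByteArray();
    }

    // Naive append: acts as if the file were new, so a second header and a
    // fresh sync marker end up in the middle of the file.
    static byte[] appendNaively(byte[] file, String schema, List<String> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(file);
        bytes.write(create(schema, records));
        return bytes.toByteArray();
    }

    // Header-aware append: read the existing header from the start, check the
    // schema, then write blocks only, reusing the existing sync marker.
    static byte[] appendTo(byte[] file, String schema, List<String> records) throws IOException {
        Header h = readHeader(new DataInputStream(new ByteArrayInputStream(file)));
        if (!h.schema.equals(schema)) throw new IOException("Schema mismatch");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(file);
        writeBlock(new DataOutputStream(bytes), h.sync, records);
        return bytes.toByteArray();
    }

    // Reads every record, verifying the sync marker after each block.
    static List<String> readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        Header h = readHeader(in);
        List<String> result = new ArrayList<>();
        byte[] marker = new byte[SYNC_SIZE];
        while (in.available() > 0) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) result.add(in.readUTF());
            in.readFully(marker);
            if (!Arrays.equals(marker, h.sync)) throw new IOException("Invalid sync!");
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = create("string", Arrays.asList("a", "b"));
        System.out.println(readAll(appendTo(file, "string", Arrays.asList("c"))));
        try {
            readAll(appendNaively(file, "string", Arrays.asList("c")));
        } catch (IOException e) {
            // The second header is misparsed as a block, so reading fails.
            System.out.println("naive append is unreadable: " + e);
        }
    }
}
```

In this toy the header-aware path needs read access to the start of the file and write access at its end, which is the same pair of capabilities Scott describes for a real implementation.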


Re: Is it possible to append to an already existing avro file

Posted by Harsh J <ha...@cloudera.com>.
I *completely* missed that, although I've worked with it in the past. Thanks, Doug!

I updated my example: https://gist.github.com/QwertyManiac/4724582.

On Thu, Feb 7, 2013 at 10:21 PM, Doug Cutting <cu...@apache.org> wrote:
> The avro-mapred module includes a Seekable implementation that works
> with HDFS called FsInput:
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/FsInput.html
>
> With this, your example can be made considerably smaller.
>
> Doug
>
>
>
> On Thu, Feb 7, 2013 at 8:28 AM, Harsh J <ha...@cloudera.com> wrote:
>> I assume by non-trivial you meant the extra Seekable stuff I needed to
>> wrap around the DFS output streams to let Avro take it as append-able?
>> I don't think it's possible for Avro to carry it since Avro (core) does
>> not reverse-depend on Hadoop. Should we document it somewhere though?
>> Do you have any ideas on the best place to do that?
>>
>> On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <mi...@yahoo.com> wrote:
>>> Thanks so much for the code -- it works great!
>>>
>>> Since it is a non-trivial amount of code required to achieve append, I suggest attaching that code to AVRO-1035, in the hopes that someone will come up with an interface that requires just one line of user code to achieve append.
>>>
>>> --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:
>>>
>>>> From: Harsh J <ha...@cloudera.com>
>>>> Subject: Re: Is it possible to append to an already existing avro file
>>>> To: user@avro.apache.org
>>>> Date: Wednesday, February 6, 2013, 11:17 AM
>>>> Hey Michael,
>>>>
>>>> It does implement the regular Java OutputStream interface, as seen in
>>>> the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>>>
>>>> Here's a sample program that works on Hadoop 2.x in my tests:
>>>> https://gist.github.com/QwertyManiac/4724582
>>>>
>>>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com> wrote:
>>>> > I don't believe a Hadoop FileSystem is a Java OutputStream?
>>>> >
>>>> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
>>>> >
>>>> >> From: Doug Cutting <cu...@apache.org>
>>>> >> Subject: Re: Is it possible to append to an already existing avro file
>>>> >> To: user@avro.apache.org
>>>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>>>> >> It will work on an OutputStream that supports append.
>>>> >>
>>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>>>> >>
>>>> >> So it depends on how well HDFS implements FileSystem#append(), not on
>>>> >> any changes in Avro.
>>>> >>
>>>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>>> >>
>>>> >> I have no recent personal experience with append in HDFS.  Does anyone
>>>> >> else here?
>>>> >>
>>>> >> Doug
>>>> >>
>>>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <mi...@yahoo.com> wrote:
>>>> >> > My understanding is that will append to a file on the local filesystem, but not to a file on HDFS.
>>>> >> >
>>>> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
>>>> >> >
>>>> >> >> From: Doug Cutting <cu...@apache.org>
>>>> >> >> Subject: Re: Is it possible to append to an already existing avro file
>>>> >> >> To: user@avro.apache.org
>>>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>>>> >> >> The Jira is:
>>>> >> >>
>>>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>>>> >> >>
>>>> >> >> It is possible to append to an existing Avro file:
>>>> >> >>
>>>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> >> >>
>>>> >> >> Should we close that issue as "fixed"?
>>>> >> >>
>>>> >> >> Doug
>>>> >> >>
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com>
>>>> wrote:
>>>> > I don't believe a Hadoop FileSystem is a Java
>>>> OutputStream?
>>>> >
>>>> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>>>> wrote:
>>>> >
>>>> >> From: Doug Cutting <cu...@apache.org>
>>>> >> Subject: Re: Is it possible to append to an already
>>>> existing avro file
>>>> >> To: user@avro.apache.org
>>>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>>>> >> It will work on an OutputStream that
>>>> >> supports append.
>>>> >>
>>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
>>>> >> java.io.OutputStream)
>>>> >>
>>>> >> So it depends on how well HDFS implements
>>>> >> FileSystem#append(), not on
>>>> >> any changes in Avro.
>>>> >>
>>>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>>> >>
>>>> >> I have no recent personal experience with append
>>>> in
>>>> >> HDFS.  Does anyone
>>>> >> else here?
>>>> >>
>>>> >> Doug
>>>> >>
>>>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak
>>>> <mi...@yahoo.com>
>>>> >> wrote:
>>>> >> > My understanding is that will append to a file
>>>> on the
>>>> >> local filesystem, but not to a file on HDFS.
>>>> >> >
>>>> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>>>> >> wrote:
>>>> >> >
>>>> >> >> From: Doug Cutting <cu...@apache.org>
>>>> >> >> Subject: Re: Is it possible to append to
>>>> an already
>>>> >> existing avro file
>>>> >> >> To: user@avro.apache.org
>>>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>>>> >> >> The Jira is:
>>>> >> >>
>>>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>>>> >> >>
>>>> >> >> It is possible to append to an existing
>>>> Avro file:
>>>> >> >>
>>>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> >> >>
>>>> >> >> Should we close that issue as "fixed"?
>>>> >> >>
>>>> >> >> Doug
>>>> >> >>
>>>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
>>>> Malak
>>>> >> <mi...@yahoo.com>
>>>> >> >> wrote:
>>>> >> >> > Was a JIRA ticket ever created
>>>> regarding
>>>> >> appending to
>>>> >> >> an existing Avro file on HDFS?
>>>> >> >> >
>>>> >> >> > What is the status of such a
>>>> capability, a
>>>> >> year out
>>>> >> >> from when the issue below was raised?
>>>> >> >> >
>>>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>>>> >> "Vyacheslav
>>>> >> >> Zholudev" <vy...@gmail.com>
>>>> >> >> wrote:
>>>> >> >> >
>>>> >> >> >> Thanks for your reply, I
>>>> suspected this.
>>>> >> >> >>
>>>> >> >> >> I will create a JIRA ticket.
>>>> >> >> >>
>>>> >> >> >> Vyacheslav
>>>> >> >> >>
>>>> >> >> >> On Feb 21, 2012, at 6:02 PM,
>>>> Scott Carey
>>>> >> wrote:
>>>> >> >> >>
>>>> >> >> >>>
>>>> >> >> >>> On 2/21/12 7:29 AM,
>>>> "Vyacheslav
>>>> >> Zholudev"
>>>> >> >> <vy...@gmail.com>
>>>> >> >> >>> wrote:
>>>> >> >> >>>
>>>> >> >> >>>> Yep, I saw that method as
>>>> well as
>>>> >> the
>>>> >> >> stackoverflow post. However, I'm
>>>> >> >> >>>> interested how to append
>>>> to a file
>>>> >> on the
>>>> >> >> arbitrary file system, not
>>>> >> >> >>>> only on the local one.
>>>> >> >> >>>>
>>>> >> >> >>>> I want to get an
>>>> OutputStream
>>>> >> based on the
>>>> >> >> Path and the FileSystem
>>>> >> >> >>>> implementation and then
>>>> pass it
>>>> >> for
>>>> >> >> appending to avro methods.
>>>> >> >> >>>>
>>>> >> >> >>>> Is that possible?
>>>> >> >> >>>
>>>> >> >> >>> It is not possible without
>>>> modifying
>>>> >> >> DataFileWriter. Please open a JIRA
>>>> >> >> >>> ticket.
>>>> >> >> >>>
>>>> >> >> >>> It could not simply append to
>>>> an
>>>> >> OutputStream,
>>>> >> >> since it must either:
>>>> >> >> >>> * Seek to the start to
>>>> validate the
>>>> >> schemas
>>>> >> >> match and find the sync
>>>> >> >> >>> marker, or
>>>> >> >> >>> * Trust that the schemas
>>>> match and
>>>> >> find the
>>>> >> >> sync marker from the last
>>>> >> >> >>> block
>>>> >> >> >>>
>>>> >> >> >>> DataFileWriter cannot refer
>>>> to Hadoop
>>>> >> classes
>>>> >> >> such as FileSystem, but we
>>>> >> >> >>> could add something to the
>>>> mapred
>>>> >> module that
>>>> >> >> takes a Path and
>>>> >> >> >>> FileSystem and returns
>>>> something that
>>>> >> >> implemements an interface that
>>>> >> >> >>> DataFileWriter can append
>>>> to.
>>>> >> This would
>>>> >> >> be something that is both a
>>>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>>> >> >> >>> and an OutputStream, or has
>>>> both an
>>>> >> InputStream
>>>> >> >> from the start of the
>>>> >> >> >>> existing file and an
>>>> OutputStream at
>>>> >> the end.
>>>> >> >> >>>
>>>> >> >> >>>> Thanks,
>>>> >> >> >>>> Vyacheslav
>>>> >> >> >>>>
>>>> >> >> >>>> On Feb 21, 2012, at 5:29
>>>> AM, Harsh
>>>> >> J
>>>> >> >> wrote:
>>>> >> >> >>>>
>>>> >> >> >>>>> Hi,
>>>> >> >> >>>>>
>>>> >> >> >>>>> Use the appendTo
>>>> feature of
>>>> >> the
>>>> >> >> DataFileWriter. See
>>>> >> >> >>>>>
>>>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> >> >> >>>>>
>>>> >> >> >>>>> For a quick setup
>>>> example,
>>>> >> read also:
>>>> >> >> >>>>>
>>>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>> >> >> >>>>>
>>>> >> >> >>>>> On Tue, Feb 21, 2012
>>>> at 3:15
>>>> >> AM,
>>>> >> >> Vyacheslav Zholudev
>>>> >> >> >>>>> <vy...@gmail.com>
>>>> >> >> wrote:
>>>> >> >> >>>>>> Hi,
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> is it possible to
>>>> append
>>>> >> to an
>>>> >> >> already existing avro file when it was
>>>> >> >> >>>>>> written and
>>>> closed
>>>> >> before?
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> If I use
>>>> >> >> >>>>>> outputStream =
>>>> >> >> fs.append(avroFilePath);
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> then later on I
>>>> get:
>>>> >> >> java.io.IOException: Invalid sync!
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> Probably because
>>>> the
>>>> >> schema is
>>>> >> >> written twice and some other issues.
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> If I use
>>>> outputStream =
>>>> >> >> fs.create(avroFilePath); then the avro
>>>> file
>>>> >> >> >>>>>> gets
>>>> >> >> >>>>>> overwritten.
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> Thanks,
>>>> >> >> >>>>>> Vyacheslav
>>>> >> >> >>>>>
>>>> >> >> >>>>> --
>>>> >> >> >>>>> Harsh J
>>>> >> >> >>>>> Customer Ops.
>>>> Engineer
>>>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>>>> >> >> >
>>>> >> >>
>>>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
>>>> Malak
>>>> >> <mi...@yahoo.com>
>>>> >> >> wrote:
>>>> >> >> > Was a JIRA ticket ever created
>>>> regarding
>>>> >> appending to
>>>> >> >> an existing Avro file on HDFS?
>>>> >> >> >
>>>> >> >> > What is the status of such a
>>>> capability, a
>>>> >> year out
>>>> >> >> from when the issue below was raised?
>>>> >> >> >
>>>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>>>> >> "Vyacheslav
>>>> >> >> Zholudev" <vy...@gmail.com>
>>>> >> >> wrote:
>>>> >> >> >
>>>> >> >> >> Thanks for your reply, I
>>>> suspected this.
>>>> >> >> >>
>>>> >> >> >> I will create a JIRA ticket.
>>>> >> >> >>
>>>> >> >> >> Vyacheslav
>>>> >> >> >>
>>>> >> >> >> On Feb 21, 2012, at 6:02 PM,
>>>> Scott Carey
>>>> >> wrote:
>>>> >> >> >>
>>>> >> >> >>>
>>>> >> >> >>> On 2/21/12 7:29 AM,
>>>> "Vyacheslav
>>>> >> Zholudev"
>>>> >> >> <vy...@gmail.com>
>>>> >> >> >>> wrote:
>>>> >> >> >>>
>>>> >> >> >>>> Yep, I saw that method as
>>>> well as
>>>> >> the
>>>> >> >> stackoverflow post. However, I'm
>>>> >> >> >>>> interested how to append
>>>> to a file
>>>> >> on the
>>>> >> >> arbitrary file system, not
>>>> >> >> >>>> only on the local one.
>>>> >> >> >>>>
>>>> >> >> >>>> I want to get an
>>>> OutputStream
>>>> >> based on the
>>>> >> >> Path and the FileSystem
>>>> >> >> >>>> implementation and then
>>>> pass it
>>>> >> for
>>>> >> >> appending to avro methods.
>>>> >> >> >>>>
>>>> >> >> >>>> Is that possible?
>>>> >> >> >>>
>>>> >> >> >>> It is not possible without
>>>> modifying
>>>> >> >> DataFileWriter. Please open a JIRA
>>>> >> >> >>> ticket.
>>>> >> >> >>>
>>>> >> >> >>> It could not simply append to
>>>> an
>>>> >> OutputStream,
>>>> >> >> since it must either:
>>>> >> >> >>> * Seek to the start to
>>>> validate the
>>>> >> schemas
>>>> >> >> match and find the sync
>>>> >> >> >>> marker, or
>>>> >> >> >>> * Trust that the schemas
>>>> match and
>>>> >> find the
>>>> >> >> sync marker from the last
>>>> >> >> >>> block
>>>> >> >> >>>
>>>> >> >> >>> DataFileWriter cannot refer
>>>> to Hadoop
>>>> >> classes
>>>> >> >> such as FileSystem, but we
>>>> >> >> >>> could add something to the
>>>> mapred
>>>> >> module that
>>>> >> >> takes a Path and
>>>> >> >> >>> FileSystem and returns
>>>> something that
>>>> >> >> implemements an interface that
>>>> >> >> >>> DataFileWriter can append
>>>> to.
>>>> >> This would
>>>> >> >> be something that is both a
>>>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>>> >> >> >>> and an OutputStream, or has
>>>> both an
>>>> >> InputStream
>>>> >> >> from the start of the
>>>> >> >> >>> existing file and an
>>>> OutputStream at
>>>> >> the end.
>>>> >> >> >>>
>>>> >> >> >>>> Thanks,
>>>> >> >> >>>> Vyacheslav
>>>> >> >> >>>>
>>>> >> >> >>>> On Feb 21, 2012, at 5:29
>>>> AM, Harsh
>>>> >> J
>>>> >> >> wrote:
>>>> >> >> >>>>
>>>> >> >> >>>>> Hi,
>>>> >> >> >>>>>
>>>> >> >> >>>>> Use the appendTo
>>>> feature of
>>>> >> the
>>>> >> >> DataFileWriter. See
>>>> >> >> >>>>>
>>>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> >> >> >>>>>
>>>> >> >> >>>>> For a quick setup
>>>> example,
>>>> >> read also:
>>>> >> >> >>>>>
>>>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>> >> >> >>>>>
>>>> >> >> >>>>> On Tue, Feb 21, 2012
>>>> at 3:15
>>>> >> AM,
>>>> >> >> Vyacheslav Zholudev
>>>> >> >> >>>>> <vy...@gmail.com>
>>>> >> >> wrote:
>>>> >> >> >>>>>> Hi,
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> is it possible to
>>>> append
>>>> >> to an
>>>> >> >> already existing avro file when it was
>>>> >> >> >>>>>> written and
>>>> closed
>>>> >> before?
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> If I use
>>>> >> >> >>>>>> outputStream =
>>>> >> >> fs.append(avroFilePath);
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> then later on I
>>>> get:
>>>> >> >> java.io.IOException: Invalid sync!
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> Probably because
>>>> the
>>>> >> schema is
>>>> >> >> written twice and some other issues.
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> If I use
>>>> outputStream =
>>>> >> >> fs.create(avroFilePath); then the avro
>>>> file
>>>> >> >> >>>>>> gets
>>>> >> >> >>>>>> overwritten.
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> Thanks,
>>>> >> >> >>>>>> Vyacheslav
>>>> >> >> >>>>>
>>>> >> >> >>>>> --
>>>> >> >> >>>>> Harsh J
>>>> >> >> >>>>> Customer Ops.
>>>> Engineer
>>>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>>>> >> >> >
>>>> >> >>
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com>
>>>> wrote:
>>>> > I don't believe a Hadoop FileSystem is a Java
>>>> OutputStream?
>>>> >
>>>> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>>>> wrote:
>>>> >
>>>> >> From: Doug Cutting <cu...@apache.org>
>>>> >> Subject: Re: Is it possible to append to an already
>>>> existing avro file
>>>> >> To: user@avro.apache.org
>>>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>>>> >> It will work on an OutputStream that
>>>> >> supports append.
>>>> >>
>>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
>>>> >> java.io.OutputStream)
>>>> >>
>>>> >> So it depends on how well HDFS implements
>>>> >> FileSystem#append(), not on
>>>> >> any changes in Avro.
>>>> >>
>>>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>>> >>
>>>> >> I have no recent personal experience with append
>>>> in
>>>> >> HDFS.  Does anyone
>>>> >> else here?
>>>> >>
>>>> >> Doug
>>>> >>
>>>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak
>>>> <mi...@yahoo.com>
>>>> >> wrote:
>>>> >> > My understanding is that will append to a file
>>>> on the
>>>> >> local filesystem, but not to a file on HDFS.
>>>> >> >
>>>> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>>>> >> wrote:
>>>> >> >
>>>> >> >> From: Doug Cutting <cu...@apache.org>
>>>> >> >> Subject: Re: Is it possible to append to
>>>> an already
>>>> >> existing avro file
>>>> >> >> To: user@avro.apache.org
>>>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>>>> >> >> The Jira is:
>>>> >> >>
>>>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>>>> >> >>
>>>> >> >> It is possible to append to an existing
>>>> Avro file:
>>>> >> >>
>>>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> >> >>
>>>> >> >> Should we close that issue as "fixed"?
>>>> >> >>
>>>> >> >> Doug
>>>> >> >>
>>>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
>>>> Malak
>>>> >> <mi...@yahoo.com>
>>>> >> >> wrote:
>>>> >> >> > Was a JIRA ticket ever created
>>>> regarding
>>>> >> appending to
>>>> >> >> an existing Avro file on HDFS?
>>>> >> >> >
>>>> >> >> > What is the status of such a
>>>> capability, a
>>>> >> year out
>>>> >> >> from when the issue below was raised?
>>>> >> >> >
>>>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>>>> >> "Vyacheslav
>>>> >> >> Zholudev" <vy...@gmail.com>
>>>> >> >> wrote:
>>>> >> >> >
>>>> >> >> >> Thanks for your reply, I
>>>> suspected this.
>>>> >> >> >>
>>>> >> >> >> I will create a JIRA ticket.
>>>> >> >> >>
>>>> >> >> >> Vyacheslav
>>>> >> >> >>
>>>> >> >> >> On Feb 21, 2012, at 6:02 PM,
>>>> Scott Carey
>>>> >> wrote:
>>>> >> >> >>
>>>> >> >> >>>
>>>> >> >> >>> On 2/21/12 7:29 AM,
>>>> "Vyacheslav
>>>> >> Zholudev"
>>>> >> >> <vy...@gmail.com>
>>>> >> >> >>> wrote:
>>>> >> >> >>>
>>>> >> >> >>>> Yep, I saw that method as
>>>> well as
>>>> >> the
>>>> >> >> stackoverflow post. However, I'm
>>>> >> >> >>>> interested how to append
>>>> to a file
>>>> >> on the
>>>> >> >> arbitrary file system, not
>>>> >> >> >>>> only on the local one.
>>>> >> >> >>>>
>>>> >> >> >>>> I want to get an
>>>> OutputStream
>>>> >> based on the
>>>> >> >> Path and the FileSystem
>>>> >> >> >>>> implementation and then
>>>> pass it
>>>> >> for
>>>> >> >> appending to avro methods.
>>>> >> >> >>>>
>>>> >> >> >>>> Is that possible?
>>>> >> >> >>>
>>>> >> >> >>> It is not possible without
>>>> modifying
>>>> >> >> DataFileWriter. Please open a JIRA
>>>> >> >> >>> ticket.
>>>> >> >> >>>
>>>> >> >> >>> It could not simply append to
>>>> an
>>>> >> OutputStream,
>>>> >> >> since it must either:
>>>> >> >> >>> * Seek to the start to
>>>> validate the
>>>> >> schemas
>>>> >> >> match and find the sync
>>>> >> >> >>> marker, or
>>>> >> >> >>> * Trust that the schemas
>>>> match and
>>>> >> find the
>>>> >> >> sync marker from the last
>>>> >> >> >>> block
>>>> >> >> >>>
>>>> >> >> >>> DataFileWriter cannot refer
>>>> to Hadoop
>>>> >> classes
>>>> >> >> such as FileSystem, but we
>>>> >> >> >>> could add something to the
>>>> mapred
>>>> >> module that
>>>> >> >> takes a Path and
>>>> >> >> >>> FileSystem and returns
>>>> something that
>>>> >> >> implemements an interface that
>>>> >> >> >>> DataFileWriter can append
>>>> to.
>>>> >> This would
>>>> >> >> be something that is both a
>>>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>>> >> >> >>> and an OutputStream, or has
>>>> both an
>>>> >> InputStream
>>>> >> >> from the start of the
>>>> >> >> >>> existing file and an
>>>> OutputStream at
>>>> >> the end.
>>>> >> >> >>>
>>>> >> >> >>>> Thanks,
>>>> >> >> >>>> Vyacheslav
>>>> >> >> >>>>
>>>> >> >> >>>> On Feb 21, 2012, at 5:29
>>>> AM, Harsh
>>>> >> J
>>>> >> >> wrote:
>>>> >> >> >>>>
>>>> >> >> >>>>> Hi,
>>>> >> >> >>>>>
>>>> >> >> >>>>> Use the appendTo
>>>> feature of
>>>> >> the
>>>> >> >> DataFileWriter. See
>>>> >> >> >>>>>
>>>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>> >> >> >>>>>
>>>> >> >> >>>>> For a quick setup
>>>> example,
>>>> >> read also:
>>>> >> >> >>>>>
>>>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>> >> >> >>>>>
>>>> >> >> >>>>> On Tue, Feb 21, 2012
>>>> at 3:15
>>>> >> AM,
>>>> >> >> Vyacheslav Zholudev
>>>> >> >> >>>>> <vy...@gmail.com>
>>>> >> >> wrote:
>>>> >> >> >>>>>> Hi,
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> is it possible to
>>>> append
>>>> >> to an
>>>> >> >> already existing avro file when it was
>>>> >> >> >>>>>> written and
>>>> closed
>>>> >> before?
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> If I use
>>>> >> >> >>>>>> outputStream =
>>>> >> >> fs.append(avroFilePath);
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> then later on I
>>>> get:
>>>> >> >> java.io.IOException: Invalid sync!
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> Probably because
>>>> the
>>>> >> schema is
>>>> >> >> written twice and some other issues.
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> If I use
>>>> outputStream =
>>>> >> >> fs.create(avroFilePath); then the avro
>>>> file
>>>> >> >> >>>>>> gets
>>>> >> >> >>>>>> overwritten.
>>>> >> >> >>>>>>
>>>> >> >> >>>>>> Thanks,
>>>> >> >> >>>>>> Vyacheslav
>>>> >> >> >>>>>
>>>> >> >> >>>>> --
>>>> >> >> >>>>> Harsh J
>>>> >> >> >>>>> Customer Ops.
>>>> Engineer
>>>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>>>> >> >> >
>>>> >> >>
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>
>>
>>
>> --
>> Harsh J



--
Harsh J

Re: Is it possible to append to an already existing avro file

Posted by Doug Cutting <cu...@apache.org>.
The avro-mapred module includes a Seekable implementation that works
with HDFS called FsInput:

http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/FsInput.html

With this, your example can be made considerably smaller.
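
A minimal sketch of what the smaller version might look like, assuming
avro-mapred and a Hadoop client are on the classpath and that the target
FileSystem actually supports append. The class name HdfsAvroAppend and the
method name appendRecords are hypothetical, chosen only for illustration;
FsInput, DataFileWriter#appendTo(SeekableInput, OutputStream) and
FileSystem#append(Path) are the calls referenced in this thread.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class; not part of Avro or Hadoop.
public class HdfsAvroAppend {

  /** Appends the given records to an existing Avro data file at path. */
  public static void appendRecords(FileSystem fs, Path path, Configuration conf,
                                   Iterable<GenericRecord> records) throws IOException {
    // FsInput is the SeekableInput that Avro reads the existing header from
    // (schema, codec and sync marker), so the new blocks match the old ones.
    try (FsInput existing = new FsInput(path, conf);
         // The stream positioned at the end of the file; this is the step
         // that depends on the file system supporting append.
         OutputStream out = fs.append(path);
         DataFileWriter<GenericRecord> writer =
             new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>())
                 .appendTo(existing, out)) {
      for (GenericRecord record : records) {
        writer.append(record);
      }
    }
  }
}
```

Whether fs.append() succeeds still depends on the Hadoop version and on how
the cluster is configured, so this is worth trying on a scratch file first.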

Doug



On Thu, Feb 7, 2013 at 8:28 AM, Harsh J <ha...@cloudera.com> wrote:
> I assume by non-trivial you meant the extra Seekable stuff I needed to
> wrap around the DFS output streams to let Avro take it as append-able?
> I don't think it's possible for Avro to carry it since Avro (core) does
> not reverse-depend on Hadoop. Should we document it somewhere though?
> Do you have any ideas on the best place to do that?
>
> On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <mi...@yahoo.com> wrote:
>> Thanks so much for the code -- it works great!
>>
>> Since it is a non-trivial amount of code required to achieve append, I suggest attaching that code to AVRO-1035, in the hopes that someone will come up with an interface that requires just one line of user code to achieve append.
>>
>> --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:
>>
>>> From: Harsh J <ha...@cloudera.com>
>>> Subject: Re: Is it possible to append to an already existing avro file
>>> To: user@avro.apache.org
>>> Date: Wednesday, February 6, 2013, 11:17 AM
>>> Hey Michael,
>>>
>>> It does implement the regular Java OutputStream interface, as seen in
>>> the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>>
>>> Here's a sample program that works on Hadoop 2.x in my tests:
>>> https://gist.github.com/QwertyManiac/4724582
>>>
>>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com> wrote:
>>> > I don't believe a Hadoop FileSystem is a Java OutputStream?
>>> >
>>> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
>>> >
>>> >> From: Doug Cutting <cu...@apache.org>
>>> >> Subject: Re: Is it possible to append to an already existing avro file
>>> >> To: user@avro.apache.org
>>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>>> >> It will work on an OutputStream that supports append.
>>> >>
>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>>> >>
>>> >> So it depends on how well HDFS implements FileSystem#append(), not on
>>> >> any changes in Avro.
>>> >>
>>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>> >>
>>> >> I have no recent personal experience with append in HDFS.  Does anyone
>>> >> else here?
>>> >>
>>> >> Doug
>>> >>
>>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <mi...@yahoo.com> wrote:
>>> >> > My understanding is that will append to a file on the local
>>> >> > filesystem, but not to a file on HDFS.
>>> >> >
>>> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
>>> >> >
>>> >> >> From: Doug Cutting <cu...@apache.org>
>>> >> >> Subject: Re: Is it possible to append to an already existing avro file
>>> >> >> To: user@avro.apache.org
>>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>>> >> >> The Jira is:
>>> >> >>
>>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>>> >> >>
>>> >> >> It is possible to append to an existing Avro file:
>>> >> >>
>>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>> >> >>
>>> >> >> Should we close that issue as "fixed"?
>>> >> >>
>>> >> >> Doug
>>> >> >>
>>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <mi...@yahoo.com> wrote:
>>> >> >> > Was a JIRA ticket ever created regarding appending to an existing
>>> >> >> > Avro file on HDFS?
>>> >> >> >
>>> >> >> > What is the status of such a capability, a year out from when the
>>> >> >> > issue below was raised?
>>> >> >> >
>>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vy...@gmail.com> wrote:
>>> >> >> >
>>> >> >> >> Thanks for your reply, I suspected this.
>>> >> >> >>
>>> >> >> >> I will create a JIRA ticket.
>>> >> >> >>
>>> >> >> >> Vyacheslav
>>> >> >> >>
>>> >> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>>> >> >> >>
>>> >> >> >>>
>>> >> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vy...@gmail.com>
>>> >> >> >>> wrote:
>>> >> >> >>>
>>> >> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>>> >> >> >>>> interested how to append to a file on the arbitrary file system, not
>>> >> >> >>>> only on the local one.
>>> >> >> >>>>
>>> >> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
>>> >> >> >>>> implementation and then pass it for appending to avro methods.
>>> >> >> >>>>
>>> >> >> >>>> Is that possible?
>>> >> >> >>>
>>> >> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
>>> >> >> >>> ticket.
>>> >> >> >>>
>>> >> >> >>> It could not simply append to an OutputStream, since it must either:
>>> >> >> >>> * Seek to the start to validate the schemas match and find the sync
>>> >> >> >>> marker, or
>>> >> >> >>> * Trust that the schemas match and find the sync marker from the last
>>> >> >> >>> block
>>> >> >> >>>
>>> >> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>>> >> >> >>> could add something to the mapred module that takes a Path and
>>> >> >> >>> FileSystem and returns something that implements an interface that
>>> >> >> >>> DataFileWriter can append to.  This would be something that is both a
>>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>> >> >> >>> and an OutputStream, or has both an InputStream from the start of the
>>> >> >> >>> existing file and an OutputStream at the end.
>>> >> >> >>>
>>> >> >> >>>> Thanks,
>>> >> >> >>>> Vyacheslav
>>> >> >> >>>>
>>> >> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>> >> >> >>>>
>>> >> >> >>>>> Hi,
>>> >> >> >>>>>
>>> >> >> >>>>> Use the appendTo feature of the DataFileWriter. See
>>> >> >> >>>>>
>>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>> >> >> >>>>>
>>> >> >> >>>>> For a quick setup example, read also:
>>> >> >> >>>>>
>>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>> >> >> >>>>>
>>> >> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>> >> >> >>>>> <vy...@gmail.com> wrote:
>>> >> >> >>>>>> Hi,
>>> >> >> >>>>>>
>>> >> >> >>>>>> is it possible to append to an already existing avro file when it was
>>> >> >> >>>>>> written and closed before?
>>> >> >> >>>>>>
>>> >> >> >>>>>> If I use
>>> >> >> >>>>>> outputStream = fs.append(avroFilePath);
>>> >> >> >>>>>>
>>> >> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>>> >> >> >>>>>>
>>> >> >> >>>>>> Probably because the schema is written twice and some other issues.
>>> >> >> >>>>>>
>>> >> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>> >> >> >>>>>> gets overwritten.
>>> >> >> >>>>>>
>>> >> >> >>>>>> Thanks,
>>> >> >> >>>>>> Vyacheslav
>>> >> >> >>>>>
>>> >> >> >>>>> --
>>> >> >> >>>>> Harsh J
>>> >> >> >>>>> Customer Ops. Engineer
>>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>>> >> >> >
>>> >> >>
>>> >>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com>
>>> wrote:
>>> > I don't believe a Hadoop FileSystem is a Java
>>> OutputStream?
>>> >
>>> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>>> wrote:
>>> >
>>> >> From: Doug Cutting <cu...@apache.org>
>>> >> Subject: Re: Is it possible to append to an already
>>> existing avro file
>>> >> To: user@avro.apache.org
>>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>>> >> It will work on an OutputStream that
>>> >> supports append.
>>> >>
>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
>>> >> java.io.OutputStream)
>>> >>
>>> >> So it depends on how well HDFS implements
>>> >> FileSystem#append(), not on
>>> >> any changes in Avro.
>>> >>
>>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>> >>
>>> >> I have no recent personal experience with append
>>> in
>>> >> HDFS.  Does anyone
>>> >> else here?
>>> >>
>>> >> Doug
>>> >>
>>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak
>>> <mi...@yahoo.com>
>>> >> wrote:
>>> >> > My understanding is that will append to a file
>>> on the
>>> >> local filesystem, but not to a file on HDFS.
>>> >> >
>>> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>>> >> wrote:
>>> >> >
>>> >> >> From: Doug Cutting <cu...@apache.org>
>>> >> >> Subject: Re: Is it possible to append to
>>> an already
>>> >> existing avro file
>>> >> >> To: user@avro.apache.org
>>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>>> >> >> The Jira is:
>>> >> >>
>>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>>> >> >>
>>> >> >> It is possible to append to an existing
>>> Avro file:
>>> >> >>
>>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>> >> >>
>>> >> >> Should we close that issue as "fixed"?
>>> >> >>
>>> >> >> Doug
>>> >> >>
>>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
>>> Malak
>>> >> <mi...@yahoo.com>
>>> >> >> wrote:
>>> >> >> > Was a JIRA ticket ever created
>>> regarding
>>> >> appending to
>>> >> >> an existing Avro file on HDFS?
>>> >> >> >
>>> >> >> > What is the status of such a
>>> capability, a
>>> >> year out
>>> >> >> from when the issue below was raised?
>>> >> >> >
>>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>>> >> "Vyacheslav
>>> >> >> Zholudev" <vy...@gmail.com>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> >> Thanks for your reply, I
>>> suspected this.
>>> >> >> >>
>>> >> >> >> I will create a JIRA ticket.
>>> >> >> >>
>>> >> >> >> Vyacheslav
>>> >> >> >>
>>> >> >> >> On Feb 21, 2012, at 6:02 PM,
>>> Scott Carey
>>> >> wrote:
>>> >> >> >>
>>> >> >> >>>
>>> >> >> >>> On 2/21/12 7:29 AM,
>>> "Vyacheslav
>>> >> Zholudev"
>>> >> >> <vy...@gmail.com>
>>> >> >> >>> wrote:
>>> >> >> >>>
>>> >> >> >>>> Yep, I saw that method as
>>> well as
>>> >> the
>>> >> >> stackoverflow post. However, I'm
>>> >> >> >>>> interested how to append
>>> to a file
>>> >> on the
>>> >> >> arbitrary file system, not
>>> >> >> >>>> only on the local one.
>>> >> >> >>>>
>>> >> >> >>>> I want to get an
>>> OutputStream
>>> >> based on the
>>> >> >> Path and the FileSystem
>>> >> >> >>>> implementation and then
>>> pass it
>>> >> for
>>> >> >> appending to avro methods.
>>> >> >> >>>>
>>> >> >> >>>> Is that possible?
>>> >> >> >>>
>>> >> >> >>> It is not possible without
>>> modifying
>>> >> >> DataFileWriter. Please open a JIRA
>>> >> >> >>> ticket.
>>> >> >> >>>
>>> >> >> >>> It could not simply append to
>>> an
>>> >> OutputStream,
>>> >> >> since it must either:
>>> >> >> >>> * Seek to the start to
>>> validate the
>>> >> schemas
>>> >> >> match and find the sync
>>> >> >> >>> marker, or
>>> >> >> >>> * Trust that the schemas
>>> match and
>>> >> find the
>>> >> >> sync marker from the last
>>> >> >> >>> block
>>> >> >> >>>
>>> >> >> >>> DataFileWriter cannot refer
>>> to Hadoop
>>> >> classes
>>> >> >> such as FileSystem, but we
>>> >> >> >>> could add something to the
>>> mapred
>>> >> module that
>>> >> >> takes a Path and
>>> >> >> >>> FileSystem and returns
>>> something that
>>> >> >> implements an interface that
>>> >> >> >>> DataFileWriter can append
>>> to.
>>> >> This would
>>> >> >> be something that is both a
>>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>> >> >> >>> and an OutputStream, or has
>>> both an
>>> >> InputStream
>>> >> >> from the start of the
>>> >> >> >>> existing file and an
>>> OutputStream at
>>> >> the end.
>>> >> >> >>>
>>> >> >> >>>> Thanks,
>>> >> >> >>>> Vyacheslav
>>> >> >> >>>>
>>> >> >> >>>> On Feb 21, 2012, at 5:29
>>> AM, Harsh
>>> >> J
>>> >> >> wrote:
>>> >> >> >>>>
>>> >> >> >>>>> Hi,
>>> >> >> >>>>>
>>> >> >> >>>>> Use the appendTo
>>> feature of
>>> >> the
>>> >> >> DataFileWriter. See
>>> >> >> >>>>>
>>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>> >> >> >>>>>
>>> >> >> >>>>> For a quick setup
>>> example,
>>> >> read also:
>>> >> >> >>>>>
>>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>> >> >> >>>>>
>>> >> >> >>>>> On Tue, Feb 21, 2012
>>> at 3:15
>>> >> AM,
>>> >> >> Vyacheslav Zholudev
>>> >> >> >>>>> <vy...@gmail.com>
>>> >> >> wrote:
>>> >> >> >>>>>> Hi,
>>> >> >> >>>>>>
>>> >> >> >>>>>> is it possible to
>>> append
>>> >> to an
>>> >> >> already existing avro file when it was
>>> >> >> >>>>>> written and
>>> closed
>>> >> before?
>>> >> >> >>>>>>
>>> >> >> >>>>>> If I use
>>> >> >> >>>>>> outputStream =
>>> >> >> fs.append(avroFilePath);
>>> >> >> >>>>>>
>>> >> >> >>>>>> then later on I
>>> get:
>>> >> >> java.io.IOException: Invalid sync!
>>> >> >> >>>>>>
>>> >> >> >>>>>> Probably because
>>> the
>>> >> schema is
>>> >> >> written twice and some other issues.
>>> >> >> >>>>>>
>>> >> >> >>>>>> If I use
>>> outputStream =
>>> >> >> fs.create(avroFilePath); then the avro
>>> file
>>> >> >> >>>>>> gets
>>> >> >> >>>>>> overwritten.
>>> >> >> >>>>>>
>>> >> >> >>>>>> Thanks,
>>> >> >> >>>>>> Vyacheslav
>>> >> >> >>>>>
>>> >> >> >>>>> --
>>> >> >> >>>>> Harsh J
>>> >> >> >>>>> Customer Ops.
>>> Engineer
>>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>>> >> >> >
>>> >> >>
>>> >>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>
>
>
> --
> Harsh J

Re: Is it possible to append to an already existing avro file

Posted by Michael Malak <mi...@yahoo.com>.
I confess to being a user of rather than a developer of open source, but perhaps you could elaborate on what "depends on" means and what the consequences are?

Isn't it -- or couldn't it be made -- a run-time binding, so that only those who try to use the HDFS append functionality would be required to also include the HDFS Jars in their classpath?

Or is the issue more of a bookkeeping one, whereby every update to HDFS will require an Avro regression test?

Now that Hive supports Avro as of the Jan. 11 release of Hive 0.10, the use case of ingesting data into Avro on HDFS is only going to get more popular, and appending is very handy for ingesting, especially for real-time or near-real-time data.  So it seems to me that, if the inconveniences are minor or can be worked around, Avro perhaps should indeed "depend on" HDFS.
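
For illustration only, the run-time binding idea raised above could look roughly like the following sketch. It probes the classpath for a Hadoop class by name, so code that never touches HDFS would not need the Hadoop JARs. The class name OptionalDependency and its methods are hypothetical and are not part of Avro or Hadoop; only the probed class name org.apache.hadoop.fs.FileSystem comes from the thread.

```java
import java.util.Optional;

/** Hypothetical helper: detects an optional dependency at run time. */
public final class OptionalDependency {
    private OptionalDependency() {}

    /** Returns the named class if it can be loaded, without initializing it. */
    public static Optional<Class<?>> find(String className) {
        try {
            return Optional.of(Class.forName(className, false,
                    OptionalDependency.class.getClassLoader()));
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath (or not loadable): treat the feature as absent.
            return Optional.empty();
        }
    }

    /** True only when the Hadoop FileSystem class is on the classpath. */
    public static boolean hdfsAvailable() {
        return find("org.apache.hadoop.fs.FileSystem").isPresent();
    }

    public static void main(String[] args) {
        System.out.println(hdfsAvailable()
                ? "Hadoop found: an HDFS append path could be offered"
                : "Hadoop JARs not on classpath: local append only");
    }
}
```

A check like this only decides whether the feature can be offered; the HDFS-specific code would still have to live in a class that is loaded lazily, so that merely loading the core classes never triggers a NoClassDefFoundError.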

--- On Thu, 2/7/13, Harsh J <ha...@cloudera.com> wrote:

> From: Harsh J <ha...@cloudera.com>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Thursday, February 7, 2013, 9:28 AM
> I assume by non-trivial you meant the
> extra Seekable stuff I needed to
> wrap around the DFS output streams to let Avro take it as
> append-able?
> I don't think it's possible for Avro to carry it since Avro
> (core) does
> not reverse-depend on Hadoop. Should we document it
> somewhere though?
> Do you have any ideas on the best place to do that?
> 
> On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <mi...@yahoo.com>
> wrote:
> > Thanks so much for the code -- it works great!
> >
> > Since it is a non-trivial amount of code required to
> > achieve append, I suggest attaching that code to AVRO-1035,
> > in the hopes that someone will come up with an interface
> > that requires just one line of user code to achieve append.
> >
> > --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com>
> wrote:
> >
> >> From: Harsh J <ha...@cloudera.com>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: user@avro.apache.org
> >> Date: Wednesday, February 6, 2013, 11:17 AM
> >> Hey Michael,
> >>
> >> It does implement the regular Java OutputStream interface,
> >> as seen in
> >> the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
> >>
> >> Here's a sample program that works on Hadoop 2.x in my
> >> tests:
> >> https://gist.github.com/QwertyManiac/4724582


Re: Is it possible to append to an already existing avro file

Posted by Harsh J <ha...@cloudera.com>.
I assume by non-trivial you meant the extra Seekable stuff I needed to
wrap around the DFS output streams to let Avro take it as append-able?
I don't think it's possible for Avro to carry it since Avro (core) does
not reverse-depend on Hadoop. Should we document it somewhere though?
Do you have any ideas on the best place to do that?
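
As a rough sketch of what such a "Seekable" wrapper involves (this is not the code from the gist), the adapter below exposes seek/tell/length/read over a positioned byte channel. It is a stand-in that uses only the JDK and a local file; the class name ChannelSeekable is hypothetical. The method names mirror Avro's org.apache.avro.file.SeekableInput interface, and a real wrapper would implement that interface over the Hadoop input stream and pass it, together with the stream returned by fs.append(path), to DataFileWriter.appendTo(SeekableInput, OutputStream).

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in: seek/tell/length/read over a read-only channel. */
public final class ChannelSeekable implements Closeable {
    private final SeekableByteChannel channel;

    public ChannelSeekable(Path file) throws IOException {
        this.channel = Files.newByteChannel(file, StandardOpenOption.READ);
    }

    /** Moves the read position to an absolute byte offset. */
    public void seek(long position) throws IOException {
        channel.position(position);
    }

    /** Returns the current read position. */
    public long tell() throws IOException {
        return channel.position();
    }

    /** Returns the total length of the underlying file in bytes. */
    public long length() throws IOException {
        return channel.size();
    }

    /** Reads up to len bytes into b at off; returns -1 at end of file. */
    public int read(byte[] b, int off, int len) throws IOException {
        return channel.read(ByteBuffer.wrap(b, off, len));
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The reader side is needed because an append has to read the existing file's header (schema and sync marker) from the start before new blocks are written at the end. Avro's avro-mapred module also appears to ship an FsInput class that plays this role for Hadoop paths, which may remove the need for a hand-written wrapper.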

On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <mi...@yahoo.com> wrote:
> Thanks so much for the code -- it works great!
>
> Since it is a non-trivial amount of code required to achieve append, I suggest attaching that code to AVRO-1035, in the hopes that someone will come up with an interface that requires just one line of user code to achieve append.
>
> --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:
>
>> From: Harsh J <ha...@cloudera.com>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: user@avro.apache.org
>> Date: Wednesday, February 6, 2013, 11:17 AM
>> Hey Michael,
>>
>> It does implement the regular Java OutputStream interface,
>> as seen in
>> the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>
>> Here's a sample program that works on Hadoop 2.x in my
>> tests:
>> https://gist.github.com/QwertyManiac/4724582
>>
>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com>
>> wrote:
>> > I don't believe a Hadoop FileSystem is a Java OutputStream?
>> >
>> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
>> >
>> >> From: Doug Cutting <cu...@apache.org>
>> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> To: user@avro.apache.org
>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>> >> It will work on an OutputStream that supports append.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
>> >> java.io.OutputStream)
>> >>
>> >> So it depends on how well HDFS implements FileSystem#append(), not on
>> >> any changes in Avro.
>> >>
>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>> >>
>> >> I have no recent personal experience with append in HDFS.  Does anyone
>> >> else here?
>> >>
>> >> Doug
>> >>
>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <mi...@yahoo.com> wrote:
>> >> > My understanding is that will append to a file on the local
>> >> > filesystem, but not to a file on HDFS.
>> >> >
>> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
>> >> >
>> >> >> From: Doug Cutting <cu...@apache.org>
>> >> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> >> To: user@avro.apache.org
>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>> >> >> The Jira is:
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>> >> >>
>> >> >> It is possible to append to an existing Avro file:
>> >> >>
>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >>
>> >> >> Should we close that issue as "fixed"?
>> >> >>
>> >> >> Doug
>> >> >>
>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
>> Malak
>> >> <mi...@yahoo.com>
>> >> >> wrote:
>> >> >> > Was a JIRA ticket ever created
>> regarding
>> >> appending to
>> >> >> an existing Avro file on HDFS?
>> >> >> >
>> >> >> > What is the status of such a
>> capability, a
>> >> year out
>> >> >> from when the issue below was raised?
>> >> >> >
>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>> >> "Vyacheslav
>> >> >> Zholudev" <vy...@gmail.com>
>> >> >> wrote:
>> >> >> >
>> >> >> >> Thanks for your reply, I
>> suspected this.
>> >> >> >>
>> >> >> >> I will create a JIRA ticket.
>> >> >> >>
>> >> >> >> Vyacheslav
>> >> >> >>
>> >> >> >> On Feb 21, 2012, at 6:02 PM,
>> Scott Carey
>> >> wrote:
>> >> >> >>
>> >> >> >>>
>> >> >> >>> On 2/21/12 7:29 AM,
>> "Vyacheslav
>> >> Zholudev"
>> >> >> <vy...@gmail.com>
>> >> >> >>> wrote:
>> >> >> >>>
>> >> >> >>>> Yep, I saw that method as
>> well as
>> >> the
>> >> >> stackoverflow post. However, I'm
>> >> >> >>>> interested how to append
>> to a file
>> >> on the
>> >> >> arbitrary file system, not
>> >> >> >>>> only on the local one.
>> >> >> >>>>
>> >> >> >>>> I want to get an
>> OutputStream
>> >> based on the
>> >> >> Path and the FileSystem
>> >> >> >>>> implementation and then
>> pass it
>> >> for
>> >> >> appending to avro methods.
>> >> >> >>>>
>> >> >> >>>> Is that possible?
>> >> >> >>>
>> >> >> >>> It is not possible without
>> modifying
>> >> >> DataFileWriter. Please open a JIRA
>> >> >> >>> ticket.
>> >> >> >>>
>> >> >> >>> It could not simply append to
>> an
>> >> OutputStream,
>> >> >> since it must either:
>> >> >> >>> * Seek to the start to
>> validate the
>> >> schemas
>> >> >> match and find the sync
>> >> >> >>> marker, or
>> >> >> >>> * Trust that the schemas
>> match and
>> >> find the
>> >> >> sync marker from the last
>> >> >> >>> block
>> >> >> >>>
>> >> >> >>> DataFileWriter cannot refer
>> to Hadoop
>> >> classes
>> >> >> such as FileSystem, but we
>> >> >> >>> could add something to the
>> mapred
>> >> module that
>> >> >> takes a Path and
>> >> >> >>> FileSystem and returns
>> something that
>> >> >> implements an interface that
>> >> >> >>> DataFileWriter can append
>> to.
>> >> This would
>> >> >> be something that is both a
>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >> >> >>> and an OutputStream, or has
>> both an
>> >> InputStream
>> >> >> from the start of the
>> >> >> >>> existing file and an
>> OutputStream at
>> >> the end.
>> >> >> >>>
>> >> >> >>>> Thanks,
>> >> >> >>>> Vyacheslav
>> >> >> >>>>
>> >> >> >>>> On Feb 21, 2012, at 5:29
>> AM, Harsh
>> >> J
>> >> >> wrote:
>> >> >> >>>>
>> >> >> >>>>> Hi,
>> >> >> >>>>>
>> >> >> >>>>> Use the appendTo
>> feature of
>> >> the
>> >> >> DataFileWriter. See
>> >> >> >>>>>
>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >> >>>>>
>> >> >> >>>>> For a quick setup
>> example,
>> >> read also:
>> >> >> >>>>>
>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >> >> >>>>>
>> >> >> >>>>> On Tue, Feb 21, 2012
>> at 3:15
>> >> AM,
>> >> >> Vyacheslav Zholudev
>> >> >> >>>>> <vy...@gmail.com>
>> >> >> wrote:
>> >> >> >>>>>> Hi,
>> >> >> >>>>>>
>> >> >> >>>>>> is it possible to
>> append
>> >> to an
>> >> >> already existing avro file when it was
>> >> >> >>>>>> written and
>> closed
>> >> before?
>> >> >> >>>>>>
>> >> >> >>>>>> If I use
>> >> >> >>>>>> outputStream =
>> >> >> fs.append(avroFilePath);
>> >> >> >>>>>>
>> >> >> >>>>>> then later on I
>> get:
>> >> >> java.io.IOException: Invalid sync!
>> >> >> >>>>>>
>> >> >> >>>>>> Probably because
>> the
>> >> schema is
>> >> >> written twice and some other issues.
>> >> >> >>>>>>
>> >> >> >>>>>> If I use
>> outputStream =
>> >> >> fs.create(avroFilePath); then the avro
>> file
>> >> >> >>>>>> gets
>> >> >> >>>>>> overwritten.
>> >> >> >>>>>>
>> >> >> >>>>>> Thanks,
>> >> >> >>>>>> Vyacheslav
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> Harsh J
>> >> >> >>>>> Customer Ops.
>> Engineer
>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>> >> >> >
>> >> >>
>> >>
>>
>>
>>
>> --
>> Harsh J
>>
>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com>
>> wrote:
>> > I don't believe a Hadoop FileSystem is a Java
>> OutputStream?
>> >
>> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>> wrote:
>> >
>> >> From: Doug Cutting <cu...@apache.org>
>> >> Subject: Re: Is it possible to append to an already
>> existing avro file
>> >> To: user@avro.apache.org
>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>> >> It will work on an OutputStream that
>> >> supports append.
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
>> >> java.io.OutputStream)
>> >>
>> >> So it depends on how well HDFS implements
>> >> FileSystem#append(), not on
>> >> any changes in Avro.
>> >>
>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>> >>
>> >> I have no recent personal experience with append
>> in
>> >> HDFS.  Does anyone
>> >> else here?
>> >>
>> >> Doug
>> >>
>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak
>> <mi...@yahoo.com>
>> >> wrote:
>> >> > My understanding is that will append to a file
>> on the
>> >> local filesystem, but not to a file on HDFS.
>> >> >
>> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>> >> wrote:
>> >> >
>> >> >> From: Doug Cutting <cu...@apache.org>
>> >> >> Subject: Re: Is it possible to append to
>> an already
>> >> existing avro file
>> >> >> To: user@avro.apache.org
>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>> >> >> The Jira is:
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>> >> >>
>> >> >> It is possible to append to an existing
>> Avro file:
>> >> >>
>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >>
>> >> >> Should we close that issue as "fixed"?
>> >> >>
>> >> >> Doug
>> >> >>
>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
>> Malak
>> >> <mi...@yahoo.com>
>> >> >> wrote:
>> >> >> > Was a JIRA ticket ever created
>> regarding
>> >> appending to
>> >> >> an existing Avro file on HDFS?
>> >> >> >
>> >> >> > What is the status of such a
>> capability, a
>> >> year out
>> >> >> from when the issue below was raised?
>> >> >> >
>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>> >> "Vyacheslav
>> >> >> Zholudev" <vy...@gmail.com>
>> >> >> wrote:
>> >> >> >
>> >> >> >> Thanks for your reply, I
>> suspected this.
>> >> >> >>
>> >> >> >> I will create a JIRA ticket.
>> >> >> >>
>> >> >> >> Vyacheslav
>> >> >> >>
>> >> >> >> On Feb 21, 2012, at 6:02 PM,
>> Scott Carey
>> >> wrote:
>> >> >> >>
>> >> >> >>>
>> >> >> >>> On 2/21/12 7:29 AM,
>> "Vyacheslav
>> >> Zholudev"
>> >> >> <vy...@gmail.com>
>> >> >> >>> wrote:
>> >> >> >>>
>> >> >> >>>> Yep, I saw that method as
>> well as
>> >> the
>> >> >> stackoverflow post. However, I'm
>> >> >> >>>> interested how to append
>> to a file
>> >> on the
>> >> >> arbitrary file system, not
>> >> >> >>>> only on the local one.
>> >> >> >>>>
>> >> >> >>>> I want to get an
>> OutputStream
>> >> based on the
>> >> >> Path and the FileSystem
>> >> >> >>>> implementation and then
>> pass it
>> >> for
>> >> >> appending to avro methods.
>> >> >> >>>>
>> >> >> >>>> Is that possible?
>> >> >> >>>
>> >> >> >>> It is not possible without
>> modifying
>> >> >> DataFileWriter. Please open a JIRA
>> >> >> >>> ticket.
>> >> >> >>>
>> >> >> >>> It could not simply append to
>> an
>> >> OutputStream,
>> >> >> since it must either:
>> >> >> >>> * Seek to the start to
>> validate the
>> >> schemas
>> >> >> match and find the sync
>> >> >> >>> marker, or
>> >> >> >>> * Trust that the schemas
>> match and
>> >> find the
>> >> >> sync marker from the last
>> >> >> >>> block
>> >> >> >>>
>> >> >> >>> DataFileWriter cannot refer
>> to Hadoop
>> >> classes
>> >> >> such as FileSystem, but we
>> >> >> >>> could add something to the
>> mapred
>> >> module that
>> >> >> takes a Path and
>> >> >> >>> FileSystem and returns
>> something that
>> >> >> implemements an interface that
>> >> >> >>> DataFileWriter can append
>> to.
>> >> This would
>> >> >> be something that is both a
>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >> >> >>> and an OutputStream, or has
>> both an
>> >> InputStream
>> >> >> from the start of the
>> >> >> >>> existing file and an
>> OutputStream at
>> >> the end.
>> >> >> >>>
>> >> >> >>>> Thanks,
>> >> >> >>>> Vyacheslav
>> >> >> >>>>
>> >> >> >>>> On Feb 21, 2012, at 5:29
>> AM, Harsh
>> >> J
>> >> >> wrote:
>> >> >> >>>>
>> >> >> >>>>> Hi,
>> >> >> >>>>>
>> >> >> >>>>> Use the appendTo
>> feature of
>> >> the
>> >> >> DataFileWriter. See
>> >> >> >>>>>
>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >> >>>>>
>> >> >> >>>>> For a quick setup
>> example,
>> >> read also:
>> >> >> >>>>>
>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >> >> >>>>>
>> >> >> >>>>> On Tue, Feb 21, 2012
>> at 3:15
>> >> AM,
>> >> >> Vyacheslav Zholudev
>> >> >> >>>>> <vy...@gmail.com>
>> >> >> wrote:
>> >> >> >>>>>> Hi,
>> >> >> >>>>>>
>> >> >> >>>>>> is it possible to
>> append
>> >> to an
>> >> >> already existing avro file when it was
>> >> >> >>>>>> written and
>> closed
>> >> before?
>> >> >> >>>>>>
>> >> >> >>>>>> If I use
>> >> >> >>>>>> outputStream =
>> >> >> fs.append(avroFilePath);
>> >> >> >>>>>>
>> >> >> >>>>>> then later on I
>> get:
>> >> >> java.io.IOException: Invalid sync!
>> >> >> >>>>>>
>> >> >> >>>>>> Probably because
>> the
>> >> schema is
>> >> >> written twice and some other issues.
>> >> >> >>>>>>
>> >> >> >>>>>> If I use
>> outputStream =
>> >> >> fs.create(avroFilePath); then the avro
>> file
>> >> >> >>>>>> gets
>> >> >> >>>>>> overwritten.
>> >> >> >>>>>>
>> >> >> >>>>>> Thanks,
>> >> >> >>>>>> Vyacheslav
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> Harsh J
>> >> >> >>>>> Customer Ops.
>> Engineer
>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>> >> >> >
>> >> >>
>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
>> Malak
>> >> <mi...@yahoo.com>
>> >> >> wrote:
>> >> >> > Was a JIRA ticket ever created
>> regarding
>> >> appending to
>> >> >> an existing Avro file on HDFS?
>> >> >> >
>> >> >> > What is the status of such a
>> capability, a
>> >> year out
>> >> >> from when the issue below was raised?
>> >> >> >
>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>> >> "Vyacheslav
>> >> >> Zholudev" <vy...@gmail.com>
>> >> >> wrote:
>> >> >> >
>> >> >> >> Thanks for your reply, I
>> suspected this.
>> >> >> >>
>> >> >> >> I will create a JIRA ticket.
>> >> >> >>
>> >> >> >> Vyacheslav
>> >> >> >>
>> >> >> >> On Feb 21, 2012, at 6:02 PM,
>> Scott Carey
>> >> wrote:
>> >> >> >>
>> >> >> >>>
>> >> >> >>> On 2/21/12 7:29 AM,
>> "Vyacheslav
>> >> Zholudev"
>> >> >> <vy...@gmail.com>
>> >> >> >>> wrote:
>> >> >> >>>
>> >> >> >>>> Yep, I saw that method as
>> well as
>> >> the
>> >> >> stackoverflow post. However, I'm
>> >> >> >>>> interested how to append
>> to a file
>> >> on the
>> >> >> arbitrary file system, not
>> >> >> >>>> only on the local one.
>> >> >> >>>>
>> >> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
>> >> >> >>>> implementation and then pass it for appending to avro methods.
>> >> >> >>>>
>> >> >> >>>> Is that possible?
>> >> >> >>>
>> >> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> >> >> >>> ticket.
>> >> >> >>>
>> >> >> >>> It could not simply append to an OutputStream, since it must either:
>> >> >> >>> * Seek to the start to validate the schemas match and find the sync
>> >> >> >>> marker, or
>> >> >> >>> * Trust that the schemas match and find the sync marker from the last
>> >> >> >>> block
>> >> >> >>>
>> >> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> >> >> >>> could add something to the mapred module that takes a Path and
>> >> >> >>> FileSystem and returns something that implements an interface that
>> >> >> >>> DataFileWriter can append to.  This would be something that is both a
>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >> >> >>> and an OutputStream, or has both an InputStream from the start of the
>> >> >> >>> existing file and an OutputStream at the end.
>> >> >> >>>
>> >> >> >>>> Thanks,
>> >> >> >>>> Vyacheslav
>> >> >> >>>>
>> >> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>> >> >> >>>>
>> >> >> >>>>> Hi,
>> >> >> >>>>>
>> >> >> >>>>> Use the appendTo feature of the DataFileWriter. See
>> >> >> >>>>>
>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >> >>>>>
>> >> >> >>>>> For a quick setup example, read also:
>> >> >> >>>>>
>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >> >> >>>>>
>> >> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>> >> >> >>>>> <vy...@gmail.com> wrote:
>> >> >> >>>>>> Hi,
>> >> >> >>>>>>
>> >> >> >>>>>> is it possible to append to an already existing avro file when it was
>> >> >> >>>>>> written and closed before?
>> >> >> >>>>>>
>> >> >> >>>>>> If I use
>> >> >> >>>>>> outputStream = fs.append(avroFilePath);
>> >> >> >>>>>>
>> >> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>> >> >> >>>>>>
>> >> >> >>>>>> Probably because the schema is written twice and some other issues.
>> >> >> >>>>>>
>> >> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>> >> >> >>>>>> gets overwritten.
>> >> >> >>>>>>
>> >> >> >>>>>> Thanks,
>> >> >> >>>>>> Vyacheslav
>> >> >> >>>>>
>> >> >> >>>>> --
>> >> >> >>>>> Harsh J
>> >> >> >>>>> Customer Ops. Engineer
>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>> >> >> >
>> >> >>
>> >>
>>
>>
>>
>> --
>> Harsh J



--
Harsh J

Re: Is it possible to append to an already existing avro file

Posted by Michael Malak <mi...@yahoo.com>.
Thanks so much for the code -- it works great!

Since achieving append takes a non-trivial amount of code, I suggest attaching that code to AVRO-1035, in the hope that someone will come up with an interface that requires just one line of user code.
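
A note on why the working code is more than a one-liner: as Scott Carey explains in the quoted thread, an appender has to read the existing header to learn the file's sync marker (and schema) before it can add blocks, which is presumably why DataFileWriter#appendTo takes a SeekableInput alongside the OutputStream. The sketch below illustrates only that idea, using a made-up toy container format. It is not Avro's real file layout and uses no Avro or Hadoop classes; ToyContainer, its MAGIC bytes, the seeds, and every method name are hypothetical and chosen purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Toy format (hypothetical, NOT Avro's layout):
//   header = MAGIC + 16-byte sync marker
//   block  = int length + UTF-8 payload + the header's sync marker
class ToyContainer {
    static final byte[] MAGIC = {'T', 'o', 'y', 1};
    static final int SYNC_SIZE = 16;

    // Writes a fresh file: header first, then one block per record.
    static byte[] create(long seed, String... records) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new Random(seed).nextBytes(sync); // each new file gets its own marker
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC);
        out.write(sync);
        writeBlocks(out, sync, records);
        return out.toByteArray();
    }

    // Appends one block per record, each terminated by the given marker.
    static void writeBlocks(ByteArrayOutputStream out, byte[] sync, String... records)
            throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        for (String record : records) {
            byte[] payload = record.getBytes(StandardCharsets.UTF_8);
            data.writeInt(payload.length);
            data.write(payload);
            data.write(sync);
        }
        data.flush();
    }

    // Naive append: a brand-new writer emits its own header after the old data.
    static byte[] naiveAppend(byte[] existing, String... records) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(existing);
        out.write(create(99L, records));
        return out.toByteArray();
    }

    // Header-aware append: read the existing sync marker, then add blocks only.
    static byte[] headerAwareAppend(byte[] existing, String... records) throws IOException {
        byte[] sync = Arrays.copyOfRange(existing, MAGIC.length, MAGIC.length + SYNC_SIZE);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(existing);
        writeBlocks(out, sync, records);
        return out.toByteArray();
    }

    // Reads every record, checking that each block ends with the header's marker.
    static List<String> read(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not a toy container");
        }
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        List<String> records = new ArrayList<>();
        byte[] marker = new byte[SYNC_SIZE];
        while (in.available() > 0) {
            int length = in.readInt();
            if (length < 0 || length > in.available() - SYNC_SIZE) {
                throw new IOException("Invalid block length");
            }
            byte[] payload = new byte[length];
            in.readFully(payload);
            in.readFully(marker);
            if (!Arrays.equals(marker, sync)) {
                throw new IOException("Invalid sync!");
            }
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }
}
```

Reading the output of headerAwareAppend returns the old records followed by the new ones. Reading the output of naiveAppend fails with an IOException, because a second header lands in the middle of the block stream; that is loosely analogous to the "Invalid sync!" error reported at the start of this thread.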

--- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:

> From: Harsh J <ha...@cloudera.com>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Wednesday, February 6, 2013, 11:17 AM
> Hey Michael,
> 
> It does implement the regular Java OutputStream interface, as seen in
> the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
> 
> Here's a sample program that works on Hadoop 2.x in my tests:
> https://gist.github.com/QwertyManiac/4724582
> 
> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com> wrote:
> > I don't believe a Hadoop FileSystem is a Java OutputStream?
> >
> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
> >
> >> From: Doug Cutting <cu...@apache.org>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: user@avro.apache.org
> >> Date: Tuesday, February 5, 2013, 5:27 PM
> >> It will work on an OutputStream that supports append.
> >>
> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
> >>
> >> So it depends on how well HDFS implements FileSystem#append(), not on
> >> any changes in Avro.
> >>
> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
> >>
> >> I have no recent personal experience with append in HDFS.  Does anyone
> >> else here?
> >>
> >> Doug
> >>
> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <mi...@yahoo.com> wrote:
> >> > My understanding is that will append to a file on the local
> >> > filesystem, but not to a file on HDFS.
> >> >
> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
> >> >
> >> >> From: Doug Cutting <cu...@apache.org>
> >> >> Subject: Re: Is it possible to append to an already existing avro file
> >> >> To: user@avro.apache.org
> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
> >> >> The Jira is:
> >> >>
> >> >> https://issues.apache.org/jira/browse/AVRO-1035
> >> >>
> >> >> It is possible to append to an existing Avro file:
> >> >>
> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >>
> >> >> Should we close that issue as "fixed"?
> >> >>
> >> >> Doug
> >> >>
> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <mi...@yahoo.com> wrote:
> >> >> > Was a JIRA ticket ever created regarding appending to an existing
> >> >> > Avro file on HDFS?
> >> >> >
> >> >> > What is the status of such a capability, a year out from when the
> >> >> > issue below was raised?
> >> >> >
> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev"
> >> >> > <vy...@gmail.com> wrote:
> >> >> >
> >> >> >> Thanks for your reply, I suspected this.
> >> >> >>
> >> >> >> I will create a JIRA ticket.
> >> >> >>
> >> >> >> Vyacheslav
> >> >> >>
> >> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >> >> >>
> >> >> >>>
> >> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vy...@gmail.com>
> >> >> >>> wrote:
> >> >> >>>
> >> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
> >> >> >>>> interested how to append to a file on the arbitrary file system, not
> >> >> >>>> only on the local one.
> >> >> >>>>
> >> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
> >> >> >>>> implementation and then pass it for appending to avro methods.
> >> >> >>>>
> >> >> >>>> Is that possible?
> >> >> >>>
> >> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
> >> >> >>> ticket.
> >> >> >>>
> >> >> >>> It could not simply append to an OutputStream, since it must either:
> >> >> >>> * Seek to the start to validate the schemas match and find the sync
> >> >> >>> marker, or
> >> >> >>> * Trust that the schemas match and find the sync marker from the last
> >> >> >>> block
> >> >> >>>
> >> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> >> >> >>> could add something to the mapred module that takes a Path and
> >> >> >>> FileSystem and returns something that implements an interface that
> >> >> >>> DataFileWriter can append to.  This would be something that is both a
> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >> >> >>> and an OutputStream, or has both an InputStream from the start of the
> >> >> >>> existing file and an OutputStream at the end.
> >> >> >>>
> >> >> >>>> Thanks,
> >> >> >>>> Vyacheslav
> >> >> >>>>
> >> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >> >> >>>>
> >> >> >>>>> Hi,
> >> >> >>>>>
> >> >> >>>>> Use the appendTo feature of the DataFileWriter. See
> >> >> >>>>>
> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >> >>>>>
> >> >> >>>>> For a quick setup example, read also:
> >> >> >>>>>
> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >> >> >>>>>
> >> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
> >> >> >>>>> <vy...@gmail.com> wrote:
> >> >> >>>>>> Hi,
> >> >> >>>>>>
> >> >> >>>>>> is it possible to append to an already existing avro file when it was
> >> >> >>>>>> written and closed before?
> >> >> >>>>>>
> >> >> >>>>>> If I use
> >> >> >>>>>> outputStream = fs.append(avroFilePath);
> >> >> >>>>>>
> >> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
> >> >> >>>>>>
> >> >> >>>>>> Probably because the schema is written twice and some other issues.
> >> >> >>>>>>
> >> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
> >> >> >>>>>> gets overwritten.
> >> >> >>>>>>
> >> >> >>>>>> Thanks,
> >> >> >>>>>> Vyacheslav
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Harsh J
> >> >> >>>>> Customer Ops. Engineer
> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
> >> >> >
> >> >>
> >>
> 
> 
> 
> --
> Harsh J
> 
> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com> wrote:
> > I don't believe a Hadoop FileSystem is a Java OutputStream?
> >
> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
> >
> >> From: Doug Cutting <cu...@apache.org>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: user@avro.apache.org
> >> Date: Tuesday, February 5, 2013, 5:27 PM
> >> It will work on an OutputStream that
> >> supports append.
> >>
> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
> >> java.io.OutputStream)
> >>
> >> So it depends on how well HDFS implements
> >> FileSystem#append(), not on
> >> any changes in Avro.
> >>
> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
> >>
> >> I have no recent personal experience with append in
> >> HDFS.  Does anyone
> >> else here?
> >>
> >> Doug
> >>
> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <mi...@yahoo.com>
> >> wrote:
> >> > My understanding is that will append to a file on the
> >> local filesystem, but not to a file on HDFS.
> >> >
> >> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
> >> wrote:
> >> >
> >> >> From: Doug Cutting <cu...@apache.org>
> >> >> Subject: Re: Is it possible to append to an already
> >> existing avro file
> >> >> To: user@avro.apache.org
> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
> >> >> The Jira is:
> >> >>
> >> >> https://issues.apache.org/jira/browse/AVRO-1035
> >> >>
> >> >> It is possible to append to an existing Avro file:
> >> >>
> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >>
> >> >> Should we close that issue as "fixed"?
> >> >>
> >> >> Doug
> >> >>
> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
> Malak
> >> <mi...@yahoo.com>
> >> >> wrote:
> >> >> > Was a JIRA ticket ever created
> regarding
> >> appending to
> >> >> an existing Avro file on HDFS?
> >> >> >
> >> >> > What is the status of such a
> capability, a
> >> year out
> >> >> from when the issue below was raised?
> >> >> >
> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
> >> "Vyacheslav
> >> >> Zholudev" <vy...@gmail.com>
> >> >> wrote:
> >> >> >
> >> >> >> Thanks for your reply, I
> suspected this.
> >> >> >>
> >> >> >> I will create a JIRA ticket.
> >> >> >>
> >> >> >> Vyacheslav
> >> >> >>
> >> >> >> On Feb 21, 2012, at 6:02 PM,
> Scott Carey
> >> wrote:
> >> >> >>
> >> >> >>>
> >> >> >>> On 2/21/12 7:29 AM,
> "Vyacheslav
> >> Zholudev"
> >> >> <vy...@gmail.com>
> >> >> >>> wrote:
> >> >> >>>
> >> >> >>>> Yep, I saw that method as
> well as
> >> the
> >> >> stackoverflow post. However, I'm
> >> >> >>>> interested how to append
> to a file
> >> on the
> >> >> arbitrary file system, not
> >> >> >>>> only on the local one.
> >> >> >>>>
> >> >> >>>> I want to get an
> OutputStream
> >> based on the
> >> >> Path and the FileSystem
> >> >> >>>> implementation and then
> pass it
> >> for
> >> >> appending to avro methods.
> >> >> >>>>
> >> >> >>>> Is that possible?
> >> >> >>>
> >> >> >>> It is not possible without
> modifying
> >> >> DataFileWriter. Please open a JIRA
> >> >> >>> ticket.
> >> >> >>>
> >> >> >>> It could not simply append to
> an
> >> OutputStream,
> >> >> since it must either:
> >> >> >>> * Seek to the start to
> validate the
> >> schemas
> >> >> match and find the sync
> >> >> >>> marker, or
> >> >> >>> * Trust that the schemas
> match and
> >> find the
> >> >> sync marker from the last
> >> >> >>> block
> >> >> >>>
> >> >> >>> DataFileWriter cannot refer
> to Hadoop
> >> classes
> >> >> such as FileSystem, but we
> >> >> >>> could add something to the
> mapred
> >> module that
> >> >> takes a Path and
> >> >> >>> FileSystem and returns
> something that
> >> >> implements an interface that
> >> >> >>> DataFileWriter can append
> to.
> >> This would
> >> >> be something that is both a
> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >> >> >>> and an OutputStream, or has
> both an
> >> InputStream
> >> >> from the start of the
> >> >> >>> existing file and an
> OutputStream at
> >> the end.
> >> >> >>>
> >> >> >>>> Thanks,
> >> >> >>>> Vyacheslav
> >> >> >>>>
> >> >> >>>> On Feb 21, 2012, at 5:29
> AM, Harsh
> >> J
> >> >> wrote:
> >> >> >>>>
> >> >> >>>>> Hi,
> >> >> >>>>>
> >> >> >>>>> Use the appendTo
> feature of
> >> the
> >> >> DataFileWriter. See
> >> >> >>>>>
> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >> >>>>>
> >> >> >>>>> For a quick setup
> example,
> >> read also:
> >> >> >>>>>
> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >> >> >>>>>
> >> >> >>>>> On Tue, Feb 21, 2012
> at 3:15
> >> AM,
> >> >> Vyacheslav Zholudev
> >> >> >>>>> <vy...@gmail.com>
> >> >> wrote:
> >> >> >>>>>> Hi,
> >> >> >>>>>>
> >> >> >>>>>> is it possible to
> append
> >> to an
> >> >> already existing avro file when it was
> >> >> >>>>>> written and
> closed
> >> before?
> >> >> >>>>>>
> >> >> >>>>>> If I use
> >> >> >>>>>> outputStream =
> >> >> fs.append(avroFilePath);
> >> >> >>>>>>
> >> >> >>>>>> then later on I
> get:
> >> >> java.io.IOException: Invalid sync!
> >> >> >>>>>>
> >> >> >>>>>> Probably because
> the
> >> schema is
> >> >> written twice and some other issues.
> >> >> >>>>>>
> >> >> >>>>>> If I use
> outputStream =
> >> >> fs.create(avroFilePath); then the avro
> file
> >> >> >>>>>> gets
> >> >> >>>>>> overwritten.
> >> >> >>>>>>
> >> >> >>>>>> Thanks,
> >> >> >>>>>> Vyacheslav
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Harsh J
> >> >> >>>>> Customer Ops.
> Engineer
> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
> >> >> >
> >> >>
> >>
> 
> 
> 
> -- 
> Harsh J
> 

Re: Is it possible to append to an already existing avro file

Posted by Harsh J <ha...@cloudera.com>.
Hey Michael,

FileSystem itself is not an OutputStream, but FileSystem#append() returns an
FSDataOutputStream, which is a regular java.io.OutputStream, as seen in
the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html

Here's a sample program that works on Hadoop 2.x in my tests:
https://gist.github.com/QwertyManiac/4724582
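
For anyone wondering why writing through a bare fs.append() stream gave
"Invalid sync!" in the original question while appendTo works, the toy
model below may help. It is only a sketch under stated assumptions: the
class SyncMarkerToy, its method names, and its byte layout (a 16-byte
marker as the header, and the same marker in front of every block) are
hypothetical and are not Avro's real container format or API. The point
it illustrates is the one Scott makes in the quoted thread: a writer that
appends has to learn the existing file's sync marker first, which, as far
as I understand it, is the role of the SeekableInput argument in
DataFileWriter.appendTo(SeekableInput, OutputStream).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy layout (hypothetical, NOT Avro's real format):
//   header = 16-byte random marker
//   block  = the same 16-byte marker, a 4-byte length, then the payload
class SyncMarkerToy {
    static final int SYNC_SIZE = 16;

    // Writes a brand-new "file": a fresh random marker, then the blocks.
    static byte[] create(String... blocks) {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(sync, 0, SYNC_SIZE);
        writeBlocks(out, sync, blocks);
        return out.toByteArray();
    }

    // Naive append: a second writer starts from scratch at the end of the
    // old bytes, so a new header with a new marker lands mid-file.
    static byte[] naiveAppend(byte[] existing, String... blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(existing, 0, existing.length);
        byte[] fresh = create(blocks);
        out.write(fresh, 0, fresh.length);
        return out.toByteArray();
    }

    // Header-aware append: read the marker from the start of the existing
    // bytes, then write only new blocks that carry that same marker.
    static byte[] appendReusingMarker(byte[] existing, String... blocks) {
        byte[] sync = Arrays.copyOfRange(existing, 0, SYNC_SIZE);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(existing, 0, existing.length);
        writeBlocks(out, sync, blocks);
        return out.toByteArray();
    }

    private static void writeBlocks(ByteArrayOutputStream out, byte[] sync, String... blocks) {
        for (String block : blocks) {
            byte[] payload = block.getBytes(StandardCharsets.UTF_8);
            out.write(sync, 0, SYNC_SIZE);
            out.write(ByteBuffer.allocate(4).putInt(payload.length).array(), 0, 4);
            out.write(payload, 0, payload.length);
        }
    }

    // Reads every block; fails when a block does not start with the
    // marker announced in the header.
    static List<String> readAll(byte[] file) {
        ByteBuffer in = ByteBuffer.wrap(file);
        byte[] headerSync = new byte[SYNC_SIZE];
        in.get(headerSync);
        List<String> blocks = new ArrayList<>();
        byte[] blockSync = new byte[SYNC_SIZE];
        while (in.hasRemaining()) {
            in.get(blockSync);
            if (!Arrays.equals(headerSync, blockSync)) {
                throw new IllegalStateException("Invalid sync!");
            }
            byte[] payload = new byte[in.getInt()];
            in.get(payload);
            blocks.add(new String(payload, StandardCharsets.UTF_8));
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] original = create("a", "b");
        System.out.println(readAll(appendReusingMarker(original, "c")));
        try {
            readAll(naiveAppend(original, "c"));
        } catch (IllegalStateException e) {
            System.out.println("naive append failed: " + e.getMessage());
        }
    }
}
```

In this model the header-aware append stays readable, while the naive
append fails as soon as the reader reaches the second header. Real Avro
files differ in detail (the header also holds the schema and other
metadata), so treat this as an intuition aid rather than a description of
the actual implementation.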

On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com> wrote:
> I don't believe a Hadoop FileSystem is a Java OutputStream?
>
> --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
>
>> From: Doug Cutting <cu...@apache.org>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: user@avro.apache.org
>> Date: Tuesday, February 5, 2013, 5:27 PM
>> It will work on an OutputStream that
>> supports append.
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
>> java.io.OutputStream)
>>
>> So it depends on how well HDFS implements
>> FileSystem#append(), not on
>> any changes in Avro.
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>
>> I have no recent personal experience with append in
>> HDFS.  Does anyone
>> else here?
>>
>> Doug
>>
>> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <mi...@yahoo.com>
>> wrote:
>> > My understanding is that will append to a file on the
>> local filesystem, but not to a file on HDFS.
>> >
>> > --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org>
>> wrote:
>> >
>> >> From: Doug Cutting <cu...@apache.org>
>> >> Subject: Re: Is it possible to append to an already
>> existing avro file
>> >> To: user@avro.apache.org
>> >> Date: Tuesday, February 5, 2013, 5:08 PM
>> >> The Jira is:
>> >>
>> >> https://issues.apache.org/jira/browse/AVRO-1035
>> >>
>> >> It is possible to append to an existing Avro file:
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >>
>> >> Should we close that issue as "fixed"?
>> >>
>> >> Doug
>> >>
>> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak
>> <mi...@yahoo.com>
>> >> wrote:
>> >> > Was a JIRA ticket ever created regarding
>> appending to
>> >> an existing Avro file on HDFS?
>> >> >
>> >> > What is the status of such a capability, a
>> year out
>> >> from when the issue below was raised?
>> >> >
>> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>> "Vyacheslav
>> >> Zholudev" <vy...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> Thanks for your reply, I suspected this.
>> >> >>
>> >> >> I will create a JIRA ticket.
>> >> >>
>> >> >> Vyacheslav
>> >> >>
>> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey
>> wrote:
>> >> >>
>> >> >>>
>> >> >>> On 2/21/12 7:29 AM, "Vyacheslav
>> Zholudev"
>> >> <vy...@gmail.com>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> Yep, I saw that method as well as
>> the
>> >> stackoverflow post. However, I'm
>> >> >>>> interested how to append to a file
>> on the
>> >> arbitrary file system, not
>> >> >>>> only on the local one.
>> >> >>>>
>> >> >>>> I want to get an OutputStream
>> based on the
>> >> Path and the FileSystem
>> >> >>>> implementation and then pass it
>> for
>> >> appending to avro methods.
>> >> >>>>
>> >> >>>> Is that possible?
>> >> >>>
>> >> >>> It is not possible without modifying
>> >> DataFileWriter. Please open a JIRA
>> >> >>> ticket.
>> >> >>>
>> >> >>> It could not simply append to an
>> OutputStream,
>> >> since it must either:
>> >> >>> * Seek to the start to validate the
>> schemas
>> >> match and find the sync
>> >> >>> marker, or
>> >> >>> * Trust that the schemas match and
>> find the
>> >> sync marker from the last
>> >> >>> block
>> >> >>>
>> >> >>> DataFileWriter cannot refer to Hadoop
>> classes
>> >> such as FileSystem, but we
>> >> >>> could add something to the mapred
>> module that
>> >> takes a Path and
>> >> >>> FileSystem and returns something that
>> >> implements an interface that
>> >> >>> DataFileWriter can append to.
>> This would
>> >> be something that is both a
>> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >> >>> and an OutputStream, or has both an
>> InputStream
>> >> from the start of the
>> >> >>> existing file and an OutputStream at
>> the end.
>> >> >>>
>> >> >>>> Thanks,
>> >> >>>> Vyacheslav
>> >> >>>>
>> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh
>> J
>> >> wrote:
>> >> >>>>
>> >> >>>>> Hi,
>> >> >>>>>
>> >> >>>>> Use the appendTo feature of
>> the
>> >> DataFileWriter. See
>> >> >>>>>
>> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >>>>>
>> >> >>>>> For a quick setup example,
>> read also:
>> >> >>>>>
>> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >> >>>>>
>> >> >>>>> On Tue, Feb 21, 2012 at 3:15
>> AM,
>> >> Vyacheslav Zholudev
>> >> >>>>> <vy...@gmail.com>
>> >> wrote:
>> >> >>>>>> Hi,
>> >> >>>>>>
>> >> >>>>>> is it possible to append
>> to an
>> >> already existing avro file when it was
>> >> >>>>>> written and closed
>> before?
>> >> >>>>>>
>> >> >>>>>> If I use
>> >> >>>>>> outputStream =
>> >> fs.append(avroFilePath);
>> >> >>>>>>
>> >> >>>>>> then later on I get:
>> >> java.io.IOException: Invalid sync!
>> >> >>>>>>
>> >> >>>>>> Probably because the
>> schema is
>> >> written twice and some other issues.
>> >> >>>>>>
>> >> >>>>>> If I use outputStream =
>> >> fs.create(avroFilePath); then the avro file
>> >> >>>>>> gets
>> >> >>>>>> overwritten.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Vyacheslav
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Harsh J
>> >> >>>>> Customer Ops. Engineer
>> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>> >> >
>> >>
>>



--
Harsh J


On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <mi...@yahoo.com> wrote:
> I don't believe a Hadoop FileSystem is a Java OutputStream?
>
> --- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:
>
>> From: Doug Cutting <cu...@apache.org>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: user@avro.apache.org
>> Date: Tuesday, February 5, 2013, 5:27 PM
>> It will work on an OutputStream that
>> supports append.
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
>> java.io.OutputStream)
>>
>> So it depends on how well HDFS implements
>> FileSystem#append(), not on
>> any changes in Avro.
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>
>> I have no recent personal experience with append in
>> HDFS.  Does anyone
>> else here?
>>
>> Doug
>>
>> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <mi...@yahoo.com>
>> wrote:
>> > My understanding is that will append to a file on the
>> local filesystem, but not to a file on HDFS.



-- 
Harsh J

Re: Is it possible to append to an already existing avro file

Posted by Ken Krugler <kk...@transpac.com>.
On Feb 5, 2013, at 7:30pm, Michael Malak wrote:

> I don't believe a Hadoop FileSystem is a Java OutputStream?

The Hadoop FileSystem.append() method returns an FSDataOutputStream, which is a subclass of java.io.OutputStream.

-- Ken


--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr






Re: Is it possible to append to an already existing avro file

Posted by Michael Malak <mi...@yahoo.com>.
I don't believe a Hadoop FileSystem is a Java OutputStream?

--- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:

> From: Doug Cutting <cu...@apache.org>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Tuesday, February 5, 2013, 5:27 PM
> It will work on an OutputStream that supports append.
> 
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
> 
> So it depends on how well HDFS implements FileSystem#append(), not on
> any changes in Avro.
> 
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
> 
> I have no recent personal experience with append in HDFS.  Does anyone
> else here?
> 
> Doug

Re: Is it possible to append to an already existing avro file

Posted by Doug Cutting <cu...@apache.org>.
It will work on an OutputStream that supports append.

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)

So it depends on how well HDFS implements FileSystem#append(), not on
any changes in Avro.
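
To illustrate why an append needs both a readable view of the existing file and an output stream positioned at its end, here is a rough sketch using only the JDK. It does not use Avro or Hadoop. The container layout (a 4-byte magic, a 16-byte sync marker, then length-prefixed blocks each followed by the marker) and the names ToyAppendDemo, create, appendTo and readAll are made up for this example and are not Avro's actual format or API. The point is only that the appender has to recover the existing sync marker from the header instead of writing a new header. On HDFS the two streams would presumably come from the Hadoop FileSystem rather than from java.io.File, which is not shown here.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy container file (hypothetical, NOT Avro's format):
 * header = [4-byte magic][16-byte sync marker],
 * then blocks of [int length][payload bytes][16-byte sync marker].
 */
class ToyAppendDemo {
    static final byte[] MAGIC = {'T', 'O', 'Y', 1};
    static final int SYNC_LEN = 16;

    /** Creates a new file: header with a fresh random sync marker, then one block per record. */
    static void create(File f, List<String> records) throws IOException {
        byte[] sync = new byte[SYNC_LEN];
        new SecureRandom().nextBytes(sync);
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.write(MAGIC);
            out.write(sync);
            writeBlocks(out, sync, records);
        }
    }

    /**
     * Appends records. The header is read from 'in' (start of the existing file)
     * to recover the sync marker; only blocks are written to 'out', which the
     * caller must have opened at the end of the same file. Neither stream is closed here.
     */
    static void appendTo(InputStream in, OutputStream out, List<String> records) throws IOException {
        byte[] sync = readHeader(new DataInputStream(in));
        DataOutputStream data = new DataOutputStream(out);
        writeBlocks(data, sync, records);
        data.flush();
    }

    /** Checks the magic bytes and returns the sync marker stored in the header. */
    static byte[] readHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not a toy container file");
        byte[] sync = new byte[SYNC_LEN];
        in.readFully(sync);
        return sync;
    }

    /** Writes each record as one block terminated by the given sync marker. */
    static void writeBlocks(DataOutputStream out, byte[] sync, List<String> records) throws IOException {
        for (String r : records) {
            byte[] payload = r.getBytes(StandardCharsets.UTF_8);
            out.writeInt(payload.length);
            out.write(payload);
            out.write(sync);
        }
    }

    /** Reads every record back, verifying that each block ends with the header's sync marker. */
    static List<String> readAll(File f) throws IOException {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            byte[] sync = readHeader(in);
            while (in.available() > 0) {
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                byte[] marker = new byte[SYNC_LEN];
                in.readFully(marker);
                if (!Arrays.equals(marker, sync)) throw new IOException("Invalid sync!");
                result.add(new String(payload, StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("toy", ".bin");
        f.deleteOnExit();
        create(f, Arrays.asList("first", "second"));
        try (InputStream in = new FileInputStream(f);
             OutputStream out = new FileOutputStream(f, true)) { // true = open in append mode
            appendTo(in, out, Arrays.asList("third"));
        }
        System.out.println(readAll(f));
    }
}
```

In this sketch the appender never writes a second header, so every block in the file carries the same marker. Calling create() against an append-mode stream instead would put a new header and a different marker in the middle of the file, which is the kind of situation a reader would be expected to reject.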

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)

I have no recent personal experience with append in HDFS.  Does anyone
else here?

Doug

On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <mi...@yahoo.com> wrote:
> My understanding is that will append to a file on the local filesystem, but not to a file on HDFS.

Re: Is it possible to append to an already existing avro file

Posted by Michael Malak <mi...@yahoo.com>.
My understanding is that appendTo(java.io.File) will append to a file on the local filesystem, but not to a file on HDFS.
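
To make the local-versus-HDFS distinction concrete, here is a rough, self-contained sketch of the problem described in the quoted thread: a writer that is simply handed an append stream starts a new header with a new sync marker, and a reader then fails with "Invalid sync!". This is a toy model only. The class SyncAppendDemo, its method names, and the simplified layout (a 16-byte marker followed by length-prefixed blocks) are assumptions made up for illustration; they are not Avro's real file format or API, and no Avro or Hadoop classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy layout (NOT Avro's real format): [16-byte sync marker][block]*
// where each block is [int length][payload bytes][16-byte sync marker].
class SyncAppendDemo {
    static final int SYNC_SIZE = 16;

    static byte[] newSync() {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        return sync;
    }

    static void writeBlock(ByteArrayOutputStream out, String record, byte[] sync) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.write(sync);
        data.flush();
    }

    // Creates a new "file": a header holding a fresh sync marker, then one block.
    static byte[] create(String record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] sync = newSync();
        out.write(sync);
        writeBlock(out, record, sync);
        return out.toByteArray();
    }

    // Mimics opening a brand-new writer on an append stream: it writes a second
    // header with a different marker into the middle of the file.
    static byte[] naiveAppend(byte[] existing, String record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(existing);
        byte[] sync = newSync();
        out.write(sync);
        writeBlock(out, record, sync);
        return out.toByteArray();
    }

    // Reads the existing header first and reuses its marker for the new block.
    static byte[] headerAwareAppend(byte[] existing, String record) throws IOException {
        byte[] sync = Arrays.copyOfRange(existing, 0, SYNC_SIZE);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(existing);
        writeBlock(out, record, sync);
        return out.toByteArray();
    }

    // Reads every record, checking each block's trailing marker against the header.
    static List<String> readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            int length = in.readInt();
            // An implausible length means we are not at a real block boundary.
            if (length < 0 || length > in.available() - SYNC_SIZE) {
                throw new IOException("Invalid sync!");
            }
            byte[] payload = new byte[length];
            in.readFully(payload);
            byte[] marker = new byte[SYNC_SIZE];
            in.readFully(marker);
            if (!Arrays.equals(marker, sync)) {
                throw new IOException("Invalid sync!");
            }
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = create("first");
        System.out.println(readAll(headerAwareAppend(file, "second")));
        try {
            readAll(naiveAppend(file, "second"));
        } catch (IOException e) {
            System.out.println("naive append failed: " + e.getMessage());
        }
    }
}
```

The header-aware variant needs read access to the start of the existing file as well as a stream positioned at its end, which is the requirement Scott Carey describes in the quoted reply. For a local file, the thread's answer is DataFileWriter.appendTo(java.io.File); the toy above only illustrates why an OutputStream from fs.append() alone is not enough.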

--- On Tue, 2/5/13, Doug Cutting <cu...@apache.org> wrote:

> From: Doug Cutting <cu...@apache.org>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Tuesday, February 5, 2013, 5:08 PM
> The Jira is:
> 
> https://issues.apache.org/jira/browse/AVRO-1035
> 
> It is possible to append to an existing Avro file:
> 
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> 
> Should we close that issue as "fixed"?
> 
> Doug
> 
> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <mi...@yahoo.com> wrote:
> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
> >
> > What is the status of such a capability, a year out from when the issue below was raised?
> >
> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vy...@gmail.com> wrote:
> >
> >> Thanks for your reply, I suspected this.
> >>
> >> I will create a JIRA ticket.
> >>
> >> Vyacheslav
> >>
> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >>
> >>>
> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vy...@gmail.com>
> >>> wrote:
> >>>
> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
> >>>> interested how to append to a file on the arbitrary file system, not
> >>>> only on the local one.
> >>>>
> >>>> I want to get an OutputStream based on the Path and the FileSystem
> >>>> implementation and then pass it for appending to avro methods.
> >>>>
> >>>> Is that possible?
> >>>
> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
> >>> ticket.
> >>>
> >>> It could not simply append to an OutputStream, since it must either:
> >>> * Seek to the start to validate the schemas match and find the sync
> >>> marker, or
> >>> * Trust that the schemas match and find the sync marker from the last
> >>> block
> >>>
> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> >>> could add something to the mapred module that takes a Path and
> >>> FileSystem and returns something that implements an interface that
> >>> DataFileWriter can append to.  This would be something that is both a
> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >>> and an OutputStream, or has both an InputStream from the start of the
> >>> existing file and an OutputStream at the end.
> >>>
> >>>> Thanks,
> >>>> Vyacheslav
> >>>>
> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Use the appendTo feature of the DataFileWriter. See
> >>>>>
> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >>>>>
> >>>>> For a quick setup example, read also:
> >>>>>
> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >>>>>
> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
> >>>>> <vy...@gmail.com> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> is it possible to append to an already existing avro file when it was
> >>>>>> written and closed before?
> >>>>>>
> >>>>>> If I use
> >>>>>> outputStream = fs.append(avroFilePath);
> >>>>>>
> >>>>>> then later on I get: java.io.IOException: Invalid sync!
> >>>>>>
> >>>>>> Probably because the schema is written twice and some other issues.
> >>>>>>
> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
> >>>>>> gets overwritten.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vyacheslav
> >>>>>
> >>>>> --
> >>>>> Harsh J
> >>>>> Customer Ops. Engineer
> >>>>> Cloudera | http://tiny.cloudera.com/about
> >
> 

Re: Is it possible to append to an already existing avro file

Posted by Doug Cutting <cu...@apache.org>.
The Jira is:

https://issues.apache.org/jira/browse/AVRO-1035

It is possible to append to an existing Avro file:

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)

Should we close that issue as "fixed"?

Doug

On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <mi...@yahoo.com> wrote:
> Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
>
> What is the status of such a capability, a year out from when the issue below was raised?
>
> On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vy...@gmail.com> wrote:
>
>> Thanks for your reply, I suspected this.
>>
>> I will create a JIRA ticket.
>>
>> Vyacheslav
>>
>> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>>
>>>
>>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vy...@gmail.com>
>>> wrote:
>>>
>>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>>>> interested how to append to a file on the arbitrary file system, not
>>>> only on the local one.
>>>>
>>>> I want to get an OutputStream based on the Path and the FileSystem
>>>> implementation and then pass it for appending to avro methods.
>>>>
>>>> Is that possible?
>>>
>>> It is not possible without modifying DataFileWriter. Please open a JIRA
>>> ticket.
>>>
>>> It could not simply append to an OutputStream, since it must either:
>>> * Seek to the start to validate the schemas match and find the sync
>>> marker, or
>>> * Trust that the schemas match and find the sync marker from the last
>>> block
>>>
>>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>>> could add something to the mapred module that takes a Path and
>>> FileSystem and returns something that implements an interface that
>>> DataFileWriter can append to.  This would be something that is both a
>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>> and an OutputStream, or has both an InputStream from the start of the
>>> existing file and an OutputStream at the end.
>>>
>>>> Thanks,
>>>> Vyacheslav
>>>>
>>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Use the appendTo feature of the DataFileWriter. See
>>>>>
>>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>>>
>>>>> For a quick setup example, read also:
>>>>>
>>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>>>
>>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>>>> <vy...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> is it possible to append to an already existing avro file when it was
>>>>>> written and closed before?
>>>>>>
>>>>>> If I use
>>>>>> outputStream = fs.append(avroFilePath);
>>>>>>
>>>>>> then later on I get: java.io.IOException: Invalid sync!
>>>>>>
>>>>>> Probably because the schema is written twice and some other issues.
>>>>>>
>>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>>>>> gets overwritten.
>>>>>>
>>>>>> Thanks,
>>>>>> Vyacheslav
>>>>>
>>>>> --
>>>>> Harsh J
>>>>> Customer Ops. Engineer
>>>>> Cloudera | http://tiny.cloudera.com/about
>
