Posted to hdfs-user@hadoop.apache.org by Florin P <fl...@yahoo.com> on 2012/04/07 20:19:37 UTC

Is append allowed in HDFS?

Hello!
  I googled for append support in HDFS files and the results left me puzzled. Can someone say definitively: yes, you can append to a TextFile, SequenceFile, or whatever format? If yes, in which version is this feature supported? Also, where can I find a good example of using the API? I know there has been a long debate about this subject, but it is really a challenge to find the current status of this feature on Google.
I look forward to an answer from a trusted source.

Thank you,
  Regards,
   Florin

Re: Is append allowed in HDFS?

Posted by Harsh J <ha...@cloudera.com>.
This isn't possible presently. If you close the open file stream for a
sequence file, you're done with it. I'd advise not to close it and use
hflush instead, much like a WAL. Close it only when you're done with
some threshold, and open a new file. The hflush (or sync in 1.x) will
ensure that the latest additions are available for immediate reads (to
all new readers).
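
A minimal sketch of that pattern, using only the JDK and the local file system as a stand-in so it runs without a cluster. `RollingLogWriter`, the `log-N.txt` naming, and the byte threshold are assumptions made for illustration. `FileChannel.force` is only a rough substitute for hflush (sync in 1.x) on an HDFS output stream, whose job is to make written data visible to new readers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical local-disk analog of "keep the stream open, flush after each
// write, and roll to a new file once a threshold is reached".
class RollingLogWriter implements AutoCloseable {
    private final Path dir;
    private final long maxBytesPerFile; // assumed roll threshold, in bytes
    private FileChannel current;
    private long written;
    private int fileIndex;

    RollingLogWriter(Path dir, long maxBytesPerFile) throws IOException {
        this.dir = dir;
        this.maxBytesPerFile = maxBytesPerFile;
        openNext();
    }

    // Always creates a brand-new file; an existing file is never reopened.
    private void openNext() throws IOException {
        Path p = dir.resolve("log-" + fileIndex++ + ".txt");
        current = FileChannel.open(p, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        written = 0;
    }

    void append(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            written += current.write(buf);
        }
        // Stand-in for hflush(): push the data out without closing the file.
        current.force(false);
        // Close only when the threshold is reached, then start a new file.
        if (written >= maxBytesPerFile) {
            current.close();
            openNext();
        }
    }

    int filesOpened() {
        return fileIndex;
    }

    @Override
    public void close() throws IOException {
        current.close();
    }
}
```

The stream is closed only when the threshold is crossed, so every file is written once from start to finish and never reopened for append.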

The patch at https://issues.apache.org/jira/browse/HADOOP-7139 will
help solve this limitation, though. It's under review and needs some
further work.

On Tue, Apr 24, 2012 at 6:47 PM, Florin P <fl...@yahoo.com> wrote:
> Hello!
>   Thank you for your responses. I've read in these posts:
> http://stackoverflow.com/questions/5598400/hdfs-using-hdfs-api-to-append-to-a-sequencefile
> and
> https://issues.apache.org/jira/browse/HADOOP-3977
>
> that you cannot add new data to an existing SequenceFile. So,
> basically, you have this scenario:
> 1. Write to a SequenceFile
> 2. Close the file
> 3. Reopen the written file
> 4. Add new data to it
> 5. Close the file
> At the end you'll have the old data plus the newly added data. Can you give
> an example (code) of how to achieve this scenario with the API? Please
> specify which version you're using.
>
> Thank you.
>
> Regards,
>   Florin
>
> ________________________________
> From: Ioan Eugen Stan <st...@gmail.com>
> To: hdfs-user@hadoop.apache.org; Florin P <fl...@yahoo.com>
> Sent: Friday, April 13, 2012 1:23 PM
>
> Subject: Re: Is append allowed in HDFS?
>
> 2012/4/13 Florin P <fl...@yahoo.com>:
>> Hello!
>>  Thank you all for the responses. Is it possible to have a matrix of
>> the Hadoop file input formats that support append? Or, if I understood
>> correctly, do all formats now support append?
>> Thanks a lot.
>>   Regards,
>>  Florin
>
> Hi Florin,
>
> Append is a file-system feature, not a file-format feature, although
> some file formats are designed to be immutable (MapFile, HFile). You
> can append to them; just don't use the interface they normally
> provide.
>
>> ________________________________
>> From: Inder Pall <in...@gmail.com>
>> To: hdfs-user@hadoop.apache.org
>> Sent: Tuesday, April 10, 2012 8:12 AM
>> Subject: Re: Is append allowed in HDFS?
>>
>> Harsh,
>>
>> The idea is to call sync for a configured batch. It is still under
>> implementation, as other parts of the system aren't complete.
>>
>>> recovery/resume-from-errors-at-DN code around general tail-like
>> This sounds promising; can you please shed some more light on this?
>>
>> - inder
>> On Tue, Apr 10, 2012 at 1:07 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Your approach looks fine to me. I'd throw in some
>> recovery/resume-from-errors-at-DN code around general tail-like
>> consumption but I think you may have already done that :)
>>
>> But just out of curiosity: do you call sync for every record/unit, or
>> do you batch it by a few, for your problem?
>>
>> On Mon, Apr 9, 2012 at 10:34 PM, Inder Pall <in...@gmail.com> wrote:
>>> Yes, makes sense. My use case is more like a producer/consumer, with the
>>> consumer trying to stream data as it arrives.
>>> Has anyone hit this before and, if so, resolved it in a better way?
>>>
>>> Apologies if I am digressing from the subject of this thread; however,
>>> it seems to land in the bucket of append support in HDFS.
>>>
>>> - Inder
>>>
>>>
>>> On Mon, Apr 9, 2012 at 6:27 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>> Inder,
>>>>
>>>> Yes, that is a requirement for readers of sync-ing data. The new meta
>>>> entries can only be read by new readers. The read code would end up
>>>> being exactly like the implementation for method "fs -tail" at
>>>>
>>>>
>>>>
>>>>
>>>> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/fs/FsShell.java?view=markup
>>>> (Line 1101)
>>>>
>>>> HBase does not read the WAL (HLog) continuously/vigorously as it
>>>> syncs, by the way. It only reads them when a specific request is
>>>> made (for splitting, replaying and debug-printing).
>>>>
>>>> On Mon, Apr 9, 2012 at 6:05 PM, Inder Pall <in...@gmail.com> wrote:
>>>> > Based on what I have tried, after a sync you need to open a new
>>>> > Reader.
>>>> > Please correct me if those aren't the right semantics.
>>>> >
>>>> > Thanks,
>>>> > - Inder
>>>> >
>>>> >
>>>> > On Mon, Apr 9, 2012 at 4:23 PM, Harsh J <ha...@cloudera.com> wrote:
>>>> >>
>>>> >> I'd also like to note that there are some unresolved issues with the
>>>> >> append version in the 1.x (stable) line.
>>>> >>
>>>> >> Note that HBase's use of the 0.20-append branch features is limited
>>>> >> to using "sync" calls alone (described on p. 68, "Coherency Model",
>>>> >> Chapter 3 (The Hadoop Distributed File System) in Hadoop: The
>>>> >> Definitive Guide, 2nd Edition (O'Reilly)), not the file-reopening
>>>> >> "append" calls. The latter is what still has issues in the 1.x
>>>> >> releases today. Using the former is all right if it's done in a way
>>>> >> similar to HBase's WAL (HLog) (or for similar needs).
>>>> >>
>>>> >> On Mon, Apr 9, 2012 at 3:45 PM, Ioan Eugen Stan
>>>> >> <st...@gmail.com>
>>>> >> wrote:
>>>> >> > 2012/4/7 Florin P <fl...@yahoo.com>:
>>>> >> >> Hello!
>>>> >> >>   I googled for append support in HDFS files and the results left
>>>> >> >> me puzzled. Can someone say definitively: yes, you can append to a
>>>> >> >> TextFile, SequenceFile, or whatever format? If yes, in which version
>>>> >> >> is this feature supported? Also, where can I find a good example of
>>>> >> >> using the API? I know there has been a long debate about this
>>>> >> >> subject, but it is really a challenge to find the current status of
>>>> >> >> this feature on Google.
>>>> >> >> I look forward to an answer from a trusted source.
>>>> >> >> Thank you,
>>>> >> >>   Regards,
>>>> >> >>    Florin
>>>> >> >
>>>> >> > Hi Florin,
>>>> >> >
>>>> >> > HDFS supports append in Hadoop 1.0.x branch and also 0.22 (a.k.a
>>>> >> > hadoop 2.x branch) and 0.23 (a.k.a hadoop 3.x branch).
>>>> >> >
>>>> >> > [1] http://hbase.apache.org/book/hadoop.html
>>>> >> > [2] http://hbase.apache.org/book/hadoop.html -- search for append
>>>> >> > in
>>>> >> > release notes
>>>> >> >
>>>> >> > Cheers,
>>>> >> > --
>>>> >> > Ioan Eugen Stan
>>>> >> > http://ieugen.blogspot.com/
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Harsh J
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Thanks,
>>>> > - Inder
>>>> >   Tech Platforms @Inmobi
>>>> >   Linkedin - http://goo.gl/eR4Ub
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> - Inder
>>>   Tech Platforms @Inmobi
>>>   Linkedin - http://goo.gl/eR4Ub
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> Thanks,
>> - Inder
>>   Tech Platforms @Inmobi
>>   Linkedin - http://goo.gl/eR4Ub
>>
>>
>
>
>
> --
> Ioan Eugen Stan
> http://ieugen.blogspot.com/
>
>



-- 
Harsh J

Re: Is append allowed in HDFS?

Posted by Florin P <fl...@yahoo.com>.
Hello!
  Thank you for your responses. I've read in these posts:
http://stackoverflow.com/questions/5598400/hdfs-using-hdfs-api-to-append-to-a-sequencefile
and
https://issues.apache.org/jira/browse/HADOOP-3977

that you cannot add new data to an existing SequenceFile. So,
basically, you have this scenario:
1. Write to a SequenceFile
2. Close the file
3. Reopen the written file
4. Add new data to it
5. Close the file
At the end you'll have the old data plus the newly added data. Can you give an example (code) of how to achieve this scenario with the API? Please specify which version you're using.

Thank you.

Regards,
  Florin





Re: Is append allowed in HDFS?

Posted by Ioan Eugen Stan <st...@gmail.com>.
2012/4/13 Florin P <fl...@yahoo.com>:
> Hello!
>  Thank you all for the responses. Is it possible to have a matrix of
> the Hadoop file input formats that support append? Or, if I understood
> correctly, do all formats now support append?
> Thanks a lot.
>   Regards,
>  Florin

Hi Florin,

Append is a file-system feature, not a file-format feature, although
some file formats are designed to be immutable (MapFile, HFile). You
can append to them; just don't use the interface they normally
provide.
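
To illustrate the distinction, here is a small sketch of a file-system-level append: it adds raw bytes to the end of an existing file and knows nothing about the file's format. It uses java.nio on the local file system as a stand-in so it can run anywhere; `AppendBytes` is a hypothetical helper name. On HDFS the corresponding entry point would be `FileSystem.append(Path)`, on releases where append is supported and enabled.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: append raw bytes to an existing file, ignoring its format.
class AppendBytes {
    static void append(Path file, String text) throws IOException {
        // APPEND places every write at the current end of the file. CREATE is
        // deliberately omitted, so this fails if the file does not exist yet.
        try (OutputStream out = Files.newOutputStream(file, StandardOpenOption.APPEND)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Whether the result is still a valid file in a given format is a separate question, which is why the format's own writer interface may not offer this.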




--
Ioan Eugen Stan
http://ieugen.blogspot.com/

Re: Is append allowed in HDFS?

Posted by Florin P <fl...@yahoo.com>.
Hello!
 Thank you all for the responses. Is it possible to have a matrix of the Hadoop file input formats that support append? Or, if I understood correctly, do all formats now support append?
Thanks a lot.
  Regards,
 Florin




Re: Is append allowed in HDFS?

Posted by Inder Pall <in...@gmail.com>.
Harsh,

The idea is to call sync for a configured batch. It is still under
implementation, as other parts of the system aren't complete.
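
Calling sync once per configured batch could look roughly like the sketch below. `BatchedSyncWriter` and its `batchSize` parameter are hypothetical, and a plain `OutputStream.flush()` stands in for the HDFS sync/hflush call. The `syncCalls` counter exists only so the behaviour can be checked.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical writer that syncs once every batchSize records.
class BatchedSyncWriter {
    private final OutputStream out;
    private final int batchSize; // assumed "configured batch" size
    private int pending;         // records written since the last sync
    private int syncCalls;       // number of syncs actually issued

    BatchedSyncWriter(OutputStream out, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.out = out;
        this.batchSize = batchSize;
    }

    void write(String record) throws IOException {
        out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
        if (++pending >= batchSize) {
            sync();
        }
    }

    // Flushes only if something is pending, so repeated calls are cheap.
    void sync() throws IOException {
        if (pending == 0) {
            return;
        }
        out.flush(); // stand-in for hflush() (sync() in 1.x) on an HDFS stream
        pending = 0;
        syncCalls++;
    }

    int syncCalls() {
        return syncCalls;
    }
}
```

A larger batch means fewer sync calls but more records that readers cannot see yet, so the batch size trades read latency against write overhead.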

> recovery/resume-from-errors-at-DN code around general tail-like
This sounds promising; can you please shed some more light on this?

- inder



-- 
Thanks,
- Inder
  Tech Platforms @Inmobi
  Linkedin - http://goo.gl/eR4Ub

Re: Is append allowed in HDFS?

Posted by Harsh J <ha...@cloudera.com>.
Your approach looks fine to me. I'd throw in some
recovery/resume-from-errors-at-DN code around general tail-like
consumption but I think you may have already done that :)

But just out of curiosity: do you call sync for every record/unit, or
do you batch it by a few, for your problem?
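
As a rough illustration of tail-like consumption that can resume after an error, the sketch below remembers how far it has read and opens a fresh reader on every poll, in line with the observation in this thread that only new readers see newly synced data. `ReopeningTailer` is a hypothetical name, and a local file read through `RandomAccessFile` stands in for an HDFS input stream.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical tail-style consumer that reopens the file on every poll.
class ReopeningTailer {
    private final Path file;
    private long offset; // position of the first byte not yet consumed

    ReopeningTailer(Path file) {
        this.file = file;
    }

    // Returns the bytes added since the last successful poll, decoded as UTF-8.
    String poll() throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            long length = in.length();
            if (length <= offset) {
                return "";
            }
            byte[] buf = new byte[(int) (length - offset)];
            in.seek(offset);
            in.readFully(buf);
            // Advance only after a complete read, so that a failed poll can
            // simply be retried from the same offset.
            offset = length;
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    long offset() {
        return offset;
    }
}
```

If a poll throws, the caller can wait and call `poll()` again; nothing is skipped because the offset has not moved.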




-- 
Harsh J

Re: Is append allowed in HDFS?

Posted by Inder Pall <in...@gmail.com>.
Yes, that makes sense. My use case is more of a producer/consumer setup, with
the consumer trying to stream data as it arrives.
Has anyone hit this before and, if so, resolved it in a better way?

Apologies if I am digressing from the subject of this thread; however, this
seems to fall under append support in HDFS.

- Inder




-- 
Thanks,
- Inder
  Tech Platforms @Inmobi
  Linkedin - http://goo.gl/eR4Ub

Re: Is append allowed in HDFS?

Posted by Harsh J <ha...@cloudera.com>.
Inder,

Yes, that is a requirement for readers of data that is being synced. The
new meta entries can only be read by new readers. The read code would end
up being exactly like the implementation of the "fs -tail" command at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/core/org/apache/hadoop/fs/FsShell.java?view=markup
(Line 1101)
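
As a rough sketch of that read pattern (not the FsShell code itself): each
pass opens a fresh reader, seeks to the last offset it reached, reads up to
the current end of the file, and closes again. The example below uses a local
file through java.io.RandomAccessFile as a stand-in for an HDFS stream;
TailSketch, Chunk and readFrom are hypothetical names used only for
illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration: a local file stands in for an HDFS file.
final class TailSketch {

    /** Text read in one pass, plus the offset at which the next pass should resume. */
    static final class Chunk {
        final String text;
        final long nextOffset;

        Chunk(String text, long nextOffset) {
            this.text = text;
            this.nextOffset = nextOffset;
        }
    }

    /**
     * Opens a NEW reader, seeks to the given offset, reads up to the file
     * length observed at open time, and closes the reader again.
     */
    static Chunk readFrom(Path file, long offset) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            long length = in.length();
            if (offset >= length) {
                return new Chunk("", offset); // nothing new since the last pass
            }
            // Assumes the unread part fits in memory; fine for a sketch.
            byte[] buf = new byte[(int) (length - offset)];
            in.seek(offset);
            in.readFully(buf);
            return new Chunk(new String(buf, StandardCharsets.UTF_8), length);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("tail-sketch", ".log");
        Files.write(file, "first\n".getBytes(StandardCharsets.UTF_8));
        Chunk a = readFrom(file, 0);
        System.out.print(a.text);

        // The producer adds more data; the consumer re-opens and resumes.
        Files.write(file, "second\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
        Chunk b = readFrom(file, a.nextOffset);
        System.out.print(b.text);
        Files.delete(file);
    }
}
```

A consumer would call readFrom in a loop, sleeping between passes and
carrying nextOffset forward each time.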

HBase does not read the WAL (HLog) continuously/vigorously as it
syncs, by the way. It reads the logs only when a specific request is
made (for splitting, replaying and debug-printing).




-- 
Harsh J

Re: Is append allowed in HDFS?

Posted by Inder Pall <in...@gmail.com>.
Based on what I have tried, after a sync you need to open a new Reader.
Please correct me if those are not the right semantics.

Thanks,
- Inder




-- 
Thanks,
- Inder
  Tech Platforms @Inmobi
  Linkedin - http://goo.gl/eR4Ub

Re: Is append allowed in HDFS?

Posted by Harsh J <ha...@cloudera.com>.
I'd also like to note that there are some unresolved issues with the
append version in the 1.x (stable) line.

Note that HBase's use of the 0.20-append branch features is limited
to "sync" calls alone (described on p. 68, "Coherency Model",
Chapter 3 (The Hadoop Distributed File System) in Hadoop: The
Definitive Guide, 2nd Edition (O'Reilly)), not the file-reopening
"append" calls. The latter is what still has issues in the 1.x
releases today. Using the former is fine if it's done in a way
similar to HBase's WAL (HLog) (or for similar needs).
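
A minimal sketch of that WAL-style usage, assuming a local file in place of
an HDFS stream: the writer keeps one stream open, flushes after every record
so that newly opened readers can see it, and closes the file only when it
rolls over to a new one at a size threshold (as suggested elsewhere in this
thread). Here flush() merely stands in for sync() (hflush() in later
releases); RollingLogSketch and its methods are hypothetical names, not
Hadoop or HBase APIs.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration: local files stand in for HDFS files.
final class RollingLogSketch implements AutoCloseable {
    private final Path dir;
    private final long maxBytesPerFile;
    private OutputStream out;
    private long bytesInCurrentFile;
    private int filesOpened;

    RollingLogSketch(Path dir, long maxBytesPerFile) throws IOException {
        this.dir = dir;
        this.maxBytesPerFile = maxBytesPerFile;
        openNextFile();
    }

    /** Starts a brand-new file; a closed file is never reopened for writing. */
    private void openNextFile() throws IOException {
        Path next = dir.resolve("log-" + filesOpened);
        out = new BufferedOutputStream(Files.newOutputStream(next));
        filesOpened++;
        bytesInCurrentFile = 0;
    }

    /** Writes one record and flushes it without closing the stream. */
    void append(String record) throws IOException {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        out.write(bytes);
        out.flush(); // stand-in for sync()/hflush() on an HDFS output stream
        bytesInCurrentFile += bytes.length;
        if (bytesInCurrentFile >= maxBytesPerFile) {
            out.close();    // done with this file for good
            openNextFile(); // further records go to a new file
        }
    }

    int getFilesOpened() {
        return filesOpened;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

The point of the design is that "adding more data later" becomes "write to
the next file" rather than "reopen the old one", so the file-reopening
append call is never needed.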




-- 
Harsh J

Re: Is append allowed in HDFS?

Posted by Ioan Eugen Stan <st...@gmail.com>.
2012/4/7 Florin P <fl...@yahoo.com>:
> Hello!
>   Just google it for supporting of append into HDFS files and the result:
> I'm puzzled. Can someone say: YES you can append in TextFile or SequenceFile
> or whatever format. If yes, in which version this feature is supported ?
> Also where can I find a good example of using the API? I know that is a long
> debate about this subject, but really it is challenge to find on the google
> the current status of this feature.
> I look forward for a trust source answer.
> Thank you,
>   Regards,
>    Florin

Hi Florin,

HDFS supports append in Hadoop 1.0.x branch and also 0.22 (a.k.a
hadoop 2.x branch) and 0.23 (a.k.a hadoop 3.x branch).

[1] http://hbase.apache.org/book/hadoop.html
[2] http://hbase.apache.org/book/hadoop.html -- search for append in
release notes

Cheers,
-- 
Ioan Eugen Stan
http://ieugen.blogspot.com/