Posted to common-user@hadoop.apache.org by bourne1900 <bo...@yahoo.cn> on 2011/10/17 09:06:20 UTC

Does hadoop support append option?

I know that Hadoop 0.19.0 supports the append option, but it is not stable.
Does the latest version support the append option? Is it stable?
Thanks for the help.




bourne

Re: Does hadoop support append option?

Posted by kartheek muthyala <ka...@gmail.com>.
Hey Uma,
Yes, the version number I was referring to is the generation timestamp
info. Sorry for mixing up the nomenclature; we call that the version
number. I was actually referring to
http://www.cloudera.com/blog/2009/07/file-appends-in-hdfs/ where it is
mentioned as a version. But thanks for your time, Uma, I figured out what
I need.
Thanks,
Kartheek.

On Tue, Oct 18, 2011 at 3:39 PM, Uma Maheswara Rao G 72686 <
maheswara@huawei.com> wrote:

>
> ----- Original Message -----
> From: kartheek muthyala <ka...@gmail.com>
> Date: Tuesday, October 18, 2011 1:31 pm
> Subject: Re: Does hadoop support append option?
> To: common-user@hadoop.apache.org
>
> > Thanks Uma for the clarification of the append functionality.
> >
> > My second question is about the version number concept used in the
> > blockmap. Why does it maintain this version number?
> sorry Karthik,
> As i know, there is no version number in blocks map. Are you talking about
> generationTimeStamp or something?
>  can you paste the snippet where you have seen that version number, so,
> that i can get your question clearly.
>
> >
> > ~Kartheek
> >
> > On Tue, Oct 18, 2011 at 12:14 PM, Uma Maheswara Rao G 72686 <
> > maheswara@huawei.com> wrote:
> >
> > > ----- Original Message -----
> > > From: kartheek muthyala <ka...@gmail.com>
> > > Date: Tuesday, October 18, 2011 11:54 am
> > > Subject: Re: Does hadoop support append option?
> > > To: common-user@hadoop.apache.org
> > >
> > > > I am just concerned about the use case of appends in Hadoop. I
> > > > know that
> > > > they have provided support for appends in hadoop. But how
> > > > frequently are the
> > > > files getting appended? .
> > >  In normal case file block details will not be persisted in edit log
> > > before closing the file. As part of close only, this will happen. If NN
> > > restart happens before closing the file, we loose this data.
> > >
> > >  Consider a case, we have a very big file and data also very important,
> > > in this case, we should have an option to persist the block details
> > > frequently into editlog file rite, inorder to avoid the dataloss in case
> > > of NN restarts. To do this, DFS exposed the API called sync. This will
> > > basically persist the editlog entries to disk. To reopen the stream back
> > > again we will use append api.
> > >
> > > In trunk, this support has been refactored cleanly and handled many
> > > corner cases. APIs also provided as hflush.
> > >
> > > > There is this version concept too that is
> > > > maintained in the block report, according to my guess this version
> > > > number is
> > > > maintained to make sure that if a datanode gets disconnected once
> > > > and comes
> > > > back if it has a old copy of the data , then discard read requests
> > > > to this
> > > > data node. But if the files are not getting appended frequently
> > > > does the
> > > > version number remain the same?. Any typical use case can you guys
> > > > point to?
> > > >
> > > I am not sure, what is your exact question here. Can you please clarify
> > > more on this?
> > >
> > > > ~Kartheek
> > > >
> > > > On Mon, Oct 17, 2011 at 12:53 PM, Uma Maheswara Rao G 72686 <
> > > > maheswara@huawei.com> wrote:
> > > >
> > > > > AFAIK, append option is there in 20Append branch. Mainly supports
> > > > > sync. But there are some issues with that.
> > > > >
> > > > > Same has been merged to 20.205 branch and will be released soon (rc2
> > > > > available). And also fixed many bugs in this branch. As per our basic
> > > > > testing it is pretty good as of now.Need to wait for official release.
> > > > >
> > > > > Regards,
> > > > > Uma
> > > > >
> > > > > ----- Original Message -----
> > > > > From: bourne1900 <bo...@yahoo.cn>
> > > > > Date: Monday, October 17, 2011 12:37 pm
> > > > > Subject: Does hadoop support append option?
> > > > > To: common-user <co...@hadoop.apache.org>
> > > > >
> > > > > > I know that hadoop0.19.0 supports append option, but not stable.
> > > > > > Does the latest version support append option? Is it stable?
> > > > > > Thanks for help.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > bourne
> > > > >
> > > >
> > >
> > > Regards,
> > > Uma
> > >
> >
>

Re: Does hadoop support append option?

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
----- Original Message -----
From: kartheek muthyala <ka...@gmail.com>
Date: Tuesday, October 18, 2011 1:31 pm
Subject: Re: Does hadoop support append option?
To: common-user@hadoop.apache.org

> Thanks Uma for the clarification of the append functionality.
> 
> My second question is about the version number concept used in the 
> blockmap. Why does it maintain this version number?
Sorry Kartheek,
As far as I know, there is no version number in the blocks map. Are you talking about the generationTimeStamp or something similar?
Can you paste the snippet where you have seen that version number, so that I can understand your question clearly?

> 
> ~Kartheek
> 
> On Tue, Oct 18, 2011 at 12:14 PM, Uma Maheswara Rao G 72686 <
> maheswara@huawei.com> wrote:
> 
> > ----- Original Message -----
> > From: kartheek muthyala <ka...@gmail.com>
> > Date: Tuesday, October 18, 2011 11:54 am
> > Subject: Re: Does hadoop support append option?
> > To: common-user@hadoop.apache.org
> >
> > > I am just concerned about the use case of appends in Hadoop. I
> > > know that
> > > they have provided support for appends in hadoop. But how
> > > frequently are the
> > > files getting appended? .
> >  In normal case file block details will not be persisted in edit log
> > before closing the file. As part of close only, this will happen. If NN
> > restart happens before closing the file, we loose this data.
> >
> >  Consider a case, we have a very big file and data also very important,
> > in this case, we should have an option to persist the block details
> > frequently into editlog file rite, inorder to avoid the dataloss in case
> > of NN restarts. To do this, DFS exposed the API called sync. This will
> > basically persist the editlog entries to disk. To reopen the stream back
> > again we will use append api.
> >
> > In trunk, this support has been refactored cleanly and handled many
> > corner cases. APIs also provided as hflush.
> >
> > > There is this version concept too that is
> > > maintained in the block report, according to my guess this version
> > > number is
> > > maintained to make sure that if a datanode gets disconnected once
> > > and comes
> > > back if it has a old copy of the data , then discard read requests
> > > to this
> > > data node. But if the files are not getting appended frequently
> > > does the
> > > version number remain the same?. Any typical use case can you guys
> > > point to?
> > >
> > I am not sure, what is your exact question here. Can you please clarify
> > more on this?
> >
> > > ~Kartheek
> > >
> > > On Mon, Oct 17, 2011 at 12:53 PM, Uma Maheswara Rao G 72686 <
> > > maheswara@huawei.com> wrote:
> > >
> > > > AFAIK, append option is there in 20Append branch. Mainly supports
> > > > sync. But there are some issues with that.
> > > >
> > > > Same has been merged to 20.205 branch and will be released soon (rc2
> > > > available). And also fixed many bugs in this branch. As per our basic
> > > > testing it is pretty good as of now.Need to wait for official release.
> > > >
> > > > Regards,
> > > > Uma
> > > >
> > > > ----- Original Message -----
> > > > From: bourne1900 <bo...@yahoo.cn>
> > > > Date: Monday, October 17, 2011 12:37 pm
> > > > Subject: Does hadoop support append option?
> > > > To: common-user <co...@hadoop.apache.org>
> > > >
> > > > > I know that hadoop0.19.0 supports append option, but not stable.
> > > > > Does the latest version support append option? Is it stable?
> > > > > Thanks for help.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > bourne
> > > >
> > >
> >
> > Regards,
> > Uma
> >
> 

Re: Does hadoop support append option?

Posted by kartheek muthyala <ka...@gmail.com>.
Thanks, Uma, for the clarification of the append functionality.

My second question is about the version number concept used in the block
map. Why does the block map maintain this version number?

~Kartheek

On Tue, Oct 18, 2011 at 12:14 PM, Uma Maheswara Rao G 72686 <
maheswara@huawei.com> wrote:

> ----- Original Message -----
> From: kartheek muthyala <ka...@gmail.com>
> Date: Tuesday, October 18, 2011 11:54 am
> Subject: Re: Does hadoop support append option?
> To: common-user@hadoop.apache.org
>
> > I am just concerned about the use case of appends in Hadoop. I
> > know that
> > they have provided support for appends in hadoop. But how
> > frequently are the
> > files getting appended? .
>  In normal case file block details will not be persisted in edit log before
> closing the file. As part of close only, this will happen. If NN restart
> happens before closing the file, we loose this data.
>
>  Consider a case, we have a very big file and data also very important, in
> this case, we should have an option to persist the block details frequently
> into editlog file rite, inorder to avoid the dataloss in case of NN
> restarts. To do this, DFS exposed the API called sync. This will basically
> persist the editlog entries to disk. To reopen the stream back again we will
> use append api.
>
> In trunk, this support has been refactored cleanly and handled many corner
> cases. APIs also provided as hflush.
>
> > There is this version concept too that is
> > maintained in the block report, according to my guess this version
> > number is
> > maintained to make sure that if a datanode gets disconnected once
> > and comes
> > back if it has a old copy of the data , then discard read requests
> > to this
> > data node. But if the files are not getting appended frequently
> > does the
> > version number remain the same?. Any typical use case can you guys
> > point to?
> >
> I am not sure, what is your exact question here. Can you please clarify
> more on this?
>
> > ~Kartheek
> >
> > On Mon, Oct 17, 2011 at 12:53 PM, Uma Maheswara Rao G 72686 <
> > maheswara@huawei.com> wrote:
> >
> > > AFAIK, append option is there in 20Append branch. Mainly supports
> > > sync. But there are some issues with that.
> > >
> > > Same has been merged to 20.205 branch and will be released soon (rc2
> > > available). And also fixed many bugs in this branch. As per our basic
> > > testing it is pretty good as of now.Need to wait for official release.
> > >
> > > Regards,
> > > Uma
> > >
> > > ----- Original Message -----
> > > From: bourne1900 <bo...@yahoo.cn>
> > > Date: Monday, October 17, 2011 12:37 pm
> > > Subject: Does hadoop support append option?
> > > To: common-user <co...@hadoop.apache.org>
> > >
> > > > I know that hadoop0.19.0 supports append option, but not stable.
> > > > Does the latest version support append option? Is it stable?
> > > > Thanks for help.
> > > >
> > > >
> > > >
> > > >
> > > > bourne
> > >
> >
>
> Regards,
> Uma
>

Re: Does hadoop support append option?

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
----- Original Message -----
From: kartheek muthyala <ka...@gmail.com>
Date: Tuesday, October 18, 2011 11:54 am
Subject: Re: Does hadoop support append option?
To: common-user@hadoop.apache.org

> I am just concerned about the use case of appends in Hadoop. I 
> know that
> they have provided support for appends in hadoop. But how 
> frequently are the
> files getting appended? . 
In the normal case, file block details are not persisted in the edit log before the file is closed; this happens only as part of close. If a NameNode restart happens before the file is closed, we lose this data.

Consider a case where we have a very big file and the data is also very important. In this case we need an option to persist the block details into the edit log frequently, in order to avoid data loss if the NameNode restarts. For this, DFS exposes an API called sync, which basically persists the edit log entries to disk. To reopen the stream again, we use the append API.

In trunk, this support has been refactored cleanly and many corner cases are handled; the API is also provided as hflush.
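
To make that concrete, here is a minimal sketch of the write/sync/append pattern against the 0.20-era Java client API. It is only an illustration, not code from this thread: the path is made up, sync() is the call on the 0.20 branches (trunk exposes the same operation as hflush()), and on those branches the cluster also has to have append enabled.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncAndAppendSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/important.log"); // example path only

            // Write a batch of records and sync, so the pipeline/edit log state
            // is persisted without closing the file (sync() on 0.20 branches,
            // hflush() in trunk).
            FSDataOutputStream out = fs.create(file);
            out.writeBytes("first batch of records\n");
            out.sync();   // flushed/persisted; not lost on a NameNode restart
            out.close();

            // Later, reopen the stream at the end of the file and keep writing.
            FSDataOutputStream appendOut = fs.append(file);
            appendOut.writeBytes("second batch of records\n");
            appendOut.sync();
            appendOut.close();

            fs.close();
        }
    }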

> There is this version concept too that is
> maintained in the block report, according to my guess this version 
> number is
> maintained to make sure that if a datanode gets disconnected once 
> and comes
> back if it has a old copy of the data , then discard read requests 
> to this
> data node. But if the files are not getting appended frequently 
> does the
> version number remain the same?. Any typical use case can you guys 
> point to?
> 
I am not sure what your exact question is here. Can you please clarify it further?

> ~Kartheek
> 
> On Mon, Oct 17, 2011 at 12:53 PM, Uma Maheswara Rao G 72686 <
> maheswara@huawei.com> wrote:
> 
> > AFAIK, append option is there in 20Append branch. Mainly supports
> > sync. But there are some issues with that.
> >
> > Same has been merged to 20.205 branch and will be released soon (rc2
> > available). And also fixed many bugs in this branch. As per our basic
> > testing it is pretty good as of now.Need to wait for official release.
> >
> > Regards,
> > Uma
> >
> > ----- Original Message -----
> > From: bourne1900 <bo...@yahoo.cn>
> > Date: Monday, October 17, 2011 12:37 pm
> > Subject: Does hadoop support append option?
> > To: common-user <co...@hadoop.apache.org>
> >
> > > I know that hadoop0.19.0 supports append option, but not stable.
> > > Does the latest version support append option? Is it stable?
> > > Thanks for help.
> > >
> > >
> > >
> > >
> > > bourne
> >
> 

Regards,
Uma

Re: Does hadoop support append option?

Posted by kartheek muthyala <ka...@gmail.com>.
I am just curious about the use case for appends in Hadoop. I know that
support for appends has been provided in Hadoop, but how frequently do
files actually get appended? There is also a version concept maintained in
the block report; my guess is that this version number is maintained so
that if a DataNode gets disconnected and later comes back with an old copy
of the data, read requests to that DataNode are discarded. But if files are
not appended frequently, does the version number remain the same? Can you
point to any typical use cases?

~Kartheek

On Mon, Oct 17, 2011 at 12:53 PM, Uma Maheswara Rao G 72686 <
maheswara@huawei.com> wrote:

> AFAIK, append option is there in 20Append branch. Mainly supports sync. But
> there are some issues with that.
>
> Same has been merged to 20.205 branch and will be released soon (rc2
> available). And also fixed many bugs in this branch. As per our basic
> testing it is pretty good as of now.Need to wait for official release.
>
> Regards,
> Uma
>
> ----- Original Message -----
> From: bourne1900 <bo...@yahoo.cn>
> Date: Monday, October 17, 2011 12:37 pm
> Subject: Does hadoop support append option?
> To: common-user <co...@hadoop.apache.org>
>
> > I know that hadoop0.19.0 supports append option, but not stable.
> > Does the latest version support append option? Is it stable?
> > Thanks for help.
> >
> >
> >
> >
> > bourne
>

Re: Does hadoop support append option?

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
AFAIK, the append option is in the 20Append branch; it mainly supports sync, but there are some issues with it.

The same has been merged into the 20.205 branch, which will be released soon (rc2 is available), and many bugs have been fixed in that branch. As per our basic testing it is pretty good as of now; we need to wait for the official release.
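
For anyone trying this out, below is a rough sketch of how a client might check whether the build it talks to accepts append. This is an assumption-laden example, not official guidance: dfs.support.append is the flag the 0.20.x branches use to gate the feature (it also has to be enabled on the cluster side), and the probe path is invented for illustration.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendProbe {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Client-side request; the NameNode/DataNodes must be configured
            // with the same flag for append to actually be allowed.
            conf.setBoolean("dfs.support.append", true);

            FileSystem fs = FileSystem.get(conf);
            Path probe = new Path("/tmp/append-probe.txt"); // example path only

            // Make sure the file exists before trying to append to it.
            if (!fs.exists(probe)) {
                FSDataOutputStream out = fs.create(probe);
                out.writeBytes("created\n");
                out.close();
            }

            try {
                FSDataOutputStream out = fs.append(probe);
                out.writeBytes("appended\n");
                out.close();
                System.out.println("append accepted by this build");
            } catch (IOException e) {
                // Builds without append support refuse the call.
                System.out.println("append not supported: " + e.getMessage());
            } finally {
                fs.close();
            }
        }
    }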

Regards,
Uma

----- Original Message -----
From: bourne1900 <bo...@yahoo.cn>
Date: Monday, October 17, 2011 12:37 pm
Subject: Does hadoop support append option?
To: common-user <co...@hadoop.apache.org>

> I know that hadoop0.19.0 supports append option, but not stable.
> Does the latest version support append option? Is it stable?
> Thanks for help.
> 
> 
> 
> 
> bourne