Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2010/01/21 23:36:25 UTC

[VOTE -- Round 2] Commit hdfs-630 to 0.21?

I'd like to propose a new vote on having hdfs-630 committed to 0.21.
The first vote on this topic, initiated 12/14/2009, was sunk by
improvements suggested by Tsz Wo (Nicholas), Sze. Those suggestions
have since been folded into a new version of the hdfs-630 patch.  It's
this new version of the patch -- 0001-Fix-HDFS-630-0.21-svn-2.patch --
that I'd like us to vote on. For background on why we -- the hbase
community -- think hdfs-630 is important, see the notes below from the
original call-to-vote.

I'm obviously +1.

Thanks for your consideration,
St.Ack

P.S. Regarding TRUNK: after chatting with Nicholas, TRUNK was cleaned
of the previous versions of hdfs-630 and we'll likely apply
0001-Fix-HDFS-630-trunk-svn-4.patch, a version of
0001-Fix-HDFS-630-0.21-svn-2.patch that works for TRUNK and includes
Nicholas's suggestions.


On Mon, Dec 14, 2009 at 9:56 PM, stack <st...@duboce.net> wrote:
> I'd like to propose a vote on having hdfs-630 committed to 0.21 (it's
> already been committed to TRUNK).
>
> hdfs-630 has the dfsclient pass the namenode the names of datanodes it
> has determined dead, e.g. because it got a failed connection when it
> tried to contact one.  This is useful in the interval between a
> datanode dying and the namenode timing it out.  Without this fix, the
> namenode can often give out the dead datanode as a host for a block.
> If the cluster is small, fewer than 5 or 6 nodes, then it's very
> likely the namenode will give out the dead datanode as a block host.
>
> Small clusters are common in hbase, especially when folks are starting
> out or evaluating hbase.  They'll start with three or four nodes, each
> carrying both a datanode and an hbase regionserver.  They'll experiment
> by killing one of the slaves -- datanode and regionserver -- and watch
> what happens.  What follows is a struggling dfsclient trying to create
> replicas when one of the datanodes passed to us by the namenode is
> dead.  DFSClient will fail and then go back to the namenode again, etc.
> (See https://issues.apache.org/jira/browse/HBASE-1876 for a more
> detailed blow-by-blow.)  HBase operation will be held up during this
> time, and eventually a regionserver will shut itself down to protect
> against data loss if we can't successfully write to HDFS.
>
> Thanks all,
> St.Ack
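
For readers skimming the archive, here is a minimal sketch -- plain Java
with invented names (Namenode, Prober, allocateBlockTargets), not the
actual DFSClient or ClientProtocol code -- of the shape of the change
being voted on: the client remembers datanodes it failed to reach and
hands that set back with its next block-allocation request so the
namenode can avoid them.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExcludedNodesSketch {
    /** Stand-in for the namenode RPC; the real call also carries file and client ids. */
    interface Namenode {
        List<String> allocateBlockTargets(Set<String> excludedNodes);
    }

    /** Stand-in for "can we open a connection to this datanode?". */
    interface Prober {
        boolean isReachable(String datanode);
    }

    static List<String> allocateAvoidingDeadNodes(Namenode nn, Prober probe, int maxRetries) {
        Set<String> excluded = new HashSet<>();   // datanodes this client has decided are dead
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            List<String> targets = nn.allocateBlockTargets(excluded);
            List<String> dead = new ArrayList<>();
            for (String dn : targets) {
                if (!probe.isReachable(dn)) {
                    dead.add(dn);                 // remember it so the namenode won't hand it back
                }
            }
            if (dead.isEmpty()) {
                return targets;                   // a pipeline of live datanodes
            }
            excluded.addAll(dead);                // retry, telling the namenode what to avoid
        }
        throw new IllegalStateException("could not assemble a pipeline of live datanodes");
    }
}

Without the excluded set, the namenode on a small cluster keeps offering
the same dead node and a loop like the one above never converges -- the
HBASE-1876 behaviour described in the message.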

Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Eli Collins <el...@cloudera.com>.
+1

On Thu, Jan 21, 2010 at 2:58 PM, Tsz Wo (Nicholas), Sze
<s2...@yahoo.com> wrote:
> +1
> Nicholas Sze
>
>
>
>
> ----- Original Message ----
>> From: Stack <st...@duboce.net>
>> To: hdfs-dev@hadoop.apache.org
>> Cc: HBase Dev List <hb...@hadoop.apache.org>
>> Sent: Thu, January 21, 2010 2:36:25 PM
>> Subject: [VOTE -- Round 2] Commit hdfs-630 to 0.21?
>>
>> I'd like to propose a new vote on having hdfs-630 committed to 0.21.
>> The first vote on this topic, initiated 12/14/2009, was sunk by
>> improvements suggested by Tsz Wo (Nicholas), Sze. Those suggestions
>> have since been folded into a new version of the hdfs-630 patch.  It's
>> this new version of the patch -- 0001-Fix-HDFS-630-0.21-svn-2.patch --
>> that I'd like us to vote on. For background on why we -- the hbase
>> community -- think hdfs-630 is important, see the notes below from the
>> original call-to-vote.
>>
>> I'm obviously +1.
>>
>> Thanks for your consideration,
>> St.Ack
>>
>> P.S. Regarding TRUNK: after chatting with Nicholas, TRUNK was cleaned
>> of the previous versions of hdfs-630 and we'll likely apply
>> 0001-Fix-HDFS-630-trunk-svn-4.patch, a version of
>> 0001-Fix-HDFS-630-0.21-svn-2.patch that works for TRUNK and includes
>> Nicholas's suggestions.
>>
>>
>> On Mon, Dec 14, 2009 at 9:56 PM, stack wrote:
>> > I'd like to propose a vote on having hdfs-630 committed to 0.21 (it's
>> > already been committed to TRUNK).
>> >
>> > hdfs-630 has the dfsclient pass the namenode the names of datanodes it
>> > has determined dead, e.g. because it got a failed connection when it
>> > tried to contact one.  This is useful in the interval between a
>> > datanode dying and the namenode timing it out.  Without this fix, the
>> > namenode can often give out the dead datanode as a host for a block.
>> > If the cluster is small, fewer than 5 or 6 nodes, then it's very
>> > likely the namenode will give out the dead datanode as a block host.
>> >
>> > Small clusters are common in hbase, especially when folks are starting
>> > out or evaluating hbase.  They'll start with three or four nodes, each
>> > carrying both a datanode and an hbase regionserver.  They'll experiment
>> > by killing one of the slaves -- datanode and regionserver -- and watch
>> > what happens.  What follows is a struggling dfsclient trying to create
>> > replicas when one of the datanodes passed to us by the namenode is
>> > dead.  DFSClient will fail and then go back to the namenode again, etc.
>> > (See https://issues.apache.org/jira/browse/HBASE-1876 for a more
>> > detailed blow-by-blow.)  HBase operation will be held up during this
>> > time, and eventually a regionserver will shut itself down to protect
>> > against data loss if we can't successfully write to HDFS.
>> >
>> > Thanks all,
>> > St.Ack
>
>
>

Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by "Tsz Wo (Nicholas), Sze" <s2...@yahoo.com>.
+1
Nicholas Sze




----- Original Message ----
> From: Stack <st...@duboce.net>
> To: hdfs-dev@hadoop.apache.org
> Cc: HBase Dev List <hb...@hadoop.apache.org>
> Sent: Thu, January 21, 2010 2:36:25 PM
> Subject: [VOTE -- Round 2] Commit hdfs-630 to 0.21?
> 
> I'd like to propose a new vote on having hdfs-630 committed to 0.21.
> The first vote on this topic, initiated 12/14/2009, was sunk by
> improvements suggested by Tsz Wo (Nicholas), Sze. Those suggestions
> have since been folded into a new version of the hdfs-630 patch.  It's
> this new version of the patch -- 0001-Fix-HDFS-630-0.21-svn-2.patch --
> that I'd like us to vote on. For background on why we -- the hbase
> community -- think hdfs-630 is important, see the notes below from the
> original call-to-vote.
>
> I'm obviously +1.
>
> Thanks for your consideration,
> St.Ack
>
> P.S. Regarding TRUNK: after chatting with Nicholas, TRUNK was cleaned
> of the previous versions of hdfs-630 and we'll likely apply
> 0001-Fix-HDFS-630-trunk-svn-4.patch, a version of
> 0001-Fix-HDFS-630-0.21-svn-2.patch that works for TRUNK and includes
> Nicholas's suggestions.
> 
> 
> On Mon, Dec 14, 2009 at 9:56 PM, stack wrote:
> > I'd like to propose a vote on having hdfs-630 committed to 0.21 (it's
> > already been committed to TRUNK).
> >
> > hdfs-630 has the dfsclient pass the namenode the names of datanodes it
> > has determined dead, e.g. because it got a failed connection when it
> > tried to contact one.  This is useful in the interval between a
> > datanode dying and the namenode timing it out.  Without this fix, the
> > namenode can often give out the dead datanode as a host for a block.
> > If the cluster is small, fewer than 5 or 6 nodes, then it's very
> > likely the namenode will give out the dead datanode as a block host.
> >
> > Small clusters are common in hbase, especially when folks are starting
> > out or evaluating hbase.  They'll start with three or four nodes, each
> > carrying both a datanode and an hbase regionserver.  They'll experiment
> > by killing one of the slaves -- datanode and regionserver -- and watch
> > what happens.  What follows is a struggling dfsclient trying to create
> > replicas when one of the datanodes passed to us by the namenode is
> > dead.  DFSClient will fail and then go back to the namenode again, etc.
> > (See https://issues.apache.org/jira/browse/HBASE-1876 for a more
> > detailed blow-by-blow.)  HBase operation will be held up during this
> > time, and eventually a regionserver will shut itself down to protect
> > against data loss if we can't successfully write to HDFS.
> >
> > Thanks all,
> > St.Ack



Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Hairong Kuang <ku...@gmail.com>.
+1

Hairong

On Fri, Jan 22, 2010 at 10:51 AM, Dhruba Borthakur <dh...@gmail.com> wrote:

> +1 for making this patch go into 0.21.
>
> thanks,
> dhruba
>
> On Fri, Jan 22, 2010 at 10:25 AM, Todd Lipcon <to...@cloudera.com> wrote:
>
> > Hi Steve,
> >
> > All of the below may be good ideas, but I don't think they're relevant to
> > the discussion at hand. Specifically, none of them can enter 0.21 without
> a
> > vote as they'd be new features, and it doesn't even sound like there's a
> > JIRA out for them yet. Let's not put off a well-known improvement patch
> > waiting for one that doesn't even exist yet. If we want to get the ideas
> > below into 0.22 or a later version, let's open a JIRA and discuss there
> > rather
> > than using this vote thread.
> >
> > As for the patch, I'm +1. It certainly is a large improvement on small
> > clusters - without it, in a three node cluster, you cannot successfully
> > kill
> > a DN while doing an fs -put, even if your min.replication is 1. As Ryan
> > mentioned above, this is a huge problem since new users may evaluate
> Hadoop
> > on a 3-node cluster, figure "hey, let's see fault tolerance in action"
> and
> > then be entirely put off when their kill -9 takes the cluster to a
> > screeching halt.
> >
> > Thanks
> > -Todd
> >
> > On Fri, Jan 22, 2010 at 7:32 AM, Steve Loughran <st...@apache.org>
> wrote:
> >
> > > Stack wrote:
> > >
> > > I'm being 0 on this
> > >
> > > -I would worry if the exclusion list was used by the NN to do its
> > > blacklisting, I'm glad to see this isn't happening. Yes, you could pick
> > up
> > > datanode failure faster, but you would also be vulnerable to a user
> doing
> > a
> > > DoS against the cluster by reporting every DN as failing
> > >
> > > -Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to
> > > allow the datanodes to get the entire list of nodes holding the data,
> and
> > > allowed them to make their own decision about where to get the data
> from.
> > > This
> > >  1. pushed the policy of handling failure down to the clients, less
> need
> > to
> > > talk to the NN about it.
> > >  2. lets you do something very fancy where you deliberately choose data
> > > from different DNs, so that you can then pull data off the cluster at
> the
> > > full bandwidth of every disk
> > >
> > > Long term, I would like to see Russ's addition go in, so I worry
> > > whether the HDFS-630 patch would be useful long term. Maybe it's a
> > > more fundamental issue: where does the decision making go, into the
> > > clients or into the NN?
> > >
> > > -steve
> > >
> > >
> > >
> > > [1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
> > >
> >
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>

Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Dhruba Borthakur <dh...@gmail.com>.
+1 for making this patch go into 0.21.

thanks,
dhruba

On Fri, Jan 22, 2010 at 10:25 AM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi Steve,
>
> All of the below may be good ideas, but I don't think they're relevant to
> the discussion at hand. Specifically, none of them can enter 0.21 without a
> vote as they'd be new features, and it doesn't even sound like there's a
> JIRA out for them yet. Let's not put off a well-known improvement patch
> waiting for one that doesn't even exist yet. If we want to get the ideas
> below into 0.22 or a later version, let's open a JIRA and discuss there
> rather
> than using this vote thread.
>
> As for the patch, I'm +1. It certainly is a large improvement on small
> clusters - without it, in a three node cluster, you cannot successfully
> kill
> a DN while doing an fs -put, even if your min.replication is 1. As Ryan
> mentioned above, this is a huge problem since new users may evaluate Hadoop
> on a 3-node cluster, figure "hey, let's see fault tolerance in action" and
> then be entirely put off when their kill -9 takes the cluster to a
> screeching halt.
>
> Thanks
> -Todd
>
> On Fri, Jan 22, 2010 at 7:32 AM, Steve Loughran <st...@apache.org> wrote:
>
> > Stack wrote:
> >
> > I'm being 0 on this
> >
> > -I would worry if the exclusion list was used by the NN to do its
> > blacklisting, I'm glad to see this isn't happening. Yes, you could pick
> up
> > datanode failure faster, but you would also be vulnerable to a user doing
> a
> > DoS against the cluster by reporting every DN as failing
> >
> > -Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to
> > allow the datanodes to get the entire list of nodes holding the data, and
> > allowed them to make their own decision about where to get the data from.
> > This
> >  1. pushed the policy of handling failure down to the clients, less need
> to
> > talk to the NN about it.
> >  2. lets you do something very fancy where you deliberately choose data
> > from different DNs, so that you can then pull data off the cluster at the
> > full bandwidth of every disk
> >
> > Long term, I would like to see Russ's addition go in, so I worry
> > whether the HDFS-630 patch would be useful long term. Maybe it's a
> > more fundamental issue: where does the decision making go, into the
> > clients or into the NN?
> >
> > -steve
> >
> >
> >
> > [1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
> >
>



-- 
Connect to me at http://www.facebook.com/dhruba

Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Steve,

All of the below may be good ideas, but I don't think they're relevant to
the discussion at hand. Specifically, none of them can enter 0.21 without a
vote as they'd be new features, and it doesn't even sound like there's a
JIRA out for them yet. Let's not put off a well-known improvement patch
waiting for one that doesn't even exist yet. If we want to get the ideas
below into 0.22 or a later version, let's open a JIRA and discuss there rather
than using this vote thread.

As for the patch, I'm +1. It certainly is a large improvement on small
clusters - without it, in a three node cluster, you cannot successfully kill
a DN while doing an fs -put, even if your min.replication is 1. As Ryan
mentioned above, this is a huge problem since new users may evaluate Hadoop
on a 3-node cluster, figure "hey, let's see fault tolerance in action" and
then be entirely put off when their kill -9 takes the cluster to a
screeching halt.

Thanks
-Todd

On Fri, Jan 22, 2010 at 7:32 AM, Steve Loughran <st...@apache.org> wrote:

> Stack wrote:
>
> I'm being 0 on this
>
> -I would worry if the exclusion list was used by the NN to do its
> blacklisting, I'm glad to see this isn't happening. Yes, you could pick up
> datanode failure faster, but you would also be vulnerable to a user doing a
> DoS against the cluster by reporting every DN as failing
>
> -Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to
> allow the datanodes to get the entire list of nodes holding the data, and
> allowed them to make their own decision about where to get the data from.
> This
>  1. pushed the policy of handling failure down to the clients, less need to
> talk to the NN about it.
>  2. lets you do something very fancy where you deliberately choose data
> from different DNs, so that you can then pull data off the cluster at the
> full bandwidth of every disk
>
> Long term, I would like to see Russ's addition go in, so I worry
> whether the HDFS-630 patch would be useful long term. Maybe it's a more
> fundamental issue: where does the decision making go, into the clients
> or into the NN?
>
> -steve
>
>
>
> [1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
>

Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Dhruba Borthakur <dh...@gmail.com>.
> tweaked Hadoop to allow the datanodes to get the entire list

are you referring to datanodes or dfs clients here?

The client already gets the entire list of replica locations for a block
from the namenode, and one could always develop a DFS client that is free
to choose whatever locations it decides to pick up the data from, isn't it?

thanks,
dhruba
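
As a hedged illustration of dhruba's point (invented names, not Hadoop's
API): once a client holds the full replica list for a block, any selection
policy can live entirely on the client side -- for example, prefer a local
copy and otherwise take the first host offered.

import java.util.List;

class ReplicaChoiceSketch {
    // Given every replica location for a block (which getBlockLocations-style
    // calls already return), a client-side policy is free to pick any of them.
    static String chooseReplica(List<String> replicaHosts, String localHost) {
        for (String host : replicaHosts) {
            if (host.equals(localHost)) {
                return host;            // read the local copy when one exists
            }
        }
        return replicaHosts.get(0);     // otherwise take the first host offered
    }
}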

On Fri, Jan 22, 2010 at 7:32 AM, Steve Loughran <st...@apache.org> wrote:

> Stack wrote:
>
> I'm being 0 on this
>
> -I would worry if the exclusion list was used by the NN to do its
> blacklisting, I'm glad to see this isn't happening. Yes, you could pick up
> datanode failure faster, but you would also be vulnerable to a user doing a
> DoS against the cluster by reporting every DN as failing
>
> -Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to
> allow the datanodes to get the entire list of nodes holding the data, and
> allowed them to make their own decision about where to get the data from.
> This
>  1. pushed the policy of handling failure down to the clients, less need to
> talk to the NN about it.
>  2. lets you do something very fancy where you deliberately choose data
> from different DNs, so that you can then pull data off the cluster at the
> full bandwidth of every disk
>
> Long term, I would like to see Russ's addition go in, so I worry
> whether the HDFS-630 patch would be useful long term. Maybe it's a more
> fundamental issue: where does the decision making go, into the clients
> or into the NN?
>
> -steve
>
>
>
> [1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
>



-- 
Connect to me at http://www.facebook.com/dhruba

Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Steve Loughran <st...@apache.org>.
Cosmin Lehene wrote:
> Steve, 
> 
> A DoS could not be done using excludedNodes.
> 
> The blacklisting takes place only at the DFSClient level. The NN will return a
> list of block locations that excludes the nodes the client decided. This
> list isn't persisted anywhere on the server. So if a client excludes the
> entire set of DNs other clients won't be affected.
> 

OK, +1 then.

Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Cosmin Lehene <cl...@adobe.com>.
Steve, 

A DoS could not be done using excludedNodes.

The blacklisting takes place only at the DFSClient level. The NN will return a
list of block locations that excludes the nodes the client decided. This
list isn't persisted anywhere on the server. So if a client excludes the
entire set of DNs other clients won't be affected.

Cosmin 
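
A small model of what Cosmin describes (illustrative names, not the
namenode's actual code): the excluded set filters a single request and is
then forgotten, so one client's exclusions never change what other clients
are offered.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PerRequestExclusionSketch {
    private final List<String> liveDatanodes;   // the namenode's global view; never mutated here

    PerRequestExclusionSketch(List<String> liveDatanodes) {
        this.liveDatanodes = liveDatanodes;
    }

    /** Choose targets for one request; 'excluded' is discarded once we return. */
    List<String> chooseTargets(int replication, Set<String> excluded) {
        List<String> targets = new ArrayList<>();
        for (String dn : liveDatanodes) {
            if (!excluded.contains(dn)) {
                targets.add(dn);
                if (targets.size() == replication) {
                    break;
                }
            }
        }
        return targets;   // other clients still see every node in liveDatanodes
    }
}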


On 1/22/10 5:32 PM, "Steve Loughran" <st...@apache.org> wrote:

> Stack wrote:
> 
> I'm being 0 on this
> 
> -I would worry if the exclusion list was used by the NN to do its
> blacklisting, I'm glad to see this isn't happening. Yes, you could pick
> up datanode failure faster, but you would also be vulnerable to a user
> doing a DoS against the cluster by reporting every DN as failing
> 
> -Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to
> allow the datanodes to get the entire list of nodes holding the data,
> and allowed them to make their own decision about where to get the data
> from. This
>   1. pushed the policy of handling failure down to the clients, less
> need to talk to the NN about it.
>   2. lets you do something very fancy where you deliberately choose data
> from different DNs, so that you can then pull data off the cluster at
> the full bandwidth of every disk
> 
> Long term, I would like to see Russ's addition go in, so I worry
> whether the HDFS-630 patch would be useful long term. Maybe it's a more
> fundamental issue: where does the decision making go, into the clients
> or into the NN?
> 
> -steve
> 
> 
> 
> [1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html


Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Steve Loughran <st...@apache.org>.
Stack wrote:

I'm being 0 on this

-I would worry if the exclusion list was used by the NN to do its 
blacklisting, I'm glad to see this isn't happening. Yes, you could pick 
up datanode failure faster, but you would also be vulnerable to a user 
doing a DoS against the cluster by reporting every DN as failing

-Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to 
allow the datanodes to get the entire list of nodes holding the data, 
and allowed them to make their own decision about where to get the data 
from. This
  1. pushed the policy of handling failure down to the clients, less 
need to talk to the NN about it.
  2. lets you do something very fancy where you deliberately choose data 
from different DNs, so that you can then pull data off the cluster at 
the full bandwidth of every disk

Long term, I would like to see Russ's addition go in, so I worry whether
the HDFS-630 patch would be useful long term. Maybe it's a more
fundamental issue: where does the decision making go, into the clients or
into the NN?

-steve



[1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
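
For the curious, a hypothetical sketch (invented names, not Russ Perry's
code) of the kind of client-side policy Steve describes: with every
block's full replica list in hand, a reader can spread fetches across
distinct datanodes and pull data off the cluster at the aggregate
bandwidth of all disks.

import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SpreadReadsSketch {
    /** Plan reads so each block is fetched from the least-loaded host holding it. */
    static Map<Long, String> assignReaders(Map<Long, List<String>> replicasByBlock) {
        Map<String, Integer> load = new HashMap<>();   // reads assigned per datanode
        Map<Long, String> plan = new HashMap<>();
        for (Map.Entry<Long, List<String>> e : replicasByBlock.entrySet()) {
            // Pick the least-busy host among those holding this block.
            String best = Collections.min(e.getValue(),
                    Comparator.comparingInt((String h) -> load.getOrDefault(h, 0)));
            plan.put(e.getKey(), best);                // fetch this block from 'best'
            load.merge(best, 1, Integer::sum);         // and remember it is now busier
        }
        return plan;
    }
}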

Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
+1

mahadev


On 1/21/10 2:46 PM, "Ryan Rawson" <ry...@gmail.com> wrote:

> Scaling _down_ is a continual problem for us, and this is one of the
> prime factors. It puts a bad taste in the mouth of new people who then
> run away from HBase and HDFS since it is "unreliable and unstable". It
> is perfectly within scope to support a cluster of about 5-6 machines
> which can have an aggregate capacity of 24TB (which is a fair amount),
> and people expect to start small, prove the concept/technology then
> move up.
> 
> I am also +1
> 
> On Thu, Jan 21, 2010 at 2:36 PM, Stack <st...@duboce.net> wrote:
>> I'd like to propose a new vote on having hdfs-630 committed to 0.21.
>> The first vote on this topic, initiated 12/14/2009, was sunk by
>> improvements suggested by Tsz Wo (Nicholas), Sze. Those suggestions
>> have since been folded into a new version of the hdfs-630 patch.  It's
>> this new version of the patch -- 0001-Fix-HDFS-630-0.21-svn-2.patch --
>> that I'd like us to vote on. For background on why we -- the hbase
>> community -- think hdfs-630 is important, see the notes below from the
>> original call-to-vote.
>> 
>> I'm obviously +1.
>> 
>> Thanks for your consideration,
>> St.Ack
>> 
>> P.S. Regarding TRUNK: after chatting with Nicholas, TRUNK was cleaned
>> of the previous versions of hdfs-630 and we'll likely apply
>> 0001-Fix-HDFS-630-trunk-svn-4.patch, a version of
>> 0001-Fix-HDFS-630-0.21-svn-2.patch that works for TRUNK and includes
>> Nicholas's suggestions.
>> 
>> 
>> On Mon, Dec 14, 2009 at 9:56 PM, stack <st...@duboce.net> wrote:
>>> I'd like to propose a vote on having hdfs-630 committed to 0.21 (it's
>>> already been committed to TRUNK).
>>> 
>>> hdfs-630 has the dfsclient pass the namenode the names of datanodes it
>>> has determined dead, e.g. because it got a failed connection when it
>>> tried to contact one.  This is useful in the interval between a
>>> datanode dying and the namenode timing it out.  Without this fix, the
>>> namenode can often give out the dead datanode as a host for a block.
>>> If the cluster is small, fewer than 5 or 6 nodes, then it's very
>>> likely the namenode will give out the dead datanode as a block host.
>>> 
>>> Small clusters are common in hbase, especially when folks are starting
>>> out or evaluating hbase.  They'll start with three or four nodes, each
>>> carrying both a datanode and an hbase regionserver.  They'll experiment
>>> by killing one of the slaves -- datanode and regionserver -- and watch
>>> what happens.  What follows is a struggling dfsclient trying to create
>>> replicas when one of the datanodes passed to us by the namenode is
>>> dead.  DFSClient will fail and then go back to the namenode again, etc.
>>> (See https://issues.apache.org/jira/browse/HBASE-1876 for a more
>>> detailed blow-by-blow.)  HBase operation will be held up during this
>>> time, and eventually a regionserver will shut itself down to protect
>>> against data loss if we can't successfully write to HDFS.
>>> 
>>> Thanks all,
>>> St.Ack
>> 



Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

Posted by Ryan Rawson <ry...@gmail.com>.
Scaling _down_ is a continual problem for us, and this is one of the
prime factors. It puts a bad taste in the mouth of new people who then
run away from HBase and HDFS since it is "unreliable and unstable". It
is perfectly within scope to support a cluster of about 5-6 machines
which can have an aggregate capacity of 24TB (which is a fair amount),
and people expect to start small, prove the concept/technology then
move up.

I am also +1

On Thu, Jan 21, 2010 at 2:36 PM, Stack <st...@duboce.net> wrote:
> I'd like to propose a new vote on having hdfs-630 committed to 0.21.
> The first vote on this topic, initiated 12/14/2009, was sunk by
> improvements suggested by Tsz Wo (Nicholas), Sze. Those suggestions
> have since been folded into a new version of the hdfs-630 patch.  It's
> this new version of the patch -- 0001-Fix-HDFS-630-0.21-svn-2.patch --
> that I'd like us to vote on. For background on why we -- the hbase
> community -- think hdfs-630 is important, see the notes below from the
> original call-to-vote.
>
> I'm obviously +1.
>
> Thanks for your consideration,
> St.Ack
>
> P.S. Regarding TRUNK: after chatting with Nicholas, TRUNK was cleaned
> of the previous versions of hdfs-630 and we'll likely apply
> 0001-Fix-HDFS-630-trunk-svn-4.patch, a version of
> 0001-Fix-HDFS-630-0.21-svn-2.patch that works for TRUNK and includes
> Nicholas's suggestions.
>
>
> On Mon, Dec 14, 2009 at 9:56 PM, stack <st...@duboce.net> wrote:
>> I'd like to propose a vote on having hdfs-630 committed to 0.21 (it's
>> already been committed to TRUNK).
>>
>> hdfs-630 has the dfsclient pass the namenode the names of datanodes it
>> has determined dead, e.g. because it got a failed connection when it
>> tried to contact one.  This is useful in the interval between a
>> datanode dying and the namenode timing it out.  Without this fix, the
>> namenode can often give out the dead datanode as a host for a block.
>> If the cluster is small, fewer than 5 or 6 nodes, then it's very
>> likely the namenode will give out the dead datanode as a block host.
>>
>> Small clusters are common in hbase, especially when folks are starting
>> out or evaluating hbase.  They'll start with three or four nodes, each
>> carrying both a datanode and an hbase regionserver.  They'll experiment
>> by killing one of the slaves -- datanode and regionserver -- and watch
>> what happens.  What follows is a struggling dfsclient trying to create
>> replicas when one of the datanodes passed to us by the namenode is
>> dead.  DFSClient will fail and then go back to the namenode again, etc.
>> (See https://issues.apache.org/jira/browse/HBASE-1876 for a more
>> detailed blow-by-blow.)  HBase operation will be held up during this
>> time, and eventually a regionserver will shut itself down to protect
>> against data loss if we can't successfully write to HDFS.
>>
>> Thanks all,
>> St.Ack
>
