Posted to dev@hbase.apache.org by Mi...@emc.com on 2011/09/01 00:07:14 UTC

Re: Add Append-HBase support in upcoming 20.205

FWIW, Stack has already done the work needed to make sure that HBase works
with Hadoop 0.22 branch, and I suppose if
https://issues.apache.org/jira/browse/MAPREDUCE-2767 is committed, it
removes the last blocker from 0.22.0, so that it can be released.

I am cc'ing hbase-dev, since this is relevant to them as well.

- Milind

On 8/31/11 11:41 AM, "sanjay Radia" <sa...@hortonworks.com> wrote:

>
>I propose that the 20-append patches (details below)  be included in
>20.205 which will become the first official Apache
>release of Hadoop that supports Append and HBase.
>
>Background:
>There hasn't been an official Apache release that supports HBase.
>The HBase community have instead been using the 20-append branch; the
>patches were contributed by the HBase community including Facebook. The
>Cloudera distribution has also included these patches.
>Andrew Purtell has ported these patches to 20-security branch.
>
>Risk Level:
>These patches have been used and tested on large HBase clusters by FB,
>by those who use 20-append branch directly (various users including a 500
>node HBase cluster at Yahoo) and by those that use the Cloudera
>distribution. We have reviewed the patches and have conducted further
>tests; testing and validation continues.
>
>
>Patches:
>HDFS-200. Support append and sync for hadoop 0.20 branch.
>HDFS-142. Blocks that are being written by a client are stored in the
>blocksBeingWritten directory.
>HDFS-1057.  Concurrent readers hit ChecksumExceptions if following a
>writer to very end of file
>HDFS-724.  Use a bidirectional heartbeat to detect stuck pipeline.
>HDFS-895. Allow hflush/sync to occur in parallel with new writes to the
>file.
>HDFS-1520. Lightweight NameNode operation recoverLease to trigger lease
>recovery.
>HDFS-1555. Disallow pipeline recovery if a file is already being lease
>recovered.
>HDFS-1554. New semantics for recoverLease.
>HDFS-988. Fix bug where saveNamespace can corrupt edits log.
>HDFS-826. Allow a mechanism for an application to detect that datanode(s)
>have died in the write pipeline.
>HDFS-630. Client can exclude specific nodes in the write pipeline.
>HDFS-1141. completeFile does not check lease ownership.
>HDFS-1204. Lease expiration should recover single files, not entire lease
>holder
>HDFS-1254. Support append/sync via the default configuration.
>HDFS-1346. DFSClient receives out of order packet ack.
>HDFS-1054. remove sleep before retry for allocating a block.
>
>


Re: Add Append-HBase support in upcoming 20.205

Posted by Mi...@emc.com.
>
>
>
>For example, https://issues.apache.org/jira/browse/HDFS-2288 can cause
>HBase to fail to recover its WAL during a crash scenario. There are
>some others that I'll likely be working through in the coming months.

Thanks Todd.

Will go through it to test against 0.22.

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)


Re: Add Append-HBase support in upcoming 20.205

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Aug 31, 2011 at 3:07 PM,  <Mi...@emc.com> wrote:
> FWIW, Stack has already done the work needed to make sure that HBase works
> with Hadoop 0.22 branch, and I suppose if
> https://issues.apache.org/jira/browse/MAPREDUCE-2767 is committed, it
> removes the last blocker from 0.22.0, so that it can be released.

The 0.22 implementation "works" but there are certainly still bugs in it.

If other HDFS committers familiar with the new append could help here,
that would be very much appreciated.

For example, https://issues.apache.org/jira/browse/HDFS-2288 can cause
HBase to fail to recover its WAL during a crash scenario. There are
some others that I'll likely be working through in the coming months.

-Todd

>
> I am cc'ing hbase-dev, since this is relevant to them as well.
>
> - Milind
>
> On 8/31/11 11:41 AM, "sanjay Radia" <sa...@hortonworks.com> wrote:
>
>>
>>I propose that the 20-append patches (details below)  be included in
>>20.205 which will become the first official Apache
>>release of Hadoop that supports Append and HBase.
>>
>>Background:
>>There hasn't been an official Apache release that supports HBase.
>>The HBase community have instead been using the 20-append branch; the
>>patches were contributed by the HBase community including Facebook. The
>>Cloudera distribution has also included these patches.
>>Andrew Purtell has ported these patches to 20-security branch.
>>
>>Risk Level:
>>These patches have been used and tested on large HBase clusters by FB,
>>by those who use 20-append branch directly (various users including a 500
>>node HBase cluster at Yahoo) and by those that use the Cloudera
>>distribution. We have reviewed the patches and have conducted further
>>tests; testing and validation continues.
>>
>>
>>Patches:
>>HDFS-200. Support append and sync for hadoop 0.20 branch.
>>HDFS-142. Blocks that are being written by a client are stored in the
>>blocksBeingWritten directory.
>>HDFS-1057.  Concurrent readers hit ChecksumExceptions if following a
>>writer to very end of file
>>HDFS-724.  Use a bidirectional heartbeat to detect stuck pipeline.
>>HDFS-895. Allow hflush/sync to occur in parallel with new writes to the
>>file.
>>HDFS-1520. Lightweight NameNode operation recoverLease to trigger lease
>>recovery.
>>HDFS-1555. Disallow pipeline recovery if a file is already being lease
>>recovered.
>>HDFS-1554. New semantics for recoverLease.
>>HDFS-988. Fix bug where saveNamespace can corrupt edits log.
>>HDFS-826. Allow a mechanism for an application to detect that datanode(s)
>>have died in the write pipeline.
>>HDFS-630. Client can exclude specific nodes in the write pipeline.
>>HDFS-1141. completeFile does not check lease ownership.
>>HDFS-1204. Lease expiration should recover single files, not entire lease
>>holder
>>HDFS-1254. Support append/sync via the default configuration.
>>HDFS-1346. DFSClient receives out of order packet ack.
>>HDFS-1054. remove sleep before retry for allocating a block.
>>
>>
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

RE: Add Append-HBase support in upcoming 20.205

Posted by "Rottinghuis, Joep" <jr...@ebay.com>.
It seems that HBase was made to compile against 0.23 (trunk at the time).
See HBASE-4327.

Thanks,

Joep

-----Original Message-----
From: Milind.Bhandarkar@emc.com [mailto:Milind.Bhandarkar@emc.com] 
Sent: Wednesday, August 31, 2011 3:07 PM
To: general@hadoop.apache.org
Cc: dev@hbase.apache.org
Subject: Re: Add Append-HBase support in upcoming 20.205

FWIW, Stack has already done the work needed to make sure that HBase works with Hadoop 0.22 branch, and I suppose if
https://issues.apache.org/jira/browse/MAPREDUCE-2767 is committed, it removes the last blocker from 0.22.0, so that it can be released.

I am cc'ing hbase-dev, since this is relevant to them as well.

- Milind

