Posted to hdfs-user@hadoop.apache.org by Ajit Ratnaparkhi <aj...@gmail.com> on 2011/09/15 13:07:04 UTC

Need help regarding HDFS-RAID

Hi,

We want to use HDFS-RAID (http://wiki.apache.org/hadoop/HDFS-RAID) in our
production cluster. I am not able to find the source, binaries, or configs
for it in the official Apache Hadoop distribution (I checked 0.20.1 and
0.20.2).

Can somebody please tell me where I can find it, and what the installation
procedure is?
Also, is the HDFS-RAID implementation stable enough to use in production?

thanks,
Ajit.

Re: Need help regarding HDFS-RAID

Posted by Ajit Ratnaparkhi <aj...@gmail.com>.
Thanks!

On Thu, Sep 15, 2011 at 11:31 PM, Andrew Purtell <ap...@apache.org> wrote:

> HDFS RAID from 0.21 will work if back ported to 0.20. Only a minor fixup is
> needed.
>
> HDFS RAID from 0.22 relies on new HDFS APIs not available in 0.20.
>
> Best regards,
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
> ------------------------------
> *From:* Ajit Ratnaparkhi <aj...@gmail.com>
>
> *To:* hdfs-user@hadoop.apache.org
> *Cc:* Andrew Purtell <ap...@apache.org>
> *Sent:* Thursday, September 15, 2011 10:54 AM
>
> *Subject:* Re: Need help regarding HDFS-RAID
>
> Thanks for the info!
> So can I use HDFS-RAID taken from Apache HDFS trunk as-is with
> hadoop-0.20.1/hadoop-0.20.2? It seems to be under branch 0.21; will it work
> with 0.20.*?
>
> thanks,
> -Ajit.
>
> On Thu, Sep 15, 2011 at 10:44 PM, Dhruba Borthakur <dh...@gmail.com> wrote:
>
> That's right Andy. 0.22+. We are running a HDFS-RAID code base that is
> pretty close to what is available in Apache hdfs trunk.
>
> -dhruba
>
>
> On Thu, Sep 15, 2011 at 10:08 AM, Andrew Purtell <ap...@apache.org> wrote:
>
> But that is the HDFS RAID effectively in 0.22+, not 0.21, right Dhruba?
>
> Best regards,
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
> ------------------------------
> *From:* Dhruba Borthakur <dh...@gmail.com>
> *To:* hdfs-user@hadoop.apache.org
> *Sent:* Thursday, September 15, 2011 10:06 AM
> *Subject:* Re: Need help regarding HDFS-RAID
>
> We use HDFS RAID in a big way. Data older than 12 days is RAIDed using XOR
> encoding (effective replication of 2.5). Data older than a few months is
> RAIDed using Reed-Solomon (effective observed replication factor of 1.5).
> This has been running on our 60 PB cluster for about a year now.
>
> thanks
> dhruba
>
> On Thu, Sep 15, 2011 at 5:31 AM, Ajit Ratnaparkhi <
> ajit.ratnaparkhi@gmail.com> wrote:
>
> Hi,
>
> We were planning to use it for archiving past data (instead of moving it to
> an archival store).
> Archiving it in HDFS has the advantage of keeping it easily available for
> processing whenever required.
>
> Is there any archival solution in the Hadoop ecosystem?
>
> thanks,
> Ajit.
>
>
> On Thu, Sep 15, 2011 at 5:05 PM, Harsh J <ha...@cloudera.com> wrote:
>
> Hey Ajit,
>
> HDFS-RAID was never part of the 0.20 release. It made its debut in the
> 0.21 release [1]. I know that Facebook uses it (and also developed
> it), but I am unsure of users beyond Facebook.
>
> While 0.21 overall is not yet considered production-usable
> (and has, in fact, possibly been abandoned in favor of efforts on 0.22+),
> you can give that release a whirl on a test cluster and see for yourself
> whether your need outweighs the stability concerns.
>
> Just curious though - why are you looking to use this specifically?
>
> [1] -
> http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21/mapreduce/src/contrib/raid/
>
> On Thu, Sep 15, 2011 at 4:37 PM, Ajit Ratnaparkhi
> <aj...@gmail.com> wrote:
> > Hi,
> > We want to use HDFS-RAID in our production cluster.
> > (http://wiki.apache.org/hadoop/HDFS-RAID)
> > I am not able to find source/binaries/configs for this in official hadoop
> > distribution from apache hadoop. (checked in 0.20.1 and 0.20.2).
> > Can somebody please tell me where can I find that? and installation
> > procedure?
> > Also, is HDFS-RAID implementation stable enough to use in production?
> > thanks,
> > Ajit.
> >
>
>
>
> --
> Harsh J
>
>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>
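The effective replication figures Dhruba quotes in the thread above (about 2.5 for XOR, about 1.5 observed for Reed-Solomon) can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the stripe length, parity counts, and replica counts are assumptions and are not stated anywhere in this thread, and `effective_replication` is a hypothetical helper, not part of HDFS-RAID.

```python
def effective_replication(data_blocks, parity_blocks, data_rep, parity_rep):
    """Stored blocks per logical data block for one stripe of equal-sized blocks."""
    stored = data_blocks * data_rep + parity_blocks * parity_rep
    return stored / data_blocks


# Assumed full stripes: 10 data blocks per stripe (an assumption, not from the thread).
xor_full = effective_replication(10, 1, data_rep=2, parity_rep=2)   # 2.2
rs_full = effective_replication(10, 4, data_rep=1, parity_rep=1)    # 1.4

# Assumed short files: fewer data blocks than a full stripe, same parity cost.
xor_short = effective_replication(4, 1, data_rep=2, parity_rep=2)   # 2.5
rs_short = effective_replication(8, 4, data_rep=1, parity_rep=1)    # 1.5

print(xor_full, rs_full, xor_short, rs_short)
```

Under these assumed parameters, a file that fills only part of a stripe pays the full parity cost for fewer data blocks, which is one plausible reason an observed figure could sit above the full-stripe value. The thread itself does not say why the observed numbers are what they are.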

Re: Need help regarding HDFS-RAID

Posted by Andrew Purtell <ap...@apache.org>.
HDFS RAID from 0.21 will work if back ported to 0.20. Only a minor fixup is needed.

HDFS RAID from 0.22 relies on new HDFS APIs not available in 0.20.

 
Best regards,


    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



Re: Need help regarding HDFS-RAID

Posted by Ajit Ratnaparkhi <aj...@gmail.com>.
Thanks for the info!
So can I use HDFS-RAID taken from Apache HDFS trunk as-is with
hadoop-0.20.1/hadoop-0.20.2? It seems to be under branch 0.21; will it work
with 0.20.*?

thanks,
-Ajit.


Re: Need help regarding HDFS-RAID

Posted by Ajit Ratnaparkhi <aj...@gmail.com>.
Thanks Dhruba!

Can I try using it? Is it open for use?

-Ajit.

On Tue, Sep 20, 2011 at 2:48 PM, Dhruba Borthakur <dh...@gmail.com> wrote:

> Hi Andy,
>
> We do run a version of HDFS RAID that is backported from Apache trunk to a
> 0.20-based release. Our code is at
> https://github.com/facebook/hadoop-20-warehouse/tree/master/src/contrib/raid
> but I do not have an elegant way to contribute this code to Apache
> 0.20.2xx.x.
>
> thanks,
> dhruba
>
>
> On Sat, Sep 17, 2011 at 9:16 AM, Andrew Purtell <ap...@apache.org> wrote:
>
>> Hi Dhruba,
>>
>> Would you consider a contribution of this to branch-0.20-security aka
>> 0.20.2xx.x?
>>
>> If I am mistaken and you do not have a 0.22-ish HDFS RAID backported to an
>> 0.20-ish platform, please disregard.
>>
>> Best regards,
>>
>>     - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>

Re: Need help regarding HDFS-RAID

Posted by Andrew Purtell <ap...@apache.org>.
> I will be very grateful to you if you merge and contribute it to Apache Hadoop 0.20.2xx.x.

Hmm... I see what you mean. I was naive about what "branch-20-warehouse" is. I was looking for an updated HDFS RAID that incorporated R-S coding but ran against a 20-ish HDFS. I suppose it is relatively easy to have an HDFS RAID close to what is in trunk if HDFS has evolved in your branch. :-)


It looks like the changes to HDFS can be teased apart as:

  - BlockMissingException

  - Listing file status and block locations: LocatedFileStatus, FileSystem.listLocatedStatus

  - Corrupt file reporting
     - Changes to FSNamesystem and UnderReplicatedBlocks for tracking and reporting corrupt blocks
     - Update to the ClientProtocol for listing corrupt file blocks: listCorruptFileBlocks()
     - DFSUtil.getCorruptFiles

  - Change visibility and constructor for datanode.BlockSender so RAID can send repaired blocks without needing to be a DataNode or reimplementing the packet protocol

  - A set of quite invasive changes to the NameNode dealing with pluggable block placement policies. RAID could possibly live without this, but the PlacementMonitor would have more work to do in that case.


I suppose the upside of back porting all of this into 0.20.2xx is that all of the above has already gone through trunk.


Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
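
Several items in the list above (BlockMissingException, corrupt file reporting, and the BlockSender change for sending repaired blocks) concern detecting a lost block and rebuilding it from parity. As a rough illustration of the rebuild step only, here is a minimal sketch of the XOR case in plain Python. It does not use any Hadoop API, the function names are hypothetical, and Reed-Solomon coding is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """The parity block of a stripe is the XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """XOR of the parity block with every surviving block yields the single lost block."""
    return xor_blocks(surviving_blocks + [parity])


# Toy stripe of three 4-byte blocks; real HDFS blocks are far larger.
stripe = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(stripe)

# Pretend the middle block was lost and rebuild it from the other two plus parity.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
print(recovered)
```

A single XOR parity block can repair only one lost block per stripe, which is consistent with the thread's description of XOR being used at a higher replication than Reed-Solomon.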



Re: Need help regarding HDFS-RAID

Posted by Dhruba Borthakur <dh...@gmail.com>.
Hi Andy,

I will be very grateful to you if you merge and contribute it to Apache
Hadoop 0.20.2xx.x.

thanks,
dhruba




-- 
Connect to me at http://www.facebook.com/dhruba

Re: Need help regarding HDFS-RAID

Posted by Andrew Purtell <ap...@apache.org>.
Hi Dhruba,

Thanks for the pointer. I'm going to try to pull this code into our internal 20-ish distro. Would you object if I contribute the result, if it is successful?


Best regards,


    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


Re: Need help regarding HDFS-RAID

Posted by Dhruba Borthakur <dh...@gmail.com>.
Hi Andy,

We do run a version of HDFS RAID that is backported from Apache trunk to a
0.20-based release. Our code is at
https://github.com/facebook/hadoop-20-warehouse/tree/master/src/contrib/raid
However, I do not have an elegant way to contribute this code to Apache
0.20.2xx.x.

thanks,
dhruba



-- 
Connect to me at http://www.facebook.com/dhruba

Re: Need help regarding HDFS-RAID

Posted by Andrew Purtell <ap...@apache.org>.
Hi Dhruba,

Would you consider contributing this to branch-0.20-security, aka 0.20.2xx.x?

If I am mistaken and you do not have a 0.22-ish HDFS RAID backported to a 0.20-ish platform, please disregard.

Best regards,


    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



Re: Need help regarding HDFS-RAID

Posted by Dhruba Borthakur <dh...@gmail.com>.
That's right, Andy: 0.22+. We are running an HDFS-RAID code base that is
pretty close to what is available in Apache hdfs trunk.

-dhruba


-- 
Connect to me at http://www.facebook.com/dhruba

Re: Need help regarding HDFS-RAID

Posted by Andrew Purtell <ap...@apache.org>.
But that is the HDFS RAID effectively in 0.22+, not 0.21, right Dhruba?

 
Best regards,


       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



Re: Need help regarding HDFS-RAID

Posted by Dhruba Borthakur <dh...@gmail.com>.
We use HDFS RAID in a big way. Data older than 12 days is RAIDed using XOR
encoding (effective replication of 2.5). Data older than a few months is
RAIDed using Reed-Solomon (effective observed replication factor of 1.5).
This has been running on our 60 PB cluster for about a year now.
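
(Illustration, not part of the original message.) Effective replication
figures like the ones quoted here can be approximated with a little
arithmetic: stored blocks divided by logical data blocks in a stripe. The
sketch below assumes a stripe of 10 data blocks, 1 parity block for XOR and 4
for Reed-Solomon, and a post-RAID replication of 2 (XOR) or 1 (Reed-Solomon).
None of these parameters is stated in this thread, and the function name is
hypothetical.

```python
def effective_replication(data_blocks, parity_blocks, data_repl, parity_repl):
    """Return stored blocks per logical data block for a single stripe."""
    if data_blocks <= 0:
        raise ValueError("a stripe needs at least one data block")
    stored = data_blocks * data_repl + parity_blocks * parity_repl
    return stored / data_blocks


# Assumed layouts (hypothetical; not given in the thread).
xor_full = effective_replication(10, 1, data_repl=2, parity_repl=2)
rs_full = effective_replication(10, 4, data_repl=1, parity_repl=1)

# A file that fills only 4 of the 10 data slots still needs its parity block,
# so its overhead is spread over fewer data blocks.
xor_partial = effective_replication(4, 1, data_repl=2, parity_repl=2)

print(f"XOR, full stripe:          {xor_full:.2f}")     # 2.20
print(f"Reed-Solomon, full stripe: {rs_full:.2f}")      # 1.40
print(f"XOR, partial stripe:       {xor_partial:.2f}")  # 2.50
```

Under these assumptions full stripes come out at 2.2 and 1.4. Files that fill
only part of a stripe still carry full parity, which is one plausible reason
why observed figures such as 2.5 and 1.5 would be somewhat higher.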

thanks
dhruba


-- 
Connect to me at http://www.facebook.com/dhruba

Re: Need help regarding HDFS-RAID

Posted by Ajit Ratnaparkhi <aj...@gmail.com>.
Hi,

We were planning to use it to archive older data (instead of moving that data
to a separate archival store).
Archiving it in HDFS has the advantage of keeping it readily available for
processing whenever required.

Is there any archival solution in the Hadoop ecosystem?

thanks,
Ajit.



Re: Need help regarding HDFS-RAID

Posted by Harsh J <ha...@cloudera.com>.
Hey Ajit,

HDFS-RAID was never part of the 0.20 release. It made its debut in the
0.21 release [1]. I know that Facebook uses it (and also developed
it), but I am unsure of users beyond Facebook.

While 0.21 overall is not yet considered production-ready
(and has, in fact, possibly been abandoned in favor of work on 0.22+), you
can give that release a whirl on a test cluster and see for yourself whether
your need outweighs the stability concerns.

Just curious though - why are you looking to use this specifically?

[1] - http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21/mapreduce/src/contrib/raid/




-- 
Harsh J