Posted to user@hadoop.apache.org by Sourygna Luangsay <sl...@pragsis.com> on 2012/08/08 18:46:03 UTC

is HDFS RAID "data locality" efficient?

Hi folks!

 

I have just read about the HDFS RAID feature that was added in Hadoop 0.21
or 0.22, and I am curious to know whether people use it, what they use it
for, and what they think about its effect on Map/Reduce data locality.

 

The first big user of this technology is Facebook, which claims to save many
PB with it (see slides 4 and 5 of
http://www.slideshare.net/ydn/hdfs-raid-facebook).

 

I understand the following advantages with HDFS RAID:

-          You can save space

-          System tolerates more missing blocks
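
As a rough illustration of the space saving, the sketch below computes the raw storage consumed per logical block for one stripe. The stripe parameters (10 data blocks per stripe, an XOR layout with data and parity both kept at replication 2, and a Reed-Solomon layout with 4 parity blocks and everything at replication 1) are assumptions chosen for illustration, not figures stated in this thread; the linked slides are the place to check the layouts Facebook actually describes.

```python
def effective_replication(data_blocks, parity_blocks, data_repl, parity_repl):
    """Physical blocks stored per logical data block in one stripe."""
    stored = data_blocks * data_repl + parity_blocks * parity_repl
    return stored / data_blocks


# All stripe parameters below are illustrative assumptions.
plain = effective_replication(10, 0, 3, 0)  # ordinary 3-way replication
xor = effective_replication(10, 1, 2, 2)    # 1 XOR parity block, replication 2
rs = effective_replication(10, 4, 1, 1)     # 4 Reed-Solomon parity blocks, replication 1

for name, factor in [("replication=3", plain), ("XOR stripe", xor), ("RS stripe", rs)]:
    saved = 1 - factor / plain
    print(f"{name}: {factor:.1f}x raw storage, {saved:.0%} saved vs. replication=3")
```

Under these assumed parameters, a Reed-Solomon stripe can lose any 4 of its 14 blocks and still be rebuilt, which is the sense in which such a layout tolerates more missing blocks than 3-way replication.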

 

Nonetheless, one of the drawbacks I see is M/R data locality.

As far as I understand, the advantage of having 3 replicas of each block is
not only protection if one server fails or a block is corrupted,
but also the possibility of having as many as 3 tasktrackers able to execute
the map task with local data.

If you consider the 4th slide of the Facebook presentation, such a
layout reduces this possibility to only 1 tasktracker.

That means that if this tasktracker is very busy executing other tasks, you
have two choices:

-          Waiting for this tasktracker to finish (part of) its
current tasks (freeing map slots, for instance)

-          Executing the map task for this block on another tasktracker,
transferring the block's data over the network

In both cases, you'll get an M/R penalty (please tell me if I am wrong).
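
To put a rough number on this argument, here is a minimal sketch that estimates how often a map task can run on a node holding a local copy. It assumes, purely for illustration, that every tasktracker is busy independently with the same probability `p_busy`; real schedulers and real load are not independent like this, so treat the output as intuition rather than a benchmark. The function name and the busy probability are hypothetical.

```python
def p_local_slot(replicas, p_busy):
    """Probability that at least one replica holder has a free map slot,
    assuming each holder is busy independently with probability p_busy."""
    return 1 - p_busy ** replicas


# Hypothetical load level: each tasktracker is busy 80% of the time.
for replicas in (3, 2, 1):
    chance = p_local_slot(replicas, 0.8)
    print(f"{replicas} local copies: {chance:.1%} chance of a free local slot")
```

Under this simple model the chance of finding a free local slot drops sharply as the number of copies goes from 3 to 1, which is the penalty described above.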

 

Has anybody measured this penalty, or does anyone have benchmarks to share
with us?

 

One scenario I can think of that takes advantage of HDFS RAID
without suffering this penalty is:

-          Using normal HDFS with the default replication=3 for my fresh data

-          Using HDFS RAID for my historical data (which is rarely read by
M/R)
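
The split above can be sketched as a simple age-based rule. Everything in this sketch is hypothetical: the 90-day threshold, the policy names and the `storage_policy` function are made up for illustration and are not part of any HDFS or HDFS RAID API.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: data untouched for 90 days counts as historical.
COLD_AFTER = timedelta(days=90)


def storage_policy(last_access, now, cold_after=COLD_AFTER):
    """Return 'raid' for historical data and 'replicate-3' for fresh data."""
    if now - last_access >= cold_after:
        return "raid"
    return "replicate-3"


now = datetime(2012, 8, 8)
print(storage_policy(datetime(2012, 8, 1), now))  # recently read data
print(storage_policy(datetime(2012, 1, 1), now))  # data untouched for months
```

A real deployment would need some source for the last-access or modification time of each directory; the rule itself is the only thing this sketch shows.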

 

And you, what are you using HDFS RAID for?

 

Regards,

 

Sourygna Luangsay

RE: is HDFS RAID "data locality" efficient?

Posted by Sourygna Luangsay <sl...@pragsis.com>.

Hi,

 

Thanks a lot everybody for your replies.

Your ideas about cold and hot data using different storage policies prove to
be very interesting.

 

Regards,

 

Sourygna Luangsay

Re: is HDFS RAID "data locality" efficient?

Posted by "in.abdul" <in...@gmail.com>.

Nice explanation guys .. thanks

Syed Abdul kather
send from Samsung S3
On Aug 9, 2012 12:02 AM, "Ajit Ratnaparkhi [via Lucene]" <
ml-node+s472066n3999922h8@n3.nabble.com> wrote:

> Agreed with Steve.
> That is most important use of HDFS RAID, where you consume less disk space
> with same reliability and availability guarantee at cost of processing
> performance. Most of data in hdfs is cold data, without HDFS RAID you end
> up maintaining 3 replicas of data which is hardly going to be processed
> again, but you cant remove/move this data to separate archive because if
>  required processing should be as soon as possible.
>
> -Ajit
>
> On Wed, Aug 8, 2012 at 11:01 PM, Steve Loughran <[hidden email]> wrote:
>
>>
>>
>> On 8 August 2012 09:46, Sourygna Luangsay <[hidden email]> wrote:
>>
>>>  Hi folks!
>>>
>>> One of the scenario I can think in order to take advantage of HDFS RAID
>>> without suffering this penalty is:
>>>
>>> -          Using normal HDFS with default replication=3 for my
>>> “fresh data”
>>>
>>> -          Using HDFS RAID for my historical data (that is barely
>>> used by M/R)
>>>
>>>
>>>
>>>
>>>
>> exactly: less space use on cold data, with the penalty that access
>> performance can be worse. As the majority of data on a hadoop cluster is
>> usually "cold", it's a space and power efficient story for the archive data
>>
>> --
>> Steve Loughran
>> Hortonworks Inc
>>
>>

-----
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: is HDFS RAID "data locality" efficient?

Posted by Ajit Ratnaparkhi <aj...@gmail.com>.

Agreed with Steve.
That is the most important use of HDFS RAID: you consume less disk space
with the same reliability and availability guarantees, at the cost of
processing performance. Most data in HDFS is cold data. Without HDFS RAID you
end up maintaining 3 replicas of data that is hardly ever going to be
processed again, but you can't move this data to a separate archive because,
if it is needed, processing should start as soon as possible.

-Ajit

On Wed, Aug 8, 2012 at 11:01 PM, Steve Loughran <st...@hortonworks.com> wrote:

>
>
> On 8 August 2012 09:46, Sourygna Luangsay <sl...@pragsis.com> wrote:
>
>>  Hi folks!
>>
>> One of the scenario I can think in order to take advantage of HDFS RAID
>> without suffering this penalty is:
>>
>> -          Using normal HDFS with default replication=3 for my
>> “fresh data”
>>
>> -          Using HDFS RAID for my historical data (that is barely
>> used by M/R)
>>
>>
>>
>>
>>
> exactly: less space use on cold data, with the penalty that access
> performance can be worse. As the majority of data on a hadoop cluster is
> usually "cold", it's a space and power efficient story for the archive data
>
> --
> Steve Loughran
> Hortonworks Inc
>
>

Re: is HDFS RAID "data locality" efficient?

Posted by Michael Segel <mi...@hotmail.com>.

Ok... 

So under Apache Hadoop, how do you specify when and where a directory will be created on HDFS?

As an example, if I want to create a /coldData directory in HDFS as a place to store my older data sets, how does that get assigned specifically to a RAIDed HDFS?
(Or even to specific machines?)

I know I can do this in MapR's distribution, but I am not aware of this feature being available in the Apache-based releases.

Is this part of the latest feature set? 

Thx

-Mike

On Aug 8, 2012, at 12:31 PM, Steve Loughran <st...@hortonworks.com> wrote:

> 
> 
> On 8 August 2012 09:46, Sourygna Luangsay <sl...@pragsis.com> wrote:
> Hi folks!
> 
> One of the scenario I can think in order to take advantage of HDFS RAID without suffering this penalty is:
> 
> -          Using normal HDFS with default replication=3 for my “fresh data”
> 
> -          Using HDFS RAID for my historical data (that is barely used by M/R)
> 
>  
> 
> 
> 
> 
> exactly: less space use on cold data, with the penalty that access performance can be worse. As the majority of data on a hadoop cluster is usually "cold", it's a space and power efficient story for the archive data
> 
> -- 
> Steve Loughran
> Hortonworks Inc
>

Re: is HDFS RAID "data locality" efficient?

Posted by Steve Loughran <st...@hortonworks.com>.

On 8 August 2012 09:46, Sourygna Luangsay <sl...@pragsis.com> wrote:

>  Hi folks!
>
> One of the scenario I can think in order to take advantage of HDFS RAID
> without suffering this penalty is:
>
> -          Using normal HDFS with default replication=3 for my “fresh
> data”
>
> -          Using HDFS RAID for my historical data (that is barely
> used by M/R)
>
>
>
>
>
Exactly: less space used on cold data, with the penalty that access
performance can be worse. As the majority of data on a Hadoop cluster is
usually "cold", it is a space- and power-efficient approach for archive data.

-- 
Steve Loughran
Hortonworks Inc

unsubscribe

Posted by Avram Aelony <Av...@eharmony.com>.

unsubscribe

RE: is HDFS RAID "data locality" efficient?

Posted by "D'Souza, Clive V" <cl...@intel.com>.

Adding to Gaurav’s sentiment: using object stores with erasure coding is a pretty good solution when the data starts creeping into the PB scale and needs redundancy.

Look at Amplidata's solutions; they seem to have a good stack.

Regards,
-C

From: Gaurav Sharma [mailto:gaurav.gs.sharma@gmail.com]
Sent: Wednesday, August 08, 2012 10:25 AM
To: user@hadoop.apache.org
Subject: Re: is HDFS RAID "data locality" efficient?

Indeed, erasure encoding is a component of a good storage solution esp. for holding on to PB scale datasets but there's an associated cost in terms of latency for real time serving. Depending on the domain (eg. where temporal locality is observed in access patterns), it works well if the hot dataset is small and can be served efficiently from elsewhere. It is a great fit for DW type workloads. Fb had a good presentation sometime back where they discussed a typical impl with Reed Solomon codes et al.

On Aug 8, 2012, at 10:06, Michael Segel <mi...@hotmail.com> wrote:
Just something to think about...

There's a company here in Chicago called Cleversafe. I believe they recently made an announcement concerning Hadoop?

The interesting thing about RAID is that you're adding to the disk latency and depending on which raid you use you could kill performance on a rebuild of a disk.

In terms of uptime of Apache based Hadoop, RAID allows you to actually hot swap the disks and unless you lose both drives (assuming Raid 1, mirroring), your DN doesn't know and doesn't have to go down.
So there is some value there, however at the expense of storage and storage costs.

You can reduce the replication factor to 2. I don't know that I would go to anything lower because you still can lose the server...

In terms of data locality... maybe you lose a bit, however... because you're raiding your storage, you now have less data per node. So you end up with more nodes, right?

Just some food for thought.

On Aug 8, 2012, at 11:46 AM, Sourygna Luangsay <sl...@pragsis.com> wrote:

Hi folks!

I have just read about the HDFS RAID feature that was added to Hadoop 0.21 or 0.22. and I am quite curious to know if people use it, what kind of use
they have and what they think about Map/Reduce data locality.

First big actor of this technology is Facebook, that claims to save many PB with it (see http://www.slideshare.net/ydn/hdfs-raid-facebook slides 4 and 5).

I understand the following advantages with HDFS RAID:
-          You can save space
-          System tolerates more missing blocks

Nonetheless, one of the drawback I see is M/R data locality.
As far as I understand, the advantage of having 3 replicas of each blocks is not only security if one server fails or a block is corrupted,
but also the possibility to have as far as 3 tasktrackers executing the map task with “local data”.
If you consider the 4th slide of the Facebook presentation, such infrastructure decreases this possibility to only 1 tasktracker.
That means that if this tasktracker is very busy executing other tasks, you have the following choice:
-          Waiting this tasktracker to finish executing (part of) the current tasks (freeing map slots for instance)
-          Executing the map task for this block in another tasktracker, transferring the information of the block through the network
In both cases, you´ll get a M/R penalty (please, tell me if I am wrong).

Has somebody considered such penalty or has some benchmarks to share with us?

One of the scenario I can think in order to take advantage of HDFS RAID without suffering this penalty is:
-          Using normal HDFS with default replication=3 for my “fresh data”
-          Using HDFS RAID for my historical data (that is barely used by M/R)

And you, what are you using HDFS RAID for?

Regards,

Sourygna Luangsay

Re: is HDFS RAID "data locality" efficient?

Posted by Gaurav Sharma <ga...@gmail.com>.

Indeed, erasure coding is a component of a good storage solution, esp. for holding on to PB-scale datasets, but there's an associated cost in terms of latency for real-time serving. Depending on the domain (e.g. where temporal locality is observed in access patterns), it works well if the hot dataset is small and can be served efficiently from elsewhere. It is a great fit for DW-type workloads. Fb had a good presentation some time back where they discussed a typical impl with Reed-Solomon codes et al.

On Aug 8, 2012, at 10:06, Michael Segel <mi...@hotmail.com> wrote:

> Just something to think about... 
> 
> There's a company here in Chicago called Cleversafe. I believe they recently made an announcement concerning Hadoop? 
> 
> The interesting thing about RAID is that you're adding to the disk latency and depending on which raid you use you could kill performance on a rebuild of a disk. 
> 
> In terms of uptime of Apache based Hadoop, RAID allows you to actually hot swap the disks and unless you lose both drives (assuming Raid 1, mirroring), your DN doesn't know and doesn't have to go down. 
> So there is some value there, however at the expense of storage and storage costs. 
> 
> You can reduce the replication factor to 2. I don't know that I would go to anything lower because you still can lose the server... 
> 
> In terms of data locality... maybe you lose a bit, however... because you're raiding your storage, you now have less data per node. So you end up with more nodes, right? 
> 
> Just some food for thought. 
> 
> On Aug 8, 2012, at 11:46 AM, Sourygna Luangsay <sl...@pragsis.com> wrote:
> 
>> Hi folks!
>>  
>> I have just read about the HDFS RAID feature that was added to Hadoop 0.21 or 0.22, and I am quite curious to know whether people use it, what they use it for,
>> and what they think about its effect on Map/Reduce data locality.
>>  
>> The first big user of this technology is Facebook, which claims to save many PB with it (see http://www.slideshare.net/ydn/hdfs-raid-facebook, slides 4 and 5).
>>  
>> I understand the following advantages with HDFS RAID:
>> -          You can save space
>> -          System tolerates more missing blocks
>>  
>> Nonetheless, one of the drawbacks I see is M/R data locality.
>> As far as I understand, the advantage of having 3 replicas of each block is not only safety if one server fails or a block is corrupted,
>> but also the possibility of having up to 3 tasktrackers able to execute the map task with “local data”.
>> If you consider the 4th slide of the Facebook presentation, such an infrastructure reduces this possibility to only 1 tasktracker.
>> That means that if this tasktracker is very busy executing other tasks, you have the following choices:
>> -          Waiting for this tasktracker to finish executing (part of) its current tasks (freeing map slots, for instance)
>> -          Executing the map task for this block on another tasktracker, transferring the block's data over the network
>> In both cases, you'll get an M/R penalty (please tell me if I am wrong).
>>  
>> Has anybody measured this penalty, or does anyone have benchmarks to share with us?
>>  
>> One scenario I can think of to take advantage of HDFS RAID without suffering this penalty is:
>> -          Using normal HDFS with default replication=3 for my “fresh data”
>> -          Using HDFS RAID for my historical data (that is barely used by M/R)
>>  
>> And you, what are you using HDFS RAID for?
>>  
>> Regards,
>>  
>> Sourygna Luangsay
>

Re: is HDFS RAID "data locality" efficient?

Posted by Michael Segel <mi...@hotmail.com>.

Just something to think about... 

There's a company here in Chicago called Cleversafe. I believe they recently made an announcement concerning Hadoop? 

The interesting thing about RAID is that you're adding to the disk latency and depending on which raid you use you could kill performance on a rebuild of a disk. 

In terms of uptime of Apache based Hadoop, RAID allows you to actually hot swap the disks and unless you lose both drives (assuming Raid 1, mirroring), your DN doesn't know and doesn't have to go down. 
So there is some value there, however at the expense of storage and storage costs. 

You can reduce the replication factor to 2. I don't know that I would go to anything lower because you still can lose the server... 

In terms of data locality... maybe you lose a bit, however... because you're raiding your storage, you now have less data per node. So you end up with more nodes, right? 

Just some food for thought. 

On Aug 8, 2012, at 11:46 AM, Sourygna Luangsay <sl...@pragsis.com> wrote:

> Hi folks!
>  
> I have just read about the HDFS RAID feature that was added to Hadoop 0.21 or 0.22, and I am quite curious to know whether people use it, what they use it for,
> and what they think about its effect on Map/Reduce data locality.
>  
> The first big user of this technology is Facebook, which claims to save many PB with it (see http://www.slideshare.net/ydn/hdfs-raid-facebook, slides 4 and 5).
>  
> I understand the following advantages with HDFS RAID:
> -          You can save space
> -          System tolerates more missing blocks
>  
> Nonetheless, one of the drawbacks I see is M/R data locality.
> As far as I understand, the advantage of having 3 replicas of each block is not only safety if one server fails or a block is corrupted,
> but also the possibility of having up to 3 tasktrackers able to execute the map task with “local data”.
> If you consider the 4th slide of the Facebook presentation, such an infrastructure reduces this possibility to only 1 tasktracker.
> That means that if this tasktracker is very busy executing other tasks, you have the following choices:
> -          Waiting for this tasktracker to finish executing (part of) its current tasks (freeing map slots, for instance)
> -          Executing the map task for this block on another tasktracker, transferring the block's data over the network
> In both cases, you'll get an M/R penalty (please tell me if I am wrong).
>  
> Has anybody measured this penalty, or does anyone have benchmarks to share with us?
>  
> One scenario I can think of to take advantage of HDFS RAID without suffering this penalty is:
> -          Using normal HDFS with default replication=3 for my “fresh data”
> -          Using HDFS RAID for my historical data (that is barely used by M/R)
>  
> And you, what are you using HDFS RAID for?
>  
> Regards,
>  
> Sourygna Luangsay
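
To put a rough number on the locality question raised in the quoted post, here is a minimal back-of-the-envelope sketch. It is not a benchmark. It assumes, purely for illustration, that each node holding a copy of a block has all of its map slots busy independently with the same probability; the function name `p_local` and the 50% busy figure are hypothetical and do not come from this thread or from Hadoop.

```python
def p_local(replicas: int, p_busy: float) -> float:
    """Probability that at least one node holding a copy of the block
    has a free map slot, under the toy assumption that nodes are busy
    independently with the same probability p_busy."""
    if replicas < 1:
        raise ValueError("need at least one copy of the block")
    if not 0.0 <= p_busy <= 1.0:
        raise ValueError("p_busy must be a probability")
    # A data-local task is impossible only if every copy-holding node is busy.
    return 1.0 - p_busy ** replicas


if __name__ == "__main__":
    busy = 0.5  # assumed: each node is fully busy half of the time
    for copies in (3, 2, 1):
        print(f"{copies} copies -> P(local slot free) = {p_local(copies, busy):.3f}")
```

Under this toy model, going from 3 copies to 1 lowers the chance of a data-local map task, and the gap widens as the cluster gets busier. A real scheduler also considers rack-local placement, so treat the numbers as an illustration of the trend only.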

Re: is HDFS RAID "data locality" efficient?

Posted by Steve Loughran <st...@hortonworks.com>.

On 8 August 2012 09:46, Sourygna Luangsay <sl...@pragsis.com> wrote:

>  Hi folks!
>
> One scenario I can think of to take advantage of HDFS RAID
> without suffering this penalty is:
>
> -          Using normal HDFS with default replication=3 for my “fresh
> data”
>
> -          Using HDFS RAID for my historical data (that is barely
> used by M/R)
>
Exactly: less space used on cold data, with the penalty that access
performance can be worse. As the majority of data on a Hadoop cluster is
usually "cold", it's a space- and power-efficient story for the archive data.

-- 
Steve Loughran
Hortonworks Inc
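
The space side of this trade-off can be sketched with simple arithmetic. The snippet below is only an illustration: the stripe layouts (an XOR stripe of 10 blocks with data and parity both kept at 2 copies, and a Reed-Solomon layout of 10 data plus 4 parity blocks at 1 copy each) are assumed example parameters, not figures taken from this thread, and `effective_replication` is a hypothetical helper name.

```python
def effective_replication(data_blocks: int, parity_blocks: int,
                          data_copies: int = 1, parity_copies: int = 1) -> float:
    """Raw blocks stored per logical data block for one stripe."""
    if data_blocks < 1:
        raise ValueError("a stripe needs at least one data block")
    raw = data_blocks * data_copies + parity_blocks * parity_copies
    return raw / data_blocks


# Assumed example layouts (illustrative only).
schemes = {
    "plain 3x replication": effective_replication(1, 0, data_copies=3),
    "XOR, stripe of 10, 2 copies": effective_replication(10, 1, 2, 2),
    "Reed-Solomon 10+4, 1 copy": effective_replication(10, 4),
}

if __name__ == "__main__":
    baseline = schemes["plain 3x replication"]
    for name, factor in schemes.items():
        saved = 1.0 - factor / baseline
        print(f"{name}: {factor:.1f}x raw storage, {saved:.0%} saved vs 3x")
```

With these assumed layouts the raw storage per logical block drops from 3.0x to 2.2x or 1.4x, which is where the space saving on cold data comes from. The price is that fewer full copies of each block remain available for data-local reads.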
