Posted to hdfs-user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2013/09/15 19:08:44 UTC

HDFS performance with and without replication

In our YARN application, we are considering whether to store temporary data with replication=1 or replication=3 (or give the user an option).  Obviously there is a tradeoff between reliability and performance, but on smaller clusters I'd expect this to be less of an issue.

What is the difference in write performance using replication=1 vs 3?  For reading I'd expect the performance to be roughly equivalent.

john
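
One way to give the user that option is to choose the replication factor per write instead of changing the cluster default. The following is a minimal sketch, assuming the `hdfs` command-line client is installed and on the PATH. The helper names `put_command` and `setrep_command` and the example paths are hypothetical and not from the thread. The helpers only build argument lists (using the generic `-D dfs.replication=N` option and the `-setrep` subcommand), so they can be inspected without a cluster.

```python
from typing import List


def put_command(local_path: str, hdfs_path: str, replication: int) -> List[str]:
    """Build an `hdfs dfs -put` call that writes a file with the given replication."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    # -D overrides dfs.replication for this single client invocation only.
    return ["hdfs", "dfs", "-D", f"dfs.replication={replication}",
            "-put", local_path, hdfs_path]


def setrep_command(hdfs_path: str, replication: int, wait: bool = False) -> List[str]:
    """Build an `hdfs dfs -setrep` call that changes an existing file's replication."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        cmd.append("-w")  # ask the shell to wait until the target replication is reached
    cmd += [str(replication), hdfs_path]
    return cmd
```

To actually run one of these, pass the list to `subprocess.run(..., check=True)`, for example `subprocess.run(put_command("part-0000.tmp", "/tmp/myapp/part-0000.tmp", 1), check=True)`. Writing temporary data with replication=1 and raising it later with `-setrep` is one possible design, not something the thread recommends.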

RE: HDFS performance with and without replication

Posted by John Lilley <jo...@redpoint.net>.
Thanks, that makes sense.
john

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Sunday, September 15, 2013 12:39 PM
To: <us...@hadoop.apache.org>
Subject: Re: HDFS performance with and without replication

Write performance improves with fewer replicas (as a result of the synchronous, sequenced write pipelines in HDFS). Reads would be the same unless, in the worst case, you're unable to schedule a rack-local read because only one (busy) rack holds the data.

On Sun, Sep 15, 2013 at 10:38 PM, John Lilley <jo...@redpoint.net> wrote:
> In our YARN application, we are considering whether to store temporary 
> data with replication=1 or replication=3 (or give the user an option).  
> Obviously there is a tradeoff between reliability and performance, but 
> on smaller clusters I'd expect this to be less of an issue.
>
>
>
> What is the difference in write performance using replication=1 vs 3?  
> For reading I'd expect the performance to be roughly equivalent.
>
>
>
> john



--
Harsh J

Re: HDFS performance with and without replication

Posted by Harsh J <ha...@cloudera.com>.
Write performance improves with fewer replicas (as a result of the
synchronous, sequenced write pipelines in HDFS). Reads would be the
same unless, in the worst case, you're unable to schedule a rack-local
read because only one (busy) rack holds the data.
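
The effect of the sequenced pipeline can be illustrated with a toy bottleneck model. This is only a sketch under stated assumptions. It assumes the first replica lands on a local disk and that each extra replica adds one network hop and one remote disk. It treats sustained throughput as the rate of the slowest stage and ignores acknowledgement latency and contention from other jobs. The function names and the bandwidth figures are hypothetical, not measurements from the thread.

```python
from typing import List, Sequence


def write_stages(replication: int, disk_mb_per_s: float,
                 network_mb_per_s: float) -> List[float]:
    """List the stages a written byte passes through for a given replication.

    Assumption: replica 1 goes to a local disk; every further replica adds
    one network hop followed by one remote disk.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    stages = [disk_mb_per_s]
    for _ in range(replication - 1):
        stages += [network_mb_per_s, disk_mb_per_s]
    return stages


def pipeline_throughput(stage_mb_per_s: Sequence[float]) -> float:
    """Sustained rate of a chained pipeline, limited by its slowest stage."""
    if not stage_mb_per_s:
        raise ValueError("at least one stage is required")
    return min(stage_mb_per_s)


# Hypothetical figures: 150 MB/s disks, 110 MB/s network links.
single = pipeline_throughput(write_stages(1, 150.0, 110.0))
triple = pipeline_throughput(write_stages(3, 150.0, 110.0))
```

In this model a longer pipeline can only match or lower the sustained write rate, never raise it, which agrees with the direction of the claim above. How large the gap is on a real cluster depends on hardware and load and would need to be measured.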

On Sun, Sep 15, 2013 at 10:38 PM, John Lilley <jo...@redpoint.net> wrote:
> In our YARN application, we are considering whether to store temporary data
> with replication=1 or replication=3 (or give the user an option).  Obviously
> there is a tradeoff between reliability and performance, but on smaller
> clusters I’d expect this to be less of an issue.
>
>
>
> What is the difference in write performance using replication=1 vs 3?  For
> reading I’d expect the performance to be roughly equivalent.
>
>
>
> john



-- 
Harsh J
