Posted to common-user@hadoop.apache.org by George Kousiouris <gk...@mail.ntua.gr> on 2013/02/11 17:42:36 UTC

Correlation between replication factor and read/write performance survey?

Hi all,

Is anyone aware of any survey/paper/report showing the relationship 
between a replication factor and its penalty/benefit on write/read 
operations?

BR,
George

-- 

---------------------------

Re: Correlation between replication factor and read/write performance survey?

Posted by Ted Dunning <td...@maprtech.com>.

The delay due to replication is rarely a large problem in traditional
map-reduce programs, since many writes are occurring at once.  The real
problem is that with 3x replication you consume 3x the total disk
bandwidth, so the theoretical maximum sustained write bandwidth is limited
to the lesser of half your network bandwidth or a third of your usable disk
bandwidth.  In ordinary Hadoop, usable disk bandwidth is typically about
half the raw bandwidth of the disks themselves.
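
To make the arithmetic concrete, here is a rough sketch of that bound in
Python. It only restates the rule of thumb above for 3x replication. The
function name, the parameter names, and the example figures (a 1 GbE or
10 GbE link, twelve disks at 100 MB/s raw each) are hypothetical values
chosen for illustration, not measurements, and both inputs are assumed to
be taken over the same scope (per node or cluster-wide).

```python
def max_write_bandwidth(network_bw, raw_disk_bw, disk_efficiency=0.5):
    """Rule-of-thumb ceiling on sustained write bandwidth with 3x replication.

    network_bw and raw_disk_bw must use the same unit (e.g. MB/s).
    disk_efficiency is the assumed usable fraction of raw disk bandwidth
    (about one half, per the estimate above).
    """
    usable_disk_bw = raw_disk_bw * disk_efficiency
    network_limit = network_bw / 2      # half the network bandwidth
    disk_limit = usable_disk_bw / 3     # a third of the usable disk bandwidth
    return min(network_limit, disk_limit)


# Hypothetical figures: 12 disks x 100 MB/s raw = 1200 MB/s.
print(max_write_bandwidth(125.0, 1200.0))    # 1 GbE  (~125 MB/s):  62.5
print(max_write_bandwidth(1250.0, 1200.0))   # 10 GbE (~1250 MB/s): 200.0
```

With these made-up numbers the 1 GbE case is network-bound and the 10 GbE
case is disk-bound, which is the trade-off the rule describes.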

On Mon, Feb 11, 2013 at 6:36 PM, Rishi Yadav <ri...@infoobjects.com> wrote:

> I think higher replication only makes read easier as client can choose to
> read  block from nearest node.
>
> Writes are done using replication pipeline so client does wait for ack
> from all nodes but writes to only first node. It would be interesting to
> see if there are any benchmarks for delay caused by this acknowledgement.
>
>
> Sent from my iPhone
>
> On Feb 11, 2013, at 6:42 AM, George Kousiouris <gk...@mail.ntua.gr>
> wrote:
>
> >
> > Hi all,
> >
> > Is anyone aware of any survey/paper/report showing the relationship
> between a replication factor and its penalty/benefit on write/read
> operations?
> >
> > BR,
> > George
> >
> > --
> >
> > ---------------------------
> >
> >
>

Re: Correlation between replication factor and read/write performance survey?

Posted by Rishi Yadav <ri...@infoobjects.com>.

I think higher replication only makes reads easier, since the client can choose to read a block from the nearest node.

Writes are done using a replication pipeline, so the client does wait for acks from all nodes but writes only to the first node. It would be interesting to see whether there are any benchmarks for the delay caused by this acknowledgement.

On Feb 11, 2013, at 6:42 AM, George Kousiouris <gk...@mail.ntua.gr> wrote:

> 
> Hi all,
> 
> Is anyone aware of any survey/paper/report showing the relationship between a replication factor and its penalty/benefit on write/read operations?
> 
> BR,
> George
> 
> -- 
> 
> ---------------------------
> 
>
