Posted to user@hbase.apache.org by KIM JUN YOUNG <ju...@me.com> on 2013/04/04 12:01:48 UTC

confused info about region-regionserver locality

Hi All,

I am confused about Region-RegionServer locality.

The current documentation says:

http://hbase.apache.org/book/regions.arch.html
9.7.3. Region-RegionServer Locality
Over time, Region-RegionServer locality is achieved via HDFS block replication. The HDFS client does the following by default when choosing locations to write replicas:

First replica is written to local node
Second replica is written to another node in same rack
Third replica is written to a node in another rack (if sufficient nodes)


But my understanding is different. HDFS places block replicas as follows:

	first, on the local node
	second, on a node in another rack
	third, on a random other node in the same rack as the second

Does the documentation need to be changed, or am I missing something?
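
For illustration, the ordering described above (local node, then a node in another rack, then a different node in the same rack as the second) can be modelled with a toy sketch. This is not HDFS code: the `choose_targets` function, the topology dictionary and the node names are all hypothetical, and placing replicas beyond the third on any remaining node is a simplifying assumption.

```python
import random

def choose_targets(topology, writer, replication=3, rng=None):
    """Toy model of the replica ordering described above (not HDFS code).

    topology: dict of rack name -> list of node names (hypothetical layout).
    writer:   node issuing the write, assumed to also be a DataNode.
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    def pick(preferred):
        # Prefer unused nodes from `preferred`; otherwise fall back to any unused node.
        unused = [n for n in preferred if n not in targets]
        if not unused:
            unused = [n for n in rack_of if n not in targets]
        return rng.choice(unused) if unused else None

    targets = [writer]  # 1st replica: the local node
    while len(targets) < replication:
        if len(targets) == 1:
            # 2nd replica: a node in a different rack from the writer
            preferred = [n for n in rack_of if rack_of[n] != rack_of[writer]]
        elif len(targets) == 2:
            # 3rd replica: a different node in the same rack as the 2nd
            preferred = topology[rack_of[targets[1]]]
        else:
            # further replicas: any remaining node (a simplification)
            preferred = list(rack_of)
        node = pick(preferred)
        if node is None:
            break  # cluster has fewer nodes than the replication factor
        targets.append(node)
    return targets

# Hypothetical two-rack cluster; the writer rs1 lives in rack1.
topology = {"rack1": ["rs1", "rs2", "rs3"], "rack2": ["rs4", "rs5", "rs6"]}
print(choose_targets(topology, writer="rs1", rng=random.Random(42)))
```

With this layout the first entry is always rs1 and the next two are distinct nodes from rack2; which ones depends on the random seed.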

Re: confused info about region-regionserver locality

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

I have created HBASE-8269 for this documentation update.

2013/4/4 Jean-Marc Spaggiari <je...@spaggiari.org>:
>>Isn't this done via pipelining anyway?
> Yes, it's the way it's done.
>
>>So there's no notion of ordering with respect 1st, 2nd, and 3rd block, either all writes go through the pipeline or none are.
> Still correct.
>
>> When the write request returns to the client there will be a local copy, a copy on another machine in the same, and a copy on a machine in a different rack, who cares about the ordering inside the pipeline?
> Not necessary. There might not be any additional copy on a different
> machine on the same rack. BUT.. As you said, who cares ;) As long as
> we have the local copy and some replicas.
>
> I have updated the documentation already. I will open the JIRA and
> submit. I have also added subsequent replicas in case replication
> factor is > 3.
>
> JM

Re: confused info about region-regionserver locality

Posted by Dave Wang <ds...@cloudera.com>.

I think the order will matter if you run with, say, a replication factor of 2.

- Dave
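
A small sketch of why the order matters at replication factor 2: if only the first two entries of each ordering are written, the two orderings discussed in this thread cover a different number of racks. The rack labels and the `racks_covered` helper are hypothetical names used only for illustration.

```python
# Each ordering lists (rack, node) for the 1st, 2nd and 3rd replica,
# relative to the writer. The labels are illustrative only.
DOCUMENTED_ORDER = [
    ("local-rack", "writer"),
    ("local-rack", "another node"),
    ("remote-rack", "any node"),
]
REPORTED_ORDER = [
    ("local-rack", "writer"),
    ("remote-rack", "any node"),
    ("remote-rack", "another node"),
]

def racks_covered(order, replication):
    """Racks holding a replica when only the first `replication` entries are written."""
    return {rack for rack, _node in order[:replication]}

for name, order in [("documented", DOCUMENTED_ORDER), ("reported", REPORTED_ORDER)]:
    racks = racks_covered(order, replication=2)
    print(f"{name}: racks={sorted(racks)} survives_rack_loss={len(racks) > 1}")
```

Under the ordering in the documentation, both replicas end up in the writer's rack; under the ordering reported by the original poster, they span two racks.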


On Thu, Apr 4, 2013 at 11:30 AM, lars hofhansl <la...@apache.org> wrote:

> >> When the write request returns to the client there will be a local
> copy, a copy on another machine in the same, and a copy on a machine in a
> different rack, who cares about the ordering inside the pipeline?
> > Not necessary. There might not be any additional copy on a different
> > machine on the same rack. BUT.. As you said, who cares ;) As long as
> > we have the local copy and some replicas.
>
> Really? Doesn't the whole pipeline have to be successful in order to
> return success to the client.
> (I might be confused :) )

Re: confused info about region-regionserver locality

Posted by lars hofhansl <la...@apache.org>.

>> When the write request returns to the client there will be a local copy, a copy on another machine in the same rack, and a copy on a machine in a different rack. Who cares about the ordering inside the pipeline?
> Not necessarily. There might not be any additional copy on a different
> machine on the same rack. BUT.. As you said, who cares ;) As long as
> we have the local copy and some replicas.

Really? Doesn't the whole pipeline have to be successful in order to return success to the client?
(I might be confused :) )



________________________________
 From: Jean-Marc Spaggiari <je...@spaggiari.org>
To: user@hbase.apache.org; lars hofhansl <la...@apache.org> 
Sent: Thursday, April 4, 2013 11:24 AM
Subject: Re: confused info about region-regionserver locality
 
>Isn't this done via pipelining anyway?
Yes, it's the way it's done.

>So there's no notion of ordering with respect 1st, 2nd, and 3rd block, either all writes go through the pipeline or none are.
Still correct.

> When the write request returns to the client there will be a local copy, a copy on another machine in the same, and a copy on a machine in a different rack, who cares about the ordering inside the pipeline?
Not necessary. There might not be any additional copy on a different
machine on the same rack. BUT.. As you said, who cares ;) As long as
we have the local copy and some replicas.

I have updated the documentation already. I will open the JIRA and
submit. I have also added subsequent replicas in case replication
factor is > 3.

JM

Re: confused info about region-regionserver locality

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

> Isn't this done via pipelining anyway?
Yes, that's the way it's done.

> So there's no notion of ordering with respect to the 1st, 2nd, and 3rd block; either all writes go through the pipeline or none do.
Still correct.

> When the write request returns to the client there will be a local copy, a copy on another machine in the same rack, and a copy on a machine in a different rack. Who cares about the ordering inside the pipeline?
Not necessarily. There might not be any additional copy on a different
machine on the same rack. BUT.. As you said, who cares ;) As long as
we have the local copy and some replicas.

I have updated the documentation already. I will open the JIRA and
submit. I have also added subsequent replicas in case the replication
factor is > 3.

JM

2013/4/4 lars hofhansl <la...@apache.org>:
> Isn't this done via pipelining anyway?
> So there's no notion of ordering with respect 1st, 2nd, and 3rd block, either all writes go through the pipeline or none are.
>
> When the write request returns to the client there will be a local copy, a copy on another machine in the same, and a copy on a machine in a different rack, who cares about the ordering inside the pipeline?
>
>
> Seems it would also be inefficient to pipeline from the local rack to another another one and then in the same pipeline back into the local rack (more load on the switch connecting the racks with no benefit).
>
> I'll double check.
>
>
> -- Lars

Re: confused info about region-regionserver locality

Posted by lars hofhansl <la...@apache.org>.

Isn't this done via pipelining anyway?
So there's no notion of ordering with respect to the 1st, 2nd, and 3rd block; either all writes go through the pipeline or none do.

When the write request returns to the client there will be a local copy, a copy on another machine in the same rack, and a copy on a machine in a different rack. Who cares about the ordering inside the pipeline?

It also seems it would be inefficient to pipeline from the local rack to another one and then, in the same pipeline, back into the local rack (more load on the switch connecting the racks with no benefit).

I'll double check.


-- Lars
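
The switch-load concern above can be checked by counting rack crossings along a pipeline. This is a minimal sketch, assuming data is forwarded from stage to stage in replica order and that the writer sits in rack "A"; the `cross_rack_hops` helper and the rack labels are hypothetical.

```python
def cross_rack_hops(pipeline_racks):
    """Count transfers between consecutive pipeline stages that change rack."""
    return sum(1 for a, b in zip(pipeline_racks, pipeline_racks[1:]) if a != b)

# Rack of each pipeline stage, in order; the writer is in rack "A".
pipelines = {
    "local -> same rack -> other rack": ["A", "A", "B"],
    "local -> other rack -> same other rack": ["A", "B", "B"],
    "local -> other rack -> back to local rack": ["A", "B", "A"],
}

for label, racks in pipelines.items():
    print(f"{label}: {cross_rack_hops(racks)} cross-rack hop(s)")
```

Both orderings debated in this thread cross the inter-rack switch once. Only a pipeline that returns to the local rack crosses it twice, which is the case the paragraph above calls inefficient.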



________________________________
 From: Jean-Marc Spaggiari <je...@spaggiari.org>
To: user@hbase.apache.org 
Sent: Thursday, April 4, 2013 8:25 AM
Subject: Re: confused info about region-regionserver locality
 
Hi,

I think you're right and documentation need to be updated.

The 3rd replica is written on a random node in the same rack as the
2nd replica. I will double check. Can you please open a JIRA so this
is updated?

JM

Re: confused info about region-regionserver locality

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Hi,

I think you're right and the documentation needs to be updated.

The 3rd replica is written on a random node in the same rack as the
2nd replica. I will double check. Can you please open a JIRA so this
is updated?

JM

2013/4/4 KIM JUN YOUNG <ju...@me.com>:
> Hi All.
>
> There is confused understanding about region-regionser locality.
>
> from the current document ,
>
> http://hbase.apache.org/book/regions.arch.html
> 9.7.3. Region-RegionServer Locality
> Over time, Region-RegionServer locality is achieved via HDFS block replication. The HDFS client does the following by default when choosing locations to write replicas:
>
> First replica is written to local node
> Second replica is written to another node in same rack
> Third replica is written to a node in another rack (if sufficient nodes)
>
>
> but, my understanding is different
> HDFS write blocks for replica
>
>         first, local node
>         second, another node in another rack
>         third, random another node in same rack
>
> need to be changed? or am I missing something?