Posted to dev@accumulo.apache.org by Keith Turner <ke...@deenlo.com> on 2015/06/30 17:00:03 UTC

[DISCUSS] HDFS operation to support Accumulo locality

There was a discussion on IRC about balancing and locality yesterday. I was
thinking about the locality problem, and started thinking about the
possibility of an HDFS operation that would force a file to have
local replicas. I think this approach has the following pros over forcing a
compaction.

  * Only one replica is copied across the network.
  * Avoids decompressing, deserializing, serializing, and compressing data.

The tricky part about this approach is that Accumulo needs to decide when
to ask HDFS to make a file local. This decision could be based on a
function of the file size and number of recent accesses.
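As a very rough sketch of what that decision function might look like (the
class and method names here are hypothetical, not an existing Accumulo API):

public class LocalityHeuristic {

  private final long maxFileSize;      // don't relocate huge files
  private final int minRecentAccesses; // only relocate hot files

  public LocalityHeuristic(long maxFileSize, int minRecentAccesses) {
    this.maxFileSize = maxFileSize;
    this.minRecentAccesses = minRecentAccesses;
  }

  // Small, frequently read files are the best candidates: the copy is
  // cheap and the read savings recur on every subsequent access.
  public boolean shouldRequestLocalReplica(long fileSizeBytes, int recentAccesses) {
    return fileSizeBytes <= maxFileSize && recentAccesses >= minRecentAccesses;
  }
}

The two thresholds would presumably be tunable properties.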

We could avoid decompressing, deserializing, etc. today by just copying (not
compacting) frequently accessed files. However, this would write 3 replicas
where an HDFS operation would only write one.

Note: for the assertion that only one replica would need to be copied, I was
thinking of the following 3 initial conditions (see the sketch after the
list). I am assuming we want to avoid having all three replicas on the same
rack.

 * Zero replicas on the rack: copy a replica to the node and drop a replica
on another rack.
 * One replica on the rack: copy a replica to the node and drop any other
replica.
 * Two replicas on the rack: copy a replica to the node and drop another
replica on the same rack.
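To make the three cases concrete, here is a hypothetical sketch (assuming
replication factor 3) of choosing which replica to drop once a replica has
been copied to the requesting node:

import java.util.List;

public class ReplicaMove {

  // Given the racks of a file's current replicas and the rack of the node
  // requesting a local replica, return the rack to drop a replica from
  // after the copy, so the file never has all three replicas on one rack.
  static String rackToDropReplicaFrom(List<String> replicaRacks, String localRack) {
    long onLocalRack = replicaRacks.stream().filter(localRack::equals).count();
    if (onLocalRack >= 2) {
      // Two replicas on the rack: drop one of them, keeping the off-rack
      // replica so all three do not end up on the same rack.
      return localRack;
    }
    // Zero or one replica on the rack: drop any other replica; dropping
    // one off the local rack preserves the rack spread.
    for (String rack : replicaRacks) {
      if (!rack.equals(localRack)) {
        return rack;
      }
    }
    return localRack; // unreachable with a sane initial placement
  }
}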

Re: [DISCUSS] HDFS operation to support Accumulo locality

Posted by Keith Turner <ke...@deenlo.com>.
On Tue, Jun 30, 2015 at 11:39 AM, Josh Elser <jo...@gmail.com> wrote:

> Sorry in advance if I derail this, but I'm not sure what it would take to
> actually implement such an operation. The initial pushback might just be
> "use the block locations and assign the tablet yourself", since that's
> essentially what HBase does (not suggesting there isn't something better to
> do, just a hunch).
>
> IMO, we don't have a lot of information on locality presently. I was
> thinking it would be nice to create a tool to help us understand locality
> at all.
>
> My guess is that after this, our next big gain would be choosing a better
> candidate for where to move a tablet in the case of rebalancing, splits and
> previous-server failure (pretty much all of the times that we aren't/can't
> put the tablet back to its previous loc). I'm not sure how far this would
> get us combined with the favored nodes API, e.g. a Tablet has some favored
> datanodes which we include in the HDFS calls and we can try to put the
> tablet on one of those nodes and assume that HDFS will have the blocks
> there.
>
> tl;dr I'd want to have examples of how the current API is
> insufficient before lobbying for new HDFS APIs.



Having some examples of how the status quo is insufficient is a good idea.
I was trying to think of situations where there is no suitable node that
has *all* of a tablet's file blocks local.  In these situations the best we
can hope for is a node that has the largest subset of a tablet's file
blocks. I think the following scenarios can create a situation where no
node has all of a tablet's file blocks.

 * Added X new tablet servers.  Tablets were moved in order to evenly
spread tablets.
 * A lot of tablets in a table just split.  In order to evenly spread
tablets across the cluster, they need to be moved.
 * Decommissioned X tablet servers.  Tablets were moved in order to evenly
spread tablets.
 * A tablet has been hosted on multiple tablet servers, and as a result
there is no single datanode that has all of its file blocks.
 * Tablet servers run on a subset of the datanodes.  As the ratio of
tservers to datanodes goes down, the ability to find a datanode with many
of a tablet's file blocks goes down.
 * Decommissioning or adding datanodes could also throw off a tablet's
locality.

Are there other cases I am missing?
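As a starting point for the measurement tool idea, something like the
following could report how many of a tablet's block replicas each datanode
holds; the host with the max count is the best assignment candidate.
FileSystem.getFileBlockLocations and BlockLocation are existing HDFS client
APIs, while the surrounding tool is hypothetical:

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TabletLocalityReport {

  // Maps datanode host -> number of the tablet's block replicas it holds.
  static Map<String,Integer> blockCountsByHost(FileSystem fs, List<Path> tabletFiles)
      throws IOException {
    Map<String,Integer> counts = new HashMap<>();
    for (Path file : tabletFiles) {
      FileStatus status = fs.getFileStatus(file);
      for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
        for (String host : loc.getHosts()) {
          counts.merge(host, 1, Integer::sum);
        }
      }
    }
    return counts;
  }
}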


Re: [DISCUSS] HDFS operation to support Accumulo locality

Posted by Sean Busbey <bu...@cloudera.com>.
On Tue, Jun 30, 2015 at 11:42 AM, Keith Turner <ke...@deenlo.com> wrote:

> On Tue, Jun 30, 2015 at 12:25 PM, Josh Elser <jo...@gmail.com> wrote:
>
> > Well then, HDFS-4606 reads very similar...
>
> Yeah, I made the one and only vote for the issue :)
>

Allow me to help give you a bump then. :)

-- 
Sean

Re: [DISCUSS] HDFS operation to support Accumulo locality

Posted by Keith Turner <ke...@deenlo.com>.
On Tue, Jun 30, 2015 at 12:25 PM, Josh Elser <jo...@gmail.com> wrote:

>
> Sean Busbey wrote:
>
>>  Also HDFS-4606
>
> Well then, HDFS-4606 reads very similar...
>

Yeah, I made the one and only vote for the issue :)

Re: [DISCUSS] HDFS operation to support Accumulo locality

Posted by Josh Elser <jo...@gmail.com>.

Sean Busbey wrote:
> Also HDFS-4606
>

Well then, HDFS-4606 reads very similar...

Re: [DISCUSS] HDFS operation to support Accumulo locality

Posted by Sean Busbey <bu...@cloudera.com>.
On Tue, Jun 30, 2015 at 10:50 AM, Mike Drob <ma...@cloudera.com> wrote:

>
> HBASE-4114 or SOLR-7458 are good things to look at here.
>
>
Also HDFS-4606

-- 
Sean

Re: [DISCUSS] HDFS operation to support Accumulo locality

Posted by Mike Drob <ma...@cloudera.com>.
On Tue, Jun 30, 2015 at 10:39 AM, Josh Elser <jo...@gmail.com> wrote:

> Sorry in advance if I derail this, but I'm not sure what it would take to
> actually implement such an operation. The initial pushback might just be
> "use the block locations and assign the tablet yourself", since that's
> essentially what HBase does (not suggesting there isn't something better to
> do, just a hunch).
>
> IMO, we don't have a lot of information on locality presently. I was
> thinking it would be nice to create a tool to help us understand locality
> at all.
>

HBASE-4114 or SOLR-7458 are good things to look at here.


Re: [DISCUSS] HDFS operation to support Accumulo locality

Posted by Josh Elser <jo...@gmail.com>.
Sorry in advance if I derail this, but I'm not sure what it would take 
to actually implement such an operation. The initial pushback might just 
be "use the block locations and assign the tablet yourself", since 
that's essentially what HBase does (not suggesting there isn't something 
better to do, just a hunch).

IMO, we don't have a lot of information on locality presently. I was 
thinking it would be nice to create a tool to help us understand 
locality at all.

My guess is that after this, our next big gain would be choosing a 
better candidate for where to move a tablet in the case of rebalancing, 
splits and previous-server failure (pretty much all of the times that we 
aren't/can't put the tablet back to its previous loc). I'm not sure how 
far this would get us combined with the favored nodes API, e.g. a Tablet 
has some favored datanodes which we include in the HDFS calls and we can 
try to put the tablet on one of those nodes and assume that HDFS will 
have the blocks there.
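For reference, a minimal sketch of what using that hint might look like,
assuming the DistributedFileSystem create() overload that accepts favored
nodes (the same hint HBase uses); the wrapper itself is hypothetical, not
an existing Accumulo call site:

import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class FavoredNodeWriter {

  // Create a tablet file, hinting HDFS to place replicas on the tablet's
  // favored datanodes. HDFS treats favored nodes as a hint, not a
  // guarantee, so locality still has to be verified afterwards.
  static FSDataOutputStream createWithFavoredNodes(DistributedFileSystem dfs,
      Path file, InetSocketAddress[] favoredNodes) throws IOException {
    return dfs.create(file, FsPermission.getFileDefault(), true /* overwrite */,
        dfs.getConf().getInt("io.file.buffer.size", 4096), (short) 3,
        dfs.getDefaultBlockSize(file), null /* progress */, favoredNodes);
  }
}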

tl;dr I'd want to have examples of how the current API is
insufficient before lobbying for new HDFS APIs.



Re: [DISCUSS] HDFS operation to support Accumulo locality

Posted by Keith Turner <ke...@deenlo.com>.
I just thought of one potential issue with this.  The same file can be
shared by multiple tablets on different tservers.  If there are more than
3 tablets sharing a file, it could cause problems if all of them request a
local replica.  So if HDFS had this operation, Accumulo would have to be
careful about which files it requested local blocks for.
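A hypothetical sketch of the kind of guard I mean, assuming we can build a
file -> referencing-tservers map from the metadata table:

import java.util.Map;
import java.util.Set;

public class LocalReplicaGuard {

  static final int REPLICATION = 3;

  // With replication factor 3, more than 3 tservers asking for a local
  // replica of the same file cannot all be satisfied, so only request
  // locality for files referenced by at most 3 tservers.
  static boolean safeToRequestLocalReplica(Map<String,Set<String>> fileReferences,
      String file) {
    Set<String> servers = fileReferences.get(file);
    return servers != null && servers.size() <= REPLICATION;
  }
}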
