Posted to issues@hbase.apache.org by "Nihal Jain (JIRA)" <ji...@apache.org> on 2019/01/08 18:24:00 UTC

[jira] [Comment Edited] (HBASE-21672) Allow skipping HDFS block distribution computation

    [ https://issues.apache.org/jira/browse/HBASE-21672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737370#comment-16737370 ] 

Nihal Jain edited comment on HBASE-21672 at 1/8/19 6:23 PM:
------------------------------------------------------------

{quote}Shouldn't this either be a no-op for filesystems that don't have locality, or something we can just ask the filesystem?
{quote}
The file system does not report locality directly; HBase computes it. The computation is based on {{HDFSBlocksDistribution}}, which we build from the block location information returned by the underlying file system:
{code:java}
  // Builds the per-host block weight distribution for [start, start + length) of a file.
  public static HDFSBlocksDistribution computeHDFSBlocksDistribution(
    final FileSystem fs, FileStatus status, long start, long length)
    throws IOException {
    HDFSBlocksDistribution blocksDistribution = new HDFSBlocksDistribution();
    // Ask the underlying file system where each block in the range is stored.
    BlockLocation[] blockLocations =
      fs.getFileBlockLocations(status, start, length);
    for (BlockLocation bl : blockLocations) {
      // Credit every host holding a replica with the block's length.
      String[] hosts = bl.getHosts();
      long len = bl.getLength();
      blocksDistribution.addHostsAndBlockWeight(hosts, len);
    }
    return blocksDistribution;
  }
{code}
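To show what this distribution is then used for, here is a simplified sketch of deriving a per-host locality index from it: the bytes a host holds replicas of, divided by the total bytes of the unique blocks. The class {{BlockDistributionSketch}} and its {{localityIndex}} method are hypothetical stand-ins, not HBase's actual {{HDFSBlocksDistribution}} code; the hostnames and weights are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for HDFSBlocksDistribution (not HBase's real class).
public class BlockDistributionSketch {
    // Bytes of block data for which each host holds a replica.
    private final Map<String, Long> hostWeights = new HashMap<>();
    // Total bytes across all blocks, each block counted once regardless of replica count.
    private long uniqueBlocksTotalWeight = 0;

    // Record one block: every host holding a replica is credited with the block's length.
    public void addHostsAndBlockWeight(String[] hosts, long weight) {
        if (hosts != null) {
            for (String host : hosts) {
                hostWeights.merge(host, weight, Long::sum);
            }
        }
        uniqueBlocksTotalWeight += weight;
    }

    // Fraction of the file's bytes that are local to the given host, in [0, 1].
    public float localityIndex(String host) {
        if (uniqueBlocksTotalWeight == 0) {
            return 0.0f;
        }
        long hostWeight = hostWeights.getOrDefault(host, 0L);
        return (float) hostWeight / uniqueBlocksTotalWeight;
    }

    public static void main(String[] args) {
        BlockDistributionSketch d = new BlockDistributionSketch();
        // Two equally sized blocks, three replicas each (weights in MB for readability).
        d.addHostsAndBlockWeight(new String[] {"rs1", "rs2", "rs3"}, 128L);
        d.addHostsAndBlockWeight(new String[] {"rs1", "rs4", "rs5"}, 128L);
        System.out.println(d.localityIndex("rs1")); // 1.0
        System.out.println(d.localityIndex("rs2")); // 0.5
    }
}
```

On a file system without real locality, every step above (the location lookup and the per-host bookkeeping) is work whose result carries no useful information.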
 

I think this solution is fine and will be useful: when we know our file system cannot provide meaningful locality, creating this {{HDFSBlocksDistribution}} information only wastes CPU cycles. In fact, HBase already has something similar; see HBASE-18478.


was (Author: nihaljain.cs):
{quote}Shouldn't this either be a no-op for filesystems that don't have locality, or something we can just ask the filesystem?
{quote}
The file system does not report locality directly. We have some logic to calculate it in HBase, based on {{HDFSBlocksDistribution}} information, which we create using block location information returned by the underlying file system.

I think this solution should be fine and will be useful, given that we know our file system cannot provide useful locality and creating this {{HDFSBlocksDistribution}} information may waste CPU cycles. In fact, we already have something similar in HBase; see [HBASE-18478|https://issues.apache.org/jira/browse/HBASE-18478].

> Allow skipping HDFS block distribution computation
> --------------------------------------------------
>
>                 Key: HBASE-21672
>                 URL: https://issues.apache.org/jira/browse/HBASE-21672
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nihal Jain
>            Assignee: Nihal Jain
>            Priority: Major
>              Labels: S3
>
> We should have a configuration to skip HDFS block distribution calculation in HBase. For example, on file systems that do not surface locality, such as S3, calculating block distribution is not useful.
> Currently, we have no way to skip HDFS block distribution computation. For this, we can provide a new configuration key, say {{hbase.block.distribution.skip.computation}} (which would be {{false}} by default).
> Users of file systems such as S3 may set this to {{true}}, thus skipping block distribution computation.
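A minimal sketch of the proposed switch, assuming a plain {{Map<String, String>}} stands in for Hadoop's {{Configuration}} and a {{Supplier}} stands in for the {{fs.getFileBlockLocations}} call. Only the key name {{hbase.block.distribution.skip.computation}} and its {{false}} default come from the description above; {{SkippableDistribution}}, {{Block}} and {{compute}} are hypothetical names. The point it illustrates is that, when the flag is {{true}}, the block location lookup is never invoked at all.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; not HBase code. Map<String, String> stands in for Configuration.
public class SkippableDistribution {
    // Key name and default (false) as proposed in the issue description.
    static final String SKIP_KEY = "hbase.block.distribution.skip.computation";

    // Minimal block-location record: replica hosts plus block length in bytes.
    static final class Block {
        final String[] hosts;
        final long length;

        Block(String[] hosts, long length) {
            this.hosts = hosts;
            this.length = length;
        }
    }

    // Returns bytes per host. When the skip key is "true", the (possibly expensive)
    // location lookup is not called and an empty distribution is returned.
    static Map<String, Long> compute(Map<String, String> conf, Supplier<List<Block>> lookup) {
        boolean skip = Boolean.parseBoolean(conf.getOrDefault(SKIP_KEY, "false"));
        if (skip) {
            return Collections.emptyMap();
        }
        Map<String, Long> weights = new HashMap<>();
        for (Block b : lookup.get()) {
            for (String host : b.hosts) {
                weights.merge(host, b.length, Long::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        Supplier<List<Block>> lookup =
            () -> List.of(new Block(new String[] {"rs1", "rs2"}, 64L));
        System.out.println(compute(Map.of(), lookup).get("rs1"));                 // 64
        System.out.println(compute(Map.of(SKIP_KEY, "true"), lookup).isEmpty()); // true
    }
}
```

In HBase itself the flag would more likely be read with {{Configuration.getBoolean(key, false)}} and checked before calling {{computeHDFSBlocksDistribution}}, so HDFS deployments keep the current behavior by default.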



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)