Posted to issues@hbase.apache.org by "Xiang Li (JIRA)" <ji...@apache.org> on 2017/09/12 09:07:00 UTC

[jira] [Commented] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat

    [ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162699#comment-16162699 ] 

Xiang Li commented on HBASE-15482:
----------------------------------

Hi, is this JIRA still valid? If so, I plan to work on it.

> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15482
>                 URL: https://issues.apache.org/jira/browse/HBASE-15482
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Liyin Tang
>            Priority: Minor
>
> When an MR job reads from SnapshotInputFormat, it needs to calculate the splits based on the block locations in order to get the best locality. However, this process may take a long time for large snapshots.
> In some setups, the computing layer (Spark, Hive or Presto) could run outside of the HBase cluster. In these scenarios, block locality doesn't matter. Therefore, it would be great to have an option to skip calculating the block locations for every job. That would be super useful for the Hive/Presto/Spark connectors.
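
For illustration, the requested behaviour could look roughly like the sketch below. This is not HBase code: the class, the LocationResolver interface and the option name "example.snapshot.locality.enabled" are all hypothetical, and a plain Map stands in for the job configuration. It only shows the idea of bypassing the per-region block-location lookup when the flag is turned off.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SnapshotSplitSketch {
    // Hypothetical option name, for illustration only (not an existing HBase setting).
    public static final String LOCALITY_ENABLED_KEY = "example.snapshot.locality.enabled";

    /** Stand-in for the expensive block-location lookup done per region. */
    public interface LocationResolver {
        List<String> bestHosts(String region);
    }

    /** One input split: a region plus its preferred hosts (empty if locality is skipped). */
    public static final class Split {
        public final String region;
        public final List<String> hosts;

        public Split(String region, List<String> hosts) {
            this.region = region;
            this.hosts = hosts;
        }
    }

    /**
     * Builds one split per region. Locality stays on by default, so existing
     * jobs keep their behaviour; setting the flag to "false" skips the lookup
     * and the resolver is never called.
     */
    public static List<Split> computeSplits(Map<String, String> conf,
                                            List<String> regions,
                                            LocationResolver resolver) {
        boolean localityEnabled =
            Boolean.parseBoolean(conf.getOrDefault(LOCALITY_ENABLED_KEY, "true"));
        List<Split> splits = new ArrayList<>();
        for (String region : regions) {
            List<String> hosts = localityEnabled
                ? resolver.bestHosts(region)
                : Collections.<String>emptyList();
            splits.add(new Split(region, hosts));
        }
        return splits;
    }
}
```

A job running outside the HBase cluster would set the flag to "false" and get splits with no host hints, leaving the scheduler free to place tasks anywhere.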



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)