Posted to dev@hive.apache.org by "Abdullah Yousufi (JIRA)" <ji...@apache.org> on 2016/07/05 22:45:11 UTC

[jira] [Created] (HIVE-14165) Enable faster S3 Split Computation by listing files in blocks

Abdullah Yousufi created HIVE-14165:
---------------------------------------

             Summary: Enable faster S3 Split Computation by listing files in blocks
                 Key: HIVE-14165
                 URL: https://issues.apache.org/jira/browse/HIVE-14165
             Project: Hive
          Issue Type: Improvement
    Affects Versions: 2.1.0
            Reporter: Abdullah Yousufi
            Assignee: Abdullah Yousufi


During split computation, when a large number of files must be listed from S3, executing one API call per file is slow. Listing can instead be done in blocks of up to 1000 files per API call, which reduces the time required for listing files.
Qubole has this optimization in place as detailed here: https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0
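
To illustrate the idea, here is a minimal sketch that compares one call per file with listing in blocks. It is not Hive or Hadoop code: FakeObjectStore, listPage, exists, and listAll are hypothetical names, and the store is an in-memory stand-in that only counts simulated round trips. Real S3 list responses carry an explicit truncation flag; this sketch assumes a short page means the listing is done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for an S3 bucket (assumption, not a real API).
class FakeObjectStore {
    private final TreeSet<String> keys = new TreeSet<>();
    int apiCalls = 0; // number of simulated round trips

    void put(String key) {
        keys.add(key);
    }

    // One simulated call per key, like a per-file status lookup.
    boolean exists(String key) {
        apiCalls++;
        return keys.contains(key);
    }

    // One simulated call returns up to maxKeys keys under prefix,
    // strictly after startAfter (null means start from the beginning).
    List<String> listPage(String prefix, String startAfter, int maxKeys) {
        apiCalls++;
        SortedSet<String> tail = (startAfter == null)
                ? keys.tailSet(prefix, true)
                : keys.tailSet(startAfter, false);
        List<String> page = new ArrayList<>();
        for (String k : tail) {
            // Keys are sorted, so the first non-matching key ends the prefix range.
            if (!k.startsWith(prefix) || page.size() == maxKeys) {
                break;
            }
            page.add(k);
        }
        return page;
    }
}

class BatchedListing {
    // Lists every key under prefix, one block of pageSize keys per call.
    static List<String> listAll(FakeObjectStore store, String prefix, int pageSize) {
        List<String> all = new ArrayList<>();
        String marker = null;
        while (true) {
            List<String> page = store.listPage(prefix, marker, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                break; // short page: assumed to be the last one
            }
            marker = page.get(page.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        FakeObjectStore store = new FakeObjectStore();
        for (int i = 0; i < 2500; i++) {
            store.put(String.format("warehouse/t/part-%05d", i));
        }

        List<String> files = listAll(store, "warehouse/t/", 1000);
        int batchedCalls = store.apiCalls;

        store.apiCalls = 0;
        for (String f : files) {
            store.exists(f); // one call per file
        }
        System.out.println("batched calls: " + batchedCalls
                + ", per-file calls: " + store.apiCalls);
    }
}
```

With 2500 files under the prefix and blocks of 1000, the sketch makes 3 list calls where the per-file approach makes 2500.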



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)