
Posted to dev@hive.apache.org by "Rob Leidle (JIRA)" <ji...@apache.org> on 2016/03/21 20:26:25 UTC

[jira] [Created] (HIVE-13321) Add support for different output strategies

Rob Leidle created HIVE-13321:
---------------------------------

Summary: Add support for different output strategies
Key: HIVE-13321
URL: https://issues.apache.org/jira/browse/HIVE-13321
Project: Hive
Issue Type: Improvement
Reporter: Rob Leidle

The Hadoop ecosystem has expanded to support a wider variety of data stores and filesystems than just HDFS. These FileSystems offer different guarantees for write atomicity and read consistency. We can enhance Hive so that it works better with this wider variety of FileSystems. Related work is under way in the Hadoop project to support these FileSystems robustly. One example is HADOOP-9565, which enhances MapReduce output handling to do what is optimal for each FileSystem.

A common pattern in MapReduce and Hive is to write all output into a temporary folder and then rename that folder to the final output location. With some of the newer FileSystems, performance can be improved by writing output directly to the final location and skipping the temporary-folder write and rename. This matters most on object stores, where a rename is typically implemented as a copy followed by a delete rather than as a cheap metadata operation.
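To make the temporary-folder-and-rename pattern concrete, here is a minimal Python sketch against the local filesystem rather than a Hadoop FileSystem. The function `commit_via_rename` and the `_tmp_<uuid>` staging name are hypothetical illustrations, not Hive or MapReduce APIs, and the sketch assumes the final output directory does not exist yet.

```python
import os
import tempfile
import uuid


def commit_via_rename(output_dir: str, files: dict[str, bytes]) -> None:
    """Write every file into a staging folder, then rename it to output_dir.

    Hypothetical helper. On a local POSIX filesystem the directory rename is
    atomic, so readers of output_dir see either nothing or the full output.
    Assumes output_dir does not exist yet.
    """
    parent = os.path.dirname(os.path.abspath(output_dir))
    # Hypothetical staging name, placed next to the final location so the
    # rename stays within a single filesystem.
    staging = os.path.join(parent, f"_tmp_{uuid.uuid4().hex}")
    os.makedirs(staging)
    for name, data in files.items():
        with open(os.path.join(staging, name), "wb") as f:
            f.write(data)
    # Commit step: one rename publishes all of the output at once.
    os.rename(staging, output_dir)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    out = os.path.join(root, "table")
    commit_via_rename(out, {"part-00000": b"a\n", "part-00001": b"b\n"})
    print(sorted(os.listdir(out)))  # ['part-00000', 'part-00001']
```

Every byte is written twice in effect on a FileSystem whose rename is a copy plus delete, which is the overhead the proposal aims to avoid.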

The proposal is to enhance Hive to support different strategies for file output. One such strategy would be “DirectWrite”, which would be optional and likely enabled on a per-FileSystem basis. When DirectWrite is enabled, all Hive job output is written directly to the final output location.
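A minimal sketch of how a per-FileSystem choice between DirectWrite and the temporary-folder-and-rename pattern might look, again in Python on the local filesystem. `DIRECT_WRITE_SCHEMES`, `choose_strategy`, `direct_write`, and the strategy names are hypothetical; listing `s3a` is only an assumed example of a FileSystem where DirectWrite might be enabled, not an actual Hive setting.

```python
import os
import tempfile
from urllib.parse import urlparse

# Hypothetical setting: URI schemes for which DirectWrite is enabled.
# "s3a" is an assumed example, not a real Hive configuration value.
DIRECT_WRITE_SCHEMES = frozenset({"s3a"})


def choose_strategy(output_uri: str) -> str:
    """Return a hypothetical output-strategy name for a destination URI."""
    scheme = urlparse(output_uri).scheme or "file"
    return "direct_write" if scheme in DIRECT_WRITE_SCHEMES else "temp_and_rename"


def direct_write(output_dir: str, files: dict[str, bytes]) -> None:
    """DirectWrite sketch: create each file at its final location.

    No staging folder and no rename. Trade-off: readers may see partial
    output while the job runs, and a failed job can leave files behind.
    """
    os.makedirs(output_dir, exist_ok=True)
    for name, data in files.items():
        with open(os.path.join(output_dir, name), "wb") as f:
            f.write(data)


if __name__ == "__main__":
    for uri in ("s3a://bucket/warehouse/t", "hdfs://nn:8020/warehouse/t"):
        print(uri, "->", choose_strategy(uri))
    # Local demo of the DirectWrite path.
    dest = os.path.join(tempfile.mkdtemp(), "t")
    direct_write(dest, {"part-00000": b"x\n"})
    print(os.listdir(dest))  # ['part-00000']
```

Keying the choice on the URI scheme mirrors the "per-FileSystem basis" in the proposal. Because DirectWrite gives up the all-or-nothing visibility of the rename step, making it opt-in seems a sensible default.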

This is an umbrella JIRA for all the tasks related to this functionality.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)