Posted to issues@hive.apache.org by "Thomas Poepping (JIRA)" <ji...@apache.org> on 2016/12/21 13:12:59 UTC

[jira] [Commented] (HIVE-1620) Patch to write directly to S3 from Hive

    [ https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767027#comment-15767027 ] 

Thomas Poepping commented on HIVE-1620:
---------------------------------------

Hi Sahil,
 
Yes, direct write works well in production. There are definitely some difficult design decisions to be made, and as you say, there is no great solution for cleaning up after a failure. Some other issues are: data loss on self-referencing INSERT OVERWRITE, metadata loss in dynamic partitioning, and poor visibility of partial results. There are workarounds and best practices for these, though. We are happy to discuss the pros and cons.
 
The biggest thing we would like to stress with these implementations is that they should be pluggable. The solution should be as generic as possible to avoid spaghetti code.
 
We think the best way forward is to make this a conversation about the design. We are happy to participate in a community design and implementation, drawing on our experience with these types of issues.

> Patch to write directly to S3 from Hive
> ---------------------------------------
>
>                 Key: HIVE-1620
>                 URL: https://issues.apache.org/jira/browse/HIVE-1620
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly to S3.
> This patch allows users to specify an S3 location as the table output location and hence eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and more quickly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)