Posted to issues@hive.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2018/03/01 01:04:00 UTC

[jira] [Commented] (HIVE-18822) INSERT VALUES - HoS + Steaming File Format

    [ https://issues.apache.org/jira/browse/HIVE-18822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381342#comment-16381342 ] 

Thejas M Nair commented on HIVE-18822:
--------------------------------------

This is not exactly what you are asking for, but FYI: with the [Streaming ingest feature (ACID)|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest], you can get more efficient inserts without running into the small-files issue. However, it requires the ORC file format, and it is not a SQL API.
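
To make the small-files concern concrete, here is a minimal local-filesystem sketch in Python. It is not Hive or HDFS code, and the function and file names (insert_as_new_file, insert_by_append, part-00000.txt) are hypothetical, chosen only for illustration. It contrasts writing one new file per INSERT with appending every INSERT to a single existing file, and counts the resulting files.

```python
import os
import tempfile


def insert_as_new_file(table_dir, rows, seq):
    """Simulate an INSERT that creates its own file (one small file per statement)."""
    path = os.path.join(table_dir, f"part-{seq:05d}.txt")
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(",".join(str(col) for col in row) + "\n")
    return path


def insert_by_append(table_dir, rows, name="part-00000.txt"):
    """Simulate an INSERT that appends to one existing file (created on first use)."""
    path = os.path.join(table_dir, name)
    with open(path, "a", encoding="utf-8") as f:
        for row in rows:
            f.write(",".join(str(col) for col in row) + "\n")
    return path


def simulate(num_inserts=100):
    """Run the same single-row inserts both ways and return the two file counts."""
    with tempfile.TemporaryDirectory() as new_dir, \
            tempfile.TemporaryDirectory() as append_dir:
        for i in range(num_inserts):
            rows = [(i, f"value-{i}")]
            insert_as_new_file(new_dir, rows, i)
            insert_by_append(append_dir, rows)
        return len(os.listdir(new_dir)), len(os.listdir(append_dir))


if __name__ == "__main__":
    new_files, appended_files = simulate()
    print(f"new file per insert: {new_files} files; append: {appended_files} file")
```

On a real cluster the file count matters more than it does locally, since each HDFS file adds metadata that the NameNode has to track. The sketch only shows how the count grows with the number of statements.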

> INSERT VALUES - HoS + Steaming File Format
> ------------------------------------------
>
>                 Key: HIVE-18822
>                 URL: https://issues.apache.org/jira/browse/HIVE-18822
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: BELUGA BEHR
>            Priority: Minor
>
> Please optimize the INSERT VALUES function.  When HoS is being used, and a streaming format such as TEXT or AVRO is being used, INSERT VALUES statements should be quick.  The HiveServer2 should pass the values to the Executor, and the Executor should simply append the data to an existing HDFS file instead of creating a new one.  This will reduce the number of small files that exist in the file system... or perhaps the HiveServer2 performs the append without having to first send the data to the processing engine at all.


