Posted to issues@hive.apache.org by "Krisztian Kasa (JIRA)" <ji...@apache.org> on 2019/06/14 10:52:00 UTC

[jira] [Commented] (HIVE-21547) Temp Tables: Use stORC format for temporary tables

    [ https://issues.apache.org/jira/browse/HIVE-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863942#comment-16863942 ] 

Krisztian Kasa commented on HIVE-21547:
---------------------------------------

[~gopalv]
I found that the property hive.exec.orc.delta.streaming.optimizations.enabled is used only when the table's storage format is ORC and the table is ACID (transactional):
https://github.com/apache/hive/blob/f62379ba279f41b843fcd5f3d4a107b6fcd04dec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java#L358

Then I tried creating and inserting into normal and temp tables, but I didn't find any difference in the storage format.
{code}
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.vectorized.execution.enabled=true;

create table table1(id int) stored as orc tblproperties("transactional"="true");
create temporary table tmptable1(id int) stored as orc tblproperties("transactional"="true");

insert into table table1 values (1),(2),(3),(4);
insert into table tmptable1 values (1),(2),(3),(4);

set hive.exec.orc.delta.streaming.optimizations.enabled=true;

create table table2(id int) stored as orc tblproperties("transactional"="true");
create temporary table tmptable2(id int) stored as orc tblproperties("transactional"="true");

insert into table table2 values (1),(2),(3),(4);
insert into table tmptable2 values (1),(2),(3),(4);
{code}

I used orc-tools to get the meta info from the ORC files that contain the tables' data:
{code}
orc-tools meta hive/itests/qtest/target/tmp/scratchdir/kkasa/3477e2d3-8551-48ec-bbbf-1967bd642d3a/_tmp_space.db/37118e31-0b83-41a6-b080-b8bb3dab30f8/delta_0000001_0000001_0000/bucket_00000
{code}

Only the file path differs between the meta info of table1 and tmptable1, and likewise between the meta info of table2 and tmptable2.
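
For anyone who wants to repeat the comparison, it could be scripted roughly as below. This is only a sketch under stated assumptions: mask_path and compare_meta are hypothetical helper names (not part of orc-tools or Hive), orc-tools is assumed to be on the PATH, and the file path is assumed to appear verbatim in the meta output.

```shell
# Hypothetical helper: replace every literal occurrence of a file path in the
# text read from stdin with a placeholder, so two dumps compare on content only.
mask_path() {
  awk -v p="$1" '
    p == "" { print; next }            # nothing to mask
    {
      out = ""
      # index() matches literally, so slashes and dots in the path are safe
      while ((i = index($0, p)) > 0) {
        out = out substr($0, 1, i - 1) "<ORC_FILE>"
        $0 = substr($0, i + length(p))
      }
      print out $0
    }'
}

# Hypothetical helper: diff the "orc-tools meta" output of two ORC files,
# ignoring the differences caused by the file paths themselves.
compare_meta() {
  a_dump=$(mktemp) && b_dump=$(mktemp) || return 2
  orc-tools meta "$1" | mask_path "$1" > "$a_dump"
  orc-tools meta "$2" | mask_path "$2" > "$b_dump"
  diff "$a_dump" "$b_dump"
  status=$?
  rm -f "$a_dump" "$b_dump"
  return "$status"                     # 0 means no difference besides the path
}
```

Calling compare_meta with the bucket file of table1 and the bucket file of tmptable1 should then print nothing if, as observed above, the path is the only difference.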

Please help me to clarify the goal of this ticket:
Should hive.exec.orc.delta.streaming.optimizations.enabled always be true for temp tables?
What if the "stored as orc" and "tblproperties("transactional"="true")" clauses are not specified?
Should temp tables always be stored in ORC format?


> Temp Tables: Use stORC format for temporary tables
> --------------------------------------------------
>
>                 Key: HIVE-21547
>                 URL: https://issues.apache.org/jira/browse/HIVE-21547
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>    Affects Versions: 4.0.0
>            Reporter: Gopal V
>            Priority: Major
>
> Using st(reaming)ORC (hive.exec.orc.delta.streaming.optimizations.enabled=true) format has massive performance advantages when creating data-sets which will not be stored for long-term.
> The format is compatible with ORC for vectorization and other features, while being cheaper to write out to filesystem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)