Posted to issues@hive.apache.org by "Krisztian Kasa (JIRA)" <ji...@apache.org> on 2019/06/14 10:52:00 UTC
[jira] [Commented] (HIVE-21547) Temp Tables: Use stORC format for temporary tables
[ https://issues.apache.org/jira/browse/HIVE-21547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863942#comment-16863942 ]
Krisztian Kasa commented on HIVE-21547:
---------------------------------------
[~gopalv]
I found that the property hive.exec.orc.delta.streaming.optimizations.enabled is used only when the table storage format is ORC and the table is ACID.
https://github.com/apache/hive/blob/f62379ba279f41b843fcd5f3d4a107b6fcd04dec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java#L358
Then I tried creating and inserting into normal and temp tables, but I didn't find any difference in storage format between them.
{code}
-- Settings needed for ACID tables
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.vectorized.execution.enabled=true;
-- Baseline: streaming optimizations left at the default value
create table table1(id int) stored as orc tblproperties("transactional"="true");
create temporary table tmptable1(id int) stored as orc tblproperties("transactional"="true");
insert into table table1 values (1),(2),(3),(4);
insert into table tmptable1 values (1),(2),(3),(4);
-- Same tables and data with streaming optimizations enabled
set hive.exec.orc.delta.streaming.optimizations.enabled=true;
create table table2(id int) stored as orc tblproperties("transactional"="true");
create temporary table tmptable2(id int) stored as orc tblproperties("transactional"="true");
insert into table table2 values (1),(2),(3),(4);
insert into table tmptable2 values (1),(2),(3),(4);
{code}
I used orc-tools to get the meta info from the ORC files that contain the tables' data:
{code}
orc-tools meta hive/itests/qtest/target/tmp/scratchdir/kkasa/3477e2d3-8551-48ec-bbbf-1967bd642d3a/_tmp_space.db/37118e31-0b83-41a6-b080-b8bb3dab30f8/delta_0000001_0000001_0000/bucket_00000
{code}
Only the file path differs between the meta info of table1 and tmptable1, and between the meta info of table2 and tmptable2.
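A comparison like this could be scripted roughly as follows. This is only a sketch: the helper names mask_path and compare_orc_meta are hypothetical, and it assumes that orc-tools is available on PATH as the command shown above and that the meta dump prints the file path verbatim.

```shell
# Hypothetical helper: read a meta dump on stdin and replace every literal
# occurrence of the given path with <PATH>, so dumps of two files can be diffed.
mask_path() {
  awk -v p="$1" '{
    out = ""
    rest = $0
    # index() does a literal (non-regex) search, so slashes and dots are safe
    while (length(p) > 0 && (i = index(rest, p)) > 0) {
      out = out substr(rest, 1, i - 1) "<PATH>"
      rest = substr(rest, i + length(p))
    }
    print out rest
  }'
}

# Hypothetical helper: diff the meta info of two ORC files, ignoring their paths.
# Assumes "orc-tools meta <file>" works as in the command above.
compare_orc_meta() {
  meta_a=$(mktemp) && meta_b=$(mktemp) || return 2
  orc-tools meta "$1" | mask_path "$1" > "$meta_a"
  orc-tools meta "$2" | mask_path "$2" > "$meta_b"
  diff "$meta_a" "$meta_b"
  status=$?
  rm -f "$meta_a" "$meta_b"
  return "$status"
}
```

Calling compare_orc_meta with the bucket file of a normal table and the bucket file of the matching temp table would print nothing and return 0 if the path is the only difference.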
Please help me to clarify the goal of this ticket:
Should hive.exec.orc.delta.streaming.optimizations.enabled always be true for temp tables?
What if the "stored as orc" and "tblproperties("transactional"="true")" clauses are not specified?
Should temp tables always be stored in ORC format?
> Temp Tables: Use stORC format for temporary tables
> --------------------------------------------------
>
> Key: HIVE-21547
> URL: https://issues.apache.org/jira/browse/HIVE-21547
> Project: Hive
> Issue Type: Improvement
> Components: ORC
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Priority: Major
>
> Using st(reaming)ORC (hive.exec.orc.delta.streaming.optimizations.enabled=true) format has massive performance advantages when creating data-sets which will not be stored for long-term.
> The format is compatible with ORC for vectorization and other features, while being cheaper to write out to filesystem.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)