Posted to user@hive.apache.org by no jihun <je...@gmail.com> on 2016/04/08 16:01:13 UTC
ORC compaction not happening.
Hello.
Can anyone give me some advice?
I am trying to make the following scenario work.
A. Create an ORC, bucketed table:

create table table_orc (field1 string, field2 string)  -- column types assumed for illustration
clustered by (field1, field2) into 64 buckets
stored as orc;
B. Add rows to table_orc *HOURLY*:

insert into table table_orc
select * from hourly_row_2016040821
distribute by field1, field2;
After creating the table with query A and then running query B once,
there is one file per bucket:
[inline image 1: directory listing showing one file per bucket]
Now, one hour later, I run query B again to import the next hour's data into the same table:

insert into table table_orc
select * from hourly_row_2016040822
distribute by field1, field2;
I expected to see some transaction files (delta files),
as the ORC documentation describes (https://orc.apache.org/docs/acid.html):
[inline image 2: delta file layout from the ORC ACID documentation]
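For reference, the layout described there looks roughly like this (illustrative paths, not my actual listing):

table_orc/delta_0000001_0000001/bucket_00000
table_orc/delta_0000002_0000002/bucket_00000
table_orc/base_0000002/bucket_00000    (written by a major compaction)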
But I only found XXXX_copy_i files:
[inline image 3: directory listing with _copy_i files]
And compaction never happens.
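For illustration, the per-bucket pattern after the second insert looks like this (hypothetical names matching the _copy_i pattern):

table_orc/000000_0
table_orc/000000_0_copy_1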
These are the ACID settings in Ambari:
[inline image 4: Ambari ACID configuration]
Is this the expected result?

How can I run multiple "insert into X select from Y" statements
and still end up with one file per bucket via compaction?
Is there no way to do this with insert queries?

Any advice would be appreciated.
Thank you.
Re: ORC compaction not happening.
Posted by Mich Talebzadeh <mi...@gmail.com>.
I am guessing here, but you may need to define the table as ORC transactional:
hive> show create table sales3;
OK
CREATE TABLE `sales3`(
  `prod_id` bigint,
  `cust_id` bigint,
  `time_id` timestamp,
  `channel_id` bigint,
  `promo_id` bigint,
  `quantity_sold` decimal(10,0),
  `amount_sold` decimal(10,0))
CLUSTERED BY (
  prod_id,
  cust_id,
  time_id,
  channel_id,
  promo_id)
INTO 256 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://rhes564:9000/user/hive/warehouse/oraclehadoop.db/sales3'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
  'numFiles'='512',
  'numRows'='5000000',
  'orc.compress'='SNAPPY',
  'rawDataSize'='0',
  'totalSize'='86027477',
  'transactional'='true',
  'transient_lastDdlTime'='1457429932')
Note the 'transactional'='true' property. With that set, the compactor Initiator
checks the table, as the metastore log shows:

2016-04-08T15:31:46,139 INFO [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact
oraclehadoop.sales3
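For your original table_orc that would look something like this (a minimal sketch; I am assuming string columns and typical setting values, so check the Hive ACID documentation for your version):

-- client/metastore settings needed for ACID and automatic compaction (values assumed)
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.txn.DbTxnManager;
SET hive.enforce.bucketing=true;          -- pre-Hive-2.0 releases only
SET hive.compactor.initiator.on=true;     -- on the metastore side
SET hive.compactor.worker.threads=1;

-- the table itself must be declared transactional (column types assumed)
CREATE TABLE table_orc (field1 string, field2 string)
CLUSTERED BY (field1, field2) INTO 64 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- a compaction can also be requested explicitly; this queues the request
-- for the compactor rather than running it synchronously
ALTER TABLE table_orc COMPACT 'major';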
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com