Posted to dev@hive.apache.org by "Peter Varga (Jira)" <ji...@apache.org> on 2020/09/14 16:45:00 UTC

[jira] [Created] (HIVE-24162) Query based compaction loses bloom filter

Peter Varga created HIVE-24162:
----------------------------------

             Summary: Query based compaction loses bloom filter
                 Key: HIVE-24162
                 URL: https://issues.apache.org/jira/browse/HIVE-24162
             Project: Hive
          Issue Type: Bug
            Reporter: Peter Varga
            Assignee: Peter Varga


*Steps to reproduce:*
  
{noformat}
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `bloomTest`(                          |
|   `msisdn` string,                                 |
|   `imsi` varchar(20),                              |
|   `imei` bigint,                                   |
|   `cell_id` bigint)                                |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION                                           |
|   's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'orc.bloom.filter.columns'='msisdn,cell_id,imsi',  |
|   'orc.bloom.filter.fpp'='0.02',                   |
|   'transactional'='true',                          |
|   'transactional_properties'='default',            |
|   'transient_lastDdlTime'='1597222946')            |
+----------------------------------------------------+

insert into  bloomTest values ("a", "b", 10, 20);
insert into  bloomTest values ("aa", "bb", 100, 200);
insert into  bloomTest values ("aaa", "bbb", 1000, 2000);

select * from bloomTest;
+-------------------+-----------------+-----------------+--------------------+
| bloomtest.msisdn  | bloomtest.imsi  | bloomtest.imei  | bloomtest.cell_id  |
+-------------------+-----------------+-----------------+--------------------+
| a                 | b               | 10              | 20                 |
| aa                | bb              | 100             | 200                |
| aaa               | bbb             | 1000            | 2000               |
+-------------------+-----------------+-----------------+--------------------+


{noformat}
 - Compact the table
{code:sql}
alter table bloomTest compact 'MAJOR';
{code}

 - Wait for the compaction to finish, then check the ORC files for bloom filters.
  
 - The delta files contain the bloom filters, but the base files written by the compaction do not.
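
One way to do the check in the last step is to dump each ORC file as text (for example with `hive --orcfiledump <file>`) and look for bloom filter streams. The following is a minimal sketch, not part of the original report. It assumes the dump lists streams on lines of the form `Stream: column N section KIND ...`; the helper name `bloom_filter_columns` and the two sample excerpts are hypothetical and hand-written for illustration, not real output from this table.

```python
import re

# Assumed shape of a stream line in the ORC file dump text, e.g.
#   Stream: column 7 section BLOOM_FILTER_UTF8 start: 120 length 45
STREAM_RE = re.compile(r"Stream: column (\d+) section (\S+)")


def bloom_filter_columns(dump_text):
    """Return the sorted ORC column ids that have a bloom filter stream."""
    columns = set()
    for match in STREAM_RE.finditer(dump_text):
        column_id = int(match.group(1))
        kind = match.group(2)
        # Covers both BLOOM_FILTER and BLOOM_FILTER_UTF8 stream kinds.
        if kind.startswith("BLOOM_FILTER"):
            columns.add(column_id)
    return sorted(columns)


# Hand-written, illustrative excerpts (hypothetical, not real dump output).
delta_dump = """
    Stream: column 7 section ROW_INDEX start: 3 length 20
    Stream: column 7 section BLOOM_FILTER_UTF8 start: 23 length 45
    Stream: column 8 section BLOOM_FILTER_UTF8 start: 68 length 45
    Stream: column 10 section BLOOM_FILTER start: 113 length 40
    Stream: column 7 section DATA start: 153 length 12
"""

base_dump = """
    Stream: column 7 section ROW_INDEX start: 3 length 20
    Stream: column 7 section DATA start: 23 length 12
    Stream: column 10 section DATA start: 35 length 9
"""

print("delta:", bloom_filter_columns(delta_dump))
print("base: ", bloom_filter_columns(base_dump))
```

An empty list for a base file, next to a non-empty list for the delta files of the same table, would match the behaviour described above.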



--
This message was sent by Atlassian Jira
(v8.3.4#803005)