Posted to dev@hive.apache.org by "Gabriel C Balan (JIRA)" <ji...@apache.org> on 2016/03/29 20:43:25 UTC

[jira] [Created] (HIVE-13377) Lost rows when using compact index on parquet table

Gabriel C Balan created HIVE-13377:
--------------------------------------

             Summary: Lost rows when using compact index on parquet table
                 Key: HIVE-13377
                 URL: https://issues.apache.org/jira/browse/HIVE-13377
             Project: Hive
          Issue Type: Bug
          Components: Indexing
    Affects Versions: 1.1.0
         Environment: linux, cdh 5.5.0
            Reporter: Gabriel C Balan
            Priority: Minor


A query with a WHERE clause on a Parquet table loses rows when a compact index is used. The same query produces the right results without the index.

{code}
create table small_parq(i int) stored as parquet;

insert into table small_parq values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11);

set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=50;

create index comp_idx on table small_parq (i) as 'compact' WITH DEFERRED REBUILD;
alter index comp_idx on small_parq rebuild;

select * from small_parq where i=3;
--this correctly produces 1 row (value 3).

select * from small_parq where i=11;
--this incorrectly produces 0 rows.

--I see correct results when looking for any value in [1,6];
--I see bad results (0 rows) when looking for any value in [7,11].

--All is well once I disable the compact index
set hive.optimize.index.filter.compact.minsize=50000000;
select * from small_parq where i=11;
--now it correctly produces 1 row (value 11).
{code}

I can't seem to reproduce this issue when the base table is stored as ORC, SEQUENCEFILE, AVRO, or TEXTFILE.
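
For comparison, a minimal sketch of the same steps run against an ORC table. The names small_orc and comp_idx_orc are hypothetical and chosen only for illustration; the expected row count reflects the observation above, not a separately captured run.

{code}
-- Same data and index setup as above, but with ORC as the storage format.
-- small_orc and comp_idx_orc are illustrative names, not from the original repro.
create table small_orc(i int) stored as orc;

insert into table small_orc values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11);

set hive.optimize.index.filter=true;
set hive.optimize.index.filter.compact.minsize=50;

create index comp_idx_orc on table small_orc (i) as 'compact' WITH DEFERRED REBUILD;
alter index comp_idx_orc on small_orc rebuild;

select * from small_orc where i=11;
--expected to produce 1 row (value 11), unlike the parquet table.
{code}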



