Posted to dev@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2016/11/23 03:29:58 UTC

[jira] [Created] (HIVE-15268) limit+offset is broken (differently for ACID or not)

Sergey Shelukhin created HIVE-15268:
---------------------------------------

             Summary: limit+offset is broken (differently for ACID or not)
                 Key: HIVE-15268
                 URL: https://issues.apache.org/jira/browse/HIVE-15268
             Project: Hive
          Issue Type: Bug
            Reporter: Sergey Shelukhin


I think some part of the map-side limit handling implicitly assumes that CombineHiveInputFormat is in use; when splits are not combined, results are incorrect. Results are also incorrect for ORC, although in a different way, even though it seems like ORC should combine splits. I didn't fully investigate.
IIRC results are correct with text.

{noformat}
set hive.fetch.task.conversion=none;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE limitoffset_text (key STRING, value STRING) PARTITIONED BY (ds STRING, hr STRING);
CREATE TABLE limitoffset (key STRING, value STRING) PARTITIONED BY (ds STRING, hr STRING) STORED AS orc;
create table acid_dynamic(key STRING, value STRING) PARTITIONED BY (ds STRING, hr STRING) 
clustered by (key) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');

insert INTO TABLE limitoffset PARTITION (ds, hr) select * from srcpart;
insert INTO TABLE limitoffset_text PARTITION (ds, hr) select * from srcpart;
insert INTO TABLE acid_dynamic PARTITION (ds, hr) select * from srcpart;

select count(key) from limitoffset_text;
select count(key) from limitoffset;
select count(key) from acid_dynamic;

SELECT limitoffset_text.key FROM limitoffset_text LIMIT 490,200;
SELECT acid_dynamic.key FROM acid_dynamic LIMIT 490,200;
SELECT limitoffset.key FROM limitoffset LIMIT 490,200;
{noformat}
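
A possible sanity check, not part of the original repro: if each table holds at least 690 rows (the count queries above would confirm this), wrapping each LIMIT 490,200 query in a count should return 200 for all three tables, so any other value points at the affected format. This only checks the number of rows, not which rows come back, and it is an assumption that the subquery is planned the same way as the bare SELECT. The alias t is arbitrary.

{noformat}
-- Each query is expected to return 200 when the table has >= 690 rows.
SELECT count(*) FROM (SELECT key FROM limitoffset_text LIMIT 490,200) t;
SELECT count(*) FROM (SELECT key FROM acid_dynamic LIMIT 490,200) t;
SELECT count(*) FROM (SELECT key FROM limitoffset LIMIT 490,200) t;
{noformat}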



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)