Posted to issues@hive.apache.org by "Sankar Hariappan (Jira)" <ji...@apache.org> on 2020/04/30 10:50:00 UTC

[jira] [Updated] (HIVE-23230) "get_splits" UDF ignores limit clause while creating splits.

     [ https://issues.apache.org/jira/browse/HIVE-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-23230:
------------------------------------
    Summary: "get_splits" UDF ignores limit clause while creating splits.  (was: "get_splits" udf ignores limit constraint while creating splits)

> "get_splits" UDF ignores limit clause while creating splits.
> ------------------------------------------------------------
>
>                 Key: HIVE-23230
>                 URL: https://issues.apache.org/jira/browse/HIVE-23230
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.1.0
>            Reporter: Adesh Kumar Rao
>            Assignee: Adesh Kumar Rao
>            Priority: Major
>         Attachments: HIVE-23230.1.patch, HIVE-23230.2.patch, HIVE-23230.3.patch, HIVE-23230.4.patch, HIVE-23230.5.patch, HIVE-23230.patch
>
>
> Issue: Running the query {noformat}select * from <table> limit n{noformat} from Spark via the Hive Warehouse Connector may return more than "n" rows.
> This happens because the "get_splits" UDF ignores the limit clause when it creates splits. When these splits are submitted to multiple LLAP daemons, each split can return up to "n" rows.
> How to reproduce: Requires spark-shell, hive-warehouse-connector, and Hive on LLAP with more than 1 LLAP daemon running.
> Run the commands below via beeline to create and populate the table:
>  
> {noformat}
> create table test (id int);
> insert into table test values (1);
> insert into table test values (2);
> insert into table test values (3);
> insert into table test values (4);
> insert into table test values (5);
> insert into table test values (6);
> insert into table test values (7);
> delete from test where id = 7;{noformat}
> Now run the query below via spark-shell:
> {noformat}
> import com.hortonworks.hwc.HiveWarehouseSession
> val hive = HiveWarehouseSession.session(spark).build()
> hive.executeQuery("select * from test limit 1").show()
> {noformat}
> The result contains more than 1 row.
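
The effect described in the issue can be illustrated with a small model that runs without Hive or Spark. This is only a sketch under stated assumptions: the object name, both helper functions, and the way the six rows are distributed over three splits are hypothetical and are not taken from the Hive code base or from any of the attached patches. It only shows why applying the limit inside each split, rather than once over the combined result, yields more than "n" rows.

```scala
// Hypothetical model, not Hive code: each inner Seq stands for the rows
// served by one split (for example, one split per LLAP daemon).
object LimitPerSplitSketch {
  // Apply the limit independently inside every split, then concatenate.
  // This mirrors the behaviour reported in the issue.
  def limitPerSplit(splits: Seq[Seq[Int]], n: Int): Seq[Int] =
    splits.flatMap(_.take(n))

  // Apply the limit once over the combined rows, which is what the
  // query author expects from "limit n".
  def limitGlobally(splits: Seq[Seq[Int]], n: Int): Seq[Int] =
    splits.flatten.take(n)

  def main(args: Array[String]): Unit = {
    // Rows 1..6 of the test table, spread over three assumed splits.
    val splits = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6))
    println(limitPerSplit(splits, 1).size) // prints 3: one row per split
    println(limitGlobally(splits, 1).size) // prints 1
  }
}
```

With three splits and "limit 1", the per-split variant returns three rows, matching the symptom seen through spark-shell, while the global variant returns one.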



--
This message was sent by Atlassian Jira
(v8.3.4#803005)