Posted to issues@hive.apache.org by "Sankar Hariappan (Jira)" <ji...@apache.org> on 2020/04/30 11:02:00 UTC

[jira] [Assigned] (HIVE-23230) "get_splits" UDF ignores limit clause while creating splits.

     [ https://issues.apache.org/jira/browse/HIVE-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan reassigned HIVE-23230:
---------------------------------------

    Assignee: Sankar Hariappan  (was: Adesh Kumar Rao)

> "get_splits" UDF ignores limit clause while creating splits.
> ------------------------------------------------------------
>
>                 Key: HIVE-23230
>                 URL: https://issues.apache.org/jira/browse/HIVE-23230
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.1.0
>            Reporter: Adesh Kumar Rao
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: UDF
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23230.1.patch, HIVE-23230.2.patch, HIVE-23230.3.patch, HIVE-23230.4.patch, HIVE-23230.5.patch, HIVE-23230.patch
>
>
> Issue: Running the query {noformat}select * from <table> limit n{noformat} from Spark via the Hive Warehouse Connector may return more than "n" rows.
> This happens because the "get_splits" UDF ignores the limit clause when creating splits. When these splits are submitted to multiple LLAP daemons, each split can return up to "n" rows, so the combined result can exceed the limit.
> How to reproduce: Requires spark-shell, hive-warehouse-connector, and Hive on LLAP with more than one LLAP daemon running.
> Run the commands below via beeline to create and populate the table:
>  
> {noformat}
> create table test (id int);
> insert into table test values (1);
> insert into table test values (2);
> insert into table test values (3);
> insert into table test values (4);
> insert into table test values (5);
> insert into table test values (6);
> insert into table test values (7);
> delete from test where id = 7;{noformat}
> Now run the query below via spark-shell:
> {noformat}
> import com.hortonworks.hwc.HiveWarehouseSession 
> val hive = HiveWarehouseSession.session(spark).build() 
> hive.executeQuery("select * from test limit 1").show()
> {noformat}
> The query returns more than 1 row.
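
The behaviour described above can be illustrated with a small, self-contained Scala model. This is only a sketch and not Hive or HWC code: the object `SplitLimitSketch`, its two functions, and the way rows are distributed across the two splits are all hypothetical, chosen purely to show why applying a limit inside each split independently can yield more than "n" rows overall.

```scala
// Hypothetical model, not Hive code: each inner Seq stands for the rows
// that one split (handled by one LLAP daemon) would scan.
object SplitLimitSketch {

  // Applies the limit independently inside every split, then concatenates
  // the per-split results. This mirrors the behaviour reported in the issue.
  def perSplitLimit(splits: Seq[Seq[Int]], n: Int): Seq[Int] =
    splits.flatMap(_.take(n))

  // Applies the limit once, over the combined output of all splits.
  // This is the result a user expects from "limit n".
  def globalLimit(splits: Seq[Seq[Int]], n: Int): Seq[Int] =
    splits.flatten.take(n)

  def main(args: Array[String]): Unit = {
    // Assumed distribution of the six remaining rows over two splits.
    val splits = Seq(Seq(1, 2, 3), Seq(4, 5, 6))
    println(perSplitLimit(splits, 1)) // List(1, 4)
    println(globalLimit(splits, 1))   // List(1)
  }
}
```

With two splits and a limit of 1, the per-split variant returns two rows while the global variant returns one, which matches the symptom in the reproduction steps.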



--
This message was sent by Atlassian Jira
(v8.3.4#803005)