Posted to issues@hive.apache.org by "Sankar Hariappan (Jira)" <ji...@apache.org> on 2020/04/30 11:02:00 UTC
[jira] [Assigned] (HIVE-23230) "get_splits" UDF ignores limit clause while creating splits.
[ https://issues.apache.org/jira/browse/HIVE-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sankar Hariappan reassigned HIVE-23230:
---------------------------------------
Assignee: Sankar Hariappan (was: Adesh Kumar Rao)
> "get_splits" UDF ignores limit clause while creating splits.
> ------------------------------------------------------------
>
> Key: HIVE-23230
> URL: https://issues.apache.org/jira/browse/HIVE-23230
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 3.1.0
> Reporter: Adesh Kumar Rao
> Assignee: Sankar Hariappan
> Priority: Major
> Labels: UDF
> Fix For: 4.0.0
>
> Attachments: HIVE-23230.1.patch, HIVE-23230.2.patch, HIVE-23230.3.patch, HIVE-23230.4.patch, HIVE-23230.5.patch, HIVE-23230.patch
>
>
> Issue: Running the query {noformat}select * from <table> limit n{noformat} from Spark via the Hive Warehouse Connector may return more than "n" rows.
> This happens because the "get_splits" UDF creates splits while ignoring the limit constraint. When these splits are submitted to multiple LLAP daemons, each daemon returns up to "n" rows, so the client can receive more than "n" rows in total.
> How to reproduce: requires spark-shell, the Hive Warehouse Connector, and Hive on LLAP with more than one LLAP daemon running.
> Run the commands below via beeline to create and populate the table:
>
> {noformat}
> create table test (id int);
> insert into table test values (1);
> insert into table test values (2);
> insert into table test values (3);
> insert into table test values (4);
> insert into table test values (5);
> insert into table test values (6);
> insert into table test values (7);
> delete from test where id = 7;{noformat}
> Now, running the query below via spark-shell
> {noformat}
> import com.hortonworks.hwc.HiveWarehouseSession
> val hive = HiveWarehouseSession.session(spark).build()
> hive.executeQuery("select * from test limit 1").show()
> {noformat}
> will return more than 1 row.
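The per-split limit behavior described in the report can be sketched with a small simulation. This is a hypothetical illustration only, not the actual Hive/LLAP code: the function `rows_returned` and the sample split data are invented for demonstration, assuming each daemon independently applies the limit to its own split.

```python
# Hypothetical simulation of the bug (not actual Hive code): each LLAP
# daemon applies LIMIT n to its own split, rather than a single global
# limit being enforced across all splits.

def rows_returned(splits, limit):
    """Each split independently applies the limit; results are concatenated."""
    out = []
    for split in splits:
        out.extend(split[:limit])  # per-split limit, not a global one
    return out

# Two daemons and "limit 1": the client sees 2 rows instead of 1.
splits = [[1, 2, 3], [4, 5, 6]]
print(rows_returned(splits, 1))  # [1, 4] -> more than 1 row
```

With k daemons and "limit n", the client can receive up to k*n rows, which matches the behavior observed in the reproduction above.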
--
This message was sent by Atlassian Jira
(v8.3.4#803005)