You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Jorge Bernal (Jira)" <ji...@apache.org> on 2020/06/22 09:26:00 UTC
[jira] [Created] (IMPALA-9877) uuid() generates multiple values per
row when joined from a CTE
Jorge Bernal created IMPALA-9877:
------------------------------------
Summary: uuid() generates multiple values per row when joined from a CTE
Key: IMPALA-9877
URL: https://issues.apache.org/jira/browse/IMPALA-9877
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 2.12.0
Environment: impalad version 2.12.0-cdh5.16.1 RELEASE (build 4a3775ef6781301af81b23bca45a9faeca5e761d)
Reporter: Jorge Bernal
I'm not sure if I'm misunderstanding the execution model here, but I would expect that when I use uuid() on a CTE, it returns a different value for each row of the CTE, and not one value per row of the final join.
It's hard to explain, so here's an example:
{code:java}
drop table if exists events;
create table events (uid int, ts int);
insert into events values (1,1), (2,2), (1,3);
+-----+----+
| uid | ts |
+-----+----+
| 1 | 1 |
| 2 | 2 |
| 1 | 3 |
+-----+----+{code}
If you then use uuid() on a CTE, and join it with something else, the resulting values might have a different uuid for the same input row:
{code:java}
with sessions as (
select
uid,
uuid() session_id,
min(ts) session_start
from events
group by uid
)
select
session_id,
events.*
from sessions join events on sessions.uid = events.uid;
drop table if exists koke_events;
+--------------------------------------+-----+----+
| session_id | uid | ts |
+--------------------------------------+-----+----+
| 71d3671a-fa97-4c00-8d90-d1584ad7cf18 | 2 | 2 |
| af564c03-2db3-44cb-bf5b-7748e231af72 | 1 | 3 |
| d4bc2779-1bd9-4a5e-a5e2-dc28b00d2f34 | 1 | 1 |
+--------------------------------------+-----+----+
{code}
I would expect that rows with the same {{uid}} would have the same {{session_id}} in this result.
When I run the same query in Hive, that's the behavior:
{code:java}
+---------------------------------------+-------------+------------+--+
| session_id | events.uid | events.ts |
+---------------------------------------+-------------+------------+--+
| e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc | 1 | 1 |
| e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc | 1 | 3 |
| 72f8a922-738e-4151-ba93-3b5e50eb9140 | 2 | 2 |
+---------------------------------------+-------------+------------+--+
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)