You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Jorge Bernal (Jira)" <ji...@apache.org> on 2020/06/22 09:27:00 UTC
[jira] [Updated] (IMPALA-9877) uuid() generates multiple values per
row when joined from a CTE
[ https://issues.apache.org/jira/browse/IMPALA-9877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jorge Bernal updated IMPALA-9877:
---------------------------------
Description:
I'm not sure if I'm misunderstanding the execution model here, but I would expect that when I use uuid() on a CTE, it returns a different value for each row of the CTE, and not one value per row of the final join.
It's hard to explain, so here's an example:
{code:java}
drop table if exists events;
create table events (uid int, ts int);
insert into events values (1,1), (2,2), (1,3);
+-----+----+
| uid | ts |
+-----+----+
| 1 | 1 |
| 2 | 2 |
| 1 | 3 |
+-----+----+{code}
If you then use uuid() on a CTE, and join it with something else, the resulting values might have a different uuid for the same input row:
{code:java}
with sessions as (
select
uid,
uuid() session_id,
min(ts) session_start
from events
group by uid
)
select
session_id,
events.*
from sessions join events on sessions.uid = events.uid;
drop table if exists koke_events;
+--------------------------------------+-----+----+
| session_id | uid | ts |
+--------------------------------------+-----+----+
| 71d3671a-fa97-4c00-8d90-d1584ad7cf18 | 2 | 2 |
| af564c03-2db3-44cb-bf5b-7748e231af72 | 1 | 3 |
| d4bc2779-1bd9-4a5e-a5e2-dc28b00d2f34 | 1 | 1 |
+--------------------------------------+-----+----+
{code}
I would expect that rows with the same {{uid}} would have the same {{session_id}} in this result.
When I run the same query in Hive, that's the behavior:
{code:java}
+---------------------------------------+-------------+------------+--+
| session_id | events.uid | events.ts |
+---------------------------------------+-------------+------------+--+
| e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc | 1 | 1 |
| e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc | 1 | 3 |
| 72f8a922-738e-4151-ba93-3b5e50eb9140 | 2 | 2 |
+---------------------------------------+-------------+------------+--+
{code}
was:
I'm not sure if I'm misunderstanding the execution model here, but I would expect that when I use uuid() on a CTE, it returns a different value for each row of the CTE, and not one value per row of the final join.
It's hard to explain, so here's an example:
{code:java}
drop table if exists events;
create table events (uid int, ts int);
insert into events values (1,1), (2,2), (1,3);
+-----+----+
| uid | ts |
+-----+----+
| 1 | 1 |
| 2 | 2 |
| 1 | 3 |
+-----+----+{code}
If you then use uuid() on a CTE, and join it with something else, the resulting values might have a different uuid for the same input row:
{code:java}
with sessions as (
select
uid,
uuid() session_id,
min(ts) session_start
from events
group by uid
)
select
session_id,
events.*
from sessions join events on sessions.uid = events.uid;
drop table if exists koke_events;
+--------------------------------------+-----+----+
| session_id | uid | ts |
+--------------------------------------+-----+----+
| 71d3671a-fa97-4c00-8d90-d1584ad7cf18 | 2 | 2 |
| af564c03-2db3-44cb-bf5b-7748e231af72 | 1 | 3 |
| d4bc2779-1bd9-4a5e-a5e2-dc28b00d2f34 | 1 | 1 |
+--------------------------------------+-----+----+
{code}
I would expect that rows with the same {{uid}} would have the same {{session_id}} in this result.
When I run the same query in Hive, that's the behavior:
{code:java}
+---------------------------------------+-------------+------------+--+
| session_id | events.uid | events.ts |
+---------------------------------------+-------------+------------+--+
| e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc | 1 | 1 |
| e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc | 1 | 3 |
| 72f8a922-738e-4151-ba93-3b5e50eb9140 | 2 | 2 |
+---------------------------------------+-------------+------------+--+
{code}
> uuid() generates multiple values per row when joined from a CTE
> ---------------------------------------------------------------
>
> Key: IMPALA-9877
> URL: https://issues.apache.org/jira/browse/IMPALA-9877
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 2.12.0
> Environment: impalad version 2.12.0-cdh5.16.1 RELEASE (build 4a3775ef6781301af81b23bca45a9faeca5e761d)
> Reporter: Jorge Bernal
> Priority: Major
>
> I'm not sure if I'm misunderstanding the execution model here, but I would expect that when I use uuid() on a CTE, it returns a different value for each row of the CTE, and not one value per row of the final join.
> It's hard to explain, so here's an example:
>
> {code:java}
> drop table if exists events;
> create table events (uid int, ts int);
> insert into events values (1,1), (2,2), (1,3);
> +-----+----+
> | uid | ts |
> +-----+----+
> | 1 | 1 |
> | 2 | 2 |
> | 1 | 3 |
> +-----+----+{code}
>
> If you then use uuid() on a CTE, and join it with something else, the resulting values might have a different uuid for the same input row:
> {code:java}
> with sessions as (
> select
> uid,
> uuid() session_id,
> min(ts) session_start
> from events
> group by uid
> )
> select
> session_id,
> events.*
> from sessions join events on sessions.uid = events.uid;
> drop table if exists koke_events;
> +--------------------------------------+-----+----+
> | session_id | uid | ts |
> +--------------------------------------+-----+----+
> | 71d3671a-fa97-4c00-8d90-d1584ad7cf18 | 2 | 2 |
> | af564c03-2db3-44cb-bf5b-7748e231af72 | 1 | 3 |
> | d4bc2779-1bd9-4a5e-a5e2-dc28b00d2f34 | 1 | 1 |
> +--------------------------------------+-----+----+
> {code}
> I would expect that rows with the same {{uid}} would have the same {{session_id}} in this result.
> When I run the same query in Hive, that's the behavior:
> {code:java}
> +---------------------------------------+-------------+------------+--+
> | session_id | events.uid | events.ts |
> +---------------------------------------+-------------+------------+--+
> | e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc | 1 | 1 |
> | e39555b2-c99b-4521-bfb4-9c4d0a5fd3cc | 1 | 3 |
> | 72f8a922-738e-4151-ba93-3b5e50eb9140 | 2 | 2 |
> +---------------------------------------+-------------+------------+--+
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org