You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Glenn Justo Galvizo (Jira)" <ji...@apache.org> on 2021/10/06 22:25:00 UTC
[jira] [Created] (ASTERIXDB-2971) UNION ALL query giving wrong
result
Glenn Justo Galvizo created ASTERIXDB-2971:
----------------------------------------------
Summary: UNION ALL query giving wrong result
Key: ASTERIXDB-2971
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2971
Project: Apache AsterixDB
Issue Type: Bug
Components: COMP - Compiler
Reporter: Glenn Justo Galvizo
Attachments: non-union.plan, union-all.plan
Using the DDLs:
{code:java}
DROP DATAVERSE Yelp IF EXISTS;
CREATE DATAVERSE Yelp;
USE Yelp;
CREATE TYPE GenericType
AS { _id: uuid };
CREATE DATASET Businesses (GenericType)
PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Checkins (GenericType)
PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Friends (GenericType)
PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Reviews (GenericType)
PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Tips (GenericType)
PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET Users (GenericType)
PRIMARY KEY _id AUTOGENERATED;
CREATE INDEX businessesPrimaryIdx
ON Businesses ( business_id : string );
CREATE INDEX checkinBusinessIDIdx
ON Checkins ( business_id : string );
CREATE INDEX friendsUserIDIdx
ON Friends ( user_id : string );
CREATE INDEX reviewsPrimaryIdx
ON Reviews ( review_id : string );
CREATE INDEX tipsUserIDIdx
ON Tips ( user_id : string );
CREATE INDEX usersPrimaryIdx
ON Users ( user_id : string );
{code}
The following query returns the correct count results:
{code:java}
WITH U1 AS (
FROM (FROM Yelp.Reviews R SELECT VALUE R LIMIT 100000) R,
Yelp.Users U1, Yelp.Tips T, Yelp.Checkins C
WHERE R.user_id = U1.user_id AND
R.business_id = C.business_id AND
R.user_id = T.user_id AND
LEN(U1.friends) BETWEEN 10 AND 30
GROUP BY U1.user_id
LET reviewCount = COUNT(R._id),
tipCount = COUNT(T._id)
HAVING reviewCount > 10 AND
tipCount > 10
SELECT VALUE U1.user_id
LIMIT 15
),
U1F AS (
FROM U1, Yelp.Users UU1, UU1.friends U1F
WHERE TO_STRING(U1) /* +indexnl */ = UU1.user_id
SELECT DISTINCT VALUE U1F
)
SELECT VALUE {
"u1Count": ( FROM U1 SELECT VALUE COUNT(*) )[0],
"u1FCount": ( FROM U1F SELECT VALUE COUNT(*) )[0]
}{code}
But if I were UNION ALL these values instead (have two documents instead of one document with two fields), I get a different count for "u1FCount".
{code:java}
WITH U1 AS (
-- Better to use hash join here...
FROM (FROM Yelp.Reviews R SELECT VALUE R LIMIT 100000) R,
Yelp.Users U1, Yelp.Tips T, Yelp.Checkins C
WHERE R.user_id = U1.user_id AND
R.business_id = C.business_id AND
R.user_id = T.user_id AND
LEN(U1.friends) BETWEEN 10 AND 30
GROUP BY U1.user_id
LET reviewCount = COUNT(R._id),
tipCount = COUNT(T._id)
HAVING reviewCount > 10 AND
tipCount > 10
SELECT VALUE U1.user_id
LIMIT 15
),
U1F AS (
FROM U1, Yelp.Users UU1, UU1.friends U1F
WHERE TO_STRING(U1) /* +indexnl */ = UU1.user_id
SELECT DISTINCT VALUE U1F
)
SELECT VALUE {
"unionCounts": ( FROM U1 SELECT COUNT(*) AS u1Count UNION ALL FROM U1F SELECT COUNT(*) AS u1FCount )
}{code}
The generated plans for each is attached.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)