You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Glenn Justo Galvizo (Jira)" <ji...@apache.org> on 2021/10/07 20:40:00 UTC

[jira] [Commented] (ASTERIXDB-2971) UNION ALL query giving wrong result

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425785#comment-17425785 ] 

Glenn Justo Galvizo commented on ASTERIXDB-2971:
------------------------------------------------

The workaround is to use ARRAY_CONCAT.

> UNION ALL query giving wrong result
> -----------------------------------
>
>                 Key: ASTERIXDB-2971
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2971
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler
>            Reporter: Glenn Justo Galvizo
>            Priority: Major
>         Attachments: non-union.plan, union-all.plan
>
>
> Using the DDLs:
>  
> {code:java}
> DROP DATAVERSE Yelp IF EXISTS;
> CREATE DATAVERSE Yelp;
> USE Yelp;
> CREATE TYPE GenericType
> AS { _id: uuid };
> CREATE DATASET Businesses (GenericType)
> PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET Checkins (GenericType)
> PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET Friends (GenericType)
> PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET Reviews (GenericType)
> PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET Tips (GenericType)
> PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET Users (GenericType)
> PRIMARY KEY _id AUTOGENERATED;
> CREATE INDEX businessesPrimaryIdx
> ON Businesses ( business_id : string );
> CREATE INDEX checkinBusinessIDIdx
> ON Checkins ( business_id : string );
> CREATE INDEX friendsUserIDIdx
> ON Friends ( user_id : string );
> CREATE INDEX reviewsPrimaryIdx
> ON Reviews ( review_id : string );
> CREATE INDEX tipsUserIDIdx
> ON Tips ( user_id : string );
> CREATE INDEX usersPrimaryIdx
> ON Users ( user_id : string );
> {code}
>  
> The following query returns the correct count results:
>  
> {code:java}
> WITH U1 AS (
>  FROM (FROM Yelp.Reviews R SELECT VALUE R LIMIT 100000) R,
>  Yelp.Users U1, Yelp.Tips T, Yelp.Checkins C
>  WHERE R.user_id = U1.user_id AND
>  R.business_id = C.business_id AND
>  R.user_id = T.user_id AND
>  LEN(U1.friends) BETWEEN 10 AND 30
>  GROUP BY U1.user_id
>  LET reviewCount = COUNT(R._id),
>  tipCount = COUNT(T._id)
>  HAVING reviewCount > 10 AND
>  tipCount > 10
>  SELECT VALUE U1.user_id
>  LIMIT 15
>  ),
>  U1F AS (
>  FROM U1, Yelp.Users UU1, UU1.friends U1F
>  WHERE TO_STRING(U1) /* +indexnl */ = UU1.user_id
>  SELECT DISTINCT VALUE U1F
>  )
> SELECT VALUE {
>  "u1Count": ( FROM U1 SELECT VALUE COUNT(*) )[0],
>  "u1FCount": ( FROM U1F SELECT VALUE COUNT(*) )[0]
> }{code}
>  
> But if I were UNION ALL these values instead (have two documents instead of one document with two fields), I get a different count for "u1FCount".
>  
> {code:java}
> WITH U1 AS (
>  -- Better to use hash join here...
>  FROM (FROM Yelp.Reviews R SELECT VALUE R LIMIT 100000) R,
>  Yelp.Users U1, Yelp.Tips T, Yelp.Checkins C
>  WHERE R.user_id = U1.user_id AND
>  R.business_id = C.business_id AND
>  R.user_id = T.user_id AND
>  LEN(U1.friends) BETWEEN 10 AND 30
>  GROUP BY U1.user_id
>  LET reviewCount = COUNT(R._id),
>  tipCount = COUNT(T._id)
>  HAVING reviewCount > 10 AND
>  tipCount > 10
>  SELECT VALUE U1.user_id
>  LIMIT 15
>  ),
>  U1F AS (
>  FROM U1, Yelp.Users UU1, UU1.friends U1F
>  WHERE TO_STRING(U1) /* +indexnl */ = UU1.user_id
>  SELECT DISTINCT VALUE U1F
>  )
> SELECT VALUE {
>  "unionCounts": ( FROM U1 SELECT COUNT(*) AS u1Count UNION ALL FROM U1F SELECT COUNT(*) AS u1FCount )
> }{code}
>  
> The generated plans for each is attached.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)