You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@asterixdb.apache.org by "Glenn Justo Galvizo (Jira)" <ji...@apache.org> on 2022/09/10 02:56:00 UTC

[jira] [Updated] (ASTERIXDB-3059) EXCEPT in SelectClause

     [ https://issues.apache.org/jira/browse/ASTERIXDB-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glenn Justo Galvizo updated ASTERIXDB-3059:
-------------------------------------------
    Description: 
It would be nice to have an option to exclude certain fields when using SELECT *. [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few examples on how we can extend our SELECT clause to include 'EXCEPT':

 

Let's start off with some DDLs and some data:
{code:sql}
CREATE TYPE     GenericType AS { _id: uuid };
CREATE DATASET  Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
CREATE DATASET  Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;

INSERT INTO     Users [
  { "user_id": 1, "name": "Sally",
    "address": { "zip_code": "92512", "state": "CA" } },
  { "user_id": 2, "name": "Mark" },
  { "user_id": 3, "name": "John" },
  { "user_id": 4, "name": "Tracy", "title": "Dr" }
];
INSERT INTO     Friends [
  { "user_id": 1, "friend": 2 },
  { "user_id": 1, "friend": 3 },
  { "user_id": 2, "friend": 1 },
  { "user_id": 3, "friend": 1 }
];{code}
----
Now we can get into the examples:

1. Get me all users U1 that have friends. Do not include the F dataset. In the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast to BigQuery, we do not specify columns / fields in our exclusion list here, instead we specify an exclusion of _variables_ that would have otherwise been returned with our 'SELECT' clause.
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT * EXCEPT F;{code}
Returns the following:
{code:java}
> { "U1":  { "user_id": 1, "name": "Sally",
             "address": { "zip_code": "92512", "state": "CA" } } }
> { "U1": { "user_id": 2, "name": "Mark" } }
> { "U1": { "user_id": 3, "name": "John" } }{code}
2. Get me all users U1 that have friends. Do not include that user's address and title. The example below more closely aligns with BigQuery, as we specify columns / fields that we want to exclude from U1 in the exclusion list 'address' and 'title'.
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id 
SELECT DISTINCT U1.* EXCEPT address, title;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally" }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
3. If we want to exclude some field in a nested object, we can specify the fields within our object with the '.' syntax. Say we don't want to include the zipcode of an address object within a user that has friends. We would use the query below:
{code:sql}
FROM   Users U1, Friends F
WHERE  U1.user_id = F.user_id
SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
Returns the following:
{code:java}
> { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> { "user_id": 2, "name": "Mark" }
> { "user_id": 3, "name": "John" }{code}
 
----
The exact grammar I have in mind is given below, where we simply add the 'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2 as an example, a logical equivalent would be:
{code:sql}
FROM   (
  FROM   Users U1, Friends F
  WHERE  U1.user_id = F.user_id
  SELECT DISTINCT U1.* 
) TMP
SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}

  was:
It would be nice to have an option to exclude certain fields when using SELECT *. [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few examples on how we can extend our SELECT clause to include 'EXCEPT':

 

1. Get me all users U1 and their friends U2. Do not include the intermediate F dataset. In the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast to BigQuery, we do not specify columns / fields in our exclusion list here, instead we specify an exclusion of _variables_ that would have otherwise been returned with our 'SELECT' clause.
{code:sql}
FROM   Users U1, Friends F, Users U2
WHERE  U1.user_id = F.user_id AND
       F.friend = U2.user_id
SELECT * 
EXCEPT F;{code}
2. Get me all users U1 that have friends, and their friend's IDs. Do not include that user's last name and address. The example below more closely aligns with BigQuery, as we specify columns / fields that we want to exclude from U1 in the exclusion list 'last_name, address'.
{code:sql}
FROM   Users U1, Friends F, Users U2
WHERE  U1.user_id = F.user_id AND
       F.friend = U2.user_id 
SELECT U1.*,
       U2.user_id AS friend_user_id
EXCEPT last_name, address;{code}
3. If we are only interested in users U1 that have friends, then we can use an 'EXCEPT' on a 'SELECT VALUE'. 
{code:sql}
FROM   Users U1, Friends F, Users U2
WHERE  U1.user_id = F.user_id AND
       F.friend = U2.user_id 
SELECT DISTINCT VALUE U1 
EXCEPT last_name, address;{code}
4. If we want to exclude some field in a nested object, we can specify the fields within our object with the '.' syntax. Say we don't want to include the street code of an address object within a user that has friends. We would use the query below:
{code:sql}
FROM   Users U1, Friends F, Users U2
WHERE  U1.user_id = F.user_id AND
       F.friend = U2.user_id 
SELECT DISTINCT VALUE U1
EXCEPT address.zip_code;{code}
 

The exact grammar I have in mind is given below, where we simply add the 'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause surrounds our SELECT clause expression with a call to OBJECT_REMOVE_FIELDS. Using #3 as an example, a logical equivalent would be:
{code:sql}
FROM   Users U1, Friends F, Users U2
WHERE  U1.user_id = F.user_id AND
       F.friend = U2.user_id 
SELECT DISTINCT VALUE OBJECT_REMOVE_FIELDS(U1, ["last_name", "address"]);{code}


> EXCEPT in SelectClause
> ----------------------
>
>                 Key: ASTERIXDB-3059
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3059
>             Project: Apache AsterixDB
>          Issue Type: New Feature
>          Components: SQL - Translator SQL++
>            Reporter: Glenn Justo Galvizo
>            Assignee: Glenn Justo Galvizo
>            Priority: Major
>         Attachments: SelectClause-1.png
>
>
> It would be nice to have an option to exclude certain fields when using SELECT *. [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few examples on how we can extend our SELECT clause to include 'EXCEPT':
>  
> Let's start off with some DDLs and some data:
> {code:sql}
> CREATE TYPE     GenericType AS { _id: uuid };
> CREATE DATASET  Users (GenericType) PRIMARY KEY _id AUTOGENERATED;
> CREATE DATASET  Friends (GenericType) PRIMARY KEY _id AUTOGENERATED;
> INSERT INTO     Users [
>   { "user_id": 1, "name": "Sally",
>     "address": { "zip_code": "92512", "state": "CA" } },
>   { "user_id": 2, "name": "Mark" },
>   { "user_id": 3, "name": "John" },
>   { "user_id": 4, "name": "Tracy", "title": "Dr" }
> ];
> INSERT INTO     Friends [
>   { "user_id": 1, "friend": 2 },
>   { "user_id": 1, "friend": 3 },
>   { "user_id": 2, "friend": 1 },
>   { "user_id": 3, "friend": 1 }
> ];{code}
> ----
> Now we can get into the examples:
> 1. Get me all users U1 that have friends. Do not include the F dataset. In the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast to BigQuery, we do not specify columns / fields in our exclusion list here, instead we specify an exclusion of _variables_ that would have otherwise been returned with our 'SELECT' clause.
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT * EXCEPT F;{code}
> Returns the following:
> {code:java}
> > { "U1":  { "user_id": 1, "name": "Sally",
>              "address": { "zip_code": "92512", "state": "CA" } } }
> > { "U1": { "user_id": 2, "name": "Mark" } }
> > { "U1": { "user_id": 3, "name": "John" } }{code}
> 2. Get me all users U1 that have friends. Do not include that user's address and title. The example below more closely aligns with BigQuery, as we specify columns / fields that we want to exclude from U1 in the exclusion list 'address' and 'title'.
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id 
> SELECT DISTINCT U1.* EXCEPT address, title;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally" }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
> 3. If we want to exclude some field in a nested object, we can specify the fields within our object with the '.' syntax. Say we don't want to include the zipcode of an address object within a user that has friends. We would use the query below:
> {code:sql}
> FROM   Users U1, Friends F
> WHERE  U1.user_id = F.user_id
> SELECT DISTINCT U1.* EXCEPT address.zip_code;{code}
> Returns the following:
> {code:java}
> > { "user_id": 1, "name": "Sally", "address": { "state": "CA" } }
> > { "user_id": 2, "name": "Mark" }
> > { "user_id": 3, "name": "John" }{code}
>  
> ----
> The exact grammar I have in mind is given below, where we simply add the 'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2 as an example, a logical equivalent would be:
> {code:sql}
> FROM   (
>   FROM   Users U1, Friends F
>   WHERE  U1.user_id = F.user_id
>   SELECT DISTINCT U1.* 
> ) TMP
> SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)