Posted to github@arrow.apache.org by "bellwether-softworks (via GitHub)" <gi...@apache.org> on 2023/06/22 12:44:13 UTC

[GitHub] [arrow-datafusion] bellwether-softworks opened a new issue, #6743: Trouble getting fancy with ARRAY_AGG

bellwether-softworks opened a new issue, #6743:
URL: https://github.com/apache/arrow-datafusion/issues/6743

   ### Describe the bug
   
   Using `ARRAY_AGG` with the `DISTINCT` modifier in SQL results in an error when the query also selects another aggregate field (for example `COUNT`).
   
   ### To Reproduce
   
   1. Establish example data:
     ```sql
     CREATE TABLE example(id INT, parent_id INT, tag VARCHAR) AS VALUES
         (1, 0, 'bob'),
         (2, 0, 'cat'),
         (3, 1, 'tom'),
         (4, 1, 'cat'),
         (5, 1, 'tom');
     ```
   2. Execute a query that uses `ARRAY_AGG` with `DISTINCT` alongside another aggregate:
     ```sql
     SELECT
         parent_id,
         COUNT(id) AS count_of,
         ARRAY_AGG(DISTINCT tag) AS tags
     FROM example
     GROUP BY parent_id;
     ```
   
   Executing the above results in the following message:
   ```
   ArrowError(ExternalError(Internal("Inconsistent types in ScalarValue::iter_to_array. Expected Utf8, got List([tom,cat])")))
   ```
   
   ### Expected behavior
   
   Desired output should be similar to the following:
   
   | parent_id | count_of | tags            |
   |-----------|----------|-----------------|
   | 1         | 3        | [tom, cat]      |
   | 0         | 2        | [bob, cat]      |
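   
   The intended semantics can be sketched outside of DataFusion. The following is a minimal, standard-library-only illustration, not DataFusion code: the `ROWS` constant and the `aggregate` function are hypothetical names, and keeping distinct tags in first-seen order is an assumption made only to match the table above (SQL does not guarantee an order here).
   
   ```rust
   use std::collections::BTreeMap;

   // Rows of the example table: (id, parent_id, tag).
   const ROWS: [(i32, i32, &str); 5] = [
       (1, 0, "bob"),
       (2, 0, "cat"),
       (3, 1, "tom"),
       (4, 1, "cat"),
       (5, 1, "tom"),
   ];

   // Groups rows by parent_id and returns (COUNT(id), distinct tags) per group.
   // Distinct tags are kept in first-seen order (an assumption of this sketch).
   fn aggregate(rows: &[(i32, i32, &'static str)]) -> BTreeMap<i32, (usize, Vec<&'static str>)> {
       let mut groups: BTreeMap<i32, (usize, Vec<&'static str>)> = BTreeMap::new();
       for &(_id, parent_id, tag) in rows {
           let entry = groups.entry(parent_id).or_insert((0, Vec::new()));
           entry.0 += 1; // COUNT(id): every id in the example is non-null
           if !entry.1.contains(&tag) {
               entry.1.push(tag); // ARRAY_AGG(DISTINCT tag)
           }
       }
       groups
   }

   fn main() {
       for (parent_id, (count_of, tags)) in aggregate(&ROWS) {
           println!("{parent_id} | {count_of} | {tags:?}");
       }
   }
   ```
   
   With the example data, group `0` yields a count of 2 with tags `bob` and `cat`, and group `1` yields a count of 3 with tags `tom` and `cat`.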
   
   
   ### Additional context
   
   Omitting either the `COUNT` field, or the `DISTINCT` clause in `ARRAY_AGG`, allows the query to complete successfully.
   
   The above was initially discovered in v17.0.0 and verified to still occur in v26.0.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6743: Trouble getting fancy with ARRAY_AGG (DISTINCT ARRAY_AGG)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6743:
URL: https://github.com/apache/arrow-datafusion/issues/6743#issuecomment-1605356775

   > Sorry, I asked a stupid question.😂
   
   Not at all -- we are all learning here together!




[GitHub] [arrow-datafusion] parkma99 commented on issue #6743: Trouble getting fancy with ARRAY_AGG

Posted by "parkma99 (via GitHub)" <gi...@apache.org>.
parkma99 commented on issue #6743:
URL: https://github.com/apache/arrow-datafusion/issues/6743#issuecomment-1603989175

   It probably panics here: https://github.com/apache/arrow-datafusion/blob/8be5a8ced87ccf4fb09d33f1f52c525027696e2b/datafusion/common/src/scalar.rs#L2317.
   
   I am confused about where `sv` is defined.
   
   I have the same question about https://github.com/apache/arrow-datafusion/blob/8be5a8ced87ccf4fb09d33f1f52c525027696e2b/datafusion/common/src/scalar.rs#L2285
   
   cc @alamb 




[GitHub] [arrow-datafusion] alamb commented on issue #6743: Trouble getting fancy with ARRAY_AGG (DISTINCT ARRAY_AGG)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6743:
URL: https://github.com/apache/arrow-datafusion/issues/6743#issuecomment-1604500576

   Thanks for the report @bellwether-softworks  -- I added this to https://github.com/apache/arrow-datafusion/issues/2326
   
   




[GitHub] [arrow-datafusion] alamb closed issue #6743: Trouble getting fancy with ARRAY_AGG (DISTINCT ARRAY_AGG)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6743: Trouble getting fancy with ARRAY_AGG (DISTINCT ARRAY_AGG)
URL: https://github.com/apache/arrow-datafusion/issues/6743




[GitHub] [arrow-datafusion] parkma99 commented on issue #6743: Trouble getting fancy with ARRAY_AGG (DISTINCT ARRAY_AGG)

Posted by "parkma99 (via GitHub)" <gi...@apache.org>.
parkma99 commented on issue #6743:
URL: https://github.com/apache/arrow-datafusion/issues/6743#issuecomment-1605300969

   > In the code you referenced, this is basically the catch-all, meaning "if the type doesn't match one of the other branches"
   
   Sorry, I asked a stupid question.😂




[GitHub] [arrow-datafusion] alamb commented on issue #6743: Trouble getting fancy with ARRAY_AGG (DISTINCT ARRAY_AGG)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6743:
URL: https://github.com/apache/arrow-datafusion/issues/6743#issuecomment-1604499977

   > I am confused about where `sv` is defined.
   
   In the code you referenced, `sv` is basically the catch-all arm of the `match`, meaning "if the type doesn't match one of the other branches". The pattern itself binds the unmatched value to the name `sv`.
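   
   To illustrate how such a catch-all arm works, here is a minimal sketch. It is not the actual `scalar.rs` code: the `Value` enum and the `expect_utf8` function are hypothetical stand-ins for `ScalarValue` and the conversion logic, and the error text only loosely imitates the message from the report.
   
   ```rust
   // Hypothetical stand-in for DataFusion's ScalarValue; not the real type.
   #[derive(Debug)]
   enum Value {
       Utf8(String),
       Int64(i64),
       List(Vec<String>),
   }

   // Accepts only Utf8 values; every other variant reaches the catch-all arm.
   fn expect_utf8(value: Value) -> Result<String, String> {
       match value {
           Value::Utf8(s) => Ok(s),
           // `sv` is not declared anywhere else: a bare identifier in a match
           // pattern matches anything and binds the unmatched value to that name.
           sv => Err(format!("Inconsistent types. Expected Utf8, got {:?}", sv)),
       }
   }

   fn main() {
       let list = Value::List(vec!["tom".to_string(), "cat".to_string()]);
       println!("{:?}", expect_utf8(Value::Utf8("cat".to_string())));
       println!("{:?}", expect_utf8(list));
       println!("{:?}", expect_utf8(Value::Int64(1)));
   }
   ```
   
   In this sketch a `List` value passed where `Utf8` is expected takes the `sv` arm and produces an error, which is the same shape of failure as the one reported above.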

