You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "izveigor (via GitHub)" <gi...@apache.org> on 2023/06/06 10:39:15 UTC

[GitHub] [arrow-datafusion] izveigor opened a new issue, #6555: Implement `unnest` function

izveigor opened a new issue, #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555

   ### Is your feature request related to a problem or challenge?
   
   Follow on to https://github.com/apache/arrow-datafusion/pull/6384
   It would be nice to implement `unnest` function (with the properties like the analog in PostgreSQL) in `arrow-datafusion`.
   
   ### Describe the solution you'd like
   
   Main benefits for adding this feature:
   1) With `unnest` function we can use aggregate functions for arrays:
   ```
   SELECT sum(a) AS total FROM (SELECT unnest(make_array(3, 5, 6) AS a) AS b;
   ----
   14
   ```
   2) `unnest` function serves as an exchange between arrays and columns, we have 2 cases of behavior:
   - `unnest` with single argument
   - `unnest` with multiple argument (more than 1) (this form is only allowed in a query's FROM clause)
   
   Examples:
   ```
   unnest(make_array(1, 2))
   ----
   1
   2
   ```
   
   ```
   select * from unnest(make_array(1, 2, 3), make_array('h', 'e', 'l', 'l', 'o')
   ----
   1 h
   2 e
   3 l
     l
     o
   ```
   
   ### Describe alternatives you've considered
   
   For aggregate functions, we can create a lot of individual functions for aggregate functions (like `array_sum`), but I think this implementation would be too redundant.
   
   
   
   ### Additional context
   
   Similar Issues:
   https://github.com/apache/arrow-datafusion/issues/6119
   
   Similar PR:
   https://github.com/apache/arrow-datafusion/pull/6384
   https://github.com/apache/arrow-datafusion/pull/5106
   
   Links to sources:
   https://www.postgresql.org/docs/current/functions-array.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Implement `unnest` function [arrow-datafusion]

Posted by "jonahgao (via GitHub)" <gi...@apache.org>.
jonahgao commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-2044030176

   take


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on issue #6555: Implement `unnest` function

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-1601286281

   Yes @jayzhan211, I know about this feature: (See PR: https://github.com/apache/arrow-datafusion/pull/5106).
   It only works in Rust implementation (the function `unnest_columns`) but it doesn't in SQL. I only found the expression `InUnnest` in `sqlparser-rs`: https://docs.rs/sqlparser/latest/sqlparser/ast/enum.Expr.html#variant.InUnnest.
   Therefore, I wonder how we will recognize this token 🤔
   
   Also If we want to stick to the standard implementation of the function (MySQL, PostgreSQL and other),I think we should implement the scalar function as well.
   
   @jackwener @alamb @tustvold What do you think about it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jayzhan211 commented on issue #6555: Implement `unnest` function

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-1600201622

   I found that we can apply `LogicalPlan::Unnest` instead of implementing `Unnest` in ScalarFunction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] parkma99 commented on issue #6555: Implement `unnest` function

Posted by "parkma99 (via GitHub)" <gi...@apache.org>.
parkma99 commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-1583774382

   I'd like to work on this . Is anyone already preparing a PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jayzhan211 commented on issue #6555: Implement `unnest` function

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-1605206543

   I am considering applying unnest for each array and `join` them together, is there an existing plan for this joining?
   For example, unnest([1,2], [3,4,5]) is simplified to unnest_plan([1,2]), unnest_plan([3,4,5])
   ```
   Ok(Some(Union
     Unnest: unnest(make_array(Int64(1),Int64(2)),make_array(Int64(3),Int64(4)))_0
       Projection: List([1,2]) AS unnest(make_array(Int64(1),Int64(2)),make_array(Int64(3),Int64(4)))_0
         EmptyRelation
     Unnest: unnest(make_array(Int64(1),Int64(2)),make_array(Int64(3),Int64(4)))_1
       Projection: List([3,4]) AS unnest(make_array(Int64(1),Int64(2)),make_array(Int64(3),Int64(4)))_1
         EmptyRelation))
   ```
   
   For those Unnest, apply some kind of `Join` for them.  I did not find any JoinType can concat rows side by side. Are there any existing ways to achieve this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6555: Implement `unnest` function

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-1605454487

   > . I did not find any JoinType can concat rows side by side. Are there any existing ways to achieve this?
   
   Maybe you could use `row_number()` window function to generate a number for each subquery and then use that row number as the join key?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6555: Implement `unnest` function

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-1601645537

   > Also If we want to stick to the standard implementation of the function (MySQL, PostgreSQL and other),I think we should implement the scalar function as well.
   
   I don't understand what a `scalar function` would do -- isn't `unnest` effectively a table function (in the sense that it outputs a "table" (in DataFusion terms, a stream of `RecordBatch`es)
   
   Thus I wonder if the SQL planner (or maybe some optimizer pass) could replace all instances of `Expr::Unnest` with a subquery (which had the relevant `LogialPlan::Unnest`) 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Implement `unnest` function [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6555: Implement `unnest` function
URL: https://github.com/apache/arrow-datafusion/issues/6555


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jayzhan211 commented on issue #6555: Implement `unnest` function

Posted by "jayzhan211 (via GitHub)" <gi...@apache.org>.
jayzhan211 commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-1595608071

   The return type of Unnest might be Vec<ArrayRef>, a set of rows (Vec) with an array of columns (Array). https://github.com/apache/arrow-datafusion/blob/9b419b19a66bdd35e9e5c0bca259786f8f3c3965/datafusion/expr/src/columnar_value.rs#L32-L38
   
   Maybe we can add another type like RowsOfArray(Vec<ArrayRef>) for the rows-based return type? Is there other alternative return types of Unnest?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Implement `unnest` function [arrow-datafusion]

Posted by "jonahgao (via GitHub)" <gi...@apache.org>.
jonahgao commented on issue #6555:
URL: https://github.com/apache/arrow-datafusion/issues/6555#issuecomment-2044036550

   > - unnest with multiple argument (more than 1) (this form is only allowed in a query's FROM clause)
   
   I plan to implement this feature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org