You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/26 05:09:13 UTC

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Jimexist opened a new pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622


   # Which issue does this PR close?
   
   Closes #621
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622#issuecomment-868953367


   should we update physical column name as well for better readability in the query output? https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/planner.rs#L137-L139


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622#issuecomment-869153668


   > We do allow query outputs (i.e. record batch schema and physical plan schema) to contain columns with same names. This is quite common in join queries where more than one relations contain the same column name.
   > 
   > 
   > 
   > My previous comment was more around the inconsistency on how window function names are generated in logical plane v.s. physical plane. For example, in physical plane, we don't take `partition by clause, order by clause, and window frames` into account when creating the column name. If we don't want to maintain this consistency, I think we should update this invariant instead: https://github.com/apache/arrow-datafusion/blob/master/docs/specification/invariants.md#the-physical-schema-is-invariant-under-planning.
   > 
   > 
   > 
   > https://github.com/apache/arrow-datafusion/blob/master/docs/specification/output-field-name-semantic.md contains some rules on how we want to generate field names for physical schema. We don't have any rules for window functions at the moment, so maybe worth formalize it in that document as well.
   
   Thanks for the advice let me address in another pull request 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist edited a comment on pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Posted by GitBox <gi...@apache.org>.
Jimexist edited a comment on pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622#issuecomment-868956435


   > should we update physical column name as well for better readability in the query output? https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/planner.rs#L137-L139
   
   i think this is more of the strategy on how we name unnamed columns in the schema. for window functions without an alias, it'll be created with a unique name, given the function type, arg name, partition by clause, order by clause, and window frames. any of these are equivalent then the columns can be thought of the same and thus deduplicated in the projection phase, otherwise they will be planned separately.
   
   if any such columns are having a name conflict in the projection phase the schema validation logic will fail.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622#issuecomment-868956823


   if we are okay with dup names then i guess just keeping the function name would be enough


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622#issuecomment-868961912


   We do allow query outputs (i.e. record batch schema and physical plan schema) to contain columns with same names. This is quite common in join queries where more than one relations contain the same column name.
   
   My previous comment was more around the inconsistency on how window function names are generated in logical plane v.s. physical plane. For example, in physical plane, we don't take `partition by clause, order by clause, and window frames` into account when creating the column name. If we don't want to maintain this consistency, I think we should update this invariant instead: https://github.com/apache/arrow-datafusion/blob/master/docs/specification/invariants.md#the-physical-schema-is-invariant-under-planning.
   
   https://github.com/apache/arrow-datafusion/blob/master/docs/specification/output-field-name-semantic.md contains some rules on how we want to generate field names for physical schema. We don't have any rules for window functions at the moment, so maybe worth formalize it in that document as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622#issuecomment-868956556


   Postgres is totally okay with dup names in the projection phase:
   
   ```
   [postgres] # select max(c1) over (), max(c1) over (), max(c1) over (partition by c1) from test limit 3;
    max | max | max
   -----+-----+-----
    e   | e   | a
    e   | e   | a
    e   | e   | a
   (3 rows)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #622: fix 621, where unnamed window functions shall be differentiated by partition and order by clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #622:
URL: https://github.com/apache/arrow-datafusion/pull/622#issuecomment-868956435


   > should we update physical column name as well for better readability in the query output? https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/planner.rs#L137-L139
   
   i think this is more of the strategy on how we name unnamed columns in the schema. for window functions without an alias, it'll be created with a unique name, given the function type, arg name, partition by clause, order by clause, and window frames. any of these are equivalent then the columns can be thought of the same and thus deduplicated in the projection phase, otherwise they will be planned separately.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org