Posted to github@arrow.apache.org by "comphead (via GitHub)" <gi...@apache.org> on 2023/12/03 04:23:59 UTC

[I] Wrong behavior for `RANK over()` [arrow-datafusion]

comphead opened a new issue, #8403:
URL: https://github.com/apache/arrow-datafusion/issues/8403

   ### Describe the bug
   
   While working on #8386, I found that the `RANK` function gives a wrong result or fails.
   
   ### To Reproduce
   
   ```
   ❯ select rank() over ()  from (select 1 a union all select 2 a) q;
   Arrow error: Invalid argument error: number of columns(2) must match number of fields(1) in schema
   ❯ select rank() over (order by 1)  from (select 1 a union all select 2 a) q;
   +----------------------------------------------------------------------------------------+
   | RANK() ORDER BY [q.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
   +----------------------------------------------------------------------------------------+
   | 1                                                                                      |
   | 2                                                                                      |
   +----------------------------------------------------------------------------------------+
   ```
   
   ### Expected behavior
   
   The result should be
   
   ```
   1
   1
   ```
   
   for both cases
   
   ### Additional context
   
   _No response_
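
For reference, the expected output above follows from how `RANK` treats peer rows: rows that compare equal under the window's ORDER BY share a rank, and with no ORDER BY (or a constant one) every row in the partition is a peer. The sketch below is a minimal pure-Python model of that rule for a single partition. It is not DataFusion's implementation; the `rank` helper and its `order_key` parameter are hypothetical names used only for illustration, and it assumes the `1` in `order by 1` is read as a constant, as the expected output in the issue implies.

```python
from itertools import groupby


def rank(rows, order_key=None):
    """Model SQL RANK() over one partition (hypothetical helper).

    order_key=None models an empty OVER () clause. A key that returns
    the same value for every row models ORDER BY <constant>.
    Ranks are returned in window (sorted) order.
    """
    if order_key is None:
        # No ordering: all rows are peers, so all share rank 1.
        return [1] * len(rows)

    ordered = sorted(rows, key=order_key)
    ranks = []
    position = 1  # 1-based position of the first row of the current peer group
    for _, peers in groupby(ordered, key=order_key):
        group_size = len(list(peers))
        # Every peer gets the position of the first row in its group.
        ranks.extend([position] * group_size)
        position += group_size
    return ranks


rows = [1, 2]  # select 1 a union all select 2 a
print(rank(rows))                             # models rank() over ()
print(rank(rows, order_key=lambda a: 1))      # models order by a constant
print(rank(rows, order_key=lambda a: a))      # models order by a
```

Under this model the first two calls give rank 1 to both rows, while ordering by `a` separates them into ranks 1 and 2.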


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [BUG] Wrong behavior for `RANK over()` [arrow-datafusion]

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #8403:
URL: https://github.com/apache/arrow-datafusion/issues/8403#issuecomment-1837550537

   > > Also unstable behavior
   > 
   > It isn't clear to me that the order of the result from that query is well defined (as in I think it may not be a bug that the output is unstable)
   > 
   > Specifically:
   > 
   > 1. ORDER BY `20, a, 10, 'a', null` can produce any output order (as `20` is a constant)
   > 2. PARTITION BY `10, a, 'a', null` puts all values in the same partition
   > 
   > (as an aside these queries look like they are generated by a sql generator, are you testing out DataFusion with such a system?)
   
   You are right, @alamb. I was confused because I couldn't reproduce the same behavior in PG, but I managed to reproduce it in DuckDB, and the explanation sounds very reasonable to me.
   
   However, the main purpose of this ticket is still valid.




Re: [I] [BUG] Wrong behavior for `RANK over()` [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #8403:
URL: https://github.com/apache/arrow-datafusion/issues/8403#issuecomment-1837473877

   > Also unstable behavior
   
   It isn't clear to me that the order of the result from that query is well defined (as in I think it may not be a bug that the output is unstable)
   
   Specifically:
   1. ORDER BY `20, a, 10, 'a', null` can produce any output order (as `20` is a constant)
   2. PARTITION BY `10, a, 'a', null` puts all values in the same partition
   
   (as an aside these queries look like they are generated by a sql generator, are you testing out DataFusion with such a system?)




Re: [I] [BUG] Wrong behavior for `RANK over()` [arrow-datafusion]

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #8403:
URL: https://github.com/apache/arrow-datafusion/issues/8403#issuecomment-1837404220

   Also unstable behavior 
   ```
   DataFusion CLI v33.0.0
   ❯ select a, 
          rank() over (partition by 10, a, 'a', null order by 20, a, 10, 'a', null) rnk,
          row_number() over (partition by 10, a, 'a', null order by 20, a, 10, 'a', null) rn
          from (select 2 a union all select 1 a) q;
   +---+-----+----+
   | a | rnk | rn |
   +---+-----+----+
   | 2 | 1   | 1  |
   | 1 | 1   | 1  |
   +---+-----+----+
   2 rows in set. Query took 0.010 seconds.
   
   Restart cli
   
   DataFusion CLI v33.0.0
   ❯ select a, 
          rank() over (partition by 10, a, 'a', null order by 20, a, 10, 'a', null) rnk,
          row_number() over (partition by 10, a, 'a', null order by 20, a, 10, 'a', null) rn
          from (select 2 a union all select 1 a) q;
   +---+-----+----+
   | a | rnk | rn |
   +---+-----+----+
   | 1 | 1   | 1  |
   | 2 | 1   | 1  |
   +---+-----+----+
   2 rows in set. Query took 0.018 seconds.
   ```




Re: [I] [BUG] Wrong behavior for `RANK over()` [arrow-datafusion]

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan closed issue #8403: [BUG] Wrong behavior for `RANK over()`
URL: https://github.com/apache/arrow-datafusion/issues/8403

