Posted to github@arrow.apache.org by "NGA-TRAN (via GitHub)" <gi...@apache.org> on 2023/11/14 20:19:45 UTC

[I] Count distinct with date_part/date_bin does not work [arrow-datafusion]

NGA-TRAN opened a new issue, #8175:
URL: https://github.com/apache/arrow-datafusion/issues/8175

   ### Describe the bug
   
   After IOx recently upgraded DataFusion (DF), `count(distinct ...)` queries that group by an aliased `date_bin`/`date_part` expression started failing.
   
   
   
   ### To Reproduce
   
   After some investigation, here is a reproducer for the DataFusion CLI:
   
   ```SQL
   create table t1(state string, city string, min_temp float, area int, time timestamp) as values 
       ('MA', 'Boston', 70.4, 1, 50),
       ('MA', 'Bedford', 71.59, 2, 150);
   
   select date_part('year', time) as bla, count(distinct state) as count from t1 group by bla;
   -- Optimizer rule 'single_distinct_aggregation_to_group_by' failed caused by Schema error: No field named "date_part(Utf8(""year""),t1.time)". Valid fields are group_alias_0, "COUNT(DISTINCT t1.state)".
   
   -- this query has the same issue
   select date_bin(interval '1 year', time) as bla, count(distinct state) as count from t1 group by bla;
   ```
   
   ### Expected behavior
   
   Both queries should run without an optimizer error.
   
   ### Additional context
   
   After I backed out https://github.com/apache/arrow-datafusion/commit/15d8c9bf48a56ae9de34d18becab13fd1942dc4a locally, both queries work.
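   
   The error message suggests that `single_distinct_aggregation_to_group_by` rewrites the distinct aggregate into a nested group-by and renames the group expression to `group_alias_0`. A hand-written query of the same shape is sketched below as a possible workaround. It is untested against the affected version, and `dedup` is a hypothetical subquery alias chosen for illustration:
   
   ```SQL
   -- Sketch only: deduplicate (bla, state) pairs first, then count per group.
   -- count(state) skips NULLs, as count(distinct state) does.
   select bla, count(state) as count
   from (
       select distinct date_part('year', time) as bla, state
       from t1
   ) as dedup
   group by bla;
   ```
   
   Because the outer aggregate is no longer a distinct aggregate, the failing rule should not need to rewrite this form.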


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Count distinct with date_part/date_bin does not work [arrow-datafusion]

Posted by "NGA-TRAN (via GitHub)" <gi...@apache.org>.
NGA-TRAN commented on issue #8175:
URL: https://github.com/apache/arrow-datafusion/issues/8175#issuecomment-1811188501

   I am working on 2 PRs:
   
   1. Reverting https://github.com/apache/arrow-datafusion/commit/15d8c9bf48a56ae9de34d18becab13fd1942dc4a
   2. Adding the queries above as tests




Re: [I] Count distinct with date_part/date_bin does not work [arrow-datafusion]

Posted by "NGA-TRAN (via GitHub)" <gi...@apache.org>.
NGA-TRAN commented on issue #8175:
URL: https://github.com/apache/arrow-datafusion/issues/8175#issuecomment-1811185436

   CC @alamb @haohuaijin 




Re: [I] Regression: Count distinct with date_part/date_bin does not work [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #8175: Regression: Count distinct with date_part/date_bin does not work
URL: https://github.com/apache/arrow-datafusion/issues/8175

