Posted to github@arrow.apache.org by "web3creator (via GitHub)" <gi...@apache.org> on 2023/06/07 14:50:53 UTC

[GitHub] [arrow-datafusion] web3creator opened a new issue, #6586: Why is this group by statement inconsistent with what I expected

web3creator opened a new issue, #6586:
URL: https://github.com/apache/arrow-datafusion/issues/6586

   ### Describe the bug
   
   
   
   I modified this test case: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/csv_sql.rs
   ```
       let df = ctx
           .sql(
               "SELECT c1,c2 from aggregate_test_100  GROUP BY c1,c2",
           )
           .await?;
       df.show().await?;
   ```
   
   ### To Reproduce
   
   cargo run --example csv_sql
   
   ### Expected behavior
   
   
   Why is the result like this? Here c1 is not grouped:
   +----+----+
   | c1 | c2 |
   +----+----+
   | e  | 5  |
   | c  | 2  |
   | d  | 3  |
   | c  | 4  |
   | b  | 3  |
   | a  | 4  |
   | a  | 2  |
   | c  | 3  |
   | b  | 1  |
   | a  | 3  |
   | d  | 1  |
   | e  | 2  |
   | e  | 4  |
   | a  | 1  |
   | c  | 5  |
   | d  | 5  |
   | b  | 5  |
   | e  | 3  |
   | c  | 1  |
   | b  | 4  |
   | d  | 2  |
   | b  | 2  |
   | d  | 4  |
   | a  | 5  |
   | e  | 1  |
   +----+----+
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #6586: Why is this group by statement inconsistent with what I expected

Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #6586:
URL: https://github.com/apache/arrow-datafusion/issues/6586#issuecomment-1581698361

   Hi @web3creator,
   your query
   ```
               "SELECT c1,c2 from aggregate_test_100  GROUP BY c1,c2",
   ```
   groups by both c1 and c2, so you get every unique combination of c1 and c2. If you want to group by c1 only,
   you need `GROUP BY c1` and an aggregate function on c2, for example:
   
   ```
               "SELECT c1,sum(c2) from aggregate_test_100  GROUP BY c1",
   ```
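
To make the difference between the two queries concrete, here is a minimal sketch in plain Rust (standard library only, no DataFusion). The sample rows and the helper names `sample_rows`, `group_by_c1_c2`, and `sum_c2_by_c1` are hypothetical and are not taken from `aggregate_test_100`; the sketch only models what each `GROUP BY` returns.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical sample rows (c1, c2); NOT the real aggregate_test_100 data.
fn sample_rows() -> Vec<(&'static str, i64)> {
    vec![("a", 1), ("a", 1), ("a", 2), ("b", 1), ("b", 3), ("b", 3)]
}

/// Models `SELECT c1, c2 ... GROUP BY c1, c2`:
/// one output row per distinct (c1, c2) pair.
fn group_by_c1_c2(rows: &[(&'static str, i64)]) -> BTreeSet<(&'static str, i64)> {
    rows.iter().copied().collect()
}

/// Models `SELECT c1, sum(c2) ... GROUP BY c1`:
/// one output row per distinct c1, with c2 summed inside each group.
fn sum_c2_by_c1(rows: &[(&'static str, i64)]) -> BTreeMap<&'static str, i64> {
    let mut sums = BTreeMap::new();
    for &(c1, c2) in rows {
        *sums.entry(c1).or_insert(0) += c2;
    }
    sums
}
```

With these rows, `group_by_c1_c2` yields four pairs (the same c1 value appears on several rows), while `sum_c2_by_c1` yields exactly one row per c1 value.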




[GitHub] [arrow-datafusion] jiangzhx commented on issue #6586: Why is this group by statement inconsistent with what I expected

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #6586:
URL: https://github.com/apache/arrow-datafusion/issues/6586#issuecomment-1581811693

   Hi @web3creator,
   if your query is
   `SELECT c1,c2,c3 from aggregate_test_100  GROUP BY c1,c2,c3`
   you will get all unique combinations of c1, c2, and c3.
   Your result should look like this:
   
   
   c1 | c2 | c3
   -- | -- | --
   a | 1 | 2
   a | 2 | 2
   a | 2 | 4
   b | 1 | 3
   b | 1 | 4
   
   
   




[GitHub] [arrow-datafusion] web3creator commented on issue #6586: Why is this group by statement inconsistent with what I expected

Posted by "web3creator (via GitHub)" <gi...@apache.org>.
web3creator commented on issue #6586:
URL: https://github.com/apache/arrow-datafusion/issues/6586#issuecomment-1581843667

   > hi @web3creator . if your query like `SELECT c1,c2,c3 from aggregate_test_100 GROUP BY c1,c2,c3` will get all unique combinations for c1 and c2 and c3 your result should be like this.
   > 
   > c1	c2	c3
   > a	1	2
   > a	2	2
   > a	2	4
   > b	1	3
   > b	1	4
   
   @jiangzhx I modified this test case: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/csv_sql.rs
   ```
       let df = ctx
           .sql(
               "SELECT c1,c2,c3 from aggregate_test_100  GROUP BY c1,c2,c3",
           )
           .await?;
       df.show().await?;
   ```
   
   But the actual result is this, which is why I am asking:
   +----+----+------+
   | c1 | c2 | c3   |
   +----+----+------+
   | b  | 5  | -82  |
   | e  | 3  | 104  |
   | b  | 1  | 54   |
   | e  | 2  | 49   |
   | d  | 2  | 93   |
   | c  | 3  | 22   |
   | b  | 2  | 31   |
   | b  | 4  | 17   |
   | e  | 2  | -61  |
   | c  | 2  | -117 |
   | c  | 2  | 29   |
   | c  | 4  | 3    |
   | c  | 5  | 118  |
   | b  | 3  | -101 |
   | e  | 3  | -95  |
   | d  | 1  | 57   |
   | a  | 5  | 36   |
   | e  | 4  | -53  |
   | a  | 5  | -31  |
   | b  | 3  | 17   |
   | e  | 1  | 36   |
   | a  | 4  | -101 |
   | e  | 4  | 74   |
   | c  | 5  | -94  |
   | e  | 4  | 96   |
   | a  | 1  | 83   |
   | d  | 5  | -40  |
   | b  | 4  | -111 |
   | d  | 1  | 38   |
   | a  | 4  | -38  |
   | e  | 3  | 112  |
   | d  | 3  | 77   |
   | d  | 1  | -8   |
   | b  | 5  | 68   |
   | d  | 3  | -76  |
   | e  | 4  | 73   |
   | b  | 4  | -117 |
   | a  | 1  | -56  |
   | b  | 4  | -59  |
   | e  | 4  | 30   |
   | c  | 2  | 1    |
   | b  | 1  | 29   |
   | a  | 1  | -85  |
   | a  | 4  | -54  |
   | d  | 1  | -98  |
   | d  | 1  | -99  |
   | a  | 2  | 45   |
   | c  | 1  | 41   |
   | b  | 2  | 63   |
   | d  | 4  | 102  |
   | c  | 1  | -24  |
   | d  | 1  | 125  |
   | c  | 2  | -106 |
   | d  | 4  | 55   |
   | c  | 1  | 70   |
   | d  | 1  | -72  |
   | a  | 2  | -48  |
   | a  | 3  | -72  |
   | c  | 2  | -107 |
   | a  | 2  | -43  |
   | d  | 4  | 5    |
   | e  | 1  | 120  |
   | c  | 2  | -60  |
   | c  | 3  | 73   |
   | c  | 3  | -2   |
   | a  | 1  | -5   |
   | b  | 4  | 47   |
   | a  | 3  | 13   |
   | c  | 4  | 123  |
   | a  | 3  | 17   |
   | d  | 5  | -59  |
   | b  | 5  | 62   |
   | b  | 2  | 68   |
   | e  | 3  | 71   |
   | b  | 1  | 12   |
   | e  | 5  | -86  |
   | a  | 3  | -12  |
   | d  | 3  | 123  |
   | c  | 2  | -29  |
   | e  | 2  | 52   |
   | c  | 4  | -79  |
   | c  | 3  | 97   |
   | b  | 5  | -44  |
   | d  | 2  | 113  |
   | c  | 1  | 103  |
   | e  | 2  | 97   |
   | e  | 4  | -56  |
   | a  | 1  | -25  |
   | e  | 4  | 97   |
   | b  | 5  | -5   |
   | a  | 4  | 65   |
   | b  | 2  | -60  |
   | d  | 2  | 122  |
   | e  | 1  | 71   |
   | e  | 5  | 64   |
   | c  | 4  | -90  |
   | a  | 3  | 14   |
   | a  | 5  | -101 |
   +----+----+------+




[GitHub] [arrow-datafusion] web3creator commented on issue #6586: Why is this group by statement inconsistent with what I expected

Posted by "web3creator (via GitHub)" <gi...@apache.org>.
web3creator commented on issue #6586:
URL: https://github.com/apache/arrow-datafusion/issues/6586#issuecomment-1581142517

   I looked into it, and this seems to be caused by the output being spread across multiple partitions. What should I do to solve this problem?
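
For context: the 25 rows in the first result table are all distinct (c1, c2) pairs, so the grouping itself appears to have worked; what is missing is a defined row order. In SQL, `GROUP BY` alone does not promise any output order, so rows with the same c1 need not be adjacent, and appending `ORDER BY c1, c2` to the query is the usual remedy. The sketch below models this in plain Rust (standard library only, no DataFusion); `distinct_sorted` is a hypothetical helper, and the sample input in the usage note reuses a few pairs from the table above.

```rust
use std::collections::HashSet;

/// Models `GROUP BY c1, c2` followed by `ORDER BY c1, c2`.
/// Hypothetical helper for illustration; not a DataFusion API.
fn distinct_sorted(rows: &[(&'static str, i64)]) -> Vec<(&'static str, i64)> {
    // Grouping step: the hash set keeps one entry per distinct pair,
    // but its iteration order is unspecified.
    let groups: HashSet<(&'static str, i64)> = rows.iter().copied().collect();

    // Ordering step: sort by c1, then c2, so equal c1 values become adjacent.
    let mut out: Vec<_> = groups.into_iter().collect();
    out.sort();
    out
}
```

Calling `distinct_sorted(&[("e", 5), ("c", 2), ("d", 3), ("c", 4), ("b", 3), ("a", 4), ("c", 2)])` drops the repeated `("c", 2)` and returns the remaining six pairs sorted by c1 and then c2.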




[GitHub] [arrow-datafusion] web3creator commented on issue #6586: Why is this group by statement inconsistent with what I expected

Posted by "web3creator (via GitHub)" <gi...@apache.org>.
web3creator commented on issue #6586:
URL: https://github.com/apache/arrow-datafusion/issues/6586#issuecomment-1581799991

   > HI @web3creator your query
   > 
   > ```
   >             "SELECT c1,c2 from aggregate_test_100  GROUP BY c1,c2",
   > ```
   > 
   > groups by both c1, c2. in this case you will get all unique combinations for c1 and c2. if you want to group C1 only you will have to `GROUP BY c1,` and some aggr function on C2 like
   > 
   > ```
   >             "SELECT c1,sum(c2) from aggregate_test_100  GROUP BY c1",
   > ```
   
   @comphead I want to group by multiple columns at the same time, such as c1, c2, and c3.




[GitHub] [arrow-datafusion] web3creator commented on issue #6586: Why is this group by statement inconsistent with what I expected

Posted by "web3creator (via GitHub)" <gi...@apache.org>.
web3creator commented on issue #6586:
URL: https://github.com/apache/arrow-datafusion/issues/6586#issuecomment-1581143698

   Can anyone help me with this problem?




[GitHub] [arrow-datafusion] jiangzhx commented on issue #6586: Why is this group by statement inconsistent with what I expected

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #6586:
URL: https://github.com/apache/arrow-datafusion/issues/6586#issuecomment-1582002341

   
   
   
   > > hi @web3creator . if your query like `SELECT c1,c2,c3 from aggregate_test_100 GROUP BY c1,c2,c3` will get all unique combinations for c1 and c2 and c3 your result should be like this.
   > > c1	c2	c3
   > > a	1	2
   > > a	2	2
   > > a	2	4
   > > b	1	3
   > > b	1	4
   > 
   > hi,@jiangzhx I modified this test case,https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/csv_sql.rs
   > 
   > ```
   >     let df = ctx
   >         .sql(
   >             "SELECT c1,c2,c3 from aggregate_test_100  GROUP BY c1,c2,c3",
   >         )
   >         .await?;
   >     df.show().await;
   > ```
   > 
   > But the actual result is this.That's why 
   
   The csv_sql.rs result is correct.
   
   
   
   > hi @web3creator . if your query like `SELECT c1,c2,c3 from aggregate_test_100 GROUP BY c1,c2,c3` will get all unique combinations for c1 and c2 and c3 your result should be like this.
   > 
   > c1	c2	c3
   > a	1	2
   > a	2	2
   > a	2	4
   > b	1	3
   > b	1	4
   
   It's just demo data to help you understand that when you group by c1, c2, and c3, the result should not contain any duplicate rows.
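
One way to check the "no duplicates" property on a result is sketched below in plain Rust (standard library only, no DataFusion). `duplicate_rows` is a hypothetical helper; the rows in the usage note are copied from the three-column output earlier in the thread.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Returns every row that occurs more than once in `rows`.
/// Each repeated row is reported once, in the order its first repeat is seen.
/// Hypothetical helper for illustration; not a DataFusion API.
fn duplicate_rows<T: Eq + Hash + Clone>(rows: &[T]) -> Vec<T> {
    let mut seen = HashSet::new();
    let mut reported = HashSet::new();
    let mut dups = Vec::new();
    for row in rows {
        // `insert` returns false when the value was already in the set.
        if !seen.insert(row.clone()) && reported.insert(row.clone()) {
            dups.push(row.clone());
        }
    }
    dups
}
```

For example, `duplicate_rows(&[("e", 2, 49), ("e", 2, -61)])` is empty: the two rows share c1 and c2 but differ in c3, so they are different groups under `GROUP BY c1,c2,c3`.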
   
   
   
   




[GitHub] [arrow-datafusion] web3creator closed issue #6586: Why is this group by statement inconsistent with what I expected

Posted by "web3creator (via GitHub)" <gi...@apache.org>.
web3creator closed issue #6586: Why is this group by statement inconsistent with what I expected
URL: https://github.com/apache/arrow-datafusion/issues/6586

