You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/27 13:29:34 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #782: Wrong results (missing rows) when grouping on a column with NULLs

alamb opened a new issue #782:
URL: https://github.com/apache/arrow-datafusion/issues/782


   **Describe the bug**
   Given a column with null values, if you group by that column the null values are lost
   
   **To Reproduce**
   Input:
   ```
    c1 
   ----
     0
     3
      
     1
     3
   (5 rows)
   ```
   
   And run `SELECT COUNT(*), c1 FROM test GROUP BY c1`
   
   Actual result
   ```
           "+-----------------+----+",
           "| COUNT(UInt8(1)) | c1 |",
           "+-----------------+----+",
           "| 2               | 3  |",
           "| 2               | 0  |",
           "| 1               | 1  |",
           "+-----------------+----+",
   ```
   **Expected behavior**
   Here is the correct answer according to postgres (note the row for `c1 is NULL`):
   ```
   alamb=# SELECT COUNT(*), c1 FROM test GROUP BY c1;
    count | c1 
   -------+----
        1 |   
        2 |  3
        1 |  0
        1 |  1
   (4 rows)
   ```
   
   **Additional context**
   Discovered while playing around with https://github.com/apache/arrow-datafusion/issues/781
   
   Here is the entire reproducer in postgres
   
   ```sql
   alamb=# drop table test;
   DROP TABLE
   alamb=# create table test (c1 int);
   CREATE TABLE
   alamb=# insert into test values(0);
   INSERT 0 1
   alamb=# insert into test values(3);
   INSERT 0 1
   alamb=# insert into test values(NULL);
   INSERT 0 1
   alamb=# insert into test values(1);
   INSERT 0 1
   alamb=# insert into test values(3);
   INSERT 0 1
   alamb=# SELECT COUNT(*), c1 FROM test GROUP BY c1;
    count | c1 
   -------+----
        1 |   
        2 |  3
        1 |  0
        1 |  1
   (4 rows)
   
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed issue #782: Wrong results when grouping on a column with NULLs

Posted by GitBox <gi...@apache.org>.
alamb closed issue #782:
URL: https://github.com/apache/arrow-datafusion/issues/782


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #782: Wrong results when grouping on a column with NULLs

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #782:
URL: https://github.com/apache/arrow-datafusion/issues/782#issuecomment-892860218


   Resolved in https://github.com/apache/arrow-datafusion/pull/793 (though we are working on improved performance in #790)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed issue #782: Wrong results when grouping on a column with NULLs

Posted by GitBox <gi...@apache.org>.
alamb closed issue #782:
URL: https://github.com/apache/arrow-datafusion/issues/782


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #782: Wrong results (missing rows) when grouping on a column with NULLs

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #782:
URL: https://github.com/apache/arrow-datafusion/issues/782#issuecomment-887604677


   Actually, it is more insidious -- note that
   ```
           "+-----------------+----+",
           "| COUNT(UInt8(1)) | c1 |",
           "+-----------------+----+",
           "| 2               | 3  |",
           "| 2               | 0  |", <-- NOTE there is only a single actual value of 0 in the input (the other is NULL)
           "| 1               | 1  |",
           "+-----------------+----+",```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #782: Wrong results when grouping on a column with NULLs

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #782:
URL: https://github.com/apache/arrow-datafusion/issues/782#issuecomment-887707655


   @Dandandan  I think you are right. I will write up a proposal of how to fix this shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #782: Wrong results when grouping on a column with NULLs

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #782:
URL: https://github.com/apache/arrow-datafusion/issues/782#issuecomment-887698068


   Great find. I think group by doesn't support nulls at all ATM.
   
   The groupby scalars don't have a null and extracting a key doesn't look yet at whether the value is null or not.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #782: Wrong results when grouping on a column with NULLs

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #782:
URL: https://github.com/apache/arrow-datafusion/issues/782#issuecomment-892860218


   Resolved in https://github.com/apache/arrow-datafusion/pull/793 (though we are working on improved performance in #790)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org