Posted to issues@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/05/30 09:08:47 UTC

[GitHub] [arrow] jorisvandenbossche opened a new issue, #35817: [Docs][C++] "value_counts" kernel doc incorrectly mentions to skip nulls

jorisvandenbossche opened a new issue, #35817:
URL: https://github.com/apache/arrow/issues/35817

   The documentation for the "value_counts" kernel states that nulls in the input are skipped:
   
   https://github.com/apache/arrow/blob/431785f3062199b2b9052902b67492b933744833/cpp/src/arrow/compute/kernels/vector_hash.cc#L748-L753
   
   But that's not actually the case. Nulls are also counted and included in the output:
   
   ```python
   In [1]: import pyarrow.compute as pc
   
   In [2]: pc.value_counts([1, 2, 2, None, None])
   Out[2]: 
   <pyarrow.lib.StructArray object at 0x7fb88e1c6f20>
   -- is_valid: all not null
   -- child 0 type: int64
     [
       1,
       2,
       null
     ]
   -- child 1 type: int64
     [
       1,
       2,
       2
     ]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou closed issue #35817: [Docs][C++] "value_counts" kernel doc incorrectly mentions to skip nulls

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou closed issue #35817: [Docs][C++] "value_counts" kernel doc incorrectly mentions to skip nulls
URL: https://github.com/apache/arrow/issues/35817




[GitHub] [arrow] jorisvandenbossche commented on issue #35817: [Docs][C++] "value_counts" kernel doc incorrectly mentions to skip nulls

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #35817:
URL: https://github.com/apache/arrow/issues/35817#issuecomment-1568071013

   Same is true for `unique`:
   
   ```python
   In [3]: pc.unique([1, 2, 2, None, None])
   Out[3]: 
   <pyarrow.lib.Int64Array object at 0x7fb88e1c75e0>
   [
     1,
     2,
     null
   ]
   ```
   
   This is only a documentation problem: we have various tests that cover the current behavior.
   

