Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/06/28 21:26:55 UTC

[GitHub] [arrow] westonpace opened a new pull request, #36368: GH-36053: [C++] summarizing a variable results in NA at random, while there is no NA in the subset of data

westonpace opened a new pull request, #36368:
URL: https://github.com/apache/arrow/pull/36368

   ### Rationale for this change
   
   When merging two aggregate states, we were failing to use the correct `no_nulls` field. This field tells us whether to return `null` when `skip_nulls=False`: if `no_nulls` is false, we return null.
   
   Because we were reading the wrong field, we would sometimes emit null even when a column did not actually contain any nulls.
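   
   A minimal C++ sketch of the merge rule described above, assuming each partial state keeps a per-group `no_nulls` flag that starts true and is cleared when a null is consumed. `SumState`, `Consume`, `Merge`, `Finalize`, and the `mapping` argument are hypothetical names for illustration only, not Arrow's actual classes or API, and this is not the code changed in this PR.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <optional>
   #include <vector>
   
   // Hypothetical per-group state for a grouped sum; NOT Arrow's actual class.
   struct SumState {
     std::vector<std::int64_t> sums;
     // Assumption: true until the group has consumed a null input.
     std::vector<bool> no_nulls;
   
     explicit SumState(std::size_t num_groups)
         : sums(num_groups, 0), no_nulls(num_groups, true) {}
   
     // Accumulate one input value (or a null) into group g.
     void Consume(std::size_t g, std::optional<std::int64_t> value) {
       if (value) {
         sums[g] += *value;
       } else {
         no_nulls[g] = false;
       }
     }
   
     // Merge another partial state into this one. mapping[i] is the group in
     // *this* that corresponds to group i of `other` (hypothetical input).
     void Merge(const SumState& other, const std::vector<std::size_t>& mapping) {
       assert(mapping.size() == other.sums.size());
       for (std::size_t i = 0; i < other.sums.size(); ++i) {
         const std::size_t g = mapping[i];
         sums[g] += other.sums[i];
         // A merged group is null-free only if BOTH inputs were null-free.
         // The other side's flag must be read at its own index i; reading it
         // at index g can pick up an unrelated group's flag and wrongly mark
         // a null-free group as containing nulls.
         no_nulls[g] = no_nulls[g] && other.no_nulls[i];
       }
     }
   
     // With skip_nulls == false, any group that saw a null finalizes to null.
     std::optional<std::int64_t> Finalize(std::size_t g, bool skip_nulls) const {
       if (!skip_nulls && !no_nulls[g]) return std::nullopt;
       return sums[g];
     }
   };
   ```
   
   For example, if group 1 of a second state (value 3, no nulls) maps to group 0 of the first state (value 1, no nulls), the merged group 0 finalizes to 4 with `skip_nulls=false`, even when the second state's group 0 did contain a null.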
   
   ### What changes are included in this PR?
   
   Fixed the bug: merging aggregate states now reads the correct `no_nulls` field.
   
   ### Are these changes tested?
   
   Yes, I added a new unit test that reproduced this failure quite reliably.
   
   ### Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz merged pull request #36368: GH-36053: [C++] summarizing a variable results in NA at random, while there is no NA in the subset of data

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz merged PR #36368:
URL: https://github.com/apache/arrow/pull/36368


[GitHub] [arrow] paleolimbot commented on pull request #36368: GH-36053: [C++] summarizing a variable results in NA at random, while there is no NA in the subset of data

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on PR #36368:
URL: https://github.com/apache/arrow/pull/36368#issuecomment-1613520117

   Just a note that I ran the original R reprex against Arrow C++ built from this branch to confirm that it fixes the issue (it does! Thank you!).


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36368: GH-36053: [C++] summarizing a variable results in NA at random, while there is no NA in the subset of data

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36368:
URL: https://github.com/apache/arrow/pull/36368#issuecomment-1621322410

   Conbench analyzed the 6 benchmark runs on commit `0cea12ff`.
   
   There were 6 benchmark results indicating a performance regression:
   
   - Commit Run on `arm64-m6g-linux-compute` at [2023-06-29 19:56:15Z](http://conbench.ursa.dev/compare/runs/61dc5fa92570411083bb6a8a3864393c...3074207cef0342348ea3f7a6477bf54d/)
     - [params=<Round, FloatType, RoundMode::HALF_TOWARDS_ZERO>/size:1048576/inverse_null_proportion:100, source=cpp-micro, suite=arrow-compute-scalar-round-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649de02802f77768000f450f9310392...0649de1f95467e358000790b9728a136)
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-07-03 10:15:31Z](http://conbench.ursa.dev/compare/runs/feadd13ee9054658a52fec6851176352...a3e8350351a94c60ae9beebe5d02fa7a/)
     - [params=1024, source=cpp-micro, suite=parquet-encoding-benchmark](http://conbench.ursa.dev/compare/benchmarks/064a277e75567a3c8000fa65a256a92a...064a2a02af4d7bc18000309b753dc261)
   - and 4 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14787559331) has more details.


[GitHub] [arrow] github-actions[bot] commented on pull request #36368: GH-36053: [C++] summarizing a variable results in NA at random, while there is no NA in the subset of data

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36368:
URL: https://github.com/apache/arrow/pull/36368#issuecomment-1612131883

   :warning: GitHub issue #36053 **has been automatically assigned in GitHub** to PR creator.