You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/19 09:05:30 UTC

[GitHub] [arrow-rs] HaoYang670 opened a new pull request, #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

HaoYang670 opened a new pull request, #1714:
URL: https://github.com/apache/arrow-rs/pull/1714

   Signed-off-by: remzi <13...@gmail.com>
   
   # Which issue does this PR close?
   
   <!---
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #1707.
   
   # Rationale for this change
   The `ArrayData::try_new` will panic if `null_bit_buffer` is too short.
   
   # What changes are included in this PR?
   Check the length of `null_bit_buffer` before we call `new_unchecked` in `ArrayData::try_new()`
   
   # Are there any user-facing changes?
   
   No.
   <!---
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on PR #1714:
URL: https://github.com/apache/arrow-rs/pull/1714#issuecomment-1131441800

   I am not sure if this can be removed: https://github.com/apache/arrow-rs/blob/master/arrow/src/array/data.rs#L683-L693
   as `validate()` is a public function.
   
   But if we don't, we will check the length of `null_bit_buffer` twice. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on PR #1714:
URL: https://github.com/apache/arrow-rs/pull/1714#issuecomment-1133634661

   > But if we don't, we will check the length of `null_bit_buffer` twice.
   
   Maybe in the long term, we should try to remove the `null_count` field from `ArrayData`. I don't find why we need this field except making `fn null_count()` as a `const` function (Who will call `null_count()` multiple times on same array?).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on PR #1714:
URL: https://github.com/apache/arrow-rs/pull/1714#issuecomment-1133774255

   Thank you for your explanation @tustvold. I will do some experiments to test how much performance `null_count` could contribute.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya merged pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

Posted by GitBox <gi...@apache.org>.
viirya merged PR #1714:
URL: https://github.com/apache/arrow-rs/pull/1714


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #1714:
URL: https://github.com/apache/arrow-rs/pull/1714#issuecomment-1133637853

   > we should try to remove the null_count field from ArrayData
   
   Often faster kernel implementations are possible if one can ignore nulls, as such null_count helps to inform this selection. It is also fairly common for a given array to be fed to more than one kernel, e.g. cmp then filter, and as such caching the null count on ArrayData avoids computing it multiple times


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org