Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/11/15 22:25:48 UTC

[I] Improve statistics test coverage [arrow-datafusion]

alamb opened a new issue, #8228:
URL: https://github.com/apache/arrow-datafusion/issues/8228

   ### Is your feature request related to a problem or challenge?
   
   
   It is clear that the current statistics code is lacking tests. For example, I was about to delete code in https://github.com/apache/arrow-datafusion/pull/8172, but thankfully @berkaysynnada pointed out that the code was actually different. Even so, no tests failed.
   
   I spent some time auditing the codebase for tests, and here is what I found:
   
   ## Places that I think could do with some additional coverage
   
   These places do have tests, but we should review them to ensure that the coverage is adequate:
   - [ ] LocalLimitExec: https://github.com/apache/arrow-datafusion/blob/fdf3f6c3304956cd56131d8783d7cb38a2242a9f/datafusion/physical-plan/src/limit.rs#L833
   - [ ] UnionExec: https://github.com/apache/arrow-datafusion/blob/c2e768052c43e4bab6705ee76befc19de383c2cb/datafusion/physical-plan/src/union.rs#L696
   - [ ] FilterExec: https://github.com/apache/arrow-datafusion/blob/e1c2f9583015db326b3439897376f14f6b83a99a/datafusion/physical-plan/src/filter.rs#L455
   
   ## Places that appear to lack any coverage
   
   
   Here are places that implement `statistics` (mostly in `impl ExecutionPlan`) for which I didn't find *any* tests (though I could have missed them):
   - [ ] `get_statistics_with_limit`: https://github.com/apache/arrow-datafusion/blob/e54894c39202815b14d9e7eae58f64d3a269c165/datafusion/core/src/datasource/statistics.rs#L34-L33
   - [ ] `Join` statistics: https://github.com/apache/arrow-datafusion/blob/e642cc2a94f38518d765d25c8113523aedc29198/datafusion/physical-plan/src/joins/utils.rs#L455-L454
   - [ ] `HashAggregateExec`: https://github.com/apache/arrow-datafusion/blob/67d66faa829ea2fe102384a7534f86e66a3027b7/datafusion/physical-plan/src/aggregates/mod.rs#L888-L887
   - [ ] `WindowExec`: https://github.com/apache/arrow-datafusion/blob/c2e768052c43e4bab6705ee76befc19de383c2cb/datafusion/physical-plan/src/windows/window_agg_exec.rs#L250
   - [ ] `BoundedWindowExec`: https://github.com/apache/arrow-datafusion/blob/c2e768052c43e4bab6705ee76befc19de383c2cb/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs#L311
   
   
   ### Describe the solution you'd like
   
   Review and add coverage as necessary to the locations above.
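
To make the kind of test meant here concrete, below is a minimal, self-contained sketch. It deliberately does not use DataFusion's real types or logic: `RowCount` and `limit_row_count` are hypothetical stand-ins, and the rule that an unknown input with a `fetch` yields an inexact count equal to `fetch` is an assumption made only for illustration. The point is the shape of the test: one assertion per combination of input precision, `skip`, and `fetch`.

```rust
/// Hypothetical stand-in for a row-count statistic (not DataFusion's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RowCount {
    /// The number of rows is known exactly.
    Exact(usize),
    /// The number of rows is only an estimate.
    Inexact(usize),
    /// Nothing is known about the number of rows.
    Absent,
}

/// Toy model (an assumption, not the real implementation) of how a limit
/// with `skip` and an optional `fetch` could transform a row count.
fn limit_row_count(input: RowCount, skip: usize, fetch: Option<usize>) -> RowCount {
    // Rows left after skipping, capped by `fetch` when one is given.
    let apply = |n: usize| {
        let remaining = n.saturating_sub(skip);
        match fetch {
            Some(f) => remaining.min(f),
            None => remaining,
        }
    };
    match input {
        RowCount::Exact(n) => RowCount::Exact(apply(n)),
        RowCount::Inexact(n) => RowCount::Inexact(apply(n)),
        // With no input information, `fetch` is the only thing we can report.
        RowCount::Absent => match fetch {
            Some(f) => RowCount::Inexact(f),
            None => RowCount::Absent,
        },
    }
}

#[test]
fn limit_row_count_covers_each_precision() {
    // Exact input: fetch caps the result, skip reduces it, and it never underflows.
    assert_eq!(limit_row_count(RowCount::Exact(100), 10, Some(20)), RowCount::Exact(20));
    assert_eq!(limit_row_count(RowCount::Exact(15), 10, Some(20)), RowCount::Exact(5));
    assert_eq!(limit_row_count(RowCount::Exact(5), 10, None), RowCount::Exact(0));
    // Inexact input stays inexact.
    assert_eq!(limit_row_count(RowCount::Inexact(100), 0, Some(10)), RowCount::Inexact(10));
    // Absent input: only `fetch` can provide any information.
    assert_eq!(limit_row_count(RowCount::Absent, 0, Some(10)), RowCount::Inexact(10));
    assert_eq!(limit_row_count(RowCount::Absent, 0, None), RowCount::Absent);
}
```

A real test for the locations listed above would build the actual plan node and assert on the value its `statistics` implementation returns. Enumerating the cases this way is what would catch the sort of silent behavior difference described at the top of this issue.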
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

