Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/19 13:34:09 UTC
[GitHub] [arrow] alamb opened a new pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch
alamb opened a new pull request #9264:
URL: https://github.com/apache/arrow/pull/9264
Note: this PR needs the code from https://github.com/apache/arrow/pull/9263 to pass, so I am marking it as a draft until that is complete.
The `test::format_batch` function does not support a wide range of types (e.g. it doesn't support dictionaries), and in my opinion its output makes tests hard to read and update. This PR consolidates the DataFusion tests on `arrow::util::pretty::pretty_format_batches`, both to reduce code duplication and to increase type support.
This PR removes the `test::format_batch(&batch);` function and replaces it with `arrow::util::pretty::pretty_format_batches` and some macros. It makes no changes to non-test code.
This change has the following benefits:
1. Better type support (I can immediately compare RecordBatches with `Dictionary` types in tests without having to update `format_batch`, and https://github.com/apache/arrow/pull/9233 gets simpler)
2. Better readability and error reporting (at least I find the code and diffs easier to understand)
3. Easier test update / review: it is easier to update the diffs (you can copy/paste the test output into the source code) and to review them
This is a variant of a strategy that I have been using with success in IOx ([source link](https://github.com/influxdata/influxdb_iox/blob/main/arrow_deps/src/test_util.rs#L15)), and I wanted to contribute it back.
An example failure with this PR:
```
---- physical_plan::hash_join::tests::join_left_one stdout ----
thread 'physical_plan::hash_join::tests::join_left_one' panicked at 'assertion failed: `(left == right)`
left: `["+----+----+----+----+", "| a1 | b2 | c1 | c2 |", "+----+----+----+----+", "| 1  | 1  | 7  | 70 |", "| 2  | 2  | 8  | 80 |", "| 2  | 2  | 9  | 80 |", "+----+----+----+----+"]`,
right: `["+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | c2 |", "+----+----+----+----+----+", "| 1  | 4  | 7  | 10 | 70 |", "| 2  | 5  | 8  | 20 | 80 |", "| 3  | 7  | 9  |    |    |", "+----+----+----+----+----+"]`:
expected:
[
"+----+----+----+----+",
"| a1 | b2 | c1 | c2 |",
"+----+----+----+----+",
"| 1  | 1  | 7  | 70 |",
"| 2  | 2  | 8  | 80 |",
"| 2  | 2  | 9  | 80 |",
"+----+----+----+----+",
]
actual:
[
"+----+----+----+----+----+",
"| a1 | b1 | c1 | a2 | c2 |",
"+----+----+----+----+----+",
"| 1  | 4  | 7  | 10 | 70 |",
"| 2  | 5  | 8  | 20 | 80 |",
"| 3  | 7  | 9  |    |    |",
"+----+----+----+----+----+",
]
```
You can copy/paste the output of `actual` directly into the test code for an update.
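For readers who want a feel for the line-by-line comparison, here is a minimal sketch using only the standard library. It is not the macro added by this PR: `to_lines` and `assert_lines_eq!` are hypothetical names, and the `formatted` string is a hand-written stand-in for what a table formatter such as `pretty_format_batches` would return.

```rust
/// Hypothetical helper: split a pretty-printed table into one string per line,
/// the same shape as the `expected` / `actual` vectors in the failure output.
fn to_lines(formatted: &str) -> Vec<String> {
    formatted.trim().lines().map(|l| l.to_string()).collect()
}

/// Hypothetical macro: compare expected lines against a formatted table.
/// On mismatch, both sides are printed with `{:#?}`, one line per row, so the
/// `actual` block can be pasted back into the test source.
macro_rules! assert_lines_eq {
    ($expected:expr, $formatted:expr) => {{
        let expected: Vec<String> = $expected.iter().map(|s| s.to_string()).collect();
        let actual = to_lines($formatted);
        assert_eq!(
            expected, actual,
            "\n\nexpected:\n\n{:#?}\nactual:\n\n{:#?}\n\n",
            expected, actual
        );
    }};
}

fn main() {
    // Stand-in for the string a table formatter would produce for a batch.
    let formatted = "\
+----+----+
| a1 | c1 |
+----+----+
| 1  | 7  |
| 2  | 8  |
+----+----+";

    // The expected value is written as a list of lines, exactly as it would
    // appear in the `actual:` section of a failing run.
    let expected = vec![
        "+----+----+",
        "| a1 | c1 |",
        "+----+----+",
        "| 1  | 7  |",
        "| 2  | 8  |",
        "+----+----+",
    ];
    assert_lines_eq!(expected, formatted);
}
```

Comparing vectors of lines rather than one long string is what makes the multi-line `expected:` / `actual:` output possible; a single-string comparison would print everything on one line with escaped newlines.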
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-765373605
@jorgecarleitao @andygrove and @seddonm1 and @Dandandan -- what do you think of this approach to testing DataFusion output?
[GitHub] [arrow] alamb closed pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch
Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9264:
URL: https://github.com/apache/arrow/pull/9264
[GitHub] [arrow] codecov-io commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-765910633
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=h1) Report
> Merging [#9264](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=desc) (363e1f4) into [master](https://codecov.io/gh/apache/arrow/commit/13e2134dac15a4289f0723a21a046533acf526be?el=desc) (13e2134) will **decrease** coverage by `0.02%`.
> The diff coverage is `100.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9264/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9264 +/- ##
==========================================
- Coverage 81.87% 81.84% -0.03%
==========================================
Files 215 215
Lines 53097 52949 -148
==========================================
- Hits 43471 43336 -135
+ Misses 9626 9613 -13
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9264/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.25% <100.00%> (-1.11%)` | :arrow_down: |
| [...ust/datafusion/src/physical\_plan/hash\_aggregate.rs](https://codecov.io/gh/apache/arrow/pull/9264/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfYWdncmVnYXRlLnJz) | `82.17% <100.00%> (-0.13%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow/pull/9264/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfam9pbi5ycw==) | `83.40% <100.00%> (-0.48%)` | :arrow_down: |
| [rust/datafusion/src/test/mod.rs](https://codecov.io/gh/apache/arrow/pull/9264/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy90ZXN0L21vZC5ycw==) | `100.00% <100.00%> (+10.07%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=footer). Last update [13e2134...363e1f4](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] [arrow] github-actions[bot] commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-765912101
https://issues.apache.org/jira/browse/ARROW-11319
[GitHub] [arrow] github-actions[bot] commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-762873667
Thanks for opening a pull request!
Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW
Then could you also rename pull request title in the following format?
ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
See also:
* [Other pull requests](https://github.com/apache/arrow/pulls/)
* [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
[GitHub] [arrow] alamb commented on a change in pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#discussion_r560947936
##########
File path: rust/datafusion/src/test/mod.rs
##########
@@ -106,137 +106,7 @@ pub fn aggr_test_schema() -> SchemaRef {
]))
}
-/// Format a batch as csv
-pub fn format_batch(batch: &RecordBatch) -> Vec<String> {
Review comment:
`format_batch` is similar to, but not the same as, the pretty printer.
The major difference is that it renders NULL values as `"NULL"` whereas the pretty printer renders them as an empty string `""`
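The practical consequence is that expected values in converted tests change wherever a NULL appears. A minimal sketch of the two conventions, assuming a nullable integer column modelled as `Option<i32>`; both function names are hypothetical and neither is the real `format_batch` or pretty-printer code:

```rust
/// Hypothetical renderer following the convention described for `format_batch`:
/// a missing value becomes the literal text "NULL".
fn render_cell_as_null_text(value: Option<i32>) -> String {
    match value {
        Some(v) => v.to_string(),
        None => "NULL".to_string(),
    }
}

/// Hypothetical renderer following the convention described for the pretty
/// printer: a missing value becomes an empty string.
fn render_cell_as_empty(value: Option<i32>) -> String {
    value.map(|v| v.to_string()).unwrap_or_default()
}

fn main() {
    // A nullable column with one missing value.
    let column = [Some(10), Some(20), None];

    let old_style: Vec<String> = column.iter().map(|v| render_cell_as_null_text(*v)).collect();
    let new_style: Vec<String> = column.iter().map(|v| render_cell_as_empty(*v)).collect();

    println!("{:?}", old_style); // ["10", "20", "NULL"]
    println!("{:?}", new_style); // ["10", "20", ""]
}
```

This matches the join failure example in the PR description, where the unmatched row shows blank cells rather than the text `NULL`.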
[GitHub] [arrow] alamb commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-765908175
Rebasing to resolve a logical conflict (a new test that was added)