Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/19 13:34:09 UTC

[GitHub] [arrow] alamb opened a new pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

alamb opened a new pull request #9264:
URL: https://github.com/apache/arrow/pull/9264


   Note: this PR needs the code from https://github.com/apache/arrow/pull/9263 to pass, so I am marking it as a draft until that is complete.
   
   The `test::format_batch` function does not have a wide range of type support (e.g. it doesn't support dictionaries), and its output makes tests hard to read and update, in my opinion. This PR consolidates the DataFusion tests to use `arrow::util::pretty::pretty_format_batches`, both to reduce code duplication and to increase type support.
   
   This PR removes the `test::format_batch` function and replaces it with `arrow::util::pretty::pretty_format_batches` and some macros. It has no code changes.
   
   This change has the following benefits:
   
   1. Better type support (I can immediately compare RecordBatches with `Dictionary` types in tests without having to update `format_batch`, and https://github.com/apache/arrow/pull/9233 gets simpler)
   2. Better readability and error reporting (at least I find the code and diffs easier to understand)
   3. Easier test update / review: it is easier to update the diffs (you can copy/paste the test output into the source code) and to review them
   
   This is a variant of a strategy that I have been using with success in IOx ([source link](https://github.com/influxdata/influxdb_iox/blob/main/arrow_deps/src/test_util.rs#L15)), and I wanted to contribute it back.
   
   An example failure with this PR:
   
   ```
   ---- physical_plan::hash_join::tests::join_left_one stdout ----
   thread 'physical_plan::hash_join::tests::join_left_one' panicked at 'assertion failed: `(left == right)`
     left: `["+----+----+----+----+", "| a1 | b2 | c1 | c2 |", "+----+----+----+----+", "| 1  | 1  | 7  | 70 |", "| 2  | 2  | 8  | 80 |", "| 2  | 2  | 9  | 80 |", "+----+----+----+----+"]`,
    right: `["+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | c2 |", "+----+----+----+----+----+", "| 1  | 4  | 7  | 10 | 70 |", "| 2  | 5  | 8  | 20 | 80 |", "| 3  | 7  | 9  |    |    |", "+----+----+----+----+----+"]`:
   
   expected:
   
   [
       "+----+----+----+----+",
       "| a1 | b2 | c1 | c2 |",
       "+----+----+----+----+",
       "| 1  | 1  | 7  | 70 |",
       "| 2  | 2  | 8  | 80 |",
       "| 2  | 2  | 9  | 80 |",
       "+----+----+----+----+",
   ]
   actual:
   
   [
       "+----+----+----+----+----+",
       "| a1 | b1 | c1 | a2 | c2 |",
       "+----+----+----+----+----+",
       "| 1  | 4  | 7  | 10 | 70 |",
       "| 2  | 5  | 8  | 20 | 80 |",
       "| 3  | 7  | 9  |    |    |",
       "+----+----+----+----+----+",
   ]
   ```
   
   You can copy/paste the output of `actual` directly into the test code for an update. 
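
The comparison style shown in the failure above can be sketched without any dependencies. This is only an illustration, not the macro added by the PR: `assert_table_eq` is a hypothetical helper, and it takes the already formatted table as a string (in the DataFusion tests that string would come from `arrow::util::pretty::pretty_format_batches`). The expected table is written as one string per line, and on a mismatch both sides are printed with `{:#?}` so that the `actual` list can be pasted back into the test.

```rust
/// Hypothetical helper (an assumption for illustration, not the PR's macro):
/// compares an expected table, written as one string per line, against an
/// already formatted table.
fn assert_table_eq(expected_lines: &[&str], formatted: &str) {
    // Split the formatted table into lines, ignoring surrounding whitespace.
    let actual_lines: Vec<&str> = formatted.trim().lines().collect();

    // On failure, print both sides one line per entry so the `actual`
    // block can be copied directly into the test source.
    assert_eq!(
        expected_lines,
        actual_lines.as_slice(),
        "\n\nexpected:\n\n{:#?}\nactual:\n\n{:#?}\n\n",
        expected_lines,
        actual_lines
    );
}
```

A test would call it as `assert_table_eq(&["+----+", "| a1 |", "+----+", "| 1  |", "+----+"], &formatted)`, where `formatted` holds the pretty-printed batches.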
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-765373605


   @jorgecarleitao, @andygrove, @seddonm1, and @Dandandan -- what do you think of this approach to testing DataFusion output?





[GitHub] [arrow] alamb closed pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9264:
URL: https://github.com/apache/arrow/pull/9264


   





[GitHub] [arrow] codecov-io commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-765910633


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=h1) Report
   > Merging [#9264](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=desc) (363e1f4) into [master](https://codecov.io/gh/apache/arrow/commit/13e2134dac15a4289f0723a21a046533acf526be?el=desc) (13e2134) will **decrease** coverage by `0.02%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9264/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9264      +/-   ##
   ==========================================
   - Coverage   81.87%   81.84%   -0.03%     
   ==========================================
     Files         215      215              
     Lines       53097    52949     -148     
   ==========================================
   - Hits        43471    43336     -135     
   + Misses       9626     9613      -13     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9264/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.25% <100.00%> (-1.11%)` | :arrow_down: |
   | [...ust/datafusion/src/physical\_plan/hash\_aggregate.rs](https://codecov.io/gh/apache/arrow/pull/9264/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfYWdncmVnYXRlLnJz) | `82.17% <100.00%> (-0.13%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow/pull/9264/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfam9pbi5ycw==) | `83.40% <100.00%> (-0.48%)` | :arrow_down: |
   | [rust/datafusion/src/test/mod.rs](https://codecov.io/gh/apache/arrow/pull/9264/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy90ZXN0L21vZC5ycw==) | `100.00% <100.00%> (+10.07%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=footer). Last update [13e2134...363e1f4](https://codecov.io/gh/apache/arrow/pull/9264?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] github-actions[bot] commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-765912101


   https://issues.apache.org/jira/browse/ARROW-11319





[GitHub] [arrow] github-actions[bot] commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-762873667


   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   





[GitHub] [arrow] alamb commented on a change in pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#discussion_r560947936



##########
File path: rust/datafusion/src/test/mod.rs
##########
@@ -106,137 +106,7 @@ pub fn aggr_test_schema() -> SchemaRef {
     ]))
 }
 
-/// Format a batch as csv
-pub fn format_batch(batch: &RecordBatch) -> Vec<String> {

Review comment:
       `format_batch` is similar to, but not the same as, the pretty printer.
   
   The major difference is that `format_batch` renders NULL values as `"NULL"`, whereas the pretty printer renders them as an empty string `""`.
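
The NULL rendering difference can be illustrated with a minimal sketch. Both functions below are hypothetical stand-ins written for this example (neither is the real `format_batch` nor the Arrow pretty printer), and they model a nullable cell as an `Option<i64>`.

```rust
/// Hypothetical stand-in for the removed `format_batch` behaviour:
/// a missing value is rendered as the text "NULL".
fn render_null_as_text(value: Option<i64>) -> String {
    match value {
        Some(v) => v.to_string(),
        None => "NULL".to_string(),
    }
}

/// Hypothetical stand-in for the pretty printer behaviour described above:
/// a missing value is rendered as an empty string.
fn render_null_as_empty(value: Option<i64>) -> String {
    value.map(|v| v.to_string()).unwrap_or_default()
}
```

Expected test output that was written against the old format therefore needs its `NULL` cells replaced with blank cells when moving to the pretty-printed form.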







[GitHub] [arrow] alamb commented on pull request #9264: ARROW-11319: [Rust] [DataFusion] Improve test comparisons to record batch, remove test::format_batch

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9264:
URL: https://github.com/apache/arrow/pull/9264#issuecomment-765908175


   Rebasing to resolve a logical conflict (a new test that was added)

