You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/10 15:40:54 UTC

[GitHub] [arrow] alamb opened a new pull request #7924: ARROW-9653: [Rust][DataFusion] Properly handle multi-column aggregates

alamb opened a new pull request #7924:
URL: https://github.com/apache/arrow/pull/7924


   Fix a loop bound bug in aggregate expression handling, tests for same
   
   h2. Before this PR
   ```
   CREATE EXTERNAL TABLE repro(a INT, b INT)
   STORED AS CSV
   WITH HEADER ROW
   LOCATION 'repro.csv';
   
   select sum(a), a, b from repro group by a, b;
   ArrowError(InvalidArgumentError("number of columns(4) must match number of fields(3) in schema"))
   ```
   
   
   h2. After this PR: 
   
   ```
   CREATE EXTERNAL TABLE repro(a INT, b INT)
   STORED AS CSV
   WITH HEADER ROW
   LOCATION 'repro.csv';
   
   > select sum(a), a, b from repro group by a, b;
   
   +--------+---+-----+
   | sum(a) | a | b   |
   +--------+---+-----+
   | 2      | 2 | 100 |
   | 1      | 1 | 100 |
   | 2      | 2 | 200 |
   | 1      | 1 | 200 |
   | 2      | 2 | 300 |
   +--------+---+-----+
   5 row in set. Query took 0 seconds.
   ```
   
   contents of `repro.csv`
   ```
   a,b
   1,100
   1,200
   2,100
   2,200
   2,300
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #7924: ARROW-9653: [Rust][DataFusion] Properly handle multi-column aggregates

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #7924:
URL: https://github.com/apache/arrow/pull/7924#discussion_r468000922



##########
File path: rust/datafusion/src/execution/physical_plan/hash_aggregate.rs
##########
@@ -500,50 +500,49 @@ impl RecordBatchReader for GroupedHashAggregateIterator {
                 )),
             };
             result_arrays.push(array.map_err(ExecutionError::into_arrow_external_error)?);
+        }

Review comment:
       If you look at this with a whitespace blind diff: https://github.com/apache/arrow/pull/7924/files?w=1 the change might be easier to see. The issue is that the current code adds the aggregate expressions one per *grouping expression* rather than once per operator. 
   
   The change is in essence, to move the location of the closing `}` for aggregate expressions




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #7924: ARROW-9653: [Rust][DataFusion] Properly handle multi-column aggregates

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #7924:
URL: https://github.com/apache/arrow/pull/7924#discussion_r468065515



##########
File path: .github/workflows/rust.yml
##########
@@ -152,12 +152,6 @@ jobs:
       - name: Fetch Submodules and Tags
         shell: bash
         run: ci/scripts/util_checkout.sh
-      - name: Cache Build Artifacts

Review comment:
       This is a workaround to get CI to pass on this PR I broke it out into its own PR here: https://github.com/apache/arrow/pull/7926




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7924: ARROW-9653: [Rust][DataFusion] Properly handle multi-column aggregates

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7924:
URL: https://github.com/apache/arrow/pull/7924#issuecomment-671438322


   https://issues.apache.org/jira/browse/ARROW-9653


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #7924: ARROW-9653: [Rust][DataFusion] Properly handle multi-column aggregates

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #7924:
URL: https://github.com/apache/arrow/pull/7924#issuecomment-671554022


   Rebased to pick up 37ee600


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove closed pull request #7924: ARROW-9653: [Rust][DataFusion] Do not error in planner with SQL has multiple group by expressions

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #7924:
URL: https://github.com/apache/arrow/pull/7924


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #7924: ARROW-9653: [Rust][DataFusion] Properly handle multi-column aggregates

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #7924:
URL: https://github.com/apache/arrow/pull/7924#issuecomment-672001803


   LGTM! 
   
   Surprised on how such a simple query slipped through our tests. :(


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org