Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/07 13:39:44 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request #521: Return errors properly from RepartitionExec

alamb opened a new pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521


   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/437
   
   # Rationale for this change
   Errors should be returned rather than incorrect results (empty stream). See more details on https://github.com/apache/arrow-datafusion/issues/437
   
   # What changes are included in this PR?
   Properly propagate errors from the input, and add tests for this behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #521: Return errors properly from RepartitionExec

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#discussion_r649260106



##########
File path: datafusion/src/physical_plan/repartition.rs
##########
@@ -308,6 +310,45 @@ impl RepartitionExec {
             send_time_nanos: SQLMetric::time_nanos(),
         })
     }
+
+    /// Waits for `input_task` which is consuming one of the inputs to

Review comment:
       in https://github.com/apache/arrow-datafusion/pull/538







[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #521: Return errors properly from RepartitionExec

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#issuecomment-855977564


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/521?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#521](https://codecov.io/gh/apache/arrow-datafusion/pull/521?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (76a1d96) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/767eeb0a8bf17916aafb9a88abd52e7350acb596?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (767eeb0) will **increase** coverage by `0.01%`.
   > The diff coverage is `80.17%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #521      +/-   ##
   ==========================================
   + Coverage   76.07%   76.09%   +0.01%     
   ==========================================
     Files         156      156              
     Lines       26750    26858     +108     
   ==========================================
   + Hits        20351    20438      +87     
   - Misses       6399     6420      +21     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/521?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/test/exec.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/521/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvdGVzdC9leGVjLnJz) | `71.21% <65.21%> (-13.79%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/repartition.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/521/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9yZXBhcnRpdGlvbi5ycw==) | `84.48% <90.00%> (+2.02%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/521?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/521?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [767eeb0...76a1d96](https://codecov.io/gh/apache/arrow-datafusion/pull/521?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-datafusion] tustvold commented on a change in pull request #521: Return errors properly from RepartitionExec

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#discussion_r646611109



##########
File path: datafusion/src/physical_plan/repartition.rs
##########
@@ -308,6 +310,45 @@ impl RepartitionExec {
             send_time_nanos: SQLMetric::time_nanos(),
         })
     }
+
+    /// Waits for `input_task` which is consuming one of the inputs to

Review comment:
       I wonder if it might be slightly clearer to push the body of the main task into a fallible function, and then handle propagating any error it returns within the spawned task? That is, rather than propagating the error through the JoinHandle, make the task that is spawned onto tokio infallible and have it handle its errors internally.
   
   Edit: I guess the advantage with this approach would be that you could propagate panics as well...
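
   A minimal sketch of the suggestion above (a fallible body wrapped by an infallible spawned task), not the code from this PR. It assumes `std::thread` and `std::sync::mpsc` as stand-ins for tokio tasks and channels, and `i32` values as stand-ins for record batches; `pull_from_input` and `run_repartition` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for the body of the input task: forwards each
/// batch to the output channel and stops at the first input error.
fn pull_from_input(
    batches: Vec<Result<i32, String>>,
    tx: &Sender<Option<Result<i32, String>>>,
) -> Result<(), String> {
    for batch in batches {
        // `?` returns the first input error to the caller.
        let value = batch?;
        tx.send(Some(Ok(value))).map_err(|e| e.to_string())?;
    }
    Ok(())
}

/// Runs the fallible body on a worker thread. The spawned closure itself is
/// infallible: it forwards any error to the consumer instead of returning it.
fn run_repartition(batches: Vec<Result<i32, String>>) -> Vec<Result<i32, String>> {
    let (tx, rx) = channel();
    let worker = thread::spawn(move || {
        if let Err(e) = pull_from_input(batches, &tx) {
            // Best effort: the receiver may already have gone away.
            let _ = tx.send(Some(Err(e)));
        }
        // `None` tells the consumer that this input has no more data.
        let _ = tx.send(None);
    });

    // Consumer side: collect items until the end-of-input marker arrives.
    let mut out = Vec::new();
    while let Ok(Some(item)) = rx.recv() {
        out.push(item);
    }
    worker.join().expect("worker thread panicked");
    out
}
```

   Calling `run_repartition(vec![Ok(1), Err("bad input".to_string()), Ok(3)])` yields the first value followed by the error, rather than a silently truncated stream. Note that in this shape a panic inside the body is only visible through `join`, which is the trade-off mentioned in the edit above.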







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #521: Return errors properly from RepartitionExec

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#discussion_r646597513



##########
File path: datafusion/src/physical_plan/repartition.rs
##########
@@ -249,13 +252,12 @@ impl ExecutionPlan for RepartitionExec {
                         counter += 1;
                     }
 
-                    // notify each output partition that this input partition has no more data
-                    for (_, tx) in txs {
-                        tx.send(None)
-                            .map_err(|e| DataFusionError::Execution(e.to_string()))?;
-                    }
                     Ok(())
                 });
+
+                // In a separate task, wait for each input to be done

Review comment:
       This is the actual code change (checking the input task's return value in another task). The rest of this PR is tests.
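
       The idea of waiting on the input task from a separate task can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: it uses `std::thread` and `std::sync::mpsc` in place of tokio, `i32` values in place of record batches, and the names `Item`, `wait_for_task`, and `collect_output` are hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// `Some(..)` carries a value or an error; `None` marks the end of input.
type Item = Option<Result<i32, String>>;

/// Hypothetical waiter: joins the input task, then tells every output
/// whether the input finished cleanly, returned an error, or panicked.
fn wait_for_task(input_task: JoinHandle<Result<(), String>>, txs: Vec<Sender<Item>>) {
    let msg: Item = match input_task.join() {
        Err(_) => Some(Err("input task panicked".to_string())),
        Ok(Err(e)) => Some(Err(e)),
        Ok(Ok(())) => None,
    };
    for tx in txs {
        // Ignore send failures: the receiver may already be gone.
        let _ = tx.send(msg.clone());
    }
}

/// Drives one input task and one output, returning what the output saw.
fn collect_output(fail: bool) -> Vec<Result<i32, String>> {
    let (tx, rx) = channel::<Item>();
    let data_tx = tx.clone();

    // The input task only sends data; it reports failure via its return value.
    let input_task = thread::spawn(move || -> Result<(), String> {
        data_tx.send(Some(Ok(1))).map_err(|e| e.to_string())?;
        if fail {
            return Err("input failed".to_string());
        }
        data_tx.send(Some(Ok(2))).map_err(|e| e.to_string())?;
        Ok(())
    });

    // A separate task waits for the input task and notifies the outputs.
    let waiter = thread::spawn(move || wait_for_task(input_task, vec![tx]));

    let mut out = Vec::new();
    while let Ok(Some(item)) = rx.recv() {
        let is_err = item.is_err();
        out.push(item);
        if is_err {
            break; // an error terminates the stream
        }
    }
    waiter.join().expect("waiter thread panicked");
    out
}
```

       With `collect_output(true)` the output sees the first value followed by the error instead of an empty or truncated stream; with `collect_output(false)` it sees both values and then the end-of-input marker.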







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #521: Return errors properly from RepartitionExec

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#discussion_r646646671



##########
File path: datafusion/src/physical_plan/repartition.rs
##########
@@ -308,6 +310,45 @@ impl RepartitionExec {
             send_time_nanos: SQLMetric::time_nanos(),
         })
     }
+
+    /// Waits for `input_task` which is consuming one of the inputs to

Review comment:
       I agree the approach you describe would be clearer (and avoid needing a separate task) 👍 
   
   The reason I did not pull the main body out into its own function was mostly "trying to keep the diff small" (or perhaps my own laziness in wanting to avoid figuring out the types of all the captured arguments).
   
   Perhaps that would be a good follow-on PR (there is a lot of messiness / duplication in updating the counters, which I would also like to fix).







[GitHub] [arrow-datafusion] alamb commented on pull request #521: Return errors properly from RepartitionExec

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521#issuecomment-857951285


   Rebased





[GitHub] [arrow-datafusion] alamb merged pull request #521: Return errors properly from RepartitionExec

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #521:
URL: https://github.com/apache/arrow-datafusion/pull/521


   

