You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/01 15:27:27 UTC

[GitHub] [arrow-datafusion] kszucs opened a new pull request #469: Expose DataFrame::sort in the python bindings

kszucs opened a new pull request #469:
URL: https://github.com/apache/arrow-datafusion/pull/469


   # Which issue does this PR close?
   
   When I tried to execute the python unittests I got two recordbatches instead of a single one in the join test, so I exposed the sort method to resolve that.
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #468
   
   # What changes are included in this PR?
   
   - upgrade PyO3 to 0.13
   - expose DataFrame::sort
   - fix join test case
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #469: Expose DataFrame::sort in the python bindings

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #469:
URL: https://github.com/apache/arrow-datafusion/pull/469


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #469: Expose DataFrame::sort in the python bindings

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #469:
URL: https://github.com/apache/arrow-datafusion/pull/469#issuecomment-852234916


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/469?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#469](https://codecov.io/gh/apache/arrow-datafusion/pull/469?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (748b1fd) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/c8ab5a4f00fc8b362eed72d5feb43b03b8ad1fdd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8ab5a4) will **increase** coverage by `0.08%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/469/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/469?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #469      +/-   ##
   ==========================================
   + Coverage   75.30%   75.38%   +0.08%     
   ==========================================
     Files         152      153       +1     
     Lines       25275    25694     +419     
   ==========================================
   + Hits        19033    19370     +337     
   - Misses       6242     6324      +82     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/469?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [python/src/dataframe.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9kYXRhZnJhbWUucnM=) | `0.00% <0.00%> (ø)` | |
   | [python/src/expression.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHl0aG9uL3NyYy9leHByZXNzaW9uLnJz) | `0.00% <0.00%> (ø)` | |
   | [datafusion/src/physical\_plan/common.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9jb21tb24ucnM=) | `84.21% <0.00%> (-2.00%)` | :arrow_down: |
   | [...afusion/src/physical\_plan/sort\_preserving\_merge.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9zb3J0X3ByZXNlcnZpbmdfbWVyZ2UucnM=) | `81.66% <0.00%> (ø)` | |
   | [datafusion/src/physical\_plan/sort.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9zb3J0LnJz) | `91.74% <0.00%> (+0.48%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/merge.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9tZXJnZS5ycw==) | `75.00% <0.00%> (+0.71%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/csv.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/469/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9jc3YucnM=) | `82.30% <0.00%> (+0.88%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/469?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/469?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c8ab5a4...748b1fd](https://codecov.io/gh/apache/arrow-datafusion/pull/469?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #469: Expose DataFrame::sort in the python bindings

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #469:
URL: https://github.com/apache/arrow-datafusion/pull/469#discussion_r643219189



##########
File path: python/src/dataframe.rs
##########
@@ -93,6 +93,18 @@ impl DataFrame {
         })
     }
 
+    /// Sort by specified sorting expressions
+    fn sort(&self, exprs: Vec<expression::Expression>) -> PyResult<Self> {
+        let exprs = exprs.into_iter().map(|e| e.expr);
+        let builder = LogicalPlanBuilder::from(&self.plan);

Review comment:
       I agree that it is not very clean to use the builder; I tried using the `DataFrameImpl` at some point but was unable because the `DataFrameImpl` contains an `ExecutionContext`, and the Python bindings has its own ExecutionContext wrapper. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] kszucs commented on a change in pull request #469: Expose DataFrame::sort in the python bindings

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #469:
URL: https://github.com/apache/arrow-datafusion/pull/469#discussion_r643214523



##########
File path: python/src/dataframe.rs
##########
@@ -93,6 +93,18 @@ impl DataFrame {
         })
     }
 
+    /// Sort by specified sorting expressions
+    fn sort(&self, exprs: Vec<expression::Expression>) -> PyResult<Self> {
+        let exprs = exprs.into_iter().map(|e| e.expr);
+        let builder = LogicalPlanBuilder::from(&self.plan);

Review comment:
       @jorgecarleitao would it make sense to wrap the `DataFrameImpl` instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] kszucs commented on a change in pull request #469: Expose DataFrame::sort in the python bindings

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #469:
URL: https://github.com/apache/arrow-datafusion/pull/469#discussion_r643229440



##########
File path: python/src/dataframe.rs
##########
@@ -93,6 +93,18 @@ impl DataFrame {
         })
     }
 
+    /// Sort by specified sorting expressions
+    fn sort(&self, exprs: Vec<expression::Expression>) -> PyResult<Self> {
+        let exprs = exprs.into_iter().map(|e| e.expr);
+        let builder = LogicalPlanBuilder::from(&self.plan);

Review comment:
       I think we could share `ExecutionContext.state` between the python binding and `DataFrameImpl` at some point, though it is not urgent. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on pull request #469: Expose DataFrame::sort in the python bindings

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #469:
URL: https://github.com/apache/arrow-datafusion/pull/469#issuecomment-853002494


   yes, press the button 👍. I usually give it 3-4 days to others have the opportunity to review and comment on, but this is not consensual, so as it fits you best :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] kszucs commented on pull request #469: Expose DataFrame::sort in the python bindings

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #469:
URL: https://github.com/apache/arrow-datafusion/pull/469#issuecomment-852852355


   @jorgecarleitao what's the merging policy, just squash and merge on the UI?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org