You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/27 16:30:26 UTC

[GitHub] [arrow] Dandandan opened a new pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Dandandan opened a new pull request #9338:
URL: https://github.com/apache/arrow/pull/9338


   This PR makes the API a bit more ergonomic to use (saving 3 characters per parameter), avoiding creating a `Vec` with a macro, or cloning an existing one.
   
   It is also more consistent, for example `DataFrame::join` already was taking parameters by `&[&str]` instead of `Vec<&str>`.
   
   This is a (big) backwards incompatible change.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771572785


   Looks like this idea doesn't work for the `DataFrame` API as the methods return the trait object, which doesn't allow generic types. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan edited a comment on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771572785


   Looks like my idea of `<Into<Vec<Expr>>` doesn't work for the `DataFrame` API as the methods return a trait object, which doesn't allow generic types. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan edited a comment on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771572785






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771572774


   @Dandandan  I can also take a crack at this later in the week if you don't get to it


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771441508






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771503315


   @alamb what do you think of an implementation like the following? I'll try to create a PR for this later this week.
    
   This would allow passing both an array (for simple contant cases like `sort([col("a")])`) and `Vecs`. For a `Vec`, `into()` is a no-op, for arrays it should be the same as using `vec![]`
   ```rust
       pub fn sort<T: Into<Vec<Expr>>>(&self, expr: T) -> Result<Self> {
           Ok(Self::from(&LogicalPlan::Sort {
               expr: expr.into(),
               input: Arc::new(self.plan.clone()),
           }))
       }
   
   ```
   
   Apart from this, `Self::from` currently does another clone on the logical plan, copying the expressions one more time. This could be changed by having it not accepting a `&LogicalPlan` but a `LogicalPlan`. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771572317






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771441508


   @alamb I think you are right, there are two places now `pub fn sort` and `pub fn aggregate` and that do more cloning than before.
   
   I could imagine this would lead to a bit more cloning in some cases, although most cases will be passing either a simple constant slice or a slice from an existing vec.
   I will have a look to your PR.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan edited a comment on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771572785


   Looks like this idea doesn't work for the `DataFrame` API as the methods return a trait object, which doesn't allow generic types. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9338:
URL: https://github.com/apache/arrow/pull/9338


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771572317


   @Dandandan  I think that idea looks good. What do you think about the following slightly more general (but possibly less efficient) version that takes an `impl IntoIterator`  instead:
   
   ```
     pub fn sort(&self, expr: impl IntoIterator<Item=Expr>) -> Result<Self> {
           Ok(Self::from(&LogicalPlan::Sort {
               expr: expr.into_iter().collect(),
               input: Arc::new(self.plan.clone()),
           }))
       }
   ```
   
   I am not sure how much extra work, if any, that would require for `Vec<Expr>` but it would allow creating plans with things like values in a map:
   ```
   let my_map: BTreeMap<&str, Expr> = ....;
   builder = builder.sort(my_map.values())
   ```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-768633577


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9338?src=pr&el=h1) Report
   > Merging [#9338](https://codecov.io/gh/apache/arrow/pull/9338?src=pr&el=desc) (a97dc43) into [master](https://codecov.io/gh/apache/arrow/commit/ab5fc979c69ccc5dde07e1bc1467b02951b4b7e9?el=desc) (ab5fc97) will **decrease** coverage by `0.00%`.
   > The diff coverage is `97.75%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9338/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9338?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9338      +/-   ##
   ==========================================
   - Coverage   81.89%   81.88%   -0.01%     
   ==========================================
     Files         215      215              
     Lines       52988    52988              
   ==========================================
   - Hits        43392    43391       -1     
   - Misses       9596     9597       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9338?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9iZW5jaG1hcmtzL3NyYy9iaW4vdHBjaC5ycw==) | `6.97% <0.00%> (ø)` | |
   | [rust/datafusion/examples/simple\_udaf.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL2V4YW1wbGVzL3NpbXBsZV91ZGFmLnJz) | `0.00% <ø> (ø)` | |
   | [rust/datafusion/src/execution/dataframe\_impl.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vZGF0YWZyYW1lX2ltcGwucnM=) | `93.52% <92.85%> (ø)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.25% <100.00%> (ø)` | |
   | [rust/datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vYnVpbGRlci5ycw==) | `88.20% <100.00%> (-0.07%)` | :arrow_down: |
   | [rust/datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `82.48% <100.00%> (ø)` | |
   | [rust/datafusion/src/optimizer/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvZmlsdGVyX3B1c2hfZG93bi5ycw==) | `97.65% <100.00%> (ø)` | |
   | [...t/datafusion/src/optimizer/projection\_push\_down.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvcHJvamVjdGlvbl9wdXNoX2Rvd24ucnM=) | `97.70% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.63% <100.00%> (ø)` | |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.17% <100.00%> (+0.01%)` | :arrow_up: |
   | ... and [4 more](https://codecov.io/gh/apache/arrow/pull/9338/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9338?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9338?src=pr&el=footer). Last update [ab5fc97...a97dc43](https://codecov.io/gh/apache/arrow/pull/9338?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan edited a comment on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-771572785


   Looks like my idea of `<Into<Vec<Expr>>` doesn't work for the `DataFrame` API as the methods return a trait object, which doesn't allow generic types for member methods.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9338: ARROW-11401: [Rust][DataFusion] Pass slices instead of Vec in DataFrame API

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9338:
URL: https://github.com/apache/arrow/pull/9338#issuecomment-768408720


   https://issues.apache.org/jira/browse/ARROW-11401


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org