You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/24 01:37:59 UTC

[GitHub] [arrow-datafusion] houqp opened a new pull request #615: use Into as argument type wherever applicable

houqp opened a new pull request #615:
URL: https://github.com/apache/arrow-datafusion/pull/615


    # Rationale for this change
   
   Following the article shared by @jorgecarleitao: https://phaazon.net/blog/on-owning-borrowing-pub-interface. Using `impl Into<T>` as argument type makes it possible for function to take ownership of argument while providing the flexibility to the callers to decide whether they want to provide the argument by reference or by value.
   
   @Dandandan also proposed this pattern in #55 for the builder interface: https://github.com/apache/arrow-datafusion/blob/d55a10569b3a24195bed2d67cc6414c63b6b2336/datafusion/src/logical_plan/builder.rs#L264
   
   I just never thought of extending it to other existing functions and types until I saw that article. I think we can apply this same pattern to other types as well like `LogicalPlanBuilder::from`, which takes a reference of LogicalPlan as argument and clones it internally.
   
   # What changes are included in this PR?
   
   Change `&str` to `impl Into<String>` for functions that does internal clone of the string.
   
   # Are there any user-facing changes?
   
   No breaking API change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #615: use Into as argument type wherever applicable

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #615:
URL: https://github.com/apache/arrow-datafusion/pull/615#discussion_r658008899



##########
File path: datafusion/src/execution/context.rs
##########
@@ -511,9 +515,10 @@ impl ExecutionContext {
     pub async fn write_parquet(
         &self,
         plan: Arc<dyn ExecutionPlan>,
-        path: String,
+        path: impl Into<String>,
         writer_properties: Option<WriterProperties>,
     ) -> Result<()> {
+        let path = path.into();

Review comment:
       This change looks to me like it simply copies its argument an extra time -- in other words `write_parquet` doesn't actually take ownership of the `String` it just needs a reference
   
   If we want to make this more generic (so that it can take anything that can provide itself as a `&str`) perhaps something like:
   
   ```rust
     path: impl AsRef<str>
     ...
   let path = path.as_ref()
   ``` 
   would be better?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on a change in pull request #615: use Into as argument type wherever applicable

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #615:
URL: https://github.com/apache/arrow-datafusion/pull/615#discussion_r658490596



##########
File path: datafusion/src/logical_plan/builder.rs
##########
@@ -115,21 +115,23 @@ impl LogicalPlanBuilder {
 
     /// Scan a CSV data source
     pub fn scan_csv(
-        path: &str,
+        path: impl Into<String>,
         options: CsvReadOptions,
         projection: Option<Vec<usize>>,
     ) -> Result<Self> {
-        let provider = Arc::new(CsvFile::try_new(path, options)?);
+        let path = path.into();

Review comment:
       the owned value is passed along to `Self::scan` at the end of the function.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #615: use Into as argument type wherever applicable

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #615:
URL: https://github.com/apache/arrow-datafusion/pull/615#issuecomment-869142236


   I think this one is ready - it just needs to be rebased


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on a change in pull request #615: use Into as argument type wherever applicable

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #615:
URL: https://github.com/apache/arrow-datafusion/pull/615#discussion_r658489425



##########
File path: datafusion/src/execution/context.rs
##########
@@ -511,9 +515,10 @@ impl ExecutionContext {
     pub async fn write_parquet(
         &self,
         plan: Arc<dyn ExecutionPlan>,
-        path: String,
+        path: impl Into<String>,
         writer_properties: Option<WriterProperties>,
     ) -> Result<()> {
+        let path = path.into();

Review comment:
       yep, good call, i didn't double check whether the logic actually needs the ownership :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #615: use Into as argument type wherever applicable

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #615:
URL: https://github.com/apache/arrow-datafusion/pull/615


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #615: use Into as argument type wherever applicable

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #615:
URL: https://github.com/apache/arrow-datafusion/pull/615#discussion_r658010381



##########
File path: datafusion/src/logical_plan/builder.rs
##########
@@ -115,21 +115,23 @@ impl LogicalPlanBuilder {
 
     /// Scan a CSV data source
     pub fn scan_csv(
-        path: &str,
+        path: impl Into<String>,
         options: CsvReadOptions,
         projection: Option<Vec<usize>>,
     ) -> Result<Self> {
-        let provider = Arc::new(CsvFile::try_new(path, options)?);
+        let path = path.into();

Review comment:
       As above, this pattern just doesn't make a lot of sense to me -- if someone passed in a `&str` then this code will now create a copy (a `String`) and then simply use a reference to that copy (rather than the path itself)

##########
File path: datafusion/src/logical_plan/expr.rs
##########
@@ -44,10 +44,10 @@ pub struct Column {
 
 impl Column {
     /// Create Column from unqualified name.
-    pub fn from_name(name: String) -> Self {
+    pub fn from_name(name: impl Into<String>) -> Self {

Review comment:
       👍 

##########
File path: datafusion/src/logical_plan/builder.rs
##########
@@ -146,10 +148,12 @@ impl LogicalPlanBuilder {
 
     /// Convert a table provider into a builder with a TableScan
     pub fn scan(
-        table_name: &str,
+        table_name: impl Into<String>,
         provider: Arc<dyn TableProvider>,
         projection: Option<Vec<usize>>,
     ) -> Result<Self> {
+        let table_name = table_name.into();

Review comment:
       This change makes sense to me as this function previously was making an explicit copy and no can reuse the copy the caller passed in, if any 👍 
   
   ```
               table_name: table_name.to_string(),
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org