You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/22 10:18:25 UTC

[GitHub] [arrow-datafusion] francis-du opened a new pull request #923: [DataFusion] - Support show function for DataFrame

francis-du opened a new pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923


   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Collect the query results for the user in the show function and print the results to the console. 
   
   If the user wants to preview the results, they only needs to call the `show` function.
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Add `show` function implementation for DataFrame.
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   Users can directly print the query results by calling the show function.
   
   eg: 
   
   ```rust
   async fn main() -> Result<()> {
     let mut ctx = ExecutionContext::new();
     let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
     df.show().await?;
     Ok(())
   }
   ```
   
   +---+---+---+
   | a | b | c |
   +---+---+---+
   | 1 | 2 | 3 |
   +---+---+---+
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693527736



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```

Review comment:
       ha, ok, never mind then :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r694777497



##########
File path: README.md
##########
@@ -102,12 +101,10 @@ async fn main() -> datafusion::error::Result<()> {
   let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
 
   let df = df.filter(col("a").lt_eq(col("b")))?
-          .aggregate(vec![col("a")], vec![min(col("b"))])?
-          .limit(100)?;
+          .aggregate(vec![col("a")], vec![min(col("b"))])?;
 
   // execute and print results
-  let results: Vec<RecordBatch> = df.collect().await?;
-  print_batches(&results)?;
+  df.show_limit(100).await?;

Review comment:
       ❤️ 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693898063



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       I think it can be inferred from the logical plan whether there is a limit number. If there is no limit, set it to the default value of 20. If there is, then the default limit value is not set.
   
   This is an implementation:
   
   ```rust
   async fn show(&self) -> Result<()> {
       let mut num = 20;
       if let LogicalPlan::Limit { n, input: _ } = self.to_logical_plan() {
           num = n;
       }
       let results = self.limit(num)?.collect().await?;
       Ok(pretty::print_batches(&results)?)
   }
   ```
   
   The user can call the limit function to pass in the number of lines, if limit is not used, only the default 20 lines will be printed.
   
   eg:
   
   ```rust
   df.limit(10)?.show().await?; // limit 10
   
   df.show().await?; // default limit 20
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] loic-sharma commented on pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
loic-sharma commented on pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#issuecomment-903984817


   The main README may be a good candidate here: https://github.com/apache/arrow-datafusion#example-usage


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693922249



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       OK, i pushed it, please review this changes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693514881



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       It would be nice if we could have similar behavior to Spark, where the default for `show` is to just show the first 20 rows.
   
   ```scala
   def show(numRows: Int): Unit = show(numRows, truncate = true)
   
   def show(): Unit = show(20)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693526746



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```

Review comment:
       I copied the code comment of the previous function, do I need to add it here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693890277



##########
File path: pre-commit.sh
##########
@@ -20,7 +20,7 @@
 # This file is git pre-commit hook.
 #
 # Soft link it as git hook under top dir of apache arrow git repository:
-# $ ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

Review comment:
       OK, I will open an issue and submit a PR to fix this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693515022



##########
File path: pre-commit.sh
##########
@@ -20,7 +20,7 @@
 # This file is git pre-commit hook.
 #
 # Soft link it as git hook under top dir of apache arrow git repository:
-# $ ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

Review comment:
       Are these pre-commit changes intentional here? This seems unrelated to adding the `show` function.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693890277



##########
File path: pre-commit.sh
##########
@@ -20,7 +20,7 @@
 # This file is git pre-commit hook.
 #
 # Soft link it as git hook under top dir of apache arrow git repository:
-# $ ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

Review comment:
       OK, I will open an issue and submit another PR to fix this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on pull request #923: [DataFusion] - Add show and show_limit function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#issuecomment-904579975


   > Thank you @francis-du ! 
   > 
   > I enabled the CI checks to run and once they have passed I think this is good to me. 
   > 
   > 
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r694009745



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       Sorry to be a pain but this seems confusing to me and maybe it would be better to revert to your original code and document that the user can limit output by using `df.limit(20).show()`. Alternatively, we could add a `show_limit(limit: usize)` alternate method where the user can specify how many rows they would like.
   
   The issue I have with this is that it shows 20 rows by default and if the user wants more then they need to add a limit, which is counterintuitive because limit normally reduces the number of rows. Also, this code only works if the final operator is a limit, so it won't work constantly if the limit is wrapped in a sort, for example.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693867284



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       I think adding a call to `df.limit(20)` prior to showing the output would work well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693898063



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       I think it can be inferred from the logical plan whether there is a limit number. If there is no limit, set it to the default value of 20. If there is, then the default limit value is not set.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r694057537



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       I greet, I will rewrite.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693525114



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       DataFusion df has limit function to limit the number of rows, so I think it is just to add the default 20 rows, what do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #923: [DataFusion] - Add show and show_limit function for DataFrame

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693525032



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```

Review comment:
       We could add `no_run` here and remove the code comment there, so the doc string can still be tested to make sure it always compiles.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693867753



##########
File path: pre-commit.sh
##########
@@ -20,7 +20,7 @@
 # This file is git pre-commit hook.
 #
 # Soft link it as git hook under top dir of apache arrow git repository:
-# $ ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

Review comment:
       I would recommend  a separate PR in future cases -- mostly so we can tag the original author of the pre-commit and have a discussion about what to do there without holding up this PR
   
   This change looks uncontentious, but since I don't use the pre-commit.sh hook I am not sure how to test it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693916766



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       > This is an implementation:
   
   I agree that looks perfect




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693926485



##########
File path: pre-commit.sh
##########
@@ -20,7 +20,7 @@
 # This file is git pre-commit hook.
 #
 # Soft link it as git hook under top dir of apache arrow git repository:
-# $ ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

Review comment:
       Just run `ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit` and exec `git commit` 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693524633



##########
File path: pre-commit.sh
##########
@@ -20,7 +20,7 @@
 # This file is git pre-commit hook.
 #
 # Soft link it as git hook under top dir of apache arrow git repository:
-# $ ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

Review comment:
       Yes, I found that pre-commit does not work, so I changed it, do I need to submit another PR?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693890277



##########
File path: pre-commit.sh
##########
@@ -20,7 +20,7 @@
 # This file is git pre-commit hook.
 #
 # Soft link it as git hook under top dir of apache arrow git repository:
-# $ ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

Review comment:
       OK, I will open an issue and submit another PR to fix this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693922249



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       OK, i pushed it, please review these changes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r694300076



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       I defer to @andygrove 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693898063



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       I think it can be inferred from the logical plan whether there is a limit number. If there is no limit, set it to the default value of 20. If there is, then the default limit value is not set.
   
   ```rust
   async fn show(&self) -> Result<()> {
       let mut num = 20;
       match self.to_logical_plan() {
           LogicalPlan::Limit { n, input } => {
               if n > num {
                   num = n;
               }
           }
           _ => {}
       }
       let results = self.limit(num)?.collect().await?;
       Ok(pretty::print_batches(&results)?)
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#issuecomment-903746418


   > I think this looks great -- thank you @francis-du !
   
   By the way,should I need to add show function in example code?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693867753



##########
File path: pre-commit.sh
##########
@@ -20,7 +20,7 @@
 # This file is git pre-commit hook.
 #
 # Soft link it as git hook under top dir of apache arrow git repository:
-# $ ln -s  ../../rust/pre-commit.sh .git/hooks/pre-commit

Review comment:
       I would recommend a separate PR (mostly so we can tag the original author of the pre-commit and have a discussion about what to do there without holding up this PR)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693898063



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       I think it can be inferred from the logical plan whether there is a limit number. If there is no limit, set it to the default value of 20. If there is, then the default limit value is not set.
   
   This is an implementation:
   ```rust
   async fn show(&self) -> Result<()> {
       let mut num = 20;
       match self.to_logical_plan() {
           LogicalPlan::Limit { n, input } => {
               if n > num {
                   num = n;
               }
           }
           _ => {}
       }
       let results = self.limit(num)?.collect().await?;
       Ok(pretty::print_batches(&results)?)
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] francis-du commented on a change in pull request #923: [DataFusion] - Support show function for DataFrame

Posted by GitBox <gi...@apache.org>.
francis-du commented on a change in pull request #923:
URL: https://github.com/apache/arrow-datafusion/pull/923#discussion_r693898063



##########
File path: datafusion/src/dataframe.rs
##########
@@ -223,6 +223,21 @@ pub trait DataFrame: Send + Sync {
     /// ```
     async fn collect(&self) -> Result<Vec<RecordBatch>>;
 
+    /// Print results.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # #[tokio::main]
+    /// # async fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// df.show().await?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    async fn show(&self) -> Result<()>;

Review comment:
       I think it can be inferred from the logical plan whether there is a limit number. If there is no limit, set it to the default value of 20. If there is, then the default limit value is not set.
   
   This is an implementation:
   
   ```rust
     async fn show(&self) -> Result<()> {
         let mut num = 20;
         match self.to_logical_plan() {
             LogicalPlan::Limit { n, input } => {
                 num = n;
             }
             _ => {}
         }
         let results = self.limit(num)?.collect().await?;
         Ok(pretty::print_batches(&results)?)
     }
   ```
   
   The user can call the limit function to pass in the number of lines, if limit is not used, only the default 20 lines will be printed.
   
   eg:
   
   ```rust
   df.limit(10)?.show().await?; // limit 10
   
   df.show().await?; // default limit 20
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org