Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/19 03:24:25 UTC

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7993: ARROW-9760: [Rust] [DataFusion] Added DataFrame::explain

jorgecarleitao commented on a change in pull request #7993:
URL: https://github.com/apache/arrow/pull/7993#discussion_r472638005



##########
File path: rust/datafusion/src/dataframe.rs
##########
@@ -174,4 +174,18 @@ pub trait DataFrame {
 
     /// Return the logical plan represented by this DataFrame.
     fn to_logical_plan(&self) -> LogicalPlan;
+
+    /// Return a DataFrame with the explanation of its plan so far.
+    ///
+    /// ```
+    /// # use datafusion::prelude::*;
+    /// # use datafusion::error::Result;
+    /// # fn main() -> Result<()> {
+    /// let mut ctx = ExecutionContext::new();
+    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+    /// let batches = df.limit(100)?.explain(false)?.collect()?;
+    /// # Ok(())
+    /// # }
+    /// ```
+    fn explain(&self, verbose: bool) -> Result<Arc<dyn DataFrame>>;

Review comment:
       I find it poor design that `.explain` prints directly to stdout in Spark. IMO, saving one extra line of code (the print) is not a good enough reason to spam stdout outright and to restrict so much what a user can do with `.explain`.
   
   Some downstream consequences of this decision in Spark:
   * it is much harder to log the plan correctly
   * the popular PySpark cannot capture it as a Python string to prettify it when it is used in notebooks
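   To make the logging and prettifying point concrete, here is a minimal sketch, assuming a hypothetical `Plan` type whose `explain` returns a `String` (this is not DataFusion's or Spark's actual API; `Plan`, `for_log` and the meaning given to `verbose` are illustrative assumptions). Because the caller receives the text, it can reshape it, e.g. prefix every line to fit a log format, which a print-to-stdout design does not allow.

   ```rust
   /// Hypothetical stand-in for a query plan; NOT DataFusion's actual type.
   struct Plan {
       /// Plan nodes from root to leaf, e.g. "Limit: 100".
       nodes: Vec<String>,
   }

   impl Plan {
       /// Returns the explanation as text instead of printing it.
       /// Assumption for illustration: `verbose` only appends a node count.
       fn explain(&self, verbose: bool) -> String {
           let mut out = String::new();
           for (depth, node) in self.nodes.iter().enumerate() {
               // Indent each child two spaces deeper than its parent.
               out.push_str(&"  ".repeat(depth));
               out.push_str(node);
               out.push('\n');
           }
           if verbose {
               out.push_str(&format!("({} nodes)\n", self.nodes.len()));
           }
           out
       }
   }

   /// Hypothetical helper: prefixes every line so it fits a log format.
   fn for_log(explanation: &str, prefix: &str) -> Vec<String> {
       explanation
           .lines()
           .map(|line| format!("{}{}", prefix, line))
           .collect()
   }

   fn main() {
       let plan = Plan {
           nodes: vec!["Limit: 100".to_string(), "TableScan: example".to_string()],
       };
       // The caller, not the library, decides how the text is shaped and where it goes.
       for line in for_log(&plan.explain(false), "[plan] ") {
           eprintln!("{}", line);
       }
   }
   ```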
   
   I agree with `fn explain(&self, verbose: bool) -> String` (probably `Result<String>`). For a user, the difference is
   
   ```
   df.explain(false)?;
   ```
   
   vs 
   
   ```
   println!("{}", df.explain(false)?);
   ```
   
   I find the latter more expressive of the user's intention, and it gives them the freedom to pipe the result to whatever stream they want.
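   A minimal sketch of that freedom, assuming a hypothetical free function `explain` that returns `Result<String, String>` with hard-coded plan text (standing in for the proposed `Result<String>`-returning method; `explain` and `emit` here are illustrative, not DataFusion's API). The same explanation goes to stdout, stderr, or an in-memory buffer, using only the standard `std::io::Write` trait:

   ```rust
   use std::io::{self, Write};

   /// Hypothetical `Result<String>`-returning explain; the plan text is
   /// hard-coded for illustration and is NOT produced by DataFusion.
   fn explain(verbose: bool) -> Result<String, String> {
       let mut text = String::from("Limit: 100\n  TableScan: example\n");
       if verbose {
           text.push_str("  (verbose details would go here)\n");
       }
       Ok(text)
   }

   /// Writes the explanation to any sink: stdout, stderr, a file, or a buffer.
   fn emit<W: Write>(sink: &mut W, verbose: bool) -> io::Result<()> {
       // Convert the hypothetical error type into an io::Error.
       let text = explain(verbose).map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
       sink.write_all(text.as_bytes())
   }

   fn main() -> io::Result<()> {
       // Same call, three destinations, all chosen by the caller.
       emit(&mut io::stdout(), false)?;
       emit(&mut io::stderr(), false)?;
       let mut buffer: Vec<u8> = Vec::new();
       emit(&mut buffer, true)?;
       eprintln!("captured {} bytes", buffer.len());
       Ok(())
   }
   ```

   With a print-to-stdout design, only the first of these three destinations would be available without redirecting the process's output.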




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org