Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/19 12:46:56 UTC

[GitHub] [arrow] alamb commented on pull request #7993: ARROW-9760: [Rust] [DataFusion] Added DataFrame::explain

alamb commented on pull request #7993:
URL: https://github.com/apache/arrow/pull/7993#issuecomment-676295725


   That makes sense to me
   
   On Tue, Aug 18, 2020 at 11:24 PM Jorge Leitao <no...@github.com>
   wrote:
   
   > *@jorgecarleitao* commented on this pull request.
   > ------------------------------
   >
   > In rust/datafusion/src/dataframe.rs
   > <https://github.com/apache/arrow/pull/7993#discussion_r472638005>:
   >
   > > @@ -174,4 +174,18 @@ pub trait DataFrame {
   >
   >      /// Return the logical plan represented by this DataFrame.
   >      fn to_logical_plan(&self) -> LogicalPlan;
   > +
   > +    /// Return a DataFrame with the explanation of its plan so far.
   > +    ///
   > +    /// ```
   > +    /// # use datafusion::prelude::*;
   > +    /// # use datafusion::error::Result;
   > +    /// # fn main() -> Result<()> {
   > +    /// let mut ctx = ExecutionContext::new();
   > +    /// let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
   > +    /// let batches = df.limit(100)?.explain(false)?.collect()?;
   > +    /// # Ok(())
   > +    /// # }
   > +    /// ```
   > +    fn explain(&self, verbose: bool) -> Result<Arc<dyn DataFrame>>;
   >
   > I find it poor design that .explain prints directly to stdout in
   > Spark. IMO saving one extra line of code (a print) is not a good enough
   > reason to spam stdout and to limit so much what a user can do with
   > .explain.
   >
   > Some downstream consequences of this decision in spark:
   >
   >    - it makes it much more difficult to log the plan correctly
   >    - the popular pyspark can't capture the output as a Python string
   >    and prettify it when it is used in notebooks
   >
   > I agree with fn explain(&self, verbose: bool) -> String (prob.
   > Result<String>). For a user, the difference is
   >
   > df.explain()
   >
   > vs
   >
   > println!("{}", df.explain()?)
   >
   > I find the latter more expressive of the user's intention, and gives them
   > the freedom to pipe the result to whatever stream they want.
   >
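
The trade-off discussed in the quoted comment can be sketched as follows. This is a minimal illustration, not DataFusion's API: `Plan` and `write_explanation` are hypothetical names made up for the example, and the plan text is invented. It only shows that an `explain` returning a string leaves the choice of destination (stdout, a log, an in-memory buffer) to the caller.

```rust
use std::fmt::Write as _;
use std::io::{self, Write};

/// Hypothetical stand-in for a query plan; NOT DataFusion's `LogicalPlan`.
struct Plan {
    steps: Vec<String>,
}

impl Plan {
    /// Returns the explanation as a `String` instead of printing it.
    fn explain(&self, verbose: bool) -> Result<String, std::fmt::Error> {
        let mut out = String::new();
        for (depth, step) in self.steps.iter().enumerate() {
            // Indent each step by its depth to mimic a plan tree.
            write!(out, "{}{}", "  ".repeat(depth), step)?;
            if verbose {
                write!(out, " (step {})", depth)?;
            }
            out.push('\n');
        }
        Ok(out)
    }
}

/// Hypothetical helper: the caller picks the sink (stdout, a file, a buffer).
fn write_explanation<W: Write>(plan: &Plan, mut sink: W) -> io::Result<()> {
    let text = plan
        .explain(false)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    sink.write_all(text.as_bytes())
}

fn main() -> io::Result<()> {
    let plan = Plan {
        steps: vec!["Limit: 100".to_string(), "TableScan: example".to_string()],
    };

    // Printing costs one extra line at the call site.
    print!("{}", plan.explain(true).expect("formatting into a String"));

    // The same text can be captured in memory, e.g. for logging or a notebook.
    let mut buffer = Vec::new();
    write_explanation(&plan, &mut buffer)?;
    assert!(String::from_utf8_lossy(&buffer).contains("TableScan"));
    Ok(())
}
```

With a version that prints internally, the second use (capturing into `buffer`) would require redirecting stdout, which is the limitation the comment describes for Spark.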
   

