Posted to commits@arrow.apache.org by al...@apache.org on 2021/03/19 10:55:29 UTC

[arrow] branch master updated: ARROW-12015: [Rust] [DataFusion] Integrate doc-comment crate to ensure readme examples remain valid

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 43d00e9  ARROW-12015: [Rust] [DataFusion] Integrate doc-comment crate to ensure readme examples remain valid
43d00e9 is described below

commit 43d00e9629fe34dc40c78ea96c008de186726a39
Author: Ruan Pearce-Authers <ru...@outlook.com>
AuthorDate: Fri Mar 19 06:54:26 2021 -0400

    ARROW-12015: [Rust] [DataFusion] Integrate doc-comment crate to ensure readme examples remain valid
    
    As discussed [here](https://github.com/apache/arrow/pull/9710#discussion_r596404956), we were looking into how to add code examples to the DataFusion readme while keeping them in sync with the actual API as it goes through revisions.
    
    This PR pulls in a new dev dependency, `doc-comment`, which allows every `rust`-tagged code block in a Markdown file to be picked up and run as a doctest, and wires this up for `README.md`.
    
    My only concerns are:
    - because the end result is a full-blown doctest, each sample must include its imports and other setup, which makes the samples more verbose than some readers might like
    - also on verbosity: much of our code is async, so those samples need a `#[tokio::main] async fn main() { ... }` wrapper
    
    Neither of these is inherently bad in my opinion, but both are worth noting upfront.
    
    As an example of a readme sample that passes as a doctest (borrowed from @alamb's latest documentation PR, #9710):
    
    ```rust
    use datafusion::prelude::*;
    use arrow::util::pretty::print_batches;
    use arrow::record_batch::RecordBatch;
    
    #[tokio::main]
    async fn main() -> datafusion::error::Result<()> {
      let mut ctx = ExecutionContext::new();
      // create the dataframe
      let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
    
      let df = df.filter(col("a").lt_eq(col("b")))?
                .aggregate(&[col("a")], &[min(col("b"))])?
                .limit(100)?;
    
      let results: Vec<RecordBatch> = df.collect().await?;
      print_batches(&results)?;
    
      Ok(())
    }
    ```
    
    Closes #9749 from returnString/readme_doctest
    
    Authored-by: Ruan Pearce-Authers <ru...@outlook.com>
    Signed-off-by: Andrew Lamb <an...@nerdnetworks.org>
---
 rust/datafusion/Cargo.toml | 1 +
 rust/datafusion/src/lib.rs | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/rust/datafusion/Cargo.toml b/rust/datafusion/Cargo.toml
index b713b77..3d795ba 100644
--- a/rust/datafusion/Cargo.toml
+++ b/rust/datafusion/Cargo.toml
@@ -78,6 +78,7 @@ tempfile = "3"
 prost = "0.7"
 arrow-flight = { path = "../arrow-flight", version = "4.0.0-SNAPSHOT" }
 tonic = "0.4"
+doc-comment = "0.3"
 
 [[bench]]
 name = "aggregate_query_sql"
diff --git a/rust/datafusion/src/lib.rs b/rust/datafusion/src/lib.rs
index 5126f90..f0fcc4f 100644
--- a/rust/datafusion/src/lib.rs
+++ b/rust/datafusion/src/lib.rs
@@ -175,3 +175,6 @@ pub mod test;
 #[macro_use]
 #[cfg(feature = "regex_expressions")]
 extern crate lazy_static;
+
+#[cfg(doctest)]
+doc_comment::doctest!("../README.md", readme_example_test);
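
The `#[cfg(doctest)]` gate above means the macro is only compiled when rustdoc collects doctests, so normal builds are unaffected. As a rough illustration of what "every `rust`-tagged code block in a Markdown file" means in practice, here is a minimal, std-only sketch that pulls such blocks out of a Markdown string. It is not how `doc-comment` or rustdoc is implemented; the helper name `rust_blocks` and the sample readme text are hypothetical, and the parsing is deliberately simplified.

```rust
// Hypothetical helper for illustration; not part of doc-comment, rustdoc, or DataFusion.
// Returns the body of every fenced code block whose opening fence is tagged `rust`.
// Simplification: rustdoc itself also treats untagged blocks as Rust and understands
// attributes such as `no_run` or `ignore`, which this sketch does not handle.
fn rust_blocks(markdown: &str) -> Vec<String> {
    // Built at runtime so this listing does not contain a nested fence.
    let fence = "`".repeat(3);
    let opening = format!("{}rust", fence);

    let mut blocks = Vec::new();
    let mut body = String::new();
    let mut in_rust_block = false;

    for line in markdown.lines() {
        let trimmed = line.trim();
        if in_rust_block {
            if trimmed == fence {
                // Closing fence: store the collected body and reset the buffer.
                blocks.push(std::mem::take(&mut body));
                in_rust_block = false;
            } else {
                body.push_str(line);
                body.push('\n');
            }
        } else if trimmed == opening {
            in_rust_block = true;
        }
    }
    blocks
}

fn main() {
    let fence = "`".repeat(3);
    // A tiny stand-in for a readme: one `rust` block and one `toml` block.
    let readme = format!(
        "# Demo\n\n{f}rust\nlet x = 1 + 1;\nassert_eq!(x, 2);\n{f}\n\n{f}toml\nname = \"demo\"\n{f}\n",
        f = fence
    );

    let blocks = rust_blocks(&readme);
    // Only the `rust`-tagged block is collected; the `toml` block is skipped.
    assert_eq!(blocks.len(), 1);
    assert_eq!(blocks[0], "let x = 1 + 1;\nassert_eq!(x, 2);\n");
    println!("found {} rust block(s)", blocks.len());
}
```

In the real setup nothing like this has to be written by hand: each collected block is compiled and run as its own doctest, which is why every readme sample must carry its own imports and `main` wrapper.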