Posted to commits@arrow.apache.org by ag...@apache.org on 2021/04/21 13:42:43 UTC

[arrow-datafusion] branch master updated: Create starting point for combined user guide for DataFusion and Ballista (#20)

This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
     new abe84cf  Create starting point for combined user guide for DataFusion and Ballista (#20)
abe84cf is described below

commit abe84cfbfb6cc3e80ed314bc343ef78eae15ed9b
Author: Andy Grove <an...@users.noreply.github.com>
AuthorDate: Wed Apr 21 07:42:37 2021 -0600

    Create starting point for combined user guide for DataFusion and Ballista (#20)
---
 ballista/docs/user-guide/.gitignore                |   2 -
 docs/user-guide/.gitignore                         |   1 +
 {ballista/docs => docs}/user-guide/README.md       |  14 ++--
 {ballista/docs => docs}/user-guide/book.toml       |   4 +-
 {ballista/docs => docs}/user-guide/src/SUMMARY.md  |  19 +++---
 .../user-guide/src/distributed/client-python.md    |   5 +-
 .../user-guide/src/distributed}/client-rust.md     |   0
 .../user-guide/src/distributed}/clients.md         |   0
 .../user-guide/src/distributed}/configuration.md   |   0
 .../user-guide/src/distributed}/deployment.md      |   0
 .../user-guide/src/distributed}/docker-compose.md  |   0
 .../user-guide/src/distributed}/introduction.md    |   0
 .../user-guide/src/distributed}/kubernetes.md      |   3 +-
 .../user-guide/src/distributed}/standalone.md      |   0
 docs/user-guide/src/example-usage.md               |  76 +++++++++++++++++++++
 {ballista/docs => docs}/user-guide/src/faq.md      |   0
 .../user-guide/src/img/ballista-architecture.png   | Bin
 docs/user-guide/src/introduction.md                |  44 ++++++++++++
 .../user-guide/src/library.md                      |  12 ++--
 19 files changed, 148 insertions(+), 32 deletions(-)

diff --git a/ballista/docs/user-guide/.gitignore b/ballista/docs/user-guide/.gitignore
deleted file mode 100644
index e662f99..0000000
--- a/ballista/docs/user-guide/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-ballista-book.tgz
-book
\ No newline at end of file
diff --git a/docs/user-guide/.gitignore b/docs/user-guide/.gitignore
new file mode 100644
index 0000000..e9c0728
--- /dev/null
+++ b/docs/user-guide/.gitignore
@@ -0,0 +1 @@
+book
\ No newline at end of file
diff --git a/ballista/docs/user-guide/README.md b/docs/user-guide/README.md
similarity index 77%
rename from ballista/docs/user-guide/README.md
rename to docs/user-guide/README.md
index 9ee3e90..0b9278c 100644
--- a/ballista/docs/user-guide/README.md
+++ b/docs/user-guide/README.md
@@ -16,21 +16,15 @@
   specific language governing permissions and limitations
   under the License.
 -->
-# Ballista User Guide Source
+# DataFusion User Guide Source
 
-This directory contains the sources for the user guide that is published at https://ballistacompute.org/docs/.
+This directory contains the sources for the DataFusion user guide.
 
 ## Generate HTML
 
+To generate the user guide in HTML format, run the following commands:
+
 ```bash
 cargo install mdbook
 mdbook build
-```
-
-## Deploy User Guide to Web Site
-
-Requires ssh certificate to be available.
-
-```bash
-./deploy.sh
 ```
\ No newline at end of file
diff --git a/ballista/docs/user-guide/book.toml b/docs/user-guide/book.toml
similarity index 93%
rename from ballista/docs/user-guide/book.toml
rename to docs/user-guide/book.toml
index cf1653d..efb9212 100644
--- a/ballista/docs/user-guide/book.toml
+++ b/docs/user-guide/book.toml
@@ -16,8 +16,8 @@
 # under the License.
 
 [book]
-authors = ["Andy Grove"]
+authors = ["Apache Arrow"]
 language = "en"
 multilingual = false
 src = "src"
-title = "Ballista User Guide"
+title = "DataFusion User Guide"
diff --git a/ballista/docs/user-guide/src/SUMMARY.md b/docs/user-guide/src/SUMMARY.md
similarity index 60%
rename from ballista/docs/user-guide/src/SUMMARY.md
rename to docs/user-guide/src/SUMMARY.md
index c8fc2c8..e2ddcb0 100644
--- a/ballista/docs/user-guide/src/SUMMARY.md
+++ b/docs/user-guide/src/SUMMARY.md
@@ -19,12 +19,15 @@
 # Summary
 
 - [Introduction](introduction.md)
-- [Create a Ballista Cluster](deployment.md)
-  - [Docker](standalone.md)
-  - [Docker Compose](docker-compose.md)
-  - [Kubernetes](kubernetes.md)
-  - [Ballista Configuration](configuration.md)
-- [Clients](clients.md)
-  - [Rust](client-rust.md)
-  - [Python](client-python.md)
+- [Example Usage](example-usage.md)
+- [Use as a Library](library.md)
+- [Distributed](distributed/introduction.md)
+  - [Create a Ballista Cluster](distributed/deployment.md)
+    - [Docker](distributed/standalone.md)
+    - [Docker Compose](distributed/docker-compose.md)
+    - [Kubernetes](distributed/kubernetes.md)
+    - [Ballista Configuration](distributed/configuration.md)
+  - [Clients](distributed/clients.md)
+    - [Rust](distributed/client-rust.md)
+    - [Python](distributed/client-python.md)
 - [Frequently Asked Questions](faq.md)
\ No newline at end of file
diff --git a/ballista/docs/user-guide/src/clients.md b/docs/user-guide/src/distributed/client-python.md
similarity index 92%
copy from ballista/docs/user-guide/src/clients.md
copy to docs/user-guide/src/distributed/client-python.md
index 1e223dd..7525c60 100644
--- a/ballista/docs/user-guide/src/clients.md
+++ b/docs/user-guide/src/distributed/client-python.md
@@ -16,7 +16,6 @@
   specific language governing permissions and limitations
   under the License.
 -->
-## Clients
+# Python
 
-- [Rust](client-rust.md)
-- [Python](client-python.md)
+Coming soon.
\ No newline at end of file
diff --git a/ballista/docs/user-guide/src/client-rust.md b/docs/user-guide/src/distributed/client-rust.md
similarity index 100%
rename from ballista/docs/user-guide/src/client-rust.md
rename to docs/user-guide/src/distributed/client-rust.md
diff --git a/ballista/docs/user-guide/src/clients.md b/docs/user-guide/src/distributed/clients.md
similarity index 100%
rename from ballista/docs/user-guide/src/clients.md
rename to docs/user-guide/src/distributed/clients.md
diff --git a/ballista/docs/user-guide/src/configuration.md b/docs/user-guide/src/distributed/configuration.md
similarity index 100%
rename from ballista/docs/user-guide/src/configuration.md
rename to docs/user-guide/src/distributed/configuration.md
diff --git a/ballista/docs/user-guide/src/deployment.md b/docs/user-guide/src/distributed/deployment.md
similarity index 100%
copy from ballista/docs/user-guide/src/deployment.md
copy to docs/user-guide/src/distributed/deployment.md
diff --git a/ballista/docs/user-guide/src/docker-compose.md b/docs/user-guide/src/distributed/docker-compose.md
similarity index 100%
rename from ballista/docs/user-guide/src/docker-compose.md
rename to docs/user-guide/src/distributed/docker-compose.md
diff --git a/ballista/docs/user-guide/src/introduction.md b/docs/user-guide/src/distributed/introduction.md
similarity index 100%
rename from ballista/docs/user-guide/src/introduction.md
rename to docs/user-guide/src/distributed/introduction.md
diff --git a/ballista/docs/user-guide/src/kubernetes.md b/docs/user-guide/src/distributed/kubernetes.md
similarity index 97%
rename from ballista/docs/user-guide/src/kubernetes.md
rename to docs/user-guide/src/distributed/kubernetes.md
index 8cd8bee..027a44d 100644
--- a/ballista/docs/user-guide/src/kubernetes.md
+++ b/docs/user-guide/src/distributed/kubernetes.md
@@ -33,8 +33,7 @@ The k8s deployment consists of:
 Ballista is at an early stage of development and therefore has some significant limitations:
 
 - There is no support for shared object stores such as S3. All data must exist locally on each node in the 
-  cluster, including where any client process runs  (until 
-  [#473](https://github.com/ballista-compute/ballista/issues/473) is resolved).
+  cluster, including where any client process runs.
 - Only a single scheduler instance is currently supported unless the scheduler is configured to use `etcd` as a 
   backing store.
 
diff --git a/ballista/docs/user-guide/src/standalone.md b/docs/user-guide/src/distributed/standalone.md
similarity index 100%
rename from ballista/docs/user-guide/src/standalone.md
rename to docs/user-guide/src/distributed/standalone.md
diff --git a/docs/user-guide/src/example-usage.md b/docs/user-guide/src/example-usage.md
new file mode 100644
index 0000000..ff23c96
--- /dev/null
+++ b/docs/user-guide/src/example-usage.md
@@ -0,0 +1,76 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+# Example Usage
+
+Run an SQL query against data stored in a CSV file:
+
+```rust
+use datafusion::prelude::*;
+use arrow::util::pretty::print_batches;
+use arrow::record_batch::RecordBatch;
+
+#[tokio::main]
+async fn main() -> datafusion::error::Result<()> {
+  // register the table
+  let mut ctx = ExecutionContext::new();
+  ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new())?;
+
+  // create a plan to run a SQL query
+  let df = ctx.sql("SELECT a, MIN(b) FROM example GROUP BY a LIMIT 100")?;
+
+  // execute and print results
+  let results: Vec<RecordBatch> = df.collect().await?;
+  print_batches(&results)?;
+  Ok(())
+}
+```
+
+Use the DataFrame API to process data stored in a CSV file:
+
+```rust
+use datafusion::prelude::*;
+use arrow::util::pretty::print_batches;
+use arrow::record_batch::RecordBatch;
+
+#[tokio::main]
+async fn main() -> datafusion::error::Result<()> {
+  // create the dataframe
+  let mut ctx = ExecutionContext::new();
+  let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
+
+  let df = df.filter(col("a").lt_eq(col("b")))?
+           .aggregate(vec![col("a")], vec![min(col("b"))])?
+           .limit(100)?;
+
+  // execute and print results
+  let results: Vec<RecordBatch> = df.collect().await?;
+  print_batches(&results)?;
+  Ok(())
+}
+```
+
+Both of these examples produce the following output:
+
+```text
++---+--------+
+| a | MIN(b) |
++---+--------+
+| 1 | 2      |
++---+--------+
+```
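Reviewer note (not part of the patch): the DataFrame example above chains `filter`, `aggregate`, and `limit`. As a rough aid to reading that pipeline, the sketch below models the same three steps in plain Rust using only the standard library. It is an illustration of the semantics under stated assumptions, not how DataFusion executes the query: the function name `min_b_by_a` and the sample rows are hypothetical, the contents of `tests/example.csv` are not shown in this commit, and the sorted output order comes from `BTreeMap` rather than from anything the query guarantees.

```rust
use std::collections::BTreeMap;

/// Plain-Rust model of `filter(a <= b)`, `aggregate([a], [min(b)])`, `limit(n)`.
/// Hypothetical helper for illustration; DataFusion itself works on Arrow
/// record batches and does not use this code.
fn min_b_by_a(rows: &[(i64, i64)], limit: usize) -> Vec<(i64, i64)> {
    let mut groups: BTreeMap<i64, i64> = BTreeMap::new();
    // Keep only rows where a <= b, then track the smallest b seen per a.
    for &(a, b) in rows.iter().filter(|(a, b)| a <= b) {
        groups
            .entry(a)
            .and_modify(|current| *current = (*current).min(b))
            .or_insert(b);
    }
    // BTreeMap iterates in key order; a query without ORDER BY makes no such promise.
    groups.into_iter().take(limit).collect()
}

fn main() {
    // Assumed sample rows (a, b); not taken from tests/example.csv.
    let rows = [(1, 2), (1, 5), (3, 1)];
    for (a, min_b) in min_b_by_a(&rows, 100) {
        println!("a = {}, MIN(b) = {}", a, min_b);
    }
}
```

With these assumed rows, the row `(3, 1)` is dropped by the filter and the two remaining rows collapse into a single group for `a = 1`.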
diff --git a/ballista/docs/user-guide/src/faq.md b/docs/user-guide/src/faq.md
similarity index 100%
rename from ballista/docs/user-guide/src/faq.md
rename to docs/user-guide/src/faq.md
diff --git a/ballista/docs/user-guide/src/img/ballista-architecture.png b/docs/user-guide/src/img/ballista-architecture.png
similarity index 100%
rename from ballista/docs/user-guide/src/img/ballista-architecture.png
rename to docs/user-guide/src/img/ballista-architecture.png
diff --git a/docs/user-guide/src/introduction.md b/docs/user-guide/src/introduction.md
new file mode 100644
index 0000000..c67fb90
--- /dev/null
+++ b/docs/user-guide/src/introduction.md
@@ -0,0 +1,44 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# DataFusion
+
+DataFusion is an extensible query execution framework, written in
+Rust, that uses [Apache Arrow](https://arrow.apache.org) as its
+in-memory format.
+
+DataFusion provides both an SQL and a DataFrame API for building
+logical query plans, along with a query optimizer and an execution
+engine capable of parallel execution against partitioned data sources
+(CSV and Parquet) using threads.
+
+## Use Cases
+
+DataFusion is used to create modern, fast and efficient data
+pipelines, ETL processes, and database systems, which need the
+performance of Rust and Apache Arrow and want to provide their users
+the convenience of an SQL interface or a DataFrame API.
+
+## Why DataFusion?
+
+* *High Performance*: Leveraging Rust and Arrow's memory model, DataFusion achieves very high performance.
+* *Easy to Connect*: Being part of the Apache Arrow ecosystem (Arrow, Parquet and Flight), DataFusion works well with the rest of the big data ecosystem.
+* *Easy to Embed*: Allowing extension at almost any point in its design, DataFusion can be tailored for your specific use case.
+* *High Quality*: Extensively tested, both by itself and with the rest of the Arrow ecosystem, DataFusion can be used as the foundation for production systems.
+
diff --git a/ballista/docs/user-guide/src/deployment.md b/docs/user-guide/src/library.md
similarity index 73%
rename from ballista/docs/user-guide/src/deployment.md
rename to docs/user-guide/src/library.md
index 2432f2b..12879b1 100644
--- a/ballista/docs/user-guide/src/deployment.md
+++ b/docs/user-guide/src/library.md
@@ -16,11 +16,13 @@
   specific language governing permissions and limitations
   under the License.
 -->
-# Deployment
+# Using DataFusion as a library
 
-Ballista is packaged as Docker images. Refer to the following guides to create a Ballista cluster:
+DataFusion is [published on crates.io](https://crates.io/crates/datafusion) and is [well documented on docs.rs](https://docs.rs/datafusion/).
 
-- [Create a cluster using Docker](standalone.md)
-- [Create a cluster using Docker Compose](docker-compose.md)
-- [Create a cluster using Kubernetes](kubernetes.md)
+To get started, add the following to your `Cargo.toml` file:
 
+```toml
+[dependencies]
+datafusion = "4.0.0-SNAPSHOT"
+```