Posted to github@arrow.apache.org by "Jefffrey (via GitHub)" <gi...@apache.org> on 2023/04/11 12:31:35 UTC

[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5962: [DOCS]: consolidate doc site content simplify navbar

Jefffrey commented on code in PR #5962:
URL: https://github.com/apache/arrow-datafusion/pull/5962#discussion_r1162735593


##########
docs/source/user-guide/example-usage.md:
##########
@@ -141,3 +141,112 @@ async fn main() -> datafusion::error::Result<()> {
 | 1 | 2      |
 +---+--------+
 ```
+
+# Using DataFusion as a library
+
+## Create a new project
+
+```shell
+cargo new hello_datafusion
+```
+
+```shell
+$ cd hello_datafusion
+$ tree .
+.
+├── Cargo.toml
+└── src
+    └── main.rs
+
+1 directory, 2 files
+```
+
+## Default Configuration
+
+DataFusion is [published on crates.io](https://crates.io/crates/datafusion), and is [well documented on docs.rs](https://docs.rs/datafusion/).
+
+To get started, add the following to your `Cargo.toml` file:
+
+```toml
+[dependencies]
+datafusion = "11.0"

Review Comment:
   Bump to the latest version here? (Ditto anywhere else a version is mentioned.)



##########
docs/source/user-guide/example-usage.md:
##########
@@ -141,3 +141,112 @@ async fn main() -> datafusion::error::Result<()> {
 | 1 | 2      |
 +---+--------+
 ```
+
+# Using DataFusion as a library
+
+## Create a new project
+
+```shell
+cargo new hello_datafusion
+```
+
+```shell
+$ cd hello_datafusion
+$ tree .
+.
+├── Cargo.toml
+└── src
+    └── main.rs
+
+1 directory, 2 files
+```
+
+## Default Configuration
+
+DataFusion is [published on crates.io](https://crates.io/crates/datafusion), and is [well documented on docs.rs](https://docs.rs/datafusion/).
+
+To get started, add the following to your `Cargo.toml` file:
+
+```toml
+[dependencies]
+datafusion = "11.0"
+```
+
+## Create a main function
+
+Update the main.rs file with your first datafusion application based on [Example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html)
+
+```rust
+use datafusion::prelude::*;
+
+#[tokio::main]
+async fn main() -> datafusion::error::Result<()> {
+  // register the table
+  let ctx = SessionContext::new();
+  ctx.register_csv("test", "<PATH_TO_YOUR_CSV_FILE>", CsvReadOptions::new()).await?;
+
+  // create a plan to run a SQL query
+  let df = ctx.sql("SELECT * FROM test").await?;
+
+  // execute and print results
+  df.show().await?;
+  Ok(())
+}
+```

Review Comment:
   This example feels kinda redundant compared with the example code in the sections above.



##########
docs/source/user-guide/faq.md:
##########
@@ -29,3 +29,37 @@ model and computational kernels. It is designed to run within a single process,
 for parallel query execution.
 
 [Ballista](https://github.com/apache/arrow-ballista) is a distributed compute platform built on DataFusion.
+
+# How does DataFusion Compare with `XYZ`?
+
+When compared to similar systems, DataFusion typically is:
+
+1. Targeted at developers, rather than end users / data scientists.
+2. Designed to be embedded, rather than a complete file based SQL system.
+3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
+4. Implemented in `Rust`, rather than `C/C++`
+
+Here is a comparison with similar projects that may help understand
+when DataFusion might be suitable and unsuitable for your needs:
+
+- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.

Review Comment:
   change to https link?



##########
docs/source/user-guide/example-usage.md:
##########
@@ -141,3 +141,112 @@ async fn main() -> datafusion::error::Result<()> {
 | 1 | 2      |
 +---+--------+
 ```
+
+# Using DataFusion as a library
+
+## Create a new project
+
+```shell
+cargo new hello_datafusion
+```
+
+```shell
+$ cd hello_datafusion
+$ tree .
+.
+├── Cargo.toml
+└── src
+    └── main.rs
+
+1 directory, 2 files
+```
+
+## Default Configuration
+
+DataFusion is [published on crates.io](https://crates.io/crates/datafusion), and is [well documented on docs.rs](https://docs.rs/datafusion/).
+
+To get started, add the following to your `Cargo.toml` file:
+
+```toml
+[dependencies]
+datafusion = "11.0"
+```
+
+## Create a main function
+
+Update the main.rs file with your first datafusion application based on [Example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html)

Review Comment:
   Is this a self-link to this same page?



##########
docs/source/user-guide/faq.md:
##########
@@ -29,3 +29,37 @@ model and computational kernels. It is designed to run within a single process,
 for parallel query execution.
 
 [Ballista](https://github.com/apache/arrow-ballista) is a distributed compute platform built on DataFusion.
+
+# How does DataFusion Compare with `XYZ`?
+
+When compared to similar systems, DataFusion typically is:
+
+1. Targeted at developers, rather than end users / data scientists.
+2. Designed to be embedded, rather than a complete file based SQL system.
+3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
+4. Implemented in `Rust`, rather than `C/C++`
+
+Here is a comparison with similar projects that may help understand
+when DataFusion might be suitable and unsuitable for your needs:
+
+- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.
+  Like DataFusion, it supports very fast execution, both from its custom file format
+  and directly from parquet files. Unlike DataFusion, it is written in C/C++ and it
+  is primarily used directly by users as a serverless database and query system rather
+  than as a library for building such database systems.
+
+- [Polars](http://pola.rs): Polars is one of the fastest DataFrame
+  libraries at the time of writing. Like DataFusion, it is also
+  written in Rust and uses the Apache Arrow memory model, but unlike
+  DataFusion it does not provide SQL nor as many extension points.
+
+- [Facebook Velox](https://engineering.fb.com/2022/08/31/open-source/velox/)

Review Comment:
   Could switch to the GitHub link, https://github.com/facebookincubator/velox, since this link seems dead.



##########
docs/source/user-guide/faq.md:
##########
@@ -29,3 +29,37 @@ model and computational kernels. It is designed to run within a single process,
 for parallel query execution.
 
 [Ballista](https://github.com/apache/arrow-ballista) is a distributed compute platform built on DataFusion.
+
+# How does DataFusion Compare with `XYZ`?
+
+When compared to similar systems, DataFusion typically is:
+
+1. Targeted at developers, rather than end users / data scientists.
+2. Designed to be embedded, rather than a complete file based SQL system.
+3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
+4. Implemented in `Rust`, rather than `C/C++`
+
+Here is a comparison with similar projects that may help understand
+when DataFusion might be suitable and unsuitable for your needs:
+
+- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.
+  Like DataFusion, it supports very fast execution, both from its custom file format
+  and directly from parquet files. Unlike DataFusion, it is written in C/C++ and it
+  is primarily used directly by users as a serverless database and query system rather
+  than as a library for building such database systems.
+
+- [Polars](http://pola.rs): Polars is one of the fastest DataFrame
+  libraries at the time of writing. Like DataFusion, it is also
+  written in Rust and uses the Apache Arrow memory model, but unlike
+  DataFusion it does not provide SQL nor as many extension points.

Review Comment:
   Change to an https URL.

   Also, I think Polars might support SQL now, according to their docs: https://pola-rs.github.io/polars-book/user-guide/sql.html


