Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/08 04:24:25 UTC

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2854: Various updates to top-level README

andygrove commented on code in PR #2854:
URL: https://github.com/apache/arrow-datafusion/pull/2854#discussion_r916451408


##########
README.md:
##########
@@ -21,52 +21,70 @@
 
 <img src="docs/source/_static/images/DataFusion-Logo-Background-White.svg" width="256"/>
 
-DataFusion is an extensible query execution framework, written in
+DataFusion is an extensible query planning, optimization, and execution framework, written in
 Rust, that uses [Apache Arrow](https://arrow.apache.org) as its
 in-memory format.
 
-DataFusion supports both an SQL and a DataFrame API for building
-logical query plans as well as a query optimizer and execution engine
-capable of parallel execution against partitioned data sources (CSV
-and Parquet) using threads.
+## Features
 
-DataFusion also supports distributed query execution via the
-[Ballista](https://github.com/apache/arrow-ballista/) crate.
+- SQL query planner with support for multiple SQL dialects
+- DataFrame API
+- Parquet, CSV, JSON, and Avro file formats are supported natively. Custom
+  file formats can be supported by implementing a `TableProvider` trait.
+- Supports popular object stores, including AWS S3, Azure Blob
+  Storage, and Google Cloud Storage. There are extension points for implementing
+  custom object stores.
 
 ## Use Cases
 
-DataFusion is used to create modern, fast and efficient data
-pipelines, ETL processes, and database systems, which need the
-performance of Rust and Apache Arrow and want to provide their users
-the convenience of an SQL interface or a DataFrame API.
+DataFusion is modular in design with many extension points and can be
+used without modification as an embedded query engine and can also provide
+a foundation for building new systems. Here are some example use cases:
+
+- DataFusion can be used as a SQL query planner and query optimizer, providing
+  optimized logical plans that can then be mapped to other execution engines.
+- DataFusion is used to create modern, fast and efficient data
+  pipelines, ETL processes, and database systems, which need the
+  performance of Rust and Apache Arrow and want to provide their users
+  the convenience of an SQL interface or a DataFrame API.
 
 ## Why DataFusion?
 
 - _High Performance_: Leveraging Rust and Arrow's memory model, DataFusion achieves very high performance
 - _Easy to Connect_: Being part of the Apache Arrow ecosystem (Arrow, Parquet and Flight), DataFusion works well with the rest of the big data ecosystem
-- _Easy to Embed_: Allowing extension at almost any point in its design, DataFusion can be tailored for your specific usecase
+- _Easy to Embed_: Allowing extension at almost any point in its design, DataFusion can be tailored for your specific use case
 - _High Quality_: Extensively tested, both by itself and with the rest of the Arrow ecosystem, DataFusion can be used as the foundation for production systems.
 
-## Known Uses
+## DataFusion Community Extensions
 
-Projects that adapt to or serve as plugins to DataFusion:
+There are a number of community projects that extend DataFusion or provide integrations with other systems.
 
+### Language Bindings
+
+- [datafusion-c](https://github.com/datafusion-contrib/datafusion-c)
 - [datafusion-python](https://github.com/datafusion-contrib/datafusion-python)
+- [datafusion-ruby](https://github.com/datafusion-contrib/datafusion-ruby)
 - [datafusion-java](https://github.com/datafusion-contrib/datafusion-java)
-- [datafusion-objectstore-s3](https://github.com/datafusion-contrib/datafusion-objectstore-s3)
-- [datafusion-objectstore-hdfs](https://github.com/datafusion-contrib/datafusion-objectstore-hdfs)
+
+### Integrations
+
 - [datafusion-bigtable](https://github.com/datafusion-contrib/datafusion-bigtable)
-- [datafusion-objectstore-azure](https://github.com/datafusion-contrib/datafusion-objectstore-azure)
+- [datafusion-catalogprovider-glue](https://github.com/datafusion-contrib/datafusion-catalogprovider-glue)
+- [datafusion-substrait](https://github.com/datafusion-contrib/datafusion-substrait)
+
+## Known Uses
 
 Here are some of the projects known to use DataFusion:
 
-- [Ballista](https://github.com/apache/arrow-ballista) Distributed Compute Platform
+- [Ballista](https://github.com/apache/arrow-ballista) Distributed SQL Query Engine
 - [Blaze](https://github.com/blaze-init/blaze) Spark accelerator with DataFusion at its core
 - [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
 - [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
-- [delta-rs](https://github.com/delta-io/delta-rs)
+- [datafusion-tui](https://github.com/datafusion-contrib/datafusion-tui) Text UI for DataFusion
+- [delta-rs](https://github.com/delta-io/delta-rs) Native Rust implementation of Delta Lake
 - [Flock](https://github.com/flock-lab/flock)
 - [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
+- [qv](https://github.com/timvw/qv) Quickly view your data

Review Comment:
   @timvw I assume you are ok with having this listed here?


