Posted to commits@arrow.apache.org by ag...@apache.org on 2022/08/24 17:09:15 UTC
[arrow-datafusion] branch master updated: MINOR: documentation updates (#3239)
This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/master by this push:
new 51498ca5e MINOR: documentation updates (#3239)
51498ca5e is described below
commit 51498ca5ed2bafb39abf208c5b74f40ec17c57aa
Author: kmitchener <km...@gmail.com>
AuthorDate: Wed Aug 24 13:09:10 2022 -0400
MINOR: documentation updates (#3239)
* update SQL data type docs to reflect code
move extensibility and rust version compatibility into "how to use library" section
update sql_status with case, coalesce, data sources
* undo page name change
* ...
* making npx prettier happy
---
docs/source/user-guide/library.md | 17 ++++-
docs/source/user-guide/sql/data_types.md | 87 ++++++++++++++++--------
docs/source/user-guide/sql/information_schema.md | 41 ++++-------
docs/source/user-guide/sql/sql_status.md | 26 +++----
4 files changed, 97 insertions(+), 74 deletions(-)
diff --git a/docs/source/user-guide/library.md b/docs/source/user-guide/library.md
index 688520f9c..c7cc1ec42 100644
--- a/docs/source/user-guide/library.md
+++ b/docs/source/user-guide/library.md
@@ -69,6 +69,21 @@ async fn main() -> datafusion::error::Result<()> {
}
```
+## Extensibility
+
+DataFusion is designed to be extensible at all points. To that end, you can provide your own custom:
+
+- [x] User Defined Functions (UDFs)
+- [x] User Defined Aggregate Functions (UDAFs)
+- [x] User Defined Table Source (`TableProvider`) for tables
+- [x] User Defined `Optimizer` passes (plan rewrites)
+- [x] User Defined `LogicalPlan` nodes
+- [x] User Defined `ExecutionPlan` nodes
+
+## Rust Version Compatibility
+
+This crate is tested with the latest stable version of Rust. We do not currently test against other, older versions of the Rust compiler.
+
## Optimized Configuration
For an optimized build several steps are required. First, use the below in your `Cargo.toml`. It is
@@ -94,7 +109,7 @@ use datafusion::prelude::*;
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
async fn main() -> datafusion::error::Result<()> {
- ...
+ Ok(())
}
```
diff --git a/docs/source/user-guide/sql/data_types.md b/docs/source/user-guide/sql/data_types.md
index 7a3ed7adb..3325d4a77 100644
--- a/docs/source/user-guide/sql/data_types.md
+++ b/docs/source/user-guide/sql/data_types.md
@@ -22,30 +22,63 @@
DataFusion uses Arrow, and thus the Arrow type system, for query
execution. The SQL types from
[sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/data_type.rs#L27)
-are mapped to Arrow types according to the following table
-
-| SQL Data Type | Arrow DataType |
-| ------------- | ------------------------------------------------------------------------ |
-| `CHAR` | `Utf8` |
-| `VARCHAR` | `Utf8` |
-| `UUID` | _Not yet supported_ |
-| `CLOB` | _Not yet supported_ |
-| `BINARY` | _Not yet supported_ |
-| `VARBINARY` | _Not yet supported_ |
-| `DECIMAL` | `Float64` |
-| `FLOAT` | `Float32` |
-| `SMALLINT` | `Int16` |
-| `INT` | `Int32` |
-| `BIGINT` | `Int64` |
-| `REAL` | `Float32` |
-| `DOUBLE` | `Float64` |
-| `BOOLEAN` | `Boolean` |
-| `DATE` | `Date32` |
-| `TIME` | `Time64(TimeUnit::Nanosecond)` |
-| `TIMESTAMP` | `Timestamp(TimeUnit::Nanosecond)` |
-| `INTERVAL` | `Interval(YearMonth)` or `Interval(MonthDayNano)` or `Interval(DayTime)` |
-| `REGCLASS` | _Not yet supported_ |
-| `TEXT` | `Utf8` |
-| `BYTEA` | `Binary` |
-| `CUSTOM` | _Not yet supported_ |
-| `ARRAY` | _Not yet supported_ |
+are mapped to [Arrow data types](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html) according to the following table.
+
+## Character Types
+
+| SQL DataType | Arrow DataType |
+| ------------ | -------------- |
+| `CHAR` | `Utf8` |
+| `VARCHAR` | `Utf8` |
+| `TEXT` | `Utf8` |
+
+## Numeric Types
+
+| SQL DataType | Arrow DataType |
+| ------------------ | :---------------- |
+| `SMALLINT` | `Int16` |
+| `INT` or `INTEGER` | `Int32` |
+| `BIGINT` | `Int64` |
+| `FLOAT` | `Float32` |
+| `REAL` | `Float32` |
+| `DOUBLE` | `Float64` |
+| `DECIMAL(p,s)` | `Decimal128(p,s)` |
+
+## Date/Time Types
+
+| SQL DataType | Arrow DataType |
+| ------------ | :-------------------------------------- |
+| `DATE` | `Date32` |
+| `TIME` | `Time64(TimeUnit::Nanosecond)` |
+| `TIMESTAMP` | `Timestamp(TimeUnit::Nanosecond, None)` |
+
+## Boolean Types
+
+| SQL DataType | Arrow DataType |
+| ------------ | :------------- |
+| `BOOLEAN` | `Boolean` |
+
+## Unsupported Types
+
+| SQL Data Type | Arrow DataType |
+| ------------------- | :------------------ |
+| `UUID` | _Not yet supported_ |
+| `BLOB` | _Not yet supported_ |
+| `CLOB` | _Not yet supported_ |
+| `BINARY` | _Not yet supported_ |
+| `VARBINARY` | _Not yet supported_ |
+| `BYTEA` | _Not yet supported_ |
+| `REGCLASS` | _Not yet supported_ |
+| `NVARCHAR` | _Not yet supported_ |
+| `STRING` | _Not yet supported_ |
+| `CUSTOM` | _Not yet supported_ |
+| `ARRAY` | _Not yet supported_ |
+| `ENUM` | _Not yet supported_ |
+| `SET` | _Not yet supported_ |
+| `INTERVAL` | _Not yet supported_ |
+| `DATETIME` | _Not yet supported_ |
+| `TINYINT` | _Not yet supported_ |
+| `UNSIGNED TINYINT` | _Not yet supported_ |
+| `UNSIGNED SMALLINT` | _Not yet supported_ |
+| `UNSIGNED INT` | _Not yet supported_ |
+| `UNSIGNED BIGINT` | _Not yet supported_ |
diff --git a/docs/source/user-guide/sql/information_schema.md b/docs/source/user-guide/sql/information_schema.md
index ee0fbfd37..3e04fad55 100644
--- a/docs/source/user-guide/sql/information_schema.md
+++ b/docs/source/user-guide/sql/information_schema.md
@@ -19,50 +19,35 @@
# Information Schema
-DataFusion supports showing metadata about the tables available. This information can be accessed using the views
-of the ISO SQL `information_schema` schema or the DataFusion specific `SHOW TABLES` and `SHOW COLUMNS` commands.
+DataFusion supports showing metadata about the tables and views available. This information can be accessed using the
+views of the ISO SQL `information_schema` schema or the DataFusion specific `SHOW TABLES` and `SHOW COLUMNS` commands.
-More information can be found in the [Postgres docs](https://www.postgresql.org/docs/13/infoschema-schema.html)).
+To show tables in the DataFusion catalog, use the `SHOW TABLES` command or the `information_schema.tables` view:
-To show tables available for use in DataFusion, use the `SHOW TABLES` command or the `information_schema.tables` view:
-
-```sql
+```text
> show tables;
+or
+> select * from information_schema.tables;
+---------------+--------------------+------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+------------+------------+
| datafusion | public | t | BASE TABLE |
| datafusion | information_schema | tables | VIEW |
+| datafusion | information_schema | views | VIEW |
+| datafusion | information_schema | columns | VIEW |
+---------------+--------------------+------------+------------+
-> select * from information_schema.tables;
-
-+---------------+--------------------+------------+--------------+
-| table_catalog | table_schema | table_name | table_type |
-+---------------+--------------------+------------+--------------+
-| datafusion | public | t | BASE TABLE |
-| datafusion | information_schema | TABLES | SYSTEM TABLE |
-+---------------+--------------------+------------+--------------+
```
-To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or the or `information_schema.columns` view:
+To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or the `information_schema.columns` view:
-```sql
+```text
> show columns from t;
+or
+> select table_catalog, table_schema, table_name, column_name, data_type, is_nullable from information_schema.columns;
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
-| datafusion | public | t | a | Int32 | NO |
-| datafusion | public | t | b | Utf8 | NO |
-| datafusion | public | t | c | Float32 | NO |
+| datafusion | public | t1 | Int64(1) | Int64 | NO |
+---------------+--------------+------------+-------------+-----------+-------------+
-
-> select table_name, column_name, ordinal_position, is_nullable, data_type from information_schema.columns;
-+------------+-------------+------------------+-------------+-----------+
-| table_name | column_name | ordinal_position | is_nullable | data_type |
-+------------+-------------+------------------+-------------+-----------+
-| t | a | 0 | NO | Int32 |
-| t | b | 1 | NO | Utf8 |
-| t | c | 2 | NO | Float32 |
-+------------+-------------+------------------+-------------+-----------+
```
diff --git a/docs/source/user-guide/sql/sql_status.md b/docs/source/user-guide/sql/sql_status.md
index 70747b5d0..b260ecb4b 100644
--- a/docs/source/user-guide/sql/sql_status.md
+++ b/docs/source/user-guide/sql/sql_status.md
@@ -72,8 +72,10 @@
- [x] to_hex
- [x] translate
- [x] trim
-- Miscellaneous/Boolean functions
+- Conditional functions
- [x] nullif
+ - [x] case
+ - [x] coalesce
- Approximation functions
- [x] approx_distinct
- [x] approx_median
@@ -93,8 +95,9 @@
- [x] Array of columns
- [x] Schema Queries
- [x] SHOW TABLES
- - [x] SHOW COLUMNS
- - [x] information_schema.{tables, columns}
+ - [x] SHOW COLUMNS FROM <table/view>
+ - [x] SHOW CREATE TABLE <view>
+ - [x] information_schema.{tables, columns, views}
- [ ] information_schema other views
- [x] Sorting
- [ ] Nested types
@@ -128,18 +131,5 @@
- [x] CSV
- [x] Parquet primitive types
- [ ] Parquet nested types
-
-## Extensibility
-
-DataFusion is designed to be extensible at all points. To that end, you can provide your own custom:
-
-- [x] User Defined Functions (UDFs)
-- [x] User Defined Aggregate Functions (UDAFs)
-- [x] User Defined Table Source (`TableProvider`) for tables
-- [x] User Defined `Optimizer` passes (plan rewrites)
-- [x] User Defined `LogicalPlan` nodes
-- [x] User Defined `ExecutionPlan` nodes
-
-## Rust Version Compatibility
-
-This crate is tested with the latest stable version of Rust. We do not currently test against other, older versions of the Rust compiler.
+- [x] JSON
+- [x] Avro
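
For readers who want the supported rows of the updated `data_types.md` tables in one place, the mapping can be sketched as a plain lookup. This is only an illustration of the tables in this commit, not DataFusion's actual planner code: `arrow_type_for` is a hypothetical helper, the Arrow types are given as strings rather than `arrow::datatypes::DataType` values, and `DECIMAL(p,s)` is left out because it carries parameters.

```rust
/// Hypothetical helper (not part of DataFusion): returns the Arrow type name
/// that the updated data_types.md tables list for a SQL type name.
/// Returns `None` for types the tables mark as "Not yet supported", for
/// parameterized types such as DECIMAL(p,s), and for anything not listed.
pub fn arrow_type_for(sql_type: &str) -> Option<&'static str> {
    // SQL type names are case-insensitive, so normalize before matching.
    let upper = sql_type.trim().to_ascii_uppercase();
    Some(match upper.as_str() {
        // Character types
        "CHAR" | "VARCHAR" | "TEXT" => "Utf8",
        // Numeric types
        "SMALLINT" => "Int16",
        "INT" | "INTEGER" => "Int32",
        "BIGINT" => "Int64",
        "FLOAT" | "REAL" => "Float32",
        "DOUBLE" => "Float64",
        // Date/time types
        "DATE" => "Date32",
        "TIME" => "Time64(TimeUnit::Nanosecond)",
        "TIMESTAMP" => "Timestamp(TimeUnit::Nanosecond, None)",
        // Boolean types
        "BOOLEAN" => "Boolean",
        // Everything else (UUID, BLOB, INTERVAL, TINYINT, ...) is unsupported here.
        _ => return None,
    })
}
```

Under these assumptions, `arrow_type_for("integer")` yields `Some("Int32")`, while `arrow_type_for("INTERVAL")` yields `None`, matching the "Unsupported Types" table introduced by this commit.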