Posted to commits@arrow.apache.org by ag...@apache.org on 2022/08/24 17:09:15 UTC

[arrow-datafusion] branch master updated: MINOR: documentation updates (#3239)

This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
     new 51498ca5e MINOR: documentation updates (#3239)
51498ca5e is described below

commit 51498ca5ed2bafb39abf208c5b74f40ec17c57aa
Author: kmitchener <km...@gmail.com>
AuthorDate: Wed Aug 24 13:09:10 2022 -0400

    MINOR: documentation updates (#3239)
    
    * update SQL data type docs to reflect code
    move extensibility and rust version compatibility into "how to use library" section
    update sql_status with case, coalesce, data sources
    
    * undo page name change
    
    * ...
    
    * making npx prettier happy
---
 docs/source/user-guide/library.md                | 17 ++++-
 docs/source/user-guide/sql/data_types.md         | 87 ++++++++++++++++--------
 docs/source/user-guide/sql/information_schema.md | 41 ++++-------
 docs/source/user-guide/sql/sql_status.md         | 26 +++----
 4 files changed, 97 insertions(+), 74 deletions(-)

diff --git a/docs/source/user-guide/library.md b/docs/source/user-guide/library.md
index 688520f9c..c7cc1ec42 100644
--- a/docs/source/user-guide/library.md
+++ b/docs/source/user-guide/library.md
@@ -69,6 +69,21 @@ async fn main() -> datafusion::error::Result<()> {
 }
 ```
 
+## Extensibility
+
+DataFusion is designed to be extensible at all points. To that end, you can provide your own custom:
+
+- [x] User Defined Functions (UDFs)
+- [x] User Defined Aggregate Functions (UDAFs)
+- [x] User Defined Table Source (`TableProvider`) for tables
+- [x] User Defined `Optimizer` passes (plan rewrites)
+- [x] User Defined `LogicalPlan` nodes
+- [x] User Defined `ExecutionPlan` nodes
+
+## Rust Version Compatibility
+
+This crate is tested with the latest stable version of Rust. We do not currently test against other, older versions of the Rust compiler.
+
 ## Optimized Configuration
 
 For an optimized build several steps are required. First, use the below in your `Cargo.toml`. It is
@@ -94,7 +109,7 @@ use datafusion::prelude::*;
 static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
 
 async fn main() -> datafusion::error::Result<()> {
-  ...
+  Ok(())
 }
 ```
 
diff --git a/docs/source/user-guide/sql/data_types.md b/docs/source/user-guide/sql/data_types.md
index 7a3ed7adb..3325d4a77 100644
--- a/docs/source/user-guide/sql/data_types.md
+++ b/docs/source/user-guide/sql/data_types.md
@@ -22,30 +22,63 @@
 DataFusion uses Arrow, and thus the Arrow type system, for query
 execution. The SQL types from
 [sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/data_type.rs#L27)
-are mapped to Arrow types according to the following table
-
-| SQL Data Type | Arrow DataType                                                           |
-| ------------- | ------------------------------------------------------------------------ |
-| `CHAR`        | `Utf8`                                                                   |
-| `VARCHAR`     | `Utf8`                                                                   |
-| `UUID`        | _Not yet supported_                                                      |
-| `CLOB`        | _Not yet supported_                                                      |
-| `BINARY`      | _Not yet supported_                                                      |
-| `VARBINARY`   | _Not yet supported_                                                      |
-| `DECIMAL`     | `Float64`                                                                |
-| `FLOAT`       | `Float32`                                                                |
-| `SMALLINT`    | `Int16`                                                                  |
-| `INT`         | `Int32`                                                                  |
-| `BIGINT`      | `Int64`                                                                  |
-| `REAL`        | `Float32`                                                                |
-| `DOUBLE`      | `Float64`                                                                |
-| `BOOLEAN`     | `Boolean`                                                                |
-| `DATE`        | `Date32`                                                                 |
-| `TIME`        | `Time64(TimeUnit::Nanosecond)`                                           |
-| `TIMESTAMP`   | `Timestamp(TimeUnit::Nanosecond)`                                        |
-| `INTERVAL`    | `Interval(YearMonth)` or `Interval(MonthDayNano)` or `Interval(DayTime)` |
-| `REGCLASS`    | _Not yet supported_                                                      |
-| `TEXT`        | `Utf8`                                                                   |
-| `BYTEA`       | `Binary`                                                                 |
-| `CUSTOM`      | _Not yet supported_                                                      |
-| `ARRAY`       | _Not yet supported_                                                      |
+are mapped to [Arrow data types](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html) according to the following table.
+
+## Character Types
+
+| SQL DataType | Arrow DataType |
+| ------------ | -------------- |
+| `CHAR`       | `Utf8`         |
+| `VARCHAR`    | `Utf8`         |
+| `TEXT`       | `Utf8`         |
+
+## Numeric Types
+
+| SQL DataType       | Arrow DataType    |
+| ------------------ | :---------------- |
+| `SMALLINT`         | `Int16`           |
+| `INT` or `INTEGER` | `Int32`           |
+| `BIGINT`           | `Int64`           |
+| `FLOAT`            | `Float32`         |
+| `REAL`             | `Float32`         |
+| `DOUBLE`           | `Float64`         |
+| `DECIMAL(p,s)`     | `Decimal128(p,s)` |
+
+## Date/Time Types
+
+| SQL DataType | Arrow DataType                          |
+| ------------ | :-------------------------------------- |
+| `DATE`       | `Date32`                                |
+| `TIME`       | `Time64(TimeUnit::Nanosecond)`          |
+| `TIMESTAMP`  | `Timestamp(TimeUnit::Nanosecond, None)` |
+
+## Boolean Types
+
+| SQL DataType | Arrow DataType |
+| ------------ | :------------- |
+| `BOOLEAN`    | `Boolean`      |
+
+## Unsupported Types
+
+| SQL Data Type       | Arrow DataType      |
+| ------------------- | :------------------ |
+| `UUID`              | _Not yet supported_ |
+| `BLOB`              | _Not yet supported_ |
+| `CLOB`              | _Not yet supported_ |
+| `BINARY`            | _Not yet supported_ |
+| `VARBINARY`         | _Not yet supported_ |
+| `BYTEA`             | _Not yet supported_ |
+| `REGCLASS`          | _Not yet supported_ |
+| `NVARCHAR`          | _Not yet supported_ |
+| `STRING`            | _Not yet supported_ |
+| `CUSTOM`            | _Not yet supported_ |
+| `ARRAY`             | _Not yet supported_ |
+| `ENUM`              | _Not yet supported_ |
+| `SET`               | _Not yet supported_ |
+| `INTERVAL`          | _Not yet supported_ |
+| `DATETIME`          | _Not yet supported_ |
+| `TINYINT`           | _Not yet supported_ |
+| `UNSIGNED TINYINT`  | _Not yet supported_ |
+| `UNSIGNED SMALLINT` | _Not yet supported_ |
+| `UNSIGNED INT`      | _Not yet supported_ |
+| `UNSIGNED BIGINT`   | _Not yet supported_ |
diff --git a/docs/source/user-guide/sql/information_schema.md b/docs/source/user-guide/sql/information_schema.md
index ee0fbfd37..3e04fad55 100644
--- a/docs/source/user-guide/sql/information_schema.md
+++ b/docs/source/user-guide/sql/information_schema.md
@@ -19,50 +19,35 @@
 
 # Information Schema
 
-DataFusion supports showing metadata about the tables available. This information can be accessed using the views
-of the ISO SQL `information_schema` schema or the DataFusion specific `SHOW TABLES` and `SHOW COLUMNS` commands.
+DataFusion supports showing metadata about the tables and views available. This information can be accessed using the
+views of the ISO SQL `information_schema` schema or the DataFusion specific `SHOW TABLES` and `SHOW COLUMNS` commands.
 
-More information can be found in the [Postgres docs](https://www.postgresql.org/docs/13/infoschema-schema.html)).
+To show tables in the DataFusion catalog, use the `SHOW TABLES` command or the `information_schema.tables` view:
 
-To show tables available for use in DataFusion, use the `SHOW TABLES` command or the `information_schema.tables` view:
-
-```sql
+```text
 > show tables;
+or
+> select * from information_schema.tables;
 +---------------+--------------------+------------+------------+
 | table_catalog | table_schema       | table_name | table_type |
 +---------------+--------------------+------------+------------+
 | datafusion    | public             | t          | BASE TABLE |
 | datafusion    | information_schema | tables     | VIEW       |
+| datafusion    | information_schema | views      | VIEW       |
+| datafusion    | information_schema | columns    | VIEW       |
 +---------------+--------------------+------------+------------+
 
-> select * from information_schema.tables;
-
-+---------------+--------------------+------------+--------------+
-| table_catalog | table_schema       | table_name | table_type   |
-+---------------+--------------------+------------+--------------+
-| datafusion    | public             | t          | BASE TABLE   |
-| datafusion    | information_schema | TABLES     | SYSTEM TABLE |
-+---------------+--------------------+------------+--------------+
 ```
 
-To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or the or `information_schema.columns` view:
+To show the schema of a table in DataFusion, use the `SHOW COLUMNS` command or the `information_schema.columns` view:
 
-```sql
+```text
 > show columns from t;
+or
+> select table_catalog, table_schema, table_name, column_name, data_type, is_nullable from information_schema.columns;
 +---------------+--------------+------------+-------------+-----------+-------------+
 | table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
 +---------------+--------------+------------+-------------+-----------+-------------+
-| datafusion    | public       | t          | a           | Int32     | NO          |
-| datafusion    | public       | t          | b           | Utf8      | NO          |
-| datafusion    | public       | t          | c           | Float32   | NO          |
+| datafusion    | public       | t1         | Int64(1)    | Int64     | NO          |
 +---------------+--------------+------------+-------------+-----------+-------------+
-
-> select table_name, column_name, ordinal_position, is_nullable, data_type from information_schema.columns;
-+------------+-------------+------------------+-------------+-----------+
-| table_name | column_name | ordinal_position | is_nullable | data_type |
-+------------+-------------+------------------+-------------+-----------+
-| t          | a           | 0                | NO          | Int32     |
-| t          | b           | 1                | NO          | Utf8      |
-| t          | c           | 2                | NO          | Float32   |
-+------------+-------------+------------------+-------------+-----------+
 ```
diff --git a/docs/source/user-guide/sql/sql_status.md b/docs/source/user-guide/sql/sql_status.md
index 70747b5d0..b260ecb4b 100644
--- a/docs/source/user-guide/sql/sql_status.md
+++ b/docs/source/user-guide/sql/sql_status.md
@@ -72,8 +72,10 @@
   - [x] to_hex
   - [x] translate
   - [x] trim
-- Miscellaneous/Boolean functions
+- Conditional functions
   - [x] nullif
+  - [x] case
+  - [x] coalesce
 - Approximation functions
   - [x] approx_distinct
   - [x] approx_median
@@ -93,8 +95,9 @@
   - [x] Array of columns
 - [x] Schema Queries
   - [x] SHOW TABLES
-  - [x] SHOW COLUMNS
-  - [x] information_schema.{tables, columns}
+  - [x] SHOW COLUMNS FROM <table/view>
+  - [x] SHOW CREATE TABLE <view>
+  - [x] information_schema.{tables, columns, views}
   - [ ] information_schema other views
 - [x] Sorting
 - [ ] Nested types
@@ -128,18 +131,5 @@
 - [x] CSV
 - [x] Parquet primitive types
 - [ ] Parquet nested types
-
-## Extensibility
-
-DataFusion is designed to be extensible at all points. To that end, you can provide your own custom:
-
-- [x] User Defined Functions (UDFs)
-- [x] User Defined Aggregate Functions (UDAFs)
-- [x] User Defined Table Source (`TableProvider`) for tables
-- [x] User Defined `Optimizer` passes (plan rewrites)
-- [x] User Defined `LogicalPlan` nodes
-- [x] User Defined `ExecutionPlan` nodes
-
-## Rust Version Compatibility
-
-This crate is tested with the latest stable version of Rust. We do not currently test against other, older versions of the Rust compiler.
+- [x] JSON
+- [x] Avro
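
Note on the data_types.md hunk above: the new tables amount to a lookup from a SQL type name to an Arrow type. The following is a minimal Rust sketch of that lookup, transcribed from the rows in this diff. It is not DataFusion's planner code. `arrow_type_for` is a hypothetical helper written for this note, and it returns Arrow type names as plain strings (rather than arrow `DataType` values) so that it has no dependencies. Types the diff lists as "Not yet supported", and any name the tables do not list, come back as `None`.

```rust
/// Hypothetical helper (not a DataFusion or sqlparser-rs API): returns the
/// Arrow type name that the updated data_types.md tables list for a SQL type
/// name, or None for types marked "Not yet supported" or not listed at all.
fn arrow_type_for(sql_type: &str) -> Option<String> {
    let upper = sql_type.trim().to_ascii_uppercase();

    // DECIMAL(p,s) carries its precision and scale through to Decimal128(p,s).
    if let Some(args) = upper
        .strip_prefix("DECIMAL(")
        .and_then(|rest| rest.strip_suffix(')'))
    {
        let (p, s) = args.split_once(',')?;
        let p: u8 = p.trim().parse().ok()?;
        let s: u8 = s.trim().parse().ok()?;
        return Some(format!("Decimal128({p},{s})"));
    }

    // Remaining rows, copied from the Character, Numeric, Date/Time and
    // Boolean tables in the diff.
    let name = match upper.as_str() {
        "CHAR" | "VARCHAR" | "TEXT" => "Utf8",
        "SMALLINT" => "Int16",
        "INT" | "INTEGER" => "Int32",
        "BIGINT" => "Int64",
        "FLOAT" | "REAL" => "Float32",
        "DOUBLE" => "Float64",
        "DATE" => "Date32",
        "TIME" => "Time64(TimeUnit::Nanosecond)",
        "TIMESTAMP" => "Timestamp(TimeUnit::Nanosecond, None)",
        "BOOLEAN" => "Boolean",
        // UUID, BLOB, INTERVAL, TINYINT, ... are listed as unsupported.
        _ => return None,
    };
    Some(name.to_string())
}

fn main() {
    for ty in ["varchar", "INTEGER", "DECIMAL(10,2)", "UUID"] {
        match arrow_type_for(ty) {
            Some(arrow) => println!("{ty} -> {arrow}"),
            None => println!("{ty} -> not yet supported"),
        }
    }
}
```

The match is case-insensitive only because the sketch upper-cases its input; how the real SQL parser normalizes type names is outside what this commit shows.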