Posted to commits@arrow.apache.org by ag...@apache.org on 2022/08/15 13:38:10 UTC
[arrow-datafusion] branch master updated: Move expressions to top-level page (#3134)
This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/master by this push:
new 6fe6dcec4 Move expressions to top-level page (#3134)
6fe6dcec4 is described below
commit 6fe6dcec4fbaa4a85e7a0132995b02b321562e47
Author: Andy Grove <an...@gmail.com>
AuthorDate: Mon Aug 15 07:38:04 2022 -0600
Move expressions to top-level page (#3134)
---
docs/source/index.rst | 1 +
docs/source/user-guide/dataframe.md | 190 +--------------------
.../user-guide/{dataframe.md => expressions.md} | 83 ---------
3 files changed, 6 insertions(+), 268 deletions(-)
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 0d6d33ef7..66b3386d5 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -42,6 +42,7 @@ Table of Contents
user-guide/library
user-guide/cli
user-guide/dataframe
+ user-guide/expressions
user-guide/sql/index
user-guide/configs
user-guide/faq
diff --git a/docs/source/user-guide/dataframe.md b/docs/source/user-guide/dataframe.md
index 7eeb7d463..d21be0e42 100644
--- a/docs/source/user-guide/dataframe.md
+++ b/docs/source/user-guide/dataframe.md
@@ -29,8 +29,6 @@ to build up a query definition.
The query can be executed by calling the `collect` method.
-The API is well documented in the [API reference on docs.rs](https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html)
-
The DataFrame struct is part of DataFusion's prelude and can be imported with the following statement.
```rust
@@ -49,6 +47,11 @@ let df = df.filter(col("a").lt_eq(col("b")))?
df.show();
```
+The DataFrame API is well documented in the [API reference on docs.rs](https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html).
+
+Refer to the [Expressions Refence](expressions) for available functions for building logical expressions for use with the
+DataFrame API.
+
## DataFrame Transformations
These methods create a new DataFrame after applying a transformation to the logical plan that the DataFrame represents.
@@ -99,186 +102,3 @@ These methods execute the logical plan represented by the DataFrame and either c
| registry | Return a `FunctionRegistry` used to plan udf's calls. |
| schema | Returns the schema describing the output of this DataFrame in terms of columns returned, where each column has a name, data type, and nullability attribute. |
| to_logical_plan | Return the logical plan represented by this DataFrame. |
-
-# Expressions
-
-DataFrame methods such as `select` and `filter` accept one or more logical expressions and there are many functions
-available for creating logical expressions. These are documented below.
-
-Expressions can be chained together using a fluent-style API:
-
-```rust
-// create the expression `(a > 5) AND (b < 7)`
-col("a").gt(lit(5)).and(col("b").lt(lit(7)))
-```
-
-## Identifiers
-
-| Function | Notes |
-| -------- | -------------------------------------------- |
-| col | Reference a column in a dataframe `col("a")` |
-
-## Literal Values
-
-| Function | Notes |
-| -------- | -------------------------------------------------- |
-| lit | Literal value such as `lit(123)` or `lit("hello")` |
-
-## Boolean Expressions
-
-| Function | Notes |
-| -------- | ----------------------------------------- |
-| and | `and(expr1, expr2)` or `expr1.and(expr2)` |
-| or | `or(expr1, expr2)` or `expr1.or(expr2)` |
-| not | `not(expr)` or `expr.not()` |
-
-## Comparison Expressions
-
-| Function | Notes |
-| -------- | --------------------- |
-| eq | `expr1.eq(expr2)` |
-| gt | `expr1.gt(expr2)` |
-| gt_eq | `expr1.gt_eq(expr2)` |
-| lt | `expr1.lt(expr2)` |
-| lt_eq | `expr1.lt_eq(expr2)` |
-| not_eq | `expr1.not_eq(expr2)` |
-
-## Math Functions
-
-In addition to the math functions listed here, some Rust operators are implemented for expressions, allowing
-expressions such as `col("a") + col("b")` to be used.
-
-| Function | Notes |
-| --------------------- | ------------------------------------------------- |
-| abs(x) | absolute value |
-| acos(x) | inverse cosine |
-| asin(x) | inverse sine |
-| atan(x) | inverse tangent |
-| atan2(y, x) | inverse tangent of y / x |
-| ceil(x) | nearest integer greater than or equal to argument |
-| cos(x) | cosine |
-| exp(x) | exponential |
-| floor(x) | nearest integer less than or equal to argument |
-| ln(x) | natural logarithm |
-| log10(x) | base 10 logarithm |
-| log2(x) | base 2 logarithm |
-| power(base, exponent) | base raised to the power of exponent |
-| round(x) | round to nearest integer |
-| signum(x) | sign of the argument (-1, 0, +1) |
-| sin(x) | sine |
-| sqrt(x) | square root |
-| tan(x) | tangent |
-| trunc(x) | truncate toward zero |
-
-## Conditional Expressions
-
-| Function | Notes |
-| -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| coalesce | Returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display. |
-| case | CASE expression. Example: `case(expr).when(expr, expr).when(expr, expr).otherwise(expr).end()`. |
-| nullif | Returns a null value if `value1` equals `value2`; otherwise it returns `value1`. This can be used to perform the inverse operation of the `coalesce` expression. |
-
-## String Expressions
-
-| Function | Notes |
-| ---------------- | ----- |
-| ascii | |
-| bit_length | |
-| btrim | |
-| char_length | |
-| character_length | |
-| concat | |
-| concat_ws | |
-| chr | |
-| initcap | |
-| left | |
-| length | |
-| lower | |
-| lpad | |
-| ltrim | |
-| md5 | |
-| octet_length | |
-| repeat | |
-| replace | |
-| reverse | |
-| right | |
-| rpad | |
-| rtrim | |
-| digest | |
-| split_part | |
-| starts_with | |
-| strpos | |
-| substr | |
-| translate | |
-| trim | |
-| upper | |
-
-## Regular Expressions
-
-| Function | Notes |
-| -------------- | ----- |
-| regexp_match | |
-| regexp_replace | |
-
-## Temporal Expressions
-
-| Function | Notes |
-| -------------------- | ------------ |
-| date_part | |
-| date_trunc | |
-| from_unixtime | |
-| to_timestamp | |
-| to_timestamp_millis | |
-| to_timestamp_micros | |
-| to_timestamp_seconds | |
-| now() | current time |
-
-## Other Expressions
-
-| Function | Notes |
-| -------- | ----- |
-| array | |
-| in_list | |
-| random | |
-| sha224 | |
-| sha256 | |
-| sha384 | |
-| sha512 | |
-| struct | |
-| to_hex | |
-
-## Aggregate Functions
-
-| Function | Notes |
-| ---------------------------------- | ----- |
-| avg | |
-| approx_distinct | |
-| approx_median | |
-| approx_percentile_cont | |
-| approx_percentile_cont_with_weight | |
-| count | |
-| count_distinct | |
-| cube | |
-| grouping_set | |
-| max | |
-| median | |
-| min | |
-| rollup | |
-| sum | |
-
-## Subquery Expressions
-
-| Function | Notes |
-| --------------- | --------------------------------------------------------------------------------------------- |
-| exists | |
-| in_subquery | `df1.filter(in_subquery(col("foo"), df2))?` is the equivalent of the SQL `WHERE foo IN <df2>` |
-| not_exists | |
-| not_in_subquery | |
-| scalar_subquery | |
-
-## User-Defined Function Expressions
-
-| Function | Notes |
-| ----------- | ----- |
-| create_udf | |
-| create_udaf | |
diff --git a/docs/source/user-guide/dataframe.md b/docs/source/user-guide/expressions.md
similarity index 53%
copy from docs/source/user-guide/dataframe.md
copy to docs/source/user-guide/expressions.md
index 7eeb7d463..79ca0e8da 100644
--- a/docs/source/user-guide/dataframe.md
+++ b/docs/source/user-guide/expressions.md
@@ -17,89 +17,6 @@
under the License.
-->
-# DataFrame API
-
-A DataFrame represents a logical set of rows with the same named columns, similar to a [Pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) or
-[Spark DataFrame](https://spark.apache.org/docs/latest/sql-programming-guide.html).
-
-DataFrames are typically created by calling a method on
-`SessionContext`, such as `read_csv`, and can then be modified
-by calling the transformation methods, such as `filter`, `select`, `aggregate`, and `limit`
-to build up a query definition.
-
-The query can be executed by calling the `collect` method.
-
-The API is well documented in the [API reference on docs.rs](https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrame.html)
-
-The DataFrame struct is part of DataFusion's prelude and can be imported with the following statement.
-
-```rust
-use datafusion::prelude::*;
-```
-
-Here is a minimal example showing the execution of a query using the DataFrame API.
-
-```rust
-let ctx = SessionContext::new();
-let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new()).await?;
-let df = df.filter(col("a").lt_eq(col("b")))?
- .aggregate(vec![col("a")], vec![min(col("b"))])?
- .limit(None, Some(100))?;
-// Print results
-df.show();
-```
-
-## DataFrame Transformations
-
-These methods create a new DataFrame after applying a transformation to the logical plan that the DataFrame represents.
-
-DataFusion DataFrames use lazy evaluation, meaning that each transformation is just creating a new query plan and
-not actually performing any transformations. This approach allows for the overall plan to be optimized before
-execution. The plan is evaluated (executed) when an action method is invoked, such as `collect`.
-
-| Function | Notes |
-| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
-| aggregate | Perform an aggregate query with optional grouping expressions. |
-| distinct | Filter out duplicate rows. |
-| except | Calculate the exception of two DataFrames. The two DataFrames must have exactly the same schema |
-| filter | Filter a DataFrame to only include rows that match the specified filter expression. |
-| intersect | Calculate the intersection of two DataFrames. The two DataFrames must have exactly the same schema |
-| join | Join this DataFrame with another DataFrame using the specified columns as join keys. |
-| limit | Limit the number of rows returned from this DataFrame. |
-| repartition | Repartition a DataFrame based on a logical partitioning scheme. |
-| sort | Sort the DataFrame by the specified sorting expressions. Any expression can be turned into a sort expression by calling its `sort` method. |
-| select | Create a projection based on arbitrary expressions. Example: `df..select(vec![col("c1"), abs(col("c2"))])?` |
-| select_columns | Create a projection based on column names. Example: `df.select_columns(&["id", "name"])?`. |
-| union | Calculate the union of two DataFrames, preserving duplicate rows. The two DataFrames must have exactly the same schema. |
-| union_distinct | Calculate the distinct union of two DataFrames. The two DataFrames must have exactly the same schema. |
-| with_column | Add an additional column to the DataFrame. |
-| with_column_renamed | Rename one column by applying a new projection. |
-
-## DataFrame Actions
-
-These methods execute the logical plan represented by the DataFrame and either collects the results into memory, prints them to stdout, or writes them to disk.
-
-| Function | Notes |
-| -------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
-| collect | Executes this DataFrame and collects all results into a vector of RecordBatch. |
-| collect_partitioned | Executes this DataFrame and collects all results into a vector of vector of RecordBatch maintaining the input partitioning. |
-| execute_stream | Executes this DataFrame and returns a stream over a single partition. |
-| execute_stream_partitioned | Executes this DataFrame and returns one stream per partition. |
-| show | Execute this DataFrame and print the results to stdout. |
-| show_limit | Execute this DataFrame and print a subset of results to stdout. |
-| write_csv | Execute this DataFrame and write the results to disk in CSV format. |
-| write_json | Execute this DataFrame and write the results to disk in JSON format. |
-| write_parquet | Execute this DataFrame and write the results to disk in Parquet format. |
-
-## Other DataFrame Methods
-
-| Function | Notes |
-| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| explain | Return a DataFrame with the explanation of its plan so far. |
-| registry | Return a `FunctionRegistry` used to plan udf's calls. |
-| schema | Returns the schema describing the output of this DataFrame in terms of columns returned, where each column has a name, data type, and nullability attribute. |
-| to_logical_plan | Return the logical plan represented by this DataFrame. |
-
# Expressions
DataFrame methods such as `select` and `filter` accept one or more logical expressions and there are many functions