Posted to commits@arrow.apache.org by ne...@apache.org on 2022/04/15 08:02:05 UTC

[arrow-datafusion] branch rdbms-changes updated (e6614aa8f -> 724f4e336)

This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch rdbms-changes
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


 discard e6614aa8f add a Tablesource
     add bed81eade MINOR: fix concat_ws corner bug (#2128)
     add 536210d73 fix df union all bug (#2108)
     add d54ba4e64 feat: 2061 create external table ddl table partition cols (#2099)
     add 88dd6ca3d Update sqlparser requirement from 0.15 to 0.16 (#2152)
     add a0d8b6633 cli: add cargo.lock (#2112)
     add fa5cef8c9 Fixed parquet path partitioning when only selecting partitioned columns (#2000)
     add 69ba713c4 #2109 schema infer max (#2139)
     add 5ae343404 [MINOR] after sqlparser update to 0.16, enable EXTRACT week. (#2157)
     add f99c2719a Update quarterly roadmap for Q2 (#2133)
     add 2a4a835bd fix:  incorrect memory usage track for sort (#2135)
     add ceffb2fca Reduce SortExec memory usage by void constructing single huge batch (#2132)
     add 823011590 Add IF NOT EXISTS to `CREATE TABLE` and `CREATE EXTERNAL TABLE` (#2143)
     add 38498b7bf Reduce repetition in Decimal binary kernels, upgrade to arrow 11.1 (#2107)
     add 8b09a5c6c Add CREATE DATABASE command to SQL (#2094)
     add b890190a6 Add Coalesce function (#1969)
     add 0c4ffd4f7 Add delimiter for create external table (#2162)
     add ea16c30ed [MINOR] ignore suspicious slow test in Ballista (#2167)
     add e5e8125a1 Serialize scalar UDFs in physical plan (#2130)
     add f0200b0a9 [CLI] Add show tables for datafusion-cli (#2137)
     add 0da1f370f minor: Avoid per cell evaluation in Coalesce, use zip in CaseWhen (#2171)
     add 6504d2a78 enable explain for ballista (#2163)
     add fa9e01641 Implement fast path of with_new_children() in ExecutionPlan (#2168)
     add ddf29f112 implement 'StringConcat' operator to support sql like "select 'aa' || 'b' " (#2142)
     add 9815ac6ec Handle merged schemas in parquet pruning (#2170)
     add 70f2b1a9b add ballista plugin manager and udf plugin (#2131)
     add 9cbde6d0e cli: update lockfile (#2178)
     add dec9adcbe Optimize the evaluation of `IN` for large lists using InSet (#2156)
     add a63751494 fix: Sort with a lot of repetition values (#2182)
     add 2d908405f fix 'not' expression will 'NULL' constants (#2144)
     add 41d2ff2aa Make PhysicalAggregateExprNode has repeated PhysicalExprNode (#2184)
     add 73ed545b7 refactor: simplify `prepare_select_exprs` (#2190)
     add 7558a5591 make nightly clippy happy (#2186)
     add c46c91ff3 Multiple row-layout support, part-1: Restructure code for clearness (#2189)
     add 28a6da3d2 MINOR: handle `NULL` in advance to avoid value copy in `string_concat` (#2183)
     add f3360d30b Remove tokio::spawn from WindowAggExec (#2201) (#2203)
     add ee95d41cc Add LogicalPlan::SubqueryAlias (#2172)
     add 6d75948b6 Use `filter` (filter_record_batch) instead of `take` to avoid using indices (#2218)
     add 231027274 feat: Support simple Arrays with Literals (#2194)
     add d81657de0 `case when` supports `NULL`  constant (#2197)
     add 7a6317a0e Add single line description of ExecutionPlan (#2216) (#2217)
     add f39692932 Make ParquetExec usable outside of a tokio runtime (#2201) (#2202)
     add 8058fbb38 Remove tokio::spawn from HashAggregateExec (#2201) (#2215)
     add 774b91bad minor refactor to avoid repeated code (#2222)
     add e7b08ed0e Range scan support for ParquetExec (#1990)
     add b1a28d077 update cli readme (#2220)
     add 8d5bb47f5 add sql level test for decimal data type (#2200)
     add d631a9ca2 chore: add `debug!` log in some execution operators (#2231)
     add 7e7b3ea02 minor: add editor config file (#2224)
     add 3d2e7b0bf Add type coercion rule for date + interval (#2235)
     new 724f4e336 add a Tablesource

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (e6614aa8f)
            \
             N -- N -- N   refs/heads/rdbms-changes (724f4e336)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../integration_hiveserver2.sh => .editorconfig    |   23 +-
 .github/workflows/rust.yml                         |    8 +-
 .gitignore                                         |    1 +
 Cargo.toml                                         |    2 +-
 ballista-examples/src/bin/ballista-dataframe.rs    |    4 +-
 ballista/rust/client/Cargo.toml                    |    2 +-
 ballista/rust/client/src/context.rs                |  153 +-
 ballista/rust/core/Cargo.toml                      |    9 +-
 ballista/rust/core/build.rs                        |    2 +
 ballista/rust/core/proto/ballista.proto            |   41 +-
 ballista/rust/core/proto/datafusion.proto          |   25 +
 ballista/rust/core/src/config.rs                   |   26 +-
 .../core/src/execution_plans/distributed_query.rs  |    2 +-
 .../core/src/execution_plans/shuffle_reader.rs     |    2 +-
 .../core/src/execution_plans/shuffle_writer.rs     |    3 +-
 .../core/src/execution_plans/unresolved_shuffle.rs |    2 +-
 ballista/rust/core/src/lib.rs                      |    2 +
 ballista/rust/core/src/plugin/mod.rs               |  127 +
 ballista/rust/core/src/plugin/plugin_manager.rs    |  150 ++
 ballista/rust/core/src/plugin/udf.rs               |  152 ++
 ballista/rust/core/src/serde/logical_plan/mod.rs   |   73 +-
 ballista/rust/core/src/serde/mod.rs                |   15 +-
 .../core/src/serde/physical_plan/from_proto.rs     |  302 ++-
 ballista/rust/core/src/serde/physical_plan/mod.rs  |  227 +-
 .../rust/core/src/serde/physical_plan/to_proto.rs  |   60 +-
 ballista/rust/executor/Cargo.toml                  |    4 +-
 ballista/rust/executor/executor_config_spec.toml   |    6 +
 ballista/rust/executor/src/collect.rs              |    2 +-
 ballista/rust/executor/src/cpu_bound_executor.rs   |    2 +
 ballista/rust/executor/src/execution_loop.rs       |    6 +-
 ballista/rust/executor/src/executor_server.rs      |    6 +-
 ballista/rust/scheduler/scheduler_config_spec.toml |    8 +-
 ballista/rust/scheduler/src/planner.rs             |   25 +-
 .../rust/scheduler/src/scheduler_server/mod.rs     |    2 +-
 .../src/scheduler_server/query_stage_scheduler.rs  |    3 +-
 .../rust/scheduler/src/state/persistent_state.rs   |   45 +-
 benchmarks/src/bin/nyctaxi.rs                      |    7 +-
 benchmarks/src/bin/tpch.rs                         |    2 +-
 datafusion-cli/Cargo.lock                          | 2422 ++++++++++++++++++++
 datafusion-cli/Cargo.toml                          |    2 +-
 datafusion-cli/README.md                           |   14 +-
 datafusion-cli/src/context.rs                      |    5 +-
 datafusion-examples/Cargo.toml                     |    2 +-
 datafusion-examples/examples/custom_datasource.rs  |   15 +-
 datafusion-examples/examples/dataframe.rs          |    2 +-
 datafusion-examples/examples/flight_server.rs      |    1 +
 datafusion-examples/examples/parquet_sql.rs        |    1 +
 datafusion/common/Cargo.toml                       |    6 +-
 datafusion/common/src/dfschema.rs                  |   34 +-
 datafusion/core/Cargo.toml                         |    6 +-
 datafusion/core/benches/parquet_query_sql.rs       |   10 +-
 datafusion/core/fuzz-utils/Cargo.toml              |    2 +-
 datafusion/core/src/datasource/file_format/avro.rs |    8 +-
 datafusion/core/src/datasource/file_format/csv.rs  |   13 +-
 datafusion/core/src/datasource/file_format/json.rs |   24 +-
 datafusion/core/src/datasource/file_format/mod.rs  |   14 +-
 .../core/src/datasource/file_format/parquet.rs     |  158 +-
 datafusion/core/src/datasource/listing/helpers.rs  |    3 +
 datafusion/core/src/datasource/listing/mod.rs      |   29 +-
 datafusion/core/src/datasource/listing/table.rs    |    5 +-
 datafusion/core/src/execution/context.rs           |  226 +-
 datafusion/core/src/execution/memory_manager.rs    |   37 +-
 datafusion/core/src/execution/options.rs           |  111 +-
 datafusion/core/src/logical_plan/builder.rs        |   84 +-
 datafusion/core/src/logical_plan/mod.rs            |    8 +-
 datafusion/core/src/logical_plan/plan.rs           |   60 +-
 .../core/src/optimizer/common_subexpr_eliminate.rs |    2 +
 datafusion/core/src/optimizer/limit_push_down.rs   |   18 +-
 .../core/src/optimizer/projection_push_down.rs     |   85 +-
 datafusion/core/src/optimizer/utils.rs             |   31 +-
 .../src/physical_optimizer/coalesce_batches.rs     |    3 +-
 .../core/src/physical_optimizer/merge_exec.rs      |   12 +-
 .../core/src/physical_optimizer/repartition.rs     |    6 +-
 datafusion/core/src/physical_optimizer/utils.rs    |    4 +-
 .../core/src/physical_plan/aggregate_rule.rs       |    3 +-
 datafusion/core/src/physical_plan/analyze.rs       |   19 +-
 .../core/src/physical_plan/coalesce_batches.rs     |   17 +-
 .../core/src/physical_plan/coalesce_partitions.rs  |    9 +-
 datafusion/core/src/physical_plan/cross_join.rs    |   20 +-
 datafusion/core/src/physical_plan/display.rs       |   27 +
 datafusion/core/src/physical_plan/empty.rs         |   33 +-
 datafusion/core/src/physical_plan/explain.rs       |   25 +-
 .../core/src/physical_plan/file_format/avro.rs     |   13 +-
 .../core/src/physical_plan/file_format/csv.rs      |   21 +-
 .../core/src/physical_plan/file_format/json.rs     |   13 +-
 .../core/src/physical_plan/file_format/mod.rs      |   43 +-
 .../core/src/physical_plan/file_format/parquet.rs  |  719 ++++--
 datafusion/core/src/physical_plan/filter.rs        |   60 +-
 datafusion/core/src/physical_plan/functions.rs     |   11 +
 .../core/src/physical_plan/hash_aggregate.rs       |  415 ++--
 datafusion/core/src/physical_plan/hash_join.rs     |   23 +-
 datafusion/core/src/physical_plan/limit.rs         |   23 +-
 datafusion/core/src/physical_plan/memory.rs        |    5 +-
 datafusion/core/src/physical_plan/mod.rs           |   32 +-
 datafusion/core/src/physical_plan/planner.rs       |  102 +-
 datafusion/core/src/physical_plan/projection.rs    |   19 +-
 datafusion/core/src/physical_plan/repartition.rs   |   25 +-
 datafusion/core/src/physical_plan/sorts/sort.rs    |  453 +++-
 .../physical_plan/sorts/sort_preserving_merge.rs   |   39 +-
 datafusion/core/src/physical_plan/union.rs         |    7 +-
 datafusion/core/src/physical_plan/values.rs        |   17 +-
 .../src/physical_plan/windows/window_agg_exec.rs   |  112 +-
 datafusion/core/src/prelude.rs                     |   15 +-
 datafusion/core/src/row/jit/mod.rs                 |  224 ++
 datafusion/core/src/row/jit/reader.rs              |  164 ++
 datafusion/core/src/row/jit/writer.rs              |  210 ++
 datafusion/core/src/row/layout.rs                  |   67 +
 datafusion/core/src/row/mod.rs                     |  374 +--
 datafusion/core/src/row/reader.rs                  |  220 +-
 datafusion/core/src/row/validity.rs                |  161 ++
 datafusion/core/src/row/writer.rs                  |  229 +-
 datafusion/core/src/sql/parser.rs                  |  137 ++
 datafusion/core/src/sql/planner.rs                 |  250 +-
 datafusion/core/src/test/exec.rs                   |   26 +-
 datafusion/core/tests/aggregate_simple_pipe.csv    |   16 +
 datafusion/core/tests/custom_sources.rs            |   17 +-
 datafusion/core/tests/decimal_data.csv             |   16 +
 datafusion/core/tests/order_spill_fuzz.rs          |    8 +
 .../core/tests/parquet/repeat_much.snappy.parquet  |  Bin 0 -> 1261 bytes
 datafusion/core/tests/parquet_pruning.rs           |    6 +-
 datafusion/core/tests/path_partition.rs            |  131 ++
 datafusion/core/tests/provider_filter_pushdown.rs  |    2 +-
 datafusion/core/tests/sql/create_drop.rs           |   88 +
 datafusion/core/tests/sql/decimal.rs               |  482 ++++
 datafusion/core/tests/sql/expr.rs                  |  160 +-
 datafusion/core/tests/sql/functions.rs             |  183 ++
 datafusion/core/tests/sql/mod.rs                   |   12 +-
 datafusion/core/tests/sql/order.rs                 |   23 +
 datafusion/core/tests/sql/parquet.rs               |   16 +-
 datafusion/core/tests/sql/predicates.rs            |   17 +
 datafusion/core/tests/sql/select.rs                |   31 -
 datafusion/core/tests/statistics.rs                |   14 +-
 datafusion/core/tests/user_defined_plan.rs         |   15 +-
 datafusion/expr/Cargo.toml                         |    4 +-
 datafusion/expr/src/built_in_function.rs           |    6 +
 datafusion/expr/src/expr_fn.rs                     |    9 +
 datafusion/expr/src/operator.rs                    |    3 +
 datafusion/jit/Cargo.toml                          |    2 +-
 datafusion/physical-expr/Cargo.toml                |    2 +-
 .../physical-expr/src/coercion_rule/binary_rule.rs |   40 +
 .../physical-expr/src/conditional_expressions.rs   |  100 +
 .../src/expressions/approx_percentile_cont.rs      |   11 +-
 datafusion/physical-expr/src/expressions/binary.rs |  323 ++-
 datafusion/physical-expr/src/expressions/case.rs   |  157 +-
 .../physical-expr/src/expressions/in_list.rs       |  413 +++-
 .../physical-expr/src/expressions/lead_lag.rs      |    4 +-
 datafusion/physical-expr/src/expressions/not.rs    |   44 +-
 datafusion/physical-expr/src/lib.rs                |    1 +
 datafusion/physical-expr/src/physical_expr.rs      |   33 +-
 datafusion/physical-expr/src/string_expressions.rs |   21 +-
 datafusion/proto/proto/datafusion.proto            |   25 +
 datafusion/proto/src/from_proto.rs                 |   55 +-
 datafusion/proto/src/to_proto.rs                   |   48 +-
 docs/source/specification/quarterly_roadmap.md     |   84 +-
 docs/source/user-guide/cli.md                      |   17 +-
 docs/source/user-guide/sql/ddl.md                  |   15 +
 integration-tests/test_psql_parity.py              |    2 +-
 157 files changed, 9286 insertions(+), 2746 deletions(-)
 copy ci/scripts/integration_hiveserver2.sh => .editorconfig (71%)
 mode change 100755 => 100644
 create mode 100644 ballista/rust/core/src/plugin/mod.rs
 create mode 100644 ballista/rust/core/src/plugin/plugin_manager.rs
 create mode 100644 ballista/rust/core/src/plugin/udf.rs
 create mode 100644 datafusion-cli/Cargo.lock
 create mode 100644 datafusion/core/src/row/jit/mod.rs
 create mode 100644 datafusion/core/src/row/jit/reader.rs
 create mode 100644 datafusion/core/src/row/jit/writer.rs
 create mode 100644 datafusion/core/src/row/layout.rs
 create mode 100644 datafusion/core/src/row/validity.rs
 create mode 100644 datafusion/core/tests/aggregate_simple_pipe.csv
 create mode 100644 datafusion/core/tests/decimal_data.csv
 create mode 100644 datafusion/core/tests/parquet/repeat_much.snappy.parquet
 create mode 100644 datafusion/core/tests/sql/decimal.rs
 create mode 100644 datafusion/physical-expr/src/conditional_expressions.rs


[arrow-datafusion] 01/01: add a Tablesource

Posted by ne...@apache.org.

nevime pushed a commit to branch rdbms-changes
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit 724f4e3363289607fed44ce30e9a1992df55d58a
Author: Wakahisa <ne...@gmail.com>
AuthorDate: Mon Feb 14 22:50:05 2022 +0200

    add a Tablesource
    
    Tablesource contains more information about the source of the table.
    It can be a relational table, file(s), in-memory or unspecified.
---
 datafusion/core/src/datasource/datasource.rs | 34 ++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
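
Note: the following is a minimal, self-contained sketch of how a table
provider might override the new table_source() default. It does not depend
on DataFusion. The TableSource enum is copied from the diff below, while the
SourceInfo trait, the LocalCsv and Opaque types, and the describe() helper
are hypothetical names introduced only for illustration; they are not part
of this commit.

```rust
/// Copy of the `TableSource` enum added by this commit.
#[derive(Debug, Clone, PartialEq)]
pub enum TableSource {
    Relational {
        server: Option<String>,
        database: Option<String>,
        schema: Option<String>,
        table: String,
    },
    File {
        protocol: String,
        path: String,
        format: String,
    },
    InMemory,
    Unspecified,
}

/// Hypothetical stand-in for `TableProvider`, reduced to the one new method.
pub trait SourceInfo {
    /// Same default as in the commit: providers that say nothing are `Unspecified`.
    fn table_source(&self) -> TableSource {
        TableSource::Unspecified
    }
}

/// Hypothetical provider backed by a single local CSV file.
pub struct LocalCsv {
    pub path: String,
}

impl SourceInfo for LocalCsv {
    fn table_source(&self) -> TableSource {
        TableSource::File {
            protocol: "file".to_string(),
            path: self.path.clone(),
            format: "csv".to_string(),
        }
    }
}

/// Hypothetical provider that keeps the default implementation.
pub struct Opaque;

impl SourceInfo for Opaque {}

/// Hypothetical helper: render a source as a short label for a catalog listing.
pub fn describe(source: &TableSource) -> String {
    match source {
        TableSource::Relational { server, database, schema, table } => {
            // Join whichever optional qualifiers are present, then the table name.
            let mut parts: Vec<&str> = Vec::new();
            for part in [server, database, schema] {
                if let Some(p) = part {
                    parts.push(p.as_str());
                }
            }
            parts.push(table.as_str());
            parts.join(".")
        }
        TableSource::File { protocol, path, format } => {
            format!("{}://{} ({})", protocol, path, format)
        }
        TableSource::InMemory => "in-memory".to_string(),
        TableSource::Unspecified => "unspecified".to_string(),
    }
}

fn main() {
    let csv = LocalCsv { path: "/tmp/data.csv".to_string() };
    println!("{}", describe(&csv.table_source()));
    println!("{}", describe(&Opaque.table_source()));
}
```

Because the trait method has a default body, existing providers would keep
compiling unchanged and simply report Unspecified until they opt in.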

diff --git a/datafusion/core/src/datasource/datasource.rs b/datafusion/core/src/datasource/datasource.rs
index 1b59c857f..48a2dc09e 100644
--- a/datafusion/core/src/datasource/datasource.rs
+++ b/datafusion/core/src/datasource/datasource.rs
@@ -55,6 +55,35 @@ pub enum TableType {
     Temporary,
 }
 
+/// Indicates the source of this table for metadata/catalog purposes.
+#[derive(Debug, Clone, PartialEq)]
+pub enum TableSource {
+    /// An ordinary physical table.
+    Relational {
+        ///
+        server: Option<String>,
+        ///
+        database: Option<String>,
+        ///
+        schema: Option<String>,
+        ///
+        table: String
+    },
+    /// A file on some file system
+    File {
+        ///
+        protocol: String,
+        ///
+        path: String,
+        ///
+        format: String,
+    },
+    /// A transient table.
+    InMemory,
+    /// An unspecified source, used as the default
+    Unspecified,
+}
+
 /// Source table
 #[async_trait]
 pub trait TableProvider: Sync + Send {
@@ -70,6 +99,11 @@ pub trait TableProvider: Sync + Send {
         TableType::Base
     }
 
+    /// The source of this table
+    fn table_source(&self) -> TableSource {
+        TableSource::Unspecified
+    }
+
     /// Create an ExecutionPlan that will scan the table.
     /// The table provider will be usually responsible of grouping
     /// the source data into partitions that can be efficiently