Posted to reviews@spark.apache.org by rdblue <gi...@git.apache.org> on 2018/05/11 21:21:58 UTC

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

GitHub user rdblue opened a pull request:

    https://github.com/apache/spark/pull/21306

    [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog support.

    ## What changes were proposed in this pull request?
    
    This adds a mix-in to `DataSourceV2` that allows implementations to support catalog operations: create table, drop table, and alter table.
    
    ## How was this patch tested?
    
    Work in progress. This is for discussion right now.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-24252-add-datasource-v2-catalog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21306.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21306
    
----
commit 34f91c58ca23b81a7ee6f9270f30001e8885733e
Author: Ryan Blue <bl...@...>
Date:   2018-05-05T01:13:01Z

    SPARK-24252: Add v2 data source mix-in for catalog support.

----


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
GitHub user rdblue reopened a pull request:

    https://github.com/apache/spark/pull/21306

    [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog support.

    ## What changes were proposed in this pull request?
    
    This adds a mix-in to `DataSourceV2` that allows implementations to support catalog operations: load table (for schema), create table, drop table, and alter table.
    
    This does not include the proposed transactional API.
    
    ## How was this patch tested?
    
    Work in progress. This is for discussion right now.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-24252-add-datasource-v2-catalog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21306.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21306
    
----
commit 915d72f8a3cf8df428ecdac25f30545d963ee5f7
Author: Ryan Blue <bl...@...>
Date:   2018-05-05T01:13:01Z

    SPARK-24252: Add v2 data source mix-in for catalog support.

commit fb55395a657588d1f87b13dbb1793d13991dc2b0
Author: Ryan Blue <bl...@...>
Date:   2018-05-11T21:27:47Z

    SPARK-24252: Add copyright headers.

commit 42ed4a4a138e5c5f681755d871fd9d9030a4619a
Author: Ryan Blue <bl...@...>
Date:   2018-07-04T17:02:52Z

    SPARK-24252: Update for review comments.
    
    * Rename CatalogSupport to TableSupport
    * Rename DataSourceCatalog to TableCatalog
    * Remove name and database from Table

commit 023995d15b4293fac1530da6bd966b6ab6823980
Author: Ryan Blue <bl...@...>
Date:   2018-07-04T17:19:45Z

    SPARK-24252: Add TableChange example to Javadocs.

----


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200170480
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link DataSourceCatalog}.
    + */
    +public interface Table {
    --- End diff --
    
    This interface is for named tables, not path-based tables. I think that would probably be a different interface because not all sources can support path-based tables. Cassandra and JDBC are good examples.
    
    For the table metadata, I think we do need partitions. Iceberg creates partitioned tables and I'd like to start getting the DDL operations working. This is why I proposed this metadata on the SPIP a few months ago. We seem to have lazy consensus around it.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237723417
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    + */
    +public interface Table {
    +  /**
    +   * Return the table properties.
    +   * @return this table's map of string properties
    +   */
    +  Map<String, String> properties();
    +
    +  /**
    +   * Return the table schema.
    +   * @return this table's schema as a struct type
    +   */
    +  StructType schema();
    +
    +  /**
    +   * Return the table partitioning transforms.
    +   * @return this table's partitioning transforms
    +   */
    +  List<PartitionTransform> partitioning();
    --- End diff --
    
    Should we default this to no partitioning?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1328/
    Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #92621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92621/testReport)** for PR 21306 at commit [`42ed4a4`](https://github.com/apache/spark/commit/42ed4a4a138e5c5f681755d871fd9d9030a4619a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237699163
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    --- End diff --
    
    Does it include a View, like what we do in `CatalogTable`?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1327/
    Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200709816
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java ---
    @@ -0,0 +1,123 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog {
    +  /**
    +   * Load table metadata by {@link TableIdentifier identifier} from the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @return the table's metadata
    +   * @throws NoSuchTableException If the table doesn't exist.
    +   */
    +  Table loadTable(TableIdentifier ident) throws NoSuchTableException;
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap());
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema,
    +                            Map<String, String> properties) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), properties);
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param partitions a list of expressions to use for partitioning data in the table
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  Table createTable(TableIdentifier ident,
    +                    StructType schema,
    +                    List<Expression> partitions,
    --- End diff --
    
    > The current end-user API only allows users to specify partition columns.
    
    I think an example would help explain the use of expressions here. Right now, I can create a table partitioned by day like this:
    ```
    CREATE TABLE t (ts timestamp, data string, day string) PARTITIONED BY (day)
    ```
    
    Then it's up to queries to supply the right values for `day`. I'm proposing we change that to something like the following, which uses expressions in the PARTITIONED BY clause instead of only allowing column names:
    ```
    CREATE TABLE t (ts timestamp, data string) PARTITIONED BY (date(ts));
    ```
    
    This can handle all identity partitioning in Hive tables today and it can handle bucketing.
    
    > And why does the "partition transform" belong to a table definition?
    
    Transforms should be passed to the table so the source can use them for the physical layout. In DataSourceV2, the source could be anything, so it needs to be the component that handles the physical layout. Because we want distributed data sources, we need some way of telling them how to distribute data.
    
    For example, I could use a partitioning expression to tell a source how to shard across PostgreSQL instances. I could also use it to define the keys in an HBase connector. Those are uses of partitioning that Spark can't handle internally.
    
    Like Hive, Spark has only supported a limited definition of partitioning up to now, but I'd like to be able to put tables using Hive's layout behind this API eventually. I think this way of configuring partitioning is a good way to do that, while supporting what Iceberg and other sources will need.
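    
    To make that concrete against this diff's API, here is a rough sketch of how the second `CREATE TABLE` above could translate into the `createTable` overload in this diff. The `dateOf` helper is hypothetical and stands in for building the actual `date(ts)` `Expression`:
    ```
    import java.util.Collections;
    import java.util.List;
    
    import org.apache.spark.sql.catalyst.TableIdentifier;
    import org.apache.spark.sql.catalyst.expressions.Expression;
    import org.apache.spark.sql.sources.v2.catalog.Table;
    import org.apache.spark.sql.sources.v2.catalog.TableCatalog;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;
    
    class PartitionedCreateSketch {
      static Table createDayPartitioned(TableCatalog catalog) throws Exception {
        StructType schema = new StructType()
            .add("ts", DataTypes.TimestampType)
            .add("data", DataTypes.StringType);
        // dateOf is a hypothetical helper standing in for the date(ts) Expression
        List<Expression> partitions = Collections.singletonList(dateOf("ts"));
        return catalog.createTable(
            new TableIdentifier("t"), schema, partitions, Collections.emptyMap());
      }
    
      private static Expression dateOf(String column) {
        throw new UnsupportedOperationException("illustrative only");
      }
    }
    ```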


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by tigerquoll <gi...@git.apache.org>.
Github user tigerquoll commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Sure,
    I am looking at this from the point of view of supporting Kudu. Check out https://kudu.apache.org/docs/schema_design.html#partitioning for some of the details, in particular https://kudu.apache.org/2016/08/23/new-range-partitioning-features.html.
    As Kudu is a column store, each column also has attributes associated with it, such as encoding and compression codecs.
    
    
    I really think that partitions should be considered part of the table schema. They have an existence above and beyond the definition of a filter that matches a record. Adding an empty partition changes the state of many underlying systems. Many systems that support partitions also have APIs for adding and removing partition definitions, and some require partition information to be specified during table creation. Those that support changing partitions after creation usually have specific APIs for adding and removing partitions.
    
    
    Dale,
    



---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94982/
    Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    There are several things we need to discuss here:
    
    - Which catalog operations do we want to forward to the data source catalog? Currently it's create/drop/alter table, which I think is good enough for now.
    - How does Spark forward these catalog operations? IMO there are 2 ways.
      - Spark provides an API so that end-users can do it directly, e.g. `spark.catalog("iceberg").createTable(...)`, or the SQL API `CREATE TABLE iceberg.db1.tbl1 ...` (a sketch follows this list).
      - When creating/dropping/altering Spark tables, also forward the operation to the data source catalog. For example, users create a table in Spark via `CREATE TABLE t(...) USING iceberg`, which creates a table entry in the Hive metastore as well as an iceberg meta file. When dropping this table, Spark should notify iceberg to remove the meta file. It's arguable whether we need this feature: if users are willing to always add the catalog prefix, they can just write `CREATE TABLE iceberg.db1.tbl1 ...` and `SELECT ... FROM iceberg.db1.tbl1`, and totally bypass the Spark catalog.
    - How do we look up table metadata from the data source catalog? I think database name + table name is a common way (e.g. `iceberg.db1.tbl1`), but we should also consider other ways like a path (e.g. `` delta.`/a/path` ``). Maybe we can treat a path as a table name without a database, and leave the data source to interpret it.
    - How do we define table metadata? It seems that Spark only needs to know the table schema for analysis. Maybe we can forward `DESC TABLE` to the data source so that Spark doesn't need to standardize the table metadata.
    - How does the table metadata get involved in data reading/writing? When reading data without a catalog, e.g. `spark.read.format("my_data_source").option("table", "my_table").load()`, the data source needs to get the metadata of the given table. When reading data with a catalog, e.g. `spark.table("my_data_source.my_table")`, the data source also needs to get the metadata of the given table, but has to implement it in a different API (`CatalogSupport`). It's OK to say that data source implementations are responsible for eliminating code duplication themselves.
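    
    A minimal, self-contained sketch of that first user-facing style (everything here is illustrative: the resolver, the interface, and the names are stand-ins, not this PR's API):
    ```
    import java.util.Collections;
    import java.util.Map;
    
    interface SketchTableCatalog {
      void createTable(String ident, Map<String, String> properties);
    }
    
    public class CatalogRoutingSketch {
      // Pretend resolver: "iceberg" would map to a registered catalog impl.
      static SketchTableCatalog catalog(String name) {
        return (ident, props) -> System.out.println("create " + ident + " via " + name);
      }
    
      public static void main(String[] args) {
        // Option 1: programmatic, explicitly tied to a catalog.
        catalog("iceberg").createTable("db1.tbl1", Collections.emptyMap());
        // Option 2 (SQL) would route CREATE TABLE iceberg.db1.tbl1 ...
        // through the same catalog lookup.
      }
    }
    ```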
    



---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @rxin, I've updated this to use a new interface, `PartitionTransform`, instead of `Expression`. This is used to pass well-known transformations when creating tables, just as `Filter` is used to pass a subset of `Expression` to data sources for filtering (or deletes in #21308).
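    
    A hedged sketch of what building a partitioning spec from these transforms might look like. The `identity` and `bucket` factory methods exist in `PartitionTransforms`, but the exact signatures used here are guesses:
    ```
    import java.util.Arrays;
    import java.util.List;
    
    import org.apache.spark.sql.catalog.v2.PartitionTransform;
    import org.apache.spark.sql.catalog.v2.PartitionTransforms;
    
    class TransformSpecSketch {
      // Analogous to building Filters for a scan: a small, well-known
      // vocabulary of transforms instead of arbitrary Expressions.
      static List<PartitionTransform> spec() {
        return Arrays.asList(
            PartitionTransforms.identity("region"),
            PartitionTransforms.bucket(16, "id"));
      }
    }
    ```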


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92621/
    Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    A general question: how would one use this catalog API to implement the Hive metastore? Is it doable?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @cloud-fan, thanks for the thorough feedback!
    
    > What catalog operations we want to forward to the data source catalog? Currently it's create/drop/alter table, I think it's good enough for now.
    
    This PR introduces create, drop, and alter. We can always add more later. These are the ones that we need to implement DataSourceV2 operations and DDL support.
    
    > Spark provides an API so that end-users can do it directly. e.g. `spark.catalog("iceberge").createTable(...)`, or SQL API `CREATE TABLE iceberge.db1.tbl1 . . .`
    
    These two are the easiest and least intrusive ways to start because the data source catalog interaction is explicitly tied to a catalog. They also match the behavior used by other systems for multiple catalogs. I think this is what we should start with before tackling ideas like your second point.
    
    > When creating/dropping/altering Spark tables, also forward it to the data source catalog. . .
    
    For this and a couple other questions, I don't think we need to decide right now. This PR is about getting the interface for other sources in Spark. We don't necessarily need to know all of the ways that users will call it or interact with it, like how `DESC TABLE` will work.
    
    To your question here, I'm not sure whether the `CREATE TABLE ... USING source` syntax should use the default catalog or defer to the catalog for `source` or forward to both, but that doesn't need to block adding this API because I think we can decide it later. In addition, we should probably discuss this on the dev list to make sure we get the behavior right.
    
    > How to lookup the table metadata from data source catalog?
    
    The SPIP proposes two catalog interfaces that return `Table`. One that uses table identifiers and one that uses paths. Data sources can implement support for both or just one. This PR includes just the support for table identifiers. We would add a similar API for path-based tables in another PR.
    
    > How to define table metadata? Maybe we can forward `DESC TABLE` . . .
    
    That sounds like a reasonable idea to me. Like the behavior of `USING`, I don't think this is something that we have to decide right now. We can add support later as we implement table DDL. Maybe `Table` should return a DF that is its `DESCRIBE` output.
    
    > How does the table metadata involve in data reading/writing?
    
    This is another example of something we don't need to decide yet. We have a couple different options for the behavior and will want to think them through and discuss them on the dev list. But I don't think that the behavior necessarily needs to be decided before we add this API to sources.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #90532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90532/testReport)** for PR 21306 at commit [`7130d13`](https://github.com/apache/spark/commit/7130d13de27c99480189cdce3b7f00749a801e9c).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90532/
    Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @stczwd, I agree with @mccheah. Tables are basically named data sets. Whether they support batch, micro-batch streaming, or continuous streaming is determined by checking whether they implement SupportsBatchScan or similar interfaces. Matt's referenced docs are the right place to go for more context. The purpose here is to make catalogs and reads orthogonal. A catalog can return both batch-compatible and stream-compatible source "tables".
    
    A "table" may be a Kafka topic or may be a file-based data source. And note that both of those can support batch and streaming execution. A Kafka topic could be CDC stream that represents a table, and a file-based source could be streamed by periodically checking for new committed files.
    
    This PR is based on an [SPIP](https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.7vhjx9226jbt). That has some background for why I chose the set of table attributes here (schema, partitioning, properties), but a short summary is that those are the core set of attributes that are  used in comparable SQL variants and already used in Spark.
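    
    As a tiny illustration of that orthogonality (the marker interface below is a self-contained stand-in, not the real mix-in):
    ```
    // Stand-in marker for the real SupportsBatchScan mix-in.
    interface SupportsBatchScan { }
    
    class CapabilitySketch {
      // A catalog just returns tables; separately, the planner checks which
      // read modes a returned table actually supports.
      static boolean canBatchScan(Object table) {
        return table instanceof SupportsBatchScan;
      }
    }
    ```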


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200174526
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link DataSourceCatalog#alterTable}.
    + */
    +public interface TableChange {
    +
    +  /**
    +   * Create a TableChange for adding a top-level column to a table.
    +   * <p>
    +   * Because "." may be interpreted as a field path separator or may be used in field names, it is
    +   * not allowed in names passed to this method. To add to nested types or to add fields with
    +   * names that contain ".", use {@link #addColumn(String, String, DataType)}.
    +   *
    +   * @param name the new top-level column name
    +   * @param dataType the new column's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String name, DataType dataType) {
    +    return new AddColumn(null, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for adding a nested column to a table.
    +   * <p>
    +   * The parent name is used to find the parent struct type where the nested field will be added.
    +   * If the parent name is null, the new column will be added to the root as a top-level column.
    +   * If parent identifies a struct, a new column is added to that struct. If it identifies a list,
    +   * the column is added to the list element struct, and if it identifies a map, the new column is
    +   * added to the map's value struct.
    +   * <p>
    +   * The given name is used to name the new column and names containing "." are not handled
    +   * differently.
    +   *
    +   * @param parent the new field's parent
    +   * @param name the new field name
    +   * @param dataType the new field's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String parent, String name, DataType dataType) {
    +    return new AddColumn(parent, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for renaming a field.
    +   * <p>
    +   * The name is used to find the field to rename. The new name will replace the name of the type.
    +   * For example, renameColumn("a.b.c", "x") should produce column a.b.x.
    +   *
    +   * @param name the current field name
    +   * @param newName the new name
    +   * @return a TableChange for the rename
    +   */
    +  static TableChange renameColumn(String name, String newName) {
    +    return new RenameColumn(name, newName);
    +  }
    +
    +  /**
    +   * Create a TableChange for updating the type of a field.
    +   * <p>
    +   * The name is used to find the field to update.
    +   *
    +   * @param name the field name
    +   * @param newDataType the new data type
    +   * @return a TableChange for the update
    +   */
    +  static TableChange updateColumn(String name, DataType newDataType) {
    +    return new UpdateColumn(name, newDataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for deleting a field from a table.
    +   *
    +   * @param name the name of the field to delete
    +   * @return a TableChange for the delete
    +   */
    +  static TableChange deleteColumn(String name) {
    +    return new DeleteColumn(name);
    +  }
    +
    +  final class AddColumn implements TableChange {
    --- End diff --
    
    Nevermind, I forgot that these are in an interface so they are automatically public.
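    
    For reference, the Java rule in play is that interface members are implicitly public (and nested classes implicitly static):
    ```
    interface Change {
      // public and static without any modifiers, because it is declared
      // inside an interface
      final class Add implements Change { }
    }
    ```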


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #90531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90531/testReport)** for PR 21306 at commit [`34f91c5`](https://github.com/apache/spark/commit/34f91c58ca23b81a7ee6f9270f30001e8885733e).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  final class AddColumn implements TableChange `
      * `  final class RenameColumn implements TableChange `
      * `  final class UpdateColumn implements TableChange `
      * `  final class DeleteColumn implements TableChange `


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93568/
    Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237975013
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/CatalogProvider.java ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.internal.SQLConf;
    +
    +/**
    + * A marker interface to provide a catalog implementation for Spark.
    + * <p>
    + * Implementations can provide catalog functions by implementing additional interfaces, like
    + * {@link TableCatalog} to expose table operations.
    + * <p>
    + * Catalog implementations must implement this marker interface to be loaded by
    + * {@link Catalogs#load(String, SQLConf)}. The loader will instantiate catalog classes using the
    + * required public no-arg constructor. After creating an instance, it will be configured by calling
    + * {@link #initialize(CaseInsensitiveStringMap)}.
    + * <p>
    + * Catalog implementations are registered to a name by adding a configuration option to Spark:
    + * {@code spark.sql.catalog.catalog-name=com.example.YourCatalogClass}. All configuration properties
    + * in the Spark configuration that share the catalog name prefix,
    + * {@code spark.sql.catalog.catalog-name.(key)=(value)} will be passed in the case insensitive
    + * string map of options in initialization with the prefix removed. An additional property,
    + * {@code name}, is also added to the options and will contain the catalog's name; in this case,
    + * "catalog-name".
    + */
    +public interface CatalogProvider {
    --- End diff --
    
    @cloud-fan, do you want me to create the `sql-api` package in this PR, or do you want to add a separate PR to move the current v2 API?
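    
    As an aside, a hedged sketch of the registration scheme the Javadoc above describes; the catalog name "example" and the "host" option are illustrative:
    ```
    import org.apache.spark.SparkConf;
    
    class CatalogRegistrationSketch {
      static SparkConf conf() {
        return new SparkConf()
            // binds the name "example" to an implementation class
            .set("spark.sql.catalog.example", "com.example.YourCatalogClass")
            // arrives in initialize() as option "host" (prefix stripped),
            // alongside an added "name" -> "example" entry
            .set("spark.sql.catalog.example.host", "catalog-host.example.com");
      }
    }
    ```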


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @henryr, @cloud-fan, @marmbrus, here's a first pass at adding a catalog mix-in to the v2 API. Please have a look and leave comments on what you'd like to change.
    
    One thing that I don't think we need right away is the `alterTable` operation. We could easily remove that and add it later. For CTAS and other operations, we do need `loadTable`, `createTable`, and `dropTable` soon.
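    
    To make the CTAS dependency concrete, a hedged sketch of the create-write-or-drop flow these operations enable (the write step is elided, and `dropTable` is assumed from the discussion rather than shown in this diff):
    ```
    import org.apache.spark.sql.catalyst.TableIdentifier;
    import org.apache.spark.sql.sources.v2.catalog.Table;
    import org.apache.spark.sql.sources.v2.catalog.TableCatalog;
    import org.apache.spark.sql.types.StructType;
    
    class CtasSketch {
      static Table ctas(TableCatalog catalog, TableIdentifier ident, StructType schema)
          throws Exception {
        Table table = catalog.createTable(ident, schema);
        try {
          // ... write the query result into the new table ...
          return table;
        } catch (RuntimeException e) {
          catalog.dropTable(ident);  // undo the create if the write fails
          throw e;
        }
      }
    }
    ```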


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r238046730
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java ---
    @@ -0,0 +1,229 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +/**
    + * A standard set of transformations that are passed to data sources during table creation.
    + *
    + * @see PartitionTransform
    + */
    +public class PartitionTransforms {
    +  private PartitionTransforms() {
    +  }
    +
    +  /**
    +   * Create a transform for a column with the given name.
    +   * <p>
    +   * This transform is used to pass named transforms that are not known to Spark.
    +   *
    +   * @param transform a name of the transform to apply to the column
    +   * @param colName a column name
    +   * @return an Apply transform for the column
    +   */
    +  public static PartitionTransform apply(String transform, String colName) {
    +    if ("identity".equals(transform)) {
    --- End diff --
    
    I think we should get this done now. Partition transforms are a generalization of Hive partitioning (which uses some columns directly) and bucketing (which is one specific transform). If we add transformation functions now, we will support both of those with a simple API instead of building in special cases for identity and bucket transforms.
    
    I also have a data source that allows users to configure partitioning using more transforms than just identity and bucketing, so I'd like to get this in so that DDL for those tables works.
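    
    Concretely, per the `apply` method shown above:
    ```
    import org.apache.spark.sql.catalog.v2.PartitionTransform;
    import org.apache.spark.sql.catalog.v2.PartitionTransforms;
    
    class ApplySketch {
      // "date" is not special-cased above, so it passes through as an Apply
      // transform for the source to interpret (e.g. date(ts) partitioning).
      static PartitionTransform byDay() {
        return PartitionTransforms.apply("date", "ts");
      }
    
      // "identity" is normalized to the built-in identity transform by the
      // branch shown in the diff.
      static PartitionTransform byRegion() {
        return PartitionTransforms.apply("identity", "region");
      }
    }
    ```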


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237979620
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala ---
    @@ -0,0 +1,118 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2
    +
    +import java.util
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.sql.SaveMode
    +import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, identity}
    +import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
    +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, WriteSupport}
    +import org.apache.spark.sql.sources.v2.reader.DataSourceReader
    +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
    + */
    +private[sql] class V1MetadataTable(
    +    v1Table: CatalogTable,
    +    v2Source: Option[DataSourceV2]) extends Table {
    +
    +  def readDelegate: ReadSupport = v2Source match {
    +    case r: ReadSupport => r
    +    case _ => throw new UnsupportedOperationException(s"Source does not support reads: $v2Source")
    +  }
    +
    +  def writeDelegate: WriteSupport = v2Source match {
    +    case w: WriteSupport => w
    +    case _ => throw new UnsupportedOperationException(s"Source does not support writes: $v2Source")
    +  }
    +
    +  lazy val options: Map[String, String] = {
    +    v1Table.storage.locationUri match {
    --- End diff --
    
    On a second read over this, I don't think we necessarily have to use the `Option` lambdas here; in fact they may be less legible, varying from developer to developer.
    
    But if one were to do so, it'd be something like this...
    
    ```
    v1Table.storage.properties ++ v1Table.storage.locationUri
      .map(uri => Map("path" -> CatalogUtils.URIToString(uri)))
      .getOrElse(Map.empty)
    ```
    



---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200171696
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link DataSourceCatalog#alterTable}.
    + */
    +public interface TableChange {
    +
    +  /**
    +   * Create a TableChange for adding a top-level column to a table.
    +   * <p>
    +   * Because "." may be interpreted as a field path separator or may be used in field names, it is
    +   * not allowed in names passed to this method. To add to nested types or to add fields with
    +   * names that contain ".", use {@link #addColumn(String, String, DataType)}.
    +   *
    +   * @param name the new top-level column name
    +   * @param dataType the new column's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String name, DataType dataType) {
    +    return new AddColumn(null, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for adding a nested column to a table.
    +   * <p>
    +   * The parent name is used to find the parent struct type where the nested field will be added.
    +   * If the parent name is null, the new column will be added to the root as a top-level column.
    +   * If parent identifies a struct, a new column is added to that struct. If it identifies a list,
    +   * the column is added to the list element struct, and if it identifies a map, the new column is
    +   * added to the map's value struct.
    +   * <p>
    +   * The given name is used to name the new column and names containing "." are not handled
    +   * differently.
    +   *
    +   * @param parent the new field's parent
    +   * @param name the new field name
    +   * @param dataType the new field's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String parent, String name, DataType dataType) {
    +    return new AddColumn(parent, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for renaming a field.
    +   * <p>
    +   * The name is used to find the field to rename. The new name will replace the name of the type.
    +   * For example, renameColumn("a.b.c", "x") should produce column a.b.x.
    +   *
    +   * @param name the current field name
    +   * @param newName the new name
    +   * @return a TableChange for the rename
    +   */
    +  static TableChange renameColumn(String name, String newName) {
    +    return new RenameColumn(name, newName);
    +  }
    +
    +  /**
    +   * Create a TableChange for updating the type of a field.
    +   * <p>
    +   * The name is used to find the field to update.
    +   *
    +   * @param name the field name
    +   * @param newDataType the new data type
    +   * @return a TableChange for the update
    +   */
    +  static TableChange updateColumn(String name, DataType newDataType) {
    +    return new UpdateColumn(name, newDataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for deleting a field from a table.
    +   *
    +   * @param name the name of the field to delete
    +   * @return a TableChange for the delete
    +   */
    +  static TableChange deleteColumn(String name) {
    +    return new DeleteColumn(name);
    +  }
    +
    +  final class AddColumn implements TableChange {
    --- End diff --
    
    Just noticed that these aren't public, but should be because they will be passed to implementations through `alterTable`.
    
    These should also implement `unapply` for Scala implementations.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93567/
    Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    > Can we support column range partition predicates please?
    
    This adds an "apply" transform for passing other functions directly through, which may help if you have additional transforms that aren't committed to Spark yet.
    
    As for range partitioning, can you be more specific about what you mean? What does that transform function look like? Part of the rationale for the existing proposal is that these are all widely used and understood. I want to make sure that as we expand the set of validated transforms, we aren't introducing confusion.
    
    Also, could you share the use case you intend for this? It would be great to hear about uses other than just Iceberg tables.
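
    For example, using the `PartitionTransforms.apply(String, String)` factory quoted later in this thread, a transform Spark doesn't know could be passed through by name; the "range" transform here is hypothetical:

        import org.apache.spark.sql.catalog.v2.{PartitionTransform, PartitionTransforms}

        // Spark passes the named transform through untouched; the source decides
        // whether "range" is supported and how to interpret it.
        val byRange: PartitionTransform = PartitionTransforms.apply("range", "event_ts")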


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237972182
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java ---
    @@ -0,0 +1,137 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog extends CatalogProvider {
    --- End diff --
    
    The intent is to use some interface to load all catalogs, whether they implement `TableCatalog`, `FunctionCatalog`, or both (or other catalog API parts). So you load a catalog, then check whether it is a `TableCatalog` when you want to use it for tables.
    
    Sounds like the name `CatalogProvider` is the confusing part. You're right that a provider usually implements a get method to provide something. I could change that to `CatalogImpl` or something. Would that work?
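
    In code, the intended usage is roughly the following sketch; "prod" is a hypothetical catalog name:

        import org.apache.spark.sql.catalog.v2.{Catalogs, TableCatalog}
        import org.apache.spark.sql.internal.SQLConf

        // Load the configured catalog by name, then narrow to the capability needed.
        val tableCatalog = Catalogs.load("prod", SQLConf.get) match {
          case t: TableCatalog => t
          case other => throw new UnsupportedOperationException(
            s"Catalog 'prod' does not support tables: ${other.getClass.getName}")
        }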


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94917/testReport)** for PR 21306 at commit [`fa0edeb`](https://github.com/apache/spark/commit/fa0edeb1570485cc7d6cd0f848caaaf20480f384).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class V1TableCatalog(sessionState: SessionState) extends TableCatalog `


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237985697
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java ---
    @@ -0,0 +1,137 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog extends CatalogProvider {
    --- End diff --
    
    I'm ok with that!


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93581/
    Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @cloud-fan, what needs to change to get this in? I'd like to start making more PRs based on these changes.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r199992580
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link DataSourceCatalog#alterTable}.
    + */
    +public interface TableChange {
    --- End diff --
    
    This is great!


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237973548
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala ---
    @@ -0,0 +1,118 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2
    +
    +import java.util
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.sql.SaveMode
    +import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, identity}
    +import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
    +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, WriteSupport}
    +import org.apache.spark.sql.sources.v2.reader.DataSourceReader
    +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
    + */
    +private[sql] class V1MetadataTable(
    +    v1Table: CatalogTable,
    +    v2Source: Option[DataSourceV2]) extends Table {
    +
    +  def readDelegate: ReadSupport = v2Source match {
    +    case r: ReadSupport => r
    +    case _ => throw new UnsupportedOperationException(s"Source does not support reads: $v2Source")
    +  }
    +
    +  def writeDelegate: WriteSupport = v2Source match {
    +    case w: WriteSupport => w
    +    case _ => throw new UnsupportedOperationException(s"Source does not support writes: $v2Source")
    +  }
    +
    +  lazy val options: Map[String, String] = {
    +    v1Table.storage.locationUri match {
    --- End diff --
    
    I use lazy for a couple reasons. First, to avoid building maps or other data values that are never used. Second, to avoid a required ordering for fields. If fields depend on one another, then they have to be reordered when those dependencies change. Lazy values never require reordering.
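
    A tiny standalone example of the ordering point (not code from this PR):

        class Example(path: String) {
          // With strict vals, `options` would silently capture null because `uri`
          // is initialized later in declaration order. Lazy vals are computed on
          // first access, so declaration order doesn't matter.
          lazy val options: Map[String, String] = Map("path" -> uri)
          lazy val uri: String = s"file://$path"
        }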


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93567/testReport)** for PR 21306 at commit [`1f92a79`](https://github.com/apache/spark/commit/1f92a79f516fa4755470f70360842ded11fe9ed3).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3159/
    Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94906/testReport)** for PR 21306 at commit [`622180a`](https://github.com/apache/spark/commit/622180a50e05b4d968380824f5dbbe5f89e42422).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class Transforms `
      * `  public static final class Identity extends SingleColumnTransform `
      * `  public static final class Bucket extends SingleColumnTransform `
      * `  public static final class Year extends SingleColumnTransform `
      * `  public static final class Month extends SingleColumnTransform `
      * `  public static final class Date extends SingleColumnTransform `
      * `  public static final class DateAndHour extends SingleColumnTransform `
      * `  public static final class Apply extends SingleColumnTransform `


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2294/
    Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Retest this please.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)** for PR 21306 at commit [`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237971241
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    + */
    +public interface Table {
    +  /**
    +   * Return the table properties.
    +   * @return this table's map of string properties
    +   */
    +  Map<String, String> properties();
    --- End diff --
    
    Yeah, that works.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200710273
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java ---
    @@ -0,0 +1,123 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog {
    +  /**
    +   * Load table metadata by {@link TableIdentifier identifier} from the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @return the table's metadata
    +   * @throws NoSuchTableException If the table doesn't exist.
    +   */
    +  Table loadTable(TableIdentifier ident) throws NoSuchTableException;
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap());
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema,
    +                            Map<String, String> properties) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), properties);
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param partitions a list of expressions to use for partitioning data in the table
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  Table createTable(TableIdentifier ident,
    +                    StructType schema,
    +                    List<Expression> partitions,
    --- End diff --
    
    Another benefit: this would allow us to translate `BUCKETED BY` clauses into something we can actually pass to data sources.
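
    For instance, bucketing by `id` into 16 buckets could become a transform in the partition list. A sketch, noting that only the `bucket` factory's name appears in this thread, so its signature here is a guess:

        import org.apache.spark.sql.catalog.v2.PartitionTransforms

        // Hypothetical translation of a SQL bucketing clause into a transform that
        // can be passed to createTable alongside other partition transforms.
        val bucketing = PartitionTransforms.bucket(16, "id")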


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237971092
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java ---
    @@ -0,0 +1,229 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +/**
    + * A standard set of transformations that are passed to data sources during table creation.
    + *
    + * @see PartitionTransform
    + */
    +public class PartitionTransforms {
    +  private PartitionTransforms() {
    +  }
    +
    +  /**
    +   * Create a transform for a column with the given name.
    +   * <p>
    +   * This transform is used to pass named transforms that are not known to Spark.
    +   *
    +   * @param transform a name of the transform to apply to the column
    +   * @param colName a column name
    +   * @return an Apply transform for the column
    +   */
    +  public static PartitionTransform apply(String transform, String colName) {
    +    if ("identity".equals(transform)) {
    --- End diff --
    
    What I wanted to discuss on Wednesday was how to pass these transforms. @rxin and I had some discussions about it on the dev list, but we didn't come up with a decision. I think the solution will probably be to add a way to pass generic function application with a list of arguments that are either columns or constant literals.
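
    Sketched as data, that idea might look like the following; all names are hypothetical, not a committed design:

        // A transform as generic function application: a function name applied to
        // arguments that are either column references or constant literals.
        sealed trait TransformArg
        case class ColumnArg(name: String) extends TransformArg
        case class LiteralArg(value: Any) extends TransformArg
        case class ApplyTransform(name: String, args: Seq[TransformArg])

        // For example, truncate(col, 10) would be represented as:
        val truncated = ApplyTransform("truncate", Seq(ColumnArg("col"), LiteralArg(10)))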


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93567/testReport)** for PR 21306 at commit [`1f92a79`](https://github.com/apache/spark/commit/1f92a79f516fa4755470f70360842ded11fe9ed3).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93605/testReport)** for PR 21306 at commit [`1ca9115`](https://github.com/apache/spark/commit/1ca91158f2d470ccb64930682c547d3c56bec603).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    where are we on this?


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200418152
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java ---
    @@ -0,0 +1,123 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog {
    +  /**
    +   * Load table metadata by {@link TableIdentifier identifier} from the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @return the table's metadata
    +   * @throws NoSuchTableException If the table doesn't exist.
    +   */
    +  Table loadTable(TableIdentifier ident) throws NoSuchTableException;
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap());
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema,
    +                            Map<String, String> properties) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), properties);
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param partitions a list of expressions to use for partitioning data in the table
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  Table createTable(TableIdentifier ident,
    +                    StructType schema,
    +                    List<Expression> partitions,
    --- End diff --
    
    I recommend reading the proposal SPIP's "Proposed Changes" section, which goes into more detail than this comment can. In short, you're thinking of partitions as columns like Hive tables, but that is a narrow definition that prevents the underlying format from optimizing queries.
    
    Partitions of a table are derived from the column data through some transform. For example, partitioning by day uses a day transform from a timestamp column: `day(ts)`. Hive doesn't keep track of that transform and requires queries to handle it by inserting both `ts` and `day` columns. This leads to a few problems, including:
    * Hive has no ability to transform `ts > X` into the partition predicate `day >= day(X)`. Unless a query accounts for the table's physical storage by adding partition predicates by hand, it will result in a full table scan.
    * Users can insert any data they choose into the `day` partition and it is up to them to do it correctly.
    
    Also, consider bucketing. Bucketing is also a transform that is effectively a partitioning of the table's files: `bucket=hash(col) % N`. The reason why bucketing is handled as a special case in Hive is that using it _requires_ knowing the transform and relationship between the bucket number and its column. If we think of partitioning as grouping data by common values of a set of transforms, then buckets are just another partition that we can use for purposes like bucketed joins or limiting scans when looking for specific values.
    
    If the transform is _identity_ -- just copy the value into partition data -- then you have the same functionality that Hive provides. But by building the transformations into the partitioning layer, we can do more to optimize queries, while hiding the physical layout of a table.
    
    Using Expression allows Spark to pass `day(ts)` to the data source. It is up to the source which expressions are supported. The current FS tables would reject any expression that isn't just a column reference. Iceberg supports `identity`, `year`, `month`, `day`, `hour`, `truncate`, and `bucket` transforms.
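
    Putting that together, a create-table call with transform-based partitioning might look like the sketch below. It assumes the later `PartitionTransforms`-based variant of `createTable`, a `TableCatalog` in scope as `catalog`, and guesses the `day` and `bucket` factory signatures (only `bucket` and `identity` are named elsewhere in this thread):

        import java.util.{Arrays, Collections}
        import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, day}
        import org.apache.spark.sql.catalyst.TableIdentifier
        import org.apache.spark.sql.types.{LongType, StringType, StructType, TimestampType}

        val schema = new StructType()
          .add("id", LongType)
          .add("ts", TimestampType)
          .add("payload", StringType)

        // Partition by day(ts) and hash-bucket by id. Because the catalog tracks
        // both transforms, the source can turn a predicate like ts > X into the
        // partition predicate day >= day(X) instead of scanning the full table.
        catalog.createTable(
          TableIdentifier("events"),
          schema,
          Arrays.asList(day("ts"), bucket(16, "id")),
          Collections.emptyMap[String, String]())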


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1322/
    Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r230972839
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    + */
    +public interface Table {
    --- End diff --
    
    The nomenclature here appears to conflict with @cloud-fan's refactor in https://github.com/apache/spark/pull/22547/files#diff-45399ef5eed5c873d5f12bf0f1671b8fR40. Maybe we can call this `TableMetadata` or `TableDescription`? Or perhaps we rename the other construct?


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r205585462
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/CatalogProvider.java ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.internal.SQLConf;
    +
    +/**
    + * A marker interface to provide a catalog implementation for Spark.
    + * <p>
    + * Implementations can provide catalog functions by implementing additional interfaces, like
    + * {@link TableCatalog} to expose table operations.
    + * <p>
    + * Catalog implementations must implement this marker interface to be loaded by
    + * {@link Catalogs#load(String, SQLConf)}. The loader will instantiate catalog classes using the
    + * required public no-arg constructor. After creating an instance, it will be configured by calling
    + * {@link #initialize(CaseInsensitiveStringMap)}.
    + * <p>
    + * Catalog implementations are registered to a name by adding a configuration option to Spark:
    + * {@code spark.sql.catalog.catalog-name=com.example.YourCatalogClass}. All configuration properties
    + * in the Spark configuration that share the catalog name prefix,
    + * {@code spark.sql.catalog.catalog-name.(key)=(value)}, will be passed at initialization in the
    + * case-insensitive string map of options, with the prefix removed. An additional property,
    + * {@code name}, is also added to the options and will contain the catalog's name; in this case,
    + * "catalog-name".
    + */
    +public interface CatalogProvider {
    +  /**
    +   * Called to initialize configuration.
    +   * <p>
    +   * This method is called once, just after the provider is instantiated.
    +   *
    +   * @param options a case-insensitive string map of configuration
    +   */
    +  void initialize(CaseInsensitiveStringMap options);
    --- End diff --
    
    Got it. Thanks!


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237721926
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java ---
    @@ -0,0 +1,137 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog extends CatalogProvider {
    --- End diff --
    
    The semantics here aren't clear at least to me. Typically a `<something>Provider` is a class that can instantiate a `something`. Here it appears to provide itself? The abstraction I would imagine would be to have either:
    
    * `CatalogProvider` has a method called `get(options, sqlConf)` which returns a `TableCatalog` configured with the given options and `SQLConf`, or
    * Remove `CatalogProvider` entirely and put `initialize` in this interface. Every `TableCatalog` instance must be initialized before calling any other methods like `loadTable`.
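
    For concreteness, a sketch of the first option, with hypothetical names and a plain string map standing in for the options type:

        import org.apache.spark.sql.catalog.v2.TableCatalog
        import org.apache.spark.sql.internal.SQLConf

        // Option 1: the provider constructs a configured catalog rather than being
        // one itself. This illustrates the suggestion; it is not code from the PR.
        trait ConfiguredCatalogProvider {
          def get(options: Map[String, String], conf: SQLConf): TableCatalog
        }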


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200824504
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java ---
    @@ -0,0 +1,123 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog {
    +  /**
    +   * Load table metadata by {@link TableIdentifier identifier} from the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @return the table's metadata
    +   * @throws NoSuchTableException If the table doesn't exist.
    +   */
    +  Table loadTable(TableIdentifier ident) throws NoSuchTableException;
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap());
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema,
    +                            Map<String, String> properties) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), properties);
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param partitions a list of expressions to use for partitioning data in the table
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  Table createTable(TableIdentifier ident,
    +                    StructType schema,
    +                    List<Expression> partitions,
    --- End diff --
    
    I wouldn't say this way of passing partitioning is a new feature. It's just a generalization of the existing partitioning that allows us to pass any type of partition, whether it is bucketing or column-based.
    
    As for open discussion, this was proposed in the SPIP that was fairly widely read and commented on. That SPIP was posted to the dev list a few times, too. I do appreciate you wanting to make sure there's been a chance for the community to discuss it, but there has been plenty of opportunity to comment. At this point, I think it's reasonable to move forward with the implementation.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200639540
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java ---
    @@ -0,0 +1,123 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog {
    +  /**
    +   * Load table metadata by {@link TableIdentifier identifier} from the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @return the table's metadata
    +   * @throws NoSuchTableException If the table doesn't exist.
    +   */
    +  Table loadTable(TableIdentifier ident) throws NoSuchTableException;
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap());
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema,
    +                            Map<String, String> properties) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), properties);
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param partitions a list of expressions to use for partitioning data in the table
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  Table createTable(TableIdentifier ident,
    +                    StructType schema,
    +                    List<Expression> partitions,
    --- End diff --
    
    I read the SPIP but I still can't figure it out. How can Spark pass these "partition transforms" to data sources? The current end-user API only allows users to specify partition columns.
    
    And why do "partition transforms" belong in a table definition? I think it's reasonable to say that a table is partitioned by column `timestamp` and supports pushing partition predicates like `year(timestamp) > 2000`.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #92621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92621/testReport)** for PR 21306 at commit [`42ed4a4`](https://github.com/apache/spark/commit/42ed4a4a138e5c5f681755d871fd9d9030a4619a).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93556/
    Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2342/
    Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94982/testReport)** for PR 21306 at commit [`6b45a11`](https://github.com/apache/spark/commit/6b45a119df8e6382fa2503f854b4a85aed3e3785).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94773/testReport)** for PR 21306 at commit [`21acda2`](https://github.com/apache/spark/commit/21acda237744d4299e5bb449dce1ec8a1735f6de).


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r211057651
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala ---
    @@ -0,0 +1,118 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2
    +
    +import java.util
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.sql.SaveMode
    +import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, identity}
    +import org.apache.spark.sql.catalyst.catalog.CatalogTable
    +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, WriteSupport}
    +import org.apache.spark.sql.sources.v2.reader.DataSourceReader
    +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
    + */
    +private[sql] class V1MetadataTable(
    --- End diff --
    
    @cloud-fan, I updated this PR that adds the `TableCatalog` API to include an implementation that uses the existing `SessionCatalog`. This `Table` class demonstrates how `Table` would implement `ReadSupport` and `WriteSupport`.
    
    The catalog returns these tables, which have `ReadSupport` and `WriteSupport` mixed in depending on whether the underlying `DataSourceV2` also supports them. In your updated API, it would use the `ReadSupportProvider` instead of the `DataSourceV2` directly, but the difference isn't very large.
    
    The follow-up PR for CTAS and RTAS, #21877, demonstrates how this would be used in the new logical plans.
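    
    For reference, a rough sketch of that pattern (hypothetical helper and constructor arguments, not the PR's exact code), assuming the imports already at the top of `V1MetadataTable.scala`:
    
    ```scala
    // Hypothetical sketch: mix ReadSupport into the returned Table only when the
    // underlying DataSourceV2 supports reads, delegating reader creation to it.
    private def toCatalogTable(meta: CatalogTable, source: DataSourceV2): Table = {
      source match {
        case readable: ReadSupport =>
          new V1MetadataTable(meta, source) with ReadSupport {
            override def createReader(options: DataSourceOptions): DataSourceReader =
              readable.createReader(options)
          }
        case _ =>
          new V1MetadataTable(meta, source) // metadata only, no read support
      }
    }
    ```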


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @stczwd my understanding here is that a table isn't a streaming table or a batch table, but rather that a table points to data that can be scanned either as a stream or as a batch, and that the table is responsible for returning either streaming scanners or batch scanners when the logical plan calls for it. I believe this is the case because of https://github.com/apache/spark/pull/23086/files#diff-d111d7e2179b55465840c9a81ea004f2R65 and its eventual analogous streaming variant. In the new abstractions we propose here and in [our proposal](https://docs.google.com/document/d/1uUmKCpWLdh9vHxP7AWJ9EgbwB_U6T3EJYNjhISGmiQg/edit), the catalog gets a reference to a `Table` object that can build `Scan`s over that table.
    
    In other words, the crucial overarching theme in all of the following matters is that a Table isn't inherently a streaming or a batch table, but rather a Table supports returning streaming and/or batch scans. The table returned by the catalog is a pointer to the data, and the Scan defines how one reads that data.
    
    > A source needs to be defined for a stream table.
    
    The catalog returns an instance of `Table` that can create `Scan`s that support the `toStream` method.
    
    > A stream table requires a special flag to indicate that it is a stream table.
    
    When one gets back a `Scan`, calling its `toStream` method will indicate that the table's data is about to be scanned in a streaming manner.
    
    > Users and programs need to be aware of whether a table is a stream table.
    
    This would probably be done on the SQL side, but I'm less certain about this one. Can you elaborate?
    
    > What would we do if the user wants to change a stream table to a batch table, or convert a batch table to a stream table?
    
    The new abstraction handles this at the `Scan` level instead of the `Table` level. `Table`s are themselves not streamed or batched, but rather they construct scans that can read them in either stream or batch; the Scan implements `toBatch` and/or `toStream` to support the appropriate read method.
    
    > What does the stream table metadata you define look like? What is the difference between batch table metadata and stream table metadata?
    
    I don't think this is as clear given what has been proposed so far; I'll let others offer comment here.
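    
    Putting the above together, a minimal sketch of the shape (hypothetical, simplified signatures; the linked proposal has more structure):
    
    ```scala
    trait Batch          // describes a one-shot, batch read of the data
    trait StreamingRead  // describes a streaming read of the same data
    
    // A Scan decides how the data is read; a Table only builds Scans.
    trait Scan {
      def toBatch: Batch          // called when the plan scans the table in batch
      def toStream: StreamingRead // called when the plan scans the table as a stream
    }
    
    trait ScanBuilder {
      def build(): Scan
    }
    
    // Neither a "batch table" nor a "stream table": just a pointer to data.
    trait Table {
      def newScanBuilder(): ScanBuilder
    }
    ```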
    
    Others should feel free to offer more commentary or correct anything from above.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237723294
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    + */
    +public interface Table {
    --- End diff --
    
    Looking at this patch in comparison to the other again (updated in https://github.com/apache/spark/pull/23086), it looks like this work should be rebased on top of the batch read refactor PR so that we don't end up with two `Table` classes that do the same thing - is that the right assessment?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by stczwd <gi...@git.apache.org>.
Github user stczwd commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    > @stczwd, thanks for taking a look at this. What are the differences between batch and stream DDL that you think will come up?
    
    1. A source needs to be defined for a stream table.
    2. A stream table requires a special flag to indicate that it is a stream table.
    3. Users and programs need to be aware of whether a table is a stream table.
    4. What would we do if the user wants to change a stream table to a batch table, or convert a batch table to a stream table?
    5. What does the stream table metadata you define look like? What is the difference between batch table metadata and stream table metadata?
    
    I defined a stream table based on DataSource V1 (see [Support SQLStreaming in Spark](https://github.com/apache/spark/pull/22575)), but found that the problems above cannot be completely solved with that catalog API.
    How would you solve these in the new catalog?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94773/testReport)** for PR 21306 at commit [`21acda2`](https://github.com/apache/spark/commit/21acda237744d4299e5bb449dce1ec8a1735f6de).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class CaseInsensitiveStringMap implements Map<String, String> `
      * `public class Catalogs `
      * `  final class AddColumn implements TableChange `
      * `  final class RenameColumn implements TableChange `
      * `  final class UpdateColumn implements TableChange `
      * `  final class DeleteColumn implements TableChange `


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2193/
    Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237722772
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java ---
    @@ -0,0 +1,229 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +/**
    + * A standard set of transformations that are passed to data sources during table creation.
    + *
    + * @see PartitionTransform
    + */
    +public class PartitionTransforms {
    +  private PartitionTransforms() {
    +  }
    +
    +  /**
    +   * Create a transform for a column with the given name.
    +   * <p>
    +   * This transform is used to pass named transforms that are not known to Spark.
    +   *
    +   * @param transform a name of the transform to apply to the column
    +   * @param colName a column name
    +   * @return an Apply transform for the column
    +   */
    +  public static PartitionTransform apply(String transform, String colName) {
    +    if ("identity".equals(transform)) {
    --- End diff --
    
    Is it necessary to support all `PartitionTransform` types in a first pass? I would imagine we'd have to for the v2-to-v1 catalog adapter. If it weren't for that, I would suggest supporting only a simple set of `PartitionTransform`s, such as `identity` alone, to keep this PR focused on the catalog API rather than the partitions API.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r214977998
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
    @@ -0,0 +1,182 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link TableCatalog#alterTable}. For example,
    + * <pre>
    + *   import TableChange._
    + *   val catalog = source.asInstanceOf[TableSupport].catalog()
    + *   catalog.alterTable(ident,
    + *       addColumn("x", IntegerType),
    + *       renameColumn("a", "b"),
    + *       deleteColumn("c")
    + *     )
    + * </pre>
    + */
    +public interface TableChange {
    --- End diff --
    
    No, this API covers table configuration, not data modification. If you're interested in dropping partitions, you should look at the [DeleteSupport API proposal](https://github.com/apache/spark/pull/21308).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94906/
    Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93605/testReport)** for PR 21306 at commit [`1ca9115`](https://github.com/apache/spark/commit/1ca91158f2d470ccb64930682c547d3c56bec603).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @stczwd that's my understanding, yeah. Others should feel free to correct me if that's wrong.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by tigerquoll <gi...@git.apache.org>.
Github user tigerquoll commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Can we support column range partition predicates please?


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200170491
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/CatalogSupport.java ---
    @@ -0,0 +1,36 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2;
    +
    +import org.apache.spark.sql.sources.v2.catalog.DataSourceCatalog;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2} catalog support. Data sources can implement this
    + * interface to provide the ability to load, create, alter, and drop tables.
    + * <p>
    + * Data sources must implement this interface to support logical operations that combine writing
    + * data with catalog tasks, like create-table-as-select.
    + */
    +public interface CatalogSupport {
    --- End diff --
    
    Works for me.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200236259
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java ---
    @@ -0,0 +1,123 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog {
    +  /**
    +   * Load table metadata by {@link TableIdentifier identifier} from the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @return the table's metadata
    +   * @throws NoSuchTableException If the table doesn't exist.
    +   */
    +  Table loadTable(TableIdentifier ident) throws NoSuchTableException;
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap());
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema,
    +                            Map<String, String> properties) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), properties);
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param partitions a list of expressions to use for partitioning data in the table
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  Table createTable(TableIdentifier ident,
    +                    StructType schema,
    +                    List<Expression> partitions,
    --- End diff --
    
    Why is this a list of expressions? Current Spark only supports PARTITION BY columns.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    #21888 demonstrates how to add a `TableIdentifier` with a catalog element, `CatalogTableIdentifier`, and how to safely migrate from the old identifier to the new one with a catalog.
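    
    For illustration, the shape of such an identifier might look like this (a hypothetical sketch, not #21888's exact code):
    
    ```scala
    // An identifier that optionally carries the catalog that owns the table.
    case class CatalogTableIdentifier(
        table: String,
        database: Option[String] = None,
        catalog: Option[String] = None)
    ```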


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #90531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90531/testReport)** for PR 21306 at commit [`34f91c5`](https://github.com/apache/spark/commit/34f91c58ca23b81a7ee6f9270f30001e8885733e).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @stczwd, thanks for taking a look at this. What are the differences between batch and stream DDL that you think will come up?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)** for PR 21306 at commit [`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e).
     * This patch **fails Java style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237978582
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java ---
    @@ -0,0 +1,137 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog extends CatalogProvider {
    --- End diff --
    
    Perhaps something as simple as `Catalog` as the top level then, which is sub-interfaced by `TableCatalog`, `FunctionCatalog`, and other "kinds" of catalogs. They can all share the same `initialize` method which is defined by `Catalog`. That sounds like the simplest idea, perhaps?
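    
    A minimal sketch of that hierarchy (hypothetical method lists, reusing the types already in this PR):
    
    ```scala
    trait Catalog {
      // Shared by all catalog kinds; called once, just after instantiation.
      def initialize(options: CaseInsensitiveStringMap): Unit
    }
    
    trait TableCatalog extends Catalog {
      def loadTable(ident: TableIdentifier): Table
      // createTable, alterTable, dropTable, ...
    }
    
    trait FunctionCatalog extends Catalog {
      // function lookup and registration would go here
    }
    ```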


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94773/
    Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r199992200
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link DataSourceCatalog}.
    + */
    +public interface Table {
    --- End diff --
    
    This is something we should decide now. IMO `schema` and `properties` are must-haves, but the others may not be. For example, if a data source uses a path to look up a table, there is no database/table name for it. And we don't have a story for dealing with partitions yet.
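    
    In other words, something like this minimal surface (a sketch of the suggestion, not the PR's code):
    
    ```scala
    trait Table {
      def schema: StructType                        // must-have
      def properties: java.util.Map[String, String] // must-have
      // no name/database: path-based sources have no table identifier, and
      // partition metadata is deferred until there is a story for it
    }
    ```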


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94825/
    Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94825/testReport)** for PR 21306 at commit [`622180a`](https://github.com/apache/spark/commit/622180a50e05b4d968380824f5dbbe5f89e42422).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class Transforms `
      * `  public static final class Identity extends SingleColumnTransform `
      * `  public static final class Bucket extends SingleColumnTransform `
      * `  public static final class Year extends SingleColumnTransform `
      * `  public static final class Month extends SingleColumnTransform `
      * `  public static final class Date extends SingleColumnTransform `
      * `  public static final class DateAndHour extends SingleColumnTransform `
      * `  public static final class Apply extends SingleColumnTransform `


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200171424
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link DataSourceCatalog}.
    + */
    +public interface Table {
    --- End diff --
    
    I updated the last comment because I thought this was referring to `CatalogSupport` at first. Sorry about the confusion.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93556/testReport)** for PR 21306 at commit [`46100f3`](https://github.com/apache/spark/commit/46100f3fc7c51b86bbdc03fcc7d7b3388748f698).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3158/
    Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---





[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by stczwd <gi...@git.apache.org>.
Github user stczwd commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @rdblue Have you considered a stream table API? There may be some differences between batch table DDL and stream table DDL.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200178463
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link DataSourceCatalog#alterTable}.
    + */
    +public interface TableChange {
    +
    +  /**
    +   * Create a TableChange for adding a top-level column to a table.
    +   * <p>
    +   * Because "." may be interpreted as a field path separator or may be used in field names, it is
    +   * not allowed in names passed to this method. To add to nested types or to add fields with
    +   * names that contain ".", use {@link #addColumn(String, String, DataType)}.
    +   *
    +   * @param name the new top-level column name
    +   * @param dataType the new column's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String name, DataType dataType) {
    +    return new AddColumn(null, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for adding a nested column to a table.
    +   * <p>
    +   * The parent name is used to find the parent struct type where the nested field will be added.
    +   * If the parent name is null, the new column will be added to the root as a top-level column.
    +   * If parent identifies a struct, a new column is added to that struct. If it identifies a list,
    +   * the column is added to the list element struct, and if it identifies a map, the new column is
    +   * added to the map's value struct.
    +   * <p>
    +   * The given name is used to name the new column and names containing "." are not handled
    +   * differently.
    +   *
    +   * @param parent the new field's parent
    +   * @param name the new field name
    +   * @param dataType the new field's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String parent, String name, DataType dataType) {
    +    return new AddColumn(parent, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for renaming a field.
    +   * <p>
    +   * The name is used to find the field to rename. The new name will replace the name of the type.
    +   * For example, renameColumn("a.b.c", "x") should produce column a.b.x.
    --- End diff --
    
    I added an example to the Javadocs:
    
    ```scala
    import TableChange._
    val catalog = source.asInstanceOf[TableSupport].catalog()
    catalog.alterTable(ident,
        addColumn("x", IntegerType),
        renameColumn("a", "b"),
        deleteColumn("c")
      )
    ```


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by stczwd <gi...@git.apache.org>.
Github user stczwd commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @mccheah you mean the tables users create do not distinguish between stream and batch, and the distinction only applies when they are actually read?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @felixcheung, we're waiting on more reviews and a community decision about how to pass partition transforms.
    
    For passing transforms, I think the most reasonable compromise is to go with generic function application: each transform would be passed as a function/transform name with one or more arguments, where each argument is either a column reference (by name) or a literal value. That's a fairly small public API addition, but it allows a lot of different partitioning schemes to be expressed, including the one above for Kudu.
    
    We already have all of this implemented based on the current PR, but I can update this in the next week or so.
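    
    For illustration, a sketch of that compromise (hypothetical names): a transform is just a named function applied to column references and literals.
    
    ```scala
    sealed trait TransformArg
    case class ColumnRef(name: String) extends TransformArg // column, by name
    case class LiteralArg(value: Any) extends TransformArg  // literal value
    
    case class Transform(name: String, args: Seq[TransformArg])
    
    // e.g. a table partitioned by bucket(16, id) and days(ts):
    val partitioning = Seq(
      Transform("bucket", Seq(LiteralArg(16), ColumnRef("id"))),
      Transform("days", Seq(ColumnRef("ts"))))
    ```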


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r214978286
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
    @@ -0,0 +1,182 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link TableCatalog#alterTable}. For example,
    + * <pre>
    + *   import TableChange._
    + *   val catalog = source.asInstanceOf[TableSupport].catalog()
    + *   catalog.alterTable(ident,
    + *       addColumn("x", IntegerType),
    + *       renameColumn("a", "b"),
    + *       deleteColumn("c")
    + *     )
    + * </pre>
    + */
    +public interface TableChange {
    --- End diff --
    
    Yeah, we can add that. What do you suggest changing?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @marmbrus, @cloud-fan, @gatorsmile, I've updated this PR to use reflection to instantiate catalogs. This allows users to register named catalogs (reusing implementations across names) and to configure those catalogs with Spark configuration properties.
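    
    For example, following the property naming scheme described in `CatalogProvider` (the class name and URI below are hypothetical):
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    val spark = SparkSession.builder()
      // registers a catalog named "prod" backed by the given implementation
      .config("spark.sql.catalog.prod", "com.example.ProdCatalog")
      // passed to the catalog's initialize() as option "uri" (prefix removed)
      .config("spark.sql.catalog.prod.uri", "thrift://metastore:9083")
      .getOrCreate()
    ```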
    
    FYI @bersprockets, @felixcheung, @jzhuge


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r205571827
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/CatalogProvider.java ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.internal.SQLConf;
    +
    +/**
    + * A marker interface to provide a catalog implementation for Spark.
    + * <p>
    + * Implementations can provide catalog functions by implementing additional interfaces, like
    + * {@link TableCatalog} to expose table operations.
    + * <p>
    + * Catalog implementations must implement this marker interface to be loaded by
    + * {@link Catalogs#load(String, SQLConf)}. The loader will instantiate catalog classes using the
    + * required public no-arg constructor. After creating an instance, it will be configured by calling
    + * {@link #initialize(CaseInsensitiveStringMap)}.
    + * <p>
    + * Catalog implementations are registered to a name by adding a configuration option to Spark:
    + * {@code spark.sql.catalog.catalog-name=com.example.YourCatalogClass}. All configuration properties
    + * in the Spark configuration that share the catalog name prefix,
    + * {@code spark.sql.catalog.catalog-name.(key)=(value)} will be passed in the case insensitive
    + * string map of options in initialization with the prefix removed. An additional property,
    + * {@code name}, is also added to the options and will contain the catalog's name; in this case,
    + * "catalog-name".
    + */
    +public interface CatalogProvider {
    +  /**
    +   * Called to initialize configuration.
    +   * <p>
    +   * This method is called once, just after the provider is instantiated.
    +   *
    +   * @param options a case-insensitive string map of configuration
    +   */
    +  void initialize(CaseInsensitiveStringMap options);
    --- End diff --
    
    That's a Scala map and the v2 APIs are intended to be used with both Java and Scala. My intent is to reuse this map in place of DataSourceOptions, so at least we will reduce some duplication.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r238036776
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java ---
    @@ -0,0 +1,229 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +/**
    + * A standard set of transformations that are passed to data sources during table creation.
    + *
    + * @see PartitionTransform
    + */
    +public class PartitionTransforms {
    +  private PartitionTransforms() {
    +  }
    +
    +  /**
    +   * Create a transform for a column with the given name.
    +   * <p>
    +   * This transform is used to pass named transforms that are not known to Spark.
    +   *
    +   * @param transform a name of the transform to apply to the column
    +   * @param colName a column name
    +   * @return an Apply transform for the column
    +   */
    +  public static PartitionTransform apply(String transform, String colName) {
    +    if ("identity".equals(transform)) {
    --- End diff --
    
    Is it possible to defer partition support, or is it fundamentally important to get right now because we will be building on it in, e.g., the very next evolution of this API and its uses? I'm thinking about how to minimize the amount of API we propose per change, especially when the choices aren't obvious.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237723354
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    + */
    +public interface Table {
    +  /**
    +   * Return the table properties.
    +   * @return this table's map of string properties
    +   */
    +  Map<String, String> properties();
    --- End diff --
    
    Should we `default` this to an empty Map? I don't think all tables will support custom properties.
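    Concretely, the defaulted variant would be a one-line change; a minimal sketch of the idea, not the committed interface:
    
    ```java
    import java.util.Collections;
    import java.util.Map;
    
    public interface Table {
      // Sketch: tables without custom properties inherit an empty map
      // instead of having to implement properties() themselves.
      default Map<String, String> properties() {
        return Collections.emptyMap();
      }
    }
    ```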


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r231706583
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    + */
    +public interface Table {
    --- End diff --
    
    I think the two `Table` classes are trying to be the same thing. This is one of the reasons why I brought it up in the sync. @cloud-fan's current PR isn't yet based on this work, so it doesn't get the abstraction right.
    
    What you linked to uses `Table` to expose `newScanConfigBuilder`, basically requiring that all tables are readable. Instead, the implementation classes in #22547 should be interfaces that extend this `Table` to make it readable.
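    Roughly, the layering described here could look like the sketch below, where `SupportsRead` and `ScanConfigBuilder` are hypothetical names standing in for whatever #22547 settles on:
    
    ```java
    // Sketch only: Table carries metadata; a separate mix-in opts a table
    // into reads, so not every Table is forced to be readable.
    interface SupportsRead extends Table {
      ScanConfigBuilder newScanConfigBuilder(CaseInsensitiveStringMap options);
    }
    ```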


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200173138
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link DataSourceCatalog}.
    + */
    +public interface Table {
    --- End diff --
    
    I'll just remove `name` and `database`. We can add them later when we figure out how we want to handle it. We need partitioning right away, though.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for table cat...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @cloud-fan, @gatorsmile, I don't think this should be merged yet.
    
    I've been implementing CTAS and RTAS based on this commit and I don't think it makes sense to get a `TableCatalog` instance from the data source. The data source should be determined by the catalog, not the other way around.
    
    Otherwise, we could have a case where a `test` catalog uses the `parquet` source, but that `parquet` source would return a `prod` catalog for its `TableCatalog` because `prod` is the default. If catalogs can reuse data sources, then the catalog should be determined first.
    
    FYI @bersprockets, @felixcheung, @henryr 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94825/testReport)** for PR 21306 at commit [`622180a`](https://github.com/apache/spark/commit/622180a50e05b4d968380824f5dbbe5f89e42422).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237722352
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala ---
    @@ -0,0 +1,118 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2
    +
    +import java.util
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.sql.SaveMode
    +import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, identity}
    +import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
    +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, WriteSupport}
    +import org.apache.spark.sql.sources.v2.reader.DataSourceReader
    +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
    + */
    +private[sql] class V1MetadataTable(
    +    v1Table: CatalogTable,
    +    v2Source: Option[DataSourceV2]) extends Table {
    +
    +  def readDelegate: ReadSupport = v2Source match {
    +    case r: ReadSupport => r
    +    case _ => throw new UnsupportedOperationException(s"Source does not support reads: $v2Source")
    +  }
    +
    +  def writeDelegate: WriteSupport = v2Source match {
    +    case w: WriteSupport => w
    +    case _ => throw new UnsupportedOperationException(s"Source does not support writes: $v2Source")
    +  }
    +
    +  lazy val options: Map[String, String] = {
    +    v1Table.storage.locationUri match {
    --- End diff --
    
    Also, is there any particular reason that this and the following variables have to be `lazy`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94918/testReport)** for PR 21306 at commit [`dca4bf8`](https://github.com/apache/spark/commit/dca4bf8176eaa92de295de54488c3398256e0f7a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2289/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94917/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2227/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92623/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237972742
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala ---
    @@ -0,0 +1,118 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2
    +
    +import java.util
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.sql.SaveMode
    +import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, identity}
    +import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
    +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, WriteSupport}
    +import org.apache.spark.sql.sources.v2.reader.DataSourceReader
    +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
    + */
    +private[sql] class V1MetadataTable(
    +    v1Table: CatalogTable,
    +    v2Source: Option[DataSourceV2]) extends Table {
    +
    +  def readDelegate: ReadSupport = v2Source match {
    +    case r: ReadSupport => r
    +    case _ => throw new UnsupportedOperationException(s"Source does not support reads: $v2Source")
    +  }
    +
    +  def writeDelegate: WriteSupport = v2Source match {
    +    case w: WriteSupport => w
    +    case _ => throw new UnsupportedOperationException(s"Source does not support writes: $v2Source")
    +  }
    +
    +  lazy val options: Map[String, String] = {
    +    v1Table.storage.locationUri match {
    --- End diff --
    
    How would the `getOrElse` pattern work here? If the URI is undefined, what tuple should be added to the table properties?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94982/testReport)** for PR 21306 at commit [`6b45a11`](https://github.com/apache/spark/commit/6b45a119df8e6382fa2503f854b4a85aed3e3785).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  abstract static class SingleColumnTransform implements PartitionTransform `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by tigerquoll <gi...@git.apache.org>.
Github user tigerquoll commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    So Kudu range partitions support arbitrarily sized partition intervals, as in the example below, where the first and last range partitions are six months in size but the middle partition is one year in size.
    
    -- Make a table representing a date/time value as TIMESTAMP.
    -- The strings representing the partition bounds are automatically
    -- cast to TIMESTAMP values.
    create table native_timestamp(id bigint, when_exactly timestamp, event string, primary key (id, when_exactly))
      range (when_exactly)
      (
        partition '2015-06-01' <= values < '2016-01-01',
        partition '2016-01-01' <= values < '2017-01-01',
        partition '2017-01-01' <= values < '2017-06-01'
      )
      stored as kudu;


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by tigerquoll <gi...@git.apache.org>.
Github user tigerquoll commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r214808418
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
    @@ -0,0 +1,182 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link TableCatalog#alterTable}. For example,
    + * <pre>
    + *   import TableChange._
    + *   val catalog = source.asInstanceOf[TableSupport].catalog()
    + *   catalog.alterTable(ident,
    + *       addColumn("x", IntegerType),
    + *       renameColumn("a", "b"),
    + *       deleteColumn("c")
    + *     )
    + * </pre>
    + */
    +public interface TableChange {
    --- End diff --
    
    Is adding or dropping table partitions a table change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue closed the pull request at:

    https://github.com/apache/spark/pull/21306


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r231707076
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
    @@ -0,0 +1,182 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link TableCatalog#alterTable}. For example,
    + * <pre>
    + *   import TableChange._
    + *   val catalog = source.asInstanceOf[TableSupport].catalog()
    + *   catalog.alterTable(ident,
    + *       addColumn("x", IntegerType),
    + *       renameColumn("a", "b"),
    + *       deleteColumn("c")
    + *     )
    + * </pre>
    + */
    +public interface TableChange {
    --- End diff --
    
    @mccheah, our table format supports updating the partitioning of a table, so I think it should be supported. But this is intended to be an initial API, so I didn't want to block it on agreeing on how to repartition a table.
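    If repartitioning is added later, it could follow the same static-factory pattern as the existing changes. A hypothetical sketch, not part of this PR:
    
    ```java
    public interface TableChange {
      // Hypothetical: replace a table's partitioning, following the same
      // factory pattern as addColumn and renameColumn.
      static TableChange replacePartitioning(PartitionTransform... transforms) {
        return new ReplacePartitioning(transforms);
      }
    
      final class ReplacePartitioning implements TableChange {
        private final PartitionTransform[] transforms;
    
        private ReplacePartitioning(PartitionTransform[] transforms) {
          this.transforms = transforms;
        }
    
        public PartitionTransform[] transforms() {
          return transforms;
        }
      }
    }
    ```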


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r230973235
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
    @@ -0,0 +1,182 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link TableCatalog#alterTable}. For example,
    + * <pre>
    + *   import TableChange._
    + *   val catalog = source.asInstanceOf[TableSupport].catalog()
    + *   catalog.alterTable(ident,
    + *       addColumn("x", IntegerType),
    + *       renameColumn("a", "b"),
    + *       deleteColumn("c")
    + *     )
    + * </pre>
    + */
    +public interface TableChange {
    --- End diff --
    
    Would it be a valid operation to change the partitioning of a table without dropping the entire table and re-creating it? E.g., changing the bucket size for a given column. That seems pretty difficult to do in practice, though, since the underlying data layout would have to change as part of the modification.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93568/testReport)** for PR 21306 at commit [`0ee938b`](https://github.com/apache/spark/commit/0ee938bb2e17a9981062042b97e8036179a9eae8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by tigerquoll <gi...@git.apache.org>.
Github user tigerquoll commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r214806854
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java ---
    @@ -0,0 +1,182 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link TableCatalog#alterTable}. For example,
    + * <pre>
    + *   import TableChange._
    + *   val catalog = source.asInstanceOf[TableSupport].catalog()
    + *   catalog.alterTable(ident,
    + *       addColumn("x", IntegerType),
    + *       renameColumn("a", "b"),
    + *       deleteColumn("c")
    + *     )
    + * </pre>
    + */
    +public interface TableChange {
    --- End diff --
    
    Can we support adding a comment to a column or to a table?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r199992550
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link DataSourceCatalog#alterTable}.
    + */
    +public interface TableChange {
    +
    +  /**
    +   * Create a TableChange for adding a top-level column to a table.
    +   * <p>
    +   * Because "." may be interpreted as a field path separator or may be used in field names, it is
    +   * not allowed in names passed to this method. To add to nested types or to add fields with
    +   * names that contain ".", use {@link #addColumn(String, String, DataType)}.
    +   *
    +   * @param name the new top-level column name
    +   * @param dataType the new column's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String name, DataType dataType) {
    +    return new AddColumn(null, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for adding a nested column to a table.
    +   * <p>
    +   * The parent name is used to find the parent struct type where the nested field will be added.
    +   * If the parent name is null, the new column will be added to the root as a top-level column.
    +   * If parent identifies a struct, a new column is added to that struct. If it identifies a list,
    +   * the column is added to the list element struct, and if it identifies a map, the new column is
    +   * added to the map's value struct.
    +   * <p>
    +   * The given name is used to name the new column and names containing "." are not handled
    +   * differently.
    +   *
    +   * @param parent the new field's parent
    +   * @param name the new field name
    +   * @param dataType the new field's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String parent, String name, DataType dataType) {
    +    return new AddColumn(parent, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for renaming a field.
    +   * <p>
    +   * The name is used to find the field to rename. The new name will replace the name of the type.
    +   * For example, renameColumn("a.b.c", "x") should produce column a.b.x.
    --- End diff --
    
    It's great to have an example showing how to use this API; can we add an example to all of the methods here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2293/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/686/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200170560
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link DataSourceCatalog#alterTable}.
    + */
    +public interface TableChange {
    +
    +  /**
    +   * Create a TableChange for adding a top-level column to a table.
    +   * <p>
    +   * Because "." may be interpreted as a field path separator or may be used in field names, it is
    +   * not allowed in names passed to this method. To add to nested types or to add fields with
    +   * names that contain ".", use {@link #addColumn(String, String, DataType)}.
    +   *
    +   * @param name the new top-level column name
    +   * @param dataType the new column's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String name, DataType dataType) {
    +    return new AddColumn(null, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for adding a nested column to a table.
    +   * <p>
    +   * The parent name is used to find the parent struct type where the nested field will be added.
    +   * If the parent name is null, the new column will be added to the root as a top-level column.
    +   * If parent identifies a struct, a new column is added to that struct. If it identifies a list,
    +   * the column is added to the list element struct, and if it identifies a map, the new column is
    +   * added to the map's value struct.
    +   * <p>
    +   * The given name is used to name the new column and names containing "." are not handled
    +   * differently.
    +   *
    +   * @param parent the new field's parent
    +   * @param name the new field name
    +   * @param dataType the new field's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String parent, String name, DataType dataType) {
    +    return new AddColumn(parent, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for renaming a field.
    +   * <p>
    +   * The name is used to find the field to rename. The new name will replace the name of the type.
    +   * For example, renameColumn("a.b.c", "x") should produce column a.b.x.
    --- End diff --
    
    Are you looking for examples in Javadoc, or an example implementation?
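    For instance, a Javadoc-style usage snippet could sit next to each factory method; a sketch, where `catalog` and `ident` are assumed from the surrounding example:
    
    ```java
    // Sketch of a possible Javadoc example: rename nested field a.b.c to
    // a.b.x and add a nested column points.x in a single alterTable call.
    catalog.alterTable(ident,
        TableChange.renameColumn("a.b.c", "x"),
        TableChange.addColumn("points", "x", DataTypes.DoubleType));
    ```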


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237711409
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/CatalogProvider.java ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.internal.SQLConf;
    +
    +/**
    + * A marker interface to provide a catalog implementation for Spark.
    + * <p>
    + * Implementations can provide catalog functions by implementing additional interfaces, like
    + * {@link TableCatalog} to expose table operations.
    + * <p>
    + * Catalog implementations must implement this marker interface to be loaded by
    + * {@link Catalogs#load(String, SQLConf)}. The loader will instantiate catalog classes using the
    + * required public no-arg constructor. After creating an instance, it will be configured by calling
    + * {@link #initialize(CaseInsensitiveStringMap)}.
    + * <p>
    + * Catalog implementations are registered to a name by adding a configuration option to Spark:
    + * {@code spark.sql.catalog.catalog-name=com.example.YourCatalogClass}. All configuration properties
    + * in the Spark configuration that share the catalog name prefix,
    + * {@code spark.sql.catalog.catalog-name.(key)=(value)} will be passed in the case insensitive
    + * string map of options in initialization with the prefix removed. An additional property,
    + * {@code name}, is also added to the options and will contain the catalog's name; in this case,
    + * "catalog-name".
    + */
    +public interface CatalogProvider {
    --- End diff --
    
    As we discussed, will these APIs now live in the `sql-api` package? Also at what point are we going to introduce this new Maven module and package?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #92623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92623/testReport)** for PR 21306 at commit [`023995d`](https://github.com/apache/spark/commit/023995d15b4293fac1530da6bd966b6ab6823980).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90531/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237974410
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    --- End diff --
    
    No, this interface carries a minimal set of operations needed to implement the v2 logical plans. We can expand it later when we need to.
    
    The goal here is to build a replacement catalog API incrementally and to avoid requiring all catalogs to implement all possible catalog features. This API is focused on table operations, not the view or function operations that we have yet to define.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1359/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1339/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94917/testReport)** for PR 21306 at commit [`fa0edeb`](https://github.com/apache/spark/commit/fa0edeb1570485cc7d6cd0f848caaaf20480f384).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/684/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93605/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #92623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92623/testReport)** for PR 21306 at commit [`023995d`](https://github.com/apache/spark/commit/023995d15b4293fac1530da6bd966b6ab6823980).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94918/testReport)** for PR 21306 at commit [`dca4bf8`](https://github.com/apache/spark/commit/dca4bf8176eaa92de295de54488c3398256e0f7a).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class V1TableCatalog(sessionState: SessionState) extends TableCatalog `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r205562855
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/CatalogProvider.java ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.internal.SQLConf;
    +
    +/**
    + * A marker interface to provide a catalog implementation for Spark.
    + * <p>
    + * Implementations can provide catalog functions by implementing additional interfaces, like
    + * {@link TableCatalog} to expose table operations.
    + * <p>
    + * Catalog implementations must implement this marker interface to be loaded by
    + * {@link Catalogs#load(String, SQLConf)}. The loader will instantiate catalog classes using the
    + * required public no-arg constructor. After creating an instance, it will be configured by calling
    + * {@link #initialize(CaseInsensitiveStringMap)}.
    + * <p>
    + * Catalog implementations are registered to a name by adding a configuration option to Spark:
    + * {@code spark.sql.catalog.catalog-name=com.example.YourCatalogClass}. All configuration properties
    + * in the Spark configuration that share the catalog name prefix,
    + * {@code spark.sql.catalog.catalog-name.(key)=(value)} will be passed in the case insensitive
    + * string map of options in initialization with the prefix removed. An additional property,
    + * {@code name}, is also added to the options and will contain the catalog's name; in this case,
    + * "catalog-name".
    + */
    +public interface CatalogProvider {
    +  /**
    +   * Called to initialize configuration.
    +   * <p>
    +   * This method is called once, just after the provider is instantiated.
    +   *
    +   * @param options a case-insensitive string map of configuration
    +   */
    +  void initialize(CaseInsensitiveStringMap options);
    --- End diff --
    
    Can we reuse `CaseInsensitiveMap[String]`?


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @tigerquoll, the proposal isn't to make individual partitions part of the table configuration; it is to make the partitioning scheme part of it. How sources choose to handle individual partitions is up to each source. How those partitions are exposed through Spark is a separate API, because the current v2 data source design covers tables that appear to be unpartitioned.
    
    We could support range partitioning with the strategy that was discussed on the dev list, where the configuration is a function application with column references and literals. So your partitioning could be expressed like this:
    
    ```sql
    create table t (id bigint, ts timestamp, data string)
    partitioned by (range(ts, '2016-01-01', '2017-01-01', '2017-06-01')) using kudu
    ```
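    
    For illustration, here is a rough sketch of how that `range(...)` clause might land in the transform API from this patch. `PartitionTransforms.apply(transform, colName)` is in the patch; dropping the boundary literals is a placeholder, since carrying literal arguments needs the generic function-application support discussed later in this thread.
    
    ```scala
    // Hypothetical parsed form of the DDL above. Spark doesn't know a
    // "range" transform, so it would fall through to the generic case in
    // PartitionTransforms. The boundary literals are dropped here because
    // apply(transform, colName) has no way to carry them yet.
    import org.apache.spark.sql.catalog.v2.PartitionTransforms
    
    val partitioning = Seq(PartitionTransforms.apply("range", "ts"))
    ```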


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r205237682
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -609,6 +611,12 @@ class SparkSession private(
        */
       @transient lazy val catalog: Catalog = new CatalogImpl(self)
     
    +  @transient private lazy val catalogs = new mutable.HashMap[String, CatalogProvider]()
    +
    +  private[sql] def catalog(name: String): CatalogProvider = synchronized {
    --- End diff --
    
    Note that this is `private[sql]`. It lets us use the named `TableCatalog` instances internally without yet settling how multiple catalogs should be exposed to users through a public API.
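    
    For completeness, a plausible shape for the truncated body, assuming the `Catalogs.load(name, conf)` loader described in the `CatalogProvider` javadoc; the caching detail is an assumption, not part of the diff:
    
    ```scala
    // Sketch only: cache one initialized CatalogProvider per catalog name.
    // Catalogs.load is assumed to instantiate the class registered under
    // spark.sql.catalog.<name> and call initialize() with its options.
    private[sql] def catalog(name: String): CatalogProvider = synchronized {
      catalogs.getOrElseUpdate(name, Catalogs.load(name, sessionState.conf))
    }
    ```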


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237984050
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableCatalog.java ---
    @@ -0,0 +1,137 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog extends CatalogProvider {
    --- End diff --
    
    What about `CatalogPlugin`? I'm hesitant to go with just `Catalog` because it isn't very specific. I think it might cause confusion because the interface has only the `initialize` method.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r199991918
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/CatalogSupport.java ---
    @@ -0,0 +1,36 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2;
    +
    +import org.apache.spark.sql.sources.v2.catalog.DataSourceCatalog;
    +
    +/**
    + * A mix-in interface for {@link DataSourceV2} catalog support. Data sources can implement this
    + * interface to provide the ability to load, create, alter, and drop tables.
    + * <p>
    + * Data sources must implement this interface to support logical operations that combine writing
    + * data with catalog tasks, like create-table-as-select.
    + */
    +public interface CatalogSupport {
    --- End diff --
    
    After thinking about it more, what we really need in the near future is all about tables: creating, altering, looking up, and dropping them. It is not about how tables are organized, like databases, or how other information is stored, like views and functions.
    
    How about we call it `TableSupport`?
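    
    For reference, a minimal sketch of the rename, written as a Scala trait for brevity; the single accessor method is an assumption, since the interface body is cut off in the diff above:
    
    ```scala
    // Assumed shape: the same mix-in pattern as CatalogSupport, renamed to
    // make clear that only table operations are exposed for now.
    trait TableSupport { self: DataSourceV2 =>
      def catalog(): DataSourceCatalog
    }
    ```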


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #90532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90532/testReport)** for PR 21306 at commit [`7130d13`](https://github.com/apache/spark/commit/7130d13de27c99480189cdce3b7f00749a801e9c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @cloud-fan, on the dev thread about 2.4 you talked about getting this PR in. What do we need to do next?
    
    I can call a vote on the SPIP if you think that's ready. I just bumped the thread. I'm also interested in getting review comments on this or #21978 if you have any.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200177371
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
    @@ -0,0 +1,173 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.types.DataType;
    +
    +/**
    + * TableChange subclasses represent requested changes to a table. These are passed to
    + * {@link DataSourceCatalog#alterTable}.
    + */
    +public interface TableChange {
    +
    +  /**
    +   * Create a TableChange for adding a top-level column to a table.
    +   * <p>
    +   * Because "." may be interpreted as a field path separator or may be used in field names, it is
    +   * not allowed in names passed to this method. To add to nested types or to add fields with
    +   * names that contain ".", use {@link #addColumn(String, String, DataType)}.
    +   *
    +   * @param name the new top-level column name
    +   * @param dataType the new column's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String name, DataType dataType) {
    +    return new AddColumn(null, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for adding a nested column to a table.
    +   * <p>
    +   * The parent name is used to find the parent struct type where the nested field will be added.
    +   * If the parent name is null, the new column will be added to the root as a top-level column.
    +   * If parent identifies a struct, a new column is added to that struct. If it identifies a list,
    +   * the column is added to the list element struct, and if it identifies a map, the new column is
    +   * added to the map's value struct.
    +   * <p>
    +   * The given name is used to name the new column and names containing "." are not handled
    +   * differently.
    +   *
    +   * @param parent the new field's parent
    +   * @param name the new field name
    +   * @param dataType the new field's data type
    +   * @return a TableChange for the addition
    +   */
    +  static TableChange addColumn(String parent, String name, DataType dataType) {
    +    return new AddColumn(parent, name, dataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for renaming a field.
    +   * <p>
    +   * The name is used to find the field to rename. The new name will replace the name of the type.
    +   * For example, renameColumn("a.b.c", "x") should produce column a.b.x.
    +   *
    +   * @param name the current field name
    +   * @param newName the new name
    +   * @return a TableChange for the rename
    +   */
    +  static TableChange renameColumn(String name, String newName) {
    +    return new RenameColumn(name, newName);
    +  }
    +
    +  /**
    +   * Create a TableChange for updating the type of a field.
    +   * <p>
    +   * The name is used to find the field to update.
    +   *
    +   * @param name the field name
    +   * @param newDataType the new data type
    +   * @return a TableChange for the update
    +   */
    +  static TableChange updateColumn(String name, DataType newDataType) {
    +    return new UpdateColumn(name, newDataType);
    +  }
    +
    +  /**
    +   * Create a TableChange for deleting a field from a table.
    +   *
    +   * @param name the name of the field to delete
    +   * @return a TableChange for the delete
    +   */
    +  static TableChange deleteColumn(String name) {
    +    return new DeleteColumn(name);
    +  }
    +
    +  final class AddColumn implements TableChange {
    --- End diff --
    
    Also, I'm not sure it's possible to implement `unapply` in Java; even implementing `Product` doesn't work.
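    
    If pattern matching is still wanted, an extractor can live on the Scala side instead; a minimal sketch, with the `AddColumn` accessor names assumed (they are not shown in the diff above):
    
    ```scala
    import org.apache.spark.sql.sources.v2.catalog.TableChange
    import org.apache.spark.sql.types.DataType
    
    // Scala-side extractor over the Java class, so analyzer rules can write
    //   case ExtractAddColumn(parent, name, dataType) => ...
    // parent(), name(), and dataType() are assumed accessor names.
    object ExtractAddColumn {
      def unapply(change: TableChange): Option[(String, String, DataType)] =
        change match {
          case add: TableChange.AddColumn =>
            Some((add.parent, add.name, add.dataType))
          case _ => None
        }
    }
    ```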


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237974718
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/PartitionTransforms.java ---
    @@ -0,0 +1,229 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +/**
    + * A standard set of transformations that are passed to data sources during table creation.
    + *
    + * @see PartitionTransform
    + */
    +public class PartitionTransforms {
    +  private PartitionTransforms() {
    +  }
    +
    +  /**
    +   * Create a transform for a column with the given name.
    +   * <p>
    +   * This transform is used to pass named transforms that are not known to Spark.
    +   *
    +   * @param transform a name of the transform to apply to the column
    +   * @param colName a column name
    +   * @return an Apply transform for the column
    +   */
    +  public static PartitionTransform apply(String transform, String colName) {
    +    if ("identity".equals(transform)) {
    --- End diff --
    
    I should note that the generic function application will probably look like the `Apply` case.
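    
    In Scala terms, the fallback might look something like this; the `Apply` constructor is an assumption based on the factory above:
    
    ```scala
    // Sketch of the generic fallback in PartitionTransforms.apply: known
    // names get typed transforms, anything else is passed through raw for
    // the source to interpret. `new Apply(...)` is an assumed constructor.
    def apply(transform: String, colName: String): PartitionTransform =
      transform match {
        case "identity" => identity(colName)
        case other => new Apply(other, colName)
      }
    ```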


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237722264
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala ---
    @@ -0,0 +1,118 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2
    +
    +import java.util
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.sql.SaveMode
    +import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, identity}
    +import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
    +import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, WriteSupport}
    +import org.apache.spark.sql.sources.v2.reader.DataSourceReader
    +import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
    + */
    +private[sql] class V1MetadataTable(
    +    v1Table: CatalogTable,
    +    v2Source: Option[DataSourceV2]) extends Table {
    +
    +  def readDelegate: ReadSupport = v2Source match {
    +    case r: ReadSupport => r
    +    case _ => throw new UnsupportedOperationException(s"Source does not support reads: $v2Source")
    +  }
    +
    +  def writeDelegate: WriteSupport = v2Source match {
    +    case w: WriteSupport => w
    +    case _ => throw new UnsupportedOperationException(s"Source does not support writes: $v2Source")
    +  }
    +
    +  lazy val options: Map[String, String] = {
    +    v1Table.storage.locationUri match {
    --- End diff --
    
    Consider `.map(...).getOrElse(...)`. That said, we haven't been consistent about using `match` versus combinators on `Option` types throughout Spark, so it's fine to leave this as-is.
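    
    For concreteness, the two equivalent forms; the branch bodies here are placeholders, since the original `match` body is cut off in the diff:
    
    ```scala
    // match form, as in the diff (placeholder branch bodies):
    val opts = v1Table.storage.locationUri match {
      case Some(uri) => Map("path" -> uri.toString)
      case None => Map.empty[String, String]
    }
    
    // equivalent .map(...).getOrElse form:
    val opts2 = v1Table.storage.locationUri
      .map(uri => Map("path" -> uri.toString))
      .getOrElse(Map.empty[String, String])
    ```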


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r237971288
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/Table.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalog.v2;
    +
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Represents table metadata from a {@link TableCatalog} or other table sources.
    + */
    +public interface Table {
    +  /**
    +   * Return the table properties.
    +   * @return this table's map of string properties
    +   */
    +  Map<String, String> properties();
    +
    +  /**
    +   * Return the table schema.
    +   * @return this table's schema as a struct type
    +   */
    +  StructType schema();
    +
    +  /**
    +   * Return the table partitioning transforms.
    +   * @return this table's partitioning transforms
    +   */
    +  List<PartitionTransform> partitioning();
    --- End diff --
    
    Sure


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94918/
    Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @cloud-fan, I've updated this to address your comments. Thanks for the reviews!


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93556/testReport)** for PR 21306 at commit [`46100f3`](https://github.com/apache/spark/commit/46100f3fc7c51b86bbdc03fcc7d7b3388748f698).


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #93568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93568/testReport)** for PR 21306 at commit [`0ee938b`](https://github.com/apache/spark/commit/0ee938bb2e17a9981062042b97e8036179a9eae8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    **[Test build #94906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94906/testReport)** for PR 21306 at commit [`622180a`](https://github.com/apache/spark/commit/622180a50e05b4d968380824f5dbbe5f89e42422).


---



[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200800546
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java ---
    @@ -0,0 +1,123 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog {
    +  /**
    +   * Load table metadata by {@link TableIdentifier identifier} from the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @return the table's metadata
    +   * @throws NoSuchTableException If the table doesn't exist.
    +   */
    +  Table loadTable(TableIdentifier ident) throws NoSuchTableException;
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap());
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema,
    +                            Map<String, String> properties) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), properties);
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param partitions a list of expressions to use for partitioning data in the table
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  Table createTable(TableIdentifier ident,
    +                    StructType schema,
    +                    List<Expression> partitions,
    --- End diff --
    
    I see, and this seems like a very cool feature. My only concern is that it hasn't been discussed on the dev list yet, and no JIRA ticket is tracking it. It feels a little odd to add data source v2 API support for a feature that doesn't exist yet. Shall we start a dev list thread for it and see if we can make it before 2.4?
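    
    For context, a hedged sketch of how a CTAS plan might invoke the four-argument overload quoted above; `catalog` and `rangePartition` are placeholders, not real APIs:
    
    ```scala
    import scala.collection.JavaConverters._
    import org.apache.spark.sql.catalyst.TableIdentifier
    import org.apache.spark.sql.types.StructType
    
    // rangePartition stands in for whatever Expression the parser would
    // produce for range(ts, ...); it is assumed, not part of this patch.
    val table = catalog.createTable(
      TableIdentifier("t", Some("db")),
      new StructType().add("id", "bigint").add("ts", "timestamp"),
      List(rangePartition).asJava,
      Map("provider" -> "kudu").asJava)
    ```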


---
