Posted to issues@spark.apache.org by "Burak Yavuz (Jira)" <ji...@apache.org> on 2019/09/23 18:04:00 UTC
[jira] [Updated] (SPARK-29219) DataSourceV2: Support all SaveModes in DataFrameWriter.save
[ https://issues.apache.org/jira/browse/SPARK-29219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Burak Yavuz updated SPARK-29219:
--------------------------------
Description:
We currently don't support all save modes in DataFrameWriter.save because the TableProvider interface allows for the reading and writing of data, but not for the creation of tables. We created a catalog API to support the creation, dropping, and existence-checking of tables, but DataFrameWriter.save doesn't necessarily use a catalog, for example when writing to a path-based table.
For this case, we propose a new interface that allows a TableProvider to extract an Identifier and a Catalog from a bundle of case-insensitive string options. This information can then be used to check whether the table exists, and therefore to support all save modes. If a Catalog is not defined, the behavior is to use spark_catalog (or the configured session catalog) to perform the check.
The interface can look like:
{code:java}
trait CatalogOptions {
  def extractCatalog(options: CaseInsensitiveStringMap): String
  def extractIdentifier(options: CaseInsensitiveStringMap): Identifier
}
{code}
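To illustrate, here is a minimal, self-contained sketch of how a path-based provider might implement the proposed trait. This is not from the proposal itself: the stand-in Identifier type, the simple Map in place of CaseInsensitiveStringMap, the "catalog" and "path" option keys, and the ParquetProviderSketch object are all assumptions made for illustration.

{code:java}
// Hypothetical stand-in for Spark's connector Identifier, for illustration only.
case class Identifier(namespace: Array[String], name: String)

// The proposed interface, using a plain Map as a stand-in for the options bundle.
trait CatalogOptions {
  def extractCatalog(options: Map[String, String]): String
  def extractIdentifier(options: Map[String, String]): Identifier
}

// Sketch of a path-based provider: no catalog option means fall back to the
// session catalog, and the path itself serves as the table identifier.
object ParquetProviderSketch extends CatalogOptions {
  override def extractCatalog(options: Map[String, String]): String =
    options.getOrElse("catalog", "spark_catalog")

  override def extractIdentifier(options: Map[String, String]): Identifier = {
    val path = options.getOrElse("path",
      throw new IllegalArgumentException("'path' must be specified"))
    Identifier(Array("default"), path)
  }
}
{code}
With these two hooks, DataFrameWriter.save could resolve a catalog and identifier even for path-based writes, look up whether the table exists, and then honor ErrorIfExists, Ignore, Append, and Overwrite accordingly.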
> DataSourceV2: Support all SaveModes in DataFrameWriter.save
> -----------------------------------------------------------
>
> Key: SPARK-29219
> URL: https://issues.apache.org/jira/browse/SPARK-29219
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Burak Yavuz
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)