Posted to issues@spark.apache.org by "Burak Yavuz (Jira)" <ji...@apache.org> on 2019/09/23 18:04:00 UTC

[jira] [Updated] (SPARK-29219) DataSourceV2: Support all SaveModes in DataFrameWriter.save

     [ https://issues.apache.org/jira/browse/SPARK-29219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Burak Yavuz updated SPARK-29219:
--------------------------------
    Description: 
We currently don't support all save modes in DataFrameWriter.save, because the TableProvider interface allows for reading and writing data but not for creating tables. We created a catalog API to support creating, dropping, and checking the existence of tables, but DataFrameWriter.save doesn't necessarily use a catalog, for example when writing to a path-based table.
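
For illustration, consider a hypothetical v2 source (the provider name, the path, and the df DataFrame below are assumptions made up for this example):
{code:java}
// Hypothetical v2 source that implements only TableProvider. Append and
// Overwrite can be planned as plain writes, but ErrorIfExists and Ignore
// need to know whether the target already exists -- which TableProvider
// alone cannot answer.
import org.apache.spark.sql.SaveMode

df.write
  .format("com.example.v2source")   // hypothetical provider name
  .mode(SaveMode.ErrorIfExists)     // needs an existence check to be honored
  .save("/tmp/example/table")       // path-based: no catalog is consulted
{code}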

For this case, we propose a new interface that allows a TableProvider to extract an Identifier and a Catalog from a bundle of CaseInsensitiveStringOptions. This information can then be used to check whether the table exists and thus support all save modes. If a Catalog is not defined, the behavior is to fall back to the spark_catalog (or the configured session catalog) to perform the check.

 

The interface can look like:
{code:java}
trait CatalogOptions {
  def extractCatalog(options: StringMap): String
  def extractIdentifier(options: StringMap): Identifier
}
{code}
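
As a rough sketch of how DataFrameWriter.save could consume this interface (loadCatalog and the surrounding resolution logic below are illustrative assumptions, not part of the proposal):
{code:java}
// Illustrative sketch only -- loadCatalog stands in for the real catalog
// resolution logic, and ??? marks the write paths this proposal leaves open.
def save(provider: TableProvider with CatalogOptions,
         options: StringMap,
         mode: SaveMode): Unit = {
  // Resolve the catalog named by the provider, falling back to spark_catalog.
  val catalog = loadCatalog(provider.extractCatalog(options))
  val ident   = provider.extractIdentifier(options)
  val exists  = catalog.tableExists(ident)
  mode match {
    case SaveMode.ErrorIfExists if exists =>
      throw new AnalysisException(s"Table $ident already exists")
    case SaveMode.Ignore if exists => ()   // silently skip the write
    case SaveMode.Overwrite        => ???  // replace table contents, then write
    case _                         => ???  // create if missing, then append
  }
}
{code}
With the identifier and catalog resolved this way, ErrorIfExists and Ignore become expressible even for path-based tables, because the existence check runs against the resolved (or session) catalog.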

  was:
We currently don't support all save modes in DataFrameWriter.save, because the TableProvider interface allows for reading and writing data but not for creating tables. We created a catalog API to support creating, dropping, and checking the existence of tables, but DataFrameWriter.save doesn't necessarily use a catalog, for example when writing to a path-based table.

For this case, we propose a new interface that allows a TableProvider to extract an Identifier and a Catalog from a bundle of CaseInsensitiveStringOptions. This information can then be used to check whether the table exists and thus support all save modes. If a Catalog is not defined, the behavior is to fall back to the spark_catalog (or the configured session catalog) to perform the check.


> DataSourceV2: Support all SaveModes in DataFrameWriter.save
> -----------------------------------------------------------
>
>                 Key: SPARK-29219
>                 URL: https://issues.apache.org/jira/browse/SPARK-29219
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Burak Yavuz
>            Priority: Major
>
> We currently don't support all save modes in DataFrameWriter.save, because the TableProvider interface allows for reading and writing data but not for creating tables. We created a catalog API to support creating, dropping, and checking the existence of tables, but DataFrameWriter.save doesn't necessarily use a catalog, for example when writing to a path-based table.
> For this case, we propose a new interface that allows a TableProvider to extract an Identifier and a Catalog from a bundle of CaseInsensitiveStringOptions. This information can then be used to check whether the table exists and thus support all save modes. If a Catalog is not defined, the behavior is to fall back to the spark_catalog (or the configured session catalog) to perform the check.
>  
> The interface can look like:
> {code:java}
> trait CatalogOptions {
>   def extractCatalog(options: StringMap): String
>   def extractIdentifier(options: StringMap): Identifier
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org