Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:15:12 UTC
[jira] [Resolved] (SPARK-22457) Tables are supposed to be MANAGED only taking into account whether a path is provided
[ https://issues.apache.org/jira/browse/SPARK-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-22457.
----------------------------------
Resolution: Incomplete
> Tables are supposed to be MANAGED only taking into account whether a path is provided
> -------------------------------------------------------------------------------------
>
> Key: SPARK-22457
> URL: https://issues.apache.org/jira/browse/SPARK-22457
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: David Arroyo
> Priority: Major
> Labels: bulk-closed
>
> As far as I know, since Spark 2.2, tables are supposed to be MANAGED only taking into account whether a path is provided:
> {code:java}
> val tableType = if (storage.locationUri.isDefined) {
>   CatalogTableType.EXTERNAL
> } else {
>   CatalogTableType.MANAGED
> }
> {code}
> This solution seems right for filesystem-based data sources. However, when working with other data sources such as Elasticsearch, it leads to the unexpected behaviour described below:
> 1) InMemoryCatalog's doCreateTable() adds a locationURI if CatalogTableType.MANAGED && tableDefinition.storage.locationUri.isEmpty.
> 2) Before loading the data source table, FindDataSourceTable's readDataSourceTable() adds a path option if locationURI exists:
> {code:java}
> val pathOption = table.storage.locationUri.map("path" -> CatalogUtils.URIToString(_))
> {code}
> 3) That causes an error when reading from Elasticsearch, because 'path' is an option already supported by the Elasticsearch connector (locationUri is set to file:/home/user/spark-rv/elasticsearch/shop/clients):
> org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find mapping for file:/home/user/spark-rv/elasticsearch/shop/clients - one is required before using Spark SQL
> Would it be possible to mark tables as MANAGED only for a subset of data sources (TEXT, CSV, JSON, JDBC, PARQUET, ORC, HIVE), or to consider some other solution?
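The restriction the reporter asks for could be sketched as follows. This is a minimal, self-contained sketch, not Spark's actual API: the `fileBasedProviders` set and the `tableType` helper are illustrative assumptions, using the provider list named in the question.

```scala
// Minimal sketch (not Spark code): decide the table type from both the
// provider and the location, so non-filesystem sources such as the
// Elasticsearch connector are never marked MANAGED.
object TableTypeSketch {
  sealed trait CatalogTableType
  case object MANAGED extends CatalogTableType
  case object EXTERNAL extends CatalogTableType

  // Illustrative subset of filesystem-based providers, taken from the text above.
  private val fileBasedProviders =
    Set("text", "csv", "json", "jdbc", "parquet", "orc", "hive")

  def tableType(provider: String, locationUriDefined: Boolean): CatalogTableType =
    if (!locationUriDefined && fileBasedProviders(provider.toLowerCase)) MANAGED
    else EXTERNAL
}
```

Under this sketch an Elasticsearch-backed table would always come out EXTERNAL, so steps 1) and 2) above would never inject a synthetic path option for it.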
> P.S. InMemoryCatalog's doDropTable() deletes the directory of the table, which in my view should only be required for filesystem-based data sources:
> {code:java}
> if (tableMeta.tableType == CatalogTableType.MANAGED) {
>   ...
>   // Delete the data/directory of the table
>   val dir = new Path(tableMeta.location)
>   try {
>     val fs = dir.getFileSystem(hadoopConfig)
>     fs.delete(dir, true)
>   } catch {
>     case e: IOException =>
>       throw new SparkException(s"Unable to drop table $table as failed " +
>         s"to delete its directory $dir", e)
>   }
> }
> {code}
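Along the same lines, the directory deletion in doDropTable() could be guarded by a provider check. A hypothetical sketch, where `shouldDeleteDirectory` and the provider set are illustrative assumptions rather than Spark's API:

```scala
// Hypothetical guard (not Spark code): only delete a MANAGED table's
// directory when its data source is actually filesystem-based.
object DropGuardSketch {
  // Illustrative subset; a real fix would need an authoritative list of
  // file-based providers from Spark itself.
  private val fileBasedProviders =
    Set("text", "csv", "json", "parquet", "orc", "hive")

  def shouldDeleteDirectory(isManaged: Boolean, provider: Option[String]): Boolean =
    isManaged && provider.exists(p => fileBasedProviders(p.toLowerCase))
}
```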
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org