Posted to user@spark.apache.org by abhijeet bedagkar <qa...@gmail.com> on 2019/09/25 06:57:03 UTC

Intermittently getting "Can not create the managed table" error while creating a table from Spark 2.4

Hi,

We are intermittently hitting the below error in Spark 2.4 when saving a
managed table from Spark.

Error -
pyspark.sql.utils.AnalysisException: u"Can not create the managed
table('`hive_issue`.`table`'). The associated
location('s3://{bucket_name}/EMRFS_WARE_TEST167_new/warehouse/hive_issue.db/table')
already exists.;"

Steps to reproduce--
1. Create a DataFrame from a mid-size dataset (a 30 MB CSV file)
2. Save the DataFrame as a table
3. Terminate the session while the above operation is in progress

Note--
Session termination is just a way to reproduce the issue. In practice we
hit it intermittently when running the same Spark jobs multiple times. We
use both EMRFS and HDFS on an EMR cluster and see the same issue on both
systems.
The only way we have found to recover is to delete the target folder where
the table keeps its files, which is not an option for us: we need to keep
historical information in the table, which is why we write to it in APPEND
mode. A sketch of that cleanup is shown below.
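For reference, a minimal sketch of that cleanup workaround, going through
the SparkSession's internal JVM gateway to the Hadoop FileSystem API (so
the same code works for both EMRFS s3:// and HDFS paths). The table
location below is copied from the error message and is a placeholder:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Placeholder location taken from the error message above
location = "s3://{bucket_name}/EMRFS_WARE_TEST167_new/warehouse/hive_issue.db/table"

# Resolve the FileSystem for this path via the JVM gateway
hadoop_path = spark._jvm.org.apache.hadoop.fs.Path(location)
fs = hadoop_path.getFileSystem(spark._jsc.hadoopConfiguration())

# Recursively delete the leftover table location before writing again
if fs.exists(hadoop_path):
    fs.delete(hadoop_path, True)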


Sample code--
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.read.csv("s3://{sample-bucket}1/DATA/consumecomplians.csv")
print("STARTED WRITING TO TABLE")
# Terminate the session using Ctrl+C after the df.write action below has started
df.write.mode("append").saveAsTable("hive_issue.table")
print("COMPLETED WRITING TO TABLE")

We went through the Spark 2.4 migration guide [1] and found that Spark no
longer allows creating managed tables on non-empty locations.
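If we read the guide correctly, it also mentions a legacy flag,
spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation, to restore
the pre-2.4 behaviour. We have not yet verified that it resolves our case,
but setting it would look like this:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .enableHiveSupport()
         # Legacy flag from the 2.4 migration guide: allow creating a
         # managed table on a non-empty location, as Spark 2.3 did
         .config("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation",
                 "true")
         .getOrCreate())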

1. What is the reason behind this change in Spark's behaviour?
2. To us it looks like a breaking change: even when specifying the
"overwrite" option, Spark is unable to wipe out the existing data and
create the table.
3. Is there any solution for this issue?

[1]
https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html

Thanks,
Abhijeet