Posted to issues@spark.apache.org by "Max (Jira)" <ji...@apache.org> on 2022/05/31 13:22:00 UTC
[jira] [Created] (SPARK-39348) Create table in overwrite mode fails when interrupted
Max created SPARK-39348:
---------------------------
Summary: Create table in overwrite mode fails when interrupted
Key: SPARK-39348
URL: https://issues.apache.org/jira/browse/SPARK-39348
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 3.1.1
Reporter: Max
When you cancel a running Apache Spark write job and then attempt to rerun the write, the following error occurs:
Error: org.apache.spark.sql.AnalysisException: Cannot create the managed table('`testdb`.`testtable`').
The associated location ('dbfs:/user/hive/warehouse/testdb.db/metastore_cache_testtable') already exists.;
This problem can occur if:
* The cluster is terminated while a write operation is in progress.
* A temporary network issue occurs.
* The job is interrupted.
Once the metastore data for a particular table is corrupted in this way, it is hard to recover except by manually deleting the files at that location. The root cause is that a temporary metadata directory called {{_STARTED}} is not deleted automatically when Azure Databricks tries to overwrite it.
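One way to recover manually is to drop the half-created table and delete the leftover location before rerunning the write. The sketch below assumes a live {{SparkSession}} named {{spark}} with Hadoop filesystem access, and uses the warehouse path from the error message above; it is illustrative, not an official fix:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Drop the (possibly half-created) table from the metastore, if it is registered.
spark.sql("DROP TABLE IF EXISTS testdb.testtable")

// Remove the leftover table location, including the stale _STARTED directory.
// The path below is taken from the error message and may differ per deployment.
val location = new Path("dbfs:/user/hive/warehouse/testdb.db/testtable")
val fs = location.getFileSystem(spark.sparkContext.hadoopConfiguration)
if (fs.exists(location)) fs.delete(location, true) // recursive delete
```

After the location is gone, the original {{saveAsTable}} call succeeds again.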
You can reproduce the problem by following these steps:
# Create a DataFrame:
{{val df = spark.range(1000)}}
# Write the DataFrame to a location in overwrite mode:
{{import org.apache.spark.sql.SaveMode}}
{{df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")}}
# Cancel the command while it is executing.
# Re-run the {{write}} command.
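Put together, the steps above correspond to a sketch like the following. It assumes a live {{SparkSession}} named {{spark}}; the cancellation in step 3 has to be done externally (for example from the Spark UI) while the first write is running:

```scala
import org.apache.spark.sql.SaveMode

// Step 1: create a DataFrame with 1000 rows.
val df = spark.range(1000).toDF()

// Step 2: write it as a managed table in overwrite mode.
// Step 3: cancel this job while it is still executing.
df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")

// Step 4: rerunning the same write then fails with the
// "associated location already exists" AnalysisException,
// because the leftover _STARTED metadata directory was not cleaned up.
df.write.mode(SaveMode.Overwrite).saveAsTable("testdb.testtable")
```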
--
This message was sent by Atlassian Jira
(v8.20.7#820007)