Posted to user@spark.apache.org by Anil Dasari <ad...@guidewire.com> on 2023/06/26 21:08:21 UTC

[Spark-SQL] Dataframe write saveAsTable failed

Hi,

We recently upgraded Spark from 2.4.x to 3.3.1, and managed table creation
while writing a dataframe with saveAsTable fails with the error below.

Can not create the managed table(`<table name>`). The associated
location('hdfs:<table path>') already exists.

At a high level, our code does the following before writing the dataframe as a table:

sparkSession.sql(s"DROP TABLE IF EXISTS $hiveTableName PURGE")
mydataframe.write.mode(SaveMode.Overwrite).saveAsTable(hiveTableName)

The above code works on Spark 2 because of
spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation, which was
removed in Spark 3.
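
For reference, this is roughly how we had that flag enabled on 2.4 (a minimal
sketch; where exactly we set the config in our code is simplified here):

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .enableHiveSupport()
  // Spark 2.4 only: allow saveAsTable to create a managed table even when
  // the target location already contains files. Removed in Spark 3.
  .config("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
  .getOrCreate()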

The table is dropped and purged before the dataframe is written, so I expected
the write not to complain that the path already exists.

After digging further, I noticed there is a `_temporary` folder present in the
HDFS table path:

hdfs dfs -ls /apps/hive/warehouse/<table-path>/
Found 1 items
drwxr-xr-x   - hadoop hdfsadmingroup          0 2023-06-23 04:45
/apps/hive/warehouse/<table-path>/_temporary

[root@ip-10-121-107-90 bin]# hdfs dfs -ls
/apps/hive/warehouse/<table-path>/_temporary
Found 1 items
drwxr-xr-x   - hadoop hdfsadmingroup          0 2023-06-23 04:45
/apps/hive/warehouse/<table-path>/_temporary/0

[root@ip-10-121-107-90 bin]# hdfs dfs -ls
/apps/hive/warehouse/<table-path>/_temporary/0
Found 1 items
drwxr-xr-x   - hadoop hdfsadmingroup          0 2023-06-23 04:45
/apps/hive/warehouse/<table-path>/_temporary/0/_temporary
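
My understanding is that _temporary is the staging directory Hadoop's
FileOutputCommitter creates while a write job is running and removes on a
successful commit, so a leftover one suggests an earlier write was interrupted
before committing. As a stopgap we are considering deleting the table location
explicitly between the DROP and the write. A rough sketch (tablePath is
hypothetical here; in our code it would be resolved from the metastore):

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SaveMode

// Hypothetical: the managed table's warehouse location.
val tablePath = new Path("/apps/hive/warehouse/<table-path>")
val fs = tablePath.getFileSystem(sparkSession.sparkContext.hadoopConfiguration)

sparkSession.sql(s"DROP TABLE IF EXISTS $hiveTableName PURGE")
// Remove any leftover files (e.g. a stale _temporary staging dir) so the
// managed-table location is empty before saveAsTable recreates it.
if (fs.exists(tablePath)) {
  fs.delete(tablePath, true) // recursive delete
}
mydataframe.write.mode(SaveMode.Overwrite).saveAsTable(hiveTableName)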

Is it because of task failures? Is the cleanup sketched above a safe way to
work around this issue, or is there a better approach?

Thanks