
Posted to issues@spark.apache.org by "Filimonov Valentin (Jira)" <ji...@apache.org> on 2022/06/04 16:17:00 UTC

[jira] [Created] (SPARK-39379) FileAlreadyExistsException while insertInto() DF to hive table or directly write().parquet()

Filimonov Valentin created SPARK-39379:
------------------------------------------

             Summary: FileAlreadyExistsException while insertInto() DF to hive table or directly write().parquet()
                 Key: SPARK-39379
                 URL: https://issues.apache.org/jira/browse/SPARK-39379
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.8
         Environment: java.version = 1.8
spark.version = 2.4.8
hadoop.version = 3.1.3
            Reporter: Filimonov Valentin


The table I want to write the DF to has the following structure:

 
{code:java}
CREATE EXTERNAL TABLE `usl_rdm_idl_spark_stg.okogu_h`(
  `ctl_loading` bigint,
  `ctl_validfrom` timestamp,
  `end_dt` date,
  `okogu_accept_dt` date)
PARTITIONED BY (
  `p1day` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://FESS-DEV/data/usl/rdm_idl_spark/stg/okogu_h'
TBLPROPERTIES (
  'bucketing_version'='2',
  'spark.sql.partitionProvider'='catalog',
  'transient_lastDdlTime'='1654082666')
{code}
 

The final DF has the same structure as the table above. The issue occurs only when the "p1day" column (the column the table is partitioned by) contains only *null* values. When I try to write it with either of the following options:

 
{code:java}
finalDF.write().mode(SaveMode.Append).partitionBy("p1day").parquet("somepath");{code}
 

 or

 
{code:java}
finalDF.write().mode(SaveMode.Append).insertInto(String.format("%s.%s", tgtSchema, tgtTable));{code}
I get the following error:

 
{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.fs.FileAlreadyExistsException: /data/usl/rdm_idl_spark/stg/okogu_h/.hive-staging_hive_2022-06-01_16-59-37_442_6329951430234699240-1/-ext-10000/_temporary/0/_temporary/attempt_20220601165937_0116_m_000001_586/p1day=__HIVE_DEFAULT_PARTITION__/part-00001-05999af9-8a25-406e-a307-f97781547db2.c000 for client 10.106.105.11 already exists{code}
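
The `p1day=__HIVE_DEFAULT_PARTITION__` segment in the failing path is the directory name used for null (or empty) partition values, so every row of a DF whose "p1day" is null is routed to that single directory. The plain-Java sketch below illustrates this mapping only. The class and method names are hypothetical, path escaping is omitted, and it is a simplified illustration under those assumptions, not Spark's or Hive's actual code:

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration only; NOT Spark's or Hive's actual implementation.
// The class and method names are hypothetical.
final class PartitionPathSketch {
    // Directory value used when a partition column is null or empty.
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    // Builds the "column=value" directory fragment for one partition value.
    // Real implementations also escape special characters; that is omitted here.
    static String partitionDir(String column, String value) {
        String dirValue = (value == null || value.isEmpty())
                ? DEFAULT_PARTITION_NAME
                : value;
        return column + "=" + dirValue;
    }

    public static void main(String[] args) {
        // A null, an empty, and a regular partition value.
        List<String> values = Arrays.asList(null, "", "1");
        for (String v : values) {
            System.out.println(partitionDir("p1day", v));
        }
    }
}
```

This only shows where null partition values end up; it does not by itself explain why the file in that directory is reported as already existing.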
 

 

It works correctly for me only when I replace the null value in the "p1day" column with some non-null value (for example, "1"):

 
{code:java}
finalDF.withColumn("p1day",lit("1"));{code}
 

 

Is this a bug in the spark-sql code? I am using org.apache.spark:spark-sql_2.11:2.4.8

 


