Posted to issues@spark.apache.org by "Dipayan Dev (Jira)" <ji...@apache.org> on 2023/08/25 07:07:00 UTC

[jira] [Comment Edited] (SPARK-44884) Spark doesn't create SUCCESS file when external path is passed

    [ https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757550#comment-17757550 ] 

Dipayan Dev edited comment on SPARK-44884 at 8/25/23 7:06 AM:
--------------------------------------------------------------

[~stevel@apache.org], I am running on Dataproc, but I am able to replicate the same issue from my local machine as well.

The _SUCCESS file is created:
 * Spark 2.x (I am using 2.4.0) with .saveAsTable(), with or without an external path.

The _SUCCESS file is not created:
 * Spark 3.3.0 with .saveAsTable(), with or without an external path.

As mentioned, I have set the following config, but it did not help:
spark.conf.set("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", true)
Are you able to replicate the issue with the snippet I have shared, or is the _SUCCESS file generated at your end when an external path is passed?
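For reference, Hadoop committer options like this one are usually applied most reliably when they are set before the session starts, rather than through spark.conf.set() at runtime. A minimal sketch of both ways of setting the flag (not verified on Dataproc):
{code:java}
// Launch time: the flag is in the Hadoop configuration before any job runs.
//   spark-shell --conf spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=true

// Runtime: set it directly on the SparkContext's Hadoop configuration.
scala> spark.sparkContext.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "true")
{code}
Note that this flag only controls whether the FileOutputCommitter writes the marker; if the external-path code path in 3.3.0 does not go through that committer at all, setting it would have no effect.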


was (Author: JIRAUSER301514):
[~stevel@apache.org], I am running on Dataproc, but I am able to replicate the same issue from my local machine as well.

The _SUCCESS file is created:
 * Spark 2.x (I am using 2.4.0) with .saveAsTable(), with or without an external path.
 * Spark 3.3.0 with .saveAsTable() without an external path.

The _SUCCESS file is not created:
 * Spark 3.3.0 with .saveAsTable() with an external path.

As mentioned, I have set the following config, but it did not help:
spark.conf.set("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", true)
Are you able to replicate the issue with the snippet I have shared, or is the _SUCCESS file generated at your end when an external path is passed?

> Spark doesn't create SUCCESS file when external path is passed
> --------------------------------------------------------------
>
>                 Key: SPARK-44884
>                 URL: https://issues.apache.org/jira/browse/SPARK-44884
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Dipayan Dev
>            Priority: Critical
>         Attachments: image-2023-08-20-18-08-38-531.png, image-2023-08-20-18-46-53-342.png
>
>
> The issue does not happen in Spark 2.x (I am using 2.4.0); it happens only in 3.3.0.
> Code to reproduce the issue:
>  
> {code:java}
> scala> import org.apache.spark.sql.SaveMode
> scala> spark.conf.set("spark.sql.orc.char.enabled", true)
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path", "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.table_name")
> 23/08/20 12:31:43 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> {code}
> The above code succeeds and creates the external Hive table, but {*}no _SUCCESS file is generated{*}. The same code, when run on Spark 2.4.0, generates a _SUCCESS file.
> Attaching the contents of the bucket after table creation:
>  
> !image-2023-08-20-18-08-38-531.png|width=453,height=162!
>  
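> A quick way to confirm from the shell whether the marker was written (a sketch using the Hadoop FileSystem API; the bucket name is taken from the snippet above):
> {code:java}
> scala> import java.net.URI
> scala> import org.apache.hadoop.fs.{FileSystem, Path}
> scala> val fs = FileSystem.get(new URI("gs://test_dd123/"), spark.sparkContext.hadoopConfiguration)
> scala> fs.exists(new Path("gs://test_dd123/_SUCCESS")) // false here; true when run on Spark 2.4.0
> {code}
>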
> But when I don't pass the external path, as in the following, the _SUCCESS file is generated:
> {code:java}
> scala> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("us_wm_supply_chain_rcv_pre_prod.test_tb1")
> {code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
