Posted to common-issues@hadoop.apache.org by "vijayant soni (Jira)" <ji...@apache.org> on 2020/03/26 13:24:01 UTC

[jira] [Created] (HADOOP-16942) S3A creating folder level delete markers

vijayant soni created HADOOP-16942:
--------------------------------------

             Summary: S3A creating folder level delete markers
                 Key: HADOOP-16942
                 URL: https://issues.apache.org/jira/browse/HADOOP-16942
             Project: Hadoop Common
          Issue Type: Task
          Components: fs/s3
    Affects Versions: 3.2.1
            Reporter: vijayant soni


Writing data from Spark to S3 with the S3A URL scheme (s3a://) creates many folder-level delete markers.

Writing the same data with the S3 URL scheme (s3://) does not create any delete markers at all.

 

Spark - 2.4.4

Hadoop - 3.2.1

EMR version - 6.0.0
{code:scala}
spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
         
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = spark.sql("select 1 as a")
df: org.apache.spark.sql.DataFrame = [a: int]

scala> df.show(false)
+---+                                                                           
|a  |
+---+
|1  |
+---+

scala> // Writing to S3 using s3

scala> df.write.mode(org.apache.spark.sql.SaveMode.Overwrite).save("s3://stage-dwh/tmp/vijayant/s3/")
                                                                                
scala> // Writing to S3 using s3a

scala> df.write.mode(org.apache.spark.sql.SaveMode.Overwrite).save("s3a://stage-dwh/tmp/vijayant/s3a/")
                                                                                
scala> 

{code}
Listing delete markers left by the `s3` write:
{code:bash}
aws s3api list-object-versions --bucket stage-dwh --prefix tmp/vijayant/s3/

TO ADD OUTPUT
{code}
Listing delete markers left by the `s3a` write:
{code:bash}
aws s3api list-object-versions --bucket stage-dwh --prefix tmp/vijayant/s3a

TO ADD OUTPUT
{code}
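To compare the two listings, the JSON returned by `aws s3api list-object-versions` can be filtered for delete markers on directory-style keys. The following is a minimal sketch, assuming that "folder level" means a key ending in `/`. The helper name `folder_delete_markers` and the sample listing are hypothetical. The sample only mimics the shape of the CLI response (`Versions` and `DeleteMarkers` arrays) and is not the output from this report, which is still to be added.

```python
import json


def folder_delete_markers(listing):
    """Return the sorted keys of delete markers whose key ends with '/'.

    `listing` is a dict shaped like the JSON response of
    `aws s3api list-object-versions`. A missing or null
    "DeleteMarkers" entry is treated as "no delete markers".
    """
    markers = listing.get("DeleteMarkers") or []
    return sorted({m["Key"] for m in markers if m["Key"].endswith("/")})


# Hypothetical listing for illustration only; keys and version IDs are made up.
sample = json.loads("""
{
  "Versions": [
    {"Key": "tmp/example/s3a/part-00000.parquet", "VersionId": "v1", "IsLatest": true}
  ],
  "DeleteMarkers": [
    {"Key": "tmp/example/s3a/", "VersionId": "v2", "IsLatest": true},
    {"Key": "tmp/example/s3a/_temporary/", "VersionId": "v3", "IsLatest": true},
    {"Key": "tmp/example/s3a/old-file", "VersionId": "v4", "IsLatest": false}
  ]
}
""")

# Only the two keys ending in '/' are reported; the plain object key is skipped.
print(folder_delete_markers(sample))
```

To run this against a real bucket, save the CLI response with `--output json` to a file and load it with `json.load` in place of the sample.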



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
