Posted to common-issues@hadoop.apache.org by "vijayant soni (Jira)" <ji...@apache.org> on 2020/03/26 13:24:01 UTC
[jira] [Created] (HADOOP-16942) S3A creating folder level delete markers
vijayant soni created HADOOP-16942:
--------------------------------------
Summary: S3A creating folder level delete markers
Key: HADOOP-16942
URL: https://issues.apache.org/jira/browse/HADOOP-16942
Project: Hadoop Common
Issue Type: Task
Components: fs/s3
Affects Versions: 3.2.1
Reporter: vijayant soni
Writing data out from Spark to S3 with the s3a:// URL scheme creates many folder-level delete markers. Writing the same data with the s3:// URL scheme does not create any delete markers at all.
Spark - 2.4.4
Hadoop - 3.2.1
EMR version - 6.0.0
{code:scala}
spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val df = spark.sql("select 1 as a")
df: org.apache.spark.sql.DataFrame = [a: int]
scala> df.show(false)
+---+
|a |
+---+
|1 |
+---+
scala> // Writing to S3 using s3
scala> df.write.mode(org.apache.spark.sql.SaveMode.Overwrite).save("s3://stage-dwh/tmp/vijayant/s3/")
scala> // Writing to S3 using s3a
scala> df.write.mode(org.apache.spark.sql.SaveMode.Overwrite).save("s3a://stage-dwh/tmp/vijayant/s3a/")
scala>
{code}
Listing delete markers after the `s3` write:
{code:bash}
aws s3api list-object-versions --bucket stage-dwh --prefix tmp/vijayant/s3
TO ADD OUTPUT
{code}
Listing delete markers after the `s3a` write:
{code:bash}
aws s3api list-object-versions --bucket stage-dwh --prefix tmp/vijayant/s3a
TO ADD OUTPUT
{code}
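Once the `list-object-versions` outputs above are captured, the delete markers can be tallied programmatically. The following is a minimal sketch that parses the standard JSON shape the AWS CLI emits (a top-level `DeleteMarkers` list, present only when markers exist); the sample payload is illustrative and is not real output from the bucket above.

```python
import json

def count_delete_markers(listing_json: str) -> int:
    """Count delete markers in `aws s3api list-object-versions` JSON output."""
    listing = json.loads(listing_json)
    # The CLI omits "DeleteMarkers" entirely when there are none.
    return len(listing.get("DeleteMarkers", []))

# Illustrative sample: one delete marker left on a directory-style key,
# the pattern the s3a write appears to produce on a versioned bucket.
sample = json.dumps({
    "Versions": [
        {"Key": "tmp/vijayant/s3a/part-00000.parquet", "IsLatest": True}
    ],
    "DeleteMarkers": [
        {"Key": "tmp/vijayant/s3a/", "IsLatest": False}
    ],
})

print(count_delete_markers(sample))  # 1
```

Piping the real CLI output into a script like this for both prefixes would make the s3-versus-s3a difference directly comparable.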
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org