Posted to issues@spark.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2016/11/07 10:45:58 UTC

[jira] [Commented] (SPARK-7344) Spark hangs reading and writing to the same S3 bucket

    [ https://issues.apache.org/jira/browse/SPARK-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643819#comment-15643819 ] 

Steve Loughran commented on SPARK-7344:
---------------------------------------

I've been doing lots of work with S3A and have not seen this. S3A is a newer codebase than s3n, it is the one being actively maintained, and it is the one you should be using, along with Spark built against Hadoop 2.7.1+.
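The switch is mostly a matter of URL scheme plus credentials. A minimal sketch, assuming a spark-shell built against Hadoop 2.7.1+ with the hadoop-aws module (and its AWS SDK dependency) on the classpath; the bucket name and the use of environment variables for credentials are placeholders:

    // Configure S3A credentials on the SparkContext's Hadoop configuration.
    // fs.s3a.access.key / fs.s3a.secret.key are the standard S3A properties.
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Read and write the same bucket through the s3a:// scheme instead of s3n://.
    val txt = sc.textFile("s3a://mybucket/input")
    txt.count
    txt.saveAsTextFile("s3a://mybucket/copy1")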



> Spark hangs reading and writing to the same S3 bucket
> -----------------------------------------------------
>
>                 Key: SPARK-7344
>                 URL: https://issues.apache.org/jira/browse/SPARK-7344
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1
>         Environment: AWS EC2
>            Reporter: Daniel Mahler
>
> The following code will hang if `outprefix` is in an S3 bucket:
> val copy1 = "s3n://mybucket/copy1"
> val copy2 = "s3n://mybucket/copy2"
> val txt1 = sc.textFile(inpath)
> txt1.count
> txt1.saveAsTextFile(copy1)
> val txt2 = sc.textFile(copy1 + "/part-*")
> txt2.count
> txt2.saveAsTextFile(copy2) // <- HANGS HERE
> val txt3 = sc.textFile(copy2 + "/part-*")
> txt3.count
> The problem goes away if copy1 and copy2 are in distinct S3 buckets, or when using HDFS instead of S3.
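
The workaround described in the last line of the report amounts to pointing the second write at a different bucket (or at HDFS). A minimal sketch, with placeholder bucket names:

    // Writing the second copy to a distinct bucket avoids the reported hang.
    val copy1 = "s3n://mybucket-a/copy1"
    val copy2 = "s3n://mybucket-b/copy2"  // different bucket than copy1
    // ...then the same read/count/save sequence as in the report above.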



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org