Posted to issues@spark.apache.org by "bin wang (JIRA)" <ji...@apache.org> on 2015/10/20 01:52:28 UTC

[jira] [Comment Edited] (SPARK-6527) sc.binaryFiles can not access files on s3

    [ https://issues.apache.org/jira/browse/SPARK-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964267#comment-14964267 ] 

bin wang edited comment on SPARK-6527 at 10/19/15 11:52 PM:
------------------------------------------------------------

[~zhaozhang], this error happens to me too while I am using Databricks' notebook. I have tons of images in a bucket, say mybucket, and when I do binaryFiles('mybucket/*'), it errors out with the same message as yours. However, some of the images have special characters in their file names, and when I do binaryFiles('mybucket/00*.jpg') to restrict the input to a very small number of images (presumably avoiding those files), the command runs successfully.

So I suspect that binaryFiles is picky about file names containing certain characters.
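One way to narrow this down, in the same spirit as restricting the glob to 00*.jpg, is to take the list of key names and flag the ones with unusual characters before handing them to binaryFiles. The sketch below is plain Python and assumes you already have the key names as strings (listing the bucket is left out). The "safe" character set is an assumption, and the extra-decoding check illustrates one hypothetical failure mode, not a confirmed cause of this bug.

```python
import re
from urllib.parse import unquote

# Assumed "safe" set for this illustration: ASCII letters, digits, '.', '_',
# '-' and '/'. Anything else (spaces, '%', '+', non-ASCII, ...) gets flagged.
SAFE_KEY = re.compile(r"[A-Za-z0-9._/-]+")


def suspicious_keys(keys):
    """Return the keys that contain at least one character outside SAFE_KEY."""
    return [k for k in keys if not SAFE_KEY.fullmatch(k)]


def breaks_on_extra_decode(key):
    """True if one extra percent-decoding pass would change the key.

    Hypothetical failure mode: if some layer decodes an already-decoded
    name again, '100%25.jpg' turns into '100%.jpg', which does not exist.
    """
    return unquote(key) != key


if __name__ == "__main__":
    # Hypothetical key names; replace them with the real listing of your bucket.
    keys = ["0001.jpg", "0002.jpg", "cat photo.jpg", "100%25.jpg", "café.jpg"]
    print(suspicious_keys(keys))  # prints ['cat photo.jpg', '100%25.jpg', 'café.jpg']
    print([k for k in keys if breaks_on_extra_decode(k)])  # prints ['100%25.jpg']
```

Keys in the first list are candidates to leave out of the glob (or rename) when testing whether the special characters are really what trips up binaryFiles.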


was (Author: biwa7636):
[~zhaozhang], this error happens to me too while I am using Databricks' notebook. I have tons of images in a bucket, say `mybucket`, and when I do `binaryFiles('mybucket/*')`, it errors out with the same message as yours. However, some of the images have special characters in their file names, and when I do `binaryFiles('mybucket/00*.jpg')` to restrict the input to a very small number of images (presumably avoiding those files), the command runs successfully.

So I suspect that `binaryFiles` is picky about file names containing certain characters.

> sc.binaryFiles can not access files on s3
> -----------------------------------------
>
>                 Key: SPARK-6527
>                 URL: https://issues.apache.org/jira/browse/SPARK-6527
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2, Input/Output
>    Affects Versions: 1.2.0, 1.3.0
>         Environment: I am running Spark on EC2
>            Reporter: Zhao Zhang
>            Priority: Minor
>
> The sc.binaryFiles() method cannot access files stored on S3. It correctly lists the number of files but reports "file does not exist" when processing them. I also tried sc.textFile(), which works fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
