Posted to common-dev@hadoop.apache.org by "Naresh (Jira)" <ji...@apache.org> on 2021/10/28 21:06:00 UTC

[jira] [Created] (HADOOP-17984) Hadoop-aws jar is unable to read file from S3 if used with third party like MINIO

Naresh created HADOOP-17984:
-------------------------------

             Summary: Hadoop-aws jar is unable to read file from S3 if used with third party like MINIO
                 Key: HADOOP-17984
                 URL: https://issues.apache.org/jira/browse/HADOOP-17984
             Project: Hadoop Common
          Issue Type: Bug
          Components: hadoop-thirdparty
    Affects Versions: 3.2.0
            Reporter: Naresh


Spark is unable to read a file from S3 when the endpoint URL points to MinIO running inside an EKS Kubernetes cluster. We are able to read and write from other clients and from the MinIO console, but when we read using Spark we get an empty DataFrame. Calling dataframe.show() displays the following:

 

++
||
++
++

 

*Spark Config:*

.config("spark.hadoop.fs.s3a.endpoint", "http://127.0.0.1:9000") // minio url or port-forward to local
.config("spark.hadoop.fs.s3a.access.key", <myaccesskey>)
.config("spark.hadoop.fs.s3a.secret.key", <mysecretkey>)
.config("spark.hadoop.fs.s3a.path.style.access", true)
.config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
.config("fs.s3a.committer.staging.conflict-mode", "replace")
.config("fs.s3a.committer.name", "file")
.config("fs.s3a.committer.threads", "20")
.config("fs.s3a.threads.max", "20")
.config("fs.s3a.fast.upload.buffer", "bytebuffer")
.config("fs.s3a.fast.upload.active.blocks", "8")
.config("fs.s3a.block.size", "128M")
.config("mapred.input.dir.recursive", "true")
.config("spark.sql.parquet.binaryAsString", "true")
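
For anyone trying to reproduce this, the connection-related settings above can be collected in one place and printed before they are handed to the Spark session builder. The following is only a sketch, not the reporter's code: the class name MinioS3aSettings, the method connectionSettings, and the placeholder credentials are hypothetical, and it uses plain Java collections so it runs without Spark on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MinioS3aSettings {

    // Collects the S3A connection properties from the report, keyed by the
    // same "spark.hadoop.fs.s3a.*" names used in the Spark config above.
    // The method name and parameters are illustrative assumptions.
    static Map<String, String> connectionSettings(String endpoint,
                                                  String accessKey,
                                                  String secretKey) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("spark.hadoop.fs.s3a.endpoint", endpoint);
        settings.put("spark.hadoop.fs.s3a.access.key", accessKey);
        settings.put("spark.hadoop.fs.s3a.secret.key", secretKey);
        settings.put("spark.hadoop.fs.s3a.path.style.access", "true");
        settings.put("spark.hadoop.fs.s3a.impl",
                "org.apache.hadoop.fs.s3a.S3AFileSystem");
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder credentials; substitute real values when reproducing.
        Map<String, String> settings = connectionSettings(
                "http://127.0.0.1:9000",
                "ACCESS_KEY_PLACEHOLDER",
                "SECRET_KEY_PLACEHOLDER");

        // Print each setting, masking the secret so it does not end up in logs.
        settings.forEach((key, value) -> {
            String shown = key.endsWith("secret.key") ? "****" : value;
            System.out.println(key + "=" + shown);
        });
    }
}
```

Each entry of the returned map could then be passed to the builder with .config(key, value), which makes it easier to confirm that the endpoint and path-style settings in effect are the ones listed in this report.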

 

 

*JAR files:*

hadoop-aws:3.2.0
aws-java-sdk:1.12.30
spark-core_2.12:3.1.2
spark-sql_2.12:3.1.2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org