Posted to commits@hudi.apache.org by "leesf (Jira)" <ji...@apache.org> on 2019/12/09 23:50:00 UTC

[jira] [Commented] (HUDI-395) hudi does not support scheme s3n when writing to S3

    [ https://issues.apache.org/jira/browse/HUDI-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992040#comment-16992040 ] 

leesf commented on HUDI-395:
----------------------------

Hi, thanks for reporting this. Right now s3n is not supported yet; s3 and s3a are supported. You can check the list of supported schemes here: https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/storage/StorageSchemes.java
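As a workaround, here is a minimal sketch of pointing the same job at s3a instead (assumptions: hadoop-aws 2.7.x is on the classpath, which ships org.apache.hadoop.fs.s3a.S3AFileSystem; the bucket name is reused from your report; StorageSchemes.isSchemeSupported is the check HoodieWrapperFileSystem uses before throwing):

import org.apache.hudi.common.storage.StorageSchemes

// StorageSchemes does not list s3n, which is why HoodieWrapperFileSystem rejects it
StorageSchemes.isSchemeSupported("s3n") // false
StorageSchemes.isSchemeSupported("s3a") // true

// configure the s3a connector and use an s3a:// base path
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", "xxxxxx")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxxxx")
val hudiTablePath = "s3a://niketest1/hudi_test/hudi12"

fs.s3a.access.key / fs.s3a.secret.key are the standard s3a credential properties in Hadoop 2.7; the exact keys for your setup may differ.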

> hudi does not support scheme s3n when writing to S3
> ---------------------------------------------------
>
>                 Key: HUDI-395
>                 URL: https://issues.apache.org/jira/browse/HUDI-395
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Spark datasource
>         Environment: spark-2.4.4-bin-hadoop2.7
>            Reporter: rui feng
>            Priority: Major
>
> When I used Hudi to create a table and write it to S3, I used the Maven snippet below, which is recommended by [https://hudi.apache.org/s3_hoodie.html]:
> <dependency>
>  <groupId>org.apache.hudi</groupId>
>  <artifactId>hudi-spark-bundle</artifactId>
>  <version>0.5.0-incubating</version>
> </dependency>
> <dependency>
>  <groupId>org.apache.hadoop</groupId>
>  <artifactId>hadoop-aws</artifactId>
>  <version>2.7.3</version>
> </dependency>
> <dependency>
>  <groupId>com.amazonaws</groupId>
>  <artifactId>aws-java-sdk</artifactId>
>  <version>1.10.34</version>
> </dependency>
> and added the configuration below:
> sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
> sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
> sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
> sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xxxxxx")
> sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "xxxxx")
> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxxxxx")
> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxxxx")
>  
> My Spark version is spark-2.4.4-bin-hadoop2.7, and when I run the code below:
> val hudiOptions = Map[String, String](
>  HoodieWriteConfig.TABLE_NAME -> "hudi12",
>  DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
>  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
>  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
> val hudiTablePath = "s3://niketest1/hudi_test/hudi12"
> df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath)
> the following exception occurs:
> java.lang.IllegalArgumentException: BlockAlignedAvroParquetWriter does not support scheme s3n
>  at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)
>  at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)
>  at org.apache.hudi.io.storage.HoodieParquetWriter.<init>(HoodieParquetWriter.java:57)
>  at org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)
>  at org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)
>  at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:70)
>  at org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)
>  at org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)
>  at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>  at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 
> Can anyone tell me what causes this exception? I tried replacing org.apache.hadoop.fs.s3native.NativeS3FileSystem with org.apache.hadoop.fs.s3.S3FileSystem for the conf "fs.s3.impl", but another exception occurred, and it seems org.apache.hadoop.fs.s3.S3FileSystem fits Hadoop 2.6.
>  
> Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)