Posted to commits@hudi.apache.org by "Zhaojing Yu (Jira)" <ji...@apache.org> on 2022/10/01 12:05:00 UTC
[jira] [Resolved] (HUDI-4913) HoodieSnapshotExporter throws IllegalArgumentException: Wrong FS
[ https://issues.apache.org/jira/browse/HUDI-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhaojing Yu resolved HUDI-4913.
-------------------------------
> HoodieSnapshotExporter throws IllegalArgumentException: Wrong FS
> ----------------------------------------------------------------
>
> Key: HUDI-4913
> URL: https://issues.apache.org/jira/browse/HUDI-4913
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.1
>
>
> When using the HoodieSnapshotExporter to export a Hudi dataset on S3 to a different bucket, i.e., when the source-base-path and the target-output-path are in different buckets, an IllegalArgumentException is thrown:
>
> {code:java}
> ./bin/spark-submit \
> --master yarn \
> --deploy-mode client \
> --driver-memory 10g \
> --executor-memory 10g \
> --num-executors 1 \
> --executor-cores 4 \
> --jars /home/hadoop/hudi-spark3.2-bundle_2.12-0.13.0-SNAPSHOT.jar \
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
> --conf spark.kryoserializer.buffer=256m \
> --conf spark.kryoserializer.buffer.max=1024m \
> --conf spark.rdd.compress=true \
> --conf spark.memory.storageFraction=0.8 \
> --conf "spark.driver.defaultJavaOptions=-XX:+UseG1GC" \
> --conf "spark.executor.defaultJavaOptions=-XX:+UseG1GC" \
> --conf spark.ui.proxyBase="" \
> --conf 'spark.eventLog.enabled=true' --conf 'spark.eventLog.dir=hdfs:///var/log/spark/apps' \
> --conf spark.hadoop.yarn.timeline-service.enabled=false \
> --conf spark.driver.userClassPathFirst=true \
> --conf spark.executor.userClassPathFirst=true \
> --conf "spark.sql.hive.convertMetastoreParquet=false" \
> --conf spark.sql.catalogImplementation=in-memory \
> --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
> --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
> --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
> /home/hadoop/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar \
> --source-base-path "s3a://ethan-lakehouse-us-east-2/hudi/hudi_trips_cow/" \
> --target-output-path "s3a://ethan-tmp/backup/" \
> --output-format "hudi"{code}
>
> {code:java}
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS s3a://ethan-tmp//backup -expected s3a://ethan-lakehouse-us-east-2
> at org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1155)
> at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:666)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1117)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1143)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3078)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4263)
> at org.apache.hudi.utilities.HoodieSnapshotExporter.outputPathExists(HoodieSnapshotExporter.java:145)
> at org.apache.hudi.utilities.HoodieSnapshotExporter.export(HoodieSnapshotExporter.java:120)
> at org.apache.hudi.utilities.HoodieSnapshotExporter.main(HoodieSnapshotExporter.java:275)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){code}
>
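The stack trace shows the failure originating in HoodieSnapshotExporter.outputPathExists, where the target path is checked against a FileSystem bound to the source bucket; Hadoop's S3A path check rejects any path whose bucket (URI authority) differs from the one the FileSystem was created for. The following self-contained sketch (not the actual Hudi or Hadoop code; class and method names here are illustrative) reproduces that authority check and shows why resolving the FileSystem from the target path's own URI avoids the error:

```java
import java.net.URI;

public class WrongFsDemo {
    // Simplified stand-in for Hadoop's checkPath: a FileSystem bound to one
    // URI rejects paths whose authority (S3 bucket) differs -- "Wrong FS".
    static void checkPath(URI fsUri, URI path) {
        if (path.getScheme() != null && !fsUri.getAuthority().equals(path.getAuthority())) {
            throw new IllegalArgumentException("Wrong FS " + path + " -expected " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI sourceFs = URI.create("s3a://ethan-lakehouse-us-east-2");
        URI target = URI.create("s3a://ethan-tmp/backup");

        // Reusing the source-path FileSystem for the target path reproduces the bug:
        try {
            checkPath(sourceFs, target);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // The likely fix is to resolve a FileSystem from the target path itself
        // (in Hadoop terms, targetPath.getFileSystem(conf)), so the bound URI
        // carries the target bucket's authority and the check passes:
        URI targetFs = URI.create("s3a://ethan-tmp");
        checkPath(targetFs, target);
        System.out.println("target FS check passed");
    }
}
```

In the real code path, this corresponds to obtaining the FileSystem via the target Path rather than reusing the FileSystem created for the source base path.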
--
This message was sent by Atlassian Jira
(v8.20.10#820010)