Posted to commits@hudi.apache.org by "Zhaojing Yu (Jira)" <ji...@apache.org> on 2022/10/01 12:05:00 UTC
[jira] [Resolved] (HUDI-4913) HoodieSnapshotExporter throws IllegalArgumentException: Wrong FS
[ https://issues.apache.org/jira/browse/HUDI-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhaojing Yu resolved HUDI-4913.
-------------------------------
> HoodieSnapshotExporter throws IllegalArgumentException: Wrong FS
> ----------------------------------------------------------------
>
> Key: HUDI-4913
> URL: https://issues.apache.org/jira/browse/HUDI-4913
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.12.1
>
>
> When using the HoodieSnapshotExporter to export a Hudi dataset on S3 to a different bucket, i.e., when the source-base-path and the target-output-path are in different buckets, an IllegalArgumentException is thrown:
>
> {code:java}
> ./bin/spark-submit \
> --master yarn \
> --deploy-mode client \
> --driver-memory 10g \
> --executor-memory 10g \
> --num-executors 1 \
> --executor-cores 4 \
> --jars /home/hadoop/hudi-spark3.2-bundle_2.12-0.13.0-SNAPSHOT.jar \
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
> --conf spark.kryoserializer.buffer=256m \
> --conf spark.kryoserializer.buffer.max=1024m \
> --conf spark.rdd.compress=true \
> --conf spark.memory.storageFraction=0.8 \
> --conf "spark.driver.defaultJavaOptions=-XX:+UseG1GC" \
> --conf "spark.executor.defaultJavaOptions=-XX:+UseG1GC" \
> --conf spark.ui.proxyBase="" \
> --conf 'spark.eventLog.enabled=true' --conf 'spark.eventLog.dir=hdfs:///var/log/spark/apps' \
> --conf spark.hadoop.yarn.timeline-service.enabled=false \
> --conf spark.driver.userClassPathFirst=true \
> --conf spark.executor.userClassPathFirst=true \
> --conf "spark.sql.hive.convertMetastoreParquet=false" \
> --conf spark.sql.catalogImplementation=in-memory \
> --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
> --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
> --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
> /home/hadoop/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar \
> --source-base-path "s3a://ethan-lakehouse-us-east-2/hudi/hudi_trips_cow/" \
> --target-output-path "s3a://ethan-tmp/backup/" \
> --output-format "hudi"{code}
>
> {code:java}
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS s3a://ethan-tmp//backup -expected s3a://ethan-lakehouse-us-east-2
> at org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1155)
> at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:666)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1117)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1143)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3078)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4263)
> at org.apache.hudi.utilities.HoodieSnapshotExporter.outputPathExists(HoodieSnapshotExporter.java:145)
> at org.apache.hudi.utilities.HoodieSnapshotExporter.export(HoodieSnapshotExporter.java:120)
> at org.apache.hudi.utilities.HoodieSnapshotExporter.main(HoodieSnapshotExporter.java:275)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){code}
>
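The stack trace shows the failure originating in HoodieSnapshotExporter.outputPathExists, where the target path is checked against a FileSystem bound to the source bucket; Hadoop's S3A path check rejects any path whose bucket (URI authority) differs from the one the FileSystem was created for. The following self-contained sketch (not the actual Hudi or Hadoop code; class and method names here are illustrative) reproduces that authority check and shows why resolving the FileSystem from the target path's own URI avoids the error:

```java
import java.net.URI;

public class WrongFsDemo {
    // Simplified stand-in for Hadoop's checkPath: a FileSystem bound to one
    // URI rejects paths whose authority (S3 bucket) differs -- "Wrong FS".
    static void checkPath(URI fsUri, URI path) {
        if (path.getScheme() != null && !fsUri.getAuthority().equals(path.getAuthority())) {
            throw new IllegalArgumentException("Wrong FS " + path + " -expected " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI sourceFs = URI.create("s3a://ethan-lakehouse-us-east-2");
        URI target = URI.create("s3a://ethan-tmp/backup");

        // Reusing the source-path FileSystem for the target path reproduces the bug:
        try {
            checkPath(sourceFs, target);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // The likely fix is to resolve a FileSystem from the target path itself
        // (in Hadoop terms, targetPath.getFileSystem(conf)), so the bound URI
        // carries the target bucket's authority and the check passes:
        URI targetFs = URI.create("s3a://ethan-tmp");
        checkPath(targetFs, target);
        System.out.println("target FS check passed");
    }
}
```

In the real code path, this corresponds to obtaining the FileSystem via the target Path rather than reusing the FileSystem created for the source base path.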
--
This message was sent by Atlassian Jira
(v8.20.10#820010)