Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/24 20:10:35 UTC

[GitHub] [hudi] yihua opened a new pull request, #6785: [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS

yihua opened a new pull request, #6785:
URL: https://github.com/apache/hudi/pull/6785

   ### Change Logs
   
   When using the HoodieSnapshotExporter to export a Hudi dataset on S3 to a different bucket, i.e., when the source-base-path and the target-output-path are in different buckets, an IllegalArgumentException is thrown:
   
   ```
   ./bin/spark-submit \
     ...
     --jars /home/hadoop/hudi-spark3.2-bundle_2.12-0.13.0-SNAPSHOT.jar \
     --class "org.apache.hudi.utilities.HoodieSnapshotExporter" \
         /home/hadoop/hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar \
     --source-base-path "s3a://ethan-lakehouse-us-east-2/hudi/hudi_trips_cow/" \
     --target-output-path "s3a://ethan-tmp/backup/" \
     --output-format "hudi"
   ```
   ```
   Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS s3a://ethan-tmp//backup -expected s3a://ethan-lakehouse-us-east-2
       at org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
       at org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1155)
       at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:666)
       at org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1117)
       at org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1143)
       at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3078)
       at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
       at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
       at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4263)
       at org.apache.hudi.utilities.HoodieSnapshotExporter.outputPathExists(HoodieSnapshotExporter.java:145)
       at org.apache.hudi.utilities.HoodieSnapshotExporter.export(HoodieSnapshotExporter.java:120)
       at org.apache.hudi.utilities.HoodieSnapshotExporter.main(HoodieSnapshotExporter.java:275)
   ```
   
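   The root cause is that, when copying data from the source path to the target path, the file system instance created from the source path is also used to access the destination. That instance is bound to the source bucket, so it rejects paths in any other bucket or file system with the exception above.
   
   This PR fixes the problem by using the file system that corresponds to the target path for the destination.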
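
To illustrate the failure mode, here is a minimal, stdlib-only sketch. It is a simplified model and not Hadoop's or Hudi's actual code: `WrongFsSketch`, `BoundFs`, `accepts`, `checkPath` and `fsFor` are hypothetical names introduced only for this example. It assumes absolute URIs and models a file system client as bound to one scheme and authority (for S3, the bucket). In real Hadoop code the usual way to get the right instance is to resolve it from the path itself, e.g. with `Path.getFileSystem(Configuration)`.

```java
import java.net.URI;

// Simplified model (NOT Hadoop's implementation) of a file system client
// that is bound to a single scheme + authority, e.g. s3a://some-bucket.
final class WrongFsSketch {

    // Hypothetical stand-in for a file system instance bound to a base URI.
    static final class BoundFs {
        final URI base;

        BoundFs(URI base) {
            this.base = base;
        }

        // A path is accepted only if its scheme and authority match this instance.
        boolean accepts(URI path) {
            return sameIgnoreCase(base.getScheme(), path.getScheme())
                && sameIgnoreCase(base.getAuthority(), path.getAuthority());
        }

        // Mirrors the shape of the "Wrong FS" failure in the stack trace.
        void checkPath(URI path) {
            if (!accepts(path)) {
                throw new IllegalArgumentException(
                    "Wrong FS " + path + " -expected " + base);
            }
        }
    }

    // Null-safe, case-insensitive comparison of URI components.
    static boolean sameIgnoreCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    // Derive the binding from the path itself instead of reusing
    // the instance that was created for another path.
    static BoundFs fsFor(URI path) {
        return new BoundFs(URI.create(path.getScheme() + "://" + path.getAuthority()));
    }

    public static void main(String[] args) {
        URI source = URI.create("s3a://ethan-lakehouse-us-east-2/hudi/hudi_trips_cow/");
        URI target = URI.create("s3a://ethan-tmp/backup/");

        // Buggy pattern: the source-bound instance is reused for the target path.
        BoundFs sourceFs = fsFor(source);
        System.out.println(sourceFs.accepts(target)); // prints "false"

        // Fixed pattern: resolve a separate instance from the target path.
        BoundFs targetFs = fsFor(target);
        targetFs.checkPath(target); // passes without throwing
    }
}
```

Resolving one instance per path keeps the export working when the source and target live in different buckets or on different file systems, at the cost of holding two client instances.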
   
   ### Impact
   
   **Risk level: none**
   
   The PR is tested on EMR 6.7.0 with OSS Spark 3.2.2.  Exporting the dataset to a different S3 bucket in "hudi" or "parquet" format is successful.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6785: [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6785:
URL: https://github.com/apache/hudi/pull/6785#issuecomment-1259898607

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 843dfb1a1005c0f9300aa4522c52f39a971b5dca UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6785: [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6785:
URL: https://github.com/apache/hudi/pull/6785#issuecomment-1257064434

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11698",
       "triggerID" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 843dfb1a1005c0f9300aa4522c52f39a971b5dca Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11698) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6785: [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6785:
URL: https://github.com/apache/hudi/pull/6785#issuecomment-1257082977

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11698",
       "triggerID" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 843dfb1a1005c0f9300aa4522c52f39a971b5dca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11698) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua merged pull request #6785: [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS

Posted by GitBox <gi...@apache.org>.
yihua merged PR #6785:
URL: https://github.com/apache/hudi/pull/6785




[GitHub] [hudi] hudi-bot commented on pull request #6785: [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6785:
URL: https://github.com/apache/hudi/pull/6785#issuecomment-1257063766

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 843dfb1a1005c0f9300aa4522c52f39a971b5dca UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6785: [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6785:
URL: https://github.com/apache/hudi/pull/6785#issuecomment-1259904314

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11698",
       "triggerID" : "843dfb1a1005c0f9300aa4522c52f39a971b5dca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 843dfb1a1005c0f9300aa4522c52f39a971b5dca Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11698) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on pull request #6785: [HUDI-4913] Fix HoodieSnapshotExporter for writing to a different S3 bucket or FS

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #6785:
URL: https://github.com/apache/hudi/pull/6785#issuecomment-1259947142

   CI is green.
   <img width="1489" alt="Screen Shot 2022-09-27 at 12 20 19" src="https://user-images.githubusercontent.com/2497195/192616713-53ba31a0-5cb8-4536-aacf-949dec93350d.png">
   

