Posted to commits@hudi.apache.org by "yihua (via GitHub)" <gi...@apache.org> on 2023/02/05 22:16:04 UTC

[GitHub] [hudi] yihua commented on a diff in pull request #7819: [HUDI-5652] Add hudi-cli-bundle docs

yihua commented on code in PR #7819:
URL: https://github.com/apache/hudi/pull/7819#discussion_r1096817493


##########
website/docs/cli.md:
##########
@@ -5,10 +5,22 @@ last_modified_at: 2021-08-18T15:59:57-04:00
 ---
 
 ### Local set up
-Once hudi has been built, the shell can be fired by via  `cd hudi-cli && ./hudi-cli.sh`. A hudi table resides on DFS, in a location referred to as the `basePath` and
+Once Hudi has been built, the shell can be fired up via `cd hudi-cli && ./hudi-cli.sh`.
+
+Starting with release `0.13.0`, there is another way of launching the `hudi cli`: using the `hudi-cli-bundle`.

Review Comment:
   Can we make `hudi-cli-bundle` the default way to use the Hudi CLI for Spark 3? Make the instructions for `hudi-cli-bundle` a separate section. Also note that for Spark 2 the user needs to follow the other instructions and build the `hudi-cli` module.
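   For the Spark 2 path suggested here, the separate instructions might look like the following (a sketch only; the Maven flags are assumptions and should be checked against the Hudi build docs, while `cd hudi-cli && ./hudi-cli.sh` comes from the diff above):
   ```
   # Spark 2: build the hudi-cli module from source, then launch the shell
   mvn clean package -DskipTests -pl hudi-cli -am
   cd hudi-cli && ./hudi-cli.sh
   ```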



##########
website/docs/cli.md:
##########
@@ -5,10 +5,22 @@ last_modified_at: 2021-08-18T15:59:57-04:00
 ---
 
 ### Local set up
-Once hudi has been built, the shell can be fired by via  `cd hudi-cli && ./hudi-cli.sh`. A hudi table resides on DFS, in a location referred to as the `basePath` and
+Once Hudi has been built, the shell can be fired up via `cd hudi-cli && ./hudi-cli.sh`.
+
+Starting with release `0.13.0`, there is another way of launching the `hudi cli`: using the `hudi-cli-bundle`.
+This approach has a couple of requirements, such as having `spark` installed locally on your machine.
+Use a Spark distribution with Hadoop dependencies packaged, such as `spark-3.3.1-bin-hadoop2.tgz` from https://archive.apache.org/dist/spark/.
+We also recommend setting the env variable `$SPARK_HOME` to the path where Spark is installed on your machine.
+Note that the `hudi-spark-bundle` must also be present when using the `hudi-cli-bundle`.
+Once both bundles are compiled or otherwise present, you can run `sh packaging/hudi-cli-bundle/hudi-cli-bundle.sh`.

Review Comment:
   Please add commands that are foolproof, i.e., running these commands just works for the user without them having to read all these notes.
   
   As far as I know the following are required:
   1. Create a new, empty directory
   2. Copy the `hudi-cli-bundle` jars and `hudi-spark*-bundle` jars to this directory
   3. Copy the following file and folder to this directory
   ```
   packaging/hudi-cli-bundle/hudi-cli-with-bundle.sh 
   packaging/hudi-cli-bundle/conf    (the `conf` folder itself should be placed in this directory)
   ```
   4. Start Hudi CLI shell
   ```
   export SPARK_HOME=<spark-home-folder>
   export CLI_BUNDLE_JAR=<cli-bundle-jar-to-use>
   export SPARK_BUNDLE_JAR=<spark-bundle-jar-to-use>
   
   ./hudi-cli-with-bundle.sh
   ```
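   The four steps above can be sketched end to end as a shell session. This is a self-contained sketch, not the real setup: the jar file names (with their Scala/Spark version suffixes) and the Spark install path are assumptions, `touch` stands in for copying the real build artifacts, and the final launch command is echoed rather than executed so the sketch runs without Spark.
   ```shell
   # Sketch of the four steps above; jar names and paths are placeholders.
   set -e
   STAGE_DIR=$(mktemp -d)            # 1. create a new, empty directory

   # 2. copy the hudi-cli-bundle and hudi-spark*-bundle jars into it
   #    (touch stands in for cp from the real packaging/ output directories)
   touch "$STAGE_DIR/hudi-cli-bundle_2.12-0.13.0.jar"
   touch "$STAGE_DIR/hudi-spark3.3-bundle_2.12-0.13.0.jar"

   # 3. copy the launcher script and the conf folder into the same directory
   touch "$STAGE_DIR/hudi-cli-with-bundle.sh"   # stands in for packaging/hudi-cli-bundle/hudi-cli-with-bundle.sh
   mkdir "$STAGE_DIR/conf"                      # stands in for packaging/hudi-cli-bundle/conf

   # 4. point the launcher at Spark and the two bundle jars, then start the shell
   export SPARK_HOME=/opt/spark-3.3.1-bin-hadoop2   # assumption: local Spark install path
   export CLI_BUNDLE_JAR="$STAGE_DIR/hudi-cli-bundle_2.12-0.13.0.jar"
   export SPARK_BUNDLE_JAR="$STAGE_DIR/hudi-spark3.3-bundle_2.12-0.13.0.jar"
   cd "$STAGE_DIR"
   echo "would run: ./hudi-cli-with-bundle.sh"  # echoed so the sketch runs without Spark
   ```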



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org