Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/28 09:18:45 UTC

[GitHub] [hudi] Neuw84 opened a new issue, #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)

Neuw84 opened a new issue, #5456:
URL: https://github.com/apache/hudi/issues/5456

   
   **Describe the problem you faced**
   
   When trying to run HoodieDeltaStreamer on AWS Glue, I found that the Spark master cannot be inherited from the environment because it defaults to `local[2]`. In this kind of serverless environment, where you do not have access to the master, this configuration should be inherited.
   
   This can be seen at line 329 of [HoodieDeltaStreamer](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java).
   
   ```    public String sparkMaster = "local[2]"; ```
   
   To support this kind of scenario, this default should be changed: there should be a JavaSparkContext option where no Spark master is defined.
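
One way to picture the request is a resolution order in which an explicit `--spark-master` value still wins, and otherwise the master already configured by the runtime is used. The sketch below is only an illustration under that assumption: `SparkMasterResolver` and `resolve` are hypothetical names that do not exist in Hudi, and the inherited settings are modelled as a plain map keyed by Spark's `spark.master` property rather than a real Spark context.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Hudi: models how a master could be chosen
// once the config field no longer carries a hard-coded default.
final class SparkMasterResolver {

    private SparkMasterResolver() {
    }

    /**
     * @param cliMaster     value passed via --spark-master, or null when absent
     * @param inheritedConf settings already supplied by the runtime
     * @return the master to use, or empty when neither source defines one
     */
    static Optional<String> resolve(String cliMaster, Map<String, String> inheritedConf) {
        // An explicit flag always wins, which keeps the existing override behaviour.
        if (cliMaster != null && !cliMaster.trim().isEmpty()) {
            return Optional.of(cliMaster.trim());
        }
        // Otherwise defer to whatever the environment has already configured.
        return Optional.ofNullable(inheritedConf.get("spark.master"));
    }
}
```

An empty result means that neither the caller nor the environment named a master, so the application can decide whether to fail fast or leave the decision to Spark.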
   
   **Expected behavior**
   
   The Spark master shouldn't have a default as there are some environments (usually serverless such as AWS Glue) where it will be inherited. 
   
   **Environment Description**
   
   * Hudi version : 0.9.0
   
   * Spark version : Spark 3.1.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   **Additional context**
   
   If required I think I could work on this as I have quite good Java experience. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua closed issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)

Posted by GitBox <gi...@apache.org>.
yihua closed issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)
URL: https://github.com/apache/hudi/issues/5456




[GitHub] [hudi] yihua commented on issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #5456:
URL: https://github.com/apache/hudi/issues/5456#issuecomment-1112354777

   @Neuw84 The `HoodieDeltaStreamer` takes the `--spark-master` option to override the default Spark master to use, i.e., the same variable you mentioned, when you use `spark-submit` to run `HoodieDeltaStreamer`. I think your point is that, if the `HoodieDeltaStreamer` constructor is used, the Spark master config can be overwritten by what's configured in `JavaSparkContext jssc`, is that the case?
   
   Feel free to create a Jira ticket [here](https://issues.apache.org/jira/projects/HUDI/issues) for this feature inquiry and work on it.




[GitHub] [hudi] Neuw84 commented on issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)

Posted by GitBox <gi...@apache.org>.
Neuw84 commented on issue #5456:
URL: https://github.com/apache/hudi/issues/5456#issuecomment-1112433258

   Hi @yihua, 
   
   The point is that the class should have no default value, so that it can inherit whatever the master is in serverless Spark engines such as AWS Glue, where the master is not known. The idea is to remove that default value (`local[2]`), add an option to build a Java Spark Context without the default value, and update the documentation of the `cli` options.
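
As a rough sketch of what "building a context without the default value" could mean, the snippet below collects the settings that would be handed to Spark and only includes `spark.master` when the caller supplied one. It is an assumption-laden illustration, not the actual Hudi change: `SparkSettingsSketch` and `build` are hypothetical names, and a plain map stands in for Spark's `SparkConf` so the example runs without Spark on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Hudi code: gathers the settings that would be
// applied to a SparkConf before the context is created.
final class SparkSettingsSketch {

    private SparkSettingsSketch() {
    }

    static Map<String, String> build(String appName, String sparkMaster) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("spark.app.name", appName);
        // Only pin the master when the caller supplied one; leaving the key
        // out lets the platform's own value apply.
        if (sparkMaster != null && !sparkMaster.isEmpty()) {
            settings.put("spark.master", sparkMaster);
        }
        return settings;
    }
}
```

With a real `SparkConf`, the equivalent would presumably be to call `setAppName` unconditionally and `setMaster` only inside the same null check.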
   
   It is a small change, but it will make it possible to use the DeltaStreamer in managed environments such as AWS Glue. I will open the Jira ticket and work on it; a small change like this will also help me get started on further contributions if possible :)
   
   Thanks 




[GitHub] [hudi] yihua commented on issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #5456:
URL: https://github.com/apache/hudi/issues/5456#issuecomment-1112730279

   Got you.  Sg.  Let's track the progress there.  Closing this issue.

