Posted to dev@gobblin.apache.org by "Jay Sen (JIRA)" <ji...@apache.org> on 2019/05/04 01:24:00 UTC

[jira] [Comment Edited] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

    [ https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832957#comment-16832957 ] 

Jay Sen edited comment on GOBBLIN-707 at 5/4/19 1:23 AM:
---------------------------------------------------------

Adding the comment from git here, for more clarity on what you are suggesting:
{quote}Can we leave {{gobblin.sh}} relatively simple and instead have {{gobblin-cli.sh}} and {{gobblin-service.sh}}? {{gobblin.sh}} would just redirect to the correct place depending on the first argument
{quote}
This could also be done, but it would duplicate the code for handling options (conf, jvmopts, etc.) and for building the classpath.
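To make the duplication concern concrete, here is a minimal sketch of option handling that both the cli and service paths could share. It is an illustration only: the function name {{parse_common_options}} and the variable names are hypothetical and not taken from gobblin.sh, and only a few of the options are covered.

```shell
#!/bin/sh
# Hypothetical sketch of shared option handling; the function and variable
# names are illustrative assumptions, not the actual gobblin.sh code.

CONF_DIR=""
JVM_OPTS="-Xmx1g -Xms512m"   # defaults mentioned in the help text
EXTRA_JARS=""
VERBOSE=0
REMAINING_ARGS=""            # simplification: arguments with spaces are not preserved

parse_common_options() {
  REMAINING_ARGS=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --conf-dir|--jvmopts|--jars)
        # Every one of these options requires a value.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case "$1" in
          --conf-dir) CONF_DIR="$2" ;;
          --jvmopts)  JVM_OPTS="$JVM_OPTS $2" ;;
          --jars)     EXTRA_JARS="$2" ;;
        esac
        shift 2
        ;;
      --verbose)
        VERBOSE=1
        shift
        ;;
      *)
        # Anything else (command name, params) is passed through untouched.
        REMAINING_ARGS="${REMAINING_ARGS:+$REMAINING_ARGS }$1"
        shift
        ;;
    esac
  done
}
```

With two separate scripts, a function like this would have to live in both of them or be moved into a third file that both source.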

Basically, nearly all options of the gobblin-cli script are duplicated from gobblin-service (which needs all of the options), as shown below:

{code:bash}
gobblin cli --help

gobblin cli <cli-commands> <params> <other_options>
 cli-commands : admin, jobs, statestore-check, statestore-clean, historystore-manager
 params : respective parameters for the commands
 other_options:
 --conf-dir <path-of-conf-dir>     Gobblin config path. default is '$GOBBLIN_HOME/conf/<exe-mode-name>'.
 --jvmopts <jvm or gc options>     String containing JVM flags to include, in addition to "-Xmx1g -Xms512m".
 --jars <csv list of extra jars>   Comma-separated list of extra jars to put on the CLASSPATH.
 --enable-gc-logs                  enables gc logs & dumps.
 --show-classpath                  prints gobblin runtime classpath.
 --help                            Display this help.
 --verbose                         Display full command used to start the process.
{code}

 

{code:bash}
gobblin service --help

gobblin service <execution-modes> <start|stop|status> <other_options>
 execution-modes : standalone, cluster-master, cluster-worker, aws, yarn, mapreduce, service-manager.
 other_options:
 --cluster-name                       Name of the cluster to be used by helix & other services. (default: gobblin_cluster).
 --conf-dir <path-of-conf-dir>        Gobblin config path. default is '$GOBBLIN_HOME/conf/<exe-mode-name>'.
 --log4j-conf <path-of-log4j-file>    default is '$GOBBLIN_HOME/conf/<exe-mode-name>/log4j.properties'.
 --jvmopts <jvm or gc options>        String containing JVM flags to include, in addition to "-Xmx1g -Xms512m".
 --jars <csv list of extra jars>      Comma-separated list of extra jars to put on the CLASSPATH.
 --enable-gc-logs                     enables gc logs & dumps.
 --show-classpath                     prints gobblin runtime classpath.
 --jt <resource manager URL>          Only for mapreduce mode: Job submission URL, if not set, taken from ${HADOOP_HOME}/conf.
 --fs <file system URL>               Only for mapreduce mode: Target file system, if not set, taken from ${HADOOP_HOME}/conf.
 --help                               Display this help.
 --verbose                            Display full command used to start the process.
{code}
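As a rough illustration of the defaults described in the help text, the sketch below resolves the config directory for an execution mode and appends a csv list of extra jars to a classpath. The helper names {{resolve_conf_dir}} and {{append_jars}} are hypothetical, chosen only for this example, and the real script may do this differently.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not taken from gobblin.sh.

# Use the explicit --conf-dir value if one was given, otherwise fall back
# to $GOBBLIN_HOME/conf/<exe-mode-name>.
resolve_conf_dir() {
  mode="$1"
  explicit="$2"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
  else
    printf '%s\n' "$GOBBLIN_HOME/conf/$mode"
  fi
}

# Append a comma-separated list of jars to a colon-separated classpath.
append_jars() {
  classpath="$1"
  jars_csv="$2"
  if [ -z "$jars_csv" ]; then
    printf '%s\n' "$classpath"
    return 0
  fi
  jars=$(printf '%s' "$jars_csv" | tr ',' ':')
  if [ -n "$classpath" ]; then
    printf '%s\n' "$classpath:$jars"
  else
    printf '%s\n' "$jars"
  fi
}
```

For example, with GOBBLIN_HOME set to /opt/gobblin, {{resolve_conf_dir standalone ""}} would print /opt/gobblin/conf/standalone.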
 

If we keep the code for handling options and other things common, then that is pretty much what I have already done in gobblin.sh.

Maybe I can just separate out the help messages for cli and services, so that the options for each are clearer and it aligns with what you are suggesting. Later on, I can also try to bring the Java classes under GobblinCli as a separate PR; otherwise this PR will keep growing... :)

Let me know if you think otherwise, and I will think about how to make that change. 

 

Thanks

Jay



> combine & standardize all gobblin scripts into one master script & restructure configs accordingly
> --------------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-707
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-707
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Jay Sen
>            Priority: Major
>          Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Gobblin supports multiple execution modes (CLI, standalone, cluster-master, cluster-worker, AWS, YARN, MR) and various command-line utilities to run CLI and admin commands. There is an individual script for each of them.
> Having individual scripts introduces a lot of issues:
>  # Each script handles gobblin variables and user parameters differently, and the handling is highly inconsistent across the gobblin scripts.
>  # Functionality around start, stop, status checking, and PID handling, among many other things, varies widely depending on how each script is implemented.
>  # Features like GC and JVM params, log4j file selection, classpath calculation, etc. exist in some gobblin scripts but not all, adding to an inconsistent user experience.
>  # Maintaining a total of 13 scripts would be too much effort.
> Also, all the gobblin scripts share a lot of common code to handle params, start and stop services, status checks, PID handling, etc. Combining all the scripts into one not only makes maintenance easier but also brings clarity and consistency.
>  
> Solution:
> 1. There can be one gobblin.sh script to handle all gobblin commands and deployment options, as per the following signature. NOTE: This
> {{gobblin.sh <command> <params>}}
> {{gobblin.sh <execution-mode> <start|stop|status>}}
> {{command values: admin, cli, statestore-check, statestore-clean, historystore-manager, classpath}}
> {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, service}}
> With the above change, the following become valid commands.
> {code:java}
> # all under GobblinCli class
> gobblin run listQuickApps -> gobblin cli run listQuickApps
> gobblin run <quick-app-name> -> gobblin cli run <quick-app-name>
> # class: JobStateToJsonConverter
> statestore-checker.sh <args> -> gobblin statestore-checker <args>
> # class: StateStoreCleaner
> statestore-clean.sh <args> -> gobblin statestore-clean <args>
> # class: DatabaseJobHistoryStoreSchemaManager
> historystore-manager.sh <args> -> gobblin historystore-manager <args>
> # class: Cli
> gobblin-admin.sh <args>   -> gobblin admin <args>
> # all gobblin deployment modes
> gobblin-cluster-master.sh   -> gobblin cluster-master start|stop|status
> gobblin-cluster-worker.sh   -> gobblin cluster-master start|stop|status
> gobblin-compaction.sh       -> gobblin cluster-master start|stop|status
> gobblin-env.sh              -> gobblin cluster-master start|stop|status
> gobblin-mapreduce.sh        -> gobblin cluster-master start|stop|status
> gobblin-service.sh          -> gobblin cluster-master start|stop|status
> gobblin-standalone.sh       -> gobblin cluster-master start|stop|status
> gobblin-yarn.sh             -> gobblin cluster-master start|stop|status
> {code}
>  
> 2. Configs also need to be structured and deduplicated accordingly, to make it clear which config will be picked up for which execution mode.
>  {color:#ff0000}
>  NOTE: this refactoring adds all cli and service commands to gobblin.sh and hence changes the syntax for all commands and services.{color}
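
For reference, the signature quoted in the issue description could be dispatched on the first argument roughly as follows. This is only a sketch under stated assumptions: the function names {{classify_first_arg}}, {{validate_action}} and {{dispatch}} are hypothetical, and the echo lines are placeholders for whatever the real script would launch.

```shell
#!/bin/sh
# Hypothetical dispatcher sketch; function names and echo placeholders are
# illustrative assumptions, not the actual gobblin.sh implementation.

# Decide whether the first argument names a command or an execution mode,
# using the value lists from the issue description.
classify_first_arg() {
  case "${1:-}" in
    admin|cli|statestore-check|statestore-clean|historystore-manager|classpath)
      echo command ;;
    standalone|cluster-master|cluster-worker|aws|yarn|mr|service)
      echo service ;;
    *)
      echo unknown ;;
  esac
}

# Services accept exactly one of start, stop or status.
validate_action() {
  case "${1:-}" in
    start|stop|status) return 0 ;;
    *) return 1 ;;
  esac
}

dispatch() {
  kind=$(classify_first_arg "${1:-}")
  case "$kind" in
    command)
      name="$1"
      shift
      echo "command $name: $*"            # placeholder for running the command
      ;;
    service)
      mode="$1"
      action="${2:-}"
      validate_action "$action" || {
        echo "usage: gobblin.sh $mode <start|stop|status>" >&2
        return 1
      }
      echo "service $mode: $action"       # placeholder for start/stop/status handling
      ;;
    *)
      echo "unknown command or execution mode: ${1:-}" >&2
      return 1
      ;;
  esac
}
```

For example, {{dispatch cluster-master start}} takes the service branch, while {{dispatch statestore-clean <args>}} takes the command branch.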


