Posted to dev@gobblin.apache.org by "Issac Buenrostro (JIRA)" <ji...@apache.org> on 2019/05/02 23:28:00 UTC
[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly
[ https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832080#comment-16832080 ]
Issac Buenrostro commented on GOBBLIN-707:
------------------------------------------
Thanks for taking this up, [~jaysen]!
I do see the point of cleaning up Gobblin's multiple scripts; however, I would argue that the cleanup should be a bit different. As you pointed out, there are two types of scripts: commands and services.
* For commands, the scripts are almost always identical, so I believe access should always go through `GobblinCli` (i.e. each command implemented as a `CliApplication`). This means that instead of `gobblin statestore-checker` it should be `gobblin cli statestore-checker`, so that the bash portion of the script exists only once. This has the advantage that `gobblin cli --help` lists all commands, and commands are self-documenting through the `@Alias` annotation. Even better, if we use `ConstructorAndPublicMethodsCliObjectFactory`, it will automatically create a help string for each command and allow programmatic and CLI access with the same input.
* For services, I'm not sure how you're approaching things, but it would also be nice to have a single bash script that handles all of them (given that, as you pointed out, they all take the form `start|stop|status`).
Re: the PR, I'm a bit confused: a lot of scripts were removed, but I can't find where their replacements are. I may be missing something obvious, and I apologize if that is the case :)
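For the services bullet, one way to picture "a single bash script that handles all of them" is a shared lifecycle handler keyed by execution mode. The following is a minimal bash sketch under stated assumptions, not Gobblin code: the function names, the `GOBBLIN_PID_DIR` variable and the one-PID-file-per-mode layout are hypothetical, and the command passed to `start` stands in for whatever JVM launch a given mode would really use.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one start|stop|status handler shared by every service
# mode. GOBBLIN_PID_DIR and all function names are illustrative assumptions.

PID_DIR="${GOBBLIN_PID_DIR:-/tmp/gobblin-pids}"

pid_file() {
  # One PID file per execution mode, e.g. /tmp/gobblin-pids/standalone.pid
  echo "${PID_DIR}/$1.pid"
}

service_status() {
  local pf
  pf="$(pid_file "$1")"
  # Running means: PID file exists and the recorded process is alive
  if [[ -f "$pf" ]] && kill -0 "$(cat "$pf")" 2>/dev/null; then
    echo "$1 is running (pid $(cat "$pf"))"
    return 0
  fi
  echo "$1 is not running"
  return 1
}

service_start() {
  # Usage: service_start <mode> <command> [args...]
  local mode="$1"; shift
  if service_status "$mode" >/dev/null; then
    echo "$mode is already running" >&2
    return 1
  fi
  mkdir -p "$PID_DIR"
  # Placeholder for the real per-mode launch (e.g. a java command line)
  "$@" >/dev/null 2>&1 &
  echo $! > "$(pid_file "$mode")"
  echo "started $mode (pid $!)"
}

service_stop() {
  local mode="$1" pf
  pf="$(pid_file "$mode")"
  if ! service_status "$mode" >/dev/null; then
    rm -f "$pf"   # clean up a stale PID file, if any
    echo "$mode is not running"
    return 0
  fi
  kill "$(cat "$pf")"
  rm -f "$pf"
  echo "stopped $mode"
}

service_main() {
  # Usage: service_main <mode> <start|stop|status> [command...]
  local mode="$1" action="$2"; shift 2
  case "$action" in
    start)  service_start "$mode" "$@" ;;
    stop)   service_stop "$mode" ;;
    status) service_status "$mode" ;;
    *) echo "usage: <mode> <start|stop|status>" >&2; return 2 ;;
  esac
}
```

For example, `service_main standalone start <launch command>` followed by `service_main standalone status` would report the recorded PID. Because the PID handling lives in one place, every mode gets identical start/stop/status semantics, which is the inconsistency the issue description complains about.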
> combine & standardize all gobblin scripts into one master script & restructure configs accordingly
> --------------------------------------------------------------------------------------------------
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
> Issue Type: Improvement
> Reporter: Jay Sen
> Priority: Major
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, Standalone, cluster-master, cluster-worker, AWS, YARN, MR) and various command-line utilities to run CLI and admin commands. There is an individual script for each of them.
> Having an individual script for each introduces a lot of issues:
> # All scripts handle gobblin variables and user parameters differently, which is highly inconsistent across the various gobblin scripts.
> # Functionality around start, stop, status checking and PID handling, among many other things, varies vastly depending on each script's implementation.
> # Features like GC & JVM params, log4j file selection, classpath calculation, etc. exist in some gobblin scripts but not all, adding to an inconsistent user experience.
> # Maintaining 13 scripts in total would be too much effort.
> Also, all the gobblin scripts share a lot of common code to handle params, start/stop services, status checks, PID handling, etc. Combining all the scripts into one not only makes maintenance easier but also brings clarity and consistency.
>
> Solution:
> 1. There can be one gobblin.sh script to handle all gobblin commands and deployment options, with the following signatures. NOTE: This
> {{gobblin.sh <command> <params>}}
> {{gobblin.sh <execution-mode> <start|stop|status>}}
> {{command values: admin, cli, statestore-checker, statestore-clean, historystore-manager, classpath}}
> {{execution-mode values: standalone, cluster-master, cluster-worker, aws, yarn, mr, service}}
> With the above change, the following become valid commands:
> {code:java}
> # all under GobblinCli class
> gobblin run listQuickApps -> gobblin cli run listQuickApps
> gobblin run <quick-app-name> -> gobblin cli run <quick-app-name>
> # class: JobStateToJsonConverter
> statestore-checker.sh <args> -> gobblin statestore-checker <args>
> # class: StateStoreCleaner
> statestore-clean.sh <args> -> gobblin statestore-clean <args>
> # class: DatabaseJobHistoryStoreSchemaManager
> historystore-manager.sh <args> -> gobblin historystore-manager <args>
> # class: Cli
> gobblin-admin.sh <args> -> gobblin admin <args>
> # all gobblin deployment modes
> gobblin-cluster-master.sh -> gobblin cluster-master start|stop|status
> gobblin-cluster-worker.sh -> gobblin cluster-worker start|stop|status
> gobblin-compaction.sh -> gobblin cluster-master start|stop|status
> gobblin-env.sh -> gobblin cluster-master start|stop|status
> gobblin-mapreduce.sh -> gobblin mr start|stop|status
> gobblin-service.sh -> gobblin service start|stop|status
> gobblin-standalone.sh -> gobblin standalone start|stop|status
> gobblin-yarn.sh -> gobblin yarn start|stop|status
> {code}
>
> 2. Also, configs need to be restructured and deduplicated accordingly, to make it clear which config is picked up for which execution mode.
>
> {color:#FF0000}
> NOTE: this refactoring to gobblin.sh changes the way all gobblin commands were run before.{color}
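The proposed `gobblin.sh <command> <params>` / `gobblin.sh <execution-mode> <start|stop|status>` signatures, together with the per-mode config point, can be sketched as a bash dispatcher that classifies its first argument as either a command or an execution mode. This is a minimal sketch, not the PR's implementation: the function names, the `GOBBLIN_CONF_ROOT` variable and the `conf/<execution-mode>` directory layout are hypothetical; the class names are the short names from the mapping in the description; only commands with a class named there are wired up; and instead of launching `java`, it prints what it would do.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of gobblin.sh dispatch. Function names,
# GOBBLIN_CONF_ROOT and the conf/<mode> layout are illustrative assumptions.

GOBBLIN_CONF_ROOT="${GOBBLIN_CONF_ROOT:-./conf}"

command_class() {
  # Short class names taken from the command mapping in the issue
  case "$1" in
    cli)                  echo "GobblinCli" ;;
    statestore-checker)   echo "JobStateToJsonConverter" ;;
    statestore-clean)     echo "StateStoreCleaner" ;;
    historystore-manager) echo "DatabaseJobHistoryStoreSchemaManager" ;;
    admin)                echo "Cli" ;;
    *)                    return 1 ;;
  esac
}

is_execution_mode() {
  case "$1" in
    standalone|cluster-master|cluster-worker|aws|yarn|mr|service) return 0 ;;
    *) return 1 ;;
  esac
}

gobblin_dispatch() {
  if [[ $# -lt 1 ]]; then
    echo "usage: gobblin.sh <command> <params> | gobblin.sh <execution-mode> <start|stop|status>" >&2
    return 2
  fi
  local first="$1"; shift
  local cls
  if cls="$(command_class "$first")"; then
    # Commands: run the class once with the user's params (printed, not exec'd)
    echo "run ${cls} $*"
  elif is_execution_mode "$first"; then
    local action="${1:-}"
    case "$action" in
      start|stop|status)
        # Services: each mode reads its config from one predictable directory
        echo "${action} ${first} using ${GOBBLIN_CONF_ROOT}/${first}" ;;
      *)
        echo "usage: gobblin.sh ${first} <start|stop|status>" >&2
        return 2 ;;
    esac
  else
    echo "unknown command or execution mode: ${first}" >&2
    return 2
  fi
}
```

With the default config root, `gobblin_dispatch yarn start` prints `start yarn using ./conf/yarn`, and `gobblin_dispatch statestore-clean arg1` prints `run StateStoreCleaner arg1`. Keeping the command table and the mode list in one script makes it obvious which entry point, and which config directory, applies to each invocation.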