Posted to commits@samza.apache.org by "Jake Maes (JIRA)" <ji...@apache.org> on 2016/12/22 23:06:58 UTC

[jira] [Created] (SAMZA-1068) The run-class.sh script includes logic specific to launching jobs.

Jake Maes created SAMZA-1068:
--------------------------------

             Summary: The run-class.sh script includes logic specific to launching jobs.
                 Key: SAMZA-1068
                 URL: https://issues.apache.org/jira/browse/SAMZA-1068
             Project: Samza
          Issue Type: Bug
            Reporter: Jake Maes


run-class.sh is a script that many other scripts use to launch a Java class.
 
In particular, the following scripts invoke it:
{noFormat}
STAT_JOB="$BIN_PATH/run-class.sh \
exec $(dirname $0)/run-class.sh org.apache.samza.checkpoint.CheckpointTool "$@"
exec $(dirname $0)/run-class.sh org.apache.samza.coordinator.stream.CoordinatorStreamWriter "$@"
exec $(dirname $0)/run-class.sh org.apache.samza.autoscaling.deployer.ConfigManager "$@"
exec $(dirname $0)/run-class.sh org.apache.samza.job.JobRunner "$@"
exec $(dirname $0)/run-class.sh org.apache.samza.clustermanager.ClusterBasedJobCoordinator "$@"
exec $(dirname $0)/run-class.sh org.apache.samza.container.SamzaContainer "$@"
exec $(dirname $0)/run-class.sh org.apache.hadoop.yarn.client.cli.ApplicationCLI application -kill "$@"
exec $(dirname $0)/run-class.sh org.apache.samza.storage.StateStorageTool "$@"
APP_ID=$(exec "$(dirname $0)"/run-class.sh org.apache.hadoop.yarn.client.cli.ApplicationCLI application -list | grep "[[:space:]]$1[[:space:]]" | grep "application_" | awk -F ' ' '{ print $1 }')
exec $(dirname $0)/run-class.sh org.apache.hadoop.yarn.client.cli.ApplicationCLI application -list
exec $(dirname $0)/run-class.sh org.apache.samza.validation.YarnJobValidationTool "$@"
exec $(dirname $0)/run-class.sh org.apache.samza.storage.kv.RocksDbReadingTool "$@"
exec $(dirname $0)/run-class.sh org.apache.hadoop.yarn.client.cli.ApplicationCLI application -status "$@"
exec $(dirname $0)/run-class.sh org.apache.hadoop.yarn.client.cli.ApplicationCLI application -list | grep application_ | awk -F ' ' '{ print $1 }' | while read linea
{noFormat}

The problem is that run-class.sh now contains logic that is specific to launching a job. In particular, it exports a JOB_LIB_DIR environment variable, which is inherited by the Java process it launches and by any child processes that process spawns, causing unexpected behavior.
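The inheritance mechanism can be reproduced in a few lines of shell. This is a minimal sketch, not taken from the Samza scripts: the JOB_LIB_DIR value is a hypothetical placeholder, and two nested bash processes stand in for the Java process launched by run-class.sh and a process that it spawns.

{noFormat}
#!/usr/bin/env bash
# Hypothetical value standing in for whatever run-class.sh computes.
export JOB_LIB_DIR=/tmp/hypothetical-job/lib

# Outer bash: stands in for the Java process (e.g. Samza REST).
# Inner bash: stands in for a process it spawns (e.g. a restarted NM).
# Neither one passes JOB_LIB_DIR explicitly, yet the inner process sees it.
bash -c 'bash -c "printenv JOB_LIB_DIR"'
{noFormat}

Running it prints /tmp/hypothetical-job/lib: an exported variable propagates to every descendant process unless something removes it along the way.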

This was noticed at LI because of a Samza REST Monitor implementation that restarts the YARN NodeManager (NM) if it fails. Since Samza REST is launched by run-class.sh, its environment contained its own JOB_LIB_DIR, which it passed down to the NM when restarting it. This caused the NM to launch jobs with classpath conflicts.

As a short-term fix, we added 'env -i' (which runs a command with an empty environment) in Samza REST to prevent this, but we could run into similar problems anywhere else run-class.sh is used.
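The effect of 'env -i' can be checked the same way. The sketch below again uses a hypothetical JOB_LIB_DIR value. It also shows 'env -u NAME', which removes a single variable; that option is supported by GNU and BSD env but is not required by POSIX, and it is included only as an illustrative alternative, not as what was done in Samza REST. Note that 'env -i' clears everything, including PATH, HOME and JAVA_HOME, so anything the child needs has to be passed back explicitly, as PATH is here.

{noFormat}
#!/usr/bin/env bash
# Hypothetical value that leaked in from run-class.sh.
export JOB_LIB_DIR=/tmp/hypothetical-job/lib

# Default: the child inherits JOB_LIB_DIR.
bash -c 'printenv JOB_LIB_DIR || echo "default: not inherited"'

# env -i: the child starts with an EMPTY environment. PATH is passed back
# explicitly so the child can still locate printenv.
env -i PATH="$PATH" bash -c 'printenv JOB_LIB_DIR || echo "env -i: not inherited"'

# env -u (GNU/BSD, not POSIX): drop only JOB_LIB_DIR and keep everything else.
env -u JOB_LIB_DIR bash -c 'printenv JOB_LIB_DIR || echo "env -u: not inherited"'
{noFormat}

The first command prints the leaked path; the other two print their "not inherited" messages.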

Here are a couple of ideas for fixing it:
1. Remove the job launch logic from run-class.sh, potentially moving it to run-job.sh. Note that this refactor needs extensive testing, since it is on the critical path for running jobs in Samza.
2. Rely on a new script to launch Java classes generically. Essentially, copy the generic logic from run-class.sh into a new generic script that can be used in all cases other than launching jobs.
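Either idea amounts to separating a generic launch path, which exports nothing job-specific, from a job launch path that does. The sketch below models the two scripts as bash functions. The names run_class and run_job are hypothetical, the JVM invocation is replaced by running the given command directly, and it does not reproduce the real logic of run-class.sh or run-job.sh.

{noFormat}
#!/usr/bin/env bash
# Start from a clean environment for the demonstration.
unset JOB_LIB_DIR

# Hypothetical generic launcher (idea 2, or run-class.sh after idea 1).
# It sets up nothing job-specific. A real script would start the JVM here,
# e.g. exec java -cp "$CLASSPATH" "$@"
run_class() {
  "$@"
}

# Hypothetical job launcher (the role run-job.sh would take on).
# The ( ... ) body runs in a subshell, mimicking a separate script process,
# so the export is confined to this launch and its descendants.
run_job() (
  export JOB_LIB_DIR="$1"
  shift
  run_class "$@"
)

# Job launch: the job sees its own JOB_LIB_DIR.
run_job /tmp/hypothetical-job/lib printenv JOB_LIB_DIR

# Generic launch (e.g. Samza REST or a tool): nothing job-specific leaks in.
run_class printenv JOB_LIB_DIR || echo "generic launch: JOB_LIB_DIR not set"
{noFormat}

With this split, a process such as Samza REST launched through the generic path would no longer carry a JOB_LIB_DIR to pass on to the processes it restarts.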





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)