Posted to dev@gobblin.apache.org by "Joel Baranick (JIRA)" <ji...@apache.org> on 2017/08/28 17:32:00 UTC
[jira] [Updated] (GOBBLIN-227)
JobLauncherUtils.cleanTaskStagingData fails for jobs with forks
[ https://issues.apache.org/jira/browse/GOBBLIN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Baranick updated GOBBLIN-227:
----------------------------------
Description:
*Precondition:*
Using HOCON configuration with two forks configured.
*Summary:*
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it tries to look up {{writer.staging.dir}} in the configuration and fails.
*Details:*
HOCON does not support the following configuration:
{code:none}
writer.staging.dir=/foo
writer.staging.dir.0=/foo
writer.staging.dir.1=/foo
{code}
Initially, {{writer.staging.dir}} is a String, but when the HOCON parser encounters {{writer.staging.dir.0}}, it treats {{writer.staging.dir}} as an Object instead, overwriting the prior value with {{_\{"0": "/foo"\}_}}.
The effective HOCON configuration is:
{code:javascript}
{
"writer": {
"staging": {
"dir": {
"0": "/foo",
"1": "/foo"
}
}
}
}
{code}
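To make the type change concrete, here is a simplified, stdlib-only Java model of how dotted keys are expanded into nested objects. It is not a real HOCON parser (such as Typesafe Config), and {{HoconPathExpansion}}, {{put}} and {{parse}} are hypothetical names; it only reproduces the behaviour described above, where a later object value replaces an earlier string value at the same path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified model of HOCON dotted-key expansion (hypothetical, not a real parser).
 * It only mimics the rule that an object assigned later replaces a string assigned earlier.
 */
public class HoconPathExpansion {

    /** Assigns a value at a dotted path, replacing any non-object found along the way. */
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String dottedKey, String value) {
        String[] parts = dottedKey.split("\\.");
        Map<String, Object> current = root;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = current.get(parts[i]);
            if (!(child instanceof Map)) {
                // A missing or string value is replaced by a fresh object.
                // This is what happens to writer.staging.dir when writer.staging.dir.0 arrives.
                child = new LinkedHashMap<String, Object>();
                current.put(parts[i], child);
            }
            current = (Map<String, Object>) child;
        }
        // The last path segment is assigned directly; a later value wins.
        current.put(parts[parts.length - 1], value);
    }

    /** Parses simple "key=value" lines in order. */
    static Map<String, Object> parse(String... lines) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            put(root, line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> config = parse(
                "writer.staging.dir=/foo",
                "writer.staging.dir.0=/foo",
                "writer.staging.dir.1=/foo");
        System.out.println(config);
    }
}
```

Running {{main}} prints a nested map in which {{dir}} holds only the keys {{0}} and {{1}}; the {{/foo}} string originally stored at {{writer.staging.dir}} is gone, matching the effective configuration above.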
Fork-specific configuration uses the same keys as regular configuration, with the fork number appended (for example, {{.1}}). The code that looks up fork-specific configuration does not fall back to the regular key: if it looks for {{writer.staging.dir.0}} and that key isn't configured, the job fails, even if {{writer.staging.dir}} is set. This means that every fork must configure its own fork-specific version of {{writer.staging.dir}}.
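The lookup rule can be sketched as follows. This is an illustrative model, not Gobblin's code: {{ForkKeyLookup}}, {{propertyNameForBranch}} and {{getRequired}} are hypothetical names, and the rule that the plain key is used when there is only one branch is inferred from the single-branch behaviour described in this issue.

```java
import java.util.Properties;

/** Hypothetical model of fork-specific key resolution with no fallback. */
public class ForkKeyLookup {

    /** Returns the key to read: ".<branchId>" is appended only when there are several forks. */
    static String propertyNameForBranch(String key, int numBranches, int branchId) {
        return numBranches > 1 ? key + "." + branchId : key;
    }

    /** Strict lookup: the regular (non-fork) key is never consulted. */
    static String getRequired(Properties props, String key, int numBranches, int branchId) {
        String name = propertyNameForBranch(key, numBranches, branchId);
        String value = props.getProperty(name);
        if (value == null) {
            throw new IllegalStateException("Missing required property: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("writer.staging.dir", "/foo");   // regular key
        props.setProperty("writer.staging.dir.0", "/foo"); // fork 0 only

        // Branch 0 of 2 resolves to writer.staging.dir.0, which exists.
        System.out.println(getRequired(props, "writer.staging.dir", 2, 0));
        try {
            // Branch 1 of 2 resolves to writer.staging.dir.1, which is missing.
            getRequired(props, "writer.staging.dir", 2, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With two forks, branch 0 resolves to {{writer.staging.dir.0}} and succeeds, while branch 1 resolves to {{writer.staging.dir.1}} and throws, even though {{writer.staging.dir}} is set.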
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it cleans up the task staging data based on the current job's configuration. Because of this, {{fork.branches}} is always {{1}}, so {{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is called with {{numBranches=1}} and {{branchId=0}}, and the method looks up the plain key {{writer.staging.dir}}. Unfortunately, with HOCON configuration {{writer.staging.dir}} doesn't exist as a value (only {{writer.staging.dir.0}} and {{writer.staging.dir.1}} do), so the job fails.
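The failing cleanup path can be reproduced in the same simplified way. The sketch assumes the HOCON configuration has been flattened into {{java.util.Properties}}, so only {{writer.staging.dir.0}} and {{writer.staging.dir.1}} remain, and that {{fork.branches}} is absent from the job-level state and therefore defaults to {{1}}. {{CleanupLookupFailure}}, {{resolveKey}} and {{stagingDirForCleanup}} are hypothetical names, not Gobblin APIs.

```java
import java.util.Properties;

/** Hypothetical reproduction of the job-level staging-dir lookup failure. */
public class CleanupLookupFailure {

    /** Naming rule from the description: ".<branchId>" is appended only for several forks. */
    static String resolveKey(String key, int numBranches, int branchId) {
        return numBranches > 1 ? key + "." + branchId : key;
    }

    /** Resolves writer.staging.dir the way the job-level cleanup is described as doing it. */
    static String stagingDirForCleanup(Properties jobProps) {
        // Assumption: fork.branches is not present at job level, so it falls back to 1.
        int numBranches = Integer.parseInt(jobProps.getProperty("fork.branches", "1"));
        int branchId = 0;
        // Returns null when the resolved key does not exist.
        return jobProps.getProperty(resolveKey("writer.staging.dir", numBranches, branchId));
    }

    public static void main(String[] args) {
        // Flattened form of the effective HOCON configuration: only fork-specific keys survive.
        Properties jobProps = new Properties();
        jobProps.setProperty("writer.staging.dir.0", "/foo");
        jobProps.setProperty("writer.staging.dir.1", "/foo");

        System.out.println("writer.staging.dir -> " + stagingDirForCleanup(jobProps));
    }
}
```

Running {{main}} prints {{writer.staging.dir -> null}}. If {{fork.branches}} were set to {{2}}, the same method would resolve {{writer.staging.dir.0}} and return {{/foo}}.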
> JobLauncherUtils.cleanTaskStagingData fails for jobs with forks
> ---------------------------------------------------------------
>
> Key: GOBBLIN-227
> URL: https://issues.apache.org/jira/browse/GOBBLIN-227
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Joel Baranick
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)