Posted to dev@gobblin.apache.org by "Joel Baranick (JIRA)" <ji...@apache.org> on 2017/08/28 17:32:00 UTC

[jira] [Updated] (GOBBLIN-227) JobLauncherUtils.cleanTaskStagingData fails for jobs with forks

     [ https://issues.apache.org/jira/browse/GOBBLIN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Baranick updated GOBBLIN-227:
----------------------------------
    Description: 
*Precondition:* 
Using HOCON configuration with two forks configured.

*Summary:* 
When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it tries to look up {{writer.staging.dir}} in the configuration, and the lookup fails.

*Details:*
HOCON cannot represent the following configuration, in which the same key is used both as a value and as the prefix of the fork-specific keys:
{code:none}
writer.staging.dir=/foo
writer.staging.dir.0=/foo
writer.staging.dir.1=/foo
{code}
Initially {{writer.staging.dir}} is a String, but when the HOCON parser encounters {{writer.staging.dir.0}}, it reinterprets {{writer.staging.dir}} as an Object, replacing the prior value with {{_\{"0": "/foo"\}_}}.
The effective HOCON configuration is:

{code:javascript}
{
  "writer": {
    "staging": {
      "dir": {
        "0": "/foo",
        "1": "/foo"
      }
    }
  }
}
{code}
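To illustrate the overwrite, here is a minimal Java sketch that expands dotted keys into nested maps. It follows HOCON's rule for duplicate keys: two object values are merged, otherwise the later value wins, so the object created for {{writer.staging.dir.0}} replaces the earlier string. {{HoconPathSketch}} and its methods are hypothetical; this is not the Typesafe Config parser, only a model of the behavior described above.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Typesafe Config's implementation): expands dotted
// keys into nested maps where a later value replaces an earlier non-object.
final class HoconPathSketch {

    // Inserts one dotted key, creating (or replacing with) intermediate objects.
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String dottedKey, String value) {
        String[] parts = dottedKey.split("\\.");
        Map<String, Object> node = root;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = node.get(parts[i]);
            if (!(child instanceof Map)) {
                // A string such as "/foo" is discarded and replaced by an object.
                child = new LinkedHashMap<String, Object>();
                node.put(parts[i], child);
            }
            node = (Map<String, Object>) child;
        }
        // The last path segment takes the new value; the later value wins.
        node.put(parts[parts.length - 1], value);
    }

    // Parses simple "key=value" lines in order.
    static Map<String, Object> parse(String... lines) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            put(root, line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> cfg = parse(
                "writer.staging.dir=/foo",
                "writer.staging.dir.0=/foo",
                "writer.staging.dir.1=/foo");
        // Prints {writer={staging={dir={0=/foo, 1=/foo}}}}
        System.out.println(cfg);
    }
}
{code}

Running {{main}} shows that {{dir}} ends up holding only the {{0}} and {{1}} entries; the plain string {{/foo}} is gone.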

Fork-specific configuration uses the same keys as regular configuration, with the fork number appended as a suffix, e.g. {{.1}}. The code that looks up fork-specific configuration does not automatically fall back to the regular configuration. For example, if the code is looking for {{writer.staging.dir.0}} and it isn't configured, the job fails. This means every fork must configure its own fork-specific version of {{writer.staging.dir}}.
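The lookup rule can be sketched as follows, assuming a plain {{Map}} of properties. The helpers {{keyForBranch}} and {{getStrict}} are hypothetical names, not Gobblin's API. The suffix rule (append {{.<branchId>}} only when there is more than one branch) is inferred from the {{numBranches=1}} behavior described in the next paragraph.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of fork-specific key resolution; names are illustrative
// and are not Gobblin's actual API.
final class ForkKeySketch {

    // Appends ".<branchId>" to the key only when more than one branch exists.
    static String keyForBranch(String key, int numBranches, int branchId) {
        return numBranches > 1 ? key + "." + branchId : key;
    }

    // Strict lookup: there is no fallback from "writer.staging.dir.0"
    // to the regular "writer.staging.dir".
    static String getStrict(Map<String, String> props, String key, int numBranches, int branchId) {
        String branchKey = keyForBranch(key, numBranches, branchId);
        String value = props.get(branchKey);
        if (value == null) {
            throw new IllegalStateException("Missing required property: " + branchKey);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("writer.staging.dir", "/foo"); // only the regular key is set

        System.out.println(getStrict(props, "writer.staging.dir", 1, 0)); // /foo
        try {
            getStrict(props, "writer.staging.dir", 2, 0);
        } catch (IllegalStateException e) {
            // Missing required property: writer.staging.dir.0
            System.out.println(e.getMessage());
        }
    }
}
{code}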

When {{AbstractJobLauncher}} calls {{JobLauncherUtils.cleanTaskStagingData}}, it cleans up the task staging data based on the current job's configuration. Because of this, {{fork.branches}} is always {{1}}, so {{WriterUtils.getWriterStagingDir(state, numBranches, branchId)}} is called with {{numBranches=1}} and {{branchId=0}}, and the method looks up {{writer.staging.dir}}. With HOCON configuration, however, {{writer.staging.dir}} doesn't exist as a string value (only the fork-specific {{writer.staging.dir.0}} and {{writer.staging.dir.1}} do), so the lookup fails and the job fails.
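A minimal sketch of the failing cleanup path, assuming the flattened HOCON configuration holds only the fork-specific keys. {{CleanupLookupSketch}} and its methods are hypothetical; an empty {{Optional}} stands in for the missing value that makes the real job fail.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: the map stands in for the flattened HOCON config, in
// which "writer.staging.dir" no longer exists as a string value.
final class CleanupLookupSketch {

    // Assumed suffix rule: ".<branchId>" only when there is more than one branch.
    static String keyForBranch(String key, int numBranches, int branchId) {
        return numBranches > 1 ? key + "." + branchId : key;
    }

    // Resolves the staging dir for one branch; empty if the key is absent.
    static Optional<String> stagingDir(Map<String, String> flatConfig, int numBranches, int branchId) {
        return Optional.ofNullable(
                flatConfig.get(keyForBranch("writer.staging.dir", numBranches, branchId)));
    }

    public static void main(String[] args) {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("writer.staging.dir.0", "/foo");
        flat.put("writer.staging.dir.1", "/foo");

        // Job-level cleanup: fork.branches is 1, branchId 0 -> "writer.staging.dir".
        System.out.println(stagingDir(flat, 1, 0)); // Optional.empty
        // Task-level view with two forks -> "writer.staging.dir.0".
        System.out.println(stagingDir(flat, 2, 0)); // Optional[/foo]
    }
}
{code}

The same configuration resolves at task level, where the fork count is {{2}}, but not at job level, where it is {{1}}.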

> JobLauncherUtils.cleanTaskStagingData fails for jobs with forks
> ---------------------------------------------------------------
>
>                 Key: GOBBLIN-227
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-227
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Joel Baranick
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)