Posted to dev@sqoop.apache.org by "Veena Basavaraj (JIRA)" <ji...@apache.org> on 2015/02/19 01:19:12 UTC
[jira] [Resolved] (SQOOP-2025) Input/State history per job run / submission

     [ https://issues.apache.org/jira/browse/SQOOP-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Veena Basavaraj resolved SQOOP-2025.
------------------------------------
    Resolution: Won't Fix

> Input/State history per job run / submission
> --------------------------------------------
>
>                 Key: SQOOP-2025
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2025
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 2.0.0
>
>
> As per SQOOP-1804, we will be storing both the config inputs and the intermediate state generated as part of the job run in the config object.
> Currently the config object is stored in the repository model in the {code}SQ_CONFIG{code} table. It is per SQ_CONFIGURABLE.
> The inputs within the Config class and their attributes are stored in the {code}SQ_INPUT{code} table,
> i.e. the columns in SQ_INPUT map to the attributes of the config @Input annotation:
> {code}
> @Input(size = 50)
> public String schemaName;
> @Input(size = 50)
> public String tableName;
> {code}
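
To make the mapping between @Input attributes and stored metadata concrete, here is a minimal sketch. It does not use Sqoop's own classes: the Input annotation, the TableConfig class and the describeInputs helper below are hypothetical stand-ins defined only for this illustration, and only the "size" attribute from the snippet above is modeled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class InputMetadataSketch {

    // Hypothetical stand-in for the config @Input annotation (assumption:
    // only the "size" attribute shown in the issue is modeled here).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Input {
        int size() default -1;
    }

    // Example config class mirroring the snippet in the issue description.
    static class TableConfig {
        @Input(size = 50)
        public String schemaName;
        @Input(size = 50)
        public String tableName;
    }

    // Collects "input name -> size" for every @Input field of a config class.
    // Each entry corresponds to what one metadata row per input would hold.
    static Map<String, Integer> describeInputs(Class<?> configClass) {
        Map<String, Integer> result = new TreeMap<>(); // sorted for stable output
        for (Field field : configClass.getDeclaredFields()) {
            Input input = field.getAnnotation(Input.class);
            if (input != null) {
                result.put(field.getName(), input.size());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describeInputs(TableConfig.class));
    }
}
```

The point of the sketch is only that the annotation attributes are static metadata per input, separate from the per-job values discussed next.
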
> The actual values for the SQ_INPUT keys per sqoop job are stored in
> {code}
> SQ_JOB_INPUT and SQ_LINK_INPUT 
> {code}
> So this means we overwrite the config input values for every job run.
> Let's take an example.
> If a job is started with the config value for key "test" set to "foo", then during the first job run SQ_INPUT will reflect the value "foo". If the value is changed to "bar" before the second run, SQ_INPUT will then reflect "bar". A user who queries the config values by job ID will see only the last modified value, "bar"; nothing tells them which value was in effect when a given job run / submission started, or which value it ended with.
> The proposal is to provide this history so that the user can track per job run the config input values.
> A simple proposal is to have a FK submission_id in the SQ_JOB_INPUT table,
> and SQ_LINK_INPUT table.
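
The difference between the current behaviour and the submission_id proposal can be sketched with two in-memory maps. This is an illustration under assumptions, not Sqoop's repository code: the class, its method names and the string keys are hypothetical, and the maps merely stand in for rows of SQ_JOB_INPUT with and without a submission_id column.

```java
import java.util.HashMap;
import java.util.Map;

class JobInputHistorySketch {

    // Models today's storage: one value per (job, input); edits overwrite it.
    private final Map<String, String> latest = new HashMap<>();

    // Models the proposal: one value per (submission, job, input).
    private final Map<String, String> bySubmission = new HashMap<>();

    // Records the value an input had for one job run (hypothetical helper).
    void recordRun(long submissionId, long jobId, String input, String value) {
        latest.put(jobId + "/" + input, value);
        bySubmission.put(submissionId + "/" + jobId + "/" + input, value);
    }

    // What a query by job ID can answer today: only the last value.
    String latestValue(long jobId, String input) {
        return latest.get(jobId + "/" + input);
    }

    // What a query could answer with a submission_id foreign key.
    String valueForRun(long submissionId, long jobId, String input) {
        return bySubmission.get(submissionId + "/" + jobId + "/" + input);
    }

    public static void main(String[] args) {
        JobInputHistorySketch store = new JobInputHistorySketch();
        store.recordRun(1L, 7L, "test", "foo"); // first run
        store.recordRun(2L, 7L, "test", "bar"); // value edited before second run
        System.out.println(store.latestValue(7L, "test"));     // prints "bar"
        System.out.println(store.valueForRun(1L, 7L, "test")); // prints "foo"
    }
}
```
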
> [~anandriyer] also suggested we store the before/after config state if possible.
> To do the BEFORE/AFTER config history:
> 1. We will create a new set of values for each config input for every job run, based on the previous state; or, if the user edits the configs while the previous job is running, we will create new ones with a null submissionId and associate them with the submission ID once the job run starts. Once the job run finishes, we will write the config values again to store the AFTER information.
> 2. We will need to store the BEFORE/AFTER indicator in another column.
> 3. We will make only the last run's config input values editable, and only if the job has not yet started.
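
The three steps above can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the Phase enum, the Row class and the method names are hypothetical, and a list of rows stands in for SQ_JOB_INPUT extended with a submission ID and a BEFORE/AFTER column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BeforeAfterSnapshotSketch {

    enum Phase { BEFORE, AFTER } // step 2: the indicator column

    static final class Row {
        Long submissionId;       // null until a job run claims the row
        final Phase phase;
        final String input;
        final String value;

        Row(Long submissionId, Phase phase, String input, String value) {
            this.submissionId = submissionId;
            this.phase = phase;
            this.input = input;
            this.value = value;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    // Step 3: only rows not yet tied to a submission can be edited.
    void editPending(Map<String, String> values) {
        rows.removeIf(r -> r.submissionId == null);
        values.forEach((k, v) -> rows.add(new Row(null, Phase.BEFORE, k, v)));
    }

    // Step 1: when the run starts, pending rows become its BEFORE state.
    void startRun(long submissionId) {
        for (Row r : rows) {
            if (r.submissionId == null) {
                r.submissionId = submissionId;
            }
        }
    }

    // Step 1: when the run finishes, the values are written again as AFTER.
    void finishRun(long submissionId, Map<String, String> finalValues) {
        finalValues.forEach(
            (k, v) -> rows.add(new Row(submissionId, Phase.AFTER, k, v)));
    }

    // Returns the input values recorded for one run and one phase.
    Map<String, String> snapshot(long submissionId, Phase phase) {
        Map<String, String> out = new TreeMap<>();
        for (Row r : rows) {
            if (r.submissionId != null && r.submissionId == submissionId
                    && r.phase == phase) {
                out.put(r.input, r.value);
            }
        }
        return out;
    }
}
```

Because an edit never touches rows that already carry a submission ID, each run keeps its own BEFORE and AFTER values, which is the property the Pros below rely on.
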
>  
> Pros:
> We have a history per job run that we can query
> We do not have race conditions on config input value edits, since every job run has its own state
> Cons:
> We will have a lot more entries in SQ_JOB_INPUT and SQ_LINK_INPUT than we have now, but I see this as unavoidable if we need to give users easy debuggability into which inputs and values were used in every job run, which values were edited, etc.


