Posted to dev@sqoop.apache.org by Gwen Shapira <gs...@cloudera.com> on 2015/02/07 01:57:36 UTC

[Discussion] SQOOP-1516 - Config Input as a Top Level Entity

Hi,

Reviewed the design in the wiki (
https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+Config+as+Top+Level+Entity
).
Thanks for writing such a detailed plan. I think it's a good idea to allow
direct editing of configs, and the scope of changes looks right.

A few questions:

1. In requirements, you mention editing inputs per submission and referred
to SQOOP-2025. However, SQOOP-2025 discusses storing history, and I'm not
sure it makes sense for history to be editable (well, except for Soviet
history...). Did you mean to allow editing history? Or are you referring to
something else?

2. CLI commands:
"show config foo --type JOB --subType from --id 1"

I see a few possible issues here:
1. I think users don't see config names, so they won't be able to know
about "foo".
2. We don't want to use IDs in the CLI (that's an issue across the board, so
we may leave it here and fix it somewhere else).
3. Having type and subtype seems a bit confusing. Actually, since we don't
allow creating configs directly, users may not view them as "first class".
From their perspective, they created jobs and links and now they get to view
and edit parts of those jobs and links.

How about just adding config-type as a filter to the existing job and link
commands?

"show job --name my_job --config-type from"
and
"alter job --name my_job --config-type from"

This also seems to match the REST API better, although user-facing commands
don't have to match the REST API.
Perhaps others want to chime in here?
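
To make the "config-type as a filter" idea concrete, here is a minimal
Python sketch. It is not Sqoop code: the Job class, the show_job function
and the input names (schemaName, tableName, outputDirectory, numExtractors)
are hypothetical, and the "to" and "driver" config types are assumed for
illustration. It only shows a job being looked up by name and its configs
narrowed to a single config type.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for a job: a name plus config inputs per config type."""
    name: str
    configs: dict = field(default_factory=dict)


def show_job(jobs, name, config_type=None):
    """Return the named job's configs, optionally narrowed to one config type."""
    job = next(j for j in jobs if j.name == name)
    if config_type is None:
        return job.configs
    return {config_type: job.configs[config_type]}


# Illustrative data only; none of these input names come from the thread.
jobs = [
    Job(
        name="my_job",
        configs={
            "from": {"schemaName": "sales", "tableName": "orders"},
            "to": {"outputDirectory": "/tmp/orders"},
            "driver": {"numExtractors": 4},
        },
    )
]

# Roughly what: show job --name my_job --config-type from  would select.
print(show_job(jobs, "my_job", config_type="from"))
```

In this shape the user never needs to know a config's own name or ID; the
job name plus the config type is enough to address it.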

3. In the REST API, why are we using subType and not type? Is type already
used somewhere?

4. Repository changes: Yes! I suspect we need those anyway.

Gwen

Re: [Discussion] SQOOP-1516 - Config Input as a Top Level Entity

Posted by Veena Basavaraj <vb...@cloudera.com>.
Gwen,

PING?

The pending question seems to be the following:

- Should the command line/REST API use config as the entity, or should
config editing be nested under the job/link objects?

Thoughts?


Regards
Veena




Best,
*./Vee*


Re: [Discussion] SQOOP-1516 - Config Input as a Top Level Entity

Posted by Veena Basavaraj <vb...@cloudera.com>.
Answers inline, thanks for the feedback, good points. I have tried to
answer them. Let me know if I was unclear.




Best,
*./Vee*

On Fri, Feb 6, 2015 at 4:57 PM, Gwen Shapira <gs...@cloudera.com> wrote:

> Hi,
>
> Reviewed the design in the wiki (
> https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+Config+as+Top+Level+Entity
> ).
> Thanks for writing such a detailed plan. I think it's a good idea to allow
> direct editing of configs, and the scope of changes looks right.
>
> A few questions:
>
> 1. In requirements, you mention editing inputs per submission and referred
> to SQOOP-2025. However, SQOOP-2025 discusses storing history, and I'm not
> sure it makes sense for history to be editable (well, except for Soviet
> history...). Did you mean to allow editing history? Or are you referring to
> something else?
>

When the design wiki was first written, SQOOP-2025 was still under
discussion. In the future I hope we store the config per submission rather
than overwriting it; at that point we should allow reading by submission ID.
Editing submission history was never intended. I have corrected the wiki by
splitting this into two separate bullet points:

   - Read the Config Inputs by Type/SubType and by Job/Submission (since
   SQOOP-2025 <https://issues.apache.org/jira/browse/SQOOP-2025> we may be
   able to have configs by submissionId)
   - Update the Config Inputs by Type/SubType for the latest/last
   submission in the job. We should not allow editing previous submissions;
   they should be read-only
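
As a rough illustration of these two bullets, here is a minimal Python
sketch, assuming config inputs are snapshotted per submission. The
JobConfigHistory class and its method names are hypothetical and not part
of Sqoop; the point is only that any submission can be read while only the
latest one can be updated.

```python
class JobConfigHistory:
    """Hypothetical store keeping one snapshot of config inputs per submission."""

    def __init__(self):
        self._snapshots = []  # list index doubles as the submission id

    def record_submission(self, inputs):
        """Store a copy of the inputs used for a new submission; return its id."""
        self._snapshots.append(dict(inputs))
        return len(self._snapshots) - 1

    def read(self, submission_id):
        """Reading is allowed for any submission, past or latest."""
        return dict(self._snapshots[submission_id])

    def update(self, submission_id, changes):
        """Only the latest submission's inputs may be edited."""
        latest = len(self._snapshots) - 1
        if not self._snapshots or submission_id != latest:
            raise PermissionError("previous submissions are read-only")
        self._snapshots[submission_id].update(changes)


history = JobConfigHistory()
first = history.record_submission({"tableName": "orders"})
second = history.record_submission({"tableName": "orders_v2"})

history.update(second, {"tableName": "orders_v3"})  # latest: allowed
try:
    history.update(first, {"tableName": "changed"})  # older: rejected
except PermissionError as err:
    print(err)  # previous submissions are read-only
```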


> 2. CLI commands:
> "show config foo --type JOB --subType from --id 1"
>
> I see a few possible issues here:
> 1. I think users don't see config names, so they won't be able to know
> about "foo"
>
Config objects per type are lists, so being able to edit an individual
config is easier: users don't have to go through filling in all the other,
unrelated configs in the list. Users do see the names when they list the
configs per connector. Am I missing something?


> 2. We don't want to use IDs in the CLI (that's an issue across the board,
> so we may leave it here and fix it somewhere else)
>

We have not yet added support for names in any command; there is a ticket
for it, and at that point it will make sense to support names. So far every
other command uses an ID, so I think using one here is consistent.



> 3. Having type and subtype seems a bit confusing. Actually, since we
> don't allow creating configs directly, users may not view them as "first
> class".
> From their perspective, they created jobs and links and now they get to
> view and edit parts of those jobs and links.
>

This is the point I am trying to fix as well. When creating a job or link,
instead of having to provide the same set of config inputs every time, it is
convenient to give a config name/id. This matters most for REST calls:
filling in all the config inputs in a JSON structure (for the POST) is
error-prone, verbose and difficult, when we could just give a config name or
id. So the intention of this ticket is that the user should start seeing
config as a first-class citizen. I will make this statement explicit in the
design wiki if it is not already. Speaking from the experience of using
these APIs in the Hue app, it is unnecessarily hard.

>
> How about just adding config-type as a filter to the existing job and link
> commands?
>
> "show job --name my_job --config-type from"
> and
> "alter job --name my_job --config-type from"
>
> This also seems to match the REST API better, although user-facing
> commands don't have to match the REST API.
> Perhaps others want to chime in here?
>
Answered above, and explained again below.

>
> 3. In the REST API, why are we using subType and not type? Is type already
> used somewhere?
>

I went through a few iterations to work out the best approach here.
As I said above, as more connectors are added in the future, we may see
config objects with more inputs, and if we keep extending the config/input
annotations as per https://issues.apache.org/jira/browse/SQOOP-1643, it
might be useful to have configs and inputs as first-class citizens, with
both REST API and command-line support to read and edit them individually,
without tying them to a JOB/LINK.

Currently there is a top-level enum (MConfigType) with two values, LINK and
JOB. This is what I refer to as type in the command line. In the REST API I
used it as a resource name, v1/config/LINK?name=?&id=?&subType=, but if
this is confusing we can also pass it as a query parameter:

v1/config?type=LINK&name=?&id=?&subType=
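
For comparison, here is a small Python sketch that builds both URL shapes.
The path and parameter names (name, id, subType, type) are taken from the
proposal above; the helper function names are hypothetical, and the sample
values (JOB, foo, 1, from) are borrowed from the CLI example earlier in
this thread.

```python
from urllib.parse import urlencode


def config_url_path_style(config_type, name, config_id, sub_type):
    """Type as a resource name in the path: v1/config/<TYPE>?..."""
    query = urlencode({"name": name, "id": config_id, "subType": sub_type})
    return f"v1/config/{config_type}?{query}"


def config_url_query_style(config_type, name, config_id, sub_type):
    """Type as an ordinary query parameter: v1/config?type=<TYPE>&..."""
    query = urlencode(
        {"type": config_type, "name": name, "id": config_id, "subType": sub_type}
    )
    return f"v1/config?{query}"


print(config_url_path_style("JOB", "foo", 1, "from"))
# v1/config/JOB?name=foo&id=1&subType=from
print(config_url_query_style("JOB", "foo", 1, "from"))
# v1/config?type=JOB&name=foo&id=1&subType=from
```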


Second,

If you look at the changes proposed to MConfigType, it stores the subTypes
as part of the type.

At one point I considered making direction a parameter of the type JOB, but
direction is not relevant to all configurables: for the driver configs, for
example, "direction" has no meaning. Similarly, for the type LINK there is
no concept of direction.

Hence I went with subType, a second-level hierarchy for distinguishing the
types of configs that are supported in Sqoop.

Alternatives are possible, but we have to bear in mind that configs and
config inputs are not associated with jobs and links; they are associated
with connectors.

It is the config input values that are associated with jobs and links.
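
A minimal Python sketch of such a two-level hierarchy follows. It is an
illustration under assumptions, not the actual MConfigType: the subtype
names "to", "driver" and "link" are guesses for the sake of the example
(only "from" appears in this thread), and the validate method is
hypothetical. It shows each type carrying its own set of subtypes, so that
a direction such as "from" is accepted for JOB but rejected for LINK.

```python
from enum import Enum


class ConfigType(Enum):
    """Sketch: each top-level type stores the subtypes it supports."""
    LINK = ("link",)                 # no notion of direction
    JOB = ("from", "to", "driver")   # "driver" has no direction either

    @property
    def sub_types(self):
        """The subtypes allowed under this type."""
        return self.value

    def validate(self, sub_type):
        """Return sub_type if this type supports it, otherwise raise ValueError."""
        if sub_type not in self.value:
            raise ValueError(f"{sub_type!r} is not a subtype of {self.name}")
        return sub_type


print(ConfigType.JOB.validate("from"))  # from
try:
    ConfigType.LINK.validate("from")
except ValueError as err:
    print(err)  # 'from' is not a subtype of LINK
```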


> 4. Repository changes: Yes! I suspect we need those anyway.
>
OK, I assume the current signature is fine with you then.



>
> Gwen
>
>
>
>