Posted to dev@sqoop.apache.org by Cheolsoo Park <ch...@cloudera.com> on 2012/06/18 23:55:45 UTC

Sqoop2 questions

Hi Sqoop developers,

Thinking about the Sqoop2 client interface, I have a couple of questions
that I'd like to ask everyone.

*1. CLI vs. Web UI*

Currently, we have an MForm type that represents a group of questions. For
example, an import job MForm consists of several MInputs such as table name,
target dir, etc.

This seems to make perfect sense for Web UI as we can present a group of
questions in a single form, but I don't think that this fits well with CLI.
For example, in Web UI, we may ask multiple questions in a single form as
follows:

Table name: ___
Target dir: ___
Columns: ___

But in CLI, we can't really do the same. Instead, we're forced to ask a
single question at a time:

Table name? <enter>
Target dir? <enter>
Columns? <enter>

Granted, we could simply iterate over the MInputs of an MForm in the CLI, but
I am wondering if we can have a better logical representation that fits well
with both UIs.
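
A minimal sketch of the two presentations over one model, under stated assumptions: the Form and Input classes below are hypothetical stand-ins for MForm and MInput, not the actual Sqoop2 model classes, and the input names are made up for illustration. It only shows that a single list of inputs can be rendered all at once or asked one question at a time.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class FormPrompter {
    /** Hypothetical stand-in for an MInput: a name plus a display label. */
    static final class Input {
        final String name;
        final String label;
        Input(String name, String label) {
            this.name = name;
            this.label = label;
        }
    }

    /** Hypothetical stand-in for an MForm: an ordered list of inputs. */
    static final class Form {
        final List<Input> inputs = new ArrayList<>();
        Form add(String name, String label) {
            inputs.add(new Input(name, label));
            return this;
        }
    }

    /** Web-UI style: render every question of the form in one block. */
    static String renderAll(Form form) {
        StringBuilder sb = new StringBuilder();
        for (Input input : form.inputs) {
            sb.append(input.label).append(": ___\n");
        }
        return sb.toString();
    }

    /** CLI style: ask one question at a time and read one line per answer. */
    static Map<String, String> promptEach(Form form, Scanner in, PrintStream out) {
        Map<String, String> answers = new LinkedHashMap<>();
        for (Input input : form.inputs) {
            out.print(input.label + "? ");
            // Treat missing input (end of stream) as an empty answer.
            String line = in.hasNextLine() ? in.nextLine() : "";
            answers.put(input.name, line.trim());
        }
        return answers;
    }

    public static void main(String[] args) {
        Form form = new Form()
                .add("table", "Table name")
                .add("targetDir", "Target dir")
                .add("columns", "Columns");
        System.out.print(renderAll(form));
        // Canned answers stand in for what a user would type.
        Scanner canned = new Scanner("employees\n/tmp/out\nid,name\n");
        Map<String, String> answers = promptEach(form, canned, System.out);
        System.out.println();
        System.out.println(answers);
    }
}
```

Both methods consume the same Form, so the choice between them could stay entirely on the client side.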

*2. Dependency among options*

Given that one of the design goals of Sqoop2 is ease of use, we should be able
to guide users through various options by asking relevant questions based on
their previous answers. For example, in an import job, we may ask different
questions depending on whether or not it is a Hive import.

Hive import?
  Yes --> Hive table name?
  No --> no further question

To do this, I think that we need some sort of dependency graph to represent
options, and I think that a tree structure (thanks Bilung for your
suggestion) makes more sense than a list. (The current implementation of
MForm is a list.)
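
To make the tree idea concrete, here is a rough sketch, assuming a hypothetical Node class (not part of Sqoop2) in which each question holds follow-up questions keyed by the answer that triggers them. The input names hiveImport and hiveTable are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QuestionTree {
    /** Hypothetical node: one question plus follow-ups keyed by the triggering answer. */
    static final class Node {
        final String name;
        final String label;
        final Map<String, List<Node>> followUps = new LinkedHashMap<>();

        Node(String name, String label) {
            this.name = name;
            this.label = label;
        }

        /** Registers a child question that is asked only when this question gets the given answer. */
        Node when(String answer, Node child) {
            followUps.computeIfAbsent(answer, k -> new ArrayList<>()).add(child);
            return this;
        }
    }

    /** Depth-first walk that lists the questions reachable under the given answers. */
    static List<String> questionsToAsk(Node root, Map<String, String> answers) {
        List<String> asked = new ArrayList<>();
        walk(root, answers, asked);
        return asked;
    }

    private static void walk(Node node, Map<String, String> answers, List<String> asked) {
        asked.add(node.name);
        List<Node> children = node.followUps.get(answers.get(node.name));
        if (children == null) {
            return; // no follow-up registered for this answer
        }
        for (Node child : children) {
            walk(child, answers, asked);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("hiveImport", "Hive import?")
                .when("yes", new Node("hiveTable", "Hive table name?"));

        Map<String, String> yes = new HashMap<>();
        yes.put("hiveImport", "yes");
        Map<String, String> no = new HashMap<>();
        no.put("hiveImport", "no");

        System.out.println(questionsToAsk(root, yes));
        System.out.println(questionsToAsk(root, no));
    }
}
```

A flat list is the special case in which no node registers any follow-up.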

If we decide to implement a dependency graph, another related question is how
to collect dependency information from connector developers. A while ago, I
played a bit with the connector interface, and one of the suggestions from
Arvind was to use Java annotations to embed metadata such as label, max
length, etc. For example:

  @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
  protected String connectString = null;

  @ConnectionInput(name = "inp-conn-username")
  protected String username = null;

  @ConnectionInput(name = "inp-conn-password", hidden = true)
  protected String password = null;

(You can find the full implementation on my GitHub:
http://github.sf.cloudera.com/cheolsoo/apache-sqoop/commit/af8dde141c3ae1e0e70f178a241171d36421aec7)

Regardless of whether we use annotations or another mechanism, it seems
straightforward to embed metadata for individual inputs in the connector
interface. But it is not clear to me how to embed metadata that spans several
inputs, such as dependency information. I am wondering if anyone has a good
suggestion about how to achieve this.
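
One possible direction, sketched purely as an assumption: give the annotation optional attributes that name the parent input and the answer that enables the dependent one. The @JobInput annotation, its dependsOn and whenValue attributes, and the input names below are all hypothetical and do not exist in the connector interface; the sketch only shows that such edges could be collected by reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class DependencyAnnotations {
    /** Hypothetical annotation; dependsOn and whenValue are invented for this sketch. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface JobInput {
        String name();
        String dependsOn() default "";  // name of the parent input, empty if none
        String whenValue() default "";  // parent answer that enables this input
    }

    /** Hypothetical connector-side configuration class. */
    static class ImportJobConfig {
        @JobInput(name = "inp-job-hiveimport")
        protected Boolean hiveImport = null;

        @JobInput(name = "inp-job-hivetable",
                  dependsOn = "inp-job-hiveimport", whenValue = "true")
        protected String hiveTable = null;
    }

    /** Collects "child -> parent=value" edges from the annotated fields of a class. */
    static Map<String, String> dependencies(Class<?> type) {
        Map<String, String> edges = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            JobInput meta = field.getAnnotation(JobInput.class);
            if (meta != null && !meta.dependsOn().isEmpty()) {
                edges.put(meta.name(), meta.dependsOn() + "=" + meta.whenValue());
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(dependencies(ImportJobConfig.class));
    }
}
```

String-based references like this are not checked by the compiler, so a framework would probably need to validate them when the connector is loaded.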

Thoughts?

Thanks,
Cheolsoo

Re: Sqoop2 questions

Posted by Cheolsoo Park <ch...@cloudera.com>.
Thanks for your answers, Arvind!

All make sense to me.

Cheolsoo

On Mon, Jun 18, 2012 at 3:46 PM, Arvind Prabhakar <ar...@apache.org> wrote:

> Hi Cheolsoo,
>
> Excellent questions. Please see my comments inline below:
>
> On Mon, Jun 18, 2012 at 2:55 PM, Cheolsoo Park <cheolsoo@cloudera.com>
> wrote:
>
> > Hi Sqoop developers,
> >
> > Thinking about the Sqoop2 client interface, I have a couple of questions
> > that I'd like to ask everyone.
> >
> > *1. CLI vs. Web UI*
> >
> > Currently, we have MForm type that represents a group of questions. For
> > example, an import job MForm consists of several MInput such as table
> > name, target dir, etc.
> >
> > This seems to make perfect sense for Web UI as we can present a group of
> > questions in a single form, but I don't think that this fits well with
> > CLI. For example, in Web UI, we may ask multiple questions in a single
> > form as follows:
> >
> > Table name: ___
> > Target dir: ___
> > Columns: ___
> >
> > But in CLI, we can't really do the same. Instead, we're forced to ask a
> > single question at a time:
> >
> > Table name? <enter>
> > Target dir? <enter>
> > Columns? <enter>
> >
> > Granted, we could simply iterate MInputs of a MForm in CLI, but I am
> > wondering if we can have a better logical representation that fits well
> > with both UIs.
> >
>
> Both the Web UI as well as the CLI will operate on one instance of MForm at
> a given time. This instance will be sent back to the server for validation
> and will come back with validation errors annotating the various MInputs
> within it. If everything is valid, the server will send the next MForm in
> the series. What the CLI/Web UI does with the MForm is up to the client
> implementation. For example, it could describe the inputs necessary upfront
> and then go into interactive mode for each one of them.
>
> One alternative to iterating over the inputs would be to use a library like
> curses to do terminal manipulation. However, that would require native
> code and would have a significant maintenance cost associated with it.
>
>
> > *2. Dependency among options*
> >
> > Given one of design goals in Sqoop2 is easy to use, we should be able to
> > guide users through various options by asking relevant questions based on
> > their previous answers. For example, in an import job, we may ask
> > different questions depending whether or not it is a Hive import.
> >
> > Hive import?
> >  Yes --> Hive table name?
> >  No --> no further question
> >
> > To do this, I think that we need some sort of dependency graph to
> > represent options, and I think that a tree structure (thanks Bilung for
> > your suggestion) makes more sense than a list. (The current
> > implementation of MForm is a list.)
> >
> > If we decide to implement a dependency graph, another related question is
> > how to collect dependency information from connector developers. A while
> > ago, I played a bit with the connector interface, and one of suggestions
> > from Arvind was use Java annotation to embed meta data such as label, max
> > length, etc. For example:
> >
> >  @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
> >  protected String connectString = null;
> >
> >  @ConnectionInput(name = "inp-conn-username")
> >  protected String username = null;
> >
> >  @ConnectionInput(name = "inp-conn-password", hidden = true)
> >  protected String password = null;
> >
> > (You can find full implementation at my github:
> > http://github.sf.cloudera.com/cheolsoo/apache-sqoop/commit/af8dde141c3ae1e0e70f178a241171d36421aec7)
> >
> > Regardless whether we use annotation or another, it seems straightforward
> > to embed meta data for individual inputs in the connector interface. But
> > it is not clear to me how to embed meta data among inputs such as
> > dependency information. I am wondering if anyone has a good suggestion
> > about how to achieve this.
> >
>
> The dependency will be resolved dynamically by the server. For example, if
> the user selects Hive import there would be a follow-up form that will try
> to get details on the various options that the server requires to
> successfully enable that.
>
> Regards,
> Arvind Prabhakar
>
>
>
> >
> > Thoughts?
> >
> > Thanks,
> > Cheolsoo
> >
>

Re: Sqoop2 questions

Posted by Arvind Prabhakar <ar...@apache.org>.
Hi Cheolsoo,

Excellent questions. Please see my comments inline below:

On Mon, Jun 18, 2012 at 2:55 PM, Cheolsoo Park <ch...@cloudera.com> wrote:

> Hi Sqoop developers,
>
> Thinking about the Sqoop2 client interface, I have a couple of questions
> that I'd like to ask everyone.
>
> *1. CLI vs. Web UI*
>
> Currently, we have MForm type that represents a group of questions. For
> example, an import job MForm consists of several MInput such as table name,
> target dir, etc.
>
> This seems to make perfect sense for Web UI as we can present a group of
> questions in a single form, but I don't think that this fits well with CLI.
> For example, in Web UI, we may ask multiple questions in a single form as
> follows:
>
> Table name: ___
> Target dir: ___
> Columns: ___
>
> But in CLI, we can't really do the same. Instead, we're forced to ask a
> single question at a time:
>
> Table name? <enter>
> Target dir? <enter>
> Columns? <enter>
>
> Granted, we could simply iterate MInputs of a MForm in CLI, but I am
> wondering if we can have a better logical representation that fits well
> with both UIs.
>

Both the Web UI as well as the CLI will operate on one instance of MForm at
a given time. This instance will be sent back to the server for validation
and will come back with validation errors annotating the various MInputs
within it. If everything is valid, the server will send the next MForm in
the series. What the CLI/Web UI does with the MForm is up to the client
implementation. For example, it could describe the inputs necessary upfront
and then go into interactive mode for each one of them.
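
The round trip described above could look roughly like the following sketch. The Form and Server classes are hypothetical toys, not the Sqoop2 server API, and the only validation rule assumed here is "every input must be non-empty".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormRoundTrip {
    /** Hypothetical form: input values plus per-input validation errors. */
    static final class Form {
        final String name;
        final Map<String, String> values = new LinkedHashMap<>();
        final Map<String, String> errors = new LinkedHashMap<>();

        Form(String name, String... inputNames) {
            this.name = name;
            for (String input : inputNames) {
                values.put(input, "");
            }
        }
    }

    /** Toy server that hands out a fixed series of forms, one at a time. */
    static final class Server {
        private final List<Form> series;
        private int index = 0;

        Server(Form... series) {
            this.series = Arrays.asList(series);
        }

        Form first() {
            return series.get(0);
        }

        /**
         * Validates the submitted form. Returns the same form annotated with
         * errors if anything is invalid, the next form in the series if it is
         * valid, or null once the series is exhausted.
         */
        Form submit(Form form) {
            form.errors.clear();
            for (Map.Entry<String, String> entry : form.values.entrySet()) {
                if (entry.getValue().trim().isEmpty()) {
                    form.errors.put(entry.getKey(), "value required");
                }
            }
            if (!form.errors.isEmpty()) {
                return form;
            }
            index++;
            return index < series.size() ? series.get(index) : null;
        }
    }

    public static void main(String[] args) {
        Server server = new Server(
                new Form("connection", "connectString", "username"),
                new Form("job", "table"));

        Form form = server.submit(server.first()); // nothing filled in yet
        System.out.println(form.name + " errors: " + form.errors);

        form.values.put("connectString", "jdbc:mysql://localhost/test");
        form.values.put("username", "sqoop");
        form = server.submit(form); // valid, so the server moves on
        System.out.println("next form: " + form.name);
    }
}
```

How the client displays a returned form, all at once or input by input, is left open by this contract.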

One alternative to iterating over the inputs would be to use a library like
curses to do terminal manipulation. However, that would require native
code and would have a significant maintenance cost associated with it.


> *2. Dependency among options*
>
> Given one of design goals in Sqoop2 is easy to use, we should be able to
> guide users through various options by asking relevant questions based on
> their previous answers. For example, in an import job, we may ask different
> questions depending whether or not it is a Hive import.
>
> Hive import?
>  Yes --> Hive table name?
>  No --> no further question
>
> To do this, I think that we need some sort of dependency graph to represent
> options, and I think that a tree structure (thanks Bilung for your
> suggestion) makes more sense than a list. (The current implementation of
> MForm is a list.)
>
> If we decide to implement a dependency graph, another related question is how
> to collect dependency information from connector developers. A while ago, I
> played a bit with the connector interface, and one of suggestions from
> Arvind was use Java annotation to embed meta data such as label, max
> length, etc. For example:
>
>  @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
>  protected String connectString = null;
>
>  @ConnectionInput(name = "inp-conn-username")
>  protected String username = null;
>
>  @ConnectionInput(name = "inp-conn-password", hidden = true)
>  protected String password = null;
>
> (You can find full implementation at my github:
> http://github.sf.cloudera.com/cheolsoo/apache-sqoop/commit/af8dde141c3ae1e0e70f178a241171d36421aec7)
>
> Regardless whether we use annotation or another, it seems straightforward
> to embed meta data for individual inputs in the connector interface. But it
> is not clear to me how to embed meta data among inputs such as dependency
> information. I am wondering if anyone has a good suggestion about how to
> achieve this.
>

The dependency will be resolved dynamically by the server. For example, if
the user selects Hive import, there would be a follow-up form that collects
the details of the various options that the server requires to enable it
successfully.
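
As a rough illustration of server-side resolution, here is a sketch assuming a hypothetical rule table that maps an answered input to the inputs of a follow-up form. The FollowUpResolver class and the input names are invented for this example and are not part of Sqoop2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FollowUpResolver {
    /** Hypothetical rule: when `input` is answered with `value`, ask `followUpInputs`. */
    static final class Rule {
        final String input;
        final String value;
        final List<String> followUpInputs;

        Rule(String input, String value, List<String> followUpInputs) {
            this.input = input;
            this.value = value;
            this.followUpInputs = followUpInputs;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    FollowUpResolver register(String input, String value, String... followUpInputs) {
        rules.add(new Rule(input, value, Arrays.asList(followUpInputs)));
        return this;
    }

    /** Returns the inputs of the follow-up form for the given answers; empty if none apply. */
    List<String> resolve(Map<String, String> answers) {
        List<String> next = new ArrayList<>();
        for (Rule rule : rules) {
            // equalsIgnoreCase(null) is false, so unanswered inputs never match.
            if (rule.value.equalsIgnoreCase(answers.get(rule.input))) {
                next.addAll(rule.followUpInputs);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        FollowUpResolver resolver = new FollowUpResolver()
                .register("hiveImport", "yes", "hiveTable");

        Map<String, String> answers = new HashMap<>();
        answers.put("hiveImport", "yes");
        System.out.println(resolver.resolve(answers));
    }
}
```

With this approach the client never sees the dependency structure; it only receives whichever form the server decides to send next.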

Regards,
Arvind Prabhakar



>
> Thoughts?
>
> Thanks,
> Cheolsoo
>