Posted to mapreduce-issues@hadoop.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2009/09/24 20:19:16 UTC

[jira] Created: (MAPREDUCE-1036) An API Specification for Sqoop

An API Specification for Sqoop
------------------------------

                 Key: MAPREDUCE-1036
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1036
             Project: Hadoop Map/Reduce
          Issue Type: Task
          Components: contrib/sqoop
            Reporter: Aaron Kimball
            Assignee: Aaron Kimball


Over the last several months, Sqoop has evolved to a state that is functional and has room for extensions. Developing extensions requires a stable API and documentation. I am attaching to this ticket a description of Sqoop's design and internal APIs; the description includes some open questions. I would like to solicit input on the design regarding these open questions and then standardize the API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1036:
-------------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766023#action_12766023 ] 

Tom White commented on MAPREDUCE-1036:
--------------------------------------

> Abstract classes will be used instead of interfaces for all externally-implemented APIs.

+1

The document looks fine to me (nit: "delimeters" typo). I noticed that BigDecimalSerializer seems to be out of place - should it go in Common with Writables, or at least in a different package?


[jira] Updated: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1036:
-------------------------------------

    Attachment: MAPREDUCE-1036.patch

Attaching API document formatted as a patch against Sqoop's documentation. Modified document to take into account discussion thus far.


[jira] Updated: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1036:
-------------------------------------

    Attachment: sqoop-reference.txt

Attaching a draft of the API reference. After the open questions are discussed, I will upload a final version of this document formatted as a patch which extends the existing user-facing documentation.


[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762806#action_12762806 ] 

Aaron Kimball commented on MAPREDUCE-1036:
------------------------------------------

After ruminating on this for a while, here are the decisions I plan to implement:

* ImportOptions will contain a reference to the Configuration used when Sqoop's main() method was invoked. This will be the preferred mechanism to send manager-specific data from the command line (i.e., via {{-D k=v}} and ToolRunner/GenericOptionsParser).
* StreamHandlerFactory will be renamed to AsyncSink to be shorter. All child classes will be renamed as well.
* Abstract classes will be used instead of interfaces for all externally-implemented APIs. This affects ManagerFactory, ConnManager, and StreamHandlerFactory/AsyncSink. Going forward, interfaces will only be used for objects internally created via factories whose implementations are opaque to external clients, or to indicate the presence of a single method or behavior (e.g., {{Closeable}}).
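
To make the first and third decisions concrete, here is a minimal, self-contained sketch. It is not Sqoop's code: OptionsSketch, ManagerSketch, ExampleManager and the key sketch.fetch.size are hypothetical names, and a plain Map stands in for the Hadoop Configuration that GenericOptionsParser would normally populate. It only shows the shape of the idea: manager-specific settings arrive as {{-D k=v}} pairs, and an abstract base class can later gain a method with a default body without breaking external subclasses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ImportOptions: it carries the key/value settings
// that a Hadoop Configuration would hold. Not the real Sqoop class.
final class OptionsSketch {
  private final Map<String, String> conf = new HashMap<>();

  // Simplified imitation of "-D k=v" handling; the real parsing is done by
  // Hadoop's GenericOptionsParser, which this sketch does not use.
  static OptionsSketch fromArgs(String[] args) {
    OptionsSketch opts = new OptionsSketch();
    for (int i = 0; i + 1 < args.length; i++) {
      if (args[i].equals("-D")) {
        String kv = args[++i];
        int eq = kv.indexOf('=');
        if (eq > 0) {
          opts.conf.put(kv.substring(0, eq), kv.substring(eq + 1));
        }
      }
    }
    return opts;
  }

  String get(String key, String defaultValue) {
    return conf.getOrDefault(key, defaultValue);
  }
}

// Abstract base class rather than an interface: a method added in a later
// release can ship with a default body, so existing subclasses still compile.
abstract class ManagerSketch {
  protected final OptionsSketch options;

  protected ManagerSketch(OptionsSketch options) {
    this.options = options;
  }

  abstract String describeImport(String table);

  // Imagined later addition; reads a manager-specific (hypothetical) key.
  int fetchSize() {
    return Integer.parseInt(options.get("sketch.fetch.size", "1000"));
  }
}

// An externally written manager only has to implement the abstract method.
final class ExampleManager extends ManagerSketch {
  ExampleManager(OptionsSketch options) {
    super(options);
  }

  @Override
  String describeImport(String table) {
    return "import " + table + " with fetch size " + fetchSize();
  }
}

class ManagerSketchDemo {
  public static void main(String[] args) {
    OptionsSketch opts =
        OptionsSketch.fromArgs(new String[] {"-D", "sketch.fetch.size=50"});
    System.out.println(new ExampleManager(opts).describeImport("employees"));
  }
}
```

In this sketch ExampleManager never touches the command line itself; it reads whatever the options object carries, which is the role the decision above assigns to the Configuration reference.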



[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773604#action_12773604 ] 

Hudson commented on MAPREDUCE-1036:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #133 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/])
    


[jira] Updated: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1036:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I committed this. Thanks, Aaron!


[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766128#action_12766128 ] 

Aaron Kimball commented on MAPREDUCE-1036:
------------------------------------------

BigDecimalSerializer is used from within SqoopRecord classes auto-generated by Sqoop. Therefore it'll be used by client programs. Under the conventions outlined above, that means it belongs in o.a.h.sqoop.lib.

If it has broader applicability than Sqoop, then it may make sense to promote it elsewhere. But it's not a Writable in and of itself. (SqoopRecord uses BigDecimalSerializer to de/serialize BigDecimal fields within its own readFields/write methods.) To my knowledge, Hadoop doesn't currently have a policy with regard to which package external serializers go in. If you think it'd be more generally useful, I'm happy to move it into Common. Do you think it should just go in o.a.h.io?
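
For readers unfamiliar with the pattern, the relationship described above can be sketched as follows. This is an illustration only: DecimalCodecSketch and PriceRecordSketch are hypothetical names, and the wire format (scale, byte count, unscaled bytes) is an assumption chosen for the example, not necessarily what BigDecimalSerializer writes. The point is that the helper is a set of static methods called from a record's own write/readFields methods, rather than a Writable itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper; the byte layout is an assumption for illustration.
final class DecimalCodecSketch {
  private DecimalCodecSketch() { }

  // Writes the scale, then the length-prefixed two's-complement unscaled value.
  static void write(BigDecimal value, DataOutput out) throws IOException {
    byte[] unscaled = value.unscaledValue().toByteArray();
    out.writeInt(value.scale());
    out.writeInt(unscaled.length);
    out.write(unscaled);
  }

  // Reads the fields back in the same order they were written.
  static BigDecimal read(DataInput in) throws IOException {
    int scale = in.readInt();
    byte[] unscaled = new byte[in.readInt()];
    in.readFully(unscaled);
    return new BigDecimal(new BigInteger(unscaled), scale);
  }
}

// Stand-in for a generated record class with a single BigDecimal field.
final class PriceRecordSketch {
  BigDecimal price;

  void write(DataOutput out) throws IOException {
    DecimalCodecSketch.write(price, out);
  }

  void readFields(DataInput in) throws IOException {
    price = DecimalCodecSketch.read(in);
  }
}

class DecimalCodecDemo {
  // Serializes a record to bytes and reads it back into a fresh record.
  static BigDecimal roundTrip(BigDecimal value) {
    try {
      PriceRecordSketch src = new PriceRecordSketch();
      src.price = value;
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      src.write(new DataOutputStream(bytes));

      PriceRecordSketch dst = new PriceRecordSketch();
      dst.readFields(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return dst.price;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(new BigDecimal("12345.6789")));
  }
}
```

Because the helper takes only DataInput and DataOutput, nothing in it depends on the record class, which is why the question of where such a class should live can be decided independently of Sqoop.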


[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763259#action_12763259 ] 

Todd Lipcon commented on MAPREDUCE-1036:
----------------------------------------

Read over the doc just now. Looks good to me, and I agree with all of your postruminatory decisions. +1


[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763174#action_12763174 ] 

Aaron Kimball commented on MAPREDUCE-1036:
------------------------------------------

In the same spirit as the "Context" objects provided to Mapper.map() and Reducer.reduce(), I'm also going to modify ConnManager.importJob() to receive an ImportJobContext as its parameter. 
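
A rough sketch of what this context-parameter style looks like, under stated assumptions: ImportContextSketch, its two fields, and the manager classes below are hypothetical and are not the actual ImportJobContext or ConnManager definitions. The method returns a String here only so the example has something to print.

```java
// Hypothetical context object; the field names are illustrative assumptions.
final class ImportContextSketch {
  private final String tableName;
  private final String jarFile;

  ImportContextSketch(String tableName, String jarFile) {
    this.tableName = tableName;
    this.jarFile = jarFile;
  }

  String getTableName() {
    return tableName;
  }

  String getJarFile() {
    return jarFile;
  }
}

abstract class ImportingManagerSketch {
  // A single context parameter: more fields can be added to the context
  // later without changing this signature or breaking existing subclasses.
  abstract String importJob(ImportContextSketch context);
}

final class LoggingManagerSketch extends ImportingManagerSketch {
  @Override
  String importJob(ImportContextSketch context) {
    return "importing " + context.getTableName()
        + " using " + context.getJarFile();
  }
}

class ImportContextDemo {
  public static void main(String[] args) {
    ImportContextSketch ctx =
        new ImportContextSketch("employees", "employees.jar");
    System.out.println(new LoggingManagerSketch().importJob(ctx));
  }
}
```

The trade-off is the same one the Mapper and Reducer Context objects make: callers build one extra object, and in exchange the method signature stays stable as the set of inputs grows.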


[jira] Commented: (MAPREDUCE-1036) An API Specification for Sqoop

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768082#action_12768082 ] 

Hadoop QA commented on MAPREDUCE-1036:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422756/MAPREDUCE-1036.patch
  against trunk revision 827854.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/195/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/195/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/195/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/195/console

This message is automatically generated.
