Posted to dev@sqoop.apache.org by "Jarek Jarcec Cecho (JIRA)" <ji...@apache.org> on 2013/09/04 17:35:51 UTC

[jira] [Commented] (SQOOP-1072) Sqoop2: Abstract Input/Output interfaces

    [ https://issues.apache.org/jira/browse/SQOOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757877#comment-13757877 ] 

Jarek Jarcec Cecho commented on SQOOP-1072:
-------------------------------------------

I've started investigating this one and I would like to share my thoughts with other developers to get additional feedback.

I'm thinking about introducing a new second-level citizen object called HIO (Hadoop input/output). Such an object would be similar to a small connector: it would have its own configuration, validations and upgraders. Each HIO would cover one specific input (export) or output (import) on the Hadoop side. For example, I would imagine HDFS, HCatalog, HBase or Hive HIO implementations. I think of HIO implementations as second-level citizens because I would not expect users or developers to create a new HIO often. Yet I believe that cleanly separating each HIO implementation into its own Maven module that encapsulates the functionality will help us achieve more readable and maintainable code (unlike Sqoop 1.x). Unlike connectors, I would expect HIOs to be more tightly integrated with Sqoop internals and to be more of an internal abstraction than something entirely exposed to the end user.
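
To make the idea a bit more concrete, here is a rough Java sketch of what a HIO contract could look like. Everything in it is an assumption for illustration only: the names HioSketch, HadoopIO, Direction and HdfsIO, the form name "hdfs-output" and the configuration keys "outputDirectory" and "outputDir" are made up and do not exist in Sqoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Sqoop.
final class HioSketch {

    /** Which side of a transfer a HIO serves, seen from Hadoop. */
    enum Direction { INPUT_FOR_EXPORT, OUTPUT_FOR_IMPORT }

    /** Assumed contract: each HIO owns its forms, validation and upgrade logic. */
    interface HadoopIO {
        String name();
        List<Direction> directions();
        List<String> formNames();
        /** Returns human-readable validation errors; empty when the config is valid. */
        List<String> validate(Map<String, String> config);
        /** Converts a config stored by an older version into the current layout. */
        Map<String, String> upgrade(Map<String, String> oldConfig);
    }

    /** Toy HDFS implementation; the form name and config keys are made up. */
    static final class HdfsIO implements HadoopIO {
        public String name() { return "hdfs"; }

        public List<Direction> directions() {
            return Arrays.asList(Direction.INPUT_FOR_EXPORT, Direction.OUTPUT_FOR_IMPORT);
        }

        public List<String> formNames() { return Arrays.asList("hdfs-output"); }

        public List<String> validate(Map<String, String> config) {
            List<String> errors = new ArrayList<>();
            String dir = config.get("outputDirectory");
            if (dir == null || dir.isEmpty()) {
                errors.add("outputDirectory must be set");
            }
            return errors;
        }

        public Map<String, String> upgrade(Map<String, String> oldConfig) {
            Map<String, String> upgraded = new HashMap<>(oldConfig);
            // Pretend an older version stored the directory under "outputDir".
            String legacy = upgraded.remove("outputDir");
            if (legacy != null && !upgraded.containsKey("outputDirectory")) {
                upgraded.put("outputDirectory", legacy);
            }
            return upgraded;
        }
    }
}
```

The point of the sketch is only that configuration, validation and upgrade live together inside one module per HIO, the same way they do for a connector.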

Having said all the nice words, I do not have a simple path in mind for achieving that. Sqoop currently has only one framework entity encapsulating all configuration, validations and upgrades. We could potentially load all HIO modules on server startup and merge them into one structure that would then be used everywhere else. However, I would assume that such a merge could be quite tricky: we would have to ensure that form names are unique, and validations and upgrades could easily become a nightmare. On the bright side, such a merge would require quite isolated changes, so the initial implementation would most likely be quite simple. Another approach would be to make the HIO a real second-level citizen and promote the structures everywhere, e.g. represent them separately in the repository, let the user explicitly choose which HIO should be used in a job (protocol + client change), etc. This second approach would be very intrusive, as almost every aspect of Sqoop would have to be altered. On the other hand, I would expect that we would end up with a much cleaner design, as all top-level entities would be clearly separated.
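
As a minimal illustration of the first (merge) approach, the uniqueness check on form names could look roughly like the Java sketch below. The class HioMergeSketch, the method mergeForms and the module and form names used with it are hypothetical, and the real framework structures are more involved than a map of strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: not an existing Sqoop class.
final class HioMergeSketch {

    /**
     * Merges the form names contributed by each HIO module into a single
     * structure (form name -> owning module) and fails fast on a clash.
     */
    static Map<String, String> mergeForms(Map<String, List<String>> formsByModule) {
        Map<String, String> owners = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> module : formsByModule.entrySet()) {
            for (String form : module.getValue()) {
                // putIfAbsent returns the existing owner when the name is already taken.
                String previous = owners.putIfAbsent(form, module.getKey());
                if (previous != null) {
                    throw new IllegalStateException("Form '" + form + "' is defined by both '"
                        + previous + "' and '" + module.getKey() + "'");
                }
            }
        }
        return owners;
    }
}
```

Failing at server startup when two modules claim the same form name keeps the check isolated, but it does nothing for the harder part mentioned above: routing validations and upgrades of the merged structure back to the module that owns each form.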

I would be interested to hear the thoughts of other contributors on which path would be preferable. I'll be more than happy to put together a more formal proposal for the aggressive (second) path if necessary.
                
> Sqoop2: Abstract Input/Output interfaces
> ----------------------------------------
>
>                 Key: SQOOP-1072
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1072
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.99.2
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 2.0.0
>
>
> The input/output interfaces like {{Text}} or {{SequenceFile}} are currently hardcoded and appear throughout the entire code base. It would be great to abstract the I/O module, similarly to what we do for connectors, and push the appropriate code into separate modules.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira