Posted to dev@bigtop.apache.org by "Roman Shaposhnik (Updated) (JIRA)" <ji...@apache.org> on 2012/01/27 20:52:09 UTC

[jira] [Updated] (BIGTOP-358) now that hadoop packages have been split we have to update the dependencies on the downstream packages

     [ https://issues.apache.org/jira/browse/BIGTOP-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Shaposhnik updated BIGTOP-358:
------------------------------------

    Attachment: bigtop.dot
                bigtop.png

The attached dot and png files show what I have figured out so far (rectangular boxes represent capabilities that will be provided by actual packages, and dotted lines represent "optional/recommended" dependencies). I still have a few concerns:

1. I think it is pretty clear by now that the mapreduce dependency has to be on a capability, not an actual package (and then we'll have hadoop-mapreduce declare "Provides:" for that capability). The question is whether we are ready to do the same with hadoop-hdfs, and what those capabilities should be called. My proposal is to call them "mapreduce" and "dfs" respectively, and to make the actual packages hadoop-mapreduce and hadoop-hdfs provide those capabilities for now.

2. For pig, hive, sqoop and mahout the real hard dependency is mapreduce. The dependency on dfs is optional (they can run just fine in local mode without ever talking to HDFS). The question is -- what's the best mechanism to "recommend" dfs? I know we can do that with Debian packages (Recommends tag), but what about RPM? Finally, are we doing the right thing here by treating dfs as an optional dependency, or should we enforce it to begin with?

3. HBase is a weird case here -- at the Maven level they package all of their dependencies (optional or not) into lib/*, so they end up with a whole bunch of jars there that we're currently replacing with symlinks. Not all of those dependencies are needed by HBase in all cases (in fact the only hard dependency there is ZooKeeper), but having dangling symlinks doesn't seem appealing. The question is -- what do we do?
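
To make points 1 and 2 concrete, here is a minimal sketch of how a resolver might treat capabilities and optional dependencies. It is a toy model, not how rpm or dpkg actually resolve anything: the Package class and the providers and install_plan functions are hypothetical names invented for illustration. Only the package names (hadoop-mapreduce, hadoop-hdfs, pig) and the proposed capability names ("mapreduce", "dfs") come from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Toy package record (hypothetical, for illustration only)."""
    name: str
    provides: set = field(default_factory=set)    # capabilities this package offers
    requires: set = field(default_factory=set)    # hard dependencies (capability names)
    recommends: set = field(default_factory=set)  # optional dependencies


def providers(capability, repo):
    """Return the sorted names of packages that satisfy a capability."""
    return sorted(
        p.name for p in repo
        if capability == p.name or capability in p.provides
    )


def install_plan(name, repo, with_recommends=False):
    """Return the sorted package names needed to install `name`."""
    by_name = {p.name: p for p in repo}
    plan, todo = set(), [name]
    while todo:
        current = by_name[todo.pop()]
        if current.name in plan:
            continue
        plan.add(current.name)
        # Hard dependencies must be satisfiable by some package.
        for cap in current.requires:
            found = providers(cap, repo)
            if not found:
                raise LookupError(f"nothing provides {cap!r}")
            todo.append(found[0])
        # Recommended dependencies are pulled in only on request,
        # and are silently skipped when nothing provides them.
        if with_recommends:
            for cap in current.recommends:
                found = providers(cap, repo)
                if found:
                    todo.append(found[0])
    return sorted(plan)


REPO = [
    Package("hadoop-mapreduce", provides={"mapreduce"}),
    Package("hadoop-hdfs", provides={"dfs"}),
    Package("pig", requires={"mapreduce"}, recommends={"dfs"}),
]

print(install_plan("pig", REPO))
print(install_plan("pig", REPO, with_recommends=True))
```

Because pig depends on the capability rather than on hadoop-mapreduce by name, any other package that declares the "mapreduce" capability could satisfy it without touching pig's own metadata.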
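
For point 3, whatever we decide, it would help to be able to see which symlinks in a lib directory are actually dangling on a given install. A rough sketch, assuming a flat directory of jar symlinks; the function name dangling_symlinks is made up for this example and is not part of any Bigtop tooling:

```python
import os


def dangling_symlinks(directory):
    """Return the sorted names of symlinks in `directory` whose targets are missing."""
    dangling = []
    for entry in os.scandir(directory):
        # os.path.exists() follows the link, so it is False for a broken one.
        if entry.is_symlink() and not os.path.exists(entry.path):
            dangling.append(entry.name)
    return sorted(dangling)
```

Pointing this at HBase's lib directory after installing only the hard dependencies would list exactly the jars whose providing packages are absent.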
                
> now that hadoop packages have been split we have to update the dependencies on the downstream packages
> ------------------------------------------------------------------------------------------------------
>
>                 Key: BIGTOP-358
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-358
>             Project: Bigtop
>          Issue Type: Bug
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>         Attachments: bigtop.dot, bigtop.png
>
>
> This is actually slightly more complicated than it sounds: it is pretty straightforward to replace a dependency on hadoop with a dependency on hadoop-mapreduce, but it is less clear what to do with HDFS. Strictly speaking, HDFS is not a hard dependency (one can run on a local filesystem just fine).
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira