Posted to dev@bigtop.apache.org by Konstantin Boudnik <co...@apache.org> on 2011/10/01 01:38:11 UTC
Re: Deciding on the Hadoop version for Bigtop 0.2.0

Roman,

in the proposed form, choice 1) is pretty much the obvious one. However, the great
idea of BigTop (or rather its intellectual ancestor iTest) is to be able to tailor
BigData stacks to your own liking (for reference, here's the early slide deck
we put together on iTest: http://www.scribd.com/doc/63012489/Big-Data-Stacks-Validation).

IMO this is a very important feature, and just from the point of view of
proving the versatility of the concept it makes sense to go with option 2)
below, even if the stack in question is limited to a somewhat smaller number of
components. For this option it'd be great to have more components on board. I
have looked through the list of JIRAs below, but they look like placeholders
for now and don't provide much info about what exactly the problem is. Say, I
took a look at the latest failing build of Sqoop and it seems to be a
dependency resolution problem:

[ivy:resolve]   io problem while parsing ivy file:
http://repo1.maven.org/maven2/org/apache/velocity/velocity/1.7/velocity-1.7.pom:
Resetting to invalid mark
[ivy:resolve]       module not found: org.apache.velocity#velocity;1.7

which might be an easy thing to fix. I can take a look at this and perhaps fix
it.
I have also updated the Oozie ticket: it is another dependency problem, which
needs to be looked at by some of the Oozie experts.

I wouldn't do anything about 3) either - this project can't fix 0.20.2xx
single-handedly unless there's buy-in from the wider Hadoop community.

However, if the consensus is to stick with 0.20.2 (which is currently the most
widely accepted Hadoop version, or so I believe) for the purpose of the 0.2.0
release, let's have 0.3.0 right after it for Hadoop 0.22 - I don't see any
problem with multiple releases chasing each other like that.

Hope it makes sense ;)
  Cos

On Fri, Sep 30, 2011 at 12:06PM, Roman Shaposhnik wrote:
> Hi!
> 
> By now it is obvious that the decision on what version of Hadoop to use
> for Bigtop 0.2.0 is not going to be an easy one ;-)
> 
> So far we've got the following choices:
>    1. Do nothing and stick with 0.20.2. This is not all that bad, I suppose,
>        we've got tons of updates to the Bigtop to justify a release. But it
>        feels a little bit anticlimactic in a sense that Bigtop 0.3.0 is very
>        likely to be on top of 0.23 and I really would like to have a chance
>        of having a Bigtop release on top of the last MR1.
>    2. Target upcoming Hadoop 0.22. This is going to break all of the
>        downstream except for HBase. Here's a list of JIRAs filed:
>            https://issues.apache.org/jira/browse/SQOOP-354
>            https://issues.apache.org/jira/browse/PIG-2277
>            https://issues.apache.org/jira/browse/OOZIE-565
>            https://issues.apache.org/jira/browse/HIVE-2468
>            https://issues.apache.org/jira/browse/MAHOUT-822
>         As you can see, most of the downstream is OK with
>         implementing the changes in time for .23, but not necessarily
>         in time for .22 and Bigtop 0.2.0.
>    3. Target 0.20.205.0. The downside there is that unless the following
>         build issues are resolved, we can't even compile it on the platforms
>         Bigtop cares about:
>              https://issues.apache.org/jira/browse/HADOOP-6436
>              https://issues.apache.org/jira/browse/MAPREDUCE-2127
>              https://issues.apache.org/jira/browse/HDFS-2327
> 
> For #2 and #3 we can, potentially, mitigate the timing issue by patching
> the downstream locally at the Bigtop level. Do we want to entertain such
> an idea?
> 
> Anyway, please chime in with your thoughts. It would be very nice to
> have Bigtop 0.2.0 fully tested and out sometime around first weeks of
> Nov. As such we have to make a decision on Hadoop rather soon.
> 
> Thanks,
> Roman.