Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@gmail.com> on 2009/11/10 21:30:57 UTC

Should I upgrade from 0.18.3 to the latest 0.20.1?

Hi,

I've been working on my project for about a year, and I decided to upgrade
from 0.18.3 (which was stable and already old even back then). I have
started, but I see that many classes have changed, many are deprecated, and
I need to re-write some code. Is it worth it? What are the advantages of
doing this? Other areas of concern are:

   - Will Amazon EMR work with the latest Hadoop?
   - What about Cloudera distribution or Yahoo distribution?

Thank you,
Mark

Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

Posted by Mark Kerzner <ma...@gmail.com>.
Thank you to all who answered on this thread. From your answers, it feels
like I will be OK if I run 0.20.1 on my workstation without changing the
code or removing the deprecated API calls. That way I get the performance
improvements of 0.20.1 but avoid the additional work. My API calls
are pretty standard and straightforward.

It will then still work on EMR, and for my own clusters I will use Cloudera
or Yahoo distributions.

Again, thank you.

Mark



On Tue, Nov 10, 2009 at 8:11 PM, Matt Massie <ma...@cloudera.com> wrote:

> Hi Mark-
>
> Currently Amazon's EMR only runs Hadoop 0.18.3.
>
> Cloudera Distribution for Hadoop has patched/tested packages for both Hadoop
> 0.18.3 and Hadoop 0.20.1 (as well as Pig, Hive, HBase and Zookeeper).  CDH2
> was released in August of this year as a "testing" release.  We expect to
> promote it to "stable" in 4-6 weeks.  You can learn more at
> http://archive.cloudera.com/docs/ or feel free to contact me directly
> off-list.
>
> -Matt

Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

Posted by Matt Massie <ma...@cloudera.com>.
Hi Mark-

Currently Amazon's EMR only runs Hadoop 0.18.3.

Cloudera Distribution for Hadoop has patched/tested packages for both Hadoop
0.18.3 and Hadoop 0.20.1 (as well as Pig, Hive, HBase and Zookeeper).  CDH2
was released in August of this year as a "testing" release.  We expect to
promote it to "stable" in 4-6 weeks.  You can learn more at
http://archive.cloudera.com/docs/ or feel free to contact me directly
off-list.

-Matt


Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

Posted by Edmund Kohlwey <ek...@gmail.com>.
The new API in 0.20.x is likely not what you'll see in the final Hadoop 
1.0 release, which I've heard some people forecast within the next 18 
months or so (we'll see). There will likely be a 0.21.x series, and then 
the final release.

That having been said, it's much closer than the old API to what you'll see in the 
final release. Depending on how complex your jobs are, you may see minor 
or no changes in the final release, or you may see dramatic ones. I 
think (someone correct me if I'm wrong) the basic map and reduce 
abstract classes are just about set in stone, but if you're using other 
stuff like file formats, custom splits, etc. then you may see a lot of 
differences. I've also noticed a lot of changes in how the job and task 
trackers work, even in the current trunk. There's also some interesting 
work being done by Yahoo on pipelining MR jobs, which will not be in any 
0.20.x release.

The other thing about 0.20.x is that a lot of the old API (like joins, 
etc.) has not been updated, so your application may end up as a 
patchwork of the two APIs.

Are there any portions of the new API which are particularly attractive 
to you? That might help people suggest whether or not you should switch 
to satisfy that need. If you don't have any needs particular to the 
0.20.x API then there's probably little reason to switch.

If you do upgrade to 0.20.1, make sure to get the Cloudera or Yahoo 
distributions. The current "stable" (0.20.1) release on the Apache page 
is very buggy.



Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

Posted by Scott Carey <sc...@richrelevance.com>.
The old API may be deprecated, but it still works and you don't have to change your code yet.

A later release will remove the old API altogether.  0.20.1+ is a good place to run your old code and learn some of the newer stuff.

There may be many other features you can make good use of (schedulers, multiple tasks per JVM, general performance improvements, etc.).
If those features are compelling enough, you might consider an upgrade.
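
As a rough illustration of two of the features mentioned above (pluggable schedulers and multiple tasks per JVM), a mapred-site.xml fragment along these lines should turn them on in 0.20.x. This is only a sketch: the values are placeholders, not recommendations, and the fair scheduler setting assumes its contrib jar has already been placed on the JobTracker's classpath.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Reuse each task JVM for several tasks of the same job.
       1 (the default) disables reuse; -1 means no limit. -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
  </property>

  <!-- Replace the default FIFO scheduler with the fair scheduler.
       Assumption: the fair scheduler contrib jar is installed. -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
</configuration>
```

Neither setting requires touching job code, so both can be tried while still running jobs written against the old (deprecated) API.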

