Posted to common-user@hadoop.apache.org by Teruhiko Kurosaka <Ku...@basistech.com> on 2011/07/15 01:33:53 UTC

Which release to use?

I'm a newbie and I am confused by the Hadoop releases.
I thought 0.21.0 was the latest and greatest release that I
should be using, but I noticed that 0.20.203 was released
recently, and 0.21.X is marked "unstable, unsupported".

Should I be using 0.20.203?
----
T. "Kuro" Kurosaka



RE: Which release to use?

Posted by Michael Segel <mi...@hotmail.com>.
See, I knew there was something that I forgot. 

It all goes back to the question ... 'which release to use'... 

2 years ago it was a very simple decision. Now, not so much. :-)

And while Arun and Owen work for a vendor, I do not, and I try to follow each company and its offerings.

As Hadoop goes mainstream, the question of which vendor to choose gets interesting. 
Just like in the 90's during the database vendor wars, it looks like the vendor who has the best sales force and PR will win.
(Not necessarily the best product.)

JMHO

-Mike


> Date: Fri, 15 Jul 2011 16:25:55 -0500
> Subject: Re: Which release to use?
> From: markkerzner@gmail.com
> To: common-user@hadoop.apache.org
> 
> Steve,
> 
> this is so well said, do you mind if I repeat it here,
> http://shmsoft.blogspot.com/2011/07/hadoop-commercial-support-options.html
> 
> Thank you,
> Mark
> 
> On Fri, Jul 15, 2011 at 4:00 PM, Steve Loughran <st...@apache.org> wrote:
> 
> > On 15/07/2011 15:58, Michael Segel wrote:
> >
> >>
> >> Unfortunately the picture is a bit more confusing.
> >>
> >> Yahoo! is now HortonWorks. Their stated goal is to not have their own
> >> derivative release but to sell commercial support for the official Apache
> >> release.
> >> So those selling commercial support are:
> >> *Cloudera
> >> *HortonWorks
> >> *MapRTech
> >> *EMC (reselling MapRTech, but had announced their own)
> >> *IBM (not sure what they are selling exactly... still seems like smoke and
> >> mirrors...)
> >> *DataStax
> >>
> >
> > + Amazon, indirectly, that do their own derivative work of some release of
> > Hadoop (which version is it based on?)
> >
> > I've used 0.21, which was the first with the new APIs and, with MRUnit, has
> > the best test framework. For my small-cluster uses, it worked well. (oh, and
> > I didn't care about security)
> >
> >
> >

Re: Which release to use?

Posted by Mark Kerzner <ma...@gmail.com>.
Steve,

this is so well said, do you mind if I repeat it here,
http://shmsoft.blogspot.com/2011/07/hadoop-commercial-support-options.html

Thank you,
Mark

On Fri, Jul 15, 2011 at 4:00 PM, Steve Loughran <st...@apache.org> wrote:

> On 15/07/2011 15:58, Michael Segel wrote:
>
>>
>> Unfortunately the picture is a bit more confusing.
>>
>> Yahoo! is now HortonWorks. Their stated goal is to not have their own
>> derivative release but to sell commercial support for the official Apache
>> release.
>> So those selling commercial support are:
>> *Cloudera
>> *HortonWorks
>> *MapRTech
>> *EMC (reselling MapRTech, but had announced their own)
>> *IBM (not sure what they are selling exactly... still seems like smoke and
>> mirrors...)
>> *DataStax
>>
>
> + Amazon, indirectly, that do their own derivative work of some release of
> Hadoop (which version is it based on?)
>
> I've used 0.21, which was the first with the new APIs and, with MRUnit, has
> the best test framework. For my small-cluster uses, it worked well. (oh, and
> I didn't care about security)
>
>
>

Re: Which release to use?

Posted by Steve Loughran <st...@apache.org>.
On 15/07/2011 15:58, Michael Segel wrote:
>
> Unfortunately the picture is a bit more confusing.
>
> Yahoo! is now HortonWorks. Their stated goal is to not have their own derivative release but to sell commercial support for the official Apache release.
> So those selling commercial support are:
> *Cloudera
> *HortonWorks
> *MapRTech
> *EMC (reselling MapRTech, but had announced their own)
> *IBM (not sure what they are selling exactly... still seems like smoke and mirrors...)
> *DataStax

+ Amazon, indirectly, that do their own derivative work of some release 
of Hadoop (which version is it based on?)

I've used 0.21, which was the first with the new APIs and, with MRUnit, 
has the best test framework. For my small-cluster uses, it worked well. 
(oh, and I didn't care about security)


Re: Which release to use?

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
19.07.11 14:50, Steve Loughran wrote:
> On 19/07/11 12:44, Rita wrote:
>> Arun,
>>
>> I second Joe's comment.
>> Thanks for giving us a heads up.
>> I will wait patiently until 0.23 is considered stable.
>>
>
> API-wise, 0.21 is better. I know that as I'm working with 0.20.203 
> right now, and it is a step backwards.
>
> Regarding future releases, the best way to get it stable is 
> participate in release testing in your own infrastructure. Nothing 
> else will find the problems unique to your setup of hardware, network 
> and software
>

My little Hadoop adoption story (or why I won't test 0.23):
I am among those who assume that the latest release is the supported one, 
and so we went the 0.21 way.
BTW: I've tried to find a release roadmap, but could not find 
anything up to date.
We are using HDFS without Map/Reduce.
As far as I can see now, 0.21 is nowhere near beta quality, with non-working 
new features like backup node or append. Also, there is no option for 
such unlucky people to back off to 0.20 (at least a "hadoop downgrade" 
search does not give any good results).
I have already filed 5 tickets in Jira, 3 of them with patches. On two 
there is no activity at all; on the other three, the answer is the latest 
non-autogenerated message (and is over 3 weeks old).
I sent a few messages to this list and one to hdfs-user. No answers.
With this level of project activity, I can't afford to test a thing that 
has not even reached the 0.21 quality level yet. If I have any problems, I 
can't afford to wait for months to be heard.
I am more or less stable on my own patched 0.21 for now and will either 
move forward if I see more project activity or move somewhere else 
if it becomes "less stable".

Best regards, Vitalii Tymchyshyn

Re: Which release to use?

Posted by Steve Loughran <st...@apache.org>.
On 19/07/11 12:44, Rita wrote:
> Arun,
>
> I second Joe's comment.
> Thanks for giving us a heads up.
> I will wait patiently until 0.23 is considered stable.
>

API-wise, 0.21 is better. I know that as I'm working with 0.20.203 right 
now, and it is a step backwards.

Regarding future releases, the best way to get them stable is to participate 
in release testing in your own infrastructure. Nothing else will find 
the problems unique to your setup of hardware, network and software.


Re: Which release to use?

Posted by Rita <rm...@gmail.com>.
Arun,

I second Joe's comment.
Thanks for giving us a heads up.
I will wait patiently until 0.23 is considered stable.





-- 
--- Get your facts first, then you can distort them as you please.--

Re: Which release to use?

Posted by Joe Stein <ch...@allthingshadoop.com>.
Arun,

Thanks for the update.  

Again, I hate to have to play the part of captain obvious.

Glad to hear the same contiguous mantra for this next release.  I think sometimes the plebeians ( of which I am one ) need that affirmation. 

One love, Apache Hadoop!

/*
Joe Stein
http://www.medialets.com
Twitter: @allthingshadoop
*/


Re: Which release to use?

Posted by Arun Murthy <ac...@hortonworks.com>.
Joe,

The dev community is currently gearing up for hadoop-0.23 off trunk.

0.23 is a massive step forward with HDFS Federation, NextGen
MapReduce and possibly others such as wire-compat and HA NameNode.

In a couple of weeks I plan to create the 0.23 branch off trunk and we
will then spend all our energies stabilizing & pushing the release out.
Please see my note to general@ for more details.

Arun

On Jul 18, 2011, at 7:01 PM, Joe Stein <ch...@allthingshadoop.com> wrote:

> So, last I checked this list was about Apache Hadoop not about derivative works.
>
> The Cloudera team has always been diligent (you rock) about redirecting non apache CDH releases to their list for answers.
>
> I commend those supporting apache releases of Hadoop too, very cool!!!
>
> But yeah, even I have to ask what the latest release will be.  Is there going to be a single Hadoop release or a continued branch that Horton maintains and will only support?
>
> There is something to be said for release from trunk that gets everyone on the same page towards our common goals.  You can pin the "state the obvious" paper on my back but kinda feel it had to be said.
>
> One love, Apache Hadoop!
>
> /*
> Joe Stein
> http://www.medialets.com
> Twitter: @allthingshadoop
> */
>
> On Jul 18, 2011, at 9:51 PM, Michael Segel <mi...@hotmail.com> wrote:
>
>>
>>
>>
>>> Date: Mon, 18 Jul 2011 18:19:38 -0700
>>> Subject: Re: Which release to use?
>>> From: mcsrivas@gmail.com
>>> To: common-user@hadoop.apache.org
>>>
>>> Mike,
>>>
>>> Just a minor inaccuracy in your email. Here's setting the record straight:
>>>
>>> 1. MapR directly sells their distribution of Hadoop. Support is from  MapR.
>>> 2. EMC also sells the MapR distribution, for use on any hardware. Support is
>>> from EMC worldwide.
>>> 3. EMC also sells a Hadoop appliance, which has the MapR distribution
>>> specially built for it. Support is from EMC.
>>>
>>> 4. MapR also has a free, unlimited, unrestricted version called M3, which
>>> has the same 2-5x performance, management and stability improvements, and
>>> includes NFS. It is not crippleware, and the unlimited, unrestricted, free
>>> use does not expire on any date.
>>>
>>> Hope that clarifies what MapR is doing.
>>>
>>> thanks & regards,
>>> Srivas.
>>>
>> Srivas,
>>
>> I'm sorry, I thought I was being clear in that I was only addressing EMC and not MapR directly.
>> I was responding to a post about EMC selling a Greenplum appliance. I wanted to point out that EMC will resell MapR's release along with their own (EMC) support.
>>
>> The point I was trying to make was that with respect to derivatives of Hadoop, I believe that MapR has a more compelling story than either EMC or DataStax. IMHO replacing Java HDFS with either Greenplum or Cassandra has a limited market. When a company is going to look at an M/R solution, cost and performance are going to be at the top of the list. MapR isn't cheap, but if you look at the features in M5, if they work, then you have a very compelling reason to look at their release. Some of the people I spoke to when I was in Santa Clara were in the beta program. They indicated that MapR did what they claimed.
>>
>> Things are definitely starting to look interesting.
>>
>> -Mike
>>
>>> On Mon, Jul 18, 2011 at 11:33 AM, Michael Segel
>>> <mi...@hotmail.com>wrote:
>>>
>>>>
>>>> EMC has inked a deal with MapRTech to resell their release and support
>>>> services for MapRTech.
>>>> Does this mean that they are going to stop selling their own release on
>>>> Greenplum? Maybe not in the near future, however,
>>>> a Greenplum appliance may not get the customer transaction that their
>>>> reselling of MapR will generate.
>>>>
>>>> It sounds like they are hedging their bets and are taking an 'IBM'
>>>> approach.
>>>>
>>>>
>>>>> Subject: RE: Which release to use?
>>>>> Date: Mon, 18 Jul 2011 08:30:59 -0500
>>>>> From: Jeff.Schmitz@shell.com
>>>>> To: common-user@hadoop.apache.org
>>>>>
>>>>> Steve,
>>>>>
>>>>> I read your blog, nice post - I believe EMC is selling the Greenplum
>>>>> solution as an appliance -
>>>>>
>>>>> Cheers -
>>>>>
>>>>> Jeffery
>>>>>
>>>>> -----Original Message-----
>>>>> From: Steve Loughran [mailto:stevel@apache.org]
>>>>> Sent: Friday, July 15, 2011 4:07 PM
>>>>> To: common-user@hadoop.apache.org
>>>>> Subject: Re: Which release to use?
>>>>>
>>>>> On 15/07/2011 18:06, Arun C Murthy wrote:
>>>>>> Apache Hadoop is a volunteer driven, open-source project. The
>>>>>> contributors to Apache Hadoop, both individuals and folks across a
>>>>>> diverse set of organizations, are committed to driving the project
>>>>>> forward and making timely releases - see discussion on hadoop-0.23 with
>>>>>> a raft of newer features such as HDFS Federation, NextGen MapReduce and
>>>>>> plans for HA NameNode etc.
>>>>>>
>>>>>> As with most successful projects there are several options for
>>>>>> commercial support to Hadoop or its derivatives.
>>>>>>
>>>>>> However, Apache Hadoop has thrived before there was any commercial
>>>>>> support (I've personally been involved in over 20 releases of Apache
>>>>>> Hadoop and deployed them while at Yahoo) and I'm sure it will in this
>>>>>> new world order.
>>>>>>
>>>>>> We, the Apache Hadoop community, are committed to keeping Apache
>>>>>> Hadoop 'free', providing support to our users and to move it forward at
>>>>>> a rapid rate.
>>>>>>
>>>>>
>>>>> Arun makes a good point which is that the Apache project depends on
>>>>> contributions from the community to thrive. That includes
>>>>>
>>>>> -bug reports
>>>>> -patches to fix problems
>>>>> -more tests
>>>>> -documentation improvements: more examples, more on getting started,
>>>>> troubleshooting, etc.
>>>>>
>>>>> If there's something lacking in the codebase, and you think you can fix
>>>>> it, please do so. Helping with the documentation is a good start, as it
>>>>> can be improved, and you aren't going to break anything.
>>>>>
>>>>> Once you get into changing the code, you'll end up working with the head
>>>>> of whichever branch you are targeting.
>>>>>
>>>>> The other area everyone can contribute on is testing. Yes, Y! and FB can
>>>>> test at scale, yes, other people can test large clusters too -but nobody
>>>>> has a network that looks like yours but you. And Hadoop does care about
>>>>> network configurations. Testing beta and release candidate releases in
>>>>> your infrastructure, helps verify that the final release will work on
>>>>> your site, and you don't end up getting all the phone calls about
>>>>> something not working
>>>>>
>>>>>
>>>>
>>

Re: Which release to use?

Posted by Joe Stein <ch...@allthingshadoop.com>.
So, last I checked, this list was about Apache Hadoop, not about derivative works.

The Cloudera team has always been diligent (you rock) about redirecting questions on non-Apache CDH releases to their own list for answers.

I commend those supporting Apache releases of Hadoop too, very cool!!!

But yeah, even I have to ask what the latest release will be.  Is there going to be a single Hadoop release or a continued branch that Hortonworks maintains and will only support?

There is something to be said for a release from trunk that gets everyone on the same page towards our common goals.  You can pin the "state the obvious" paper on my back, but I kinda feel it had to be said.

One love, Apache Hadoop!

/*
Joe Stein
http://www.medialets.com
Twitter: @allthingshadoop
*/

On Jul 18, 2011, at 9:51 PM, Michael Segel <mi...@hotmail.com> wrote:

> 
> 
> 
>> Date: Mon, 18 Jul 2011 18:19:38 -0700
>> Subject: Re: Which release to use?
>> From: mcsrivas@gmail.com
>> To: common-user@hadoop.apache.org
>> 
>> Mike,
>> 
>> Just a minor inaccuracy in your email. Here's setting the record straight:
>> 
>> 1. MapR directly sells their distribution of Hadoop. Support is from  MapR.
>> 2. EMC also sells the MapR distribution, for use on any hardware. Support is
>> from EMC worldwide.
>> 3. EMC also sells a Hadoop appliance, which has the MapR distribution
>> specially built for it. Support is from EMC.
>> 
>> 4. MapR also has a free, unlimited, unrestricted version called M3, which
>> has the same 2-5x performance, management and stability improvements, and
>> includes NFS. It is not crippleware, and the unlimited, unrestricted, free
>> use does not expire on any date.
>> 
>> Hope that clarifies what MapR is doing.
>> 
>> thanks & regards,
>> Srivas.
>> 
> Srivas,
> 
> I'm sorry, I thought I was being clear in that I was only addressing EMC and not MapR directly.
> I was responding to a post about EMC selling a Greenplum appliance. I wanted to point out that EMC will resell MapR's release along with their own (EMC) support.
> 
> The point I was trying to make was that with respect to derivatives of Hadoop, I believe that MapR has a more compelling story than either EMC or DataStax. IMHO replacing Java HDFS with either Greenplum or Cassandra has a limited market. When a company is going to look at an M/R solution, cost and performance are going to be at the top of the list. MapR isn't cheap, but if you look at the features in M5, if they work, then you have a very compelling reason to look at their release. Some of the people I spoke to when I was in Santa Clara were in the beta program. They indicated that MapR did what they claimed.
> 
> Things are definitely starting to look interesting.
> 
> -Mike
> 
>> On Mon, Jul 18, 2011 at 11:33 AM, Michael Segel
>> <mi...@hotmail.com>wrote:
>> 
>>> 
>>> EMC has inked a deal with MapRTech to resell their release and support
>>> services for MapRTech.
>>> Does this mean that they are going to stop selling their own release on
>>> Greenplum? Maybe not in the near future, however,
>>> a Greenplum appliance may not get the customer transaction that their
>>> reselling of MapR will generate.
>>> 
>>> It sounds like they are hedging their bets and are taking an 'IBM'
>>> approach.
>>> 
>>> 
>>>> Subject: RE: Which release to use?
>>>> Date: Mon, 18 Jul 2011 08:30:59 -0500
>>>> From: Jeff.Schmitz@shell.com
>>>> To: common-user@hadoop.apache.org
>>>> 
>>>> Steve,
>>>> 
>>>> I read your blog nice post - I believe EMC is selling the Greenplum
>>>> solution as an appliance -
>>>> 
>>>> Cheers -
>>>> 
>>>> Jeffery
>>>> 
>>>> -----Original Message-----
>>>> From: Steve Loughran [mailto:stevel@apache.org]
>>>> Sent: Friday, July 15, 2011 4:07 PM
>>>> To: common-user@hadoop.apache.org
>>>> Subject: Re: Which release to use?
>>>> 
>>>> On 15/07/2011 18:06, Arun C Murthy wrote:
>>>>> Apache Hadoop is a volunteer driven, open-source project. The
>>>> contributors to Apache Hadoop, both individuals and folks across a
>>>> diverse set of organizations, are committed to driving the project
>>>> forward and making timely releases - see discussion on hadoop-0.23 with
>>>> a raft newer features such as HDFS Federation, NextGen MapReduce and
>>>> plans for HA NameNode etc.
>>>>> 
>>>>> As with most successful projects there are several options for
>>>> commercial support to Hadoop or its derivatives.
>>>>> 
>>>>> However, Apache Hadoop has thrived before there was any commercial
>>>> support (I've personally been involved in over 20 releases of Apache
>>>> Hadoop and deployed them while at Yahoo) and I'm sure it will in this
>>>> new world order.
>>>>> 
>>>>> We, the Apache Hadoop community, are committed to keeping Apache
>>>> Hadoop 'free', providing support to our users and to move it forward at
>>>> a rapid rate.
>>>>> 
>>>> 
>>>> Arun makes a good point which is that the Apache project depends on
>>>> contributions from the community to thrive. That includes
>>>> 
>>>>  -bug reports
>>>>  -patches to fix problems
>>>>  -more tests
>>>>  -documentation improvements: more examples, more on getting started,
>>>> troubleshooting, etc.
>>>> 
>>>> If there's something lacking in the codebase, and you think you can fix
>>>> it, please do so. Helping with the documentation is a good start, as it
>>>> can be improved, and you aren't going to break anything.
>>>> 
>>>> Once you get into changing the code, you'll end up working with the head
>>>> of whichever branch you are targeting.
>>>> 
>>>> The other area everyone can contribute on is testing. Yes, Y! and FB can
>>>> test at scale, yes, other people can test large clusters too -but nobody
>>>> has a network that looks like yours but you. And Hadoop does care about
>>>> network configurations. Testing beta and release candidate releases in
>>>> your infrastructure, helps verify that the final release will work on
>>>> your site, and you don't end up getting all the phone calls about
>>>> something not working
>>>> 
>>>> 
>>> 
>                         

RE: Which release to use?

Posted by Michael Segel <mi...@hotmail.com>.


> Date: Mon, 18 Jul 2011 18:19:38 -0700
> Subject: Re: Which release to use?
> From: mcsrivas@gmail.com
> To: common-user@hadoop.apache.org
> 
> Mike,
> 
> Just a minor inaccuracy in your email. Here's setting the record straight:
> 
> 1. MapR directly sells their distribution of Hadoop. Support is from  MapR.
> 2. EMC also sells the MapR distribution, for use on any hardware. Support is
> from EMC worldwide.
> 3. EMC also sells a Hadoop appliance, which has the MapR distribution
> specially built for it. Support is from EMC.
> 
> 4. MapR also has a free, unlimited, unrestricted version called M3, which
> has the same 2-5x performance, management and stability improvements, and
> includes NFS. It is not crippleware, and the unlimited, unrestricted, free
> use does not expire on any date.
> 
> Hope that clarifies what MapR is doing.
> 
> thanks & regards,
> Srivas.
> 
Srivas,

I'm sorry, I thought I was being clear that I was only addressing EMC and not MapR directly.
I was responding to a post about EMC selling a Greenplum appliance. I wanted to point out that EMC will resell MapR's release along with their own (EMC) support.

The point I was trying to make was that, with respect to derivatives of Hadoop, I believe that MapR has a more compelling story than either EMC or DataStax. IMHO, replacing Java HDFS with either Greenplum or Cassandra has a limited market. When a company looks at an M/R solution, cost and performance are going to be at the top of the list. MapR isn't cheap, but if you look at the features in M5, if they work, then you have a very compelling reason to look at their release. Some of the people I spoke to when I was in Santa Clara were in the beta program. They indicated that MapR did what they claimed.

Things are definitely starting to look interesting.

-Mike

> On Mon, Jul 18, 2011 at 11:33 AM, Michael Segel
> <mi...@hotmail.com>wrote:
> 
> >
> > EMC has inked a deal with MapRTech to resell their release and support
> > services for MapRTech.
> > Does this mean that they are going to stop selling their own release on
> > Greenplum? Maybe not in the near future, however,
> > a Greenplum appliance may not get the customer transaction that their
> > reselling of MapR will generate.
> >
> > It sounds like they are hedging their bets and are taking an 'IBM'
> > approach.
> >
> >
> > > Subject: RE: Which release to use?
> > > Date: Mon, 18 Jul 2011 08:30:59 -0500
> > > From: Jeff.Schmitz@shell.com
> > > To: common-user@hadoop.apache.org
> > >
> > > Steve,
> > >
> > > I read your blog nice post - I believe EMC is selling the Greenplum
> > > solution as an appliance -
> > >
> > > Cheers -
> > >
> > > Jeffery
> > >
> > > -----Original Message-----
> > > From: Steve Loughran [mailto:stevel@apache.org]
> > > Sent: Friday, July 15, 2011 4:07 PM
> > > To: common-user@hadoop.apache.org
> > > Subject: Re: Which release to use?
> > >
> > > On 15/07/2011 18:06, Arun C Murthy wrote:
> > > > Apache Hadoop is a volunteer driven, open-source project. The
> > > contributors to Apache Hadoop, both individuals and folks across a
> > > diverse set of organizations, are committed to driving the project
> > > forward and making timely releases - see discussion on hadoop-0.23 with
> > > a raft newer features such as HDFS Federation, NextGen MapReduce and
> > > plans for HA NameNode etc.
> > > >
> > > > As with most successful projects there are several options for
> > > commercial support to Hadoop or its derivatives.
> > > >
> > > > However, Apache Hadoop has thrived before there was any commercial
> > > support (I've personally been involved in over 20 releases of Apache
> > > Hadoop and deployed them while at Yahoo) and I'm sure it will in this
> > > new world order.
> > > >
> > > > We, the Apache Hadoop community, are committed to keeping Apache
> > > Hadoop 'free', providing support to our users and to move it forward at
> > > a rapid rate.
> > > >
> > >
> > > Arun makes a good point which is that the Apache project depends on
> > > contributions from the community to thrive. That includes
> > >
> > >   -bug reports
> > >   -patches to fix problems
> > >   -more tests
> > >   -documentation improvements: more examples, more on getting started,
> > > troubleshooting, etc.
> > >
> > > If there's something lacking in the codebase, and you think you can fix
> > > it, please do so. Helping with the documentation is a good start, as it
> > > can be improved, and you aren't going to break anything.
> > >
> > > > Once you get into changing the code, you'll end up working with the head
> > > > of whichever branch you are targeting.
> > > >
> > > > The other area everyone can contribute on is testing. Yes, Y! and FB can
> > > > test at scale, yes, other people can test large clusters too -but nobody
> > > > has a network that looks like yours but you. And Hadoop does care about
> > > network configurations. Testing beta and release candidate releases in
> > > your infrastructure, helps verify that the final release will work on
> > > your site, and you don't end up getting all the phone calls about
> > > something not working
> > >
> > >
> >

Re: Which release to use?

Posted by "M. C. Srivas" <mc...@gmail.com>.
Mike,

Just a minor inaccuracy in your email. Here's setting the record straight:

1. MapR directly sells their distribution of Hadoop. Support is from MapR.
2. EMC also sells the MapR distribution, for use on any hardware. Support is
from EMC worldwide.
3. EMC also sells a Hadoop appliance, which has the MapR distribution
specially built for it. Support is from EMC.
4. MapR also has a free, unlimited, unrestricted version called M3, which
has the same 2-5x performance, management and stability improvements, and
includes NFS. It is not crippleware, and the unlimited, unrestricted, free
use does not expire on any date.

Hope that clarifies what MapR is doing.

thanks & regards,
Srivas.

On Mon, Jul 18, 2011 at 11:33 AM, Michael Segel
<mi...@hotmail.com>wrote:

>
> EMC has inked a deal with MapRTech to resell their release and support
> services for MapRTech.
> Does this mean that they are going to stop selling their own release on
> Greenplum? Maybe not in the near future, however,
> a Greenplum appliance may not get the customer transaction that their
> reselling of MapR will generate.
>
> It sounds like they are hedging their bets and are taking an 'IBM'
> approach.
>
>
> > Subject: RE: Which release to use?
> > Date: Mon, 18 Jul 2011 08:30:59 -0500
> > From: Jeff.Schmitz@shell.com
> > To: common-user@hadoop.apache.org
> >
> > Steve,
> >
> > I read your blog nice post - I believe EMC is selling the Greenplum
> > solution as an appliance -
> >
> > Cheers -
> >
> > Jeffery
> >
> > -----Original Message-----
> > From: Steve Loughran [mailto:stevel@apache.org]
> > Sent: Friday, July 15, 2011 4:07 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Which release to use?
> >
> > On 15/07/2011 18:06, Arun C Murthy wrote:
> > > Apache Hadoop is a volunteer driven, open-source project. The
> > contributors to Apache Hadoop, both individuals and folks across a
> > diverse set of organizations, are committed to driving the project
> > forward and making timely releases - see discussion on hadoop-0.23 with
> > a raft newer features such as HDFS Federation, NextGen MapReduce and
> > plans for HA NameNode etc.
> > >
> > > As with most successful projects there are several options for
> > commercial support to Hadoop or its derivatives.
> > >
> > > However, Apache Hadoop has thrived before there was any commercial
> > support (I've personally been involved in over 20 releases of Apache
> > Hadoop and deployed them while at Yahoo) and I'm sure it will in this
> > new world order.
> > >
> > > We, the Apache Hadoop community, are committed to keeping Apache
> > Hadoop 'free', providing support to our users and to move it forward at
> > a rapid rate.
> > >
> >
> > Arun makes a good point which is that the Apache project depends on
> > contributions from the community to thrive. That includes
> >
> >   -bug reports
> >   -patches to fix problems
> >   -more tests
> >   -documentation improvements: more examples, more on getting started,
> > troubleshooting, etc.
> >
> > If there's something lacking in the codebase, and you think you can fix
> > it, please do so. Helping with the documentation is a good start, as it
> > can be improved, and you aren't going to break anything.
> >
> > Once you get into changing the code, you'll end up working with the head
> > of whichever branch you are targeting.
> >
> > The other area everyone can contribute on is testing. Yes, Y! and FB can
> > test at scale, yes, other people can test large clusters too -but nobody
> > has a network that looks like yours but you. And Hadoop does care about
> > network configurations. Testing beta and release candidate releases in
> > your infrastructure, helps verify that the final release will work on
> > your site, and you don't end up getting all the phone calls about
> > something not working
> >
> >
>

RE: Which release to use?

Posted by Michael Segel <mi...@hotmail.com>.
EMC has inked a deal with MapRTech to resell their release and support services for MapRTech.
Does this mean that they are going to stop selling their own release on Greenplum? Maybe not in the near future; however,
a Greenplum appliance may not get the customer traction that their reselling of MapR will generate.

It sounds like they are hedging their bets and are taking an 'IBM' approach.


> Subject: RE: Which release to use?
> Date: Mon, 18 Jul 2011 08:30:59 -0500
> From: Jeff.Schmitz@shell.com
> To: common-user@hadoop.apache.org
> 
> Steve,
> 
> I read your blog nice post - I believe EMC is selling the Greenplum
> solution as an appliance - 
> 
> Cheers - 
> 
> Jeffery
> 
> -----Original Message-----
> From: Steve Loughran [mailto:stevel@apache.org] 
> Sent: Friday, July 15, 2011 4:07 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Which release to use?
> 
> On 15/07/2011 18:06, Arun C Murthy wrote:
> > Apache Hadoop is a volunteer driven, open-source project. The
> contributors to Apache Hadoop, both individuals and folks across a
> diverse set of organizations, are committed to driving the project
> forward and making timely releases - see discussion on hadoop-0.23 with
> a raft newer features such as HDFS Federation, NextGen MapReduce and
> plans for HA NameNode etc.
> >
> > As with most successful projects there are several options for
> commercial support to Hadoop or its derivatives.
> >
> > However, Apache Hadoop has thrived before there was any commercial
> support (I've personally been involved in over 20 releases of Apache
> Hadoop and deployed them while at Yahoo) and I'm sure it will in this
> new world order.
> >
> > We, the Apache Hadoop community, are committed to keeping Apache
> Hadoop 'free', providing support to our users and to move it forward at
> a rapid rate.
> >
> 
> Arun makes a good point which is that the Apache project depends on 
> contributions from the community to thrive. That includes
> 
>   -bug reports
>   -patches to fix problems
>   -more tests
>   -documentation improvements: more examples, more on getting started, 
> troubleshooting, etc.
> 
> If there's something lacking in the codebase, and you think you can fix 
> it, please do so. Helping with the documentation is a good start, as it 
> can be improved, and you aren't going to break anything.
> 
> Once you get into changing the code, you'll end up working with the head
> of whichever branch you are targeting.
> 
> The other area everyone can contribute on is testing. Yes, Y! and FB can
> test at scale, yes, other people can test large clusters too -but nobody
> has a network that looks like yours but you. And Hadoop does care about 
> network configurations. Testing beta and release candidate releases in 
> your infrastructure, helps verify that the final release will work on 
> your site, and you don't end up getting all the phone calls about 
> something not working
> 
> 

RE: Which release to use?

Posted by Je...@shell.com.
Steve,

I read your blog, nice post. I believe EMC is selling the Greenplum
solution as an appliance.

Cheers - 

Jeffery

-----Original Message-----
From: Steve Loughran [mailto:stevel@apache.org] 
Sent: Friday, July 15, 2011 4:07 PM
To: common-user@hadoop.apache.org
Subject: Re: Which release to use?

On 15/07/2011 18:06, Arun C Murthy wrote:
> Apache Hadoop is a volunteer driven, open-source project. The
contributors to Apache Hadoop, both individuals and folks across a
diverse set of organizations, are committed to driving the project
forward and making timely releases - see discussion on hadoop-0.23 with
a raft newer features such as HDFS Federation, NextGen MapReduce and
plans for HA NameNode etc.
>
> As with most successful projects there are several options for
commercial support to Hadoop or its derivatives.
>
> However, Apache Hadoop has thrived before there was any commercial
support (I've personally been involved in over 20 releases of Apache
Hadoop and deployed them while at Yahoo) and I'm sure it will in this
new world order.
>
> We, the Apache Hadoop community, are committed to keeping Apache
Hadoop 'free', providing support to our users and to move it forward at
a rapid rate.
>

Arun makes a good point which is that the Apache project depends on 
contributions from the community to thrive. That includes

  -bug reports
  -patches to fix problems
  -more tests
  -documentation improvements: more examples, more on getting started, 
troubleshooting, etc.

If there's something lacking in the codebase, and you think you can fix 
it, please do so. Helping with the documentation is a good start, as it 
can be improved, and you aren't going to break anything.

Once you get into changing the code, you'll end up working with the head
of whichever branch you are targeting.

The other area everyone can contribute on is testing. Yes, Y! and FB can
test at scale, yes, other people can test large clusters too -but nobody

has a network that looks like yours but you. And Hadoop does care about 
network configurations. Testing beta and release candidate releases in 
your infrastructure, helps verify that the final release will work on 
your site, and you don't end up getting all the phone calls about 
something not working



Re: Which release to use?

Posted by Steve Loughran <st...@apache.org>.
On 15/07/2011 18:06, Arun C Murthy wrote:
> Apache Hadoop is a volunteer driven, open-source project. The contributors to Apache Hadoop, both individuals and folks across a diverse set of organizations, are committed to driving the project forward and making timely releases - see discussion on hadoop-0.23 with a raft newer features such as HDFS Federation, NextGen MapReduce and plans for HA NameNode etc.
>
> As with most successful projects there are several options for commercial support to Hadoop or its derivatives.
>
> However, Apache Hadoop has thrived before there was any commercial support (I've personally been involved in over 20 releases of Apache Hadoop and deployed them while at Yahoo) and I'm sure it will in this new world order.
>
> We, the Apache Hadoop community, are committed to keeping Apache Hadoop 'free', providing support to our users and to move it forward at a rapid rate.
>

Arun makes a good point which is that the Apache project depends on 
contributions from the community to thrive. That includes

  -bug reports
  -patches to fix problems
  -more tests
  -documentation improvements: more examples, more on getting started, 
troubleshooting, etc.

If there's something lacking in the codebase, and you think you can fix 
it, please do so. Helping with the documentation is a good start, as it 
can be improved, and you aren't going to break anything.

Once you get into changing the code, you'll end up working with the head 
of whichever branch you are targeting.

The other area everyone can contribute on is testing. Yes, Y! and FB can 
test at scale, and yes, other people can test large clusters too, but nobody 
has a network that looks like yours but you. And Hadoop does care about 
network configurations. Testing beta and release candidate releases in 
your infrastructure helps verify that the final release will work on 
your site, so you don't end up getting all the phone calls about 
something not working.

Re: Which release to use?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Apache Hadoop is a volunteer-driven, open-source project. The contributors to Apache Hadoop, both individuals and folks across a diverse set of organizations, are committed to driving the project forward and making timely releases - see the discussion on hadoop-0.23, with a raft of newer features such as HDFS Federation, NextGen MapReduce and plans for HA NameNode etc.

As with most successful projects, there are several options for commercial support for Hadoop or its derivatives.

However, Apache Hadoop has thrived before there was any commercial support (I've personally been involved in over 20 releases of Apache Hadoop and deployed them while at Yahoo) and I'm sure it will in this new world order.

We, the Apache Hadoop community, are committed to keeping Apache Hadoop 'free', providing support to our users and moving it forward at a rapid rate.

Arun

On Jul 15, 2011, at 7:58 AM, Michael Segel wrote:

> 
> Unfortunately the picture is a bit more confusing.
> 
> Yahoo! is now HortonWorks. Their stated goal is to not have their own derivative release but to sell commercial support for the official Apache release.
> So those selling commercial support are:
> *Cloudera
> *HortonWorks
> *MapRTech
> *EMC (reselling MapRTech, but had announced their own)
> *IBM (not sure what they are selling exactly... still seems like smoke and mirrors...)
> *DataStax 
> 
> So while you can use the Apache release, it may not make sense for your organization to do so. (Said as I don the flame retardant suit...)
> 
> The issue is that outside of HortonWorks which is stating that they will support the official Apache release, everything else is a derivative work of Apache's Hadoop. From what I have seen, Cloudera's release is the closest to the Apache release.
> 
> Like I said, things are getting interesting.
> 
> HTH
> 
> -Mike
> 
> 
> 
>> From: evans@yahoo-inc.com
>> To: common-user@hadoop.apache.org
>> Date: Fri, 15 Jul 2011 07:35:45 -0700
>> Subject: Re: Which release to use?
>> 
>> Adarsh,
>> 
>> Yahoo! no longer has its own distribution of Hadoop.  It has been merged into the 0.20.2XX line so 0.20.203 is what Yahoo is running internally right now, and we are moving towards 0.20.204 which should be out soon.  I am not an expert on Cloudera so I cannot really map its releases to the Apache Releases, but their distro is based off of Apache Hadoop with a few bug fixes and maybe a few features like append added in on top of it, but you need to talk to Cloudera about the exact details.  For the most part they are all very similar.  You need to think most about support, there are several companies that can sell you support if you want/need it.  You also need to think about features vs. stability.  The 0.20.203 release has been tested on a lot of machines by many different groups, but may be missing some features that are needed in some situations.
>> 
>> --Bobby
>> 
>> 
>> On 7/14/11 11:49 PM, "Adarsh Sharma" <ad...@orkash.com> wrote:
>> 
>> Hadoop releases are issued time by time. But one more thing related to
>> hadoop usage,
>> 
>> There are so many providers that provides the distribution of Hadoop ;
>> 
>> 1. Apache Hadoop
>> 2. Cloudera
>> 3. Yahoo
>> 
>> etc.
>> Which distribution is best among them on production usage.
>> I think Cloudera's  is best among them.
>> 
>> 
>> Best Regards,
>> Adarsh
>> Owen O'Malley wrote:
>>> On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:
>>> 
>>> 
>>>> I'm a newbie and I am confused by the Hadoop releases.
>>>> I thought 0.21.0 is the latest & greatest release that I
>>>> should be using but I noticed 0.20.203 has been released
>>>> lately, and 0.21.X is marked "unstable, unsupported".
>>>> 
>>>> Should I be using 0.20.203?
>>>> 
>>> 
>>> Yes, I apologize for confusing release numbering, but the best release to use is 0.20.203.0. It includes security, job limits, and many other improvements over 0.20.2 and 0.21.0. Unfortunately, it doesn't have the new sync support so it isn't suitable for using with HBase. Most large clusters use a separate version of HDFS for HBase.
>>> 
>>> -- Owen
>>> 
>>> 
>> 
>> 
> 		 	   		  


Re: Which release to use?

Posted by Owen O'Malley <ow...@hortonworks.com>.
On Jul 15, 2011, at 7:58 AM, Michael Segel wrote:

> So while you can use the Apache release, it may not make sense for your organization to do so. (Said as I don the flame retardant suit...)

I obviously disagree. *grin* Apache Hadoop 0.20.203.0 is the most stable and well tested release. It has been deployed on Yahoo's 40,000 Hadoop machines, in clusters of up to 4,500 machines, and has been used extensively for running production workloads. We are actively working to make the install and deployment of Apache Hadoop easier.

In terms of commercial support, HortonWorks is absolutely supporting the Apache releases. IBM is also supporting the Apache releases:

http://davidmenninger.ventanaresearch.com/2011/05/18/ibm-chooses-hadoop-unity-not-shipping-the-elephant/

So lack of commercial support isn't a problem...

-- Owen

Re: Which release to use?

Posted by Steve Loughran <st...@apache.org>.
On 16/07/2011 16:53, Rita wrote:
> I am curious about the IBM product BigInsights. Where can we download it? It
> seems we have to register to download it?
>

I think you have to pay to use it

Re: Which release to use?

Posted by Allen Wittenauer <aw...@apache.org>.
On Jul 18, 2011, at 6:02 PM, Rita wrote:

> I am a dimwit.


	We are conditioned by marketing that a higher number is always better.  Experience tells us that this is not necessarily true.



Re: Which release to use?

Posted by Rita <rm...@gmail.com>.
I am a dimwit.


On Mon, Jul 18, 2011 at 8:12 PM, Allen Wittenauer <aw...@apache.org> wrote:

>
> On Jul 18, 2011, at 5:01 PM, Rita wrote:
>
> > I made the big mistake by using the latest version, 0.21.0 and found
> bunch
> > of bugs so I got pissed off at hdfs. Then, after reading this thread it
> > seems I should of used 0.20.x .
> >
> > I really wish we can fix this on the website, stating 0.21.0 as unstable.
>
>
>         It is stated in a few places on the website that 0.21 isn't stable:
>
>
> http://hadoop.apache.org/common/releases.html#23+August%2C+2010%3A+release+0.21.0+available
>
> "It has not undergone testing at scale and should not be considered stable
> or suitable for production."
>
>        ... and ...
>
> http://hadoop.apache.org/common/releases.html#Download
>
>        "0.21.X - unstable, unsupported, does not include security"
>
>        and it isn't in the stable directory on the apache download mirrors.
>
>
>


-- 
--- Get your facts first, then you can distort them as you please.--

Re: Which release to use?

Posted by Allen Wittenauer <aw...@apache.org>.
On Jul 18, 2011, at 5:01 PM, Rita wrote:

> I made the big mistake by using the latest version, 0.21.0 and found bunch
> of bugs so I got pissed off at hdfs. Then, after reading this thread it
> seems I should of used 0.20.x .
> 
> I really wish we can fix this on the website, stating 0.21.0 as unstable.


	It is stated in a few places on the website that 0.21 isn't stable:

http://hadoop.apache.org/common/releases.html#23+August%2C+2010%3A+release+0.21.0+available

"It has not undergone testing at scale and should not be considered stable or suitable for production."

	... and ...

http://hadoop.apache.org/common/releases.html#Download

	"0.21.X - unstable, unsupported, does not include security"

	and it isn't in the stable directory on the apache download mirrors.



Re: Which release to use?

Posted by Rita <rm...@gmail.com>.
I made the big mistake of using the latest version, 0.21.0, and found a bunch
of bugs, so I got pissed off at HDFS. Then, after reading this thread, it
seems I should have used 0.20.x.

I really wish we could fix this on the website, stating that 0.21.0 is unstable.



On Mon, Jul 18, 2011 at 4:50 PM, Michael Segel <mi...@hotmail.com>wrote:

>
> Well that's CDH3. :-)
>
> And yes, that's because up until the past month... other releases didn't
> exist w commercial support.
>
> Now there are more players as we look at the movement from leading edge to
> mainstream adopters.
>
>
>
> > Subject: RE: Which release to use?
> > Date: Mon, 18 Jul 2011 14:30:39 -0500
> > From: Jeff.Schmitz@shell.com
> > To: common-user@hadoop.apache.org
> >
> >
> >  Most people are using CH3 - if you need some features from another
> > distro use that -
> >
> > http://www.cloudera.com/hadoop/
> >
> > I wonder if the Cloudera people realize that CH3 was a pretty happening
> > punk band back in the day (if not they do now = )
> >
> > http://en.wikipedia.org/wiki/Channel_3_%28band%29
> >
> > cheers -
> >
> >
> > Jeffery Schmitz
> > Projects and Technology
> > 3737 Bellaire Blvd Houston, Texas 77001
> > Tel: +1-713-245-7326 Fax: +1 713 245 7678
> > Email: Jeff.Schmitz@shell.com
> > Intergalactic Proton Powered Electrical Tentacled Advertising Droids!
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Michael Segel [mailto:michael_segel@hotmail.com]
> > Sent: Monday, July 18, 2011 2:10 PM
> > To: common-user@hadoop.apache.org
> > Subject: RE: Which release to use?
> >
> >
> > Tom,
> >
> > I'm not sure that you're really honoring the purpose and approach of
> > this list.
> >
> > I mean on the one hand, you're not under any obligation to respond or
> > participate on the list. And I can respect that. You're not in an S&D
> > role so you're not 'customer facing' and not used to having to deal with
> > these types of questions.
> >
> > On the other, you're not being free with your information. So when this
> > type of question comes up, it becomes very easy to discount IBM as a
> > release or source provider for commercial support.
> >
> > Without information, I'm afraid that I may have to make recommendations
> > to my clients that may be out of date.
> >
> > There is even some speculation from analysts that recent comments from
> > IBM are more of an indication that IBM is still not ready for prime
> > time.
> >
> > I'm sorry you're not in a position to detail your offering.
> >
> > Maybe by September you might be ready and then talk to our CHUG?
> >
> > -Mike
> >
> >
> >
> > > To: common-user@hadoop.apache.org
> > > Subject: Re: Which release to use?
> > > From: tdeutsch@us.ibm.com
> > > Date: Sat, 16 Jul 2011 10:29:55 -0700
> > >
> > > Hi Rita - I want to make sure we are honoring the purpose/approach of
> > this
> > > list. So you are welcome to ping me for information, but let's take
> > this
> > > discussion off the list at this point.
> > >
> > > ------------------------------------------------
> > > Tom Deutsch
> > > Program Director
> > > CTO Office: Information Management
> > > Hadoop Product Manager / Customer Exec
> > > IBM
> > > 3565 Harbor Blvd
> > > Costa Mesa, CA 92626-1420
> > > tdeutsch@us.ibm.com
> > >
> > >
> > >
> > >
> > > Rita <rm...@gmail.com>
> > > 07/16/2011 08:53 AM
> > > Please respond to
> > > common-user@hadoop.apache.org
> > >
> > >
> > > To
> > > common-user@hadoop.apache.org
> > > cc
> > >
> > > Subject
> > > Re: Which release to use?
> > >
> > >
> > >
> > >
> > >
> > >
> > > I am curious about the IBM product BigInsights. Where can we download
> > it?
> > > It
> > > seems we have to register to download it?
> > >
> > >
> > > On Fri, Jul 15, 2011 at 12:38 PM, Tom Deutsch <td...@us.ibm.com>
> > wrote:
> > >
> > > > One quick clarification - IBM GA'd a product called BigInsights in
> > 2Q.
> > > It
> > > > faithfully uses the Hadoop stack and many related projects - but
> > > provides
> > > > a number of extensions (that are compatible) based on customer
> > requests.
> > > > Not appropriate to say any more on this list, but the info on it is
> > all
> > > > publically available.
> > > >
> > > >
> > > > ------------------------------------------------
> > > > Tom Deutsch
> > > > Program Director
> > > > CTO Office: Information Management
> > > > Hadoop Product Manager / Customer Exec
> > > > IBM
> > > > 3565 Harbor Blvd
> > > > Costa Mesa, CA 92626-1420
> > > > tdeutsch@us.ibm.com
> > > >
> > > >
> > > >
> > > >
> > > > Michael Segel <mi...@hotmail.com>
> > > > 07/15/2011 07:58 AM
> > > > Please respond to
> > > > common-user@hadoop.apache.org
> > > >
> > > >
> > > > To
> > > > <co...@hadoop.apache.org>
> > > > cc
> > > >
> > > > Subject
> > > > RE: Which release to use?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Unfortunately the picture is a bit more confusing.
> > > >
> > > > Yahoo! is now HortonWorks. Their stated goal is to not have their
> > own
> > > > derivative release but to sell commercial support for the official
> > > Apache
> > > > release.
> > > > So those selling commercial support are:
> > > > *Cloudera
> > > > *HortonWorks
> > > > *MapRTech
> > > > *EMC (reselling MapRTech, but had announced their own)
> > > > *IBM (not sure what they are selling exactly... still seems like
> > smoke
> > > and
> > > > mirrors...)
> > > > *DataStax
> > > >
> > > > So while you can use the Apache release, it may not make sense for
> > your
> > > > organization to do so. (Said as I don the flame retardant suit...)
> > > >
> > > > The issue is that outside of HortonWorks which is stating that they
> > will
> > > > support the official Apache release, everything else is a derivative
> >
> > > work
> > > > of Apache's Hadoop. From what I have seen, Cloudera's release is the
> > > > closest to the Apache release.
> > > >
> > > > Like I said, things are getting interesting.
> > > >
> > > > HTH
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > --- Get your facts first, then you can distort them as you please.--
> > >
> >
> >
>
>



-- 
--- Get your facts first, then you can distort them as you please.--

RE: Which release to use?

Posted by Michael Segel <mi...@hotmail.com>.
Well that's CDH3. :-)

And yes, that's because up until the past month... other releases didn't exist with commercial support.

Now there are more players as we look at the movement from leading edge to mainstream adopters.



> Subject: RE: Which release to use?
> Date: Mon, 18 Jul 2011 14:30:39 -0500
> From: Jeff.Schmitz@shell.com
> To: common-user@hadoop.apache.org
> 
> 
>  Most people are using CH3 - if you need some features from another
> distro use that - 
> 
> http://www.cloudera.com/hadoop/
> 
> I wonder if the Cloudera people realize that CH3 was a pretty happening
> punk band back in the day (if not they do now = )
> 
> http://en.wikipedia.org/wiki/Channel_3_%28band%29
> 
> cheers - 
> 
> 
> Jeffery Schmitz
> Projects and Technology
> 3737 Bellaire Blvd Houston, Texas 77001
> Tel: +1-713-245-7326 Fax: +1 713 245 7678
> Email: Jeff.Schmitz@shell.com
> Intergalactic Proton Powered Electrical Tentacled Advertising Droids!
> 
> 
> 
> 
> 

RE: Which release to use?

Posted by Je...@shell.com.
 Most people are using CH3 - if you need some features from another
distro use that - 

http://www.cloudera.com/hadoop/

I wonder if the Cloudera people realize that CH3 was a pretty happening
punk band back in the day (if not they do now = )

http://en.wikipedia.org/wiki/Channel_3_%28band%29

cheers - 


Jeffery Schmitz
Projects and Technology
3737 Bellaire Blvd Houston, Texas 77001
Tel: +1-713-245-7326 Fax: +1 713 245 7678
Email: Jeff.Schmitz@shell.com
Intergalactic Proton Powered Electrical Tentacled Advertising Droids!





-----Original Message-----
From: Michael Segel [mailto:michael_segel@hotmail.com] 
Sent: Monday, July 18, 2011 2:10 PM
To: common-user@hadoop.apache.org
Subject: RE: Which release to use?


Tom,

I'm not sure that you're really honoring the purpose and approach of
this list.

I mean on the one hand, you're not under any obligation to respond or
participate on the list. And I can respect that. You're not in an S&D
role so you're not 'customer facing' and not used to having to deal with
these types of questions.

On the other, you're not being free with your information. So when this
type of question comes up, it becomes very easy to discount IBM as a
release or source provider for commercial support.

Without information, I'm afraid that I may have to make recommendations
to my clients that may be out of date.

There is even some speculation from analysts that recent comments from
IBM are more of an indication that IBM is still not ready for prime
time. 

I'm sorry you're not in a position to detail your offering.

Maybe by September you might be ready and then talk to our CHUG?

-Mike





RE: Which release to use?

Posted by Michael Segel <mi...@hotmail.com>.
Tom,

I'm not sure that you're really honoring the purpose and approach of this list.

I mean on the one hand, you're not under any obligation to respond or participate on the list. And I can respect that. You're not in an S&D role so you're not 'customer facing' and not used to having to deal with these types of questions.

On the other, you're not being free with your information. So when this type of question comes up, it becomes very easy to discount IBM as a release or source provider for commercial support.

Without information, I'm afraid that I may have to make recommendations to my clients that may be out of date.

There is even some speculation from analysts that recent comments from IBM are more of an indication that IBM is still not ready for prime time. 

I'm sorry you're not in a position to detail your offering.

Maybe by September you might be ready and then talk to our CHUG?

-Mike



> To: common-user@hadoop.apache.org
> Subject: Re: Which release to use?
> From: tdeutsch@us.ibm.com
> Date: Sat, 16 Jul 2011 10:29:55 -0700
> 
> Hi Rita - I want to make sure we are honoring the purpose/approach of this 
> list. So you are welcome to ping me for information, but let's take this 
> discussion off the list at this point.
> 
> ------------------------------------------------
> Tom Deutsch
> Program Director
> CTO Office: Information Management
> Hadoop Product Manager / Customer Exec
> IBM
> 3565 Harbor Blvd
> Costa Mesa, CA 92626-1420
> tdeutsch@us.ibm.com
> 
> 
> 
> 

Re: Which release to use?

Posted by Tom Deutsch <td...@us.ibm.com>.
Hi Rita - I want to make sure we are honoring the purpose/approach of this 
list. So you are welcome to ping me for information, but let's take this 
discussion off the list at this point.

------------------------------------------------
Tom Deutsch
Program Director
CTO Office: Information Management
Hadoop Product Manager / Customer Exec
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeutsch@us.ibm.com




Rita <rm...@gmail.com> 
07/16/2011 08:53 AM
Please respond to
common-user@hadoop.apache.org


To
common-user@hadoop.apache.org
cc

Subject
Re: Which release to use?






I am curious about the IBM product BigInsights. Where can we download it? It
seems we have to register to download it?




-- 
--- Get your facts first, then you can distort them as you please.--


Re: Which release to use?

Posted by Rita <rm...@gmail.com>.
I am curious about the IBM product BigInsights. Where can we download it? It
seems we have to register to download it?


On Fri, Jul 15, 2011 at 12:38 PM, Tom Deutsch <td...@us.ibm.com> wrote:

> One quick clarification - IBM GA'd a product called BigInsights in 2Q. It
> faithfully uses the Hadoop stack and many related projects - but provides
> a number of extensions (that are compatible) based on customer requests.
> Not appropriate to say any more on this list, but the info on it is all
> publicly available.
>
>
> ------------------------------------------------
> Tom Deutsch
> Program Director
> CTO Office: Information Management
> Hadoop Product Manager / Customer Exec
> IBM
> 3565 Harbor Blvd
> Costa Mesa, CA 92626-1420
> tdeutsch@us.ibm.com
>
>
>
>


-- 
--- Get your facts first, then you can distort them as you please.--

RE: Which release to use?

Posted by Tom Deutsch <td...@us.ibm.com>.
One quick clarification - IBM GA'd a product called BigInsights in 2Q. It 
faithfully uses the Hadoop stack and many related projects - but provides 
a number of extensions (that are compatible) based on customer requests. 
Not appropriate to say any more on this list, but the info on it is all 
publicly available.


------------------------------------------------
Tom Deutsch
Program Director
CTO Office: Information Management
Hadoop Product Manager / Customer Exec
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeutsch@us.ibm.com




Michael Segel <mi...@hotmail.com> 
07/15/2011 07:58 AM
Please respond to
common-user@hadoop.apache.org


To
<co...@hadoop.apache.org>
cc

Subject
RE: Which release to use?







Unfortunately the picture is a bit more confusing.

Yahoo! is now HortonWorks. Their stated goal is to not have their own 
derivative release but to sell commercial support for the official Apache 
release.
So those selling commercial support are:
*Cloudera
*HortonWorks
*MapRTech
*EMC (reselling MapRTech, but had announced their own)
*IBM (not sure what they are selling exactly... still seems like smoke and 
mirrors...)
*DataStax 

So while you can use the Apache release, it may not make sense for your 
organization to do so. (Said as I don the flame retardant suit...)

The issue is that outside of HortonWorks which is stating that they will 
support the official Apache release, everything else is a derivative work 
of Apache's Hadoop. From what I have seen, Cloudera's release is the 
closest to the Apache release.

Like I said, things are getting interesting.

HTH

 
  

RE: Which release to use?

Posted by Michael Segel <mi...@hotmail.com>.
Unfortunately the picture is a bit more confusing.

Yahoo! is now HortonWorks. Their stated goal is to not have their own derivative release but to sell commercial support for the official Apache release.
So those selling commercial support are:
*Cloudera
*HortonWorks
*MapRTech
*EMC (reselling MapRTech, but had announced their own)
*IBM (not sure what they are selling exactly... still seems like smoke and mirrors...)
*DataStax 

So while you can use the Apache release, it may not make sense for your organization to do so. (Said as I don the flame retardant suit...)

The issue is that outside of HortonWorks which is stating that they will support the official Apache release, everything else is a derivative work of Apache's Hadoop. From what I have seen, Cloudera's release is the closest to the Apache release.

Like I said, things are getting interesting.

HTH

-Mike



> From: evans@yahoo-inc.com
> To: common-user@hadoop.apache.org
> Date: Fri, 15 Jul 2011 07:35:45 -0700
> Subject: Re: Which release to use?
> 
> Adarsh,
> 
> Yahoo! no longer has its own distribution of Hadoop.  It has been merged into the 0.20.2XX line, so 0.20.203 is what Yahoo is running internally right now, and we are moving towards 0.20.204, which should be out soon.  I am not an expert on Cloudera, so I cannot really map its releases to the Apache releases.  Their distro is based on Apache Hadoop with a few bug fixes and maybe a few features like append added on top of it, but you need to talk to Cloudera about the exact details.  For the most part they are all very similar.  You need to think most about support; there are several companies that can sell you support if you want or need it.  You also need to think about features vs. stability.  The 0.20.203 release has been tested on a lot of machines by many different groups, but may be missing some features that are needed in some situations.
> 
> --Bobby
> 
> 

Re: Which release to use?

Posted by Robert Evans <ev...@yahoo-inc.com>.
Adarsh,

Yahoo! no longer has its own distribution of Hadoop.  It has been merged into the 0.20.2XX line, so 0.20.203 is what Yahoo is running internally right now, and we are moving towards 0.20.204, which should be out soon.  I am not an expert on Cloudera, so I cannot really map its releases to the Apache releases.  Their distro is based on Apache Hadoop with a few bug fixes and maybe a few features like append added on top of it, but you need to talk to Cloudera about the exact details.  For the most part they are all very similar.  You need to think most about support; there are several companies that can sell you support if you want or need it.  You also need to think about features vs. stability.  The 0.20.203 release has been tested on a lot of machines by many different groups, but may be missing some features that are needed in some situations.

--Bobby


On 7/14/11 11:49 PM, "Adarsh Sharma" <ad...@orkash.com> wrote:

Hadoop releases are issued from time to time. But one more question related to
Hadoop usage:

There are many providers that offer a distribution of Hadoop:

1. Apache Hadoop
2. Cloudera
3. Yahoo

etc.
Which distribution is best among them for production use?
I think Cloudera's is the best among them.


Best Regards,
Adarsh



Re: Which release to use?

Posted by Adarsh Sharma <ad...@orkash.com>.
Hadoop releases are issued from time to time. But one more question related to
Hadoop usage:

There are many providers that offer a distribution of Hadoop:

1. Apache Hadoop
2. Cloudera
3. Yahoo

etc.
Which distribution is best among them for production use?
I think Cloudera's is the best among them.


Best Regards,
Adarsh
Owen O'Malley wrote:
> On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:
>
>   
>> I'm a newbie and I am confused by the Hadoop releases.
>> I thought 0.21.0 is the latest & greatest release that I
>> should be using but I noticed 0.20.203 has been released
>> lately, and 0.21.X is marked "unstable, unsupported".
>>
>> Should I be using 0.20.203?
>>     
>
> Yes, I apologize for confusing release numbering, but the best release to use is 0.20.203.0. It includes security, job limits, and many other improvements over 0.20.2 and 0.21.0. Unfortunately, it doesn't have the new sync support so it isn't suitable for using with HBase. Most large clusters use a separate version of HDFS for HBase.
>
> -- Owen
>
>   


Re: Which release to use?

Posted by Owen O'Malley <ow...@hortonworks.com>.
On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:

> I'm a newbie and I am confused by the Hadoop releases.
> I thought 0.21.0 is the latest & greatest release that I
> should be using but I noticed 0.20.203 has been released
> lately, and 0.21.X is marked "unstable, unsupported".
> 
> Should I be using 0.20.203?

Yes, I apologize for confusing release numbering, but the best release to use is 0.20.203.0. It includes security, job limits, and many other improvements over 0.20.2 and 0.21.0. Unfortunately, it doesn't have the new sync support so it isn't suitable for using with HBase. Most large clusters use a separate version of HDFS for HBase.

-- Owen


Re: Which release to use?

Posted by Jonathan Coveney <jc...@gmail.com>.
Isaac: there is no more Yahoo branch. They are committing all of their code
to Apache.

2011/7/15 Isaac Dooley <Is...@twosigma.com>

> Will 0.23 include Kerberos authentication? Will this finally unite the
> Yahoo and Apache branches?
>

RE: Which release to use?

Posted by Isaac Dooley <Is...@twosigma.com>.
Will 0.23 include Kerberos authentication? Will this finally unite the Yahoo and Apache branches?

-----Original Message-----
From: Arun C Murthy [mailto:acm@hortonworks.com] 
Sent: Thursday, July 14, 2011 7:43 PM
To: common-user@hadoop.apache.org
Subject: Re: Which release to use?

Hi,

 0.20.203 is the latest stable release, which includes a ton of features (security - Kerberos-based authentication) and fixes. It's currently deployed on over 50k machines at Yahoo too.
 So, yes, I'd encourage you to use 0.20.203. We, the community, are currently working on hadoop-0.23 and hope to get it out soon.

thanks,
Arun

On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:

> I'm a newbie and I am confused by the Hadoop releases.
> I thought 0.21.0 is the latest & greatest release that I
> should be using but I noticed 0.20.203 has been released
> lately, and 0.21.X is marked "unstable, unsupported".
> 
> Should I be using 0.20.203?
> ----
> T. "Kuro" Kurosaka
> 
> 


Re: Which release to use?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Hi,

 0.20.203 is the latest stable release, which includes a ton of features (security - Kerberos-based authentication) and fixes. It's currently deployed on over 50k machines at Yahoo too.
 So, yes, I'd encourage you to use 0.20.203. We, the community, are currently working on hadoop-0.23 and hope to get it out soon.

thanks,
Arun

On Jul 14, 2011, at 4:33 PM, Teruhiko Kurosaka wrote:

> I'm a newbie and I am confused by the Hadoop releases.
> I thought 0.21.0 is the latest & greatest release that I
> should be using but I noticed 0.20.203 has been released
> lately, and 0.21.X is marked "unstable, unsupported".
> 
> Should I be using 0.20.203?
> ----
> T. "Kuro" Kurosaka
> 
>