Posted to general@incubator.apache.org by Josh Wills <jw...@cloudera.com> on 2012/05/16 02:23:28 UTC

[DISCUSS] Crunch to join the Apache Incubator

Hi all,

I would like to propose Crunch, a library for writing MapReduce
pipelines in Java and Scala, as an Apache Incubator project. The
proposal is here:

http://wiki.apache.org/incubator/CrunchProposal

We would gladly welcome additional volunteers to act as mentors on the
project, so if this sounds like your cup of tea, please feel free to
sign up or let us know.

Thanks!
Josh

-- 
Director of Data Science
Cloudera
Twitter: @josh_wills

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
Hi Jukka,

Apologies for the delay, I had a vacation day. Replies inline.

On Wed, May 23, 2012 at 2:33 PM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On Wed, May 16, 2012 at 2:23 AM, Josh Wills <jw...@cloudera.com> wrote:
>> http://wiki.apache.org/incubator/CrunchProposal
>
> Some comments from the related vote thread:
>
>> formally released twice, as versions 0.1.0 (October 2010) and 0.2.0
>
> s/2010/2011/ I presume.

Indeed. Fixed.

>
>> == Source and Intellectual Property Submission Plan ==
>>
>>  * The initial source is already licensed under the Apache License,
>> Version 2.0. https://github.com/cloudera/crunch/blob/master/LICENSE.txt
>
> A software grant is customary for existing codebases.

Understood-- is there an example of the appropriate language?

>
>> == Github Repositories ==
>>
>> http://github.com/apache/crunch
>> git://git.apache.org/crunch.git
>
> I assume you mean that you want to develop the codebase using Git
> instead of Subversion here at the ASF? See
> http://git-wip-us.apache.org/ for more info, and please get in touch
> with infra about the details. A volunteer to help manage the Git
> repository will be appreciated.

Yes, git is our preference. I will get in touch with infra and
volunteer to do the repo management.

>
> BR,
>
> Jukka Zitting
>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, May 16, 2012 at 2:23 AM, Josh Wills <jw...@cloudera.com> wrote:
> http://wiki.apache.org/incubator/CrunchProposal

Some comments from the related vote thread:

> formally released twice, as versions 0.1.0 (October 2010) and 0.2.0

s/2010/2011/ I presume.

> == Source and Intellectual Property Submission Plan ==
>
>  * The initial source is already licensed under the Apache License,
> Version 2.0. https://github.com/cloudera/crunch/blob/master/LICENSE.txt

A software grant is customary for existing codebases.

> == Github Repositories ==
>
> http://github.com/apache/crunch
> git://git.apache.org/crunch.git

I assume you mean that you want to develop the codebase using Git
instead of Subversion here at the ASF? See
http://git-wip-us.apache.org/ for more info, and please get in touch
with infra about the details. A volunteer to help manage the Git
repository will be appreciated.

BR,

Jukka Zitting



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
On Wed, May 23, 2012 at 1:15 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Wed, May 23, 2012 at 11:35 AM, Josh Wills <jw...@cloudera.com> wrote:
>> That said, the team did feel strongly about keeping the initial committers
>> to people who had already added major pieces of functionality to Crunch
>
> Speaking as a former benevolent dictator...

I don't imagine that any of the other committers consider me benevolent. ;-)

>
> When you bring a project to Apache, you have to accept that the project may
> someday go in a direction that you don't want to go.  That's the price for
> giving other people a true stake in governance.
>
> Console yourself with the thought that if an active community takes the
> project and runs with it -- in any direction -- the project has succeeded.

I feel this wholeheartedly. The decision to take Crunch to the
Incubator was based on the fact that some of the most important
contributions to the project this year did not come from someone at
Cloudera, and that Robert, Gabriel, and Chris had contributed so much
that they had just as much right to decide the future direction of the
project as I did. There is nothing I admire more about the ASF than
its commitment to meritocracy.

>
> Marvin Humphrey
>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Wed, May 23, 2012 at 11:35 AM, Josh Wills <jw...@cloudera.com> wrote:
> That said, the team did feel strongly about keeping the initial committers
> to people who had already added major pieces of functionality to Crunch

Speaking as a former benevolent dictator...

When you bring a project to Apache, you have to accept that the project may
someday go in a direction that you don't want to go.  That's the price for
giving other people a true stake in governance.

Console yourself with the thought that if an active community takes the
project and runs with it -- in any direction -- the project has succeeded.

Marvin Humphrey



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Steve Loughran <st...@gmail.com>.
On 25 May 2012 20:00, Josh Wills <jw...@cloudera.com> wrote:

> Hi Steve,
>
> Thank you for your thoughtful comments. Replies inlined below.
>
> > 1. He's using it at work, so represents the end users.
>
> A super-majority of the initial committers are also end users. I use
> Crunch on my own projects (e.g.,
> http://github.com/cloudera/seismichadoop and
> http://github.com/cloudera/matching ), Cloudera solutions architects
> use Crunch on client projects, Robert is building tools on top of
> Crunch at WibiData, and Gabriel and Chris use it for building
> pipelines at TomTom. I can't speak for Tom and Vinod, but of course,
> they have other positive qualities. :)
>

It's still a fairly limited set of organisations and lacks independence.
Jakob and colleagues have no strategic goals in ensuring the
success/failure of any specific OSS project, merely getting the right tools
for their job.


In a true OSS project -not one that is released under some OSS license but
is effectively a single-vendor-project (JBoss, MySQL, etc) - end users are
not merely consumers of the output, they are potential engineering
resources to be co-opted, be it in their suggestions for improvement,
documentation, bugreps, tests and code itself. That's the challenge -and
it's not easy, especially in a project where some of the developers work on
it full time, others are people that use it a bit and find bugs. Those
little contributors need to be nurtured until they become good ones.



>
> > 2. His code is always of high quality
>
> I in no way meant to disparage Jakob or his coding. The objective of
> my reply was to say "no" in the most apologetic, obsequious way possible
> while not going so far over the top as to sound insincere. Having
> LinkedIn on board would be a tremendous PR boost for the project. It
> was painful to say no.
>
> I am in no way savvy in the ways of Apache or the politics of the ASF.
>

Not Apache politics, but a core belief: the notion that a community is
actually more important than the codebase itself. The goal of an incubating
project is not so much to get the code into a shape where it is ready
to graduate -but build a community to a point where it is considered
successful.

If you don't want that, but instead want to have a project over which you
retain tight control, you are free to continue to host it on github.



> I understand that smart people who I respect a great deal think that
> this is the wrong decision. But I think that it takes something really
> great for someone to see a project like Crunch, play around with it, and
> then take the time to make some contributions to it without any
> expectation of recognition, in the form of an Apache committership or
> anything else. That was what Gabriel and Chris and Robert did over the
> past few months. I really admire that, and I think that it deserves
> some special recognition, however small.


That is good, and their past and hopefully ongoing work will help the
project -I just think that it would have helped the project if Jakob's offer had
been embraced.


> I'm willing to have some
> people not like me or think I'm dumb if that's the price of giving
> that to them.
>

All I have are concerns that the proposal is at risk from the same
problems that others have had in incubation.



> > 3. Given the ongoing discussion on diversity w.r.t Flume, I think it
> would
> > be wise to not follow that project's example, and try to get broader
> > involvement from the outset.
>
> I agree that it is critical to have broad involvement at the outset.
> Both S4 and Flume started out with at least 50% of their initial
> committers from a single company, and no single company constitutes a
> majority of the initial committers to Crunch (Cloudera has three,
> TomTom has two, WibiData has one, and Hortonworks has one). That de
> jure diversity mirrors the de facto diversity in Crunch's commit logs
> over the past several months:
>
> https://github.com/cloudera/crunch/commits/master
>
> There is nothing more important than increasing that de facto
> diversity over time. I fully expect that my role during the incubator
> process is to be the best documenter, repository maintainer, and
> recruiter of new contributors that I can be.
>

It's not clear that Flume has widened its developer base significantly
enough for it to graduate. I fear that Crunch is exposed to the same risks,
and the fact that you are opting to exclude Jakob from the initial dev team
concerns me.

-Steve

Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
Hi Steve,

Thank you for your thoughtful comments. Replies inlined below.

On Fri, May 25, 2012 at 2:39 AM, Steve Loughran
<st...@gmail.com> wrote:
> On 23 May 2012 19:35, Josh Wills <jw...@cloudera.com> wrote:
>
>> Hey Jakob,
>>
>> This was a tough one-- you know that I've been talking about Crunch
>> w/Joe Adler for a few weeks now, and I personally am really looking
>> forward to working with you guys. That said, the team did feel
>> strongly about keeping the initial committers to people who had
>> already added major pieces of functionality to Crunch, and adding
>> Vinod was about his expertise on the MR2 internals, which we think
>> will be critical to Crunch's success. We are going to put the Crunch
>> proposal up for a vote with the current team in place.
>>
>> We are, of course, very eager to grow the list of committers through
>> the normal Apache process.
>>
>
> I'd go for pulling Jakob in for tactical and strategic reasons
>
> 1. He's using it at work, so represents the end users.

A super-majority of the initial committers are also end users. I use
Crunch on my own projects (e.g.,
http://github.com/cloudera/seismichadoop and
http://github.com/cloudera/matching ), Cloudera solutions architects
use Crunch on client projects, Robert is building tools on top of
Crunch at WibiData, and Gabriel and Chris use it for building
pipelines at TomTom. I can't speak for Tom and Vinod, but of course,
they have other positive qualities. :)

> 2. His code is always of high quality

I in no way meant to disparage Jakob or his coding. The objective of
my reply was to say "no" in the most apologetic, obsequious way possible
while not going so far over the top as to sound insincere. Having
LinkedIn on board would be a tremendous PR boost for the project. It
was painful to say no.

I am in no way savvy in the ways of Apache or the politics of the ASF.
I understand that smart people who I respect a great deal think that
this is the wrong decision. But I think that it takes something really
great for someone to see a project like Crunch, play around with it, and
then take the time to make some contributions to it without any
expectation of recognition, in the form of an Apache committership or
anything else. That was what Gabriel and Chris and Robert did over the
past few months. I really admire that, and I think that it deserves
some special recognition, however small. I'm willing to have some
people not like me or think I'm dumb if that's the price of giving
that to them.

> 3. Given the ongoing discussion on diversity w.r.t Flume, I think it would
> be wise to not follow that project's example, and try to get broader
> involvement from the outset.

I agree that it is critical to have broad involvement at the outset.
Both S4 and Flume started out with at least 50% of their initial
committers from a single company, and no single company constitutes a
majority of the initial committers to Crunch (Cloudera has three,
TomTom has two, WibiData has one, and Hortonworks has one). That de
jure diversity mirrors the de facto diversity in Crunch's commit logs
over the past several months:

https://github.com/cloudera/crunch/commits/master

There is nothing more important than increasing that de facto
diversity over time. I fully expect that my role during the incubator
process is to be the best documenter, repository maintainer, and
recruiter of new contributors that I can be.

Best,
Josh

-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by "Edward J. Yoon" <ed...@apache.org>.
+1

Sent from my iPad

On May 27, 2012, at 6:51 AM, Jukka Zitting <ju...@gmail.com> wrote:

> Personally I've never fully understood the practice of allowing just
> about anyone to sign up as an initial committer of a podling. Putting
> your name on a list does not make you a part of a community,
> participating and contributing does. Recognizing such merit with
> committership is an important part of the social glue that binds our
> communities together.



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
Inlined.

On Sat, May 26, 2012 at 3:20 PM, Jakob Homan <jg...@gmail.com> wrote:
> This isn't about whether or not they will respond appropriately to new
> contributors once they are an incubator project.  And it's absolutely not
> whether or not I should be an initial committer (the vote's going on
> already and I'm happy it is).
>
> Since the team has already stumbled in its first steps, I can't
> imagine it would fail further. I'm sure the whole of the team will
> step up (as part of the community).  And I can't say if I'll
> contribute or not.  I'm perfectly fine with not being on the initial
> team and believe the vote should finish and the proposal be accepted
> (I'll put in a binding +1 shortly).

It has certainly been a...let's go with "brisk," education. :)

Jakob, for whatever it is worth, we would be thrilled to have you, and
Joseph, and anyone else at LinkedIn who is interested join Crunch as
committers. Whatever the Apache Incubator equivalent of rolling out
the red carpet is, that's what we'll do.

>
> Instead, it's about the Podling's first steps to building a community,
> which were:
> 1) Announcing they would follow the (in my view flawed) approach of S4
> and reject any established members of the Apache community that may
> wish to join during the proposal.
> 2) Announce an exception because one volunteer (Vinod) had a
> particular background that would be useful.
> 3) Refuse a volunteer with a similar background but who has a history of
> being a critic of the company where the code originated.
> 4) When pressed to explain this irregularity, dig itself deeper by
> inventing new concepts out of thin air ('de facto diversity', 'de jure
> diversity') and seeming to suggest that membership on the initial dev
> team was supposed to be some type of first-among equals status that
> was a gift from one person:
>> I really admire that, and I think that it deserves
>> some special recognition, however small. I'm willing to have some
>> people not like me or think I'm dumb if that's the price of giving
>>that to them.
>
> This would be a noble sentiment if Apache Martyr were a role and if it
> didn't severely hobble the appearance of an honest effort at the
> Apache meritocratic approach.  As I said before, it takes a lot of
> dedication to argue this passionately on a subject one professes
> ignorance of.

Okay, that last line was pretty funny.

>
> After this, I think the possibility of such an attitude continuing
> into the podling stage is quite small. After stumbling like this I'm
> sure the team will make every effort to build the community as quickly
> as possible.

Indeed we will. And to Benson's point, there are several people, none
of whom work at the organizations involved in the initial proposal,
who have made contributions to Crunch and that we would like to
continue to work with and have them grow to become committers on the
project. Expanding and diversifying the community based on merit is
our singular focus.

>
>
> On Sat, May 26, 2012 at 3:02 PM, Benson Margulies <bi...@gmail.com> wrote:
>> I'll see Jukka one and raise him one. I have advised potential
>> podlings to be very conservative with their initial list, and keep
>> some potential contributors in their collective back pocket. This
>> gives them a ready-made source of community growth, which is typically
>> the scarcest and most precious commodity to a podling. Given the
>> particular remarks around this case, as Jukka points out, it's a
>> great, ahem, opportunity for founders to demonstrate their
>> understanding of the preeminent value of community growth as opposed
>> to code perfection.
>>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Jakob Homan <jg...@gmail.com>.
This isn't about whether or not they will respond appropriately to new
contributors once they are an incubator project.  And it's absolutely not
whether or not I should be an initial committer (the vote's going on
already and I'm happy it is).

Since the team has already stumbled in its first steps, I can't
imagine it would fail further. I'm sure the whole of the team will
step up (as part of the community).  And I can't say if I'll
contribute or not.  I'm perfectly fine with not being on the initial
team and believe the vote should finish and the proposal be accepted
(I'll put in a binding +1 shortly).

Instead, it's about the Podling's first steps to building a community,
which were:
1) Announcing they would follow the (in my view flawed) approach of S4
and reject any established members of the Apache community that may
wish to join during the proposal.
2) Announce an exception because one volunteer (Vinod) had a
particular background that would be useful.
3) Refuse a volunteer with a similar background but who has a history of
being a critic of the company where the code originated.
4) When pressed to explain this irregularity, dig itself deeper by
inventing new concepts out of thin air ('de facto diversity', 'de jure
diversity') and seeming to suggest that membership on the initial dev
team was supposed to be some type of first-among equals status that
was a gift from one person:
> I really admire that, and I think that it deserves
> some special recognition, however small. I'm willing to have some
> people not like me or think I'm dumb if that's the price of giving
>that to them.

This would be a noble sentiment if Apache Martyr were a role and if it
didn't severely hobble the appearance of an honest effort at the
Apache meritocratic approach.  As I said before, it takes a lot of
dedication to argue this passionately on a subject one professes
ignorance of.

After this, I think the possibility of such an attitude continuing
into the podling stage is quite small. After stumbling like this I'm
sure the team will make every effort to build the community as quickly
as possible.


On Sat, May 26, 2012 at 3:02 PM, Benson Margulies <bi...@gmail.com> wrote:
> I'll see Jukka one and raise him one. I have advised potential
> podlings to be very conservative with their initial list, and keep
> some potential contributors in their collective back pocket. This
> gives them a ready-made source of community growth, which is typically
> the scarcest and most precious commodity to a podling. Given the
> particular remarks around this case, as Jukka points out, it's a
> great, ahem, opportunity for founders to demonstrate their
> understanding of the preeminent value of community growth as opposed
> to code perfection.
>



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Benson Margulies <bi...@gmail.com>.
I'll see Jukka one and raise him one. I have advised potential
podlings to be very conservative with their initial list, and keep
some potential contributors in their collective back pocket. This
gives them a ready-made source of community growth, which is typically
the scarcest and most precious commodity to a podling. Given the
particular remarks around this case, as Jukka points out, it's a
great, ahem, opportunity for founders to demonstrate their
understanding of the preeminent value of community growth as opposed
to code perfection.



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, May 25, 2012 at 11:39 AM, Steve Loughran
<st...@gmail.com> wrote:
> I'd go for pulling Jakob in for tactical and strategic reasons

We grant committership based on merit, not tactics or strategy. Sounds
to me like Josh and the rest of the team would be quite willing to add
Jakob as soon as he shows up with a few patches or other
contributions.

Personally I've never fully understood the practice of allowing just
about anyone to sign up as an initial committer of a podling. Putting
your name on a list does not make you a part of a community,
participating and contributing does. Recognizing such merit with
committership is an important part of the social glue that binds our
communities together.

Of course it's a problem if it turns out that Jakob or anyone else
ends up having trouble earning Crunch committership with reasonable
effort, but so far I see little reason to worry about that.

BR,

Jukka Zitting



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Steve Loughran <st...@gmail.com>.
On 23 May 2012 19:35, Josh Wills <jw...@cloudera.com> wrote:

> Hey Jakob,
>
> This was a tough one-- you know that I've been talking about Crunch
> w/Joe Adler for a few weeks now, and I personally am really looking
> forward to working with you guys. That said, the team did feel
> strongly about keeping the initial committers to people who had
> already added major pieces of functionality to Crunch, and adding
> Vinod was about his expertise on the MR2 internals, which we think
> will be critical to Crunch's success. We are going to put the Crunch
> proposal up for a vote with the current team in place.
>
> We are, of course, very eager to grow the list of committers through
> the normal Apache process.
>

I'd go for pulling Jakob in for tactical and strategic reasons

1. He's using it at work, so represents the end users.
2. His code is always of high quality
3. Given the ongoing discussion on diversity w.r.t Flume, I think it would
be wise to not follow that project's example, and try to get broader
involvement from the outset.

Irrespective of the committer list at entry, you will need that broader
community at exit, so getting them in early is good.

Anyway, your decision, it won't affect my voting -it's just a different
action from that which I'd have taken.


Now, why are my tests failing...
Steve

Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Jakob Homan <jg...@gmail.com>.
> Assuming the VOTE passes, I hope you'll still give the project a chance,
> Jakob.  Given your rep around the ASF, if you contribute as you have to other
> projects yet your merit goes unrecognized, I suspect that Crunch's Mentors
> are going to be asking questions. ;)

I imagine we'll continue to evaluate and extend Crunch, whether as
part of Incubator or not.

I'm hopeful Crunch's incubation will turn out well and am confident
Josh will make sure it does.  After all, being willing to declare that
rejecting offers of help from experienced volunteers during the
incubator proposal is not "the normal Apache process" after having
contributed 10 patches definitely shows a willingness to do whatever
it takes to build the community the team wants in place.



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Wed, May 23, 2012 at 12:40 PM, Jakob Homan <jg...@gmail.com> wrote:
> Totally understand, although basing your criteria on S4's approach,
> which has had a rocky time in the incubator and has yet to create a
> release, is certainly an interesting approach.  Best of luck.

Assuming the VOTE passes, I hope you'll still give the project a chance,
Jakob.  Given your rep around the ASF, if you contribute as you have to other
projects yet your merit goes unrecognized, I suspect that Crunch's Mentors
are going to be asking questions. ;)

Marvin Humphrey



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Jakob Homan <jg...@gmail.com>.
Totally understand, although basing your criteria on S4's approach,
which has had a rocky time in the incubator and has yet to create a
release, is certainly an interesting approach.  Best of luck.
-jg

On Wed, May 23, 2012 at 11:35 AM, Josh Wills <jw...@cloudera.com> wrote:
> Hey Jakob,
>
> This was a tough one-- you know that I've been talking about Crunch
> w/Joe Adler for a few weeks now, and I personally am really looking
> forward to working with you guys. That said, the team did feel
> strongly about keeping the initial committers to people who had
> already added major pieces of functionality to Crunch, and adding
> Vinod was about his expertise on the MR2 internals, which we think
> will be critical to Crunch's success. We are going to put the Crunch
> proposal up for a vote with the current team in place.
>
> We are, of course, very eager to grow the list of committers through
> the normal Apache process.
>
> Best,
> Josh
>
> On Tue, May 22, 2012 at 1:34 PM, Jakob Homan <jg...@gmail.com> wrote:
>> Here at LinkedIn we're also experimenting with Crunch and are
>> interested in seeing it succeed in the Incubator.  Barring objections,
>> I'd like to add myself as well.  Hadoop, Giraph and Kafka committer,
>> details here: http://www.linkedin.com/in/jghoman
>>
>> -Jakob
>>
>>
>> On Mon, May 21, 2012 at 10:22 AM, Josh Wills <jw...@cloudera.com> wrote:
>>> Thank you Vinod. I wasn't sure of the right protocol for this sort of
>>> thing, as my expectation was that the initial committers would be
>>> drawn from the people who had contributed to Crunch already. This
>>> thread from when S4 entered the incubator was particularly
>>> illuminating:
>>>
>>> http://markmail.org/message/aw54w4mhg4zfegpn
>>>
>>> After talking it over with my co-submitters, the consensus was that
>>> your background is uniquely valuable to the project, and that we would
>>> like to have you with us as an initial committer.
>>>
>>> Josh
>>>
>>> On Sat, May 19, 2012 at 12:51 PM, Vinod Kumar Vavilapalli
>>> <vi...@hortonworks.com> wrote:
>>>>
>>>> +1, great addition. Looking forward to seeing this at Apache!
>>>>
>>>> I'd like to add myself to the list of initial committers if that is fine with you. (Didn't find any section for interested developers in the proposal, so asking here)
>>>>
>>>> My Background: Long time Hadoop MapReduce committer. Lead dev on Hadoop next-gen MapReduce aka YARN. Familiar with / know a bit or two about FlumeJava.
>>>>
>>>> Thanks,
>>>> +Vinod
>>>>
>>>>
>>>> On May 15, 2012, at 5:23 PM, Josh Wills wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to propose Crunch, a library for writing MapReduce
>>>>> pipelines in Java and Scala, as an Apache Incubator project. The
>>>>> proposal is here:
>>>>>
>>>>> http://wiki.apache.org/incubator/CrunchProposal
>>>>>
>>>>> We would gladly welcome additional volunteers to act as mentors on the
>>>>> project, so if this sounds like your cup of tea, please feel free to
>>>>> sign up or let us know.
>>>>>
>>>>> Thanks!
>>>>> Josh
>>>>>
>>>>> --
>>>>> Director of Data Science
>>>>> Cloudera
>>>>> Twitter: @josh_wills
>>>>>
>>> --
>>> Director of Data Science
>>> Cloudera
>>> Twitter: @josh_wills
>>>
>
>
>
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
>


Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
Hey Jakob,

This was a tough one-- you know that I've been talking about Crunch
w/Joe Adler for a few weeks now, and I personally am really looking
forward to working with you guys. That said, the team did feel
strongly about keeping the initial committers to people who had
already added major pieces of functionality to Crunch, and adding
Vinod was about his expertise on the MR2 internals, which we think
will be critical to Crunch's success. We are going to put the Crunch
proposal up for a vote with the current team in place.

We are, of course, very eager to grow the list of committers through
the normal Apache process.

Best,
Josh

On Tue, May 22, 2012 at 1:34 PM, Jakob Homan <jg...@gmail.com> wrote:
> Here at LinkedIn we're also experimenting with Crunch and are
> interested in seeing it succeed in the Incubator.  Barring objections,
> I'd like to add myself as well.  Hadoop, Giraph and Kafka committer,
> details here: http://www.linkedin.com/in/jghoman
>
> -Jakob
>
>
> On Mon, May 21, 2012 at 10:22 AM, Josh Wills <jw...@cloudera.com> wrote:
>> Thank you Vinod. I wasn't sure of the right protocol for this sort of
>> thing, as my expectation was that the initial committers would be
>> drawn from the people who had contributed to Crunch already. This
>> thread from when S4 entered the incubator was particularly
>> illuminating:
>>
>> http://markmail.org/message/aw54w4mhg4zfegpn
>>
>> After talking it over with my co-submitters, the consensus was that
>> your background is uniquely valuable to the project, and that we would
>> like to have you with us as an initial committer.
>>
>> Josh
>>
>> On Sat, May 19, 2012 at 12:51 PM, Vinod Kumar Vavilapalli
>> <vi...@hortonworks.com> wrote:
>>>
>>> +1, great addition. Looking forward to seeing this at Apache!
>>>
>>> I'd like to add myself to the list of initial committers if that is fine with you. (I didn't find a section for interested developers in the proposal, so I'm asking here.)
>>>
>>> My background: long-time Hadoop MapReduce committer and lead dev on Hadoop next-gen MapReduce, aka YARN. I also know a thing or two about FlumeJava.
>>>
>>> Thanks,
>>> +Vinod
>>>
>>>
>>> On May 15, 2012, at 5:23 PM, Josh Wills wrote:
>>>
>>>> Hi all,
>>>>
>>>> I would like to propose Crunch, a library for writing MapReduce
>>>> pipelines in Java and Scala, as an Apache Incubator project. The
>>>> proposal is here:
>>>>
>>>> http://wiki.apache.org/incubator/CrunchProposal
>>>>
>>>> We would gladly welcome additional volunteers to act as mentors on the
>>>> project, so if this sounds like your cup of tea, please feel free to
>>>> sign up or let us know.
>>>>
>>>> Thanks!
>>>> Josh
>>>>
>>>> --
>>>> Director of Data Science
>>>> Cloudera
>>>> Twitter: @josh_wills
>>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills
>>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Jakob Homan <jg...@gmail.com>.
Here at LinkedIn we're also experimenting with Crunch and are
interested in seeing it succeed in the Incubator.  Barring objections,
I'd like to add myself as well.  Hadoop, Giraph and Kafka committer,
details here: http://www.linkedin.com/in/jghoman

-Jakob


On Mon, May 21, 2012 at 10:22 AM, Josh Wills <jw...@cloudera.com> wrote:
> Thank you Vinod. I wasn't sure of the right protocol for this sort of
> thing, as my expectation was that the initial committers would be
> drawn from the people who had contributed to Crunch already. This
> thread from when S4 entered the incubator was particularly
> illuminating:
>
> http://markmail.org/message/aw54w4mhg4zfegpn
>
> After talking it over with my co-submitters, the consensus was that
> your background is uniquely valuable to the project, and that we would
> like to have you with us as an initial committer.
>
> Josh
>
> On Sat, May 19, 2012 at 12:51 PM, Vinod Kumar Vavilapalli
> <vi...@hortonworks.com> wrote:
>>
>> +1, great addition. Looking forward to seeing this at Apache!
>>
>> I'd like to add myself to the list of initial committers if that is fine with you. (I didn't find a section for interested developers in the proposal, so I'm asking here.)
>>
>> My background: long-time Hadoop MapReduce committer and lead dev on Hadoop next-gen MapReduce, aka YARN. I also know a thing or two about FlumeJava.
>>
>> Thanks,
>> +Vinod
>>
>>
>> On May 15, 2012, at 5:23 PM, Josh Wills wrote:
>>
>>> Hi all,
>>>
>>> I would like to propose Crunch, a library for writing MapReduce
>>> pipelines in Java and Scala, as an Apache Incubator project. The
>>> proposal is here:
>>>
>>> http://wiki.apache.org/incubator/CrunchProposal
>>>
>>> We would gladly welcome additional volunteers to act as mentors on the
>>> project, so if this sounds like your cup of tea, please feel free to
>>> sign up or let us know.
>>>
>>> Thanks!
>>> Josh
>>>
>>> --
>>> Director of Data Science
>>> Cloudera
>>> Twitter: @josh_wills
>>>
>
>
>
> --
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
>


Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
Thank you Vinod. I wasn't sure of the right protocol for this sort of
thing, as my expectation was that the initial committers would be
drawn from the people who had contributed to Crunch already. This
thread from when S4 entered the incubator was particularly
illuminating:

http://markmail.org/message/aw54w4mhg4zfegpn

After talking it over with my co-submitters, the consensus was that
your background is uniquely valuable to the project, and that we would
like to have you with us as an initial committer.

Josh

On Sat, May 19, 2012 at 12:51 PM, Vinod Kumar Vavilapalli
<vi...@hortonworks.com> wrote:
>
> +1, great addition. Looking forward to seeing this at Apache!
>
> I'd like to add myself to the list of initial committers if that is fine with you. (I didn't find a section for interested developers in the proposal, so I'm asking here.)
>
> My background: long-time Hadoop MapReduce committer and lead dev on Hadoop next-gen MapReduce, aka YARN. I also know a thing or two about FlumeJava.
>
> Thanks,
> +Vinod
>
>
> On May 15, 2012, at 5:23 PM, Josh Wills wrote:
>
>> Hi all,
>>
>> I would like to propose Crunch, a library for writing MapReduce
>> pipelines in Java and Scala, as an Apache Incubator project. The
>> proposal is here:
>>
>> http://wiki.apache.org/incubator/CrunchProposal
>>
>> We would gladly welcome additional volunteers to act as mentors on the
>> project, so if this sounds like your cup of tea, please feel free to
>> sign up or let us know.
>>
>> Thanks!
>> Josh
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills
>>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
+1, great addition. Looking forward to seeing this at Apache!

I'd like to add myself to the list of initial committers if that is fine with you. (I didn't find a section for interested developers in the proposal, so I'm asking here.)

My background: long-time Hadoop MapReduce committer and lead dev on Hadoop next-gen MapReduce, aka YARN. I also know a thing or two about FlumeJava.

Thanks,
+Vinod


On May 15, 2012, at 5:23 PM, Josh Wills wrote:

> Hi all,
> 
> I would like to propose Crunch, a library for writing MapReduce
> pipelines in Java and Scala, as an Apache Incubator project. The
> proposal is here:
> 
> http://wiki.apache.org/incubator/CrunchProposal
> 
> We would gladly welcome additional volunteers to act as mentors on the
> project, so if this sounds like your cup of tea, please feel free to
> sign up or let us know.
> 
> Thanks!
> Josh
> 
> -- 
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
> 


Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
Arun,

That would be great-- thank you. I went ahead and added your name to
the mentors list. Look forward to seeing you at Hadoop Summit.

Best,
Josh

On Fri, May 18, 2012 at 3:46 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Josh,
>
>  Sounds interesting, I've followed Crunch given my leanings towards Apache Hadoop MapReduce. Good to see it in the ASF.
>
>  If you don't mind I'll sign up as a volunteer mentor.
>
> thanks,
> Arun
>
> On May 15, 2012, at 5:23 PM, Josh Wills wrote:
>
>> Hi all,
>>
>> I would like to propose Crunch, a library for writing MapReduce
>> pipelines in Java and Scala, as an Apache Incubator project. The
>> proposal is here:
>>
>> http://wiki.apache.org/incubator/CrunchProposal
>>
>> We would gladly welcome additional volunteers to act as mentors on the
>> project, so if this sounds like your cup of tea, please feel free to
>> sign up or let us know.
>>
>> Thanks!
>> Josh
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Arun C Murthy <ac...@hortonworks.com>.
Josh,

 Sounds interesting, I've followed Crunch given my leanings towards Apache Hadoop MapReduce. Good to see it in the ASF.

 If you don't mind I'll sign up as a volunteer mentor.

thanks,
Arun

On May 15, 2012, at 5:23 PM, Josh Wills wrote:

> Hi all,
> 
> I would like to propose Crunch, a library for writing MapReduce
> pipelines in Java and Scala, as an Apache Incubator project. The
> proposal is here:
> 
> http://wiki.apache.org/incubator/CrunchProposal
> 
> We would gladly welcome additional volunteers to act as mentors on the
> project, so if this sounds like your cup of tea, please feel free to
> sign up or let us know.
> 
> Thanks!
> Josh
> 
> -- 
> Director of Data Science
> Cloudera
> Twitter: @josh_wills
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
Nah, that was the automatic wiki linking on CapitalizedWords being a
little bit too clever. Fixed.

On Fri, May 18, 2012 at 10:03 AM, Donald Whytock <dw...@gmail.com> wrote:
> The MapReduce link in the proposal doesn't resolve.  You perhaps want
> to use "http://hadoop.apache.org/mapreduce/" instead.
>
> Don
>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Donald Whytock <dw...@gmail.com>.
The MapReduce link in the proposal doesn't resolve.  You perhaps want
to use "http://hadoop.apache.org/mapreduce/" instead.

Don



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Josh Wills <jw...@cloudera.com>.
Hey JB,

I think that the underlying data model is the main difference. Pig,
like Hive and Cascading, has a relational data model-- the fundamental
data type is a Tuple of values. Crunch is closer to bare-metal
MapReduce; it doesn't impose a data model on the developer, and I
think that it ends up being easier to use Crunch when you're working
with data types that would otherwise require you to write lots of UDFs
in Pig-- for example, time series, matrices, or HDF5 files. [1]

The other major difference is, as you alluded to, the programming
environment-- Crunch is a Java library that also has a Scala wrapper,
while Pig is, like Hive, a domain-specific language. Much like the
data model, there is a tradeoff here as well-- Crunch requires more
skilled developers, but it offers those developers the benefits of a
real programming language, like for loops, debugging tools, and a rich
ecosystem of testing frameworks.

I am a Pig fan (see, for instance, [2] and [3]), and I see the tools
as complements, not competitors. Crunch is used by developers who are
building ETL pipelines in which performance and thorough testing are
critical, and Pig is used by analysts and data scientists in order to
run thousands of queries over the results of those ETL pipelines.

Best,
Josh

[1] http://www.hdfgroup.org/HDF5/
[2] http://www.cloudera.com/blog/2011/11/using-hadoop-to-analyze-adverse-drug-events/
[3] http://engineering.linkedin.com/open-source/introducing-datafu-open-source-collection-useful-apache-pig-udfs

On Fri, May 18, 2012 at 1:49 AM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> Hi Josh,
>
> Could you compare it with Pig? Is Scala support the main difference?
>
> Thanks,
> Regards
> JB
>
>
> On 05/16/2012 02:23 AM, Josh Wills wrote:
>>
>> Hi all,
>>
>> I would like to propose Crunch, a library for writing MapReduce
>> pipelines in Java and Scala, as an Apache Incubator project. The
>> proposal is here:
>>
>> http://wiki.apache.org/incubator/CrunchProposal
>>
>> We would gladly welcome additional volunteers to act as mentors on the
>> project, so if this sounds like your cup of tea, please feel free to
>> sign up or let us know.
>>
>> Thanks!
>> Josh
>>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills



Re: [DISCUSS] Crunch to join the Apache Incubator

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Josh,

Could you compare it with Pig? Is Scala support the main difference?

Thanks,
Regards
JB

On 05/16/2012 02:23 AM, Josh Wills wrote:
> Hi all,
>
> I would like to propose Crunch, a library for writing MapReduce
> pipelines in Java and Scala, as an Apache Incubator project. The
> proposal is here:
>
> http://wiki.apache.org/incubator/CrunchProposal
>
> We would gladly welcome additional volunteers to act as mentors on the
> project, so if this sounds like your cup of tea, please feel free to
> sign up or let us know.
>
> Thanks!
> Josh
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
