Posted to dev@pig.apache.org by Dmitriy Ryaboy <dv...@gmail.com> on 2010/04/01 01:04:45 UTC

Re: Begin a discussion about Pig as a top level project

Over time, Pig is increasing its coupling to Hadoop (for good reasons),
rather than decreasing it. If and when Pig becomes a viable entity without
Hadoop around, it might make sense as a TLP. As is, I think becoming a TLP
will only introduce unnecessary administrative and bureaucratic headaches.
So my vote is also -1.

-Dmitriy



On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> So far I haven't seen any feedback on this.  Apache has asked the Hadoop
> PMC to submit input in April on whether some subprojects should be promoted
> to TLPs.  We, the Pig community, need to give feedback to the Hadoop PMC on
> how we feel about this.  Please make your voice heard.
>
> So now I'll heed my own call and give my thoughts on it.
>
> The biggest advantage I see to being a TLP is a direct connection to
> Apache.  Right now all of the Pig team's interaction with Apache is through
> the Hadoop PMC.  Being directly connected to Apache would benefit Pig team
> members who would have a better view into Apache.  It would also raise our
> profile in Apache and thus make other projects more aware of us.
>
> However, I am concerned about losing Pig's explicit connection to Hadoop.
>  This concern has a couple of dimensions.  One, Hadoop and MapReduce are the
> current flavor of the month in computing.  Given that Pig shares a name with
> the common farm animal, it's hard to be sure based on search statistics.
>  But Google trends shows that "hadoop" is searched on much more frequently
> than "hadoop pig" or "apache pig" (see
> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing that
> most Pig users come from Hadoop users who discover Pig via Hadoop's website.
>  Losing that subproject tab on Hadoop's front page may radically lower the
> number of users coming to Pig to check out our project.  I would argue that
> this benefits Hadoop as well, since high level languages like Pig Latin have
> the potential to greatly extend the user base and usability of Hadoop.
>
> Two, being explicitly connected to Hadoop keeps our two communities aware
> of each other's needs.  There are features proposed for MR that would greatly
> help Pig.  By staying in the Hadoop community Pig is better positioned to
> advocate for and help implement and test those features.  The response to
> this will be that Pig developers can still subscribe to Hadoop mailing
> lists, submit patches, etc.  That is, they can still be part of the Hadoop
> community.  Which reinforces my point that it makes more sense to leave Pig
> in the Hadoop community since Pig developers will need to be part of that
> community anyway.
>
> Finally, philosophically it makes sense to me that projects that are
> tightly connected belong together.  It strikes me as strange to have Pig as
> a TLP completely dependent on another TLP.  Hadoop was originally a
> subproject of Lucene.  It moved out to be a TLP when it became obvious that
> Hadoop had become independent of and useful apart from Lucene.  Pig is not
> in that position relative to Hadoop.
>
> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to being
> persuaded that I'm wrong or my concerns can be addressed while still having
> Pig as a TLP.
>
> Alan.
>
>
> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>
>> You have probably heard by now that there is a discussion going on in the
>> Hadoop PMC as to whether a number of the subprojects (Hbase, Avro,
>> Zookeeper, Hive, and Pig) should move out from under the Hadoop umbrella and
>> become top level Apache projects (TLP).  This discussion has picked up
>> recently since the Apache board has clearly communicated to the Hadoop PMC
>> that it is concerned that Hadoop is acting as an umbrella project with many
>> disjoint subprojects underneath it.  They are concerned that this gives
>> Apache little insight into the health and happenings of the subproject
>> communities which in turn means Apache cannot properly mentor those
>> communities.
>>
>> The purpose of this email is to start a discussion within the Pig
>> community about this topic.  Let me cover first what becoming a TLP would mean
>> for Pig, and then I'll go into what options I think we as a community have.
>>
>> Becoming a TLP would mean that Pig would itself have a PMC that would
>> report directly to the Apache board.  Who would be on the PMC would be
>> something we as a community would need to decide.  Common options would be
>> to say all active committers are on the PMC, or all active committers who
>> have been a committer for at least a year.  We would also need to elect a
>> chair of the PMC.  This lucky person would have no additional power, but
>> would have the additional responsibility of writing quarterly reports on
>> Pig's status for Apache board meetings, as well as coordinating with Apache
>> to get accounts for new  committers, etc.  For more information see
>> http://www.apache.org/foundation/how-it-works.html#roles
>>
>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>> community.  We would continue to be invited to Hadoop Summits, HUGs, etc.
>>  Since all Pig developers and users are by definition Hadoop users, we would
>> continue to be a strong presence in the Hadoop community.
>>
>> I see three ways that we as a community can respond to this:
>>
>> 1) Say yes, we want to be a TLP now.
>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more time
>> to mature.  If we choose this option we need to be able to clearly
>> articulate how much time we need and what we hope to see change in that
>> time.
>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh the
>> drawbacks of being a disjoint subproject.  If we choose this, we need to be
>> able to say exactly what those benefits are and why we feel they will be
>> compromised by leaving the Hadoop project.
>>
>> There may be other options that I haven't thought of.  Please feel free to
>> suggest any you think of.
>>
>> Questions?  Thoughts?  Let the discussion begin.
>>
>> Alan.
>>
>>
>

RE: Begin a discussion about Pig as a top level project

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.

I agree with Ashutosh and Santhosh. Based on the current direction of the project, I think we are more closely tied to Hadoop now (with Pig 0.7, our load/store interfaces are very closely tied to Hadoop). Hence, for now, my vote would be -1 on becoming a TLP. If there is a change in that direction/philosophy toward being really backend agnostic, I think we should revisit this question.

Pradeep

-----Original Message-----
From: Ashutosh Chauhan [mailto:ashutosh.chauhan@gmail.com] 
Sent: Sunday, April 04, 2010 11:11 PM
To: pig-dev@hadoop.apache.org
Subject: Re: Begin a discussion about Pig as a top level project

I concur with Santhosh here. I think the main question we need to answer
here is how close our ties to Hadoop are currently and how close they will
be in the future. When Pig was originally designed, the intent was to keep
it backend neutral, so much so that there was a reference backend
implementation (also known as the local engine) which had nothing to do
with Hadoop. But things have changed since then. Hadoop's local mode
has been adopted in place of Pig's own local mode. We have moved from being
backend agnostic to Hadoop-favoring. And while this was happening, it
seems we tried to keep the Pig Latin language independent of the Hadoop
backend while the Pig runtime started to make use of Hadoop concepts.

Apart from design decisions, this move also has a practical impact on
our codebase. Since we adopted Hadoop more closely, we got rid of an
extra layer of abstraction and instead started using similar
abstractions already existing in Hadoop. This has had a positive impact:
it simplified the codebase and provides tighter integration with
Hadoop.

So, if we are continuing in a direction where Hadoop is our only
backend (or at least a favored one), close ties to Hadoop are useful
for the reasons Alan and Dmitriy pointed out. If not, then I
think moving out to a TLP makes sense. Since there are no efforts that
I am aware of to plug in a different backend for Pig, I
think maintaining close ties with Hadoop is useful for Pig. In the future,
when a different distributed computing platform comes up
that we want to use as a backend, we can revisit our decision. So, as
things stand today, I am -1 on moving out of Hadoop.

And I would also like to reiterate my point that though the Pig runtime
may continue to get closer to Hadoop, we should keep Pig Latin
completely backend agnostic.

Ashutosh

On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan <sm...@yahoo-inc.com> wrote:
> I see this as a multi-part question. Looking back at some of the
> significant roadmap/existential questions asked in the last 12 months, I
> see the following:
>
> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> an email about this approximately 9 months ago)
> 2. What is the approach to support backward compatibility in Pig (Alan
> had sent an email about this 3 months ago)
> 3. Should Pig be a TLP (the current email thread).
>
> Here is my take on answering the aforementioned questions.
>
> The initial philosophy of Pig was to be backend agnostic. It was
> designed as a data flow language. Whenever a new language is designed,
> the syntax and semantics of the language have to be laid out. The syntax
> is usually captured in the form of a BNF grammar. The semantics are
> defined by the language creators. Backward compatibility is then a
> question of holding true to the syntax and semantics. With Pig, in
> addition to the language, the Java APIs were exposed to customers to
> implement UDFs (load/store/filter/grouping/row transformation, etc.),
> provide looping (since the language does not support looping constructs),
> and support a programmatic mode of access. Backward compatibility
> in this context means supporting API versioning.
>
> Do we still intend to position Pig as a data flow language that is backend
> agnostic? If the answer is yes, then there is a strong case for making
> Pig a TLP.
>
> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> consequence, we chose to be heavily influenced by the Hadoop roadmap.
>
> Like a good lawyer, I also have rebuttals to Alan's questions :)
>
> 1. Search engine popularity - We can discuss this with the Hadoop team
> and still retain links to TLPs that are coupled (loosely or tightly).
> 2. Explicit connection to Hadoop - I see this as logical connection vs.
> physical connection. Today, we are physically connected as a
> sub-project. Becoming a TLP will not increase/decrease our influence on
> the Hadoop community (think Logical, Physical and MR Layers :)
> 3. Philosophy - I have already talked about this. The tight coupling is
> by choice. If Pig continues to be a data flow language with clear syntax
> and semantics then someone can implement Pig on top of a different
> backend. Do we intend to take this approach?
>
> I just wanted to offer a different opinion to this thread. I strongly
> believe that we should think about the original philosophy. Will we have
> a Pig standards committee that will decide on the changes to the
> language (think C/C++) if there are multiple backend implementations?
>
> I will reserve my vote based on the outcome of the philosophy and
> backward compatibility discussions. If we decide that Pig will be
> treated and maintained like a true language with clear syntax and
> semantics then we have a strong case to make it into a TLP. If not, we
> should retain our existing ties to Hadoop and make Pig into a data flow
> language for Hadoop.
>
> Santhosh
>
> -----Original Message-----
> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> Sent: Friday, April 02, 2010 4:08 PM
> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> Subject: Re: Begin a discussion about Pig as a top level project
>
> I agree with Alan and Dmitriy - Pig is tightly coupled with Hadoop, and
> heavily influenced by its roadmap. I think it makes sense to continue as
> a sub-project of Hadoop.
>
> -Thejas
>

Re: Begin a discussion about Pig as a top level project

Posted by hc busy <hc...@gmail.com>.

I guess this is more of a suggestion for the roadmap than for the TLP
discussion, but I think the PMC/committers could create a dedicated position
that maintains the web/docs: somebody who yells and screams until the docs
are in sync with the implementation before the release.

Because becoming a TLP is an elevation of status in addition to an internal
re-organization, I think it might create the PR needed to attract the
talent to fill that job...


On Mon, Apr 5, 2010 at 11:23 AM, Alan Gates <ga...@yahoo-inc.com> wrote:

> I agree that Pig's code documentation is in sad shape.  I think our user
> documentation for each release is good, if limited.  I hope that our
> documents on the wiki (such as PigJournal) help people understand our roadmap.
>  Please let us know if you disagree so we can find ways to improve it.
>
> That said, it isn't clear to me how Pig being a TLP will solve that.  The
> current committers or some subset thereof (see original message) would
> become the PMC.  Other than having expanded powers to vote on releases and
> who becomes new committers, the role of these new PMC members would not
> change much.  They won't have any more time to address documentation and
> communication issues.  We need to find a way to address those no matter what
> governance framework or community Pig is in.
>
> Alan.
>
>
> On Apr 5, 2010, at 9:02 AM, hc busy wrote:
>
>> This is awesome!!! As much as I hate PJM's for wasting time at all the
>> places that I've worked at, I think formalizing the management group (PMC)
>> to openly and clearly determine the feature roadmap and dev schedule is the
>> best thing Pig can have.
>>
>> I once commented to my co-worker (also a heavy Pig user) that Pig's
>> organization (with all due respect to all you hardworking people) is like
>> a pigsty! Documentation all over the place, javadocs from three versions
>> ago, much of the documentation doesn't match actual features... links to
>> the download page are broken.
>>
>> If you look at cascading's website... it's so much cleaner. (Of course...
>> we still use Pig because it works well)
>>
>> I think as a TLP, Pig will receive better marketing and better support in a
>> way that will propel it both in popularity and in the amount of support it
>> receives.
>>
>> As a user, that change will be good for me.
>>>>>>>
>>>>>>> Questions?  Thoughts?  Let the discussion begin.
>>>>>>>
>>>>>>> Alan.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>

Re: Begin a discussion about Pig as a top level project

Posted by Alan Gates <ga...@yahoo-inc.com>.

I agree that Pig's code documentation is in sad shape.  I think our  
user documentation for each release is good, if limited.  I hope that  
our documents on wiki (such as PigJournal) help people understand our  
roadmap.  Please let us know if you disagree so we can find ways to  
improve it.

That said, it isn't clear to me how Pig being a TLP will solve that.   
The current committers or some subset thereof (see original message)  
would become the PMC.  Other than having expanded powers to vote on  
releases and who becomes new committers, the role of these new PMC  
members would not change much.  They won't have any more time to  
address documentation and communication issues.  We need to find a way  
to address those no matter what governance framework or community Pig  
is in.

Alan.

On Apr 5, 2010, at 9:02 AM, hc busy wrote:

> This is awesome!!! As much as I hate PJM's for wasting time at all the
> places that I've worked at, I think formalizing the management  
> group(PMC) to
> openly and clearly determine feature roadmap and dev schedule is the  
> best
> thing pig can have.
>
> I once commented to my co-worker (also heavy pig user) that pig's
> organization (with all due respect to all you hardworking people) is  
> like a
> pigsty! documentations all over the place, javadocs from three  
> versions ago,
> much of the documentation doesn't match actual features... links to  
> the
> download page is broken.
>
> If you look at cascading's website... it's so much cleaner. (Of  
> course... we
> still use pig because it works well)
>
> I think as TLP, pig will receive better marketing and better support  
> in a
> way that will propel it both in popularity and in the amount of  
> support it
> receives.
>
> As a user, that change will be good for me.
>
>
> On Sun, Apr 4, 2010 at 11:10 PM, Ashutosh Chauhan <
> ashutosh.chauhan@gmail.com> wrote:
>
>> I concur with Santhosh here. I think main question we need to answer
>> here is how close our ties are with Hadoop currently and how it will
>> be in future ? When Pig was originally designed the intent was to  
>> keep
>> it backend neutral, so  much so that there was a reference backend
>> implementation (also known as local engine) which had nothing to do
>> with Hadoop. But things have changed since then. Hadoop's local mode
>> is adopted in favor of Pig's own local mode. We have moved from being
>> backend agnostic to hadoop favoring. And while this was happening, it
>> seems we tried to keep Pig Latin language independent of hadoop
>> backend  while Pig runtime started to make use of hadoop concepts.
>>
>> Apart from design decisions, this move also has a practical impact on
>> our codebase. Since we adopted Hadoop more closely, we got rid of an
>> extra layer of abstraction and instead started using similar
>> abstractions already existing in Hadoop. This has a positive impact
>> that it simplified the codebase and provides tighter integration with
>> Hadoop.
>> So, if we are continuing in a direction where Hadoop is our only
>> backend (or atleast a favored one), close ties to Hadoop are useful
>> because of the reasons Alan and Dmitriy pointed out. if not, then I
>> think moving out to TLP makes sense. Since, there is no efforts which
>> I am aware of, is trying to plug in a different backend for Pig, I
>> think maintaining close ties with Hadoop is useful for Pig. In future
>> when there is a different distributed computing platform comes up
>> which we want to use as backend, we can revisit our decision. So, as
>> for things stand today I am -1 to move out of  Hadoop.
>>
>> And I would also like to reiterate my point that though Pig runtime
>> may continue to get closer to Hadoop, we shall keep Pig Latin
>> completely backend agnostic.
>>
>> Ashutosh
>>
>> On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan <sm...@yahoo-inc.com>
>> wrote:
>>> I see this as a multi-part question. Looking back at some of the
>>> significant roadmap/existential questions asked in the last 12  
>>> months, I
>>> see the following:
>>>
>>> 1. With the introduction of SQL, what is the philosophy of Pig (I  
>>> sent
>>> an email about this approximately 9 months ago)
>>> 2. What is the approach to support backward compatibility in Pig  
>>> (Alan
>>> had sent an email about this 3 months ago)
>>> 3. Should Pig be a TLP (the current email thread).
>>>
>>> Here is my take on answering the aforementioned questions.
>>>
>>> The initial philosophy of Pig was to be backend agnostic. It was
>>> designed as a data flow language. Whenever a new language is  
>>> designed,
>>> the syntax and semantics of the language have to be laid out. The  
>>> syntax
>>> is usually captured in the form of a BNF grammar. The semantics are
>>> defined by the language creators. Backward compatibility is then a
>>> question of holding true to the syntax and semantics. With Pig, in
>>> addition to the language, the Java APIs were exposed to customers to
>>> implement UDFs (load/store/filter/grouping/row transformation etc),
>>> provision looping since the language does not support looping  
>>> constructs
>>> and also support a programmatic mode of access. Backward  
>>> compatibility
>>> in this context is to support API versioning.
>>>
>>> Do we still intend to position as a data flow language that is  
>>> backend
>>> agnostic? If the answer is yes, then there is a strong case for  
>>> making
>>> Pig a TLP.
>>>
>>> Are we influenced by Hadoop? A big YES! The reason Pig chose to  
>>> become a
>>> Hadoop sub-project was to ride the Hadoop popularity wave. As a
>>> consequence, we chose to be heavily influenced by the Hadoop  
>>> roadmap.
>>>
>>> Like a good lawyer, I also have rebuttals to Alan's questions :)
>>>
>>> 1. Search engine popularity - We can discuss this with the Hadoop  
>>> team
>>> and still retain links to TLP's that are coupled (loosely or  
>>> tightly).
>>> 2. Explicit connection to Hadoop - I see this as logical  
>>> connection v/s
>>> physical connection. Today, we are physically connected as a
>>> sub-project. Becoming a TLP, will not increase/decrease our  
>>> influence on
>>> the Hadoop community (think Logical, Physical and MR Layers :)
>>> 3. Philosophy - I have already talked about this. The tight  
>>> coupling is
>>> by choice. If Pig continues to be a data flow language with clear  
>>> syntax
>>> and semantics then someone can implement Pig on top of a different
>>> backend. Do we intend to take this approach?
>>>
>>> I just wanted to offer a different opinion to this thread. I  
>>> strongly
>>> believe that we should think about the original philosophy. Will  
>>> we have
>>> a Pig standards committee that will decide on the changes to the
>>> language (think C/C++) if there are multiple backend  
>>> implementations?
>>>
>>> I will reserve my vote based on the outcome of the philosophy and
>>> backward compatibility discussions. If we decide that Pig will be
>>> treated and maintained like a true language with clear syntax and
>>> semantics then we have a strong case to make it into a TLP. If  
>>> not, we
>>> should retain our existing ties to Hadoop and make Pig into a data  
>>> flow
>>> language for Hadoop.
>>>
>>> Santhosh
>>>
>>> -----Original Message-----
>>> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
>>> Sent: Friday, April 02, 2010 4:08 PM
>>> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
>>> Subject: Re: Begin a discussion about Pig as a top level project
>>>
>>> I agree with Alan and Dmitriy - Pig is tightly coupled with  
>>> hadoop, and
>>> heavily influenced by its roadmap. I think it makes sense to  
>>> continue as
>>> a sub-project of hadoop.
>>>
>>> -Thejas
>>>
>>>
>>>
>>> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:
>>>
>>>> Over time, Pig is increasing its coupling to Hadoop (for good
>>>> reasons), rather than decreasing it. If and when Pig becomes a  
>>>> viable
>>>> entity without hadoop around, it might make sense as a TLP. As  
>>>> is, I
>>>> think becoming a TLP will only introduce unnecessary administrative
>>> and bureaucratic headaches.
>>>> So my vote is also -1.
>>>>
>>>> -Dmitriy
>>>>
>>>>
>>>>
>>>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com>
>>> wrote:
>>>>
>>>>> So far I haven't seen any feedback on this.  Apache has asked the
>>>>> Hadoop PMC to submit input in April on whether some subprojects
>>>>> should be promoted to TLPs.  We, the Pig community, need to give
>>>>> feedback to the Hadoop PMC on how we feel about this.  Please make
>>> your voice heard.
>>>>>
>>>>> So now I'll head my own call and give my thoughts on it.
>>>>>
>>>>> The biggest advantage I see to being a TLP is a direct  
>>>>> connection to
>>>>> Apache.  Right now all of the Pig team's interaction with Apache  
>>>>> is
>>>>> through the Hadoop PMC.  Being directly connected to Apache would
>>>>> benefit Pig team members who would have a better view into Apache.
>>>>> It would also raise our profile in Apache and thus make other
>>> projects more aware of us.
>>>>>
>>>>> However, I am concerned about loosing Pig's explicit connection to
>>> Hadoop.
>>>>> This concern has a couple of dimensions.  One, Hadoop and  
>>>>> MapReduce
>>>>> are the current flavor of the month in computing.  Given that Pig
>>>>> shares a name with the common farm animal, it's hard to be sure  
>>>>> based
>>> on search statistics.
>>>>> But Google trends shows that "hadoop" is searched on much more
>>>>> frequently than "hadoop pig" or "apache pig" (see
>>>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am  
>>>>> guessing
>>>>> that most Pig users come from Hadoop users who discover Pig via
>>> Hadoop's website.
>>>>> Loosing that subproject tab on Hadoop's front page may radically
>>>>> lower the number of users coming to Pig to check out our  
>>>>> project.  I
>>>>> would argue that this benefits Hadoop as well, since high level
>>>>> languages like Pig Latin have the potential to greatly extend the
>>> user base and usability of Hadoop.
>>>>>
>>>>> Two, being explicitly connected to Hadoop keeps our two  
>>>>> communities
>>>>> aware of each others needs.  There are features proposed for MR  
>>>>> that
>>>>> would greatly help Pig.  By staying in the Hadoop community Pig is
>>>>> better positioned to advocate for and help implement and test  
>>>>> those
>>>>> features.  The response to this will be that Pig developers can  
>>>>> still
>>>
>>>>> subscribe to Hadoop mailing lists, submit patches, etc.  That is,
>>>>> they can still be part of the Hadoop community.  Which  
>>>>> reinforces my
>>>>> point that it makes more sense to leave Pig in the Hadoop  
>>>>> community
>>>>> since Pig developers will need to be part of that community  
>>>>> anyway.
>>>>>
>>>>> Finally, philosophically it makes sense to me that projects that  
>>>>> are
>>>>> tightly connected belong together.  It strikes me as strange to  
>>>>> have
>>>>> Pig as a TLP completely dependent on another TLP.  Hadoop was
>>>>> originally a subproject of Lucene.  It moved out to be a TLP  
>>>>> when it
>>>>> became obvious that Hadoop had become independent of and useful  
>>>>> apart
>>>
>>>>> from Lucene.  Pig is not in that position relative to Hadoop.
>>>>>
>>>>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
>>>>> being persuaded that I'm wrong or my concerns can be addressed  
>>>>> while
>>>>> still having Pig as a TLP.
>>>>>
>>>>> Alan.
>>>>>
>>>>>
>>>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>>>>
>>>>> You have probably heard by now that there is a discussion going on
>>>>> in the
>>>>>> Hadoop PMC as to whether a number of the subprojects (Hbase,  
>>>>>> Avro,
>>>>>> Zookeeper, Hive, and Pig) should move out from under the Hadoop
>>>>>> umbrella and become top level Apache projects (TLP).  This
>>>>>> discussion has picked up recently since the Apache board has  
>>>>>> clearly
>>>
>>>>>> communicated to the Hadoop PMC that it is concerned that Hadoop  
>>>>>> is
>>>>>> acting as an umbrella project with many disjoint subprojects
>>>>>> underneath it.  They are concerned that this gives Apache little
>>>>>> insight into the health and happenings of the subproject  
>>>>>> communities
>>>
>>>>>> which in turn means Apache cannot properly mentor those  
>>>>>> communities.
>>>>>>
>>>>>> The purpose of this email is to start a discussion within the Pig
>>>>>> community about this topic.  Let me cover first what becoming TLP
>>>>>> would mean for Pig, and then I'll go into what options I think  
>>>>>> we as
>>> a community have.
>>>>>>
>>>>>> Becoming a TLP would mean that Pig would itself have a PMC that
>>>>>> would report directly to the Apache board.  Who would be on the  
>>>>>> PMC
>>>>>> would be something we as a community would need to decide.   
>>>>>> Common
>>>>>> options would be to say all active committers are on the PMC,  
>>>>>> or all
>>>
>>>>>> active committers who have been a committer for at least a  
>>>>>> year.  We
>>>
>>>>>> would also need to elect a chair of the PMC.  This lucky person
>>>>>> would have no additional power, but would have the additional
>>>>>> responsibility of writing quarterly reports on Pig's status for
>>>>>> Apache board meetings, as well as coordinating with Apache to get
>>>>>> accounts for new  committers, etc.  For more information see
>>>>>> http://www.apache.org/foundation/how-it-works.html#roles
>>>>>>
>>>>>> Becoming a TLP would not mean that we are ostracized from the  
>>>>>> Hadoop
>>>
>>>>>> community.  We would continue to be invited to Hadoop Summits,  
>>>>>> HUGs,
>>> etc.
>>>>>> Since all Pig developers and users are by definition Hadoop  
>>>>>> users,
>>>>>> we would continue to be a strong presence in the Hadoop  
>>>>>> community.
>>>>>>
>>>>>> I see three ways that we as a community can respond to this:
>>>>>>
>>>>>> 1) Say yes, we want to be a TLP now.
>>>>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need  
>>>>>> more
>>>>>> time to mature.  If we choose this option we need to be able to
>>>>>> clearly articulate how much time we need and what we hope to see
>>>>>> change in that time.
>>>>>> 3) Say no, we feel the benefits for us staying with Hadoop  
>>>>>> outweigh
>>>>>> the drawbacks of being a disjoint subproject.  If we choose  
>>>>>> this, we
>>>
>>>>>> need to be able to say exactly what those benefits are and why we
>>>>>> feel they will be compromised by leaving the Hadoop project.
>>>>>>
>>>>>> There may other options that I haven't thought of.  Please feel  
>>>>>> free
>>>
>>>>>> to suggest any you think of.
>>>>>>
>>>>>> Questions?  Thoughts?  Let the discussion begin.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>

Re: Begin a discussion about Pig as a top level project

Posted by hc busy <hc...@gmail.com>.

This is awesome!!! As much as I hate PJMs for wasting time at all the
places that I've worked, I think formalizing the management group (PMC) to
openly and clearly determine the feature roadmap and dev schedule is the best
thing pig can have.

I once commented to my co-worker (also a heavy pig user) that pig's
organization (with all due respect to all you hardworking people) is like a
pigsty! Documentation all over the place, javadocs from three versions ago,
much of the documentation doesn't match actual features... links to the
download page are broken.

If you look at cascading's website... it's so much cleaner. (Of course... we
still use pig because it works well)

I think as a TLP, pig will receive better marketing and better support in a
way that will propel it both in popularity and in the amount of support it
receives.

As a user, that change will be good for me.


On Sun, Apr 4, 2010 at 11:10 PM, Ashutosh Chauhan <
ashutosh.chauhan@gmail.com> wrote:

> I concur with Santhosh here. I think main question we need to answer
> here is how close our ties are with Hadoop currently and how it will
> be in future ? When Pig was originally designed the intent was to keep
> it backend neutral, so  much so that there was a reference backend
> implementation (also known as local engine) which had nothing to do
> with Hadoop. But things have changed since then. Hadoop's local mode
> is adopted in favor of Pig's own local mode. We have moved from being
> backend agnostic to hadoop favoring. And while this was happening, it
> seems we tried to keep Pig Latin language independent of hadoop
> backend  while Pig runtime started to make use of hadoop concepts.
>
> Apart from design decisions, this move also has a practical impact on
> our codebase. Since we adopted Hadoop more closely, we got rid of an
> extra layer of abstraction and instead started using similar
> abstractions already existing in Hadoop. This has a positive impact
> that it simplified the codebase and provides tighter integration with
> Hadoop.
> So, if we are continuing in a direction where Hadoop is our only
> backend (or atleast a favored one), close ties to Hadoop are useful
> because of the reasons Alan and Dmitriy pointed out. if not, then I
> think moving out to TLP makes sense. Since, there is no efforts which
> I am aware of, is trying to plug in a different backend for Pig, I
> think maintaining close ties with Hadoop is useful for Pig. In future
> when there is a different distributed computing platform comes up
> which we want to use as backend, we can revisit our decision. So, as
> for things stand today I am -1 to move out of  Hadoop.
>
> And I would also like to reiterate my point that though Pig runtime
> may continue to get closer to Hadoop, we shall keep Pig Latin
> completely backend agnostic.
>
> Ashutosh
>
> On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan <sm...@yahoo-inc.com>
> wrote:
> > I see this as a multi-part question. Looking back at some of the
> > significant roadmap/existential questions asked in the last 12 months, I
> > see the following:
> >
> > 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> > an email about this approximately 9 months ago)
> > 2. What is the approach to support backward compatibility in Pig (Alan
> > had sent an email about this 3 months ago)
> > 3. Should Pig be a TLP (the current email thread).
> >
> > Here is my take on answering the aforementioned questions.
> >
> > The initial philosophy of Pig was to be backend agnostic. It was
> > designed as a data flow language. Whenever a new language is designed,
> > the syntax and semantics of the language have to be laid out. The syntax
> > is usually captured in the form of a BNF grammar. The semantics are
> > defined by the language creators. Backward compatibility is then a
> > question of holding true to the syntax and semantics. With Pig, in
> > addition to the language, the Java APIs were exposed to customers to
> > implement UDFs (load/store/filter/grouping/row transformation etc),
> > provision looping since the language does not support looping constructs
> > and also support a programmatic mode of access. Backward compatibility
> > in this context is to support API versioning.
> >
> > Do we still intend to position as a data flow language that is backend
> > agnostic? If the answer is yes, then there is a strong case for making
> > Pig a TLP.
> >
> > Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> > Hadoop sub-project was to ride the Hadoop popularity wave. As a
> > consequence, we chose to be heavily influenced by the Hadoop roadmap.
> >
> > Like a good lawyer, I also have rebuttals to Alan's questions :)
> >
> > 1. Search engine popularity - We can discuss this with the Hadoop team
> > and still retain links to TLP's that are coupled (loosely or tightly).
> > 2. Explicit connection to Hadoop - I see this as logical connection v/s
> > physical connection. Today, we are physically connected as a
> > sub-project. Becoming a TLP, will not increase/decrease our influence on
> > the Hadoop community (think Logical, Physical and MR Layers :)
> > 3. Philosophy - I have already talked about this. The tight coupling is
> > by choice. If Pig continues to be a data flow language with clear syntax
> > and semantics then someone can implement Pig on top of a different
> > backend. Do we intend to take this approach?
> >
> > I just wanted to offer a different opinion to this thread. I strongly
> > believe that we should think about the original philosophy. Will we have
> > a Pig standards committee that will decide on the changes to the
> > language (think C/C++) if there are multiple backend implementations?
> >
> > I will reserve my vote based on the outcome of the philosophy and
> > backward compatibility discussions. If we decide that Pig will be
> > treated and maintained like a true language with clear syntax and
> > semantics then we have a strong case to make it into a TLP. If not, we
> > should retain our existing ties to Hadoop and make Pig into a data flow
> > language for Hadoop.
> >
> > Santhosh
> >
> > -----Original Message-----
> > From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> > Sent: Friday, April 02, 2010 4:08 PM
> > To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> > Subject: Re: Begin a discussion about Pig as a top level project
> >
> > I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
> > heavily influenced by its roadmap. I think it makes sense to continue as
> > a sub-project of hadoop.
> >
> > -Thejas
> >
> >
> >
> > On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:
> >
> >> Over time, Pig is increasing its coupling to Hadoop (for good
> >> reasons), rather than decreasing it. If and when Pig becomes a viable
> >> entity without hadoop around, it might make sense as a TLP. As is, I
> >> think becoming a TLP will only introduce unnecessary administrative
> > and bureaucratic headaches.
> >> So my vote is also -1.
> >>
> >> -Dmitriy
> >>
> >>
> >>
> >> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com>
> > wrote:
> >>
> >>> So far I haven't seen any feedback on this.  Apache has asked the
> >>> Hadoop PMC to submit input in April on whether some subprojects
> >>> should be promoted to TLPs.  We, the Pig community, need to give
> >>> feedback to the Hadoop PMC on how we feel about this.  Please make
> > your voice heard.
> >>>
> >>> So now I'll head my own call and give my thoughts on it.
> >>>
> >>> The biggest advantage I see to being a TLP is a direct connection to
> >>> Apache.  Right now all of the Pig team's interaction with Apache is
> >>> through the Hadoop PMC.  Being directly connected to Apache would
> >>> benefit Pig team members who would have a better view into Apache.
> >>> It would also raise our profile in Apache and thus make other
> > projects more aware of us.
> >>>
> >>> However, I am concerned about loosing Pig's explicit connection to
> > Hadoop.
> >>>  This concern has a couple of dimensions.  One, Hadoop and MapReduce
> >>> are the current flavor of the month in computing.  Given that Pig
> >>> shares a name with the common farm animal, it's hard to be sure based
> > on search statistics.
> >>>  But Google trends shows that "hadoop" is searched on much more
> >>> frequently than "hadoop pig" or "apache pig" (see
> >>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing
> >>> that most Pig users come from Hadoop users who discover Pig via
> > Hadoop's website.
> >>>  Loosing that subproject tab on Hadoop's front page may radically
> >>> lower the number of users coming to Pig to check out our project.  I
> >>> would argue that this benefits Hadoop as well, since high level
> >>> languages like Pig Latin have the potential to greatly extend the
> > user base and usability of Hadoop.
> >>>
> >>> Two, being explicitly connected to Hadoop keeps our two communities
> >>> aware of each others needs.  There are features proposed for MR that
> >>> would greatly help Pig.  By staying in the Hadoop community Pig is
> >>> better positioned to advocate for and help implement and test those
> >>> features.  The response to this will be that Pig developers can still
> >
> >>> subscribe to Hadoop mailing lists, submit patches, etc.  That is,
> >>> they can still be part of the Hadoop community.  Which reinforces my
> >>> point that it makes more sense to leave Pig in the Hadoop community
> >>> since Pig developers will need to be part of that community anyway.
> >>>
> >>> Finally, philosophically it makes sense to me that projects that are
> >>> tightly connected belong together.  It strikes me as strange to have
> >>> Pig as a TLP completely dependent on another TLP.  Hadoop was
> >>> originally a subproject of Lucene.  It moved out to be a TLP when it
> >>> became obvious that Hadoop had become independent of and useful apart
> >
> >>> from Lucene.  Pig is not in that position relative to Hadoop.
> >>>
> >>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
> >>> being persuaded that I'm wrong or my concerns can be addressed while
> >>> still having Pig as a TLP.
> >>>
> >>> Alan.
> >>>
> >>>
> >>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
> >>>
> >>>  You have probably heard by now that there is a discussion going on
> >>> in the
> >>>> Hadoop PMC as to whether a number of the subprojects (Hbase, Avro,
> >>>> Zookeeper, Hive, and Pig) should move out from under the Hadoop
> >>>> umbrella and become top level Apache projects (TLP).  This
> >>>> discussion has picked up recently since the Apache board has clearly
> >
> >>>> communicated to the Hadoop PMC that it is concerned that Hadoop is
> >>>> acting as an umbrella project with many disjoint subprojects
> >>>> underneath it.  They are concerned that this gives Apache little
> >>>> insight into the health and happenings of the subproject communities
> >
> >>>> which in turn means Apache cannot properly mentor those communities.
> >>>>
> >>>> The purpose of this email is to start a discussion within the Pig
> >>>> community about this topic.  Let me cover first what becoming TLP
> >>>> would mean for Pig, and then I'll go into what options I think we as
> > a community have.
> >>>>
> >>>> Becoming a TLP would mean that Pig would itself have a PMC that
> >>>> would report directly to the Apache board.  Who would be on the PMC
> >>>> would be something we as a community would need to decide.  Common
> >>>> options would be to say all active committers are on the PMC, or all
> >
> >>>> active committers who have been a committer for at least a year.  We
> >
> >>>> would also need to elect a chair of the PMC.  This lucky person
> >>>> would have no additional power, but would have the additional
> >>>> responsibility of writing quarterly reports on Pig's status for
> >>>> Apache board meetings, as well as coordinating with Apache to get
> >>>> accounts for new  committers, etc.  For more information see
> >>>> http://www.apache.org/foundation/how-it-works.html#roles
> >>>>
> >>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
> >
> >>>> community.  We would continue to be invited to Hadoop Summits, HUGs,
> > etc.
> >>>>  Since all Pig developers and users are by definition Hadoop users,
> >>>> we would continue to be a strong presence in the Hadoop community.
> >>>>
> >>>> I see three ways that we as a community can respond to this:
> >>>>
> >>>> 1) Say yes, we want to be a TLP now.
> >>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more
> >>>> time to mature.  If we choose this option we need to be able to
> >>>> clearly articulate how much time we need and what we hope to see
> >>>> change in that time.
> >>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh
> >>>> the drawbacks of being a disjoint subproject.  If we choose this, we
> >
> >>>> need to be able to say exactly what those benefits are and why we
> >>>> feel they will be compromised by leaving the Hadoop project.
> >>>>
> >>>> There may other options that I haven't thought of.  Please feel free
> >
> >>>> to suggest any you think of.
> >>>>
> >>>> Questions?  Thoughts?  Let the discussion begin.
> >>>>
> >>>> Alan.
> >>>>
> >>>>
> >>>
> >
> >
>

Re: Begin a discussion about Pig as a top level project

Posted by Ashutosh Chauhan <as...@gmail.com>.

I concur with Santhosh here. I think the main question we need to answer
here is how close our ties with Hadoop are currently and how close they
will be in the future. When Pig was originally designed, the intent was to
keep it backend neutral, so much so that there was a reference backend
implementation (also known as the local engine) which had nothing to do
with Hadoop. But things have changed since then. Hadoop's local mode
has been adopted in favor of Pig's own local mode. We have moved from being
backend agnostic to favoring Hadoop. And while this was happening, it
seems we tried to keep the Pig Latin language independent of the Hadoop
backend while the Pig runtime started to make use of Hadoop concepts.

Apart from design decisions, this move also has a practical impact on
our codebase. Since we adopted Hadoop more closely, we got rid of an
extra layer of abstraction and instead started using similar
abstractions already existing in Hadoop. This has had the positive effect
of simplifying the codebase and providing tighter integration with
Hadoop.
So, if we are continuing in a direction where Hadoop is our only
backend (or at least a favored one), close ties to Hadoop are useful
for the reasons Alan and Dmitriy pointed out. If not, then I
think moving out to a TLP makes sense. Since there is no effort that
I am aware of to plug in a different backend for Pig, I
think maintaining close ties with Hadoop is useful for Pig. In the future,
when a different distributed computing platform comes up
that we want to use as a backend, we can revisit our decision. So, as
things stand today, I am -1 on moving out of Hadoop.

And I would also like to reiterate my point that though the Pig runtime
may continue to get closer to Hadoop, we should keep Pig Latin
completely backend agnostic.

Ashutosh

On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan <sm...@yahoo-inc.com> wrote:
> I see this as a multi-part question. Looking back at some of the
> significant roadmap/existential questions asked in the last 12 months, I
> see the following:
>
> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> an email about this approximately 9 months ago)
> 2. What is the approach to support backward compatibility in Pig (Alan
> had sent an email about this 3 months ago)
> 3. Should Pig be a TLP (the current email thread).
>
> Here is my take on answering the aforementioned questions.
>
> The initial philosophy of Pig was to be backend agnostic. It was
> designed as a data flow language. Whenever a new language is designed,
> the syntax and semantics of the language have to be laid out. The syntax
> is usually captured in the form of a BNF grammar. The semantics are
> defined by the language creators. Backward compatibility is then a
> question of holding true to the syntax and semantics. With Pig, in
> addition to the language, the Java APIs were exposed to customers to
> implement UDFs (load/store/filter/grouping/row transformation etc),
> provision looping since the language does not support looping constructs
> and also support a programmatic mode of access. Backward compatibility
> in this context is to support API versioning.
>
> Do we still intend to position as a data flow language that is backend
> agnostic? If the answer is yes, then there is a strong case for making
> Pig a TLP.
>
> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> consequence, we chose to be heavily influenced by the Hadoop roadmap.
>
> Like a good lawyer, I also have rebuttals to Alan's questions :)
>
> 1. Search engine popularity - We can discuss this with the Hadoop team
> and still retain links to TLP's that are coupled (loosely or tightly).
> 2. Explicit connection to Hadoop - I see this as logical connection v/s
> physical connection. Today, we are physically connected as a
> sub-project. Becoming a TLP, will not increase/decrease our influence on
> the Hadoop community (think Logical, Physical and MR Layers :)
> 3. Philosophy - I have already talked about this. The tight coupling is
> by choice. If Pig continues to be a data flow language with clear syntax
> and semantics then someone can implement Pig on top of a different
> backend. Do we intend to take this approach?
>
> I just wanted to offer a different opinion to this thread. I strongly
> believe that we should think about the original philosophy. Will we have
> a Pig standards committee that will decide on the changes to the
> language (think C/C++) if there are multiple backend implementations?
>
> I will reserve my vote based on the outcome of the philosophy and
> backward compatibility discussions. If we decide that Pig will be
> treated and maintained like a true language with clear syntax and
> semantics then we have a strong case to make it into a TLP. If not, we
> should retain our existing ties to Hadoop and make Pig into a data flow
> language for Hadoop.
>
> Santhosh
>
> -----Original Message-----
> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> Sent: Friday, April 02, 2010 4:08 PM
> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> Subject: Re: Begin a discussion about Pig as a top level project
>
> I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
> heavily influenced by its roadmap. I think it makes sense to continue as
> a sub-project of hadoop.
>
> -Thejas
>
>
>
> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:
>
>> Over time, Pig is increasing its coupling to Hadoop (for good
>> reasons), rather than decreasing it. If and when Pig becomes a viable
>> entity without hadoop around, it might make sense as a TLP. As is, I
>> think becoming a TLP will only introduce unnecessary administrative
>> and bureaucratic headaches.
>> So my vote is also -1.
>>
>> -Dmitriy
>>
>>
>>
>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
>>
>>> So far I haven't seen any feedback on this.  Apache has asked the
>>> Hadoop PMC to submit input in April on whether some subprojects
>>> should be promoted to TLPs.  We, the Pig community, need to give
>>> feedback to the Hadoop PMC on how we feel about this.  Please make
>>> your voice heard.
>>>
>>> So now I'll heed my own call and give my thoughts on it.
>>>
>>> The biggest advantage I see to being a TLP is a direct connection to
>>> Apache.  Right now all of the Pig team's interaction with Apache is
>>> through the Hadoop PMC.  Being directly connected to Apache would
>>> benefit Pig team members who would have a better view into Apache.
>>> It would also raise our profile in Apache and thus make other
>>> projects more aware of us.
>>>
>>> However, I am concerned about losing Pig's explicit connection to
>>> Hadoop.  This concern has a couple of dimensions.  One, Hadoop and
>>> MapReduce are the current flavor of the month in computing.  Given
>>> that Pig shares a name with the common farm animal, it's hard to be
>>> sure based on search statistics.  But Google trends shows that
>>> "hadoop" is searched on much more frequently than "hadoop pig" or
>>> "apache pig" (see http://www.google.com/trends?q=hadoop%2Chadoop+pig).
>>> I am guessing that most Pig users come from Hadoop users who discover
>>> Pig via Hadoop's website.  Losing that subproject tab on Hadoop's
>>> front page may radically lower the number of users coming to Pig to
>>> check out our project.  I would argue that this benefits Hadoop as
>>> well, since high level languages like Pig Latin have the potential to
>>> greatly extend the user base and usability of Hadoop.
>>>
>>> Two, being explicitly connected to Hadoop keeps our two communities
>>> aware of each other's needs.  There are features proposed for MR that
>>> would greatly help Pig.  By staying in the Hadoop community Pig is
>>> better positioned to advocate for and help implement and test those
>>> features.  The response to this will be that Pig developers can still
>>> subscribe to Hadoop mailing lists, submit patches, etc.  That is,
>>> they can still be part of the Hadoop community.  Which reinforces my
>>> point that it makes more sense to leave Pig in the Hadoop community
>>> since Pig developers will need to be part of that community anyway.
>>>
>>> Finally, philosophically it makes sense to me that projects that are
>>> tightly connected belong together.  It strikes me as strange to have
>>> Pig as a TLP completely dependent on another TLP.  Hadoop was
>>> originally a subproject of Lucene.  It moved out to be a TLP when it
>>> became obvious that Hadoop had become independent of and useful apart
>>> from Lucene.  Pig is not in that position relative to Hadoop.
>>>
>>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
>>> being persuaded that I'm wrong or my concerns can be addressed while
>>> still having Pig as a TLP.
>>>
>>> Alan.
>>>
>>>
>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>>
>>>> You have probably heard by now that there is a discussion going on
>>>> in the Hadoop PMC as to whether a number of the subprojects (Hbase,
>>>> Avro, Zookeeper, Hive, and Pig) should move out from under the Hadoop
>>>> umbrella and become top level Apache projects (TLP).  This
>>>> discussion has picked up recently since the Apache board has clearly
>>>> communicated to the Hadoop PMC that it is concerned that Hadoop is
>>>> acting as an umbrella project with many disjoint subprojects
>>>> underneath it.  They are concerned that this gives Apache little
>>>> insight into the health and happenings of the subproject communities
>>>> which in turn means Apache cannot properly mentor those communities.
>>>>
>>>> The purpose of this email is to start a discussion within the Pig
>>>> community about this topic.  Let me cover first what becoming TLP
>>>> would mean for Pig, and then I'll go into what options I think we as
>>>> a community have.
>>>>
>>>> Becoming a TLP would mean that Pig would itself have a PMC that
>>>> would report directly to the Apache board.  Who would be on the PMC
>>>> would be something we as a community would need to decide.  Common
>>>> options would be to say all active committers are on the PMC, or all
>>>> active committers who have been a committer for at least a year.  We
>>>> would also need to elect a chair of the PMC.  This lucky person
>>>> would have no additional power, but would have the additional
>>>> responsibility of writing quarterly reports on Pig's status for
>>>> Apache board meetings, as well as coordinating with Apache to get
>>>> accounts for new  committers, etc.  For more information see
>>>> http://www.apache.org/foundation/how-it-works.html#roles
>>>>
>>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>>>> community.  We would continue to be invited to Hadoop Summits, HUGs,
>>>> etc.  Since all Pig developers and users are by definition Hadoop users,
>>>> we would continue to be a strong presence in the Hadoop community.
>>>>
>>>> I see three ways that we as a community can respond to this:
>>>>
>>>> 1) Say yes, we want to be a TLP now.
>>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more
>>>> time to mature.  If we choose this option we need to be able to
>>>> clearly articulate how much time we need and what we hope to see
>>>> change in that time.
>>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh
>>>> the drawbacks of being a disjoint subproject.  If we choose this, we
>>>> need to be able to say exactly what those benefits are and why we
>>>> feel they will be compromised by leaving the Hadoop project.
>>>>
>>>> There may be other options that I haven't thought of.  Please feel free
>>>> to suggest any you think of.
>>>>
>>>> Questions?  Thoughts?  Let the discussion begin.
>>>>
>>>> Alan.
>>>>
>>>>
>>>
>
>

Re: Begin a discussion about Pig as a top level project

Posted by Daniel Dai <da...@gmail.com>.

I agree with the stance that we remain in Hadoop until we see more
compelling reasons, such as Pig growing beyond Hadoop. Currently I cannot
fully weigh the advantages and disadvantages of becoming a TLP. But given
that this is a point of no return, I don't want to move unless we have a
strong motivation. We can always choose to become a TLP later when we feel
more convinced of that.

Daniel

--------------------------------------------------
From: "Santhosh Srinivasan" <sm...@yahoo-inc.com>
Sent: Monday, April 05, 2010 12:22 PM
To: <pi...@hadoop.apache.org>
Subject: RE: Begin a discussion about Pig as a top level project

> "Given that, do you think it makes
> sense to say that Pig stays a subproject for now, but if it someday
> grows beyond Hadoop only it becomes a TLP?  I could agree to that
> stance."
>
> Bingo!
>
> Santhosh
>

RE: Begin a discussion about Pig as a top level project

Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.

"Given that, do you think it makes  
sense to say that Pig stays a subproject for now, but if it someday  
grows beyond Hadoop only it becomes a TLP?  I could agree to that  
stance."

Bingo!

Santhosh 

-----Original Message-----
From: Alan Gates [mailto:gates@yahoo-inc.com] 
Sent: Monday, April 05, 2010 11:37 AM
To: pig-dev@hadoop.apache.org
Subject: Re: Begin a discussion about Pig as a top level project

Prognostication is a difficult business.  Of course I'd love it if  
someday there is an ISO Pig Latin committee (with meetings in cool  
exotic places) deciding the official standard for Pig Latin.  But that  
seems like saying in your start up's business plan, "When we reach  
Google's size, then we'll do x".  If there ever is an ISO Pig Latin  
standard it will be years off.

As others have noted, staying tight to Hadoop now has many advantages,  
both in technical and adoption terms.  Hence my advocacy of keeping  
Pig Latin Hadoop agnostic while tightly integrating the backend.   
Which is to say that in my view, Pig is Hadoop specific now, but there  
may come a day when that is no longer true.   Whether Pig will ever  
move past just running on Hadoop to running in other parallel systems  
won't be known for years to come.  Given that, do you think it makes  
sense to say that Pig stays a subproject for now, but if it someday  
grows beyond Hadoop only it becomes a TLP?  I could agree to that  
stance.

Alan.

On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:

> I see this as a multi-part question. Looking back at some of the
> significant roadmap/existential questions asked in the last 12  
> months, I
> see the following:
>
> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> an email about this approximately 9 months ago)
> 2. What is the approach to support backward compatibility in Pig (Alan
> had sent an email about this 3 months ago)
> 3. Should Pig be a TLP (the current email thread).
>
> Here is my take on answering the aforementioned questions.
>
> The initial philosophy of Pig was to be backend agnostic. It was
> designed as a data flow language. Whenever a new language is designed,
> the syntax and semantics of the language have to be laid out. The  
> syntax
> is usually captured in the form of a BNF grammar. The semantics are
> defined by the language creators. Backward compatibility is then a
> question of holding true to the syntax and semantics. With Pig, in
> addition to the language, the Java APIs were exposed to customers to
> implement UDFs (load/store/filter/grouping/row transformation etc),
> provision looping since the language does not support looping  
> constructs
> and also support a programmatic mode of access. Backward compatibility
> in this context is to support API versioning.
>
> Do we still intend to position as a data flow language that is backend
> agnostic? If the answer is yes, then there is a strong case for making
> Pig a TLP.
>
> Are we influenced by Hadoop? A big YES! The reason Pig chose to  
> become a
> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> consequence, we chose to be heavily influenced by the Hadoop roadmap.
>
> Like a good lawyer, I also have rebuttals to Alan's questions :)
>
> 1. Search engine popularity - We can discuss this with the Hadoop team
> and still retain links to TLP's that are coupled (loosely or tightly).
> 2. Explicit connection to Hadoop - I see this as logical connection  
> v/s
> physical connection. Today, we are physically connected as a
> sub-project. Becoming a TLP, will not increase/decrease our  
> influence on
> the Hadoop community (think Logical, Physical and MR Layers :)
> 3. Philosophy - I have already talked about this. The tight coupling  
> is
> by choice. If Pig continues to be a data flow language with clear  
> syntax
> and semantics then someone can implement Pig on top of a different
> backend. Do we intend to take this approach?
>
> I just wanted to offer a different opinion to this thread. I strongly
> believe that we should think about the original philosophy. Will we  
> have
> a Pig standards committee that will decide on the changes to the
> language (think C/C++) if there are multiple backend implementations?
>
> I will reserve my vote based on the outcome of the philosophy and
> backward compatibility discussions. If we decide that Pig will be
> treated and maintained like a true language with clear syntax and
> semantics then we have a strong case to make it into a TLP. If not, we
> should retain our existing ties to Hadoop and make Pig into a data  
> flow
> language for Hadoop.
>
> Santhosh
>
> -----Original Message-----
> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> Sent: Friday, April 02, 2010 4:08 PM
> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> Subject: Re: Begin a discussion about Pig as a top level project
>
> I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop,  
> and
> heavily influenced by its roadmap. I think it makes sense to  
> continue as
> a sub-project of hadoop.
>
> -Thejas
>
>
>
> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:
>
>> Over time, Pig is increasing its coupling to Hadoop (for good
>> reasons), rather than decreasing it. If and when Pig becomes a viable
>> entity without hadoop around, it might make sense as a TLP. As is, I
>> think becoming a TLP will only introduce unnecessary administrative
>> and bureaucratic headaches.
>> So my vote is also -1.
>>
>> -Dmitriy
>>
>>
>>
>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
>>
>>> So far I haven't seen any feedback on this.  Apache has asked the
>>> Hadoop PMC to submit input in April on whether some subprojects
>>> should be promoted to TLPs.  We, the Pig community, need to give
>>> feedback to the Hadoop PMC on how we feel about this.  Please make
>>> your voice heard.
>>>
>>> So now I'll heed my own call and give my thoughts on it.
>>>
>>> The biggest advantage I see to being a TLP is a direct connection to
>>> Apache.  Right now all of the Pig team's interaction with Apache is
>>> through the Hadoop PMC.  Being directly connected to Apache would
>>> benefit Pig team members who would have a better view into Apache.
>>> It would also raise our profile in Apache and thus make other
>>> projects more aware of us.
>>>
>>> However, I am concerned about losing Pig's explicit connection to
>>> Hadoop.  This concern has a couple of dimensions.  One, Hadoop and
>>> MapReduce are the current flavor of the month in computing.  Given
>>> that Pig shares a name with the common farm animal, it's hard to be
>>> sure based on search statistics.  But Google trends shows that
>>> "hadoop" is searched on much more frequently than "hadoop pig" or
>>> "apache pig" (see http://www.google.com/trends?q=hadoop%2Chadoop+pig).
>>> I am guessing that most Pig users come from Hadoop users who discover
>>> Pig via Hadoop's website.  Losing that subproject tab on Hadoop's
>>> front page may radically lower the number of users coming to Pig to
>>> check out our project.  I would argue that this benefits Hadoop as
>>> well, since high level languages like Pig Latin have the potential to
>>> greatly extend the user base and usability of Hadoop.
>>>
>>> Two, being explicitly connected to Hadoop keeps our two communities
>>> aware of each other's needs.  There are features proposed for MR that
>>> would greatly help Pig.  By staying in the Hadoop community Pig is
>>> better positioned to advocate for and help implement and test those
>>> features.  The response to this will be that Pig developers can still
>>> subscribe to Hadoop mailing lists, submit patches, etc.  That is,
>>> they can still be part of the Hadoop community.  Which reinforces my
>>> point that it makes more sense to leave Pig in the Hadoop community
>>> since Pig developers will need to be part of that community anyway.
>>>
>>> Finally, philosophically it makes sense to me that projects that are
>>> tightly connected belong together.  It strikes me as strange to have
>>> Pig as a TLP completely dependent on another TLP.  Hadoop was
>>> originally a subproject of Lucene.  It moved out to be a TLP when it
>>> became obvious that Hadoop had become independent of and useful apart
>>> from Lucene.  Pig is not in that position relative to Hadoop.
>>>
>>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
>>> being persuaded that I'm wrong or my concerns can be addressed while
>>> still having Pig as a TLP.
>>>
>>> Alan.
>>>
>>>
>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>>
>>>> You have probably heard by now that there is a discussion going on
>>>> in the Hadoop PMC as to whether a number of the subprojects (Hbase,
>>>> Avro, Zookeeper, Hive, and Pig) should move out from under the Hadoop
>>>> umbrella and become top level Apache projects (TLP).  This
>>>> discussion has picked up recently since the Apache board has clearly
>>>> communicated to the Hadoop PMC that it is concerned that Hadoop is
>>>> acting as an umbrella project with many disjoint subprojects
>>>> underneath it.  They are concerned that this gives Apache little
>>>> insight into the health and happenings of the subproject communities
>>>> which in turn means Apache cannot properly mentor those communities.
>>>>
>>>> The purpose of this email is to start a discussion within the Pig
>>>> community about this topic.  Let me cover first what becoming TLP
>>>> would mean for Pig, and then I'll go into what options I think we as
>>>> a community have.
>>>>
>>>> Becoming a TLP would mean that Pig would itself have a PMC that
>>>> would report directly to the Apache board.  Who would be on the PMC
>>>> would be something we as a community would need to decide.  Common
>>>> options would be to say all active committers are on the PMC, or all
>>>> active committers who have been a committer for at least a year.  We
>>>> would also need to elect a chair of the PMC.  This lucky person
>>>> would have no additional power, but would have the additional
>>>> responsibility of writing quarterly reports on Pig's status for
>>>> Apache board meetings, as well as coordinating with Apache to get
>>>> accounts for new  committers, etc.  For more information see
>>>> http://www.apache.org/foundation/how-it-works.html#roles
>>>>
>>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>>>> community.  We would continue to be invited to Hadoop Summits, HUGs,
>>>> etc.
>>>> Since all Pig developers and users are by definition Hadoop users,
>>>> we would continue to be a strong presence in the Hadoop community.
>>>>
>>>> I see three ways that we as a community can respond to this:
>>>>
>>>> 1) Say yes, we want to be a TLP now.
>>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more
>>>> time to mature.  If we choose this option we need to be able to
>>>> clearly articulate how much time we need and what we hope to see
>>>> change in that time.
>>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh
>>>> the drawbacks of being a disjoint subproject.  If we choose this, we
>>>> need to be able to say exactly what those benefits are and why we
>>>> feel they will be compromised by leaving the Hadoop project.
>>>>
>>>> There may be other options that I haven't thought of.  Please feel free
>>>> to suggest any you think of.
>>>>
>>>> Questions?  Thoughts?  Let the discussion begin.
>>>>
>>>> Alan.
>>>>
>>>>
>>>
>

Re: Begin a discussion about Pig as a top level project

Posted by hc busy <hc...@gmail.com>.

> The Twitter office is cushier and has more bars within stumbling
> distance. Just sayin'.

and strip clubs too, I gather there're a couple on Market... near civic bart
stop ;-)

oh... hey, you guys are at a nice place... lots of night clubs near there
too.


> "Given that, do you think it makes sense to say that Pig stays a
> subproject for now, but if it someday grows beyond Hadoop only it becomes a
> TLP?  I could agree to that stance."


Oops, I didn't read your whole message... I think TLP could be part of the
roadmap: Planned publicity, like planned pregnancy, is a good thing.

And on the way there, we should add a dedicated resource that updates
documentation and links on the website... :-)




On Mon, Apr 5, 2010 at 12:10 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> The Twitter office is cushier and has more bars within stumbling distance.
> Just sayin'.
>
> To the subject at hand -- I don't think TLP standing has the PR value you
> think it does... feature set, velocity of development, adoption,
> flexibility, etc -- those are far more important.
>
> -Dmitriy
>

Re: Begin a discussion about Pig as a top level project

Posted by Dmitriy Ryaboy <dv...@gmail.com>.

The Twitter office is cushier and has more bars within stumbling distance.
Just sayin'.

To the subject at hand -- I don't think TLP standing has the PR value you
think it does... feature set, velocity of development, adoption,
flexibility, etc -- those are far more important.

-Dmitriy

On Mon, Apr 5, 2010 at 11:58 AM, hc busy <hc...@gmail.com> wrote:

> > Of course I'd love it if someday there is an ISO Pig Latin committee
> > (with meetings in cool exotic places) deciding the official standard
> > for Pig Latin.
>
> haha!!! Some exotic place like Yahoo's HQ in sunny Sunnyvale California?
>
> I guess it feels like it depends on the roadmap more than roadmap depends
> on it. In terms of positioning, a TLP would appear to potential users who
> are evaluating alternatives to consider it as _the_ choice as opposed to
> one of the choices. If the ambition is to take it there, then TLP, as
> useless as it may seem right now, might actually be worth the effort to
> attain.
>
> I mean, would you rather wait until Hive makes TLP and then play catch up?
> I mean, I can kinda see them doing that...
>
>
>
>
> On Mon, Apr 5, 2010 at 11:36 AM, Alan Gates <ga...@yahoo-inc.com> wrote:
>
> > Prognostication is a difficult business.  Of course I'd love it if
> > someday there is an ISO Pig Latin committee (with meetings in cool exotic
> > places) deciding the official standard for Pig Latin.  But that seems
> > like saying in your start up's business plan, "When we reach Google's
> > size, then we'll do x".  If there ever is an ISO Pig Latin standard it
> > will be years off.
> >
> > As others have noted, staying tight to Hadoop now has many advantages,
> > both in technical and adoption terms.  Hence my advocacy of keeping Pig
> > Latin Hadoop agnostic while tightly integrating the backend.  Which is to
> > say that in my view, Pig is Hadoop specific now, but there may come a day
> > when that is no longer true.  Whether Pig will ever move past just
> > running on Hadoop to running in other parallel systems won't be known for
> > years to come.  Given that, do you think it makes sense to say that Pig
> > stays a subproject for now, but if it someday grows beyond Hadoop only it
> > becomes a TLP?  I could agree to that stance.
> >
> > Alan.
> >
> >
> > On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:
> >
> >> I see this as a multi-part question. Looking back at some of the
> >> significant roadmap/existential questions asked in the last 12 months, I
> >> see the following:
> >>
> >> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> >> an email about this approximately 9 months ago)
> >> 2. What is the approach to support backward compatibility in Pig (Alan
> >> had sent an email about this 3 months ago)
> >> 3. Should Pig be a TLP (the current email thread).
> >>
> >> Here is my take on answering the aforementioned questions.
> >>
> >> The initial philosophy of Pig was to be backend agnostic. It was
> >> designed as a data flow language. Whenever a new language is designed,
> >> the syntax and semantics of the language have to be laid out. The syntax
> >> is usually captured in the form of a BNF grammar. The semantics are
> >> defined by the language creators. Backward compatibility is then a
> >> question of holding true to the syntax and semantics. With Pig, in
> >> addition to the language, the Java APIs were exposed to customers to
> >> implement UDFs (load/store/filter/grouping/row transformation etc),
> >> provision looping since the language does not support looping constructs
> >> and also support a programmatic mode of access. Backward compatibility
> >> in this context is to support API versioning.
> >>
> >> Do we still intend to position as a data flow language that is backend
> >> agnostic? If the answer is yes, then there is a strong case for making
> >> Pig a TLP.
> >>
> >> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> >> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> >> consequence, we chose to be heavily influenced by the Hadoop roadmap.
> >>
> >> Like a good lawyer, I also have rebuttals to Alan's questions :)
> >>
> >> 1. Search engine popularity - We can discuss this with the Hadoop team
> >> and still retain links to TLP's that are coupled (loosely or tightly).
> >> 2. Explicit connection to Hadoop - I see this as logical connection v/s
> >> physical connection. Today, we are physically connected as a
> >> sub-project. Becoming a TLP, will not increase/decrease our influence on
> >> the Hadoop community (think Logical, Physical and MR Layers :)
> >> 3. Philosophy - I have already talked about this. The tight coupling is
> >> by choice. If Pig continues to be a data flow language with clear syntax
> >> and semantics then someone can implement Pig on top of a different
> >> backend. Do we intend to take this approach?
> >>
> >> I just wanted to offer a different opinion to this thread. I strongly
> >> believe that we should think about the original philosophy. Will we have
> >> a Pig standards committee that will decide on the changes to the
> >> language (think C/C++) if there are multiple backend implementations?
> >>
> >> I will reserve my vote based on the outcome of the philosophy and
> >> backward compatibility discussions. If we decide that Pig will be
> >> treated and maintained like a true language with clear syntax and
> >> semantics then we have a strong case to make it into a TLP. If not, we
> >> should retain our existing ties to Hadoop and make Pig into a data flow
> >> language for Hadoop.
> >>
> >> Santhosh
> >>
> >> -----Original Message-----
> >> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> >> Sent: Friday, April 02, 2010 4:08 PM
> >> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> >> Subject: Re: Begin a discussion about Pig as a top level project
> >>
> >> I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
> >> heavily influenced by its roadmap. I think it makes sense to continue as
> >> a sub-project of hadoop.
> >>
> >> -Thejas

Re: Begin a discussion about Pig as a top level project

Posted by hc busy <hc...@gmail.com>.

> Of course I'd love it if someday there is an ISO Pig Latin committee (with
> meetings in cool exotic places) deciding the official standard for Pig
> Latin.

haha!!! Some exotic place like Yahoo's  HQ in sunny Sunnyvale California?

I guess it feels like it depends on the roadmap more than roadmap depends on
it. In terms of positioning, a TLP would appear to potential users who are
evaluating alternatives to consider it as _the_ choice as opposed to one of
the choices. If the ambition is to take it there, then TLP, as useless as it
may seem right now, might actually be worth the effort to attain.

I mean, would you rather wait until Hive makes TLP and then play catch up? I
mean, I can kinda see them doing that...




On Mon, Apr 5, 2010 at 11:36 AM, Alan Gates <ga...@yahoo-inc.com> wrote:

> Prognostication is a difficult business.  Of course I'd love it if someday
> there is an ISO Pig Latin committee (with meetings in cool exotic places)
> deciding the official standard for Pig Latin.  But that seems like saying in
> your startup's business plan, "When we reach Google's size, then we'll do
> x".  If there ever is an ISO Pig Latin standard it will be years off.
>
> As others have noted, staying tight to Hadoop now has many advantages, both
> in technical and adoption terms.  Hence my advocacy of keeping Pig Latin
> Hadoop-agnostic while tightly integrating the backend.  Which is to say that
> in my view, Pig is Hadoop-specific now, but there may come a day when that
> is no longer true.  Whether Pig will ever move past just running on Hadoop
> to running in other parallel systems won't be known for years to come.
> Given that, do you think it makes sense to say that Pig stays a subproject
> for now, but if it someday grows beyond being Hadoop-only it becomes a TLP?
> I could agree to that stance.
>
> Alan.
>
>
> On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:
>
>  I see this as a multi-part question. Looking back at some of the
>> significant roadmap/existential questions asked in the last 12 months, I
>> see the following:
>>
>> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
>> an email about this approximately 9 months ago)
>> 2. What is the approach to support backward compatibility in Pig (Alan
>> had sent an email about this 3 months ago)
>> 3. Should Pig be a TLP (the current email thread).
>>
>> Here is my take on answering the aforementioned questions.
>>
>> The initial philosophy of Pig was to be backend agnostic. It was
>> designed as a data flow language. Whenever a new language is designed,
>> the syntax and semantics of the language have to be laid out. The syntax
>> is usually captured in the form of a BNF grammar. The semantics are
>> defined by the language creators. Backward compatibility is then a
>> question of holding true to the syntax and semantics. With Pig, in
>> addition to the language, the Java APIs were exposed to customers to
>> implement UDFs (load/store/filter/grouping/row transformation etc),
>> provision looping since the language does not support looping constructs
>> and also support a programmatic mode of access. Backward compatibility
>> in this context is to support API versioning.
>>
>> Do we still intend to position as a data flow language that is backend
>> agnostic? If the answer is yes, then there is a strong case for making
>> Pig a TLP.
>>
>> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
>> Hadoop sub-project was to ride the Hadoop popularity wave. As a
>> consequence, we chose to be heavily influenced by the Hadoop roadmap.
>>
>> Like a good lawyer, I also have rebuttals to Alan's questions :)
>>
>> 1. Search engine popularity - We can discuss this with the Hadoop team
>> and still retain links to TLP's that are coupled (loosely or tightly).
>> 2. Explicit connection to Hadoop - I see this as logical connection v/s
>> physical connection. Today, we are physically connected as a
>> sub-project. Becoming a TLP, will not increase/decrease our influence on
>> the Hadoop community (think Logical, Physical and MR Layers :)
>> 3. Philosophy - I have already talked about this. The tight coupling is
>> by choice. If Pig continues to be a data flow language with clear syntax
>> and semantics then someone can implement Pig on top of a different
>> backend. Do we intend to take this approach?
>>
>> I just wanted to offer a different opinion to this thread. I strongly
>> believe that we should think about the original philosophy. Will we have
>> a Pig standards committee that will decide on the changes to the
>> language (think C/C++) if there are multiple backend implementations?
>>
>> I will reserve my vote based on the outcome of the philosophy and
>> backward compatibility discussions. If we decide that Pig will be
>> treated and maintained like a true language with clear syntax and
>> semantics then we have a strong case to make it into a TLP. If not, we
>> should retain our existing ties to Hadoop and make Pig into a data flow
>> language for Hadoop.
>>
>> Santhosh
>>
>> -----Original Message-----
>> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
>> Sent: Friday, April 02, 2010 4:08 PM
>> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
>> Subject: Re: Begin a discussion about Pig as a top level project
>>
>> I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
>> heavily influenced by its roadmap. I think it makes sense to continue as
>> a sub-project of hadoop.
>>
>> -Thejas
>>
>>
>>
>> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:
>>
>>  Over time, Pig is increasing its coupling to Hadoop (for good
>>> reasons), rather than decreasing it. If and when Pig becomes a viable
>>> entity without hadoop around, it might make sense as a TLP. As is, I
>>> think becoming a TLP will only introduce unnecessary administrative
>>>
>> and bureaucratic headaches.
>>
>>> So my vote is also -1.
>>>
>>> -Dmitriy
>>>
>>>
>>>
>>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com>
>>>
>> wrote:
>>
>>>
>>>  So far I haven't seen any feedback on this.  Apache has asked the
>>>> Hadoop PMC to submit input in April on whether some subprojects
>>>> should be promoted to TLPs.  We, the Pig community, need to give
>>>> feedback to the Hadoop PMC on how we feel about this.  Please make
>>>>
>>> your voice heard.
>>
>>>
>>>> So now I'll heed my own call and give my thoughts on it.
>>>>
>>>> The biggest advantage I see to being a TLP is a direct connection to
>>>> Apache.  Right now all of the Pig team's interaction with Apache is
>>>> through the Hadoop PMC.  Being directly connected to Apache would
>>>> benefit Pig team members who would have a better view into Apache.
>>>> It would also raise our profile in Apache and thus make other
>>>>
>>> projects more aware of us.
>>
>>>
>>>> However, I am concerned about losing Pig's explicit connection to
>>>>
>>> Hadoop.
>>
>>> This concern has a couple of dimensions.  One, Hadoop and MapReduce
>>>> are the current flavor of the month in computing.  Given that Pig
>>>> shares a name with the common farm animal, it's hard to be sure based
>>>>
>>> on search statistics.
>>
>>> But Google trends shows that "hadoop" is searched on much more
>>>> frequently than "hadoop pig" or "apache pig" (see
>>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing
>>>> that most Pig users come from Hadoop users who discover Pig via
>>>>
>>> Hadoop's website.
>>
>>> Losing that subproject tab on Hadoop's front page may radically
>>>> lower the number of users coming to Pig to check out our project.  I
>>>> would argue that this benefits Hadoop as well, since high level
>>>> languages like Pig Latin have the potential to greatly extend the
>>>>
>>> user base and usability of Hadoop.
>>
>>>
>>>> Two, being explicitly connected to Hadoop keeps our two communities
>>>> aware of each other's needs.  There are features proposed for MR that
>>>> would greatly help Pig.  By staying in the Hadoop community Pig is
>>>> better positioned to advocate for and help implement and test those
>>>> features.  The response to this will be that Pig developers can still
>>>>
>>>
>>  subscribe to Hadoop mailing lists, submit patches, etc.  That is,
>>>> they can still be part of the Hadoop community.  Which reinforces my
>>>> point that it makes more sense to leave Pig in the Hadoop community
>>>> since Pig developers will need to be part of that community anyway.
>>>>
>>>> Finally, philosophically it makes sense to me that projects that are
>>>> tightly connected belong together.  It strikes me as strange to have
>>>> Pig as a TLP completely dependent on another TLP.  Hadoop was
>>>> originally a subproject of Lucene.  It moved out to be a TLP when it
>>>> became obvious that Hadoop had become independent of and useful apart
>>>>
>>>
>>  from Lucene.  Pig is not in that position relative to Hadoop.
>>>>
>>>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
>>>> being persuaded that I'm wrong or my concerns can be addressed while
>>>> still having Pig as a TLP.
>>>>
>>>> Alan.
>>>>
>>>>
>>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>>>
>>>> You have probably heard by now that there is a discussion going on
>>>> in the
>>>>
>>>>> Hadoop PMC as to whether a number of the subprojects (Hbase, Avro,
>>>>> Zookeeper, Hive, and Pig) should move out from under the Hadoop
>>>>> umbrella and become top level Apache projects (TLP).  This
>>>>> discussion has picked up recently since the Apache board has clearly
>>>>>
>>>>
>>  communicated to the Hadoop PMC that it is concerned that Hadoop is
>>>>> acting as an umbrella project with many disjoint subprojects
>>>>> underneath it.  They are concerned that this gives Apache little
>>>>> insight into the health and happenings of the subproject communities
>>>>>
>>>>
>>  which in turn means Apache cannot properly mentor those communities.
>>>>>
>>>>> The purpose of this email is to start a discussion within the Pig
>>>>> community about this topic.  Let me cover first what becoming TLP
>>>>> would mean for Pig, and then I'll go into what options I think we as
>>>>>
>>>> a community have.
>>
>>>
>>>>> Becoming a TLP would mean that Pig would itself have a PMC that
>>>>> would report directly to the Apache board.  Who would be on the PMC
>>>>> would be something we as a community would need to decide.  Common
>>>>> options would be to say all active committers are on the PMC, or all
>>>>>
>>>>
>>  active committers who have been a committer for at least a year.  We
>>>>>
>>>>
>>  would also need to elect a chair of the PMC.  This lucky person
>>>>> would have no additional power, but would have the additional
>>>>> responsibility of writing quarterly reports on Pig's status for
>>>>> Apache board meetings, as well as coordinating with Apache to get
>>>>> accounts for new  committers, etc.  For more information see
>>>>> http://www.apache.org/foundation/how-it-works.html#roles
>>>>>
>>>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>>>>>
>>>>
>>  community.  We would continue to be invited to Hadoop Summits, HUGs,
>>>>>
>>>> etc.
>>
>>> Since all Pig developers and users are by definition Hadoop users,
>>>>> we would continue to be a strong presence in the Hadoop community.
>>>>>
>>>>> I see three ways that we as a community can respond to this:
>>>>>
>>>>> 1) Say yes, we want to be a TLP now.
>>>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more
>>>>> time to mature.  If we choose this option we need to be able to
>>>>> clearly articulate how much time we need and what we hope to see
>>>>> change in that time.
>>>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh
>>>>> the drawbacks of being a disjoint subproject.  If we choose this, we
>>>>>
>>>>
>>  need to be able to say exactly what those benefits are and why we
>>>>> feel they will be compromised by leaving the Hadoop project.
>>>>>
>>>>> There may be other options that I haven't thought of.  Please feel free
>>>>>
>>>>
>>  to suggest any you think of.
>>>>>
>>>>> Questions?  Thoughts?  Let the discussion begin.
>>>>>
>>>>> Alan.
>>>>>
>>>>>
>>>>>
>>>>
>>
>

Re: Begin a discussion about Pig as a top level project

Posted by Alan Gates <ga...@yahoo-inc.com>.

Prognostication is a difficult business.  Of course I'd love it if
someday there is an ISO Pig Latin committee (with meetings in cool
exotic places) deciding the official standard for Pig Latin.  But that
seems like saying in your startup's business plan, "When we reach
Google's size, then we'll do x".  If there ever is an ISO Pig Latin
standard it will be years off.

As others have noted, staying tight to Hadoop now has many advantages,
both in technical and adoption terms.  Hence my advocacy of keeping
Pig Latin Hadoop-agnostic while tightly integrating the backend.
Which is to say that in my view, Pig is Hadoop-specific now, but there
may come a day when that is no longer true.  Whether Pig will ever
move past just running on Hadoop to running in other parallel systems
won't be known for years to come.  Given that, do you think it makes
sense to say that Pig stays a subproject for now, but if it someday
grows beyond being Hadoop-only it becomes a TLP?  I could agree to that
stance.

Alan.

On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:

> I see this as a multi-part question. Looking back at some of the
> significant roadmap/existential questions asked in the last 12 months, I
> see the following:
>
> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> an email about this approximately 9 months ago)
> 2. What is the approach to support backward compatibility in Pig (Alan
> had sent an email about this 3 months ago)
> 3. Should Pig be a TLP (the current email thread).
>
> Here is my take on answering the aforementioned questions.
>
> The initial philosophy of Pig was to be backend agnostic. It was
> designed as a data flow language. Whenever a new language is designed,
> the syntax and semantics of the language have to be laid out. The syntax
> is usually captured in the form of a BNF grammar. The semantics are
> defined by the language creators. Backward compatibility is then a
> question of holding true to the syntax and semantics. With Pig, in
> addition to the language, the Java APIs were exposed to customers to
> implement UDFs (load/store/filter/grouping/row transformation etc),
> provision looping since the language does not support looping constructs
> and also support a programmatic mode of access. Backward compatibility
> in this context is to support API versioning.
>
> Do we still intend to position as a data flow language that is backend
> agnostic? If the answer is yes, then there is a strong case for making
> Pig a TLP.
>
> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> consequence, we chose to be heavily influenced by the Hadoop roadmap.
>
> Like a good lawyer, I also have rebuttals to Alan's questions :)
>
> 1. Search engine popularity - We can discuss this with the Hadoop team
> and still retain links to TLP's that are coupled (loosely or tightly).
> 2. Explicit connection to Hadoop - I see this as logical connection v/s
> physical connection. Today, we are physically connected as a
> sub-project. Becoming a TLP, will not increase/decrease our influence on
> the Hadoop community (think Logical, Physical and MR Layers :)
> 3. Philosophy - I have already talked about this. The tight coupling is
> by choice. If Pig continues to be a data flow language with clear syntax
> and semantics then someone can implement Pig on top of a different
> backend. Do we intend to take this approach?
>
> I just wanted to offer a different opinion to this thread. I strongly
> believe that we should think about the original philosophy. Will we have
> a Pig standards committee that will decide on the changes to the
> language (think C/C++) if there are multiple backend implementations?
>
> I will reserve my vote based on the outcome of the philosophy and
> backward compatibility discussions. If we decide that Pig will be
> treated and maintained like a true language with clear syntax and
> semantics then we have a strong case to make it into a TLP. If not, we
> should retain our existing ties to Hadoop and make Pig into a data flow
> language for Hadoop.
>
> Santhosh
>
> -----Original Message-----
> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> Sent: Friday, April 02, 2010 4:08 PM
> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> Subject: Re: Begin a discussion about Pig as a top level project
>
> I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
> heavily influenced by its roadmap. I think it makes sense to continue as
> a sub-project of hadoop.
>
> -Thejas
>
>
>
> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:
>
>> Over time, Pig is increasing its coupling to Hadoop (for good
>> reasons), rather than decreasing it. If and when Pig becomes a viable
>> entity without hadoop around, it might make sense as a TLP. As is, I
>> think becoming a TLP will only introduce unnecessary administrative
>> and bureaucratic headaches.
>> So my vote is also -1.
>>
>> -Dmitriy
>>
>>
>>
>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com>
> wrote:
>>
>>> So far I haven't seen any feedback on this.  Apache has asked the
>>> Hadoop PMC to submit input in April on whether some subprojects
>>> should be promoted to TLPs.  We, the Pig community, need to give
>>> feedback to the Hadoop PMC on how we feel about this.  Please make
> your voice heard.
>>>
>>> So now I'll heed my own call and give my thoughts on it.
>>>
>>> The biggest advantage I see to being a TLP is a direct connection to
>>> Apache.  Right now all of the Pig team's interaction with Apache is
>>> through the Hadoop PMC.  Being directly connected to Apache would
>>> benefit Pig team members who would have a better view into Apache.
>>> It would also raise our profile in Apache and thus make other
> projects more aware of us.
>>>
>>> However, I am concerned about losing Pig's explicit connection to
> Hadoop.
>>> This concern has a couple of dimensions.  One, Hadoop and MapReduce
>>> are the current flavor of the month in computing.  Given that Pig
>>> shares a name with the common farm animal, it's hard to be sure  
>>> based
> on search statistics.
>>> But Google trends shows that "hadoop" is searched on much more
>>> frequently than "hadoop pig" or "apache pig" (see
>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing
>>> that most Pig users come from Hadoop users who discover Pig via
> Hadoop's website.
>>> Losing that subproject tab on Hadoop's front page may radically
>>> lower the number of users coming to Pig to check out our project.  I
>>> would argue that this benefits Hadoop as well, since high level
>>> languages like Pig Latin have the potential to greatly extend the
> user base and usability of Hadoop.
>>>
>>> Two, being explicitly connected to Hadoop keeps our two communities
>>> aware of each other's needs.  There are features proposed for MR that
>>> would greatly help Pig.  By staying in the Hadoop community Pig is
>>> better positioned to advocate for and help implement and test those
>>> features.  The response to this will be that Pig developers can  
>>> still
>
>>> subscribe to Hadoop mailing lists, submit patches, etc.  That is,
>>> they can still be part of the Hadoop community.  Which reinforces my
>>> point that it makes more sense to leave Pig in the Hadoop community
>>> since Pig developers will need to be part of that community anyway.
>>>
>>> Finally, philosophically it makes sense to me that projects that are
>>> tightly connected belong together.  It strikes me as strange to have
>>> Pig as a TLP completely dependent on another TLP.  Hadoop was
>>> originally a subproject of Lucene.  It moved out to be a TLP when it
>>> became obvious that Hadoop had become independent of and useful  
>>> apart
>
>>> from Lucene.  Pig is not in that position relative to Hadoop.
>>>
>>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
>>> being persuaded that I'm wrong or my concerns can be addressed while
>>> still having Pig as a TLP.
>>>
>>> Alan.
>>>
>>>
>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>>
>>> You have probably heard by now that there is a discussion going on
>>> in the
>>>> Hadoop PMC as to whether a number of the subprojects (Hbase, Avro,
>>>> Zookeeper, Hive, and Pig) should move out from under the Hadoop
>>>> umbrella and become top level Apache projects (TLP).  This
>>>> discussion has picked up recently since the Apache board has  
>>>> clearly
>
>>>> communicated to the Hadoop PMC that it is concerned that Hadoop is
>>>> acting as an umbrella project with many disjoint subprojects
>>>> underneath it.  They are concerned that this gives Apache little
>>>> insight into the health and happenings of the subproject  
>>>> communities
>
>>>> which in turn means Apache cannot properly mentor those  
>>>> communities.
>>>>
>>>> The purpose of this email is to start a discussion within the Pig
>>>> community about this topic.  Let me cover first what becoming TLP
>>>> would mean for Pig, and then I'll go into what options I think we  
>>>> as
> a community have.
>>>>
>>>> Becoming a TLP would mean that Pig would itself have a PMC that
>>>> would report directly to the Apache board.  Who would be on the PMC
>>>> would be something we as a community would need to decide.  Common
>>>> options would be to say all active committers are on the PMC, or  
>>>> all
>
>>>> active committers who have been a committer for at least a year.   
>>>> We
>
>>>> would also need to elect a chair of the PMC.  This lucky person
>>>> would have no additional power, but would have the additional
>>>> responsibility of writing quarterly reports on Pig's status for
>>>> Apache board meetings, as well as coordinating with Apache to get
>>>> accounts for new  committers, etc.  For more information see
>>>> http://www.apache.org/foundation/how-it-works.html#roles
>>>>
>>>> Becoming a TLP would not mean that we are ostracized from the  
>>>> Hadoop
>
>>>> community.  We would continue to be invited to Hadoop Summits,  
>>>> HUGs,
> etc.
>>>> Since all Pig developers and users are by definition Hadoop users,
>>>> we would continue to be a strong presence in the Hadoop community.
>>>>
>>>> I see three ways that we as a community can respond to this:
>>>>
>>>> 1) Say yes, we want to be a TLP now.
>>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more
>>>> time to mature.  If we choose this option we need to be able to
>>>> clearly articulate how much time we need and what we hope to see
>>>> change in that time.
>>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh
>>>> the drawbacks of being a disjoint subproject.  If we choose this,  
>>>> we
>
>>>> need to be able to say exactly what those benefits are and why we
>>>> feel they will be compromised by leaving the Hadoop project.
>>>>
>>>> There may be other options that I haven't thought of.  Please feel
>>>> free
>
>>>> to suggest any you think of.
>>>>
>>>> Questions?  Thoughts?  Let the discussion begin.
>>>>
>>>> Alan.
>>>>
>>>>
>>>
>

RE: Begin a discussion about Pig as a top level project

Posted by Santhosh Srinivasan <sm...@yahoo-inc.com>.

I see this as a multi-part question. Looking back at some of the
significant roadmap/existential questions asked in the last 12 months, I
see the following:

1. With the introduction of SQL, what is the philosophy of Pig? (I sent
an email about this approximately 9 months ago.)
2. What is the approach to supporting backward compatibility in Pig? (Alan
sent an email about this 3 months ago.)
3. Should Pig be a TLP? (The current email thread.)

Here is my take on answering the aforementioned questions.

The initial philosophy of Pig was to be backend agnostic. It was
designed as a data flow language. Whenever a new language is designed,
the syntax and semantics of the language have to be laid out. The syntax
is usually captured in the form of a BNF grammar. The semantics are
defined by the language creators. Backward compatibility is then a
question of holding true to the syntax and semantics. With Pig, in
addition to the language, the Java APIs were exposed to customers to
implement UDFs (load/store/filter/grouping/row transformation, etc.),
to provide looping, since the language does not support looping constructs,
and to support a programmatic mode of access. Backward compatibility
in this context means supporting API versioning.

Do we still intend to position Pig as a data flow language that is backend
agnostic? If the answer is yes, then there is a strong case for making
Pig a TLP.

Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
Hadoop sub-project was to ride the Hadoop popularity wave. As a
consequence, we chose to be heavily influenced by the Hadoop roadmap.

Like a good lawyer, I also have rebuttals to Alan's questions :)

1. Search engine popularity - We can discuss this with the Hadoop team
and still retain links to TLPs that are coupled (loosely or tightly).
2. Explicit connection to Hadoop - I see this as a logical connection vs.
a physical connection. Today, we are physically connected as a
sub-project. Becoming a TLP will not increase or decrease our influence on
the Hadoop community (think Logical, Physical and MR Layers :)
3. Philosophy - I have already talked about this. The tight coupling is
by choice. If Pig continues to be a data flow language with clear syntax
and semantics then someone can implement Pig on top of a different
backend. Do we intend to take this approach?

I just wanted to offer a different opinion to this thread. I strongly
believe that we should think about the original philosophy. Will we have
a Pig standards committee that will decide on the changes to the
language (think C/C++) if there are multiple backend implementations?

I will reserve my vote pending the outcome of the philosophy and
backward compatibility discussions. If we decide that Pig will be
treated and maintained like a true language with clear syntax and
semantics then we have a strong case to make it into a TLP. If not, we
should retain our existing ties to Hadoop and make Pig into a data flow
language for Hadoop.

Santhosh

-----Original Message-----
From: Thejas Nair [mailto:tejas@yahoo-inc.com] 
Sent: Friday, April 02, 2010 4:08 PM
To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
Subject: Re: Begin a discussion about Pig as a top level project

I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
heavily influenced by its roadmap. I think it makes sense to continue as
a sub-project of hadoop.

-Thejas

On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:

> Over time, Pig is increasing its coupling to Hadoop (for good 
> reasons), rather than decreasing it. If and when Pig becomes a viable 
> entity without hadoop around, it might make sense as a TLP. As is, I 
> think becoming a TLP will only introduce unnecessary administrative
and bureaucratic headaches.
> So my vote is also -1.
> 
> -Dmitriy
> 
> 
> 
> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com>
wrote:
> 
>> So far I haven't seen any feedback on this.  Apache has asked the 
>> Hadoop PMC to submit input in April on whether some subprojects 
>> should be promoted to TLPs.  We, the Pig community, need to give 
>> feedback to the Hadoop PMC on how we feel about this.  Please make
your voice heard.
>> 
>> So now I'll heed my own call and give my thoughts on it.
>> 
>> The biggest advantage I see to being a TLP is a direct connection to 
>> Apache.  Right now all of the Pig team's interaction with Apache is 
>> through the Hadoop PMC.  Being directly connected to Apache would 
>> benefit Pig team members who would have a better view into Apache.  
>> It would also raise our profile in Apache and thus make other
projects more aware of us.
>> 
>> However, I am concerned about losing Pig's explicit connection to
Hadoop.
>>  This concern has a couple of dimensions.  One, Hadoop and MapReduce 
>> are the current flavor of the month in computing.  Given that Pig 
>> shares a name with the common farm animal, it's hard to be sure based
on search statistics.
>>  But Google trends shows that "hadoop" is searched on much more 
>> frequently than "hadoop pig" or "apache pig" (see 
>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing 
>> that most Pig users come from Hadoop users who discover Pig via
Hadoop's website.
>> Losing that subproject tab on Hadoop's front page may radically
>> lower the number of users coming to Pig to check out our project.  I 
>> would argue that this benefits Hadoop as well, since high level 
>> languages like Pig Latin have the potential to greatly extend the
user base and usability of Hadoop.
>> 
>> Two, being explicitly connected to Hadoop keeps our two communities 
>> aware of each other's needs.  There are features proposed for MR that
>> would greatly help Pig.  By staying in the Hadoop community Pig is 
>> better positioned to advocate for and help implement and test those 
>> features.  The response to this will be that Pig developers can still

>> subscribe to Hadoop mailing lists, submit patches, etc.  That is, 
>> they can still be part of the Hadoop community.  Which reinforces my 
>> point that it makes more sense to leave Pig in the Hadoop community 
>> since Pig developers will need to be part of that community anyway.
>> 
>> Finally, philosophically it makes sense to me that projects that are 
>> tightly connected belong together.  It strikes me as strange to have 
>> Pig as a TLP completely dependent on another TLP.  Hadoop was 
>> originally a subproject of Lucene.  It moved out to be a TLP when it 
>> became obvious that Hadoop had become independent of and useful apart

>> from Lucene.  Pig is not in that position relative to Hadoop.
>> 
>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to 
>> being persuaded that I'm wrong or my concerns can be addressed while 
>> still having Pig as a TLP.
>> 
>> Alan.
>> 
>> 
>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>> 
>>  You have probably heard by now that there is a discussion going on 
>> in the
>>> Hadoop PMC as to whether a number of the subprojects (Hbase, Avro, 
>>> Zookeeper, Hive, and Pig) should move out from under the Hadoop 
>>> umbrella and become top level Apache projects (TLP).  This 
>>> discussion has picked up recently since the Apache board has clearly

>>> communicated to the Hadoop PMC that it is concerned that Hadoop is 
>>> acting as an umbrella project with many disjoint subprojects 
>>> underneath it.  They are concerned that this gives Apache little 
>>> insight into the health and happenings of the subproject communities

>>> which in turn means Apache cannot properly mentor those communities.
>>> 
>>> The purpose of this email is to start a discussion within the Pig 
>>> community about this topic.  Let me cover first what becoming TLP 
>>> would mean for Pig, and then I'll go into what options I think we as
a community have.
>>> 
>>> Becoming a TLP would mean that Pig would itself have a PMC that 
>>> would report directly to the Apache board.  Who would be on the PMC 
>>> would be something we as a community would need to decide.  Common 
>>> options would be to say all active committers are on the PMC, or all

>>> active committers who have been a committer for at least a year.  We

>>> would also need to elect a chair of the PMC.  This lucky person 
>>> would have no additional power, but would have the additional 
>>> responsibility of writing quarterly reports on Pig's status for 
>>> Apache board meetings, as well as coordinating with Apache to get 
>>> accounts for new  committers, etc.  For more information see 
>>> http://www.apache.org/foundation/how-it-works.html#roles
>>> 
>>> Becoming a TLP would not mean that we are ostracized from the Hadoop

>>> community.  We would continue to be invited to Hadoop Summits, HUGs,
etc.
>>>  Since all Pig developers and users are by definition Hadoop users, 
>>> we would continue to be a strong presence in the Hadoop community.
>>> 
>>> I see three ways that we as a community can respond to this:
>>> 
>>> 1) Say yes, we want to be a TLP now.
>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more 
>>> time to mature.  If we choose this option we need to be able to 
>>> clearly articulate how much time we need and what we hope to see 
>>> change in that time.
>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh 
>>> the drawbacks of being a disjoint subproject.  If we choose this, we

>>> need to be able to say exactly what those benefits are and why we 
>>> feel they will be compromised by leaving the Hadoop project.
>>> 
>>> There may be other options that I haven't thought of.  Please feel free

>>> to suggest any you think of.
>>> 
>>> Questions?  Thoughts?  Let the discussion begin.
>>> 
>>> Alan.
>>> 
>>> 
>>

Re: Begin a discussion about Pig as a top level project

Posted by Thejas Nair <te...@yahoo-inc.com>.

I agree with Alan and Dmitriy - Pig is tightly coupled with Hadoop, and
heavily influenced by its roadmap. I think it makes sense to continue as a
sub-project of Hadoop.

-Thejas



On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:

> Over time, Pig is increasing its coupling to Hadoop (for good reasons),
> rather than decreasing it. If and when Pig becomes a viable entity without
> hadoop around, it might make sense as a TLP. As is, I think becoming a TLP
> will only introduce unnecessary administrative and bureaucratic headaches.
> So my vote is also -1.
> 
> -Dmitriy
> 
> 
> 
> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
> 
>> So far I haven't seen any feedback on this.  Apache has asked the Hadoop
>> PMC to submit input in April on whether some subprojects should be promoted
>> to TLPs.  We, the Pig community, need to give feedback to the Hadoop PMC on
>> how we feel about this.  Please make your voice heard.
>> 
>> So now I'll heed my own call and give my thoughts on it.
>> 
>> The biggest advantage I see to being a TLP is a direct connection to
>> Apache.  Right now all of the Pig team's interaction with Apache is through
>> the Hadoop PMC.  Being directly connected to Apache would benefit Pig team
>> members who would have a better view into Apache.  It would also raise our
>> profile in Apache and thus make other projects more aware of us.
>> 
>> However, I am concerned about losing Pig's explicit connection to Hadoop.
>>  This concern has a couple of dimensions.  One, Hadoop and MapReduce are the
>> current flavor of the month in computing.  Given that Pig shares a name with
>> the common farm animal, it's hard to be sure based on search statistics.
>>  But Google trends shows that "hadoop" is searched on much more frequently
>> than "hadoop pig" or "apache pig" (see
>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing that
>> most Pig users come from Hadoop users who discover Pig via Hadoop's website.
>>  Losing that subproject tab on Hadoop's front page may radically lower the
>> number of users coming to Pig to check out our project.  I would argue that
>> this benefits Hadoop as well, since high level languages like Pig Latin have
>> the potential to greatly extend the user base and usability of Hadoop.
>> 
>> Two, being explicitly connected to Hadoop keeps our two communities aware
>> of each other's needs.  There are features proposed for MR that would greatly
>> help Pig.  By staying in the Hadoop community Pig is better positioned to
>> advocate for and help implement and test those features.  The response to
>> this will be that Pig developers can still subscribe to Hadoop mailing
>> lists, submit patches, etc.  That is, they can still be part of the Hadoop
>> community.  Which reinforces my point that it makes more sense to leave Pig
>> in the Hadoop community since Pig developers will need to be part of that
>> community anyway.
>> 
>> Finally, philosophically it makes sense to me that projects that are
>> tightly connected belong together.  It strikes me as strange to have Pig as
>> a TLP completely dependent on another TLP.  Hadoop was originally a
>> subproject of Lucene.  It moved out to be a TLP when it became obvious that
>> Hadoop had become independent of and useful apart from Lucene.  Pig is not
>> in that position relative to Hadoop.
>> 
>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to being
>> persuaded that I'm wrong or my concerns can be addressed while still having
>> Pig as a TLP.
>> 
>> Alan.
>> 
>> 
>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>> 
>>> You have probably heard by now that there is a discussion going on in the
>>> Hadoop PMC as to whether a number of the subprojects (HBase, Avro,
>>> Zookeeper, Hive, and Pig) should move out from under the Hadoop umbrella and
>>> become top level Apache projects (TLP).  This discussion has picked up
>>> recently since the Apache board has clearly communicated to the Hadoop PMC
>>> that it is concerned that Hadoop is acting as an umbrella project with many
>>> disjoint subprojects underneath it.  They are concerned that this gives
>>> Apache little insight into the health and happenings of the subproject
>>> communities which in turn means Apache cannot properly mentor those
>>> communities.
>>> 
>>> The purpose of this email is to start a discussion within the Pig
>>> community about this topic.  Let me cover first what becoming TLP would mean
>>> for Pig, and then I'll go into what options I think we as a community have.
>>> 
>>> Becoming a TLP would mean that Pig would itself have a PMC that would
>>> report directly to the Apache board.  Who would be on the PMC would be
>>> something we as a community would need to decide.  Common options would be
>>> to say all active committers are on the PMC, or all active committers who
>>> have been a committer for at least a year.  We would also need to elect a
>>> chair of the PMC.  This lucky person would have no additional power, but
>>> would have the additional responsibility of writing quarterly reports on
>>> Pig's status for Apache board meetings, as well as coordinating with Apache
>>> to get accounts for new  committers, etc.  For more information see
>>> http://www.apache.org/foundation/how-it-works.html#roles
>>> 
>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>>> community.  We would continue to be invited to Hadoop Summits, HUGs, etc.
>>>  Since all Pig developers and users are by definition Hadoop users, we would
>>> continue to be a strong presence in the Hadoop community.
>>> 
>>> I see three ways that we as a community can respond to this:
>>> 
>>> 1) Say yes, we want to be a TLP now.
>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more time
>>> to mature.  If we choose this option we need to be able to clearly
>>> articulate how much time we need and what we hope to see change in that
>>> time.
>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh the
>>> drawbacks of being a disjoint subproject.  If we choose this, we need to be
>>> able to say exactly what those benefits are and why we feel they will be
>>> compromised by leaving the Hadoop project.
>>> 
>>> There may be other options that I haven't thought of.  Please feel free to
>>> suggest any you think of.
>>> 
>>> Questions?  Thoughts?  Let the discussion begin.
>>> 
>>> Alan.
>>> 
>>> 
>>