Posted to common-dev@hadoop.apache.org by Matt Foley <ma...@apache.org> on 2012/11/24 21:13:06 UTC

[VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as
build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent
scripting language for build-time tasks, and add Python as a build-time
dependency.
Please vote +1, 0, -1.

2. Contributors shall be encouraged to use Maven tasks in combination with
either plug-ins or Groovy scripts to do cross-platform build-time tasks,
even under ant in Hadoop-1.
Please vote +1, 0, -1.

3. Contributors shall be allowed to use Python as a platform-independent
scripting language for run-time tasks, and add Python as a run-time
dependency.
Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
use Maven plug-ins or Groovy as the only means of cross-platform build-time
tasks, or to simply continue using platform-dependent scripts as is being
done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and
until those are worked out I don't want to delay moving to cross-platform
scripts for build-time tasks.

Best regards,
--Matt

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Luke Lu <ll...@vicaya.com>.
-1, +1, -1.

If we want to introduce a "platform independent" scripting language, we
should not choose python, as it has a bad track record for compatibility
(between versions/platforms).

+1 to use groovy, as we can control the version of groovy jars included in
our distribution.

__Luke


On Mon, Nov 26, 2012 at 8:53 AM, Colin McCabe <cm...@alumni.cmu.edu> wrote:

> Nonbinding, but:
>
> +1, +1, 0.
>
> Also, let's please clearly define the versions of Python we support if
> we do choose to go this route.  Something like 2.4+ would be
> reasonable.  The process launching APIs in particular changed a lot in
> those early 2.x releases.
>
> best,
> Colin
>
>
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Chris Nauroth <cn...@hortonworks.com>.
Declaring 2.4 to be the minimum supported version sounds like a great idea.
I've worked with CentOS distributions that have a dependency on Python
2.4, and it was always awkward to get a later version on those machines.

Thank you,
--Chris

On Mon, Nov 26, 2012 at 8:53 AM, Colin McCabe <cm...@alumni.cmu.edu> wrote:

> Nonbinding, but:
>
> +1, +1, 0.
>
> Also, let's please clearly define the versions of Python we support if
> we do choose to go this route.  Something like 2.4+ would be
> reasonable.  The process launching APIs in particular changed a lot in
> those early 2.x releases.
>
> best,
> Colin
>
>
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Nonbinding, but:

+1, +1, 0.

Also, let's please clearly define the versions of Python we support if
we do choose to go this route.  Something like 2.4+ would be
reasonable.  The process launching APIs in particular changed a lot in
those early 2.x releases.
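
As a rough illustration of the version floor suggested above, the sketch
below checks the interpreter version and launches a child process through
the subprocess module, which first shipped with Python 2.4. The names
MIN_PYTHON, meets_minimum, and run are hypothetical and not part of any
Hadoop script; the 2.4 floor is only the value floated in this message.

```python
import subprocess
import sys

# Assumed floor, taken from the "2.4+" suggestion; not an agreed project policy.
MIN_PYTHON = (2, 4)


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True when (major, minor) is at least the assumed minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def run(argv):
    """Launch a child process without a shell; return (exit code, stdout bytes)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out


if __name__ == "__main__":
    if not meets_minimum():
        sys.exit("Python %d.%d or later is required" % MIN_PYTHON)
    code, _ = run([sys.executable, "-c", "pass"])
    sys.exit(code)
```

Passing an argument list rather than a shell string keeps quoting rules out
of the picture, which is one of the places shell and cmd.exe differ most.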

best,
Colin



Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Konstantin Boudnik <co...@apache.org>.
-1, +1, -1

Thanks


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Doug Cutting <cu...@apache.org>.
On Sat, Dec 1, 2012 at 2:44 AM, Steve Loughran <st...@hortonworks.com> wrote:
> WinNT Bat/CMD files are the worst possible scripting language invented. At
> the very least, .py should be the language of choice there

The scripts should not have so much logic that .bat files are a problem.

Doug

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Konstantin Boudnik <co...@apache.org>.
On Sat, Dec 01, 2012 at 10:44AM, Steve Loughran wrote:
> On 1 December 2012 01:08, Eli Collins <el...@cloudera.com> wrote:
> 
> > -1, 0, -1
> >
> > IIUC the only platform we plan to add support for that we can't easily
> > support today (w/o an emulation layer like cygwin) is Windows, and it
> > seems like making the bash scripts simpler and having parallel bat
> > files is IMO a better approach.
> >
> >
> WinNT Bat/CMD files are the worst possible scripting language invented. At
> the very least, .py should be the language of choice there

Compared to the OS in question, it isn't _that_ bad ;)


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Steve Loughran <st...@hortonworks.com>.
On 1 December 2012 01:08, Eli Collins <el...@cloudera.com> wrote:

> -1, 0, -1
>
> IIUC the only platform we plan to add support for that we can't easily
> support today (w/o an emulation layer like cygwin) is Windows, and it
> seems like making the bash scripts simpler and having parallel bat
> files is IMO a better approach.
>
>
WinNT Bat/CMD files are the worst possible scripting language invented. At
the very least, .py should be the language of choice there

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Eli Collins <el...@cloudera.com>.
-1, 0, -1

IIUC the only platform we plan to add support for that we can't easily
support today (w/o an emulation layer like cygwin) is Windows, and it
seems like making the bash scripts simpler and having parallel bat
files is IMO a better approach.


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Radim Kolar <hs...@filez.com>.
> In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT
> based. And the main reason was to remove saveVersion.sh.
>
> Your #3  was not discussed in the proposal, was it?
It was part of the original proposal but was not discussed much because the
language war was the more attractive topic. Do you want a vote like this?

1. Using an external language vs. a Maven plugin for the build
2. Using an external language for startup scripts vs. a JVM scripting
language, such as Jython as used in WebSphere
3. Choosing Python as the external language

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <ma...@apache.org>.
>> Python as runtime requirement. Are you planning to migrate all
BASH scripts provided by Hadoop (or dynamically created, i.e. launcher
scripts) to Python?

I don't intend to mandate use of Python.  Rather, I want there to be a
cross-platform option available.  Things that are best done in a
platform-specific manner should be done in shell for Linux and PowerShell
for Windows.  But things that are best done in a platform-independent way
can be, with a lower long-term maintenance cost than using different
scripts per platform.

This means that some, but not all, existing scripts may naturally migrate
to Python as the overall system is ported to Windows.  Hopefully when
someone is porting a script that can be well done in a platform-independent
way, they will be able to choose Python and write a single script that can
replace the shell script and make it unnecessary to maintain two scripts
(doing the same job but in different languages!) going forward.
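
As a minimal sketch of the kind of single cross-platform script meant here,
the example below builds a classpath and a Java command line using only
platform-neutral library calls. The function names, the directory layout,
and the org.example.Main class are hypothetical placeholders, not an
existing Hadoop script.

```python
import os


def build_classpath(lib_dirs, pathsep=os.pathsep):
    """Join every .jar found in the given directories with the path separator.

    os.pathsep is ';' on Windows and ':' on POSIX, so one script serves both.
    Directories that do not exist are skipped.
    """
    jars = []
    for lib_dir in lib_dirs:
        if os.path.isdir(lib_dir):
            for name in sorted(os.listdir(lib_dir)):
                if name.endswith(".jar"):
                    jars.append(os.path.join(lib_dir, name))
    return pathsep.join(jars)


def java_command(env, main_class, lib_dirs, args=()):
    """Assemble the argument list for launching a Java main class."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        raise ValueError("JAVA_HOME is not set")
    java = os.path.join(java_home, "bin", "java")
    return [java, "-classpath", build_classpath(lib_dirs), main_class] + list(args)


if __name__ == "__main__":
    # Hypothetical usage: print the command instead of running it.
    print(java_command(os.environ, "org.example.Main", ["lib"]))
```

The same file would replace one shell script and one batch script, which is
the maintenance saving being argued for.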

>>  What else in the current build, besides saveVersion.sh, do you see
as a candidate to be migrated to Python?

I have a greatly improved version of src/docs/relnotes.py that I would like
to submit, for auto-gen of release notes.
That's all that I have on my hotlist right now, although I anticipate that
some of the shell scripts invoked by ant may be natural candidates.

>> How are you planning to define what Python modules can be used?
Will developers have to install them manually?

That's something the community will work out, the same way they decide what
library jars to include, and when to upgrade those versions.  But first,
let's get an agreement in principle that this is the direction we want to
go.

Cheers,
--Matt

On Thu, Nov 29, 2012 at 3:26 PM, Alejandro Abdelnur <tu...@cloudera.com> wrote:

> Matt, thanks for the clarification.
>
> I may have missed the main point of the PROPOSAL thread then. I personally
> want to continue the discussion before voting.
>
> * Python as runtime requirement. Are you planning to migrate all BASH
> scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
>  to Python?
> * What else in the current build, besides saveVersion.sh, do you see as
> a candidate to be migrated to Python?
> * How are you planning to define what Python modules can be used? Will
> developers have to install them manually?
>
> Cheers
>
>
> On Thu, Nov 29, 2012 at 2:39 PM, Matt Foley <mf...@hortonworks.com>
> wrote:
>
> > Hi Alejandro,
> > Please see in-line below.
> >
> > On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur <tu...@cloudera.com>
> >  wrote:
> >
> > > Matt,
> > >
> > > The scope of this vote seems different from what was discussed in the
> > > PROPOSAL thread.
> > > In the PROPOSAL thread you indicated this was for Hadoop1 because it is
> > ANT
> > > based. And the main reason was to remove saveVersion.sh.
> > > Your #3  was not discussed in the proposal, was it?
> > >
> >
> > The item #3 was in my original statement of the problem, with which I
> > started the proposal thread.  In fact, the thread title was "[PROPOSAL]
> > introduce Python as build-time and run-time dependency for Hadoop and
> > throughout Hadoop stack".  It is true that only one or two people chose
> to
> > discuss #3 further in that thread.
> >
> > The point is not just to replace a single script, but to provide a means
> to
> > do cross-platform scripts, which will over time replace many
> > non-platform-specific scripts written in platform-specific languages.
> >
> >
> > >
> > > It seems this vote is dragging in much more than was originally
> > > discussed.
> > > I think you should suspend the vote, recap the motivation and then
> > restart
> > > the vote.
> > >
> >
> > I respectfully disagree.  I believe a careful reading of the cited
> > discussion thread, plus my own statement of the vote, provides sufficient
> > background for a thoughtful decision on the subject.  Presumably so do
> the
> > ten other people who had already voted before you made that comment.
> >
> > If several other people want more discussion first, please speak up.
> > Thanks,
> > --Matt
> >
> > As things are laid out at the moment my vote is:
> > >
> > > -1 (It still seems an overkill to introduce a new runtime requirement
> for
> > > building to replace a script.)
> > > +1 (I think this is the right way to simplify the build)
> > > -1 (AFAIK there is not such requirement at the moment, and if it comes
> it
> > > would be in the form of an AM, which I'd argue it should leave outside
> of
> > > Hadoop)
> > >
> > > Thx
> > >
> > >
> > > On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
> > > gkesavan@hortonworks.com> wrote:
> > >
> > > > +1, +1, +1
> > > >
> > > > -Giri
> > > >
> > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Alejandro
> > >
> >
>
>
>
> --
> Alejandro
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Luke Lu <ll...@vicaya.com>.
On Fri, Nov 30, 2012 at 5:29 AM, Steve Loughran <st...@hortonworks.com> wrote:

> Yes, you need python at build time, but that's no worse than having a
> protoc
> compiler, gcc and the automake toolchain.
>

The problem is that python is known to have _backward_ compatibility issues
on various platforms. It would be very annoying/time consuming to deal with
various support issues regarding python versions on various platforms.

I agree that autotools is a nightmare and should be converted (in branch-1
as well) to cmake (which has good versioning support :) The goal is to have
fewer external dependencies, not more, again mostly due to support issues.
If we want to introduce an external dependency, we need to pick something
that is easy to support compatibility-wise.

__Luke

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Luke Lu <ll...@vicaya.com>.
On Fri, Nov 30, 2012 at 5:29 AM, Steve Loughran <st...@hortonworks.com> wrote:

> where do you set your env vars... and what about JAVA_HOME
>

There should be only two env vars (JAVA_HOME and HADOOP_HOME) to deal with
in the native scripts (.bat on Windows and .sh on Unix platforms) to
bootstrap jruby scripts, which deal with the rest of the envs.

__Luke

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Steve Loughran <st...@hortonworks.com>.
On 30 November 2012 12:57, Luke Lu <ll...@apache.org> wrote:

> I'd like to change my binding vote to -1, -0, -1.
>
> Considering the hadoop stack/ecosystem as a whole, I think the best cross
> platform scripting language to adopt is jruby for following reasons:
>
> 1. HBase already adopted jruby for HBase shell, which all current platform
> vendors support.
> 2. We can control the version of language implementation at a per release
> basis.
> 3. We don't have to introduce new dependencies in the de facto hadoop
> stack. (see 1).
>
>
I don't see why these arguments should have any impact on using python at
build time, as it doesn't introduce any dependencies downstream. Yes, you
need python at build time, but that's no worse than having a protoc
compiler, gcc and the automake toolchain.



> I'm all for improving multi-platform support. I think the best way to do
> this is to have a thin native script wrappers (using env vars) to call the
> cross-platform jruby scripts.
>
>
Were it not for the env-var configuration hierarchy mess that things are in
today, I'd agree. Where do you set your env vars? hadoop-env.sh? Where does
that come from? The hadoop conf dir? How do you find that? An env variable
or a ../../conf from bin/hadoop.sh which breaks once you start symlinking
to hadoop/bin; or do you assume a root installation in /etc/hadoop/conf,
which points to /etc/alternatives/hadoop-conf, which can then point back to
/etc/hadoop/conf.pseudo ? And what about JAVA_HOME?
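
To make the symlink problem concrete, here is a minimal sketch of one
possible lookup order: an explicit environment variable first, then a conf
directory located relative to the real (symlink-resolved) script path. The
function name resolve_conf_dir is hypothetical, and using HADOOP_CONF_DIR as
the variable name is an assumption for the example; this is not how the
existing scripts are written.

```python
import os


def resolve_conf_dir(env, script_path):
    """Pick a configuration directory for a launcher script.

    1. An explicit HADOOP_CONF_DIR in the environment wins (assumed name).
    2. Otherwise resolve symlinks in the script path, treat the parent of
       the script's directory as the install root, and use <root>/conf.
    """
    explicit = env.get("HADOOP_CONF_DIR")
    if explicit:
        return explicit
    real_script = os.path.realpath(script_path)  # follows symlinks
    install_root = os.path.dirname(os.path.dirname(real_script))
    return os.path.join(install_root, "conf")


if __name__ == "__main__":
    print(resolve_conf_dir(os.environ, __file__))
```

Resolving the real path before walking upward is what keeps the relative
lookup working when bin/ is reached through a symlink.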

Those env vars are something I'd like to see the back of.

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Luke Lu <ll...@apache.org>.
I'd like to change my binding vote to -1, -0, -1.

Considering the hadoop stack/ecosystem as a whole, I think the best
cross-platform scripting language to adopt is jruby, for the following
reasons:

1. HBase already adopted jruby for HBase shell, which all current platform
vendors support.
2. We can control the version of the language implementation on a
per-release basis.
3. We don't have to introduce new dependencies in the de facto hadoop
stack. (see 1).

I'm all for improving multi-platform support. I think the best way to do
this is to have thin native script wrappers (using env vars) to call the
cross-platform jruby scripts.

__Luke



On Fri, Nov 30, 2012 at 3:21 AM, Luke Lu <ll...@vicaya.com> wrote:

> Thanks for the voting thread. Otherwise, many committers would have missed
> it.
>
> I agree that this is a superset of code change that has larger impact than
> typical code change.
>
>
> On Thu, Nov 29, 2012 at 6:26 PM, Matt Foley <ma...@apache.org> wrote:
>
>> > Finally, this is code change, so I'm not sure why we are doing a vote.
>>
>>
>> I view this as a tools issue, that affects questions that go beyond the
>> one-time choice of how to write (or re-write) saveVersion.sh.  Also Aaron
>> (atm) recommended that I bring it to the list.  So here we are :-)
>>
>
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Luke Lu <ll...@vicaya.com>.
Thanks for the voting thread. Otherwise, many committers would have missed
it.

I agree that this is more than a code change, with a larger impact than a
typical code change.

On Thu, Nov 29, 2012 at 6:26 PM, Matt Foley <ma...@apache.org> wrote:

> > Finally, this is code change, so I'm not sure why we are doing a vote.
>
>
> I view this as a tools issue, that affects questions that go beyond the
> one-time choice of how to write (or re-write) saveVersion.sh.  Also Aaron
> (atm) recommended that I bring it to the list.  So here we are :-)
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Arun C Murthy <ac...@hortonworks.com>.
On Nov 29, 2012, at 6:26 PM, Matt Foley wrote:

> Hello again.  Crossed in the mail.
> 
> * What kind of tasks you envision Python scripts will enable that are
>> not possible today?
> 
> 
> The point isn't to open brave new worlds.  The point is to avoid the
> nightmare of having to maintain multiple "parallel" scripts doing the SAME
> THING in multiple scripting languages.  

+1, +1, +1

Couldn't agree more, I don't want to be in the business of having the same logic in multiple platform-specific scripts - doesn't make any sense.

Arun


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Bikas Saha <bi...@hortonworks.com>.
+1, +1, +1 (non-binding)

We have had promising results for 1 and 2 when porting to Windows. 3 would
allow us to remove platform dependencies from test code. Agree that there
might be some nuanced operations that require OS specific environments but
this would lead to keeping them at a minimum.

Bikas

On 11/29/12 7:22 PM, "Chuan Liu" <ch...@microsoft.com> wrote:

>+1 +1 +1
>
>Agree with Matt on the code maintainability.
>
>I think on one side we have Shell which is a script language and OS
>dependent, e.g. as in bash vs powershell;
>on the other side we have Java which is not a script language and OS
>independent.
>I would accept any script language that can fix the gap as an OS
>independent scripting language.
>Personally, I also prefer Python over Ruby.
>
>Thanks,
>Chuan
>
>________________________________________
>From: mfoley@hortonworks.com on behalf of Matt Foley
>Sent: Thursday, November 29, 2012 6:26 PM
>To: common-dev@hadoop.apache.org
>Subject: Re: [VOTE] introduce Python as build-time and run-time
>dependency for Hadoop and throughout Hadoop stack
>
>Hello again.  Crossed in the mail.
>
>* What kind of tasks you envision Python scripts will enable that are
>> not possible today?
>
>
>The point isn't to open brave new worlds.  The point is to avoid the
>nightmare of having to maintain multiple "parallel" scripts doing the SAME
>THING in multiple scripting languages.  I know from experience that they
>never get maintained right.  It's just a huge source of bugs, because when
>they are in different languages, it can be quite difficult to determine
>that they are *really* doing the same thing.  And in a case like shell vs
>powershell, it will be very common to have contributors who are not
>experts
>in both.
>
>I care deeply about having a high-quality release in both Linux and
>Windows.  And having a cross-platform scripting language will make it much
>easier to maintain that quality over time, without "slip" between the two
>platforms.
>
>* Will the requirement of Python be pushed to clients using the
>> hadoop script? If so, this would affect all downstream projects that use
>> hadoop script in one way or the other, right?
>
>
>If question #3 passes, then Python will become a run-time dependency for
>Hadoop.  That means it would need to be installed as part of the Hadoop
>install preparation, just like all the other Hadoop run-time dependencies.
>
>Is the main motivation of the proposal to make things easier for Windows,
>> so there is no need for cygwin? If that is the case, have you considered
>> doing directly BAT scripts? If you take Tomcat for example, they have
>>BAT
>> scripts and SH scripts and things work quite nicely.
>
>
>Of course it is sufficient, from the simple implementation perspective, to
>translate all the shell scripts into bat or (better) powershell scripts.
> That is, in fact, the most evident alternative to my proposals #1 and #3.
>
>However, I ask -- beg! -- the community to consider it from the software
>engineering perspective.  We aren't here to just implement something once
>and be done.  It has to be maintained, as most of you on this list are
>well
>aware, for years and years, across multiple generations.  And trying to
>maintain parallel scripts in multiple languages, when not necessitated by
>genuine platform-specific requirements, is just creating bug generators in
>the system.
>
>Personally, I wouldn't be thrilled to see the logic in the scripts to
>> get more complex, but on the opposite direction; IMO, scripts should be
>> trimmed to set env vars (with no voodoo logic), build the classpath
>>(with
>> no voodoo logic, just from a set of dirs) and call Java.
>
>
>See the first item above.  The point is to enable cross-platform scripting
>of the things we already have to script.  IMO, scripts should get out of
>the env var business entirely, but that's unrelated to this question :-)
>
>Finally, this is code change, so I'm not sure why we are doing a vote.
>
>
>I view this as a tools issue, that affects questions that go beyond the
>one-time choice of how to write (or re-write) saveVersion.sh.  Also Aaron
>(atm) recommended that I bring it to the list.  So here we are :-)
>
>Cheers,
>--Matt
>
>On Thu, Nov 29, 2012 at 5:25 PM, Alejandro Abdelnur
><tu...@cloudera.com>wrote:
>
>> Matt,
>>
>> Let me repost my previous questions and a few more. I'd appreciate your
>> answers, as it will help me understand the full impact this would have
>>in
>> Hadoop and related projects.
>>
>> * Python as runtime requirement. Are you planning to migrate all BASH
>> scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
>>  to Python?
>> * What else in the current build, besides saveVersion.sh, do you see as
>> a candidate to be migrated to Python?
>> * How are you planning to define what Python modules can be used? Will
>> developers have to install them manually?
>> * What kind of tasks you envision Python scripts will enable that are
>>not
>> possible today?
>> * Will the requirement of Python be pushed to clients using the hadoop
>> script? If so, this would affect all downstream projects that use hadoop
>> script in one way or the other, right?
>>
>> Is the main motivation of the proposal to make things easier for
>>window, so
>> there is no need for cygwin? If that is the case, have you considered
>>doing
>> directly BAT scripts? If you take Tomcat for example, they have BAT
>>scripts
>> and SH scripts and things work quite nicely.
>>
>> Personally, I wouldn't be trilled to see the logic in the scripts to get
>> more complex, but on the opposite direction; IMO, scripts should be
>>trimmed
>> to set env vars (with no voodoo logic), build the classpath (with no
>>voodoo
>> logic, just from a set of dirs) and call Java.
>>
>> Finally, this is code change, so I'm not sure why we are doing a vote.
>>
>> Thx.
>>
>> On Thu, Nov 29, 2012 at 3:26 PM, Alejandro Abdelnur <tucu@cloudera.com
>> >wrote:
>>
>> > Matt, thanks for the clarification.
>> >
>> > I may have missed the main point of the PROPOSAL thread then. I
>> personally
>> > want to continue the discussion before voting.
>> >
>> > * Phyton as runtime requirement. Are you planing to migrate all BASH
>> > scripts provided by Hadoop (or dynamically created -ie launcher
>>scripts)
>> >  to Phyton?
>> > * What else in the current build, besides saveVersion.sh, you see as
>> > candidate to be migrated to Phyton?
>> > * How are you planning to define what Phyton modules can be used? Will
>> > developers have to install them manually?
>> >
>> > Cheers
>> >
>> >
>> > On Thu, Nov 29, 2012 at 2:39 PM, Matt Foley <mfoley@hortonworks.com
>> >wrote:
>> >
>> >> Hi Alejandro,
>> >> Please see in-line below.
>> >>
>> >> On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur
>><tu...@cloudera.com>
>> >>  wrote:
>> >>
>> >> > Matt,
>> >> >
>> >> > The scope of this vote seems different from what was discussed in
>>the
>> >> > PROPOSAL thread.
>> >> > In the PROPOSAL thread you indicated this was for Hadoop1 because
>>it
>> is
>> >> ANT
>> >> > based. And the main reason was to remove saveVersion.sh.
>> >> > Your #3  was not discussed in the proposal, was it?
>> >> >
>> >>
>> >> The item #3 was in my original statement of the problem, with which I
>> >> started the proposal thread.  In fact, the thread title was
>>"[PROPOSAL]
>> >> introduce Python as build-time and run-time dependency for Hadoop and
>> >> throughout Hadoop stack".  It is true that only one or two people
>>chose
>> to
>> >> discuss #3 further in that thread.
>> >>
>> >> The point is not just to replace a single script, but to provide a
>>means
>> >> to
>> >> do cross-platform scripts, which will over time replace many
>> >> non-platform-specific scripts written in platform-specific languages.
>> >>
>> >>
>> >> >
>> >> > It seems this vote is dragging much more stuff it was originally
>> >> discussed.
>> >> > I think you should suspend the vote, recap the motivation and then
>> >> restart
>> >> > the vote.
>> >> >
>> >>
>> >> I respectfully disagree.  I believe a careful reading of the cited
>> >> discussion thread, plus my own statement of the vote, provides
>> sufficient
>> >> background for a thoughtful decision on the subject.  Presumably so
>>do
>> the
>> >> ten other people who had already voted before you made that comment.
>> >>
>> >> If several other people want more discussion first, please speak up.
>> >> Thanks,
>> >> --Matt
>> >>
>> >> As things are laid out at the moment my vote is:
>> >> >
>> >> > -1 (It still seems an overkill to introduce a new runtime
>>requirement
>> >> for
>> >> > building to replace a script.)
>> >> > +1 (I think this is the right way to simplify the build)
>> >> > -1 (AFAIK there is not such requirement at the moment, and if it
>>comes
>> >> it
>> >> > would be in the form of an AM, which I'd argue it should leave
>>outside
>> >> of
>> >> > Hadoop)
>> >> >
>> >> > Thx
>> >> >
>> >> >
>> >> > On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
>> >> > gkesavan@hortonworks.com> wrote:
>> >> >
>> >> > > +1, +1, +1
>> >> > >
>> >> > > -Giri
>> >> > >
>> >> > >
>> >> > > On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org>
>> >> wrote:
>> >> > >
>> >> > > > For discussion, please see previous thread "[PROPOSAL]
>>introduce
>> >> Python
>> >> > > as
>> >> > > > build-time and run-time dependency for Hadoop and throughout
>> Hadoop
>> >> > > stack".
>> >> > > >
>> >> > > > This vote consists of three separate items:
>> >> > > >
>> >> > > > 1. Contributors shall be allowed to use Python as a
>> >> > platform-independent
>> >> > > > scripting language for build-time tasks, and add Python as a
>> >> build-time
>> >> > > > dependency.
>> >> > > > Please vote +1, 0, -1.
>> >> > > >
>> >> > > > 2. Contributors shall be encouraged to use Maven tasks in
>> >> combination
>> >> > > with
>> >> > > > either plug-ins or Groovy scripts to do cross-platform
>>build-time
>> >> > tasks,
>> >> > > > even under ant in Hadoop-1.
>> >> > > > Please vote +1, 0, -1.
>> >> > > >
>> >> > > > 3. Contributors shall be allowed to use Python as a
>> >> > platform-independent
>> >> > > > scripting language for run-time tasks, and add Python as a
>> run-time
>> >> > > > dependency.
>> >> > > > Please vote +1, 0, -1.
>> >> > > >
>> >> > > > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES
>> >> > contributors
>> >> > > to
>> >> > > > use Maven plug-ins or Groovy as the only means of
>>cross-platform
>> >> > > build-time
>> >> > > > tasks, or to simply continue using platform-dependent scripts
>>as
>> is
>> >> > being
>> >> > > > done today.
>> >> > > >
>> >> > > > Vote closes at 12:30pm PST on Saturday 1 December.
>> >> > > > ---------
>> >> > > > Personally, my vote is +1, +1, +1.
>> >> > > > I think #2 is preferable to #1, but still has many unknowns in
>>it,
>> >> and
>> >> > > > until those are worked out I don't want to delay moving to
>> >> > cross-platform
>> >> > > > scripts for build-time tasks.
>> >> > > >
>> >> > > > Best regards,
>> >> > > > --Matt
>> >> > > >
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Alejandro
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Alejandro
>> >
>>
>>
>>
>> --
>> Alejandro
>>
>



RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Chuan Liu <ch...@microsoft.com>.
+1 +1 +1

Agree with Matt on the code maintainability.

I think on one side we have Shell which is a script language and OS dependent, e.g. as in bash vs powershell;
on the other side we have Java which is not a script language and OS independent.
I would accept any script language that can fix the gap as an OS independent scripting language.
Personally, I also prefer Python over Ruby.

Thanks,
Chuan

________________________________________
From: mfoley@hortonworks.com on behalf of Matt Foley
Sent: Thursday, November 29, 2012 6:26 PM
To: common-dev@hadoop.apache.org
Subject: Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack



Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <ma...@apache.org>.
Hello again.  Crossed in the mail.

> * What kind of tasks do you envision Python scripts will enable that are
> not possible today?


The point isn't to open brave new worlds.  The point is to avoid the
nightmare of having to maintain multiple "parallel" scripts doing the SAME
THING in multiple scripting languages.  I know from experience that they
never get maintained right.  It's just a huge source of bugs, because when
they are in different languages, it can be quite difficult to determine
that they are *really* doing the same thing.  And in a case like shell vs
powershell, it will be very common to have contributors who are not experts
in both.

I care deeply about having a high-quality release in both Linux and
Windows.  And having a cross-platform scripting language will make it much
easier to maintain that quality over time, without "slip" between the two
platforms.

> * Will the requirement of Python be pushed to clients using the
> hadoop script? If so, this would affect all downstream projects that use
> the hadoop script in one way or the other, right?


If question #3 passes, then Python will become a run-time dependency for
Hadoop.  That means it would need to be installed as part of the Hadoop
install preparation, just like all the other Hadoop run-time dependencies.

> Is the main motivation of the proposal to make things easier for Windows,
> so there is no need for cygwin? If that is the case, have you considered
> writing BAT scripts directly? If you take Tomcat for example, they have BAT
> scripts and SH scripts and things work quite nicely.


Of course it is sufficient, from the simple implementation perspective, to
translate all the shell scripts into bat or (better) powershell scripts.
That is, in fact, the most evident alternative to my proposals #1 and #3.

However, I ask -- beg! -- the community to consider it from the software
engineering perspective.  We aren't here to just implement something once
and be done.  It has to be maintained, as most of you on this list are well
aware, for years and years, across multiple generations.  And trying to
maintain parallel scripts in multiple languages, when not necessitated by
genuine platform-specific requirements, is just creating bug generators in
the system.

> Personally, I wouldn't be thrilled to see the logic in the scripts
> get more complex, but rather the opposite; IMO, scripts should be
> trimmed to set env vars (with no voodoo logic), build the classpath (with
> no voodoo logic, just from a set of dirs) and call Java.


See the first item above.  The point is to enable cross-platform scripting
of the things we already have to script.  IMO, scripts should get out of
the env var business entirely, but that's unrelated to this question :-)

> Finally, this is a code change, so I'm not sure why we are doing a vote.


I view this as a tools issue, that affects questions that go beyond the
one-time choice of how to write (or re-write) saveVersion.sh.  Also Aaron
(atm) recommended that I bring it to the list.  So here we are :-)

Cheers,
--Matt


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Matt,

Let me repost my previous questions and a few more. I'd appreciate your
answers, as it will help me understand the full impact this would have in
Hadoop and related projects.

* Python as a runtime requirement. Are you planning to migrate all BASH
scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
to Python?
* What else in the current build, besides saveVersion.sh, do you see as a
candidate to be migrated to Python?
* How are you planning to define what Python modules can be used? Will
developers have to install them manually?
* What kind of tasks do you envision Python scripts will enable that are not
possible today?
* Will the requirement of Python be pushed to clients using the hadoop
script? If so, this would affect all downstream projects that use the hadoop
script in one way or the other, right?

Is the main motivation of the proposal to make things easier for Windows, so
there is no need for cygwin? If that is the case, have you considered writing
BAT scripts directly? If you take Tomcat for example, they have BAT scripts
and SH scripts and things work quite nicely.

Personally, I wouldn't be thrilled to see the logic in the scripts get
more complex, but rather the opposite; IMO, scripts should be trimmed
to set env vars (with no voodoo logic), build the classpath (with no voodoo
logic, just from a set of dirs) and call Java.

Finally, this is a code change, so I'm not sure why we are doing a vote.

Thx.

On Thu, Nov 29, 2012 at 3:26 PM, Alejandro Abdelnur <tu...@cloudera.com>wrote:

> Matt, thanks for the clarification.
>
> I may have missed the main point of the PROPOSAL thread then. I personally
> want to continue the discussion before voting.
>
> * Python as a runtime requirement. Are you planning to migrate all BASH
> scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
> to Python?
> * What else in the current build, besides saveVersion.sh, do you see as a
> candidate to be migrated to Python?
> * How are you planning to define what Python modules can be used? Will
> developers have to install them manually?
>
> Cheers
>
>
> On Thu, Nov 29, 2012 at 2:39 PM, Matt Foley <mf...@hortonworks.com>wrote:
>
>> Hi Alejandro,
>> Please see in-line below.
>>
>> On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur <tu...@cloudera.com>
>>  wrote:
>>
>> > Matt,
>> >
>> > The scope of this vote seems different from what was discussed in the
>> > PROPOSAL thread.
>> > In the PROPOSAL thread you indicated this was for Hadoop1 because it is
>> > ANT based. And the main reason was to remove saveVersion.sh.
>> > Your #3 was not discussed in the proposal, was it?
>> >
>>
>> The item #3 was in my original statement of the problem, with which I
>> started the proposal thread.  In fact, the thread title was "[PROPOSAL]
>> introduce Python as build-time and run-time dependency for Hadoop and
>> throughout Hadoop stack".  It is true that only one or two people chose to
>> discuss #3 further in that thread.
>>
>> The point is not just to replace a single script, but to provide a means
>> to do cross-platform scripts, which will over time replace many
>> non-platform-specific scripts written in platform-specific languages.
>>
>>
>> >
>> > It seems this vote is dragging in much more stuff than was originally
>> > discussed. I think you should suspend the vote, recap the motivation and
>> > then restart the vote.
>> >
>>
>> I respectfully disagree.  I believe a careful reading of the cited
>> discussion thread, plus my own statement of the vote, provides sufficient
>> background for a thoughtful decision on the subject.  Presumably so do the
>> ten other people who had already voted before you made that comment.
>>
>> If several other people want more discussion first, please speak up.
>> Thanks,
>> --Matt
>>
>> > As things are laid out at the moment my vote is:
>> >
>> > -1 (It still seems overkill to introduce a new runtime requirement for
>> > building to replace a script.)
>> > +1 (I think this is the right way to simplify the build)
>> > -1 (AFAIK there is no such requirement at the moment, and if it comes it
>> > would be in the form of an AM, which I'd argue should live outside of
>> > Hadoop)
>> >
>> > Thx
>> >
>> >
>> > On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
>> > gkesavan@hortonworks.com> wrote:
>> >
>> > > +1, +1, +1
>> > >
>> > > -Giri
>> > >
>> > >
>> > > On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
>> > >
>> > > > For discussion, please see previous thread "[PROPOSAL] introduce Python
>> > > > as build-time and run-time dependency for Hadoop and throughout Hadoop
>> > > > stack".
>> > > >
>> > > > This vote consists of three separate items:
>> > > >
>> > > > 1. Contributors shall be allowed to use Python as a platform-independent
>> > > > scripting language for build-time tasks, and add Python as a build-time
>> > > > dependency.
>> > > > Please vote +1, 0, -1.
>> > > >
>> > > > 2. Contributors shall be encouraged to use Maven tasks in combination with
>> > > > either plug-ins or Groovy scripts to do cross-platform build-time tasks,
>> > > > even under ant in Hadoop-1.
>> > > > Please vote +1, 0, -1.
>> > > >
>> > > > 3. Contributors shall be allowed to use Python as a platform-independent
>> > > > scripting language for run-time tasks, and add Python as a run-time
>> > > > dependency.
>> > > > Please vote +1, 0, -1.
>> > > >
>> > > > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
>> > > > use Maven plug-ins or Groovy as the only means of cross-platform build-time
>> > > > tasks, or to simply continue using platform-dependent scripts as is being
>> > > > done today.
>> > > >
>> > > > Vote closes at 12:30pm PST on Saturday 1 December.
>> > > > ---------
>> > > > Personally, my vote is +1, +1, +1.
>> > > > I think #2 is preferable to #1, but still has many unknowns in it, and
>> > > > until those are worked out I don't want to delay moving to cross-platform
>> > > > scripts for build-time tasks.
>> > > >
>> > > > Best regards,
>> > > > --Matt
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Alejandro
>> >
>>
>
>
>
> --
> Alejandro
>



-- 
Alejandro

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Jitendra Pandey <ji...@hortonworks.com>.
+1, +1, +1

On Fri, Nov 30, 2012 at 5:40 AM, Radim Kolar <hs...@filez.com> wrote:

>
>  inline ant scripts
>>>
>>> =0. Ant's versioning is stricter; you can pull down the exact Jar
>>> versions,
>>> and some of us in the Ant team worked very hard to get it going
>>> everywhere.
>>> You don't gain anything by going to .py
>>>
>> there are sh scripts inside maven ant plugin stuff
>



-- 
<http://hortonworks.com/download/>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Steve Loughran <st...@hortonworks.com>.
On 30 November 2012 13:40, Radim Kolar <hs...@filez.com> wrote:

>
>  inline ant scripts
>>>
>>> =0. Ant's versioning is stricter; you can pull down the exact Jar
>>> versions,
>>> and some of us in the Ant team worked very hard to get it going
>>> everywhere.
>>> You don't gain anything by going to .py
>>>
>> there are sh scripts inside maven ant plugin stuff
>

Which is because there are some things you can't do in Java, e.g. run rpmbuild
to pick up file permissions and hanging symlinks that only become valid on
deployment.

The reason Ant is used to start them is that Maven views trying to run native
scripts as a forbidden action - probably popping up some patronising text
"you are trying to run a shell script, please look at
maven.apache.org/wiki/whymavenwontletyoudothings/ to understand this". They
also view building RPMs as something not to encourage.

(but we digress into an ant vs maven argument. I do actually appreciate the
consistent target naming across projects and the ability for the IDE to set
up structure, it's just the entire underlying architecture and
implementation that I dislike)

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Radim Kolar <hs...@filez.com>.
>> inline ant scripts
>>
>> =0. Ant's versioning is stricter; you can pull down the exact Jar versions,
>> and some of us in the Ant team worked very hard to get it going everywhere.
>> You don't gain anything by going to .py
there are sh scripts inside maven ant plugin stuff

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Steve Loughran <st...@hortonworks.com>.
On 30 November 2012 00:29, Radim Kolar <hs...@filez.com> wrote:

>
> * What else in the current build, besides saveVersion.sh, do you see as a
> candidate to be migrated to Python?
>
> inline ant scripts
>

=0. Ant's versioning is stricter; you can pull down the exact Jar versions,
and some of us in the Ant team worked very hard to get it going everywhere.
You don't gain anything by going to .py

-steve

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Radim Kolar <hs...@filez.com>.
> * What else in the current build, besides saveVersion.sh, do you see as a
> candidate to be migrated to Python?

inline ant scripts

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Matt, thanks for the clarification.

I may have missed the main point of the PROPOSAL thread then. I personally
want to continue the discussion before voting.

* Python as runtime requirement. Are you planning to migrate all BASH
scripts provided by Hadoop (or dynamically created, i.e. launcher scripts)
to Python?
* What else in the current build, besides saveVersion.sh, do you see as a
candidate to be migrated to Python?
* How are you planning to define which Python modules can be used? Will
developers have to install them manually?

Cheers


On Thu, Nov 29, 2012 at 2:39 PM, Matt Foley <mf...@hortonworks.com> wrote:

> Hi Alejandro,
> Please see in-line below.
>
> On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur <tu...@cloudera.com>
>  wrote:
>
> > Matt,
> >
> > The scope of this vote seems different from what was discussed in the
> > PROPOSAL thread.
> > In the PROPOSAL thread you indicated this was for Hadoop1 because it is
> ANT
> > based. And the main reason was to remove saveVersion.sh.
> > Your #3  was not discussed in the proposal, was it?
> >
>
> The item #3 was in my original statement of the problem, with which I
> started the proposal thread.  In fact, the thread title was "[PROPOSAL]
> introduce Python as build-time and run-time dependency for Hadoop and
> throughout Hadoop stack".  It is true that only one or two people chose to
> discuss #3 further in that thread.
>
> The point is not just to replace a single script, but to provide a means to
> do cross-platform scripts, which will over time replace many
> non-platform-specific scripts written in platform-specific languages.
>
>
> >
> > It seems this vote is dragging much more stuff it was originally
> discussed.
> > I think you should suspend the vote, recap the motivation and then
> restart
> > the vote.
> >
>
> I respectfully disagree.  I believe a careful reading of the cited
> discussion thread, plus my own statement of the vote, provides sufficient
> background for a thoughtful decision on the subject.  Presumably so do the
> ten other people who had already voted before you made that comment.
>
> If several other people want more discussion first, please speak up.
> Thanks,
> --Matt
>
> > As things are laid out at the moment my vote is:
> >
> > -1 (It still seems an overkill to introduce a new runtime requirement for
> > building to replace a script.)
> > +1 (I think this is the right way to simplify the build)
> > -1 (AFAIK there is not such requirement at the moment, and if it comes it
> > would be in the form of an AM, which I'd argue it should leave outside of
> > Hadoop)
> >
> > Thx
> >
> >
> > On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
> > gkesavan@hortonworks.com> wrote:
> >
> > > +1, +1, +1
> > >
> > > -Giri
> > >
> > >
> > > On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
> > >
> > > > For discussion, please see previous thread "[PROPOSAL] introduce
> Python
> > > as
> > > > build-time and run-time dependency for Hadoop and throughout Hadoop
> > > stack".
> > > >
> > > > This vote consists of three separate items:
> > > >
> > > > 1. Contributors shall be allowed to use Python as a
> > platform-independent
> > > > scripting language for build-time tasks, and add Python as a
> build-time
> > > > dependency.
> > > > Please vote +1, 0, -1.
> > > >
> > > > 2. Contributors shall be encouraged to use Maven tasks in combination
> > > with
> > > > either plug-ins or Groovy scripts to do cross-platform build-time
> > tasks,
> > > > even under ant in Hadoop-1.
> > > > Please vote +1, 0, -1.
> > > >
> > > > 3. Contributors shall be allowed to use Python as a
> > platform-independent
> > > > scripting language for run-time tasks, and add Python as a run-time
> > > > dependency.
> > > > Please vote +1, 0, -1.
> > > >
> > > > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES
> > contributors
> > > to
> > > > use Maven plug-ins or Groovy as the only means of cross-platform
> > > build-time
> > > > tasks, or to simply continue using platform-dependent scripts as is
> > being
> > > > done today.
> > > >
> > > > Vote closes at 12:30pm PST on Saturday 1 December.
> > > > ---------
> > > > Personally, my vote is +1, +1, +1.
> > > > I think #2 is preferable to #1, but still has many unknowns in it,
> and
> > > > until those are worked out I don't want to delay moving to
> > cross-platform
> > > > scripts for build-time tasks.
> > > >
> > > > Best regards,
> > > > --Matt
> > > >
> > >
> >
> >
> >
> > --
> > Alejandro
> >
>



-- 
Alejandro

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <mf...@hortonworks.com>.
Hi Alejandro,
Please see in-line below.

On Mon, Nov 26, 2012 at 1:52 PM, Alejandro Abdelnur <tu...@cloudera.com>
 wrote:

> Matt,
>
> The scope of this vote seems different from what was discussed in the
> PROPOSAL thread.
> In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT
> based. And the main reason was to remove saveVersion.sh.
> Your #3  was not discussed in the proposal, was it?
>

The item #3 was in my original statement of the problem, with which I
started the proposal thread.  In fact, the thread title was "[PROPOSAL]
introduce Python as build-time and run-time dependency for Hadoop and
throughout Hadoop stack".  It is true that only one or two people chose to
discuss #3 further in that thread.

The point is not just to replace a single script, but to provide a means to
do cross-platform scripts, which will over time replace many
non-platform-specific scripts written in platform-specific languages.
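
A minimal sketch of the kind of cross-platform build-time script under
discussion, assuming Python 3.7+ and that git may or may not be available on
the PATH. The function names and the key=value output format are hypothetical
illustrations; they are not taken from saveVersion.sh or from any Hadoop patch.

```python
import datetime
import getpass
import shutil
import subprocess


def source_revision(cwd="."):
    """Return the current git commit hash, or "Unknown" if it cannot be found."""
    git = shutil.which("git")  # resolves git (or git.exe) on the PATH
    if git is None:
        return "Unknown"
    try:
        result = subprocess.run(
            [git, "rev-parse", "HEAD"],  # no shell involved, so no bash/cmd differences
            cwd=cwd,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git checkout, or git could not be executed.
        return "Unknown"
    return result.stdout.strip() or "Unknown"


def build_info(version, cwd="."):
    """Collect hypothetical build metadata: version, user, UTC date, revision."""
    try:
        user = getpass.getuser()
    except Exception:
        # No resolvable user name on this platform or in this environment.
        user = "Unknown"
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "version": version,
        "user": user,
        "date": now.strftime("%Y-%m-%dT%H:%MZ"),
        "revision": source_revision(cwd),
    }


def render_properties(info):
    """Render the metadata as sorted key=value lines, one per line."""
    return "".join(f"{key}={info[key]}\n" for key in sorted(info))
```

Because the git lookup goes through shutil.which and subprocess rather than a
shell, the same file should run unchanged from a Unix shell or from the
standard Windows cmd prompt.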


>
> It seems this vote is dragging in much more than was originally discussed.
> I think you should suspend the vote, recap the motivation and then restart
> the vote.
>

I respectfully disagree.  I believe a careful reading of the cited
discussion thread, plus my own statement of the vote, provides sufficient
background for a thoughtful decision on the subject.  Presumably so do the
ten other people who had already voted before you made that comment.

If several other people want more discussion first, please speak up.
Thanks,
--Matt

> As things are laid out at the moment my vote is:
>
> -1 (It still seems like overkill to introduce a new runtime requirement for
> building to replace a script.)
> +1 (I think this is the right way to simplify the build)
> -1 (AFAIK there is no such requirement at the moment, and if it comes it
> would be in the form of an AM, which I'd argue should live outside of
> Hadoop)
>
> Thx
>
>
> On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
> gkesavan@hortonworks.com> wrote:
>
> > +1, +1, +1
> >
> > -Giri
> >
> >
> > On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
> >
> > > For discussion, please see previous thread "[PROPOSAL] introduce Python
> > as
> > > build-time and run-time dependency for Hadoop and throughout Hadoop
> > stack".
> > >
> > > This vote consists of three separate items:
> > >
> > > 1. Contributors shall be allowed to use Python as a
> platform-independent
> > > scripting language for build-time tasks, and add Python as a build-time
> > > dependency.
> > > Please vote +1, 0, -1.
> > >
> > > 2. Contributors shall be encouraged to use Maven tasks in combination
> > with
> > > either plug-ins or Groovy scripts to do cross-platform build-time
> tasks,
> > > even under ant in Hadoop-1.
> > > Please vote +1, 0, -1.
> > >
> > > 3. Contributors shall be allowed to use Python as a
> platform-independent
> > > scripting language for run-time tasks, and add Python as a run-time
> > > dependency.
> > > Please vote +1, 0, -1.
> > >
> > > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES
> contributors
> > to
> > > use Maven plug-ins or Groovy as the only means of cross-platform
> > build-time
> > > tasks, or to simply continue using platform-dependent scripts as is
> being
> > > done today.
> > >
> > > Vote closes at 12:30pm PST on Saturday 1 December.
> > > ---------
> > > Personally, my vote is +1, +1, +1.
> > > I think #2 is preferable to #1, but still has many unknowns in it, and
> > > until those are worked out I don't want to delay moving to
> cross-platform
> > > scripts for build-time tasks.
> > >
> > > Best regards,
> > > --Matt
> > >
> >
>
>
>
> --
> Alejandro
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Matt,

The scope of this vote seems different from what was discussed in the
PROPOSAL thread.

In the PROPOSAL thread you indicated this was for Hadoop1 because it is ANT
based. And the main reason was to remove saveVersion.sh.

Your #3 was not discussed in the proposal, was it?

It seems this vote is dragging in much more than was originally discussed.
I think you should suspend the vote, recap the motivation and then restart
the vote. As things are laid out at the moment my vote is:

-1 (It still seems like overkill to introduce a new runtime requirement for
building to replace a script.)
+1 (I think this is the right way to simplify the build)
-1 (AFAIK there is no such requirement at the moment, and if it comes it
would be in the form of an AM, which I'd argue should live outside of
Hadoop)

Thx


On Mon, Nov 26, 2012 at 1:16 PM, Giridharan Kesavan <
gkesavan@hortonworks.com> wrote:

> +1, +1, +1
>
> -Giri
>
>
> On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
>
> > For discussion, please see previous thread "[PROPOSAL] introduce Python
> as
> > build-time and run-time dependency for Hadoop and throughout Hadoop
> stack".
> >
> > This vote consists of three separate items:
> >
> > 1. Contributors shall be allowed to use Python as a platform-independent
> > scripting language for build-time tasks, and add Python as a build-time
> > dependency.
> > Please vote +1, 0, -1.
> >
> > 2. Contributors shall be encouraged to use Maven tasks in combination
> with
> > either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> > even under ant in Hadoop-1.
> > Please vote +1, 0, -1.
> >
> > 3. Contributors shall be allowed to use Python as a platform-independent
> > scripting language for run-time tasks, and add Python as a run-time
> > dependency.
> > Please vote +1, 0, -1.
> >
> > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors
> to
> > use Maven plug-ins or Groovy as the only means of cross-platform
> build-time
> > tasks, or to simply continue using platform-dependent scripts as is being
> > done today.
> >
> > Vote closes at 12:30pm PST on Saturday 1 December.
> > ---------
> > Personally, my vote is +1, +1, +1.
> > I think #2 is preferable to #1, but still has many unknowns in it, and
> > until those are worked out I don't want to delay moving to cross-platform
> > scripts for build-time tasks.
> >
> > Best regards,
> > --Matt
> >
>



-- 
Alejandro

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Giridharan Kesavan <gk...@hortonworks.com>.
+1, +1, +1

-Giri


On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:

> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
>
> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
>
> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
>
> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Tom White <to...@cloudera.com>.
+1, +1, -1

Tom

On Sat, Nov 24, 2012 at 8:13 PM, Matt Foley <ma...@apache.org> wrote:
> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
>
> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
>
> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
>
> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Chris Nauroth <cn...@hortonworks.com>.
+1, +1, +1 (non-binding)

On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:

> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
>
> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
>
> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
>
> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Raja Aluri <ra...@cmbasics.com>.
+1, +1, +1 (non binding)

It makes it a lot easier to make build tools (that cannot be developed
easily using Maven) work across non-Unix-like platforms (especially
Windows).

Raja



On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:

> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
>
> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
>
> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
>
> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Suresh Srinivas <su...@hortonworks.com>.
+1, +1, +1

Regards,
Suresh


On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:

> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
>
> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
>
> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
>
> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt
>



-- 
http://hortonworks.com/download/

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Doug Cutting <cu...@apache.org>.
-1, +1, -1

Run- & build-time scripting should be limited to operations that are
impossible in Java.  These should not be complex nor should we
encourage more complexity in them.  A parallel set of simple .bat
files for such operations seems preferable to adding a Python
dependency.

Doug

On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
>
> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
>
> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
>
> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Radim Kolar <hs...@filez.com>.
-1, +1, -1

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <mf...@hortonworks.com>.
There's already a jira:
HADOOP-8924<https://issues.apache.org/jira/browse/HADOOP-8924>


On Tue, Dec 4, 2012 at 1:00 PM, Alejandro Abdelnur <tu...@cloudera.com> wrote:

> i've been playing around writing a couple of maven plugins, one to replace
> saveversion.sh and the other to invoke protoc. they both work in windows
> standard cmd (no cygwin required). together with hadoop-8887 they would
> remove most of the scripting done in the poms.
>
> (they also work in linux and osx)
>
> they are java based, only require having SVN, GIT & PROTOC available in the
> PATH.
>
> if cmake works in windows, i assume hadoop-8887 would be almost there.
>
> this would leave the tar stitching, which is done as a script to handle SO
> symlinks. though i have an idea on how we could take care of it.
>
> i'll be creating a jira momentarily.
>
> thx
>
> Alejandro
>
> On Dec 4, 2012, at 12:28 PM, Matt Foley <mf...@hortonworks.com> wrote:
>
> > Please close HADOOP-9073 as "will not fix", citing this discussion.
> >
> > I'm -1 on groovy in maven.  That's worse, not better.  Let it sit for a
> > while and let people propose simplifications of the script situation.
> >
> > Thanks,
> > --Matt
> >
> >
> > On Tue, Dec 4, 2012 at 11:41 AM, Radim Kolar <hs...@filez.com> wrote:
> >
> >> result of vote is to close
> >> https://issues.apache.org/jira/browse/HADOOP-9073 and write groovy in
> >> pom.xml variant (option number 2)?
> >>
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
i've been playing around writing a couple of maven plugins, one to replace saveversion.sh and the other to invoke protoc. they both work in windows standard cmd (no cygwin required). together with hadoop-8887 they would remove most of the scripting done in the poms.

(they also work in linux and osx)

they are java based, only require having SVN, GIT & PROTOC available in the PATH.

if cmake works in windows, i assume hadoop-8887 would be almost there.

this would leave the tar stitching, which is done as a script to handle SO symlinks. though i have an idea on how we could take care of it.

i'll be creating a jira momentarily.

thx

Alejandro

On Dec 4, 2012, at 12:28 PM, Matt Foley <mf...@hortonworks.com> wrote:

> Please close HADOOP-9073 as "will not fix", citing this discussion.
> 
> I'm -1 on groovy in maven.  That's worse, not better.  Let it sit for a
> while and let people propose simplifications of the script situation.
> 
> Thanks,
> --Matt
> 
> 
> On Tue, Dec 4, 2012 at 11:41 AM, Radim Kolar <hs...@filez.com> wrote:
> 
>> result of vote is to close
>> https://issues.apache.org/jira/browse/HADOOP-9073 and write groovy in
>> pom.xml variant (option number 2)?
>> 

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <mf...@hortonworks.com>.
Please close HADOOP-9073 as "will not fix", citing this discussion.

I'm -1 on groovy in maven.  That's worse, not better.  Let it sit for a
while and let people propose simplifications of the script situation.

Thanks,
--Matt


On Tue, Dec 4, 2012 at 11:41 AM, Radim Kolar <hs...@filez.com> wrote:

> result of vote is to close
> https://issues.apache.org/jira/browse/HADOOP-9073 and write groovy in
> pom.xml variant (option number 2)?
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Radim Kolar <hs...@filez.com>.
result of vote is to close 
https://issues.apache.org/jira/browse/HADOOP-9073 and write groovy in 
pom.xml variant (option number 2)?

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <ma...@apache.org>.
Hi Doug,
I didn't read your email until this morning, but I spent time overnight
thinking about the Apache Way and reached similar conclusions.  While
tooling is broader in scope than a single code change, it is a technical
choice that we all have to live with.

More importantly, "Community over Code" would suggest that if even slightly
less than 50% of the community is uncomfortable with adding Python to the
mix that is the Hadoop stack, then we probably shouldn't do it, regardless
of the technical merits.

Therefore, I withdraw the question.

We will search for other means of cleaning up the shell script problem and
making all functionality work with parity in the Windows world.  I am quite
partial to Allen Wittenauer's suggestion in HADOOP-9082
<https://issues.apache.org/jira/browse/HADOOP-9082?focusedCommentId=13507163&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13507163>
that the scripts should be greatly simplified before dealing with the
cross-platform question.  It is in many respects silly to have so much
functionality "on the side" instead of dealing with it forthrightly in core
code.  In that spirit, I am also -1 on burying the same complexity in maven
plug-ins, which after all just adds another couple layers of complexity,
and limits the number of people who understand it, as well.

Thanks to all who voted and contributed to the discussion.
Best regards,
--Matt


On Mon, Dec 3, 2012 at 8:50 PM, Doug Cutting <cu...@apache.org> wrote:

> Hadoop's bylaws do draw finer distinctions than the Apache voting
> guidelines document, but we follow the same general principles that
> are described there.
>
> As I understand it, the rationale for using consensus for code is that
> everyone needs to agree on everything in the codebase or we've
> disenfranchised some.  We share a single code repository and we need
> to all agree on what goes into it.  A release does not require
> consensus since if someone doesn't agree on the timing of a release
> they can choose to make another at a different time, but every change
> that goes into each release requires consensus.  We also require
> consensus for committers and PMC member votes so that we have a group
> that's coherent and is able to reach consensus on code changes.
>
> Re-writing bash scripts in Python is neither a release nor other
> procedural issue.  It involves changes to the software we maintain and
> seems to fall clearly into the "code change" category.
>
> If you disagree then perhaps you'd like to propose a change to the
> bylaws so that scripts have different rules than other kinds of
> software, but I don't yet see the rationale for such a change.
>
> Doug
>
> On Mon, Dec 3, 2012 at 5:22 PM, Matt Foley <ma...@apache.org> wrote:
> > No, but it speaks to whether the Hadoop bylaws can extend the Apache
> voting
> > procedures and draw finer distinctions.  For example, the Apache voting
> > procedures only identify 3 types of votable issue, while the Hadoop
> bylaws
> > identify 9 types of votable issues.
> >
> > If we were forced to fit "development tools" into one of the three
> > categories cited by the Apache voting procedures, it would be fitting a
> > square peg in a round hole.  Since we can instead look at the 9
> categories
> > provided by the Hadoop bylaws, we can acknowledge that "development
> tools"
> > was an overlooked category.  But in my opinion it certainly doesn't fit
> > into the "code change" category.  Tooling is a meta-issue regarding HOW
> we
> > do what needs to be done.  In this case, whether we allow a
> > platform-independent solution, or force contributors to maintain parallel
> > scripts in multiple platform-specific languages for no reason.
> >
> > --Matt
> >
> >
> > On Mon, Dec 3, 2012 at 3:57 PM, Doug Cutting <cu...@apache.org> wrote:
> >
> >> On Mon, Dec 3, 2012 at 2:08 PM, Matt Foley <mf...@hortonworks.com>
> wrote:
> >> > The apache voting process contradicts the Hadoop bylaws:
> >> > http://www.apache.org/foundation/voting.html says that only PMC
> members
> >> can
> >> > make binding votes on code modification issues, but
> >> > http://hadoop.apache.org/bylaws.html says that Committers can make
> >> binding
> >> > votes on them.  Does that mean the Hadoop bylaws have to change?
> >>
> >> This may be a little atypical but I don't see any harm.  The Hadoop
> >> PMC is willing to respect the veto of any committer as binding.  I'd
> >> worry more if we tried to reduce vetoes to a subset of the PMC than
> >> extend it to a superset.
> >>
> >> Do you think this is problematic?
> >>
> >> Doug
> >>
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Doug Cutting <cu...@apache.org>.
Hadoop's bylaws do draw finer distinctions than the Apache voting
guidelines document, but we follow the same general principles that
are described there.

As I understand it, the rationale for using consensus for code is that
everyone needs to agree on everything in the codebase or we've
disenfranchised some.  We share a single code repository and we need
to all agree on what goes into it.  A release does not require
majority since if someone doesn't agree on the timing of a release
they can choose to make another at a different time, but every change
that goes into each release requires consensus.  We also require
consensus for committers and PMC member votes so that we have a group
that's coherent and is able to reach consensus on code changes.

Re-writing bash scripts in Python is neither a release nor other
procedural issue.  It involves changes to the software we maintain and
seems to fall clearly into the "code change" category.

If you disagree then perhaps you'd like to propose a change to the
bylaws so that scripts have different rules than other kinds of
software, but I don't yet see the rationale for such a change.

Doug

On Mon, Dec 3, 2012 at 5:22 PM, Matt Foley <ma...@apache.org> wrote:
> No, but it speaks to whether the Hadoop bylaws can extend the Apache voting
> procedures and draw finer distinctions.  For example, the Apache voting
> procedures only identify 3 types of votable issue, while the Hadoop bylaws
> identify 9 types of votable issues.
>
> If we were forced to fit "development tools" into one of the three
> categories cited by the Apache voting procedures, it would be fitting a
> square peg in a round hole.  Since we can instead look at the 9 categories
> provided by the Hadoop bylaws, we can acknowledge that "development tools"
> was an overlooked category.  But in my opinion it certainly doesn't fit
> into the "code change" category.  Tooling is a meta-issue regarding HOW we
> do what needs to be done.  In this case, whether we allow a
> platform-independent solution, or force contributors to maintain parallel
> scripts in multiple platform-specific languages for no reason.
>
> --Matt
>
>
> On Mon, Dec 3, 2012 at 3:57 PM, Doug Cutting <cu...@apache.org> wrote:
>
>> On Mon, Dec 3, 2012 at 2:08 PM, Matt Foley <mf...@hortonworks.com> wrote:
>> > The apache voting process contradicts the Hadoop bylaws:
>> > http://www.apache.org/foundation/voting.html says that only PMC members
>> can
>> > make binding votes on code modification issues, but
>> > http://hadoop.apache.org/bylaws.html says that Committers can make
>> binding
>> > votes on them.  Does that mean the Hadoop bylaws have to change?
>>
>> This may be a little atypical but I don't see any harm.  The Hadoop
>> PMC is willing to respect the veto of any committer as binding.  I'd
>> worry more if we tried to reduce vetoes to a subset of the PMC than
>> extend it to a superset.
>>
>> Do you think this is problematic?
>>
>> Doug
>>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <ma...@apache.org>.
No, but it speaks to whether the Hadoop bylaws can extend the Apache voting
procedures and draw finer distinctions.  For example, the Apache voting
procedures only identify 3 types of votable issue, while the Hadoop bylaws
identify 9 types of votable issues.

If we were forced to fit "development tools" into one of the three
categories cited by the Apache voting procedures, it would be fitting a
square peg in a round hole.  Since we can instead look at the 9 categories
provided by the Hadoop bylaws, we can acknowledge that "development tools"
was an overlooked category.  But in my opinion it certainly doesn't fit
into the "code change" category.  Tooling is a meta-issue regarding HOW we
do what needs to be done.  In this case, whether we allow a
platform-independent solution, or force contributors to maintain parallel
scripts in multiple platform-specific languages for no reason.

--Matt


On Mon, Dec 3, 2012 at 3:57 PM, Doug Cutting <cu...@apache.org> wrote:

> On Mon, Dec 3, 2012 at 2:08 PM, Matt Foley <mf...@hortonworks.com> wrote:
> > The apache voting process contradicts the Hadoop bylaws:
> > http://www.apache.org/foundation/voting.html says that only PMC members
> can
> > make binding votes on code modification issues, but
> > http://hadoop.apache.org/bylaws.html says that Committers can make
> binding
> > votes on them.  Does that mean the Hadoop bylaws have to change?
>
> This may be a little atypical but I don't see any harm.  The Hadoop
> PMC is willing to respect the veto of any committer as binding.  I'd
> worry more if we tried to reduce vetoes to a subset of the PMC than
> extend it to a superset.
>
> Do you think this is problematic?
>
> Doug
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Doug Cutting <cu...@apache.org>.
On Mon, Dec 3, 2012 at 2:08 PM, Matt Foley <mf...@hortonworks.com> wrote:
> The apache voting process contradicts the Hadoop bylaws:
> http://www.apache.org/foundation/voting.html says that only PMC members can
> make binding votes on code modification issues, but
> http://hadoop.apache.org/bylaws.html says that Committers can make binding
> votes on them.  Does that mean the Hadoop bylaws have to change?

This may be a little atypical but I don't see any harm.  The Hadoop
PMC is willing to respect the veto of any committer as binding.  I'd
worry more if we tried to reduce vetoes to a subset of the PMC than
extend it to a superset.

Do you think this is problematic?

Doug

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <mf...@hortonworks.com>.
Hi Doug,
The apache voting process contradicts the Hadoop bylaws:
http://www.apache.org/foundation/voting.html says that only PMC members can
make binding votes on code modification issues, but
http://hadoop.apache.org/bylaws.html says that Committers can make binding
votes on them.  Does that mean the Hadoop bylaws have to change?

Thanks,
--Matt


On Mon, Dec 3, 2012 at 11:37 AM, Doug Cutting <cu...@apache.org> wrote:

> On Mon, Dec 3, 2012 at 11:21 AM, Matt Foley <mf...@hortonworks.com>
> wrote:
> > It is intended to be a "technical discussion", in the sense of the bylaws
> > statement (in section "Roles and Responsibilities: Committers"),
> "Committers
> > may cast binding votes on any technical discussion regarding any
> > subproject."  I therefore intended it to be a majority vote of
> Committers.
>
> I'm not sure how you conclude that technical discussions are resolved
> with majority votes.
>
> http://www.apache.org/foundation/voting.html
>
> > Interestingly, this need to discuss tooling and other issues that go
> beyond
> > a simple "code change" is not addressed in the "Decision Making: Actions"
> > section of the bylaws.  That need seems to have been overlooked in the
> > current rev of that section.  But I do not agree that such issues are
> "code
> > changes"; it relates to the tools we depend on to make code changes,
> which
> > is clearly qualitatively different.
>
> I don't see a striking difference between this and a proposed code
> change.  How is a -1 here fundamentally different than a veto on a
> patch submitted to HADOOP-9082?
>
> Doug
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Doug Cutting <cu...@apache.org>.
On Mon, Dec 3, 2012 at 11:21 AM, Matt Foley <mf...@hortonworks.com> wrote:
> It is intended to be a "technical discussion", in the sense of the bylaws
> statement (in section "Roles and Responsibilities: Committers"), "Committers
> may cast binding votes on any technical discussion regarding any
> subproject."  I therefore intended it to be a majority vote of Committers.

I'm not sure how you conclude that technical discussions are resolved
with majority votes.

http://www.apache.org/foundation/voting.html

> Interestingly, this need to discuss tooling and other issues that go beyond
> a simple "code change" is not addressed in the "Decision Making: Actions"
> section of the bylaws.  That need seems to have been overlooked in the
> current rev of that section.  But I do not agree that such issues are "code
> changes"; it relates to the tools we depend on to make code changes, which
> is clearly qualitatively different.

I don't see a striking difference between this and a proposed code
change.  How is a -1 here fundamentally different than a veto on a
patch submitted to HADOOP-9082?

Doug

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Matt Foley <mf...@hortonworks.com>.
It is intended to be a "technical discussion", in the sense of the bylaws
statement (in section "Roles and Responsibilities: Committers"), "Committers
may cast binding votes on any technical discussion regarding any
subproject."  I therefore intended it to be a majority vote of Committers.

Interestingly, this need to discuss tooling and other issues that go beyond
a simple "code change" is not addressed in the "Decision Making: Actions"
section of the bylaws.  That need seems to have been overlooked in the
current rev of that section.  But I do not agree that such issues are "code
changes"; it relates to the tools we depend on to make code changes, which
is clearly qualitatively different.

--Matt


On Mon, Dec 3, 2012 at 10:37 AM, Doug Cutting <cu...@apache.org> wrote:

> On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
> > Vote closes at 12:30pm PST on Saturday 1 December.
>
> It's not clear to me what kind of a vote this is.  It seems closest to
> a code change vote, since it implies code changes, although without a
> specific patch yet proposed.  As such it would follow lazy consensus
> rules.  Or is it merely intended as a straw poll, to gauge community
> opinion?
>
> Doug
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Doug Cutting <cu...@apache.org>.
On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
> Vote closes at 12:30pm PST on Saturday 1 December.

It's not clear to me what kind of a vote this is.  It seems closest to
a code change vote, since it implies code changes, although without a
specific patch yet proposed.  As such it would follow lazy consensus
rules.  Or is it merely intended as a straw poll, to gauge community
opinion?

Doug

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Steve Loughran <st...@hortonworks.com>.
On 24 November 2012 20:13, Matt Foley <ma...@apache.org> wrote:

> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
>
> This vote consists of three separate items:
>
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
>
>
+1



> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
>
>
+1

My feelings on Maven are well known, but Groovy can mitigate things. And
I'm not going to advocate post-M2 build tools such as Gradle.

It's ironic that Maven's utter inflexibility forces people to use scripting
languages to get their work done, but Groovy is fairly nimble here, and
easy to learn for any Java programmer. "Groovy in Action" is the book to
own.



> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
>


+1. I look forward to never having to debug shell script env variable
inheritance ever again.

This does not mean that I advocate writing big bits of the system in .py;
as someone who is debugging OpenStack request throttling this weekend, I
know that Python is not "the solution" to problems. For Hadoop it has a
role, but the role should be ('better than bash') and ('streaming
integration').


> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
>
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
>
> Best regards,
> --Matt
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Roman Shaposhnik <rv...@apache.org>.
On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".

Perhaps I'm missing something, but I can't possibly imagine how
a vote on a common-dev@hadoop.apache.org could possibly
affect downstream projects. I honestly don't think we should be
in a business of telling Pig, Hive, Oozie, etc. what to use or
not to use.

With that in mind the following vote applies ONLY to Hadoop
project itself:
   -1, +1, -1

> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.

And yet #2 is, in my opinion, a much better investment of our collective
time. We are already at the mercy of the JDK, but at least it is a far superior
platform from a support and backward compatibility perspective. Anything
that we can offload to it -- is absolutely worth doing.

Thanks,
Roman.

RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Mahadevan Venkatraman <ma...@microsoft.com>.
+1, +1, +1 (non-binding)

Supporting Comments:

Build-time scripts: Using a platform independent language such as python (or maven in certain cases) will greatly help in reducing build breaks and improve on build script maintainability.

Run-time scripts: Most run-time scripts are end-user visible: they are run by admins to start/stop a Hadoop cluster (hadoop-daemons) or by developers submitting a job (hadoop.cmd). There seem to be two types of script files:
    - Scripts intended for a cluster admin or an IT admin:
        - It is desirable to use a common set of Python scripts that work across all platforms. However, in a Windows enterprise environment IT admins won't like it if they have to run Python scripts to start/stop a cluster. So for these, there should be a PowerShell interface wrapper that can accept the right parameters and pass them down to the Python script. Hopefully, the PowerShell layer can be a simple pass-thru. This way the Python scripts are like any other Java code hidden behind a well-known API surface. IT admins can't debug or modify them easily, but this is fine, since for scripts like the aforementioned there isn't a requirement that IT admins be able to easily view/modify the underlying code.
       - For Windows-specific things not supported by Python natively, such as setting ACLs or starting/stopping Windows services, it should be possible to re-factor the code appropriately. But a little bit of PowerShell/cmd for these call-outs would be unavoidable.

    - Scripts intended for developers/cluster users:
      - Most of these scripts (e.g. hadoop.cmd) would be behind other API surface such as WebHDFS, ODBC, JDBC, Templeton etc. So the advantage of having a common script across platforms outweighs the use of cmd/powershell as a native windows feature. Again, it should also be possible to provide simple powershell wrappers for a windows environment.
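
As a rough illustration of the layering described above, here is a minimal sketch, assuming a shared Python core that only decides which platform-specific command to run, so that a bash or PowerShell wrapper can pass its arguments straight through. The function names (`build_command`, `parse_args`), the use of `sc` against a Windows service named after the daemon, and the choice of `argparse` (which needs Python 2.7 or later) are assumptions made for illustration; none of this is taken from an existing Hadoop script.

```python
import argparse
import platform


def build_command(action, daemon, system=None):
    """Return the platform-specific command (as a list) for a start/stop request.

    `system` defaults to the running platform; it is a parameter only so the
    mapping can be exercised without being on that platform.
    """
    if action not in ("start", "stop"):
        raise ValueError("unknown action: %s" % action)
    if system is None:
        system = platform.system()
    if system == "Windows":
        # Assumption: each daemon is registered as a Windows service
        # with the same name as the daemon.
        return ["sc", action, daemon]
    # Assumption: on other platforms, delegate to the existing daemon script.
    return ["hadoop-daemon.sh", action, daemon]


def parse_args(argv):
    """Parse the arguments a thin wrapper would pass through unchanged."""
    parser = argparse.ArgumentParser(description="start or stop a daemon")
    parser.add_argument("action", choices=["start", "stop"])
    parser.add_argument("daemon")
    return parser.parse_args(argv)


def main(argv):
    """Print the command that would be run; a real script would execute it."""
    args = parse_args(argv)
    print(" ".join(build_command(args.action, args.daemon)))
    return 0
```

A wrapper on either platform would then only need to forward its arguments, for example by calling `main(["start", "namenode"])`; the Windows-only call-outs stay confined to the one branch in `build_command`.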

Thanks, Mahadevan.

-----Original Message-----
From: Ivan Mitic [mailto:ivanmi@microsoft.com] 
Sent: Thursday, November 29, 2012 3:41 PM
To: common-dev@hadoop.apache.org; mattf@apache.org
Subject: RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

+1, +1, +1 (some comments inline)

-----Original Message-----
From: mfoley@hortonworks.com [mailto:mfoley@hortonworks.com] On Behalf Of Matt Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: common-dev@hadoop.apache.org
Subject: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1. 

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.

>>> I believe 1&2 in combination make total sense. I ported a few scripts to Python, and thus far it has proved to be up to the task and to satisfy the cross-platform requirements. In my opinion, it is also important to agree on the version, as I've run into some breaking changes in version 3+.


3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.

>>> This is a great aspirational goal! Maintaining two sets of scripts would be a real challenge.


Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,
--Matt





RE: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Ivan Mitic <iv...@microsoft.com>.
+1, +1, +1 (some comments inline)

-----Original Message-----
From: mfoley@hortonworks.com [mailto:mfoley@hortonworks.com] On Behalf Of Matt Foley
Sent: Saturday, November 24, 2012 12:13 PM
To: common-dev@hadoop.apache.org
Subject: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

For discussion, please see previous thread "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".

This vote consists of three separate items:

1. Contributors shall be allowed to use Python as a platform-independent scripting language for build-time tasks, and add Python as a build-time dependency.
Please vote +1, 0, -1. 

2. Contributors shall be encouraged to use Maven tasks in combination with either plug-ins or Groovy scripts to do cross-platform build-time tasks, even under ant in Hadoop-1.
Please vote +1, 0, -1.

>>> I believe 1&2 in combination make total sense. I ported a few scripts to Python, and thus far it has proved to be up to the task and to satisfy the cross-platform requirements. In my opinion, it is also important to agree on the version, as I've run into some breaking changes in version 3+.
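
One way an agreed version could be enforced is a guard at the top of each script. The following is only a sketch under stated assumptions: the supported range (at least 2.6, below 3.0) and the names `is_supported` and `require_supported_python` are hypothetical placeholders, not a range anyone in this thread has agreed on.

```python
import sys


def is_supported(version, minimum=(2, 6), below=(3, 0)):
    """Return True if `version` (major, minor, ...) falls in [minimum, below).

    The default bounds are placeholders for whatever range is agreed on.
    """
    return minimum <= tuple(version[:2]) < below


def require_supported_python():
    """Exit with a clear message when the running interpreter is out of range."""
    if not is_supported(sys.version_info):
        sys.exit("unsupported Python version: %d.%d" % tuple(sys.version_info[:2]))
```

Calling `require_supported_python()` before any other work means a script fails early with a readable message instead of partway through on a version-specific incompatibility.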


3. Contributors shall be allowed to use Python as a platform-independent scripting language for run-time tasks, and add Python as a run-time dependency.

>>> This is a great aspirational goal! Maintaining two sets of scripts would be a real challenge.


Please vote +1, 0, -1.

Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to use Maven plug-ins or Groovy as the only means of cross-platform build-time tasks, or to simply continue using platform-dependent scripts as is being done today.

Vote closes at 12:30pm PST on Saturday 1 December.
---------
Personally, my vote is +1, +1, +1.
I think #2 is preferable to #1, but still has many unknowns in it, and until those are worked out I don't want to delay moving to cross-platform scripts for build-time tasks.

Best regards,
--Matt


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Konstantin Boudnik <co...@apache.org>.
On Sat, Dec 01, 2012 at 10:07PM, Eric Yang wrote:
> -1, +1, -1
> 
> Python has fairly inconsistent support across all major OS vendors.  It is
> hard to get it right unless the scripts are all designed to make use of
> Python 2.4.  However, Python 2.4 doesn't have necessary OS features to make
> Python useful in runtime or build environment unless you write a lot of
> custom modules.  That defeats the purpose of using Python as an intermediate
> layer to do OS-dependent work.  JRuby may be a better choice.

JRuby? Really? Groovy is already there and it is really a Java dialect unlike
JRuby. And yes - it is quite suitable for build things, considering the use of
it in BigTop

Cos

> On Sat, Dec 1, 2012 at 12:28 PM, Joep Rottinghuis <jr...@gmail.com> wrote:
> 
> > 0, 0, -1 (non-binding)
> >
> > Joep
> >
> > On Nov 24, 2012, at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
> >
> > > For discussion, please see previous thread "[PROPOSAL] introduce Python
> > as
> > > build-time and run-time dependency for Hadoop and throughout Hadoop
> > stack".
> > >
> > > This vote consists of three separate items:
> > >
> > > 1. Contributors shall be allowed to use Python as a platform-independent
> > > scripting language for build-time tasks, and add Python as a build-time
> > > dependency.
> > > Please vote +1, 0, -1.
> > >
> > > 2. Contributors shall be encouraged to use Maven tasks in combination
> > with
> > > either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> > > even under ant in Hadoop-1.
> > > Please vote +1, 0, -1.
> > >
> > > 3. Contributors shall be allowed to use Python as a platform-independent
> > > scripting language for run-time tasks, and add Python as a run-time
> > > dependency.
> > > Please vote +1, 0, -1.
> > >
> > > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors
> > to
> > > use Maven plug-ins or Groovy as the only means of cross-platform
> > build-time
> > > tasks, or to simply continue using platform-dependent scripts as is being
> > > done today.
> > >
> > > Vote closes at 12:30pm PST on Saturday 1 December.
> > > ---------
> > > Personally, my vote is +1, +1, +1.
> > > I think #2 is preferable to #1, but still has many unknowns in it, and
> > > until those are worked out I don't want to delay moving to cross-platform
> > > scripts for build-time tasks.
> > >
> > > Best regards,
> > > --Matt
> >

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Eric Yang <er...@gmail.com>.
-1, +1, -1

Python has fairly inconsistent support across all major OS vendors.  It is
hard to get it right unless the scripts are all designed to make use of
Python 2.4.  However, Python 2.4 doesn't have necessary OS features to make
Python useful in runtime or build environment unless you write a lot of
custom modules.  That defeats the purpose of using Python as an intermediate
layer to do OS-dependent work.  JRuby may be a better choice.

regards,
Eric

On Sat, Dec 1, 2012 at 12:28 PM, Joep Rottinghuis <jr...@gmail.com> wrote:

> 0, 0, -1 (non-binding)
>
> Joep
>
> On Nov 24, 2012, at 12:13 PM, Matt Foley <ma...@apache.org> wrote:
>
> > For discussion, please see previous thread "[PROPOSAL] introduce Python
> as
> > build-time and run-time dependency for Hadoop and throughout Hadoop
> stack".
> >
> > This vote consists of three separate items:
> >
> > 1. Contributors shall be allowed to use Python as a platform-independent
> > scripting language for build-time tasks, and add Python as a build-time
> > dependency.
> > Please vote +1, 0, -1.
> >
> > 2. Contributors shall be encouraged to use Maven tasks in combination
> with
> > either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> > even under ant in Hadoop-1.
> > Please vote +1, 0, -1.
> >
> > 3. Contributors shall be allowed to use Python as a platform-independent
> > scripting language for run-time tasks, and add Python as a run-time
> > dependency.
> > Please vote +1, 0, -1.
> >
> > Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors
> to
> > use Maven plug-ins or Groovy as the only means of cross-platform
> build-time
> > tasks, or to simply continue using platform-dependent scripts as is being
> > done today.
> >
> > Vote closes at 12:30pm PST on Saturday 1 December.
> > ---------
> > Personally, my vote is +1, +1, +1.
> > I think #2 is preferable to #1, but still has many unknowns in it, and
> > until those are worked out I don't want to delay moving to cross-platform
> > scripts for build-time tasks.
> >
> > Best regards,
> > --Matt
>

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Joep Rottinghuis <jr...@gmail.com>.
0, 0, -1 (non-binding)

Joep

On Nov 24, 2012, at 12:13 PM, Matt Foley <ma...@apache.org> wrote:

> For discussion, please see previous thread "[PROPOSAL] introduce Python as
> build-time and run-time dependency for Hadoop and throughout Hadoop stack".
> 
> This vote consists of three separate items:
> 
> 1. Contributors shall be allowed to use Python as a platform-independent
> scripting language for build-time tasks, and add Python as a build-time
> dependency.
> Please vote +1, 0, -1.
> 
> 2. Contributors shall be encouraged to use Maven tasks in combination with
> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
> even under ant in Hadoop-1.
> Please vote +1, 0, -1.
> 
> 3. Contributors shall be allowed to use Python as a platform-independent
> scripting language for run-time tasks, and add Python as a run-time
> dependency.
> Please vote +1, 0, -1.
> 
> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
> use Maven plug-ins or Groovy as the only means of cross-platform build-time
> tasks, or to simply continue using platform-dependent scripts as is being
> done today.
> 
> Vote closes at 12:30pm PST on Saturday 1 December.
> ---------
> Personally, my vote is +1, +1, +1.
> I think #2 is preferable to #1, but still has many unknowns in it, and
> until those are worked out I don't want to delay moving to cross-platform
> scripts for build-time tasks.
> 
> Best regards,
> --Matt

Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Adam Berry <am...@yahoo-inc.com>.
0, +1, -1 (non-binding)

Also, it feels like maybe the discussion should have been kept open a little longer; the Thanksgiving holidays last week meant that people may have missed it.

Cheers,
Adam

On Nov 26, 2012, at 10:16 AM, Robert Evans wrote:

> +1, +1, 0
> 
> On 11/24/12 2:13 PM, "Matt Foley" <ma...@apache.org> wrote:
> 
>> For discussion, please see previous thread "[PROPOSAL] introduce Python as
>> build-time and run-time dependency for Hadoop and throughout Hadoop
>> stack".
>> 
>> This vote consists of three separate items:
>> 
>> 1. Contributors shall be allowed to use Python as a platform-independent
>> scripting language for build-time tasks, and add Python as a build-time
>> dependency.
>> Please vote +1, 0, -1.
>> 
>> 2. Contributors shall be encouraged to use Maven tasks in combination with
>> either plug-ins or Groovy scripts to do cross-platform build-time tasks,
>> even under ant in Hadoop-1.
>> Please vote +1, 0, -1.
>> 
>> 3. Contributors shall be allowed to use Python as a platform-independent
>> scripting language for run-time tasks, and add Python as a run-time
>> dependency.
>> Please vote +1, 0, -1.
>> 
>> Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors
>> to
>> use Maven plug-ins or Groovy as the only means of cross-platform
>> build-time
>> tasks, or to simply continue using platform-dependent scripts as is being
>> done today.
>> 
>> Vote closes at 12:30pm PST on Saturday 1 December.
>> ---------
>> Personally, my vote is +1, +1, +1.
>> I think #2 is preferable to #1, but still has many unknowns in it, and
>> until those are worked out I don't want to delay moving to cross-platform
>> scripts for build-time tasks.
>> 
>> Best regards,
>> --Matt
> 


Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

Posted by Robert Evans <ev...@yahoo-inc.com>.
+1, +1, 0

On 11/24/12 2:13 PM, "Matt Foley" <ma...@apache.org> wrote:

>For discussion, please see previous thread "[PROPOSAL] introduce Python as
>build-time and run-time dependency for Hadoop and throughout Hadoop
>stack".
>
>This vote consists of three separate items:
>
>1. Contributors shall be allowed to use Python as a platform-independent
>scripting language for build-time tasks, and add Python as a build-time
>dependency.
>Please vote +1, 0, -1.
>
>2. Contributors shall be encouraged to use Maven tasks in combination with
>either plug-ins or Groovy scripts to do cross-platform build-time tasks,
>even under ant in Hadoop-1.
>Please vote +1, 0, -1.
>
>3. Contributors shall be allowed to use Python as a platform-independent
>scripting language for run-time tasks, and add Python as a run-time
>dependency.
>Please vote +1, 0, -1.
>
>Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors
>to
>use Maven plug-ins or Groovy as the only means of cross-platform
>build-time
>tasks, or to simply continue using platform-dependent scripts as is being
>done today.
>
>Vote closes at 12:30pm PST on Saturday 1 December.
>---------
>Personally, my vote is +1, +1, +1.
>I think #2 is preferable to #1, but still has many unknowns in it, and
>until those are worked out I don't want to delay moving to cross-platform
>scripts for build-time tasks.
>
>Best regards,
>--Matt