Posted to mapreduce-dev@hadoop.apache.org by Amareshwari Sri Ramadasu <am...@yahoo-inc.com> on 2011/08/29 10:43:29 UTC

Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Some questions on making hadoop-tools a top-level module under trunk:

 1.  Should the patches for tools be created against Hadoop Common?
 2.  What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
 3.  Will it introduce a dependency from MapReduce to Common? Or is this taken care of in the Mavenization?


Thanks
Amareshwari

On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:

Please, don't add more Mavenization work on us (eventually I want to go back
to coding)

Given that Hadoop is already Mavenized, the patch should be Mavenized.

What will have to be done in addition (besides Mavenizing distcp) is to create
a hadoop-tools module at the root level and, within it, a hadoop-distcp module.

The hadoop-tools POM will look pretty much like the hadoop-common-project
POM.

The hadoop-distcp POM should follow the hadoop-common POM patterns.
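
As a rough sketch of what that could mean in practice (the groupId, version,
and parent reference below are assumptions for illustration, not the actual
Hadoop POMs), the hadoop-tools aggregator POM might look something like:

```xml
<!-- Hypothetical hadoop-tools aggregator POM; groupId, version, and
     parent coordinates are assumed for illustration. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-main</artifactId>
    <version>0.23.0-SNAPSHOT</version>
  </parent>
  <artifactId>hadoop-tools</artifactId>
  <packaging>pom</packaging>
  <name>Apache Hadoop Tools</name>
  <!-- Aggregated submodules; hadoop-distcp would be listed here. -->
  <modules>
    <module>hadoop-distcp</module>
  </modules>
</project>
```

The hadoop-distcp POM would then name the aggregator (or a shared project POM)
as its parent and declare its own dependencies, following the hadoop-common
patterns mentioned above.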

Thanks.

Alejandro

On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> Agree with Mithun and Robert. DistCp and the Tools restructuring are separate
> tasks. Since the DistCp code is ready to be committed, it need not wait for
> the Tools separation from MR/HDFS.
> I would say it can go into contrib as the patch is now, and when the tools
> restructuring happens it would be just an svn mv.  If there are no issues
> with this proposal, I can commit the code tomorrow.
>
> Thanks
> Amareshwari
>
> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
>
> I agree with Mithun.  They are related but this goes beyond distcpv2 and
> should not block distcpv2 from going in.  It would be very nice, however, to
> get the layout settled soon so that we all know where to find something when
> we want to work on it.
>
> Also +1 for Alejandro's suggestion; I also prefer to keep tools at the trunk level.
>
> Even though HDFS, Common, and MapReduce, and perhaps soon Tools, are separate
> modules right now, there is still tight coupling between the different
> pieces, especially with tests.  IMO until we can reduce that coupling we
> should treat building and testing Hadoop as a single project instead of
> trying to keep them separate.
>
> --Bobby
>
> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mi...@yahoo.com>
> wrote:
>
> Would it be acceptable if retooling of tools/ were taken up separately? It
> sounds to me like this might be a distinct (albeit related) task.
>
> Mithun
>
>
> ________________________________
> From: Giridharan Kesavan <gk...@hortonworks.com>
> To: mapreduce-dev@hadoop.apache.org
> Sent: Friday, August 26, 2011 12:04 PM
> Subject: Re: DistCpV2 in 0.23
>
> +1 to Alejandro's
>
> I prefer to keep the hadoop-tools at trunk level.
>
> -Giri
>
> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
> wrote:
> > I'd suggest putting hadoop-tools either at trunk/ level, or having a tools
> > aggregator module for HDFS and another for Common.
> >
> > I personally would prefer trunk/.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Agree. It should be a separate Maven module (and the patch puts it as a
> >> separate Maven module now). A top level for Hadoop tools is nice to have,
> >> but it becomes hard to maintain until the patch automation runs the tests
> >> under tools. Currently we often see changes in HDFS affecting the RAID
> >> tests in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> >>
> >> I propose we can have something like the following:
> >>
> >> trunk/
> >>  - hadoop-mapreduce
> >>      - hadoop-mr-client
> >>      - hadoop-yarn
> >>      - hadoop-tools
> >>          - hadoop-streaming
> >>          - hadoop-archives
> >>          - hadoop-distcp
> >>
> >> Thoughts?
> >>
> >> @Eli and @JD, we did not replace the old legacy distcp because this is
> >> really a complete rewrite, and we did not want to remove it until users
> >> are familiar with the new one.
> >>
> >> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
> >>
> >> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> >> in there as well - i.e., tools that are downstream of MR and/or HDFS.
> >>
> >> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> mahadev@hortonworks.com>
> >> wrote:
> >> > +1 for a separate module in hadoop-mapreduce-project. I think
> >> > hadoop-mapreduce-client might not be the right place for it. We might have
> >> > to pick a new maven module under hadoop-mapreduce-project that could
> >> > host streaming/distcp/hadoop archives.
> >> >
> >> > thanks
> >> > mahadev
> >> >
> >> > On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> tucu@cloudera.com>
> >> wrote:
> >> >> Agree, it should be a separate maven module.
> >> >>
> >> >> And it should be under hadoop-mapreduce-client, right?
> >> >>
> >> >> And now that we are in the topic, the same should go for streaming,
> no?
> >> >>
> >> >> Thanks.
> >> >>
> >> >> Alejandro
> >> >>
> >> >> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
> >> wrote:
> >> >>
> >> >>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
> >> wrote:
> >> >>> > Nice work!   I definitely think this should go in 23 and 20x.
> >> >>> >
> >> >>> > Agree with JD that it should be in the core code, not contrib.  If
> >> >>> > it's going to be maintained then we should put it in the core
> code.
> >> >>>
> >> >>> Now that we're all mavenized, though, a separate Maven module and
> >> >>> artifact does make sense IMO - i.e. "hadoop jar
> >> >>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> >> >>>
> >> >>> -Todd
> >> >>> --
> >> >>> Todd Lipcon
> >> >>> Software Engineer, Cloudera
> >> >>>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >>
> >
>
>
>
> --
> -Giri
>
>
>


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Allen Wittenauer <aw...@apache.org>.
On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> 
> IMO if the tools module only gets stuff like distcp that's maintained,
> then it's not contrib; if it contains all the stuff from the current
> MR contrib, then tools is just a re-labeling of contrib. Given that
> this proposal only covers moving distcp to tools, it doesn't sound like
> contrib to me.

	At one point, everything in contrib was maintained.  So I guess the big question is: what are the gating criteria for something to gain entry into tools?

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
There are a bunch of so-called tools in hadoop-mapreduce-project/src/tools -
DistCp, HadoopArchives, Rumen, etc. And contrib projects are in src/contrib
in all of the common, hdfs, and mapred source trees. I'm not sure how the
distinction was ever made.

The last time we had a discussion about moving contrib projects out of the
core, we didn't reach any consensus -
http://s.apache.org/HadoopContribDiscussion. Do we want to revive that
discussion now? Or do we want to keep the status quo, imitate the source
structure of the present-day tools and contrib, but move them to appropriate
Maven modules, and then have that discussion separately?

I personally prefer the latter, given the length and the eventual failure of
the previous discussion.

HADOOP-7590 is a related issue where the src location of contribs like
gridmix, streaming, etc. is being discussed. I suppose that issue and this
thread ought to converge.

Thanks,
+Vinod

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Following up on this one: the hadoop-tools/ module is already in trunk, so
the distcp v2 addition can start.

Thanks.

Alejandro

On Mon, Sep 12, 2011 at 6:47 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> Alright, I think we've discussed enough on this, and everybody seems to
> agree about a top-level hadoop-tools module.
>
> Time to get into action. I've filed HADOOP-7624. Amareshwari, we can track
> the rest of the implementation-related details and questions, and your
> specific answers, there.
>
> Thanks everyone for putting in your thoughts here.
> +Vinod
>
>
> On Fri, Sep 9, 2011 at 10:55 AM, Rottinghuis, Joep <jrottinghuis@ebay.com
> >wrote:
>
> > If hadoop-tools will be built as part of hadoop-common, then none of these
> > tools should be allowed to have a dependency on HDFS or MapReduce.
> > The converse is also true: when tools do have any such dependency, they
> > cannot be built as part of hadoop-common.
> > We cannot have circular dependencies like that.
> >
> > That is probably obvious, but I'm just saying...
> >
> > Joep
> > ________________________________________
> > From: Amareshwari Sri Ramadasu [amarsri@yahoo-inc.com]
> > Sent: Wednesday, September 07, 2011 9:33 PM
> > To: mapreduce-dev@hadoop.apache.org
> > Cc: common-dev@hadoop.apache.org
> > Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
> >
> > It is good to have the hadoop-tools module separately. But as I asked
> > before, we need to answer some questions here. I'm trying to answer them
> > myself. Comments are welcome.
> >
> > > > 1.  Should the patches for tools be created against Hadoop Common?
> > Here, I meant: should the Hadoop Common mailing list be used, or should we
> > have a separate mailing list for Tools? I agree with Vinod here that we can
> > tie it to the Hadoop Common JIRA/mailing lists.
> >
> > > > 2.  What will happen to the tools test automation? Will it run as
> part
> > of Hadoop Common tests?
> > Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop
> > Common if we use the Hadoop Common mailing list for this.
> > Also, I propose that every patch build of HDFS and MAPREDUCE should also run
> > the tools tests to make sure nothing is broken. That would ease the
> > maintenance of the hadoop-tools module. I presume the tools tests should not
> > take much time (something like not more than 30 minutes).
> >
> > > > 3.  Will it introduce a dependency from MapReduce to Common? Or is
> this
> > > taken care in Mavenization?
> > I'm not sure whether Mavenization can take care of this.
> >
> > Thanks
> > Amareshwari
> >
> > On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jr...@ebay.com> wrote:
> >
> > Does a separate hadoop-tools module imply that there will be a separate
> > Jenkins build as well?
> >
> > Thanks,
> >
> > Joep
> > ________________________________________
> > From: Alejandro Abdelnur [tucu@cloudera.com]
> > Sent: Wednesday, September 07, 2011 11:35 AM
> > To: mapreduce-dev@hadoop.apache.org
> > Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
> >
> > Makes sense
> >
> > On Wed, Sep 7, 2011 at 11:32 AM, <Mi...@emc.com> wrote:
> >
> > > +1 for a separate hadoop-tools module. However, if a tool is broken at
> > > release time and no one comes forward to fix it, it should be removed
> > > (i.e., unlike contrib modules, where build and test failures were
> > > tolerated).
> > >
> > > - milind
> > >
> > > On 9/7/11 11:27 AM, "Mahadev Konar" <ma...@hortonworks.com> wrote:
> > >
> > > >I like the idea of having tools as a separate module, and I don't think
> > > >that it will be a dumping ground unless we choose to make it one.
> > > >
> > > >+1 for hadoop tools module under trunk.
> > > >
> > > >thanks
> > > >mahadev
> > > >
> > > >On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <
> tucu@cloudera.com>
> > > >wrote:
> > > >> Agreed, we should not have a dumping ground. IMO, what would go into
> > > >> hadoop-tools (i.e. distcp, streaming, and someone could argue for
> > > >> FsShell as well) are effectively Hadoop CLI utilities. Having them in a
> > > >> separate module rather than in the core modules (common, hdfs,
> > > >> mapreduce) does not mean that they are secondary things, just
> > > >> modularization. Also, it will help get those tools to use the public
> > > >> interfaces of the core modules, and when we finally have a clean
> > > >> hadoop-client layer, those tools should only depend on that.
> > > >>
> > > >> Finally, the fact that tools would end up under trunk/hadoop-tools
> > > >> does not prevent the HDFS and MAPREDUCE packaging from bundling the
> > > >> same/different tools.
> > > >>
> > > >> +1 for hadoop-tools/ (not binding)
> > > >>
> > > >> Thanks.
> > > >>
> > > >>
> > > >> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <er...@gmail.com>
> wrote:
> > > >>
> > > >>> MapReduce and HDFS are distinct functions of Hadoop.  They are
> > > >>> loosely coupled.  If we have a tools aggregator module, it will not
> > > >>> have as clear a distinct function as the other Hadoop modules.  Hence,
> > > >>> it is possible for a tool to depend on both HDFS and MapReduce.  If
> > > >>> something breaks in the tools module, it is unclear which subproject's
> > > >>> responsibility it is to maintain that tool.  Therefore, it is safer to
> > > >>> send tools to the incubator or Apache Extras rather than deposit the
> > > >>> utility tools in a tools subcategory.  There are many short-lived
> > > >>> projects that attempt to associate themselves with Hadoop but are not
> > > >>> maintained.  It would be better to spin off those utility projects
> > > >>> than use Hadoop as a dumping ground.
> > > >>>
> > > >>> In the previous discussion about removing contrib, most people were
> > > >>> in favor of doing so, and only a few contrib owners were reluctant to
> > > >>> remove it.  Fewer people have participated in restoring the
> > > >>> functionality of broken contrib projects.  History speaks for itself.
> > > >>> -1 (non-binding) for hadoop-tools.
> > > >>>
> > > >>> regards,
> > > >>> Eric
> > > >>>
> > > >>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <
> > tucu@cloudera.com>
> > > >>> wrote:
> > > >>> > Eric,
> > > >>> >
> > > >>> > Personally I'm fine either way.
> > > >>> >
> > > >>> > Still, I fail to see why a generic vs. categorized tools layout
> > > >>> > increases/reduces the risk of dead code, and how it makes packaging
> > > >>> > & deployment more difficult/easier.
> > > >>> >
> > > >>> > Would you please explain this?
> > > >>> >
> > > >>> > Thanks.
> > > >>> >
> > > >>> > Alejandro
> > > >>> >
> > > >>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <er...@gmail.com>
> > wrote:
> > > >>> >
> > > >>> >> Option #2, proposed by Amareshwari, seems like a better proposal.
> > > >>> >> We don't want to repeat the history of contrib again with
> > > >>> >> hadoop-tools.  Having a generic module like hadoop-tools increases
> > > >>> >> the risk of accumulating dead code.  It would be better to
> > > >>> >> categorize the HDFS- or MapReduce-specific tools in their
> > > >>> >> respective subcategories.  It is also easier to manage from a
> > > >>> >> package/deployment perspective.
> > > >>> >>
> > > >>> >> regards,
> > > >>> >> Eric
> > > >>> >>
> > > >>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> > > >>> >>
> > > >>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <
> > aw@apache.org>
> > > >>> wrote:
> > > >>> >> >>
> > > >>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> > > >>> >> >>> We still need to answer Amareshwari's question (2) she asked
> > > >>> >> >>> some time back about the automated code compilation and test
> > > >>> >> >>> execution of the tools module.
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>>>> My #1 question is if tools is basically contrib reborn.  If
> > > >>> >> >>>>> not, what makes it different?
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>        I'm still waiting for this answer as well.
> > > >>> >> >>
> > > >>> >> >>        Until then, I would be pretty much against a tools
> > > >>> >> >> module.  Changing the name of the dumping ground doesn't make it
> > > >>> >> >> any less of a dumping ground.
> > > >>> >> >
> > > >>> >> > IMO if the tools module only gets stuff like distcp that's
> > > >>> >> > maintained, then it's not contrib; if it contains all the stuff
> > > >>> >> > from the current MR contrib, then tools is just a re-labeling of
> > > >>> >> > contrib. Given that this proposal only covers moving distcp to
> > > >>> >> > tools, it doesn't sound like contrib to me.
> > > >>> >> >
> > > >>> >> > Thanks,
> > > >>> >> > Eli
> > > >>> >>
> > > >>> >>
> > > >>> >
> > > >>>
> > > >>
> > > >
> > >
> > >
> >
> >
>

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
Alright, I think we've discussed enough on this, and everybody seems to agree
about a top-level hadoop-tools module.

Time to get into action. I've filed HADOOP-7624. Amareshwari, we can track
the rest of the implementation-related details and questions, and your
specific answers, there.

Thanks everyone for putting in your thoughts here.
+Vinod


On Fri, Sep 9, 2011 at 10:55 AM, Rottinghuis, Joep <jr...@ebay.com>wrote:

> If hadoop-tools will be built as part of hadoop-common, then none of these
> tools should be allowed to have a dependency on hdfs or mapreduce.
> Conversely is also true, when tools do have any such dependency, they
> cannot be bult as part of hadoop-common.
> We cannot have circular dependencies like that.
>
> That is probably obvious, but I'm just saying...
>
> Joep
> ________________________________________
> From: Amareshwari Sri Ramadasu [amarsri@yahoo-inc.com]
> Sent: Wednesday, September 07, 2011 9:33 PM
> To: mapreduce-dev@hadoop.apache.org
> Cc: common-dev@hadoop.apache.org
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> It is good to have hadoop-tools module separately. But as I asked before we
> need to answer some questions here. I'm trying to answer them myself.
> Comments are welcome.
>
> > > 1.  Should the patches for tools be created against Hadoop Common?
> Here, I meant should Hadoop common mailing list be used Or should we have a
> separate mailing list for Tools? I agree with Vinod  here, that we can tie
> it Hadoop-common jira/mailing lists.
>
> > > 2.  What will happen to the tools test automation? Will it run as part
> of Hadoop Common tests?
> Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop
> common if use Hadoop common mailing list for this.
> Also, I propose every patch build of HDFS and MAPREDUCE should also run
> tools tests to make sure nothing is broken. That would ease the maintenance
> of hadoop-tools module. I presume tools test should not take much time (some
> thing like not more than 30 minutes).
>
> > > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> > taken care in Mavenization?
> I'm not sure about this whether Mavenization can take care of it.
>
> Thanks
> Amareshwari
>
> On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jr...@ebay.com> wrote:
>
> Does a separate hadoop-tools module imply that there will be a separate
> Jenkins build as well?
>
> Thanks,
>
> Joep
> ________________________________________
> From: Alejandro Abdelnur [tucu@cloudera.com]
> Sent: Wednesday, September 07, 2011 11:35 AM
> To: mapreduce-dev@hadoop.apache.org
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> Makes sense
>
> On Wed, Sep 7, 2011 at 11:32 AM, <Mi...@emc.com> wrote:
>
> > +1 for separate hadoop-tools module. However, if a tool is broken at
> > release time, and no one comes forward to fix it, it should be removed.
> > (i.e. Unlike contrib modules, where build and test failures were
> > tolerated.)
> >
> > - milind
> >
> > On 9/7/11 11:27 AM, "Mahadev Konar" <ma...@hortonworks.com> wrote:
> >
> > >I like the idea of having tools as a seperate module and I dont think
> > >that it will be a dumping ground unless we choose to make one of it.
> > >
> > >+1 for hadoop tools module under trunk.
> > >
> > >thanks
> > >mahadev
> > >
> > >On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tu...@cloudera.com>
> > >wrote:
> > >> Agreed, we should not have a dumping ground. IMO, what it would go
> into
> > >> hadoop-tools (i.e. distcp, streaming and someone could argue for
> > >>FsShell as
> > >> well) are effectively hadoop CLI utilities. Having them in a separate
> > >>module
> > >> rather in than in the core module (common, hdfs, mapreduce) does not
> > >>mean
> > >> that they are secondary things, just modularization. Also it will help
> > >>to
> > >> get those tools to use public interfaces of the core module, and when
> we
> > >> finally have a clean hadoop-client layer, those tools should only
> > >>depend on
> > >> that.
> > >>
> > >> Finally, the fact that tools would end up under trunk/hadoop-tools, it
> > >>does
> > >> not prevent that the packaging from HDFS and MAPREDUCE to bundle the
> > >> same/different tools
> > >>
> > >> +1 for hadoop-tools/ (not binding)
> > >>
> > >> Thanks.
> > >>
> > >>
> > >> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <er...@gmail.com> wrote:
> > >>
> > >>> Mapreduce and HDFS are distinct functions of Hadoop.  They are
> > >>> loosely coupled.  If we have a tools aggregator module, it will not
> > >>> have as clear a function as the other Hadoop modules.  Hence, it is
> > >>> possible for a tool to depend on both HDFS and map reduce.  If
> > >>> something breaks in the tools module, it is unclear which
> > >>> subproject's responsibility it is to maintain the tool's function.
> > >>> Therefore, it is safer to send tools to the incubator or Apache
> > >>> Extras rather than deposit the utility tools in a tools subcategory.
> > >>> There are many short-lived projects that attempt to associate
> > >>> themselves with Hadoop but are not maintained.  It would be better
> > >>> to spin off those utility projects than use Hadoop as a dumping
> > >>> ground.
> > >>>
> > >>> In the previous discussion about removing contrib, most people were
> > >>> in favor of doing so, and only a few contrib owners were reluctant
> > >>> to remove it.  Fewer people have participated in restoring the
> > >>> functionality of broken contrib projects.  History speaks for
> > >>> itself.  -1 (non-binding) for hadoop-tools.
> > >>>
> > >>> regards,
> > >>> Eric
> > >>>
> > >>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <
> tucu@cloudera.com>
> > >>> wrote:
> > >>> > Eric,
> > >>> >
> > >>> > Personally I'm fine either way.
> > >>> >
> > >>> > Still, I fail to see why generic vs. categorized tools would
> > >>> > increase/reduce the risk of dead code, and how they would make
> > >>> > packaging & deployment more difficult or easier.
> > >>> >
> > >>> > Would you please explain this?
> > >>> >
> > >>> > Thanks.
> > >>> >
> > >>> > Alejandro
> > >>> >
> > >>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <er...@gmail.com>
> wrote:
> > >>> >
> > >>> >> Option #2, proposed by Amareshwari, seems like a better proposal.
> > >>> >> We don't want to repeat the history of contrib again with
> > >>> >> hadoop-tools.  Having a generic module like hadoop-tools
> > >>> >> increases the risk of accumulating dead code.  It would be better
> > >>> >> to categorize the hdfs- or mapreduce-specific tools in their
> > >>> >> respective subcategories.  It is also easier to manage from a
> > >>> >> package/deployment perspective.
> > >>> >>
> > >>> >> regards,
> > >>> >> Eric
> > >>> >>
> > >>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> > >>> >>
> > >>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <
> aw@apache.org>
> > >>> wrote:
> > >>> >> >>
> > >>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> > >>> >> >>> We still need to answer Amareshwari's question (2) she asked
> > >>> >> >>> some time back about the automated code compilation and test
> > >>> >> >>> execution of the tools module.
> > >>> >> >>
> > >>> >> >>
> > >>> >> >>
> > >>> >> >>>>> My #1 question is if tools is basically contrib reborn.
> > >>> >> >>>>> If not, what makes it different?
> > >>> >> >>
> > >>> >> >>
> > >>> >> >>        I'm still waiting for this answer as well.
> > >>> >> >>
> > >>> >> >>        Until such, I would be pretty much against a tools
> > >>> >> >> module.  Changing the name of the dumping ground doesn't make
> > >>> >> >> it any less of a dumping ground.
> > >>> >> >
> > >>> >> > IMO if the tools module only gets stuff like distcp that's
> > >>> >> > maintained, then it's not contrib; if it contains all the stuff
> > >>> >> > from the current MR contrib, then tools is just a re-labeling
> > >>> >> > of contrib. Given that this proposal only covers moving distcp
> > >>> >> > to tools, it doesn't sound like contrib to me.
> > >>> >> >
> > >>> >> > Thanks,
> > >>> >> > Eli
> > >>> >>
> > >>> >>
> > >>> >
> > >>>
> > >>
> > >
> >
> >
>
>

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
Alright, I think we've discussed this enough, and everybody seems to agree
on a top-level hadoop-tools module.

Time to get into action. I've filed HADOOP-7624. Amareshwari, we can track
the remaining implementation details, and the answers to your specific
questions, there.

Thanks everyone for putting in your thoughts here.
+Vinod


On Fri, Sep 9, 2011 at 10:55 AM, Rottinghuis, Joep <jr...@ebay.com> wrote:

> If hadoop-tools will be built as part of hadoop-common, then none of these
> tools should be allowed to have a dependency on hdfs or mapreduce.
> The converse is also true: when tools do have any such dependency, they
> cannot be built as part of hadoop-common.
> We cannot have circular dependencies like that.
>
> That is probably obvious, but I'm just saying...
>
> Joep
> ________________________________________
> From: Amareshwari Sri Ramadasu [amarsri@yahoo-inc.com]
> Sent: Wednesday, September 07, 2011 9:33 PM
> To: mapreduce-dev@hadoop.apache.org
> Cc: common-dev@hadoop.apache.org
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> It is good to have the hadoop-tools module separately. But, as I asked
> before, we need to answer some questions here. I'm trying to answer them
> myself. Comments are welcome.
>
> > > 1.  Should the patches for tools be created against Hadoop Common?
> Here I meant: should the Hadoop Common mailing list be used, or should we
> have a separate mailing list for Tools? I agree with Vinod here that we
> can tie it to the Hadoop Common JIRA/mailing lists.
>
> > > 2.  What will happen to the tools test automation? Will it run as part
> > > of Hadoop Common tests?
> Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop
> Common if we use the Hadoop Common mailing list for this.
> Also, I propose that every patch build of HDFS and MAPREDUCE should also
> run the tools tests to make sure nothing is broken. That would ease the
> maintenance of the hadoop-tools module. I presume the tools tests should
> not take much time (something like not more than 30 minutes).
>
> > > 3.  Will it introduce a dependency from MapReduce to Common? Or is
> > > this taken care of in Mavenization?
> I'm not sure whether Mavenization can take care of this.
>
> Thanks
> Amareshwari
>
> On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jr...@ebay.com> wrote:
>
> Does a separate hadoop-tools module imply that there will be a separate
> Jenkins build as well?
>
> Thanks,
>
> Joep
> ________________________________________
> From: Alejandro Abdelnur [tucu@cloudera.com]
> Sent: Wednesday, September 07, 2011 11:35 AM
> To: mapreduce-dev@hadoop.apache.org
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> Makes sense
>
> On Wed, Sep 7, 2011 at 11:32 AM, <Mi...@emc.com> wrote:
>
> > +1 for separate hadoop-tools module. However, if a tool is broken at
> > release time, and no one comes forward to fix it, it should be removed.
> > (i.e. Unlike contrib modules, where build and test failures were
> > tolerated.)
> >
> > - milind
> >

RE: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by "Rottinghuis, Joep" <jr...@ebay.com>.
If hadoop-tools will be built as part of hadoop-common, then none of these tools should be allowed to have a dependency on hdfs or mapreduce.
The converse is also true: when tools do have any such dependency, they cannot be built as part of hadoop-common.
We cannot have circular dependencies like that.

That is probably obvious, but I'm just saying...

Joep
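
Joep's constraint is exactly what Maven enforces mechanically: a reactor module can only depend on artifacts that build before it, and a dependency cycle fails the build outright. As a sketch (the artifact IDs, parent name, and version below are hypothetical illustrations, not copied from the real Hadoop POMs), a tool that needs MapReduce would declare something like:

```xml
<!-- Hypothetical hadoop-tools/hadoop-distcp/pom.xml (illustrative only) -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-tools</artifactId>
    <version>0.23.0-SNAPSHOT</version>
  </parent>
  <artifactId>hadoop-distcp</artifactId>

  <dependencies>
    <!-- Depending on a mapreduce artifact means this module must build
         AFTER common/hdfs/mapreduce, so it cannot be built as part of
         hadoop-common without creating a cycle. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</project>
```

With such a declaration, the "tools must not feed back into common" rule would not need to rest on convention: Maven aborts any reactor whose module graph contains a cycle.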
________________________________________
From: Amareshwari Sri Ramadasu [amarsri@yahoo-inc.com]
Sent: Wednesday, September 07, 2011 9:33 PM
To: mapreduce-dev@hadoop.apache.org
Cc: common-dev@hadoop.apache.org
Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

It is good to have the hadoop-tools module separately. But, as I asked before, we need to answer some questions here. I'm trying to answer them myself. Comments are welcome.

> > 1.  Should the patches for tools be created against Hadoop Common?
Here I meant: should the Hadoop Common mailing list be used, or should we have a separate mailing list for Tools? I agree with Vinod here that we can tie it to the Hadoop Common JIRA/mailing lists.

> > 2.  What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop Common if we use the Hadoop Common mailing list for this.
Also, I propose that every patch build of HDFS and MAPREDUCE should also run the tools tests to make sure nothing is broken. That would ease the maintenance of the hadoop-tools module. I presume the tools tests should not take much time (something like not more than 30 minutes).

> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this taken care of in Mavenization?
I'm not sure whether Mavenization can take care of this.

Thanks
Amareshwari

On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jr...@ebay.com> wrote:

Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?

Thanks,

Joep
________________________________________
From: Alejandro Abdelnur [tucu@cloudera.com]
Sent: Wednesday, September 07, 2011 11:35 AM
To: mapreduce-dev@hadoop.apache.org
Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Makes sense

On Wed, Sep 7, 2011 at 11:32 AM, <Mi...@emc.com> wrote:

> +1 for separate hadoop-tools module. However, if a tool is broken at
> release time, and no one comes forward to fix it, it should be removed.
> (i.e. Unlike contrib modules, where build and test failures were
> tolerated.)
>
> - milind
>



Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
It is good to have the hadoop-tools module separately. But, as I asked before, we need to answer some questions here. I'm trying to answer them myself. Comments are welcome.

> > 1.  Should the patches for tools be created against Hadoop Common?
Here I meant: should the Hadoop Common mailing list be used, or should we have a separate mailing list for Tools? I agree with Vinod here that we can tie it to the Hadoop Common JIRA/mailing lists.

> > 2.  What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop Common if we use the Hadoop Common mailing list for this.
Also, I propose that every patch build of HDFS and MAPREDUCE should also run the tools tests to make sure nothing is broken. That would ease the maintenance of the hadoop-tools module. I presume the tools tests should not take much time (something like not more than 30 minutes).

> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this taken care of in Mavenization?
I'm not sure whether Mavenization can take care of this.

Thanks
Amareshwari
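
On question 3, Maven's reactor does largely take care of build ordering: when the tools module is listed in the root aggregator POM, Maven computes the build order from the declared inter-module dependencies, so a tool that depends on MapReduce simply builds after it, wherever it appears in the list. A hypothetical root POM fragment (the module names are illustrative assumptions, not the actual trunk layout):

```xml
<!-- Hypothetical trunk/pom.xml fragment (illustrative only) -->
<modules>
  <module>hadoop-common-project</module>
  <module>hadoop-hdfs-project</module>
  <module>hadoop-mapreduce-project</module>
  <!-- Listed last for readability; Maven orders the reactor by declared
       dependencies, not by the position of modules in this list. -->
  <module>hadoop-tools</module>
</modules>
```

What Maven cannot do is make a dependency direction disappear: if a tool depends on MapReduce, that edge exists regardless of module layout, and the reactor will refuse to build if it ever forms a cycle.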

On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jr...@ebay.com> wrote:

Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?

Thanks,

Joep
________________________________________
From: Alejandro Abdelnur [tucu@cloudera.com]
Sent: Wednesday, September 07, 2011 11:35 AM
To: mapreduce-dev@hadoop.apache.org
Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Makes sense

On Wed, Sep 7, 2011 at 11:32 AM, <Mi...@emc.com> wrote:

> +1 for separate hadoop-tools module. However, if a tool is broken at
> release time, and no one comes forward to fix it, it should be removed.
> (i.e. Unlike contrib modules, where build and test failures were
> tolerated.)
>
> - milind
>


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
It is good to have a separate hadoop-tools module. But, as I asked before, we need to answer some questions here. I'm trying to answer them myself; comments are welcome.

> > 1.  Should the patches for tools be created against Hadoop Common?
Here, I meant: should the Hadoop Common mailing list be used, or should we have a separate mailing list for Tools? I agree with Vinod here that we can tie it to the Hadoop Common JIRA/mailing lists.

> > 2.  What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop Common if we use the Hadoop Common mailing list for this.
Also, I propose that every patch build of HDFS and MAPREDUCE also run the tools tests to make sure nothing is broken. That would ease the maintenance of the hadoop-tools module. I presume the tools tests should not take much time (no more than 30 minutes or so).

> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> taken care in Mavenization?
I'm not sure whether Mavenization takes care of this.

Thanks
Amareshwari

On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jr...@ebay.com> wrote:

Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?

Thanks,

Joep


RE: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by "Rottinghuis, Joep" <jr...@ebay.com>.
Does a separate hadoop-tools module imply that there will be a separate Jenkins build as well?

Thanks,

Joep

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Makes sense

On Wed, Sep 7, 2011 at 11:32 AM, <Mi...@emc.com> wrote:


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Mi...@emc.com.
+1 for separate hadoop-tools module. However, if a tool is broken at
release time, and no one comes forward to fix it, it should be removed.
(i.e. Unlike contrib modules, where build and test failures were
tolerated.)

- milind

On 9/7/11 11:27 AM, "Mahadev Konar" <ma...@hortonworks.com> wrote:

>I like the idea of having tools as a seperate module and I dont think
>that it will be a dumping ground unless we choose to make one of it.
>
>+1 for hadoop tools module under trunk.
>
>thanks
>mahadev
>
>On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tu...@cloudera.com>
>wrote:
>> Agreed, we should not have a dumping ground. IMO, what it would go into
>> hadoop-tools (i.e. distcp, streaming and someone could argue for
>>FsShell as
>> well) are effectively hadoop CLI utilities. Having them in a separate
>>module
>> rather in than in the core module (common, hdfs, mapreduce) does not
>>mean
>> that they are secondary things, just modularization. Also it will help
>>to
>> get those tools to use public interfaces of the core module, and when we
>> finally have a clean hadoop-client layer, those tools should only
>>depend on
>> that.
>>
>> Finally, the fact that tools would end up under trunk/hadoop-tools, it
>>does
>> not prevent that the packaging from HDFS and MAPREDUCE to bundle the
>> same/different tools
>>
>> +1 for hadoop-tools/ (not binding)
>>
>> Thanks.
>>
>>
>> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <er...@gmail.com> wrote:
>>
>>> Mapreduce and HDFS are distinct function of Hadoop.  They are loosely
>>> coupled.  If we have tools aggregator module, it will not have as
>>> clear distinct function as other Hadoop modules.  Hence, it is
>>> possible for a tool to be depend on both HDFS and map reduce.  If
>>> something broke in tools module, it is unclear which subproject's
>>> responsibility to maintain tools function.  Therefore, it is safer to
>>> send tools to incubator or apache extra rather than deposit the
>>> utility tools in tools subcategory.  There are many short lived
>>> projects that attempts to associate themselves with Hadoop but not
>>> being maintained.  It would be better to spin off those utility
>>> projects than use Hadoop as a dumping ground.
>>>
>>> The previous discussion for removing contrib, most people were in
>>> favor of doing so, and only a few contrib owners were reluctant to
>>> remove contrib.  Fewer people has participated in restore
>>> functionality of broken contrib projects.  History speaks for itself.
>>> -1 (non-binding) for hadoop-tools.
>>>
>>> regards,
>>> Eric
>>>
>>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <tu...@cloudera.com>
>>> wrote:
>>> > Eric,
>>> >
>>> > Personally I'm fine either way.
>>> >
>>> > Still, I fail to see why a generic/categorized tools increase/reduce
>>>the
>>> > risk of dead code and how they make more-difficult/easier the
>>> > package&deployment.
>>> >
>>> > Would you please explain this?
>>> >
>>> > Thanks.
>>> >
>>> > Alejandro
>>> >
>>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <er...@gmail.com> wrote:
>>> >
>>> >> Option #2 proposed by Amareshwari, seems like a better proposal.  We
>>> don't
>>> >> want to repeat history for contrib again with hadoop-tools.  Having
>>>a
>>> >> generic module like hadoop-tools increases the risk of accumulate
>>>dead
>>> code.
>>> >>  It would be better to categorize the hdfs or mapreduce specific
>>>tools
>>> in
>>> >> their respected subcategories.  It is also easier to manage from
>>> >> package/deployment prospective.
>>> >>
>>> >> regards,
>>> >> Eric
>>> >>
>>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
>>> >>
>>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw...@apache.org>
>>> wrote:
>>> >> >>
>>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
>>> >> >>> We still need to answer Amareshwari's question (2) she asked
>>>some
>>> time
>>> >> back
>>> >> >>> about the automated code compilation and test execution of the
>>>tools
>>> >> module.
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>>>> My #1 question is if tools is basically contrib reborn.  If
>>>not,
>>> what
>>> >> >>>> makes
>>> >> >>>>> it different?
>>> >> >>
>>> >> >>
>>> >> >>        I'm still waiting for this answer as well.
>>> >> >>
>>> >> >>        Until such, I would be pretty much against a tools module.
>>> >>  Changing the name of the dumping ground doesn't make it any less
>>>of a
>>> >> dumping ground.
>>> >> >
>>> >> > IMO if the tools module only gets stuff like distcp that's
>>>maintained
>>> >> > then it's not contrib, if it contains all the stuff from the
>>>current
>>> >> > MR contrib then tools is just a re-labeling of contrib. Given that
>>> >> > this proposal only covers moving distcp to tools it doesn't sound
>>>like
>>> >> > contrib to me.
>>> >> >
>>> >> > Thanks,
>>> >> > Eli
>>> >>
>>> >>
>>> >
>>>
>>
>


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Mahadev Konar <ma...@hortonworks.com>.
I like the idea of having tools as a separate module, and I don't think it will become a dumping ground unless we choose to make it one.

+1 for a hadoop-tools module under trunk.

thanks
mahadev

On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tu...@cloudera.com> wrote:

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Agreed, we should not have a dumping ground. IMO, what would go into
hadoop-tools (i.e. distcp, streaming, and someone could argue for FsShell as
well) are effectively Hadoop CLI utilities. Having them in a separate module
rather than in the core modules (common, hdfs, mapreduce) does not mean
that they are secondary things, just modularization. It will also help
get those tools to use the public interfaces of the core modules, and once we
finally have a clean hadoop-client layer, those tools should depend only on
that.

Finally, the fact that the tools would end up under trunk/hadoop-tools does
not prevent the HDFS and MAPREDUCE packaging from bundling the
same/different tools.

+1 for hadoop-tools/ (not binding)

Thanks.
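
[Editorial note: a minimal sketch of what such a trunk-level aggregator POM could look like, following the hadoop-common-project pattern mentioned earlier in the thread. The groupId, version, and module names below are assumptions for illustration, not the committed layout.]

```xml
<!-- Hypothetical trunk/hadoop-tools/pom.xml: a "pom"-packaged aggregator
     whose only job is to list the individual tool sub-modules.
     All coordinates here are illustrative assumptions. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-project</artifactId>
    <version>0.23.0-SNAPSHOT</version>
  </parent>
  <artifactId>hadoop-tools</artifactId>
  <packaging>pom</packaging>
  <modules>
    <!-- each tool lives in its own sub-module, e.g. DistCpV2 -->
    <module>hadoop-distcp</module>
  </modules>
</project>
```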


On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <er...@gmail.com> wrote:


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Eric Yang <er...@gmail.com>.
MapReduce and HDFS are distinct functions of Hadoop.  They are loosely
coupled.  If we have a tools aggregator module, it will not have as
clear and distinct a function as the other Hadoop modules.  Hence, it is
possible for a tool to depend on both HDFS and MapReduce.  If
something breaks in the tools module, it is unclear which subproject is
responsible for maintaining it.  Therefore, it is safer to
send tools to the Incubator or Apache Extras than to deposit the
utility tools in a tools subcategory.  There are many short-lived
projects that attempt to associate themselves with Hadoop but are not
being maintained.  It would be better to spin off those utility
projects than to use Hadoop as a dumping ground.

In the previous discussion about removing contrib, most people were in
favor of doing so, and only a few contrib owners were reluctant to
remove it.  Even fewer people have participated in restoring the
functionality of broken contrib projects.  History speaks for itself.
-1 (non-binding) for hadoop-tools.

regards,
Eric

On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <tu...@cloudera.com> wrote:
> Eric,
>
> Personally I'm fine either way.
>
> Still, I fail to see why a generic/categorized tools increase/reduce the
> risk of dead code and how they make more-difficult/easier the
> package&deployment.
>
> Would you please explain this?
>
> Thanks.
>
> Alejandro
>
> On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <er...@gmail.com> wrote:
>
>> Option #2 proposed by Amareshwari, seems like a better proposal.  We don't
>> want to repeat history for contrib again with hadoop-tools.  Having a
>> generic module like hadoop-tools increases the risk of accumulate dead code.
>>  It would be better to categorize the hdfs or mapreduce specific tools in
>> their respected subcategories.  It is also easier to manage from
>> package/deployment prospective.
>>
>> regards,
>> Eric
>>
>> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
>>
>> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw...@apache.org> wrote:
>> >>
>> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
>> >>> We still need to answer Amareshwari's question (2) she asked some time
>> back
>> >>> about the automated code compilation and test execution of the tools
>> module.
>> >>
>> >>
>> >>
>> >>>>> My #1 question is if tools is basically contrib reborn.  If not, what
>> >>>> makes
>> >>>>> it different?
>> >>
>> >>
>> >>        I'm still waiting for this answer as well.
>> >>
>> >>        Until such, I would be pretty much against a tools module.
>>  Changing the name of the dumping ground doesn't make it any less of a
>> dumping ground.
>> >
>> > IMO if the tools module only gets stuff like distcp that's maintained
>> > then it's not contrib, if it contains all the stuff from the current
>> > MR contrib then tools is just a re-labeling of contrib. Given that
>> > this proposal only covers moving distcp to tools it doesn't sound like
>> > contrib to me.
>> >
>> > Thanks,
>> > Eli
>>
>>
>

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Eric,

Personally I'm fine either way.

Still, I fail to see how a generic versus a categorized tools layout
increases or reduces the risk of dead code, or how it makes packaging
and deployment more difficult or easier.

Would you please explain this?

Thanks.

Alejandro

On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <er...@gmail.com> wrote:

> Option #2 proposed by Amareshwari, seems like a better proposal.  We don't
> want to repeat history for contrib again with hadoop-tools.  Having a
> generic module like hadoop-tools increases the risk of accumulate dead code.
>  It would be better to categorize the hdfs or mapreduce specific tools in
> their respected subcategories.  It is also easier to manage from
> package/deployment prospective.
>
> regards,
> Eric
>
> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
>
> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw...@apache.org> wrote:
> >>
> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> >>> We still need to answer Amareshwari's question (2) she asked some time
> back
> >>> about the automated code compilation and test execution of the tools
> module.
> >>
> >>
> >>
> >>>>> My #1 question is if tools is basically contrib reborn.  If not, what
> >>>> makes
> >>>>> it different?
> >>
> >>
> >>        I'm still waiting for this answer as well.
> >>
> >>        Until such, I would be pretty much against a tools module.
>  Changing the name of the dumping ground doesn't make it any less of a
> dumping ground.
> >
> > IMO if the tools module only gets stuff like distcp that's maintained
> > then it's not contrib, if it contains all the stuff from the current
> > MR contrib then tools is just a re-labeling of contrib. Given that
> > this proposal only covers moving distcp to tools it doesn't sound like
> > contrib to me.
> >
> > Thanks,
> > Eli
>
>

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Eric Yang <er...@gmail.com>.
Option #2, proposed by Amareshwari, seems like the better proposal.  We don't want to repeat the history of contrib with hadoop-tools.  Having a generic module like hadoop-tools increases the risk of accumulating dead code.  It would be better to categorize the HDFS- or MapReduce-specific tools under their respective subprojects.  It is also easier to manage from a packaging/deployment perspective.

regards,
Eric

On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:

> On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw...@apache.org> wrote:
>> 
>> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
>>> We still need to answer Amareshwari's question (2) she asked some time back
>>> about the automated code compilation and test execution of the tools module.
>> 
>> 
>> 
>>>>> My #1 question is if tools is basically contrib reborn.  If not, what
>>>> makes
>>>>> it different?
>> 
>> 
>>        I'm still waiting for this answer as well.
>> 
>>        Until such, I would be pretty much against a tools module.  Changing the name of the dumping ground doesn't make it any less of a dumping ground.
> 
> IMO if the tools module only gets stuff like distcp that's maintained
> then it's not contrib, if it contains all the stuff from the current
> MR contrib then tools is just a re-labeling of contrib. Given that
> this proposal only covers moving distcp to tools it doesn't sound like
> contrib to me.
> 
> Thanks,
> Eli


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Allen Wittenauer <aw...@apache.org>.
On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> 
> IMO if the tools module only gets stuff like distcp that's maintained
> then it's not contrib, if it contains all the stuff from the current
> MR contrib then tools is just a re-labeling of contrib. Given that
> this proposal only covers moving distcp to tools it doesn't sound like
> contrib to me.

	At one point, everything in contrib was maintained.  So I guess the big question is: what are the gating criteria for something to get into tools?

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Eli Collins <el...@cloudera.com>.
On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw...@apache.org> wrote:
>
> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
>> We still need to answer Amareshwari's question (2) she asked some time back
>> about the automated code compilation and test execution of the tools module.
>
>
>
>>>> My #1 question is if tools is basically contrib reborn.  If not, what
>>> makes
>>>> it different?
>
>
>        I'm still waiting for this answer as well.
>
>        Until such, I would be pretty much against a tools module.  Changing the name of the dumping ground doesn't make it any less of a dumping ground.

IMO, if the tools module only gets stuff like distcp that's maintained,
then it's not contrib; if it contains all the stuff from the current
MR contrib, then tools is just a re-labeling of contrib. Given that
this proposal only covers moving distcp to tools, it doesn't sound like
contrib to me.

Thanks,
Eli

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Allen Wittenauer <aw...@apache.org>.
On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> We still need to answer Amareshwari's question (2) she asked some time back
> about the automated code compilation and test execution of the tools module.



>>> My #1 question is if tools is basically contrib reborn.  If not, what
>> makes
>>> it different?


	I'm still waiting for this answer as well.

	Until then, I would be pretty much against a tools module.  Changing the name of the dumping ground doesn't make it any less of a dumping ground.

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
On Tue, Sep 6, 2011 at 10:58 AM, Mithun Radhakrishnan <
mithun.radhakrishnan@yahoo.com> wrote:

> I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm
> hoping that's going to be acceptable to this forum. This way, moving it out
> to a separate source tree should be easier.
>


+1 for moving forward with this proposal.

We still need to answer Amareshwari's question (2), which she asked some
time back, about the automated code compilation and test execution of the
tools module. Right now we have separate automated builds for common, hdfs
and mapreduce. If we go with the above proposal, we need to set up
automated builds for the tools module and possibly tie the related
JIRA/Jenkins emails to the common-project lists.



> It would be nice to have clarity on how tools will be dealt with. It'd be
> convenient to distcp in trunk. (It's tiny and useful.) On the other hand,
> that might be opening doors to adding too much, and complicating the
> build/release. I'd appreciate advice on which way is best.
>
> In the meantime, I'll align the distcpv2 pom.xml with the maven-ized
> version of things, as per Tucu's suggestions.
>
>
+1


Thanks,
+Vinod



> ________________________________
> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> To: mapreduce-dev@hadoop.apache.org
> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; Mithun
> Radhakrishnan <mi...@yahoo.com>
> Sent: Tuesday, August 30, 2011 6:13 PM
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> As long as hadoop-tools is in some directory at some depth under trunk,
> release of the hadoop-tools is tied to the release of core.
>
> So we actually have these two options instead:
> (1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
>     -- Sources at tools/trunk/hadoop-distcp
>     -- Each tool will work with specific version of Hadoop core.
>     -- Releases can really be separate
> (2) Same source tree: trunk/
>     -- Sources at either (1.1) trunk/hadoop-tools or (1.2)
> trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
>     -- Given release isn't decoupled anyway, either will work. (1.2) is
> prefereable if building mapreduce builds the tools also.
>
> +Vinod
>
>
> On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
>
> > Copying common-dev.
> >
> > Summarizing the below discussion: What should be the tools layout after
> > mavenization?
> >
> > Option #1: Have hadoop-tools at top level i.e
> > trunk/
> >   hadoop-tools/
> >       hadoop-distcp/
> > Pros:
> >  Cleaner layout.
> >  In future, tools could be released separately from  Hadoop releases
> >
> > Cons: Difficult to maintain
> >
> > Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> > they are depending on MapReduce/HDFS/Common respectively.
> > For ex:
> > hadoop-mapreduce-project/
> >   hadoop-mr-tools/
> >      hadoop-distcp/
> >
> > Pros: Easy to maintain
> > Cons: Still has tight coupling with related projects.
> >
> > Personally, I'm fine with any of the above options. Looking for
> suggestions
> > and reaching a consensus on this.
> >
> > Thanks
> > Amareshwari
> >
> > On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:
> >
> >
> >
> > I have a feeling this discussion should get moved to common-dev or even
> to
> > general.
> >
> > My #1 question is if tools is basically contrib reborn.  If not, what
> makes
> > it different?
> >
> > On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
> >
> > > Some questions on making hadoop-tools top level under trunk,
> > >
> > > 1.  Should the patches for tools be created against Hadoop Common?
> > > 2.  What will happen to the tools test automation? Will it run as part
> of
> > Hadoop Common tests?
> > > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> > taken care in Mavenization?
> > >
> > >
> > > Thanks
> > > Amareshwari
> > >
> > > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> > >
> > > Please, don't add more Mavenization work on us (eventually I want to go
> > back
> > > to coding)
> > >
> > > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> > >
> > > What will have to be done extra (besides Mavenizing distcp) is to
> create
> > a
> > > hadoop-tools module at root level and within it a hadoop-distcp module.
> > >
> > > The hadoop-tools POM will look pretty much like the
> hadoop-common-project
> > > POM.
> > >
> > > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> > >
> > > Thanks.
> > >
> > > Alejandro
> > >
> > > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > > amarsri@yahoo-inc.com> wrote:
> > >
> > >> Agree with Mithun and Robert. DistCp and Tools restructuring are
> > separate
> > >> tasks. Since DistCp code is ready to be committed, it need not wait
> for
> > the
> > >> Tools separation from MR/HDFS.
> > >> I would say it can go into contrib as the patch is now, and when the
> > tools
> > >> restructuring happens it would be just an svn mv.  If there are no
> > issues
> > >> with this proposal I can commit the code tomorrow.
> > >>
> > >> Thanks
> > >> Amareshwari
> > >>
> > >> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
> > >>
> > >> I agree with Mithun.  They are related but this goes beyond distcpv2
> and
> > >> should not block distcpv2 from going in.  It would be very nice,
> > however, to
> > >> get the layout settled soon so that we all know where to find
> something
> > when
> > >> we want to work on it.
> > >>
> > >> Also +1 for Alejandro's I also prefer to keep tools at the trunk
> level.
> > >>
> > >> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
> > separate
> > >> modules right now, there is still tight coupling between the different
> > >> pieces, especially with tests.  IMO until we can reduce that coupling
> we
> > >> should treat building and testing Hadoop as a single project instead
> of
> > >> trying to keep them separate.
> > >>
> > >> --Bobby
> > >>
> > >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
> > mithun.radhakrishnan@yahoo.com>
> > >> wrote:
> > >>
> > >> Would it be acceptable if retooling of tools/ were taken up
> separately?
> > It
> > >> sounds to me like this might be a distinct (albeit related) task.
> > >>
> > >> Mithun
> > >>
> > >>
> > >> ________________________________
> > >> From: Giridharan Kesavan <gk...@hortonworks.com>
> > >> To: mapreduce-dev@hadoop.apache.org
> > >> Sent: Friday, August 26, 2011 12:04 PM
> > >> Subject: Re: DistCpV2 in 0.23
> > >>
> > >> +1 to Alejandro's
> > >>
> > >> I prefer to keep the hadoop-tools at trunk level.
> > >>
> > >> -Giri
> > >>
> > >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <
> tucu@cloudera.com>
> > >> wrote:
> > >>> I'd suggest putting hadoop-tools either at trunk/ level or having a a
> > >> tools
> > >>> aggregator module for hdfs and other for common.
> > >>>
> > >>> I personal would prefer at trunk/.
> > >>>
> > >>> Thanks.
> > >>>
> > >>> Alejandro
> > >>>
> > >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> > >>> amarsri@yahoo-inc.com> wrote:
> > >>>
> > >>>> Agree. It should be separate maven module (and patch puts it as
> > separate
> > >>>> maven module now). And top level for hadoop tools is nice to have,
> but
> > >> it
> > >>>> becomes hard to maintain until patch automation tests run the tests
> > >> under
> > >>>> tools. Currently we see many times the changes in HDFS effecting
> RAID
> > >> tests
> > >>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> > >>>>
> > >>>> I propose we can have something like the following:
> > >>>>
> > >>>> trunk/
> > >>>> - hadoop-mapreduce
> > >>>>     - hadoop-mr-client
> > >>>>     - hadoop-yarn
> > >>>>     - hadoop-tools
> > >>>>         - hadoop-streaming
> > >>>>         - hadoop-archives
> > >>>>         - hadoop-distcp
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>> @Eli and @JD, we did not replace old legacy distcp because this is
> > >> really a
> > >>>> complete rewrite and did not want to remove it until users are
> > >> familiarized
> > >>>> with new one.
> > >>>>
> > >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
> > >>>>
> > >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> > >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> > >>>>
> > >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> > >> mahadev@hortonworks.com>
> > >>>> wrote:
> > >>>>> +1 for a seperate module in hadoop-mapreduce-project. I think
> > >>>>> hadoop-mapreduce-client might not be right place for it. We might
> > have
> > >>>>> to pick a new maven module under hadoop-mapreduce-project that
> could
> > >>>>> host streaming/distcp/hadoop archives.
> > >>>>>
> > >>>>> thanks
> > >>>>> mahadev
> > >>>>>
> > >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> > >> tucu@cloudera.com>
> > >>>> wrote:
> > >>>>>> Agree, it should be a separate maven module.
> > >>>>>>
> > >>>>>> And it should be under hadoop-mapreduce-client, right?
> > >>>>>>
> > >>>>>> And now that we are in the topic, the same should go for
> streaming,
> > >> no?
> > >>>>>>
> > >>>>>> Thanks.
> > >>>>>>
> > >>>>>> Alejandro
> > >>>>>>
> > >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
> > >>>> wrote:
> > >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> > >>>>>>>>
> > >>>>>>>> Agree with JD that it should be in the core code, not contrib.
> If
> > >>>>>>>> it's going to be maintained then we should put it in the core
> > >> code.
> > >>>>>>>
> > >>>>>>> Now that we're all mavenized, though, a separate maven module and
> > >>>>>>> artifact does make sense IMO - ie "hadoop jar
> > >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> > >>>>>>>
> > >>>>>>> -Todd
> > >>>>>>> --
> > >>>>>>> Todd Lipcon
> > >>>>>>> Software Engineer, Cloudera
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Todd Lipcon
> > >>>> Software Engineer, Cloudera
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> -Giri
> > >>
> > >>
> > >>
> > >
> >
> >
> >
>
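[Editor's note: the trunk-level layout discussed in the thread above (option #1, a trunk/hadoop-tools aggregator containing hadoop-distcp) amounts to a Maven aggregator POM in the style of hadoop-common-project. The sketch below is illustrative only; the coordinates, parent, and version are assumptions, not the actual Hadoop build files.]

```xml
<!-- Hypothetical trunk/hadoop-tools/pom.xml: an aggregator module that
     carries no code of its own and just lists each tool as a child. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-project</artifactId>
    <version>0.23.0-SNAPSHOT</version>
  </parent>
  <artifactId>hadoop-tools</artifactId>
  <packaging>pom</packaging>  <!-- aggregator packaging -->
  <modules>
    <module>hadoop-distcp</module>  <!-- further tools would be added here -->
  </modules>
</project>
```

With such a layout, the per-module automated build raised in Amareshwari's question (2) could plausibly be handled with Maven's reactor options, e.g. `mvn test -pl hadoop-tools/hadoop-distcp -am` from trunk, which builds and tests only the tool plus the modules it depends on.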

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
On Tue, Sep 6, 2011 at 10:58 AM, Mithun Radhakrishnan <
mithun.radhakrishnan@yahoo.com> wrote:

> I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm
> hoping that's going to be acceptable to this forum. This way, moving it out
> to a separate source tree should be easier.
>


+1 for moving forward with this proposal.

We still need to answer Amareshwari's question (2) she asked some time back
about the automated code compilation and test execution of the tools module.
Right now we have separate automated builds for common, hdfs and mapreduce.
If we go with the above proposal, we need to setup automated builds for the
tools modules and possibly tie the related JIRA/Jenkins emails with the
common-project lists.



> It would be nice to have clarity on how tools will be dealt with. It'd be
> convenient to distcp in trunk. (It's tiny and useful.) On the other hand,
> that might be opening doors to adding too much, and complicating the
> build/release. I'd appreciate advice on which way is best.
>
> In the meantime, I'll align the distcpv2 pom.xml with the maven-ized
> version of things, as per Tucu's suggestions.
>
>
+1


Thanks,
+Vinod



> ________________________________
> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> To: mapreduce-dev@hadoop.apache.org
> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; Mithun
> Radhakrishnan <mi...@yahoo.com>
> Sent: Tuesday, August 30, 2011 6:13 PM
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
>
> As long as hadoop-tools is in some directory at some depth under trunk,
> release of the hadoop-tools is tied to the release of core.
>
> So we actually have these two options instead:
> (1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
>     -- Sources at tools/trunk/hadoop-distcp
>     -- Each tool will work with specific version of Hadoop core.
>     -- Releases can really be separate
> (2) Same source tree: trunk/
>     -- Sources at either (1.1) trunk/hadoop-tools or (1.2)
> trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
>     -- Given release isn't decoupled anyway, either will work. (1.2) is
> prefereable if building mapreduce builds the tools also.
>
> +Vinod
>
>
> On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
>
> > Copying common-dev.
> >
> > Summarizing the below discussion: What should be the tools layout after
> > mavenization?
> >
> > Option #1: Have hadoop-tools at top level i.e
> > trunk/
> >   hadoop-tools/
> >       hadoop-distcp/
> > Pros:
> >  Cleaner layout.
> >  In future, tools could be released separately from  Hadoop releases
> >
> > Cons: Difficult to maintain
> >
> > Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> > they are depending on MapReduce/HDFS/Common respectively.
> > For ex:
> > hadoop-mapreduce-project/
> >   hadoop-mr-tools/
> >      hadoop-distcp/
> >
> > Pros: Easy to maintain
> > Cons: Still has tight coupling with related projects.
> >
> > Personally, I'm fine with any of the above options. Looking for
> suggestions
> > and reaching a consensus on this.
> >
> > Thanks
> > Amareshwari
> >
> > On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:
> >
> >
> >
> > I have a feeling this discussion should get moved to common-dev or even
> to
> > general.
> >
> > My #1 question is if tools is basically contrib reborn.  If not, what
> makes
> > it different?
> >
> > On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
> >
> > > Some questions on making hadoop-tools top level under trunk,
> > >
> > > 1.  Should the patches for tools be created against Hadoop Common?
> > > 2.  What will happen to the tools test automation? Will it run as part
> of
> > Hadoop Common tests?
> > > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> > taken care in Mavenization?
> > >
> > >
> > > Thanks
> > > Amareshwari
> > >
> > > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> > >
> > > Please, don't add more Mavenization work on us (eventually I want to go
> > back
> > > to coding)
> > >
> > > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> > >
> > > What will have to be done extra (besides Mavenizing distcp) is to
> create
> > a
> > > hadoop-tools module at root level and within it a hadoop-distcp module.
> > >
> > > The hadoop-tools POM will look pretty much like the
> hadoop-common-project
> > > POM.
> > >
> > > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> > >
> > > Thanks.
> > >
> > > Alejandro
> > >
> > > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > > amarsri@yahoo-inc.com> wrote:
> > >
> > >> Agree with Mithun and Robert. DistCp and Tools restructuring are
> > separate
> > >> tasks. Since DistCp code is ready to be committed, it need not wait
> for
> > the
> > >> Tools separation from MR/HDFS.
> > >> I would say it can go into contrib as the patch is now, and when the
> > tools
> > >> restructuring happens it would be just an svn mv.  If there are no
> > issues
> > >> with this proposal I can commit the code tomorrow.
> > >>
> > >> Thanks
> > >> Amareshwari
> > >>
> > >> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
> > >>
> > >> I agree with Mithun.  They are related but this goes beyond distcpv2
> and
> > >> should not block distcpv2 from going in.  It would be very nice,
> > however, to
> > >> get the layout settled soon so that we all know where to find
> something
> > when
> > >> we want to work on it.
> > >>
> > >> Also +1 for Alejandro's; I also prefer to keep tools at the trunk
> level.
> > >>
> > >> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
> > separate
> > >> modules right now, there is still tight coupling between the different
> > >> pieces, especially with tests.  IMO until we can reduce that coupling
> we
> > >> should treat building and testing Hadoop as a single project instead
> of
> > >> trying to keep them separate.
> > >>
> > >> --Bobby
> > >>
> > >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
> > mithun.radhakrishnan@yahoo.com>
> > >> wrote:
> > >>
> > >> Would it be acceptable if retooling of tools/ were taken up
> separately?
> > It
> > >> sounds to me like this might be a distinct (albeit related) task.
> > >>
> > >> Mithun
> > >>
> > >>
> > >> ________________________________
> > >> From: Giridharan Kesavan <gk...@hortonworks.com>
> > >> To: mapreduce-dev@hadoop.apache.org
> > >> Sent: Friday, August 26, 2011 12:04 PM
> > >> Subject: Re: DistCpV2 in 0.23
> > >>
> > >> +1 to Alejandro's
> > >>
> > >> I prefer to keep the hadoop-tools at trunk level.
> > >>
> > >> -Giri
> > >>
> > >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <
> tucu@cloudera.com>
> > >> wrote:
> > >>> I'd suggest putting hadoop-tools either at trunk/ level or having a
> > >> tools
> > >>> aggregator module for hdfs and another for common.
> > >>>
> > >>> I personally would prefer trunk/.
> > >>>
> > >>> Thanks.
> > >>>
> > >>> Alejandro
> > >>>
> > >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> > >>> amarsri@yahoo-inc.com> wrote:
> > >>>
> > >>>> Agree. It should be separate maven module (and patch puts it as
> > separate
> > >>>> maven module now). And top level for hadoop tools is nice to have,
> but
> > >> it
> > >>>> becomes hard to maintain until patch automation tests run the tests
> > >> under
> > >>>> tools. Currently we see many times the changes in HDFS affecting
> RAID
> > >> tests
> > >>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> > >>>>
> > >>>> I propose we can have something like the following:
> > >>>>
> > >>>> trunk/
> > >>>> - hadoop-mapreduce
> > >>>>     - hadoop-mr-client
> > >>>>     - hadoop-yarn
> > >>>>     - hadoop-tools
> > >>>>         - hadoop-streaming
> > >>>>         - hadoop-archives
> > >>>>         - hadoop-distcp
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>> @Eli and @JD, we did not replace the old legacy distcp because this is
> > >> really a
> > >>>> complete rewrite and did not want to remove it until users are
> > >> familiarized
> > >>>> with the new one.
> > >>>>
> > >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
> > >>>>
> > >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> > >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> > >>>>
> > >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> > >> mahadev@hortonworks.com>
> > >>>> wrote:
> > >>>>> +1 for a separate module in hadoop-mapreduce-project. I think
> > >>>>> hadoop-mapreduce-client might not be right place for it. We might
> > have
> > >>>>> to pick a new maven module under hadoop-mapreduce-project that
> could
> > >>>>> host streaming/distcp/hadoop archives.
> > >>>>>
> > >>>>> thanks
> > >>>>> mahadev
> > >>>>>
> > >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> > >> tucu@cloudera.com>
> > >>>> wrote:
> > >>>>>> Agree, it should be a separate maven module.
> > >>>>>>
> > >>>>>> And it should be under hadoop-mapreduce-client, right?
> > >>>>>>
> > >>>>>> And now that we are in the topic, the same should go for
> streaming,
> > >> no?
> > >>>>>>
> > >>>>>> Thanks.
> > >>>>>>
> > >>>>>> Alejandro
> > >>>>>>
> > >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
> > >>>> wrote:
> > >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> > >>>>>>>>
> > >>>>>>>> Agree with JD that it should be in the core code, not contrib.
> If
> > >>>>>>>> it's going to be maintained then we should put it in the core
> > >> code.
> > >>>>>>>
> > >>>>>>> Now that we're all mavenized, though, a separate maven module and
> > >>>>>>> artifact does make sense IMO - ie "hadoop jar
> > >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> > >>>>>>>
> > >>>>>>> -Todd
> > >>>>>>> --
> > >>>>>>> Todd Lipcon
> > >>>>>>> Software Engineer, Cloudera
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Todd Lipcon
> > >>>> Software Engineer, Cloudera
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> -Giri
> > >>
> > >>
> > >>
> > >
> >
> >
> >
>
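[Editor's note: the trunk-level aggregator layout discussed above (Option #1) would amount to a parent POM with `pom` packaging that lists the tool modules. The sketch below is hypothetical and illustrative only — artifact IDs, parent coordinates, and version are assumptions, not the actual Hadoop POMs:]

```xml
<!-- Hypothetical sketch of a trunk/hadoop-tools aggregator POM (Option #1).
     groupId/artifactId/version here are illustrative assumptions. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-project</artifactId>
    <version>0.23.0-SNAPSHOT</version>
  </parent>
  <artifactId>hadoop-tools</artifactId>
  <packaging>pom</packaging>
  <name>Apache Hadoop Tools</name>
  <!-- Each aggregated module builds its own jar (e.g. hadoop-distcp-*.jar),
       matching the "hadoop jar hadoop-distcp-..." invocation discussed below. -->
  <modules>
    <module>hadoop-distcp</module>
    <module>hadoop-streaming</module>
    <module>hadoop-archives</module>
  </modules>
</project>
```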

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Arun C Murthy <ac...@hortonworks.com>.
+1

On Sep 6, 2011, at 12:13 AM, Amareshwari Sri Ramadasu wrote:

> + Copying common dev.
> 
> On 9/6/11 10:58 AM, "Mithun Radhakrishnan" <mi...@yahoo.com> wrote:
> 
> I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm hoping that's going to be acceptable to this forum. This way, moving it out to a separate source tree should be easier.
> 
> It would be nice to have clarity on how tools will be dealt with. It'd be convenient to have distcp in trunk. (It's tiny and useful.) On the other hand, that might be opening doors to adding too much, and complicating the build/release. I'd appreciate advice on which way is best.
> 
> In the meantime, I'll align the distcpv2 pom.xml with the maven-ized version of things, as per Tucu's suggestions.
> 
> Mithun
> 
> 
> ________________________________
> From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
> To: mapreduce-dev@hadoop.apache.org
> Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; Mithun Radhakrishnan <mi...@yahoo.com>
> Sent: Tuesday, August 30, 2011 6:13 PM
> Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
> 
> As long as hadoop-tools is in some directory at some depth under trunk,
> release of the hadoop-tools is tied to the release of core.
> 
> So we actually have these two options instead:
> (1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
>    -- Sources at tools/trunk/hadoop-distcp
>    -- Each tool will work with specific version of Hadoop core.
>    -- Releases can really be separate
> (2) Same source tree: trunk/
>    -- Sources at either (1.1) trunk/hadoop-tools or (1.2)
> trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
>    -- Given release isn't decoupled anyway, either will work. (1.2) is
> preferable if building mapreduce builds the tools also.
> 
> +Vinod
> 
> 
> On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
> 
>> Copying common-dev.
>> 
>> Summarizing the below discussion: What should be the tools layout after
>> mavenization?
>> 
>> Option #1: Have hadoop-tools at top level i.e
>> trunk/
>>  hadoop-tools/
>>      hadoop-distcp/
>> Pros:
>> Cleaner layout.
>> In future, tools could be released separately from  Hadoop releases
>> 
>> Cons: Difficult to maintain
>> 
>> Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
>> they are depending on MapReduce/HDFS/Common respectively.
>> For ex:
>> hadoop-mapreduce-project/
>>  hadoop-mr-tools/
>>     hadoop-distcp/
>> 
>> Pros: Easy to maintain
>> Cons: Still has tight coupling with related projects.
>> 
>> Personally, I'm fine with any of the above options. Looking for suggestions
>> and reaching a consensus on this.
>> 
>> Thanks
>> Amareshwari
>> 
>> On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:
>> 
>> 
>> 
>> I have a feeling this discussion should get moved to common-dev or even to
>> general.
>> 
>> My #1 question is if tools is basically contrib reborn.  If not, what makes
>> it different?
>> 
>> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>> 
>>> Some questions on making hadoop-tools top level under trunk,
>>> 
>>> 1.  Should the patches for tools be created against Hadoop Common?
>>> 2.  What will happen to the tools test automation? Will it run as part of
>> Hadoop Common tests?
>>> 3.  Will it introduce a dependency from MapReduce to Common? Or is this
>> taken care in Mavenization?
>>> 
>>> 
>>> Thanks
>>> Amareshwari
>>> 
>>> On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
>>> 
>>> Please, don't add more Mavenization work on us (eventually I want to go
>> back
>>> to coding)
>>> 
>>> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>>> 
>>> What will have to be done extra (besides Mavenizing distcp) is to create
>> a
>>> hadoop-tools module at root level and within it a hadoop-distcp module.
>>> 
>>> The hadoop-tools POM will look pretty much like the hadoop-common-project
>>> POM.
>>> 
>>> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>>> 
>>> Thanks.
>>> 
>>> Alejandro
>>> 
>>> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
>>> amarsri@yahoo-inc.com> wrote:
>>> 
>>>> Agree with Mithun and Robert. DistCp and Tools restructuring are
>> separate
>>>> tasks. Since DistCp code is ready to be committed, it need not wait for
>> the
>>>> Tools separation from MR/HDFS.
>>>> I would say it can go into contrib as the patch is now, and when the
>> tools
>>>> restructuring happens it would be just an svn mv.  If there are no
>> issues
>>>> with this proposal I can commit the code tomorrow.
>>>> 
>>>> Thanks
>>>> Amareshwari
>>>> 
>>>> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
>>>> 
>>>> I agree with Mithun.  They are related but this goes beyond distcpv2 and
>>>> should not block distcpv2 from going in.  It would be very nice,
>> however, to
>>>> get the layout settled soon so that we all know where to find something
>> when
>>>> we want to work on it.
>>>> 
>>>> Also +1 for Alejandro's; I also prefer to keep tools at the trunk level.
>>>> 
>>>> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
>> separate
>>>> modules right now, there is still tight coupling between the different
>>>> pieces, especially with tests.  IMO until we can reduce that coupling we
>>>> should treat building and testing Hadoop as a single project instead of
>>>> trying to keep them separate.
>>>> 
>>>> --Bobby
>>>> 
>>>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
>> mithun.radhakrishnan@yahoo.com>
>>>> wrote:
>>>> 
>>>> Would it be acceptable if retooling of tools/ were taken up separately?
>> It
>>>> sounds to me like this might be a distinct (albeit related) task.
>>>> 
>>>> Mithun
>>>> 
>>>> 
>>>> ________________________________
>>>> From: Giridharan Kesavan <gk...@hortonworks.com>
>>>> To: mapreduce-dev@hadoop.apache.org
>>>> Sent: Friday, August 26, 2011 12:04 PM
>>>> Subject: Re: DistCpV2 in 0.23
>>>> 
>>>> +1 to Alejandro's
>>>> 
>>>> I prefer to keep the hadoop-tools at trunk level.
>>>> 
>>>> -Giri
>>>> 
>>>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
>>>> wrote:
>>>>> I'd suggest putting hadoop-tools either at trunk/ level or having a
>>>> tools
>>>>> aggregator module for hdfs and another for common.
>>>>> 
>>>>> I personally would prefer trunk/.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> Alejandro
>>>>> 
>>>>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>>>>> amarsri@yahoo-inc.com> wrote:
>>>>> 
>>>>>> Agree. It should be separate maven module (and patch puts it as
>> separate
>>>>>> maven module now). And top level for hadoop tools is nice to have, but
>>>> it
>>>>>> becomes hard to maintain until patch automation tests run the tests
>>>> under
>>>>>> tools. Currently we see many times the changes in HDFS affecting RAID
>>>> tests
>>>>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>>>>>> 
>>>>>> I propose we can have something like the following:
>>>>>> 
>>>>>> trunk/
>>>>>> - hadoop-mapreduce
>>>>>>    - hadoop-mr-client
>>>>>>    - hadoop-yarn
>>>>>>    - hadoop-tools
>>>>>>        - hadoop-streaming
>>>>>>        - hadoop-archives
>>>>>>        - hadoop-distcp
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> @Eli and @JD, we did not replace the old legacy distcp because this is
>>>> really a
>>>>>> complete rewrite and did not want to remove it until users are
>>>> familiarized
>>>>>> with the new one.
>>>>>> 
>>>>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
>>>>>> 
>>>>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>>>>>> in there as well - ie tools that are downstream of MR and/or HDFS.
>>>>>> 
>>>>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
>>>> mahadev@hortonworks.com>
>>>>>> wrote:
>>>>>>> +1 for a separate module in hadoop-mapreduce-project. I think
>>>>>>> hadoop-mapreduce-client might not be right place for it. We might
>> have
>>>>>>> to pick a new maven module under hadoop-mapreduce-project that could
>>>>>>> host streaming/distcp/hadoop archives.
>>>>>>> 
>>>>>>> thanks
>>>>>>> mahadev
>>>>>>> 
>>>>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
>>>> tucu@cloudera.com>
>>>>>> wrote:
>>>>>>>> Agree, it should be a separate maven module.
>>>>>>>> 
>>>>>>>> And it should be under hadoop-mapreduce-client, right?
>>>>>>>> 
>>>>>>>> And now that we are in the topic, the same should go for streaming,
>>>> no?
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> Alejandro
>>>>>>>> 
>>>>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
>>>>>> wrote:
>>>>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
>>>>>>>>>> 
>>>>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
>>>>>>>>>> it's going to be maintained then we should put it in the core
>>>> code.
>>>>>>>>> 
>>>>>>>>> Now that we're all mavenized, though, a separate maven module and
>>>>>>>>> artifact does make sense IMO - ie "hadoop jar
>>>>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>>>>>>>> 
>>>>>>>>> -Todd
>>>>>>>>> --
>>>>>>>>> Todd Lipcon
>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> -Giri
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
> 


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
+ Copying common dev.

On 9/6/11 10:58 AM, "Mithun Radhakrishnan" <mi...@yahoo.com> wrote:

I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm hoping that's going to be acceptable to this forum. This way, moving it out to a separate source tree should be easier.

It would be nice to have clarity on how tools will be dealt with. It'd be convenient to have distcp in trunk. (It's tiny and useful.) On the other hand, that might be opening doors to adding too much, and complicating the build/release. I'd appreciate advice on which way is best.

In the meantime, I'll align the distcpv2 pom.xml with the maven-ized version of things, as per Tucu's suggestions.

Mithun


________________________________
From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
To: mapreduce-dev@hadoop.apache.org
Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; Mithun Radhakrishnan <mi...@yahoo.com>
Sent: Tuesday, August 30, 2011 6:13 PM
Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

As long as hadoop-tools is in some directory at some depth under trunk,
release of the hadoop-tools is tied to the release of core.

So we actually have these two options instead:
(1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
    -- Sources at tools/trunk/hadoop-distcp
    -- Each tool will work with specific version of Hadoop core.
    -- Releases can really be separate
(2) Same source tree: trunk/
    -- Sources at either (1.1) trunk/hadoop-tools or (1.2)
trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
    -- Given release isn't decoupled anyway, either will work. (1.2) is
preferable if building mapreduce builds the tools also.

+Vinod


On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> Copying common-dev.
>
> Summarizing the below discussion: What should be the tools layout after
> mavenization?
>
> Option #1: Have hadoop-tools at top level i.e
> trunk/
>   hadoop-tools/
>       hadoop-distcp/
> Pros:
>  Cleaner layout.
>  In future, tools could be released separately from  Hadoop releases
>
> Cons: Difficult to maintain
>
> Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> they are depending on MapReduce/HDFS/Common respectively.
> For ex:
> hadoop-mapreduce-project/
>   hadoop-mr-tools/
>      hadoop-distcp/
>
> Pros: Easy to maintain
> Cons: Still has tight coupling with related projects.
>
> Personally, I'm fine with any of the above options. Looking for suggestions
> and reaching a consensus on this.
>
> Thanks
> Amareshwari
>
> On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:
>
>
>
> I have a feeling this discussion should get moved to common-dev or even to
> general.
>
> My #1 question is if tools is basically contrib reborn.  If not, what makes
> it different?
>
> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>
> > Some questions on making hadoop-tools top level under trunk,
> >
> > 1.  Should the patches for tools be created against Hadoop Common?
> > 2.  What will happen to the tools test automation? Will it run as part of
> Hadoop Common tests?
> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> taken care in Mavenization?
> >
> >
> > Thanks
> > Amareshwari
> >
> > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> >
> > Please, don't add more Mavenization work on us (eventually I want to go
> back
> > to coding)
> >
> > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> >
> > What will have to be done extra (besides Mavenizing distcp) is to create
> a
> > hadoop-tools module at root level and within it a hadoop-distcp module.
> >
> > The hadoop-tools POM will look pretty much like the hadoop-common-project
> > POM.
> >
> > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Agree with Mithun and Robert. DistCp and Tools restructuring are
> separate
> >> tasks. Since DistCp code is ready to be committed, it need not wait for
> the
> >> Tools separation from MR/HDFS.
> >> I would say it can go into contrib as the patch is now, and when the
> tools
> >> restructuring happens it would be just an svn mv.  If there are no
> issues
> >> with this proposal I can commit the code tomorrow.
> >>
> >> Thanks
> >> Amareshwari
> >>
> >> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
> >>
> >> I agree with Mithun.  They are related but this goes beyond distcpv2 and
> >> should not block distcpv2 from going in.  It would be very nice,
> however, to
> >> get the layout settled soon so that we all know where to find something
> when
> >> we want to work on it.
> >>
> >> Also +1 for Alejandro's; I also prefer to keep tools at the trunk level.
> >>
> >> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
> separate
> >> modules right now, there is still tight coupling between the different
> >> pieces, especially with tests.  IMO until we can reduce that coupling we
> >> should treat building and testing Hadoop as a single project instead of
> >> trying to keep them separate.
> >>
> >> --Bobby
> >>
> >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
> mithun.radhakrishnan@yahoo.com>
> >> wrote:
> >>
> >> Would it be acceptable if retooling of tools/ were taken up separately?
> It
> >> sounds to me like this might be a distinct (albeit related) task.
> >>
> >> Mithun
> >>
> >>
> >> ________________________________
> >> From: Giridharan Kesavan <gk...@hortonworks.com>
> >> To: mapreduce-dev@hadoop.apache.org
> >> Sent: Friday, August 26, 2011 12:04 PM
> >> Subject: Re: DistCpV2 in 0.23
> >>
> >> +1 to Alejandro's
> >>
> >> I prefer to keep the hadoop-tools at trunk level.
> >>
> >> -Giri
> >>
> >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
> >> wrote:
> >>> I'd suggest putting hadoop-tools either at trunk/ level or having a
> >> tools
> >>> aggregator module for hdfs and another for common.
> >>>
> >>> I personally would prefer trunk/.
> >>>
> >>> Thanks.
> >>>
> >>> Alejandro
> >>>
> >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> >>> amarsri@yahoo-inc.com> wrote:
> >>>
> >>>> Agree. It should be separate maven module (and patch puts it as
> separate
> >>>> maven module now). And top level for hadoop tools is nice to have, but
> >> it
> >>>> becomes hard to maintain until patch automation tests run the tests
> >> under
> >>>> tools. Currently we see many times the changes in HDFS effecting RAID
> >> tests
> >>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> >>>>
> >>>> I propose we can have something like the following:
> >>>>
> >>>> trunk/
> >>>> - hadoop-mapreduce
> >>>>     - hadoop-mr-client
> >>>>     - hadoop-yarn
> >>>>     - hadoop-tools
> >>>>         - hadoop-streaming
> >>>>         - hadoop-archives
> >>>>         - hadoop-distcp
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> @Eli and @JD, we did not replace old legacy distcp because this is
> >> really a
> >>>> complete rewrite and did not want to remove it until users are
> >> familiarized
> >>>> with new one.
> >>>>
> >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
> >>>>
> >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> >>>>
> >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> >> mahadev@hortonworks.com>
> >>>> wrote:
> >>>>> +1 for a seperate module in hadoop-mapreduce-project. I think
> >>>>> hadoop-mapreduce-client might not be right place for it. We might
> have
> >>>>> to pick a new maven module under hadoop-mapreduce-project that could
> >>>>> host streaming/distcp/hadoop archives.
> >>>>>
> >>>>> thanks
> >>>>> mahadev
> >>>>>
> >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> >> tucu@cloudera.com>
> >>>> wrote:
> >>>>>> Agree, it should be a separate maven module.
> >>>>>>
> >>>>>> And it should be under hadoop-mapreduce-client, right?
> >>>>>>
> >>>>>> And now that we are in the topic, the same should go for streaming,
> >> no?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Alejandro
> >>>>>>
> >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
> >>>> wrote:
> >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> >>>>>>>>
> >>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
> >>>>>>>> it's going to be maintained then we should put it in the core
> >> code.
> >>>>>>>
> >>>>>>> Now that we're all mavenized, though, a separate maven module and
> >>>>>>> artifact does make sense IMO - ie "hadoop jar
> >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> >>>>>>>
> >>>>>>> -Todd
> >>>>>>> --
> >>>>>>> Todd Lipcon
> >>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -Giri
> >>
> >>
> >>
> >
>
>
>


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Copying common-dev.

On 9/6/11 10:58 AM, "Mithun Radhakrishnan" <mi...@yahoo.com> wrote:

I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm hoping that's going to be acceptable to this forum. This way, moving it out to a separate source tree should be easier.

It would be nice to have clarity on how tools will be dealt with. It'd be convenient to have distcp in trunk. (It's tiny and useful.) On the other hand, that might open the door to adding too much and complicating the build/release. I'd appreciate advice on which way is best.

In the meantime, I'll align the distcpv2 pom.xml with the maven-ized version of things, as per Tucu's suggestions.

Mithun


________________________________
From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
To: mapreduce-dev@hadoop.apache.org
Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; Mithun Radhakrishnan <mi...@yahoo.com>
Sent: Tuesday, August 30, 2011 6:13 PM
Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

As long as hadoop-tools is in some directory at some depth under trunk,
release of the hadoop-tools is tied to the release of core.

So we actually have these two options instead:
(1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
    -- Sources at tools/trunk/hadoop-distcp
    -- Each tool will work with a specific version of Hadoop core.
    -- Releases can really be separate
(2) Same source tree: trunk/
    -- Sources at either (2.1) trunk/hadoop-tools or (2.2)
trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
    -- Given release isn't decoupled anyway, either will work. (2.2) is
preferable if building mapreduce builds the tools also.

+Vinod


On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> Copying common-dev.
>
> Summarizing the below discussion: What should be the tools layout after
> mavenization?
>
> Option #1: Have hadoop-tools at top level i.e
> trunk/
>   hadoop-tools/
>       hadoop-distcp/
> Pros:
>  Cleaner layout.
>  In future, tools could be released separately from  Hadoop releases
>
> Cons: Difficult to maintain
>
> Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> they are depending on MapReduce/HDFS/Common respectively.
> For ex:
> hadoop-mapreduce-project/
>   hadoop-mr-tools/
>      hadoop-distcp/
>
> Pros: Easy to maintain
> Cons: Still has tight coupling with related projects.
>
> Personally, I'm fine with any of the above options. Looking for suggestions
> and reaching a consensus on this.
>
> Thanks
> Amareshwari
>
> On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:
>
>
>
> I have a feeling this discussion should get moved to common-dev or even to
> general.
>
> My #1 question is if tools is basically contrib reborn.  If not, what makes
> it different?
>
> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>
> > Some questions on making hadoop-tools top level under trunk,
> >
> > 1.  Should the patches for tools be created against Hadoop Common?
> > 2.  What will happen to the tools test automation? Will it run as part of
> Hadoop Common tests?
> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> taken care in Mavenization?
> >
> >
> > Thanks
> > Amareshwari
> >
> > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> >
> > Please, don't add more Mavenization work on us (eventually I want to go
> back
> > to coding)
> >
> > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> >
> > What will have to be done extra (besides Mavenizing distcp) is to create
> a
> > hadoop-tools module at root level and within it a hadoop-distcp module.
> >
> > The hadoop-tools POM will look pretty much like the hadoop-common-project
> > POM.
> >
> > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Agree with Mithun and Robert. DistCp and Tools restructuring are
> separate
> >> tasks. Since DistCp code is ready to be committed, it need not wait for
> the
> >> Tools separation from MR/HDFS.
> >> I would say it can go into contrib as the patch is now, and when the
> tools
> >> restructuring happens it would be just an svn mv.  If there are no
> issues
> >> with this proposal I can commit the code tomorrow.
> >>
> >> Thanks
> >> Amareshwari
> >>
> >> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
> >>
> >> I agree with Mithun.  They are related but this goes beyond distcpv2 and
> >> should not block distcpv2 from going in.  It would be very nice,
> however, to
> >> get the layout settled soon so that we all know where to find something
> when
> >> we want to work on it.
> >>
> >> Also +1 for Alejandro's I also prefer to keep tools at the trunk level.
> >>
> >> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
> separate
> >> modules right now, there is still tight coupling between the different
> >> pieces, especially with tests.  IMO until we can reduce that coupling we
> >> should treat building and testing Hadoop as a single project instead of
> >> trying to keep them separate.
> >>
> >> --Bobby
> >>
> >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
> mithun.radhakrishnan@yahoo.com>
> >> wrote:
> >>
> >> Would it be acceptable if retooling of tools/ were taken up separately?
> It
> >> sounds to me like this might be a distinct (albeit related) task.
> >>
> >> Mithun
> >>
> >>
> >> ________________________________
> >> From: Giridharan Kesavan <gk...@hortonworks.com>
> >> To: mapreduce-dev@hadoop.apache.org
> >> Sent: Friday, August 26, 2011 12:04 PM
> >> Subject: Re: DistCpV2 in 0.23
> >>
> >> +1 to Alejandro's
> >>
> >> I prefer to keep the hadoop-tools at trunk level.
> >>
> >> -Giri
> >>
> >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
> >> wrote:
> >>> I'd suggest putting hadoop-tools either at trunk/ level or having a a
> >> tools
> >>> aggregator module for hdfs and other for common.
> >>>
> >>> I personal would prefer at trunk/.
> >>>
> >>> Thanks.
> >>>
> >>> Alejandro
> >>>
> >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> >>> amarsri@yahoo-inc.com> wrote:
> >>>
> >>>> Agree. It should be separate maven module (and patch puts it as
> separate
> >>>> maven module now). And top level for hadoop tools is nice to have, but
> >> it
> >>>> becomes hard to maintain until patch automation tests run the tests
> >> under
> >>>> tools. Currently we see many times the changes in HDFS effecting RAID
> >> tests
> >>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> >>>>
> >>>> I propose we can have something like the following:
> >>>>
> >>>> trunk/
> >>>> - hadoop-mapreduce
> >>>>     - hadoop-mr-client
> >>>>     - hadoop-yarn
> >>>>     - hadoop-tools
> >>>>         - hadoop-streaming
> >>>>         - hadoop-archives
> >>>>         - hadoop-distcp
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> @Eli and @JD, we did not replace old legacy distcp because this is
> >> really a
> >>>> complete rewrite and did not want to remove it until users are
> >> familiarized
> >>>> with new one.
> >>>>
> >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
> >>>>
> >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> >>>>
> >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> >> mahadev@hortonworks.com>
> >>>> wrote:
> >>>>> +1 for a seperate module in hadoop-mapreduce-project. I think
> >>>>> hadoop-mapreduce-client might not be right place for it. We might
> have
> >>>>> to pick a new maven module under hadoop-mapreduce-project that could
> >>>>> host streaming/distcp/hadoop archives.
> >>>>>
> >>>>> thanks
> >>>>> mahadev
> >>>>>
> >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> >> tucu@cloudera.com>
> >>>> wrote:
> >>>>>> Agree, it should be a separate maven module.
> >>>>>>
> >>>>>> And it should be under hadoop-mapreduce-client, right?
> >>>>>>
> >>>>>> And now that we are in the topic, the same should go for streaming,
> >> no?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Alejandro
> >>>>>>
> >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
> >>>> wrote:
> >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> >>>>>>>>
> >>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
> >>>>>>>> it's going to be maintained then we should put it in the core
> >> code.
> >>>>>>>
> >>>>>>> Now that we're all mavenized, though, a separate maven module and
> >>>>>>> artifact does make sense IMO - ie "hadoop jar
> >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> >>>>>>>
> >>>>>>> -Todd
> >>>>>>> --
> >>>>>>> Todd Lipcon
> >>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -Giri
> >>
> >>
> >>
> >
>
>
>


Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Mithun Radhakrishnan <mi...@yahoo.com>.
I'm leaning towards creating a trunk/hadoop-tools/hadoop-distcp (etc.). I'm hoping that's going to be acceptable to this forum. This way, moving it out to a separate source tree should be easier.

It would be nice to have clarity on how tools will be dealt with. It'd be convenient to have distcp in trunk. (It's tiny and useful.) On the other hand, that might open the door to adding too much and complicating the build/release. I'd appreciate advice on which way is best.

In the meantime, I'll align the distcpv2 pom.xml with the maven-ized version of things, as per Tucu's suggestions.

Mithun
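
[Editorial note: for concreteness, a minimal sketch of what a maven-ized distcp module POM could look like if it follows the hadoop-common POM patterns, as suggested above. The parent coordinates, version, and dependency list here are assumptions for illustration, not the committed layout.]

```xml
<!-- Hypothetical hadoop-tools/hadoop-distcp/pom.xml sketch.
     Parent, version, and dependencies are assumed, not final. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-tools</artifactId>
    <version>0.23.0-SNAPSHOT</version>
  </parent>
  <artifactId>hadoop-distcp</artifactId>
  <packaging>jar</packaging>
  <dependencies>
    <!-- DistCpV2 runs as a MapReduce job, so it would depend on
         the MR client artifacts rather than being bundled in them. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
    </dependency>
  </dependencies>
</project>
```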


________________________________
From: Vinod Kumar Vavilapalli <vi...@hortonworks.com>
To: mapreduce-dev@hadoop.apache.org
Cc: "common-dev@hadoop.apache.org" <co...@hadoop.apache.org>; Mithun Radhakrishnan <mi...@yahoo.com>
Sent: Tuesday, August 30, 2011 6:13 PM
Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

As long as hadoop-tools is in some directory at some depth under trunk,
release of the hadoop-tools is tied to the release of core.

So we actually have these two options instead:
(1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
    -- Sources at tools/trunk/hadoop-distcp
    -- Each tool will work with a specific version of Hadoop core.
    -- Releases can really be separate
(2) Same source tree: trunk/
    -- Sources at either (2.1) trunk/hadoop-tools or (2.2)
trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
    -- Given release isn't decoupled anyway, either will work. (2.2) is
preferable if building mapreduce builds the tools also.

+Vinod


On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> Copying common-dev.
>
> Summarizing the below discussion: What should be the tools layout after
> mavenization?
>
> Option #1: Have hadoop-tools at top level i.e
> trunk/
>   hadoop-tools/
>       hadoop-distcp/
> Pros:
>  Cleaner layout.
>  In future, tools could be released separately from  Hadoop releases
>
> Cons: Difficult to maintain
>
> Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> they are depending on MapReduce/HDFS/Common respectively.
> For ex:
> hadoop-mapreduce-project/
>   hadoop-mr-tools/
>      hadoop-distcp/
>
> Pros: Easy to maintain
> Cons: Still has tight coupling with related projects.
>
> Personally, I'm fine with any of the above options. Looking for suggestions
> and reaching a consensus on this.
>
> Thanks
> Amareshwari
>
> On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:
>
>
>
> I have a feeling this discussion should get moved to common-dev or even to
> general.
>
> My #1 question is if tools is basically contrib reborn.  If not, what makes
> it different?
>
> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>
> > Some questions on making hadoop-tools top level under trunk,
> >
> > 1.  Should the patches for tools be created against Hadoop Common?
> > 2.  What will happen to the tools test automation? Will it run as part of
> Hadoop Common tests?
> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> taken care in Mavenization?
> >
> >
> > Thanks
> > Amareshwari
> >
> > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> >
> > Please, don't add more Mavenization work on us (eventually I want to go
> back
> > to coding)
> >
> > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> >
> > What will have to be done extra (besides Mavenizing distcp) is to create
> a
> > hadoop-tools module at root level and within it a hadoop-distcp module.
> >
> > The hadoop-tools POM will look pretty much like the hadoop-common-project
> > POM.
> >
> > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Agree with Mithun and Robert. DistCp and Tools restructuring are
> separate
> >> tasks. Since DistCp code is ready to be committed, it need not wait for
> the
> >> Tools separation from MR/HDFS.
> >> I would say it can go into contrib as the patch is now, and when the
> tools
> >> restructuring happens it would be just an svn mv.  If there are no
> issues
> >> with this proposal I can commit the code tomorrow.
> >>
> >> Thanks
> >> Amareshwari
> >>
> >> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
> >>
> >> I agree with Mithun.  They are related but this goes beyond distcpv2 and
> >> should not block distcpv2 from going in.  It would be very nice,
> however, to
> >> get the layout settled soon so that we all know where to find something
> when
> >> we want to work on it.
> >>
> >> Also +1 for Alejandro's I also prefer to keep tools at the trunk level.
> >>
> >> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
> separate
> >> modules right now, there is still tight coupling between the different
> >> pieces, especially with tests.  IMO until we can reduce that coupling we
> >> should treat building and testing Hadoop as a single project instead of
> >> trying to keep them separate.
> >>
> >> --Bobby
> >>
> >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
> mithun.radhakrishnan@yahoo.com>
> >> wrote:
> >>
> >> Would it be acceptable if retooling of tools/ were taken up separately?
> It
> >> sounds to me like this might be a distinct (albeit related) task.
> >>
> >> Mithun
> >>
> >>
> >> ________________________________
> >> From: Giridharan Kesavan <gk...@hortonworks.com>
> >> To: mapreduce-dev@hadoop.apache.org
> >> Sent: Friday, August 26, 2011 12:04 PM
> >> Subject: Re: DistCpV2 in 0.23
> >>
> >> +1 to Alejandro's
> >>
> >> I prefer to keep the hadoop-tools at trunk level.
> >>
> >> -Giri
> >>
> >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
> >> wrote:
> >>> I'd suggest putting hadoop-tools either at trunk/ level or having a a
> >> tools
> >>> aggregator module for hdfs and other for common.
> >>>
> >>> I personal would prefer at trunk/.
> >>>
> >>> Thanks.
> >>>
> >>> Alejandro
> >>>
> >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> >>> amarsri@yahoo-inc.com> wrote:
> >>>
> >>>> Agree. It should be separate maven module (and patch puts it as
> separate
> >>>> maven module now). And top level for hadoop tools is nice to have, but
> >> it
> >>>> becomes hard to maintain until patch automation tests run the tests
> >> under
> >>>> tools. Currently we see many times the changes in HDFS effecting RAID
> >> tests
> >>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> >>>>
> >>>> I propose we can have something like the following:
> >>>>
> >>>> trunk/
> >>>> - hadoop-mapreduce
> >>>>     - hadoop-mr-client
> >>>>     - hadoop-yarn
> >>>>     - hadoop-tools
> >>>>         - hadoop-streaming
> >>>>         - hadoop-archives
> >>>>         - hadoop-distcp
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> @Eli and @JD, we did not replace old legacy distcp because this is
> >> really a
> >>>> complete rewrite and did not want to remove it until users are
> >> familiarized
> >>>> with new one.
> >>>>
> >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
> >>>>
> >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> >>>>
> >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> >> mahadev@hortonworks.com>
> >>>> wrote:
> >>>>> +1 for a seperate module in hadoop-mapreduce-project. I think
> >>>>> hadoop-mapreduce-client might not be right place for it. We might
> have
> >>>>> to pick a new maven module under hadoop-mapreduce-project that could
> >>>>> host streaming/distcp/hadoop archives.
> >>>>>
> >>>>> thanks
> >>>>> mahadev
> >>>>>
> >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> >> tucu@cloudera.com>
> >>>> wrote:
> >>>>>> Agree, it should be a separate maven module.
> >>>>>>
> >>>>>> And it should be under hadoop-mapreduce-client, right?
> >>>>>>
> >>>>>> And now that we are in the topic, the same should go for streaming,
> >> no?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Alejandro
> >>>>>>
> >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
> >>>> wrote:
> >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> >>>>>>>>
> >>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
> >>>>>>>> it's going to be maintained then we should put it in the core
> >> code.
> >>>>>>>
> >>>>>>> Now that we're all mavenized, though, a separate maven module and
> >>>>>>> artifact does make sense IMO - ie "hadoop jar
> >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> >>>>>>>
> >>>>>>> -Todd
> >>>>>>> --
> >>>>>>> Todd Lipcon
> >>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -Giri
> >>
> >>
> >>
> >
>
>
>

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
As long as hadoop-tools is in some directory at some depth under trunk,
release of the hadoop-tools is tied to the release of core.

So we actually have these two options instead:
(1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
    -- Sources at tools/trunk/hadoop-distcp
    -- Each tool will work with a specific version of Hadoop core.
    -- Releases can really be separate
(2) Same source tree: trunk/
    -- Sources at either (2.1) trunk/hadoop-tools or (2.2)
trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
    -- Given release isn't decoupled anyway, either will work. (2.2) is
preferable if building mapreduce builds the tools also.

+Vinod
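
[Editorial note: under the same-source-tree option, a top-level trunk/hadoop-tools aggregator POM would be little more than a module list, much like hadoop-common-project's aggregator POM referenced earlier in the thread. The sketch below is illustrative; the groupId, version, and module names are assumptions.]

```xml
<!-- Hypothetical trunk/hadoop-tools/pom.xml aggregator, modeled on
     the hadoop-common-project aggregator pattern. Illustrative only. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-tools</artifactId>
  <version>0.23.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <!-- Each tool builds as its own jar artifact under this aggregator. -->
    <module>hadoop-distcp</module>
    <module>hadoop-streaming</module>
    <module>hadoop-archives</module>
  </modules>
</project>
```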


On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> Copying common-dev.
>
> Summarizing the below discussion: What should be the tools layout after
> mavenization?
>
> Option #1: Have hadoop-tools at top level i.e
> trunk/
>   hadoop-tools/
>       hadoop-distcp/
> Pros:
>  Cleaner layout.
>  In future, tools could be released separately from  Hadoop releases
>
> Cons: Difficult to maintain
>
> Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> they are depending on MapReduce/HDFS/Common respectively.
> For ex:
> hadoop-mapreduce-project/
>   hadoop-mr-tools/
>      hadoop-distcp/
>
> Pros: Easy to maintain
> Cons: Still has tight coupling with related projects.
>
> Personally, I'm fine with any of the above options. Looking for suggestions
> and reaching a consensus on this.
>
> Thanks
> Amareshwari
>
> On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:
>
>
>
> I have a feeling this discussion should get moved to common-dev or even to
> general.
>
> My #1 question is if tools is basically contrib reborn.  If not, what makes
> it different?
>
> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>
> > Some questions on making hadoop-tools top level under trunk,
> >
> > 1.  Should the patches for tools be created against Hadoop Common?
> > 2.  What will happen to the tools test automation? Will it run as part of
> Hadoop Common tests?
> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> taken care in Mavenization?
> >
> >
> > Thanks
> > Amareshwari
> >
> > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> >
> > Please, don't add more Mavenization work on us (eventually I want to go
> back
> > to coding)
> >
> > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> >
> > What will have to be done extra (besides Mavenizing distcp) is to create
> a
> > hadoop-tools module at root level and within it a hadoop-distcp module.
> >
> > The hadoop-tools POM will look pretty much like the hadoop-common-project
> > POM.
> >
> > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Agree with Mithun and Robert. DistCp and Tools restructuring are
> separate
> >> tasks. Since DistCp code is ready to be committed, it need not wait for
> the
> >> Tools separation from MR/HDFS.
> >> I would say it can go into contrib as the patch is now, and when the
> tools
> >> restructuring happens it would be just an svn mv.  If there are no
> issues
> >> with this proposal I can commit the code tomorrow.
> >>
> >> Thanks
> >> Amareshwari
> >>
> >> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
> >>
> >> I agree with Mithun.  They are related but this goes beyond distcpv2 and
> >> should not block distcpv2 from going in.  It would be very nice,
> however, to
> >> get the layout settled soon so that we all know where to find something
> when
> >> we want to work on it.
> >>
> >> Also +1 for Alejandro's I also prefer to keep tools at the trunk level.
> >>
> >> Even though HDFS, Common, and Mapreduce and perhaps soon tools are
> separate
> >> modules right now, there is still tight coupling between the different
> >> pieces, especially with tests.  IMO until we can reduce that coupling we
> >> should treat building and testing Hadoop as a single project instead of
> >> trying to keep them separate.
> >>
> >> --Bobby
> >>
> >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
> mithun.radhakrishnan@yahoo.com>
> >> wrote:
> >>
> >> Would it be acceptable if retooling of tools/ were taken up separately?
> It
> >> sounds to me like this might be a distinct (albeit related) task.
> >>
> >> Mithun
> >>
> >>
> >> ________________________________
> >> From: Giridharan Kesavan <gk...@hortonworks.com>
> >> To: mapreduce-dev@hadoop.apache.org
> >> Sent: Friday, August 26, 2011 12:04 PM
> >> Subject: Re: DistCpV2 in 0.23
> >>
> >> +1 to Alejandro's
> >>
> >> I prefer to keep the hadoop-tools at trunk level.
> >>
> >> -Giri
> >>
> >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
> >> wrote:
> >>> I'd suggest putting hadoop-tools either at trunk/ level or having a a
> >> tools
> >>> aggregator module for hdfs and other for common.
> >>>
> >>> I personal would prefer at trunk/.
> >>>
> >>> Thanks.
> >>>
> >>> Alejandro
> >>>
> >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> >>> amarsri@yahoo-inc.com> wrote:
> >>>
> >>>> Agree. It should be separate maven module (and patch puts it as
> separate
> >>>> maven module now). And top level for hadoop tools is nice to have, but
> >> it
> >>>> becomes hard to maintain until patch automation tests run the tests
> >> under
> >>>> tools. Currently we see many times the changes in HDFS effecting RAID
> >> tests
> >>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> >>>>
> >>>> I propose we can have something like the following:
> >>>>
> >>>> trunk/
> >>>> - hadoop-mapreduce
> >>>>     - hadoop-mr-client
> >>>>     - hadoop-yarn
> >>>>     - hadoop-tools
> >>>>         - hadoop-streaming
> >>>>         - hadoop-archives
> >>>>         - hadoop-distcp
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> @Eli and @JD, we did not replace old legacy distcp because this is
> >> really a
> >>>> complete rewrite and did not want to remove it until users are
> >> familiarized
> >>>> with new one.
> >>>>
> >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
> >>>>
> >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> >>>>
> >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> >> mahadev@hortonworks.com>
> >>>> wrote:
> >>>>> +1 for a seperate module in hadoop-mapreduce-project. I think
> >>>>> hadoop-mapreduce-client might not be right place for it. We might
> have
> >>>>> to pick a new maven module under hadoop-mapreduce-project that could
> >>>>> host streaming/distcp/hadoop archives.
> >>>>>
> >>>>> thanks
> >>>>> mahadev
> >>>>>
> >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> >> tucu@cloudera.com>
> >>>> wrote:
> >>>>>> Agree, it should be a separate maven module.
> >>>>>>
> >>>>>> And it should be under hadoop-mapreduce-client, right?
> >>>>>>
> >>>>>> And now that we are in the topic, the same should go for streaming,
> >> no?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Alejandro
> >>>>>>
> >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
> >>>> wrote:
> >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> >>>>>>>>
> >>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
> >>>>>>>> it's going to be maintained then we should put it in the core
> >> code.
> >>>>>>>
> >>>>>>> Now that we're all mavenized, though, a separate maven module and
> >>>>>>> artifact does make sense IMO - ie "hadoop jar
> >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> >>>>>>>
> >>>>>>> -Todd
> >>>>>>> --
> >>>>>>> Todd Lipcon
> >>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -Giri
> >>
> >>
> >>
> >
>
>
>

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
As long as hadoop-tools is in some directory at some depth under trunk,
release of the hadoop-tools is tied to the release of core.

So we actually have these two options instead:
(1) Separate source tree (http://svn.apache.org/repos/asf/hadoop/tools)
    -- Sources at tools/trunk/hadoop-distcp
    -- Each tool will work with specific version of Hadoop core.
    -- Releases can really be separate
(2) Same source tree: trunk/
    -- Sources at either (1.1) trunk/hadoop-tools or (1.2)
trunk/hadoop-mapreduce-project/hadoop-mr-tools/hadoop-distcp/
    -- Given release isn't decoupled anyway, either will work. (1.2) is
prefereable if building mapreduce builds the tools also.

+Vinod


On Tue, Aug 30, 2011 at 1:31 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

> Copying common-dev.
>
> Summarizing the below discussion: What should be the tools layout after
> mavenization?
>
> Option #1: Have hadoop-tools at top level i.e
> trunk/
>   hadoop-tools/
>       hadoop-distcp/
> Pros:
>  Cleaner layout.
>  In future, tools could be released separately from  Hadoop releases
>
> Cons: Difficult to maintain
>
> Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if
> they are depending on MapReduce/HDFS/Common respectively.
> For ex:
> hadoop-mapreduce-project/
>   hadoop-mr-tools/
>      hadoop-distcp/
>
> Pros: Easy to maintain
> Cons: Still has tight coupling with related projects.
>
> Personally, I'm fine with any of the above options. Looking for suggestions
> and reaching a consensus on this.
>
> Thanks
> Amareshwari
>
> On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:
>
>
>
> I have a feeling this discussion should get moved to common-dev or even to
> general.
>
> My #1 question is if tools is basically contrib reborn.  If not, what makes
> it different?
>
> On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:
>
> > Some questions on making hadoop-tools top level under trunk,
> >
> > 1.  Should the patches for tools be created against Hadoop Common?
> > 2.  What will happen to the tools test automation? Will it run as part of
> Hadoop Common tests?
> > 3.  Will it introduce a dependency from MapReduce to Common? Or is this
> taken care in Mavenization?
> >
> >
> > Thanks
> > Amareshwari
> >
> > On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> >
> > Please, don't add more Mavenization work on us (eventually I want to go
> back
> > to coding)
> >
> > Given that Hadoop is already Mavenized, the patch should be Mavenized.
> >
> > What will have to be done extra (besides Mavenizing distcp) is to create
> a
> > hadoop-tools module at root level and within it a hadoop-distcp module.
> >
> > The hadoop-tools POM will look pretty much like the hadoop-common-project
> > POM.
> >
> > The hadoop-distcp POM should follow the hadoop-common POM patterns.
> >
> > Thanks.
> >
> > Alejandro
> >
> > On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> > amarsri@yahoo-inc.com> wrote:
> >
> >> Agree with Mithun and Robert. DistCp and Tools restructuring are
> separate
> >> tasks. Since DistCp code is ready to be committed, it need not wait for
> the
> >> Tools separation from MR/HDFS.
> >> I would say it can go into contrib as the patch is now, and when the
> tools
> >> restructuring happens it would be just an svn mv.  If there are no
> issues
> >> with this proposal I can commit the code tomorrow.
> >>
> >> Thanks
> >> Amareshwari
> >>
> >> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
> >>
> >> I agree with Mithun.  They are related but this goes beyond distcpv2 and
> >> should not block distcpv2 from going in.  It would be very nice,
> however, to
> >> get the layout settled soon so that we all know where to find something
> when
> >> we want to work on it.
> >>
> >> Also +1 for Alejandro's; I also prefer to keep tools at the trunk level.
> >>
> >> Even though HDFS, Common, and MapReduce, and perhaps soon tools, are
> separate
> >> modules right now, there is still tight coupling between the different
> >> pieces, especially with tests.  IMO until we can reduce that coupling we
> >> should treat building and testing Hadoop as a single project instead of
> >> trying to keep them separate.
> >>
> >> --Bobby
> >>
> >> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <
> mithun.radhakrishnan@yahoo.com>
> >> wrote:
> >>
> >> Would it be acceptable if retooling of tools/ were taken up separately?
> It
> >> sounds to me like this might be a distinct (albeit related) task.
> >>
> >> Mithun
> >>
> >>
> >> ________________________________
> >> From: Giridharan Kesavan <gk...@hortonworks.com>
> >> To: mapreduce-dev@hadoop.apache.org
> >> Sent: Friday, August 26, 2011 12:04 PM
> >> Subject: Re: DistCpV2 in 0.23
> >>
> >> +1 to Alejandro's
> >>
> >> I prefer to keep the hadoop-tools at trunk level.
> >>
> >> -Giri
> >>
> >> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
> >> wrote:
> >>> I'd suggest putting hadoop-tools either at trunk/ level or having a
> >> tools
> >>> aggregator module for hdfs and another for common.
> >>>
> >>> I personally would prefer it at trunk/.
> >>>
> >>> Thanks.
> >>>
> >>> Alejandro
> >>>
> >>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
> >>> amarsri@yahoo-inc.com> wrote:
> >>>
> >>>> Agree. It should be separate maven module (and patch puts it as
> separate
> >>>> maven module now). And top level for hadoop tools is nice to have, but
> >> it
> >>>> becomes hard to maintain until patch automation tests run the tests
> >> under
> >>>> tools. Currently we often see changes in HDFS affecting RAID
> >> tests
> >>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
> >>>>
> >>>> I propose we can have something like the following:
> >>>>
> >>>> trunk/
> >>>> - hadoop-mapreduce
> >>>>     - hadoop-mr-client
> >>>>     - hadoop-yarn
> >>>>     - hadoop-tools
> >>>>         - hadoop-streaming
> >>>>         - hadoop-archives
> >>>>         - hadoop-distcp
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> @Eli and @JD, we did not replace old legacy distcp because this is
> >> really a
> >>>> complete rewrite and did not want to remove it until users are
> >> familiarized
> >>>> with new one.
> >>>>
> >>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
> >>>>
> >>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
> >>>> in there as well - ie tools that are downstream of MR and/or HDFS.
> >>>>
> >>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
> >> mahadev@hortonworks.com>
> >>>> wrote:
> >>>>> +1 for a separate module in hadoop-mapreduce-project. I think
> >>>>> hadoop-mapreduce-client might not be right place for it. We might
> have
> >>>>> to pick a new maven module under hadoop-mapreduce-project that could
> >>>>> host streaming/distcp/hadoop archives.
> >>>>>
> >>>>> thanks
> >>>>> mahadev
> >>>>>
> >>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
> >> tucu@cloudera.com>
> >>>> wrote:
> >>>>>> Agree, it should be a separate maven module.
> >>>>>>
> >>>>>> And it should be under hadoop-mapreduce-client, right?
> >>>>>>
> >>>>>> And now that we are in the topic, the same should go for streaming,
> >> no?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> Alejandro
> >>>>>>
> >>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
> >>>> wrote:
> >>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
> >>>>>>>>
> >>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
> >>>>>>>> it's going to be maintained then we should put it in the core
> >> code.
> >>>>>>>
> >>>>>>> Now that we're all mavenized, though, a separate maven module and
> >>>>>>> artifact does make sense IMO - ie "hadoop jar
> >>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
> >>>>>>>
> >>>>>>> -Todd
> >>>>>>> --
> >>>>>>> Todd Lipcon
> >>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -Giri
> >>
> >>
> >>
> >
>
>
>

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Copying common-dev.

Summarizing the below discussion: What should be the tools layout after mavenization?

Option #1: Have hadoop-tools at top level i.e
trunk/
   hadoop-tools/
       hadoop-distcp/
Pros:
 Cleaner layout.
 In future, tools could be released separately from  Hadoop releases

Cons: Difficult to maintain

Option #2: Keep the tools aggregator module for MapReduce/HDFS/Common if they are depending on MapReduce/HDFS/Common respectively.
For ex:
hadoop-mapreduce-project/
   hadoop-mr-tools/
      hadoop-distcp/

Pros: Easy to maintain
Cons: Still has tight coupling with related projects.

Personally, I'm fine with any of the above options. Looking for suggestions and reaching a consensus on this.

Thanks
Amareshwari
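
To make option #1 concrete, the hadoop-distcp leaf module would carry an ordinary jar POM following the hadoop-common POM patterns. A hypothetical sketch only — the coordinates, scopes, and version below are assumptions:

```xml
<!-- Hypothetical hadoop-tools/hadoop-distcp/pom.xml (option #1 layout).
     Dependency coordinates and scopes are illustrative. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-tools</artifactId>
    <version>0.23.0-SNAPSHOT</version>
  </parent>
  <artifactId>hadoop-distcp</artifactId>
  <packaging>jar</packaging>
  <dependencies>
    <!-- DistCp drives MapReduce jobs over HDFS, so it sits downstream
         of the core client artifacts. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
```

The resulting artifact would then be launched roughly as Todd suggests further down the thread, e.g. `hadoop jar hadoop-distcp-0.23.0-SNAPSHOT.jar <src> <dst>`.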

On 8/30/11 12:10 AM, "Allen Wittenauer" <aw...@apache.org> wrote:



I have a feeling this discussion should get moved to common-dev or even to general.

My #1 question is if tools is basically contrib reborn.  If not, what makes it different?

On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:

> Some questions on making hadoop-tools top level under trunk,
>
> 1.  Should the patches for tools be created against Hadoop Common?
> 2.  What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
> 3.  Will it introduce a dependency from MapReduce to Common? Or is this taken care of in Mavenization?
>
>
> Thanks
> Amareshwari
>
> On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
>
> Please, don't add more Mavenization work on us (eventually I want to go back
> to coding)
>
> Given that Hadoop is already Mavenized, the patch should be Mavenized.
>
> What will have to be done extra (besides Mavenizing distcp) is to create a
> hadoop-tools module at root level and within it a hadoop-distcp module.
>
> The hadoop-tools POM will look pretty much like the hadoop-common-project
> POM.
>
> The hadoop-distcp POM should follow the hadoop-common POM patterns.
>
> Thanks.
>
> Alejandro
>
> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
>
>> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
>> tasks. Since DistCp code is ready to be committed, it need not wait for the
>> Tools separation from MR/HDFS.
>> I would say it can go into contrib as the patch is now, and when the tools
>> restructuring happens it would be just an svn mv.  If there are no issues
>> with this proposal I can commit the code tomorrow.
>>
>> Thanks
>> Amareshwari
>>
>> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
>>
>> I agree with Mithun.  They are related but this goes beyond distcpv2 and
>> should not block distcpv2 from going in.  It would be very nice, however, to
>> get the layout settled soon so that we all know where to find something when
>> we want to work on it.
>>
>> Also +1 for Alejandro's; I also prefer to keep tools at the trunk level.
>>
>> Even though HDFS, Common, and MapReduce, and perhaps soon tools, are separate
>> modules right now, there is still tight coupling between the different
>> pieces, especially with tests.  IMO until we can reduce that coupling we
>> should treat building and testing Hadoop as a single project instead of
>> trying to keep them separate.
>>
>> --Bobby
>>
>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mi...@yahoo.com>
>> wrote:
>>
>> Would it be acceptable if retooling of tools/ were taken up separately? It
>> sounds to me like this might be a distinct (albeit related) task.
>>
>> Mithun
>>
>>
>> ________________________________
>> From: Giridharan Kesavan <gk...@hortonworks.com>
>> To: mapreduce-dev@hadoop.apache.org
>> Sent: Friday, August 26, 2011 12:04 PM
>> Subject: Re: DistCpV2 in 0.23
>>
>> +1 to Alejandro's
>>
>> I prefer to keep the hadoop-tools at trunk level.
>>
>> -Giri
>>
>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
>> wrote:
>>> I'd suggest putting hadoop-tools either at trunk/ level or having a
>> tools
>>> aggregator module for hdfs and another for common.
>>>
>>> I personally would prefer it at trunk/.
>>>
>>> Thanks.
>>>
>>> Alejandro
>>>
>>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>>> amarsri@yahoo-inc.com> wrote:
>>>
>>>> Agree. It should be separate maven module (and patch puts it as separate
>>>> maven module now). And top level for hadoop tools is nice to have, but
>> it
>>>> becomes hard to maintain until patch automation tests run the tests
>> under
>>>> tools. Currently we often see changes in HDFS affecting RAID
>> tests
>>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>>>>
>>>> I propose we can have something like the following:
>>>>
>>>> trunk/
>>>> - hadoop-mapreduce
>>>>     - hadoop-mr-client
>>>>     - hadoop-yarn
>>>>     - hadoop-tools
>>>>         - hadoop-streaming
>>>>         - hadoop-archives
>>>>         - hadoop-distcp
>>>>
>>>> Thoughts?
>>>>
>>>> @Eli and @JD, we did not replace old legacy distcp because this is
>> really a
>>>> complete rewrite and did not want to remove it until users are
>> familiarized
>>>> with new one.
>>>>
>>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
>>>>
>>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>>>> in there as well - ie tools that are downstream of MR and/or HDFS.
>>>>
>>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
>> mahadev@hortonworks.com>
>>>> wrote:
>>>>> +1 for a separate module in hadoop-mapreduce-project. I think
>>>>> hadoop-mapreduce-client might not be right place for it. We might have
>>>>> to pick a new maven module under hadoop-mapreduce-project that could
>>>>> host streaming/distcp/hadoop archives.
>>>>>
>>>>> thanks
>>>>> mahadev
>>>>>
>>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
>> tucu@cloudera.com>
>>>> wrote:
>>>>>> Agree, it should be a separate maven module.
>>>>>>
>>>>>> And it should be under hadoop-mapreduce-client, right?
>>>>>>
>>>>>> And now that we are in the topic, the same should go for streaming,
>> no?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Alejandro
>>>>>>
>>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
>>>> wrote:
>>>>>>
>>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
>>>> wrote:
>>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
>>>>>>>>
>>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
>>>>>>>> it's going to be maintained then we should put it in the core
>> code.
>>>>>>>
>>>>>>> Now that we're all mavenized, though, a separate maven module and
>>>>>>> artifact does make sense IMO - ie "hadoop jar
>>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>>>>>>
>>>>>>> -Todd
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> -Giri
>>
>>
>>
>



Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Posted by Allen Wittenauer <aw...@apache.org>.
I have a feeling this discussion should get moved to common-dev or even to general.

My #1 question is if tools is basically contrib reborn.  If not, what makes it different?

On Aug 29, 2011, at 1:43 AM, Amareshwari Sri Ramadasu wrote:

> Some questions on making hadoop-tools top level under trunk,
> 
> 1.  Should the patches for tools be created against Hadoop Common?
> 2.  What will happen to the tools test automation? Will it run as part of Hadoop Common tests?
> 3.  Will it introduce a dependency from MapReduce to Common? Or is this taken care of in Mavenization?
> 
> 
> Thanks
> Amareshwari
> 
> On 8/26/11 10:17 PM, "Alejandro Abdelnur" <tu...@cloudera.com> wrote:
> 
> Please, don't add more Mavenization work on us (eventually I want to go back
> to coding)
> 
> Given that Hadoop is already Mavenized, the patch should be Mavenized.
> 
> What will have to be done extra (besides Mavenizing distcp) is to create a
> hadoop-tools module at root level and within it a hadoop-distcp module.
> 
> The hadoop-tools POM will look pretty much like the hadoop-common-project
> POM.
> 
> The hadoop-distcp POM should follow the hadoop-common POM patterns.
> 
> Thanks.
> 
> Alejandro
> 
> On Fri, Aug 26, 2011 at 9:37 AM, Amareshwari Sri Ramadasu <
> amarsri@yahoo-inc.com> wrote:
> 
>> Agree with Mithun and Robert. DistCp and Tools restructuring are separate
>> tasks. Since DistCp code is ready to be committed, it need not wait for the
>> Tools separation from MR/HDFS.
>> I would say it can go into contrib as the patch is now, and when the tools
>> restructuring happens it would be just an svn mv.  If there are no issues
>> with this proposal I can commit the code tomorrow.
>> 
>> Thanks
>> Amareshwari
>> 
>> On 8/26/11 7:45 PM, "Robert Evans" <ev...@yahoo-inc.com> wrote:
>> 
>> I agree with Mithun.  They are related but this goes beyond distcpv2 and
>> should not block distcpv2 from going in.  It would be very nice, however, to
>> get the layout settled soon so that we all know where to find something when
>> we want to work on it.
>> 
>> Also +1 for Alejandro's; I also prefer to keep tools at the trunk level.
>> 
>> Even though HDFS, Common, and MapReduce, and perhaps soon tools, are separate
>> modules right now, there is still tight coupling between the different
>> pieces, especially with tests.  IMO until we can reduce that coupling we
>> should treat building and testing Hadoop as a single project instead of
>> trying to keep them separate.
>> 
>> --Bobby
>> 
>> On 8/26/11 7:45 AM, "Mithun Radhakrishnan" <mi...@yahoo.com>
>> wrote:
>> 
>> Would it be acceptable if retooling of tools/ were taken up separately? It
>> sounds to me like this might be a distinct (albeit related) task.
>> 
>> Mithun
>> 
>> 
>> ________________________________
>> From: Giridharan Kesavan <gk...@hortonworks.com>
>> To: mapreduce-dev@hadoop.apache.org
>> Sent: Friday, August 26, 2011 12:04 PM
>> Subject: Re: DistCpV2 in 0.23
>> 
>> +1 to Alejandro's
>> 
>> I prefer to keep the hadoop-tools at trunk level.
>> 
>> -Giri
>> 
>> On Thu, Aug 25, 2011 at 9:15 PM, Alejandro Abdelnur <tu...@cloudera.com>
>> wrote:
>>> I'd suggest putting hadoop-tools either at trunk/ level or having a
>> tools
>>> aggregator module for hdfs and another for common.
>>> 
>>> I personally would prefer it at trunk/.
>>> 
>>> Thanks.
>>> 
>>> Alejandro
>>> 
>>> On Thu, Aug 25, 2011 at 9:06 PM, Amareshwari Sri Ramadasu <
>>> amarsri@yahoo-inc.com> wrote:
>>> 
>>>> Agree. It should be separate maven module (and patch puts it as separate
>>>> maven module now). And top level for hadoop tools is nice to have, but
>> it
>>>> becomes hard to maintain until patch automation tests run the tests
>> under
>>>> tools. Currently we often see changes in HDFS affecting RAID
>> tests
>>>> in MapReduce. So, I'm fine putting the tools under hadoop-mapreduce.
>>>> 
>>>> I propose we can have something like the following:
>>>> 
>>>> trunk/
>>>> - hadoop-mapreduce
>>>>     - hadoop-mr-client
>>>>     - hadoop-yarn
>>>>     - hadoop-tools
>>>>         - hadoop-streaming
>>>>         - hadoop-archives
>>>>         - hadoop-distcp
>>>> 
>>>> Thoughts?
>>>> 
>>>> @Eli and @JD, we did not replace old legacy distcp because this is
>> really a
>>>> complete rewrite and did not want to remove it until users are
>> familiarized
>>>> with new one.
>>>> 
>>>> On 8/26/11 12:51 AM, "Todd Lipcon" <to...@cloudera.com> wrote:
>>>> 
>>>> Maybe a separate toplevel for hadoop-tools? Stuff like RAID could go
>>>> in there as well - ie tools that are downstream of MR and/or HDFS.
>>>> 
>>>> On Thu, Aug 25, 2011 at 12:09 PM, Mahadev Konar <
>> mahadev@hortonworks.com>
>>>> wrote:
>>>>> +1 for a separate module in hadoop-mapreduce-project. I think
>>>>> hadoop-mapreduce-client might not be right place for it. We might have
>>>>> to pick a new maven module under hadoop-mapreduce-project that could
>>>>> host streaming/distcp/hadoop archives.
>>>>> 
>>>>> thanks
>>>>> mahadev
>>>>> 
>>>>> On Thu, Aug 25, 2011 at 11:04 AM, Alejandro Abdelnur <
>> tucu@cloudera.com>
>>>> wrote:
>>>>>> Agree, it should be a separate maven module.
>>>>>> 
>>>>>> And it should be under hadoop-mapreduce-client, right?
>>>>>> 
>>>>>> And now that we are in the topic, the same should go for streaming,
>> no?
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> Alejandro
>>>>>> 
>>>>>> On Thu, Aug 25, 2011 at 10:58 AM, Todd Lipcon <to...@cloudera.com>
>>>> wrote:
>>>>>> 
>>>>>>> On Thu, Aug 25, 2011 at 10:36 AM, Eli Collins <el...@cloudera.com>
>>>> wrote:
>>>>>>>> Nice work!   I definitely think this should go in 23 and 20x.
>>>>>>>> 
>>>>>>>> Agree with JD that it should be in the core code, not contrib.  If
>>>>>>>> it's going to be maintained then we should put it in the core
>> code.
>>>>>>> 
>>>>>>> Now that we're all mavenized, though, a separate maven module and
>>>>>>> artifact does make sense IMO - ie "hadoop jar
>>>>>>> hadoop-distcp-0.23.0-SNAPSHOT" rather than "hadoop distcp"
>>>>>>> 
>>>>>>> -Todd
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> -Giri
>> 
>> 
>> 
>