Posted to common-dev@hadoop.apache.org by Alejandro Abdelnur <tu...@cloudera.com> on 2011/10/18 21:41:42 UTC

Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)

Following up on this one: the hadoop-tools/ module is already in trunk, so
the distcp v2 addition can start.

Thanks.

Alejandro

On Mon, Sep 12, 2011 at 6:47 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> Alright, I think we've discussed enough on this and everybody seems to
> agree about a top-level hadoop-tools module.
>
> Time to get into action. I've filed HADOOP-7624. Amareshwari, we can
> track the rest of the implementation-related details, and the questions
> awaiting your specific answers, there.
>
> Thanks everyone for putting in your thoughts here.
> +Vinod
>
>
> On Fri, Sep 9, 2011 at 10:55 AM, Rottinghuis, Joep <jrottinghuis@ebay.com>
> wrote:
>
> > If hadoop-tools will be built as part of hadoop-common, then none of
> > these tools should be allowed to have a dependency on hdfs or mapreduce.
> > The converse is also true: when tools do have any such dependency, they
> > cannot be built as part of hadoop-common.
> > We cannot have circular dependencies like that.
> >
> > That is probably obvious, but I'm just saying...
> >
> > Joep
> > ________________________________________
> > From: Amareshwari Sri Ramadasu [amarsri@yahoo-inc.com]
> > Sent: Wednesday, September 07, 2011 9:33 PM
> > To: mapreduce-dev@hadoop.apache.org
> > Cc: common-dev@hadoop.apache.org
> > Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
> >
> > It is good to have a separate hadoop-tools module. But, as I asked
> > before, we need to answer some questions here. I'm trying to answer them
> > myself. Comments are welcome.
> >
> > > > 1.  Should the patches for tools be created against Hadoop Common?
> > Here, I meant: should the Hadoop Common mailing list be used, or should
> > we have a separate mailing list for Tools? I agree with Vinod here that
> > we can tie it to the Hadoop Common JIRA/mailing lists.
> >
> > > > 2.  What will happen to the tools test automation? Will it run as
> > > > part of Hadoop Common tests?
> > Jenkins nightly/patch builds for Hadoop tools can run as part of Hadoop
> > Common if we use the Hadoop Common mailing list for this.
> > Also, I propose that every patch build of HDFS and MAPREDUCE also run the
> > tools tests to make sure nothing is broken. That would ease the
> > maintenance of the hadoop-tools module. I presume the tools tests should
> > not take much time (something like no more than 30 minutes).
> >
> > > > 3.  Will it introduce a dependency from MapReduce to Common? Or is
> > > > this taken care of in Mavenization?
> > I'm not sure whether Mavenization can take care of it.
> >
> > Thanks
> > Amareshwari
> >
> > On 9/8/11 9:13 AM, "Rottinghuis, Joep" <jr...@ebay.com> wrote:
> >
> > Does a separate hadoop-tools module imply that there will be a separate
> > Jenkins build as well?
> >
> > Thanks,
> >
> > Joep
> > ________________________________________
> > From: Alejandro Abdelnur [tucu@cloudera.com]
> > Sent: Wednesday, September 07, 2011 11:35 AM
> > To: mapreduce-dev@hadoop.apache.org
> > Subject: Re: Hadoop Tools Layout (was Re: DistCpV2 in 0.23)
> >
> > Makes sense
> >
> > On Wed, Sep 7, 2011 at 11:32 AM, <Mi...@emc.com> wrote:
> >
> > > +1 for a separate hadoop-tools module. However, if a tool is broken at
> > > release time and no one comes forward to fix it, it should be removed
> > > (i.e., unlike contrib modules, where build and test failures were
> > > tolerated).
> > >
> > > - milind
> > >
> > > On 9/7/11 11:27 AM, "Mahadev Konar" <ma...@hortonworks.com> wrote:
> > >
> > > >I like the idea of having tools as a separate module, and I don't think
> > > >that it will be a dumping ground unless we choose to make it one.
> > > >
> > > >+1 for a hadoop-tools module under trunk.
> > > >
> > > >thanks
> > > >mahadev
> > > >
> > > >On Wed, Sep 7, 2011 at 11:18 AM, Alejandro Abdelnur <tucu@cloudera.com>
> > > >wrote:
> > > >> Agreed, we should not have a dumping ground. IMO, the things that would
> > > >> go into hadoop-tools (i.e. distcp, streaming and, someone could argue,
> > > >> FsShell as well) are effectively Hadoop CLI utilities. Having them in a
> > > >> separate module rather than in the core modules (common, hdfs,
> > > >> mapreduce) does not mean that they are secondary things; it is just
> > > >> modularization. It will also help get those tools to use the public
> > > >> interfaces of the core modules, and when we finally have a clean
> > > >> hadoop-client layer, those tools should only depend on that.
> > > >>
> > > >> Finally, the fact that tools would end up under trunk/hadoop-tools
> > > >> does not prevent the HDFS and MAPREDUCE packaging from bundling the
> > > >> same or different tools.
> > > >>
> > > >> +1 for hadoop-tools/ (not binding)
> > > >>
> > > >> Thanks.
> > > >>
> > > >>
> > > >> On Wed, Sep 7, 2011 at 10:50 AM, Eric Yang <er...@gmail.com> wrote:
> > > >>
> > > >>> MapReduce and HDFS are distinct functions of Hadoop.  They are loosely
> > > >>> coupled.  If we have a tools aggregator module, it will not have as
> > > >>> clear and distinct a function as the other Hadoop modules.  Hence, a
> > > >>> tool may depend on both HDFS and MapReduce.  If something breaks in
> > > >>> the tools module, it is unclear which subproject is responsible for
> > > >>> maintaining the tool.  Therefore, it is safer to send tools to the
> > > >>> Incubator or Apache Extras rather than deposit the utility tools in a
> > > >>> tools subcategory.  There are many short-lived projects that attempt
> > > >>> to associate themselves with Hadoop but are not maintained.  It would
> > > >>> be better to spin off those utility projects than to use Hadoop as a
> > > >>> dumping ground.
> > > >>>
> > > >>> In the previous discussion about removing contrib, most people were in
> > > >>> favor of doing so, and only a few contrib owners were reluctant to
> > > >>> remove contrib.  Fewer people have participated in restoring the
> > > >>> functionality of broken contrib projects.  History speaks for itself.
> > > >>> -1 (non-binding) for hadoop-tools.
> > > >>>
> > > >>> regards,
> > > >>> Eric
> > > >>>
> > > >>> On Tue, Sep 6, 2011 at 6:55 PM, Alejandro Abdelnur <tucu@cloudera.com>
> > > >>> wrote:
> > > >>> > Eric,
> > > >>> >
> > > >>> > Personally I'm fine either way.
> > > >>> >
> > > >>> > Still, I fail to see why a generic tools module would increase, or a
> > > >>> > categorized one reduce, the risk of dead code, and how either would
> > > >>> > make packaging and deployment more difficult or easier.
> > > >>> >
> > > >>> > Would you please explain this?
> > > >>> >
> > > >>> > Thanks.
> > > >>> >
> > > >>> > Alejandro
> > > >>> >
> > > >>> > On Tue, Sep 6, 2011 at 6:38 PM, Eric Yang <er...@gmail.com> wrote:
> > > >>> >
> > > >>> >> Option #2 proposed by Amareshwari seems like a better proposal.  We
> > > >>> >> don't want to repeat the history of contrib again with hadoop-tools.
> > > >>> >> Having a generic module like hadoop-tools increases the risk of
> > > >>> >> accumulating dead code.  It would be better to categorize the hdfs- or
> > > >>> >> mapreduce-specific tools in their respective subcategories.  It is
> > > >>> >> also easier to manage from a package/deployment perspective.
> > > >>> >>
> > > >>> >> regards,
> > > >>> >> Eric
> > > >>> >>
> > > >>> >> On Sep 6, 2011, at 4:32 PM, Eli Collins wrote:
> > > >>> >>
> > > >>> >> > On Tue, Sep 6, 2011 at 10:11 AM, Allen Wittenauer <aw@apache.org>
> > > >>> >> > wrote:
> > > >>> >> >>
> > > >>> >> >> On Sep 6, 2011, at 9:30 AM, Vinod Kumar Vavilapalli wrote:
> > > >>> >> >>> We still need to answer Amareshwari's question (2), which she
> > > >>> >> >>> asked some time back, about the automated code compilation and
> > > >>> >> >>> test execution of the tools module.
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>>>> My #1 question is whether tools is basically contrib reborn.
> > > >>> >> >>>>> If not, what makes it different?
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>        I'm still waiting for this answer as well.
> > > >>> >> >>
> > > >>> >> >>        Until then, I would be pretty much against a tools module.
> > > >>> >> >>        Changing the name of the dumping ground doesn't make it any
> > > >>> >> >>        less of a dumping ground.
> > > >>> >> >
> > > >>> >> > IMO, if the tools module only gets stuff like distcp that's
> > > >>> >> > maintained, then it's not contrib; if it contains all the stuff from
> > > >>> >> > the current MR contrib, then tools is just a relabeling of contrib.
> > > >>> >> > Given that this proposal only covers moving distcp to tools, it
> > > >>> >> > doesn't sound like contrib to me.
> > > >>> >> >
> > > >>> >> > Thanks,
> > > >>> >> > Eli
> > > >>> >>
> > > >>> >>
> > > >>> >
> > > >>>
> > > >>
> > > >
> > >
> > >
> >
> >
>