Posted to common-dev@hadoop.apache.org by Alex Loddengaard <al...@cloudera.com> on 2008/11/20 18:52:30 UTC

Hadoop Development Status

Some engineers here at Cloudera have been working on a website to report on
Hadoop development status, and we're happy to announce that the website is
now available!  We've written a blog post describing its usefulness, goals,
and future, so take a look if you're interested:

<http://www.cloudera.com/blog/2008/11/18/introducing-hadoop-development-status/>

The tool is hosted here:

<http://community.cloudera.com>

Please give us any feedback or suggestions off-list, to avoid polluting the
list.

Enjoy!

Alex, Jeff, and Tom

Re: combiner without reducer

Posted by Ian Swett <is...@yahoo.com>.



--- On Thu, 11/20/08, Amogh Vasekar <va...@yahoo-inc.com> wrote:

> From: Amogh Vasekar <va...@yahoo-inc.com>
> Subject: combiner without reducer
> To: core-dev@hadoop.apache.org, core-user@hadoop.apache.org
> Date: Thursday, November 20, 2008, 9:48 PM
> Hi,
> I believe a combiner is currently not run unless you have at least one
> reducer set.
> Not getting into the Hadoop-18 semantics of the combiner running on both
> sides (the number of reducers is 0 anyway, so I guess the merge-combine
> doesn't come into the picture at all), I have a use case where I would
> like to run a combiner without a reducer.
> Basically, the aggregation (a lookup sort of thing) I do depends on a
> relatively small dataset and is independent of the records in the map
> input data forming the input dataset, hence the motivation for
> combine-without-reduce.
> What I wanted to do was aggregate the similar records in the combiner (or
> a particular instance of the combiner) in a single shot, with this forming
> my output. This would save me the intermediate I/O involved in the
> shuffle-and-sort (S&S) phase at some partial I/O cost on the map + combine
> side, and I just wanted to try it out to see if it's feasible at all.
> Given that a combiner without a reducer is not supported, I was thinking
> of doing it the way Hadoop would: create a buffer, sort, and combine as I
> flush.
> Any thoughts on this would be really helpful.
> 
> Thanks,
> Amogh

combiner without reducer

Posted by Amogh Vasekar <va...@yahoo-inc.com>.

Hi,
I believe a combiner is currently not run unless you have at least one
reducer set.
Not getting into the Hadoop-18 semantics of the combiner running on both
sides (the number of reducers is 0 anyway, so I guess the merge-combine
doesn't come into the picture at all), I have a use case where I would
like to run a combiner without a reducer.
Basically, the aggregation (a lookup sort of thing) I do depends on a
relatively small dataset and is independent of the records in the map
input data forming the input dataset, hence the motivation for
combine-without-reduce.
What I wanted to do was aggregate the similar records in the combiner (or
a particular instance of the combiner) in a single shot, with this forming
my output. This would save me the intermediate I/O involved in the
shuffle-and-sort (S&S) phase at some partial I/O cost on the map + combine
side, and I just wanted to try it out to see if it's feasible at all.
Given that a combiner without a reducer is not supported, I was thinking
of doing it the way Hadoop would: create a buffer, sort, and combine as I
flush.
Any thoughts on this would be really helpful.

Thanks,
Amogh
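
The buffer, sort, combine-on-flush idea described in the message above can be sketched outside Hadoop as follows. This is a minimal illustration in plain Python, not Hadoop's implementation or API: the `CombiningBuffer` class, its `collect` and `flush` methods, and the `max_records` threshold are hypothetical names chosen for the example, and the combine function is assumed to be associative and commutative (a sum here).

```python
from itertools import groupby
from operator import itemgetter


class CombiningBuffer:
    """Hypothetical helper: buffers (key, value) pairs emitted by a map
    function and combines values per key each time the buffer is flushed."""

    def __init__(self, combine, emit, max_records=1000):
        self.combine = combine          # callable: iterable of values -> one value
        self.emit = emit                # callable: (key, combined_value) -> None
        self.max_records = max_records  # flush threshold (an assumed tuning knob)
        self.records = []

    def collect(self, key, value):
        """Buffer one record and flush once the threshold is reached."""
        self.records.append((key, value))
        if len(self.records) >= self.max_records:
            self.flush()

    def flush(self):
        """Sort buffered records by key, combine each run of equal keys, emit."""
        self.records.sort(key=itemgetter(0))
        for key, group in groupby(self.records, key=itemgetter(0)):
            self.emit(key, self.combine(value for _, value in group))
        self.records = []


# Usage: count words with a deliberately tiny buffer of four records.
output = []
buf = CombiningBuffer(combine=sum,
                      emit=lambda k, v: output.append((k, v)),
                      max_records=4)
for word in ["a", "b", "a", "c", "b", "a"]:
    buf.collect(word, 1)
buf.flush()  # emit whatever is still buffered at the end of the map task
```

Because each flush only sees the records buffered since the previous flush, the same key can be emitted more than once. Without a reducer, the output is fully aggregated only if the buffer holds every record for a key, or if a later pass merges the partial results.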

Re: Hadoop Development Status

Posted by Alex Loddengaard <al...@cloudera.com>.

Hey Otis,
The code isn't that clean at the moment, and it's a bit intertwined with
some other stuff we are working on.  However, we do plan to include Hadoop
subprojects such as Pig, HBase, etc. in the near future.

Hopefully this will be helpful in the meantime.  Thanks,

Alex

On Thu, Nov 20, 2008 at 1:18 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Question for Alex:
>
> Are you going to be releasing this tool?  I'm sure my friends over at
> Lucene/Solr/Nutch/etc. would love to see their project's info presented in
> the same fashion. :)
>
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
>
> ________________________________
> From: Konstantin Shvachko <sh...@yahoo-inc.com>
> To: core-user@hadoop.apache.org
> Sent: Thursday, November 20, 2008 1:41:20 PM
> Subject: Re: Hadoop Development Status
>
> This is very nice.
> A suggestion if it is related to the development status.
> Do you guys think you could analyze which questions are
> discussed most often on the mailing lists, so that we could
> update our FAQs based on that?
> Thanks,
> --Konstantin
>
>
> Alex Loddengaard wrote:
> > Some engineers here at Cloudera have been working on a website to report on
> > Hadoop development status, and we're happy to announce that the website is
> > now available!  We've written a blog post describing its usefulness, goals,
> > and future, so take a look if you're interested:
> >
> > <http://www.cloudera.com/blog/2008/11/18/introducing-hadoop-development-status/>
> >
> > The tool is hosted here:
> >
> > <http://community.cloudera.com>
> >
> > Please give us any feedback or suggestions off-list, to avoid polluting the
> > list.
> >
> > Enjoy!
> >
> > Alex, Jeff, and Tom
> >
>

Re: Hadoop Development Status

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Question for Alex:

Are you going to be releasing this tool?  I'm sure my friends over at Lucene/Solr/Nutch/etc. would love to see their project's info presented in the same fashion. :)


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: Konstantin Shvachko <sh...@yahoo-inc.com>
To: core-user@hadoop.apache.org
Sent: Thursday, November 20, 2008 1:41:20 PM
Subject: Re: Hadoop Development Status

This is very nice.
A suggestion if it is related to the development status.
Do you guys think you could analyze which questions are
discussed most often on the mailing lists, so that we could
update our FAQs based on that?
Thanks,
--Konstantin


Alex Loddengaard wrote:
> Some engineers here at Cloudera have been working on a website to report on
> Hadoop development status, and we're happy to announce that the website is
> now available!  We've written a blog post describing its usefulness, goals,
> and future, so take a look if you're interested:
> 
> <http://www.cloudera.com/blog/2008/11/18/introducing-hadoop-development-status/>
> 
> The tool is hosted here:
> 
> <http://community.cloudera.com>
> 
> Please give us any feedback or suggestions off-list, to avoid polluting the
> list.
> 
> Enjoy!
> 
> Alex, Jeff, and Tom
>

Re: Hadoop Development Status

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.

This is very nice.
A suggestion if it is related to the development status.
Do you guys think you could analyze which questions are
discussed most often on the mailing lists, so that we could
update our FAQs based on that?
Thanks,
--Konstantin


Alex Loddengaard wrote:
> Some engineers here at Cloudera have been working on a website to report on
> Hadoop development status, and we're happy to announce that the website is
> now available!  We've written a blog post describing its usefulness, goals,
> and future, so take a look if you're interested:
> 
> <http://www.cloudera.com/blog/2008/11/18/introducing-hadoop-development-status/>
> 
> The tool is hosted here:
> 
> <http://community.cloudera.com>
> 
> Please give us any feedback or suggestions off-list, to avoid polluting the
> list.
> 
> Enjoy!
> 
> Alex, Jeff, and Tom
>