Posted to dev@lucene.apache.org by Paul Smith <ps...@aconex.com> on 2005/08/04 07:38:15 UTC
Map-Reduce
I've been reading the Nutch MapReduce stuff[1], and the original
Google paper [2].
I know there's a mapreduce branch in the nutch project, but is there
any plan/talk of perhaps integrating something like that directly
into the Lucene API? For projects that need a lower-level API like
Lucene, rather than the crawl-like nature of Nutch, the potential to
index lots of information in an efficient manner is very appealing
indeed.
I'm not suggesting this is _easy_, just curious what folks on the
Lucene-side of things think. Perhaps a chance to refactor out from
nutch a shared library?
I would love to hear anyone's thoughts on the matter.
cheers,
Paul Smith
[1] http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
[2] http://labs.google.com/papers/mapreduce-osdi04.pdf
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Map-Reduce
Posted by Paul Smith <ps...@aconex.com>.
On 05/08/2005, at 4:10 AM, Doug Cutting wrote:
> Doug Cutting wrote:
>
>> Perhaps we need to factor Nutch into two projects, one with NDFS
>> and MapReduce and the other with the search-specific code. This
>> falls almost exactly on package lines. The packages
>> org.apache.nutch.{io,ipc,fs,ndfs,mapred} are not dependent on the
>> rest of Nutch.
>>
>
> FYI, over on the nutch-dev list, I just proposed that we split
> these packages into a new project that Nutch then depends on, since
> there seems to be interest in using them independently of Nutch.
> Such a split probably wouldn't happen for at least a month.
>
> http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg00312.html
Awesome, thanks Doug! I really believe that having this out as a
separate project will be more useful for everyone. This will also
give more exposure to Nutch and Lucene as a whole, because people
will experiment with the NDFS/MapReduce stuff first (a smaller thing to
comprehend).
cheers,
Paul
Re: Map-Reduce
Posted by Doug Cutting <cu...@apache.org>.
Doug Cutting wrote:
> Perhaps we need to factor Nutch into two projects, one with NDFS and
> MapReduce and the other with the search-specific code. This falls
> almost exactly on package lines. The packages
> org.apache.nutch.{io,ipc,fs,ndfs,mapred} are not dependent on the rest
> of Nutch.
FYI, over on the nutch-dev list, I just proposed that we split these
packages into a new project that Nutch then depends on, since there
seems to be interest in using them independently of Nutch. Such a split
probably wouldn't happen for at least a month.
http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg00312.html
Doug
Re: Map-Reduce
Posted by Doug Cutting <cu...@apache.org>.
Paul Smith wrote:
> I know there's a mapreduce branch in the nutch project, but is there
> any plan/talk of perhaps integrating something like that directly into
> the Lucene API? For projects that need a lower-level API like Lucene,
> rather than the crawl-like nature of Nutch, the potential to index lots
> of information in an efficient manner is very appealing indeed.
You can easily use NDFS and MapReduce from Nutch without using Nutch's
crawler.
Perhaps we need to factor Nutch into two projects, one with NDFS and
MapReduce and the other with the search-specific code. This falls
almost exactly on package lines. The packages
org.apache.nutch.{io,ipc,fs,ndfs,mapred} are not dependent on the rest
of Nutch.
But you don't need to wait for such a split in order to use NDFS and
MapReduce. Just check out the mapred branch from SVN and don't use the
parts you don't need. If you find it useful, then argue for the
creation of a new project.
Doug
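For readers new to the model discussed in this thread, here is a minimal, single-process Java sketch of the map/shuffle/reduce flow described in the Google paper, applied to building an inverted index (the kind of bulk indexing Paul asks about). It is not the API of the Nutch mapred branch: the class InvertedIndexSketch and its methods map, partition, reduce and buildIndex are hypothetical names chosen for illustration, and a real job would run each phase across machines over NDFS rather than in one JVM's memory.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of the MapReduce model; NOT the Nutch mapred API.
public class InvertedIndexSketch {

    // Map phase: emit one (term, docId) pair per token in the document.
    public static List<Map.Entry<String, Integer>> map(int docId, String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // split can yield "" for leading punctuation
                out.add(new SimpleEntry<>(token, docId));
            }
        }
        return out;
    }

    // Partition function: hash the key to pick which reducer receives it.
    public static int partition(String term, int numReducers) {
        return Math.floorMod(term.hashCode(), numReducers);
    }

    // Reduce phase: collapse all doc ids for one term into a sorted,
    // de-duplicated postings list. The term is unused here but kept to
    // mirror the reduce(key, values) shape from the paper.
    public static List<Integer> reduce(String term, List<Integer> docIds) {
        return new ArrayList<>(new TreeSet<>(docIds));
    }

    // Drives map -> shuffle (group values by key within each partition) -> reduce.
    public static Map<String, List<Integer>> buildIndex(List<String> docs, int numReducers) {
        // One sorted key -> values buffer per reducer stands in for the shuffle.
        List<TreeMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            shuffled.add(new TreeMap<>());
        }
        for (int docId = 0; docId < docs.size(); docId++) {
            for (Map.Entry<String, Integer> kv : map(docId, docs.get(docId))) {
                int r = partition(kv.getKey(), numReducers);
                shuffled.get(r)
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        // Each reducer works on its own partition; results are merged here.
        Map<String, List<Integer>> index = new TreeMap<>();
        for (TreeMap<String, List<Integer>> bucket : shuffled) {
            for (Map.Entry<String, List<Integer>> e : bucket.entrySet()) {
                index.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Lucene is a search library",
                "Nutch builds on Lucene",
                "MapReduce splits work into map and reduce");
        System.out.println(buildIndex(docs, 2));
    }
}
```

With the sample documents in main, the term "lucene" maps to the postings list [0, 1]. Because every occurrence of a key is routed to the same reducer by the hash partition, the merged index does not depend on how many reducers are used; only how the work is spread out changes.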
Re: Map-Reduce
Posted by Tom White <to...@gmail.com>.
This might be what you're looking for: http://computefarm.jini.org/.
Cheers,
Tom
On 8/4/05, Cheolgoo Kang <ap...@gmail.com> wrote:
> Yeah, it would be great if we had a Directory subclass like MapReduceDirectory.
>
> I'm looking for ComputeFarm, which implements a distributed
> parallel computing environment on Jini technology.
>
> --
> Regards,
> Cheolgoo Kang
Re: Map-Reduce
Posted by Cheolgoo Kang <ap...@gmail.com>.
Yeah, it would be great if we had a Directory subclass like MapReduceDirectory.
I'm looking for ComputeFarm, which implements a distributed
parallel computing environment on Jini technology.
On 8/4/05, Paul Smith <ps...@aconex.com> wrote:
> I know there's a mapreduce branch in the nutch project, but is there
> any plan/talk of perhaps integrating something like that directly
> into the Lucene API? For projects that need a lower-level API like
> Lucene, rather than the crawl-like nature of Nutch, the potential to
> index lots of information in an efficient manner is very appealing
> indeed.
--
Regards,
Cheolgoo Kang
Re: Map-Reduce
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Thanks. I saw that, but I was curious about the actual presentation
(what exactly Doug said).
Otis
--- Chris Lamprecht <cl...@gmail.com> wrote:
> Maybe you already saw this; I hit it accidentally. It contains a few
> other files, including one called mapred.pdf:
>
> http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/
>
> On 8/4/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> > > [1] http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
> >
> > Does anyone have any more info from Doug's MapReduce presentation
> > (transcript, notes, audio, video)?
> >
> > Thanks,
> > Otis
> >
Re: Map-Reduce
Posted by Chris Lamprecht <cl...@gmail.com>.
Maybe you already saw this; I hit it accidentally. It contains a few
other files, including one called mapred.pdf:
http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/
On 8/4/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> > [1] http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
>
> Does anyone have any more info from Doug's MapReduce presentation
> (transcript, notes, audio, video)?
>
> Thanks,
> Otis
Re: Map-Reduce
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 4, 2005, at 1:27 PM, Otis Gospodnetic wrote:
>> [1] http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
>>
>
> Does anyone have any more info from Doug's MapReduce presentation
> (transcript, notes, audio, video)?
I was at Doug's OSCON presentation but did not see anyone taking
photos or video. Perhaps someone transcribed it, but I did not. I
was too busy being floored by the magnitude of what Doug has done.
Killing the Internet Archive with the MapReduce implementation of
Nutch crawling is mighty impressive!
Erik
Re: Map-Reduce
Posted by Otis Gospodnetic <ot...@yahoo.com>.
> [1] http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
Does anyone have any more info from Doug's MapReduce presentation
(transcript, notes, audio, video)?
Thanks,
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ -- Find it. Tag it. Share it.