Posted to common-user@hadoop.apache.org by Naama Kraus <na...@gmail.com> on 2008/04/01 16:22:55 UTC

Nutch and Distributed Lucene

Hi,

I'd like to know whether Nutch runs on top of Lucene or is unrelated to
Lucene, i.e. whether indexing, parsing, crawling, internal data structures,
etc. are all written from scratch using MapReduce (my impression).

What is the relationship between Nutch and the distributed Lucene patch that
was recently added to Hadoop?

Thanks for any enlightenment,
Naama

-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)

Re: Nutch and Distributed Lucene

Posted by Naama Kraus <na...@gmail.com>.
Hi Ning,

Thanks a lot!

Naama

On Tue, Apr 1, 2008 at 7:06 PM, Ning Li <ni...@gmail.com> wrote:

> Hi,
>
> Nutch builds Lucene indexes. But Nutch is much more than that. It is a
> web search application that crawls the web, inverts links, and builds
> indexes. Each step is one or more Map/Reduce jobs. You can find more
> information at http://lucene.apache.org/nutch/
>
> The Map/Reduce job that builds Lucene indexes in Nutch is customized to
> the data schemas/structures used in Nutch. The index contrib package in
> Hadoop provides a general, configurable process for building Lucene
> indexes in parallel using a Map/Reduce job. That's the main difference.
> Another difference is that Nutch's index build job builds indexes in the
> reduce tasks, while the index contrib package builds indexes in both the
> map and the reduce tasks, and there are advantages in doing that...
>
> Regards,
> Ning

Re: Nutch and Distributed Lucene

Posted by Ning Li <ni...@gmail.com>.
Hi,

Nutch builds Lucene indexes. But Nutch is much more than that. It is a
web search application that crawls the web, inverts links, and builds
indexes. Each step is one or more Map/Reduce jobs. You can find more
information at http://lucene.apache.org/nutch/
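To make "each step is one or more Map/Reduce jobs" concrete, here is a minimal sketch of link inversion as a single map/reduce pass, simulated in plain Python. The map phase turns every outlink of a page into a (target, source) pair, an in-memory shuffle groups the pairs by target, and the reduce phase produces each page's list of inlinks. All function names and sample URLs are hypothetical; this is not Nutch's actual code, and a real job would run on a Hadoop cluster rather than in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input record: (page_url, list_of_outlink_urls)
Page = Tuple[str, List[str]]


def map_outlinks(page: Page) -> Iterator[Tuple[str, str]]:
    """Map phase: for every outlink, emit (target_url, source_url)."""
    source, outlinks = page
    for target in outlinks:
        yield target, source


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """In-memory stand-in for the framework's shuffle: group values by key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_inlinks(target: str, sources: List[str]) -> Tuple[str, List[str]]:
    """Reduce phase: the distinct pages linking to `target`, sorted."""
    return target, sorted(set(sources))


def invert_links(pages: Iterable[Page]) -> Dict[str, List[str]]:
    """Run map, shuffle, and reduce over all pages in one process."""
    mapped = (pair for page in pages for pair in map_outlinks(page))
    return dict(reduce_inlinks(t, s) for t, s in shuffle(mapped).items())


if __name__ == "__main__":
    crawl = [
        ("a.html", ["b.html", "c.html"]),
        ("b.html", ["c.html"]),
        ("c.html", ["a.html"]),
    ]
    print(invert_links(crawl))
```

Duplicate links from the same page collapse to a single inlink because the reducer deduplicates sources; whether a real crawler should do that is a design choice this sketch simply assumes.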

The Map/Reduce job that builds Lucene indexes in Nutch is customized to
the data schemas/structures used in Nutch. The index contrib package in
Hadoop provides a general, configurable process for building Lucene
indexes in parallel using a Map/Reduce job. That's the main difference.
Another difference is that Nutch's index build job builds indexes in the
reduce tasks, while the index contrib package builds indexes in both the
map and the reduce tasks, and there are advantages in doing that...
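The difference between building indexes only in reduce tasks and building them in both map and reduce tasks can be sketched with a toy in-memory inverted index standing in for a Lucene index. This is an illustration under stated assumptions, not the code of Nutch or of the index contrib package: the analyzer, the shard assignment (doc_id modulo the number of shards), and all function names are hypothetical. In `reduce_side`, map tasks only route raw documents and each reducer analyzes and indexes its whole shard; in `map_and_reduce_side`, each map task builds small per-shard indexes from its own input split and the reducers only merge them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Doc = Tuple[int, str]         # (doc_id, text)
Index = Dict[str, List[int]]  # toy inverted index: term -> sorted doc ids


def analyze(text: str) -> List[str]:
    """Hypothetical minimal analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def build_index(docs: Iterable[Doc]) -> Index:
    """Analyze documents and build an inverted index over them."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs:
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def merge_indexes(parts: Iterable[Index]) -> Index:
    """Merge already-built indexes by taking the union of their postings."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for part in parts:
        for term, ids in part.items():
            postings[term].update(ids)
    return {term: sorted(ids) for term, ids in postings.items()}


def shard_of(doc_id: int, num_shards: int) -> int:
    """Hypothetical partitioner: assign a document to a shard by its id."""
    return doc_id % num_shards


def group_by_shard(docs: Iterable[Doc], num_shards: int) -> List[List[Doc]]:
    """Split one map task's documents by destination shard."""
    groups: List[List[Doc]] = [[] for _ in range(num_shards)]
    for doc in docs:
        groups[shard_of(doc[0], num_shards)].append(doc)
    return groups


def reduce_side(splits: List[List[Doc]], num_shards: int) -> List[Index]:
    """Map tasks only route raw documents; each reducer indexes its shard."""
    routed: List[List[Doc]] = [[] for _ in range(num_shards)]
    for split in splits:  # one iteration per map task
        for shard, docs in enumerate(group_by_shard(split, num_shards)):
            routed[shard].extend(docs)
    return [build_index(docs) for docs in routed]  # reduce: analyze + index


def map_and_reduce_side(splits: List[List[Doc]], num_shards: int) -> List[Index]:
    """Map tasks build small per-shard indexes; reducers only merge them."""
    partials: List[List[Index]] = [[] for _ in range(num_shards)]
    for split in splits:  # one iteration per map task
        for shard, docs in enumerate(group_by_shard(split, num_shards)):
            if docs:
                partials[shard].append(build_index(docs))  # map: index locally
    return [merge_indexes(parts) for parts in partials]  # reduce: merge only


if __name__ == "__main__":
    splits = [
        [(0, "Hadoop runs MapReduce"), (1, "Lucene builds indexes")],
        [(2, "Nutch uses Lucene"), (3, "Hadoop and Nutch")],
    ]
    print(reduce_side(splits, 2) == map_and_reduce_side(splits, 2))  # True
```

Both layouts yield the same final shards; what changes is where the analysis work happens. One plausible benefit of the second layout (an assumption, since the advantages are not listed here) is that tokenizing and indexing are spread across all map tasks, leaving the reducers with a comparatively cheap merge.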

Regards,
Ning

