Posted to common-dev@hadoop.apache.org by yu...@tce.edu on 2010/07/30 08:23:49 UTC

Re: MapReduce Usage in Search Engines

Hi all,
          I have a basic question about MapReduce usage in search
engines. My questions are:

1. How is MapReduce used in search?
2. Does Google use the MapReduce algorithm for its search engine? If so, how
is it used? Please explain the architecture or flow of how Google or other
search engines work and what part MapReduce plays in it.

                           Please explain.

With Regards,
B.Yuhendar


-----------------------------------------
This email was sent using TCEMail Service.
Thiagarajar College of Engineering
Madurai-625 015, India


Re: MapReduce Usage in Search Engines

Posted by Otis Gospodnetic <ot...@yahoo.com>.
MapReduce tends to be used for massive (re)indexing.
See http://search-lucene.com/?q=hadoop+mapreduce&fc_project=Solr&fc_project=Lucene
for how Lucene/Solr people are using MapReduce.

For example, in a recent project we used MapReduce (streaming with JRuby,
actually) together with Solr (the embedded version, to be more precise) to speed
up building a 20 GB index that used to take a couple of hours. Now it takes 7
minutes, because the work is parallelized to the Nth degree.
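To make the (re)indexing case concrete, here is a minimal, single-process sketch of how building an inverted index maps onto MapReduce: the map step emits (term, document id) pairs and the reduce step collects each term's sorted posting list. It is only an illustration, not the JRuby/embedded-Solr setup described above; the function names and toy documents are hypothetical, and the in-memory sort stands in for the framework's shuffle/sort.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(doc_id, text):
    # Map: emit one (term, doc_id) pair for each distinct term in a document.
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(term, doc_ids):
    # Reduce: turn all doc ids seen for one term into a sorted posting list.
    return term, sorted(set(doc_ids))


def build_index(docs):
    # Run the mapper over every input record (doc_id -> text).
    pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
    # Shuffle/sort: bring all pairs with the same key together.
    pairs.sort(key=itemgetter(0))
    index = {}
    for term, group in groupby(pairs, key=itemgetter(0)):
        key, postings = reduce_phase(term, [doc_id for _, doc_id in group])
        index[key] = postings
    return index


docs = {
    1: "hadoop mapreduce indexing",
    2: "solr indexing with hadoop",
    3: "lucene search",
}
index = build_index(docs)
print(index["hadoop"])
```

On a real cluster each mapper would see only its own split of the documents, which is where the speed-up comes from.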


MapReduce can also be used for various kinds of machine learning data crunching,
such as query log analysis, content analysis, NLP, building better relevance
models for search, etc. See http://mahout.apache.org.
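Query log analysis often starts as a simple aggregation. The sketch below assumes a hypothetical tab-separated log format (timestamp, user id, query) and counts how often each normalized query appears: the mapper emits (query, 1) and the reducer sums the counts per query. It runs in one Python process for illustration and is not Mahout code; a real job would run on Hadoop (for example via streaming).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical log format: "timestamp<TAB>user_id<TAB>query"
LOG_LINES = [
    "2010-07-30T08:00:01\tu1\tHadoop MapReduce",
    "2010-07-30T08:00:05\tu2\tsolr faceting",
    "2010-07-30T08:01:10\tu3\thadoop  mapreduce",
    "2010-07-30T08:02:42\tu1\tmahout",
]


def mapper(line):
    # Parse one log line and emit (normalized query, 1).
    _ts, _user, query = line.split("\t", 2)
    yield " ".join(query.lower().split()), 1


def reducer(query, counts):
    # Sum the partial counts for one query.
    return query, sum(counts)


def count_queries(lines):
    # Map every line, then sort by key to simulate the shuffle.
    pairs = sorted((kv for line in lines for kv in mapper(line)), key=itemgetter(0))
    return dict(
        reducer(q, [c for _, c in grp]) for q, grp in groupby(pairs, key=itemgetter(0))
    )


counts = count_queries(LOG_LINES)
print(counts)
```

Normalizing case and whitespace in the mapper is a design choice; it makes "Hadoop MapReduce" and "hadoop  mapreduce" count as the same query.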

Otis
----Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: "yuhendar@tce.edu" <yu...@tce.edu>
> To: common-dev@hadoop.apache.org
> Sent: Fri, July 30, 2010 2:23:49 AM
> Subject: Re: MapReduce Usage in Search Engines

Re: MapReduce Usage in Search Engines

Posted by Jeff Zhang <zj...@gmail.com>.
As I understand it, Google may use MapReduce to build its index, but not in the
search (query-serving) phase, because that phase needs low latency, which is not
something MapReduce is designed for.
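To illustrate the split between the batch and serving phases, here is a toy sketch with hypothetical names and data: the inverted index is assumed to have been built offline (for example by a MapReduce job), and answering a query is just dictionary lookups plus a merge-intersection of sorted posting lists, with no job launched at query time. Real engines add sharding, ranking and caching on top; none of that is shown.

```python
# Assumed to be produced offline by a batch (e.g. MapReduce) job;
# hard-coded here as a toy inverted index: term -> sorted doc ids.
INDEX = {
    "hadoop": [1, 2, 5],
    "mapreduce": [1, 5, 7],
    "search": [2, 5, 7],
}


def intersect(a, b):
    # Merge-intersect two sorted posting lists in O(len(a) + len(b)).
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(query):
    # Online step: only lookups and list merges, so latency stays low.
    postings = [INDEX.get(term, []) for term in query.lower().split()]
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


print(search("hadoop mapreduce"))
```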


On Fri, Jul 30, 2010 at 7:06 AM, Saikat Kanjilal <sx...@hotmail.com> wrote:




-- 
Best Regards

Jeff Zhang

RE: MapReduce Usage in Search Engines

Posted by Saikat Kanjilal <sx...@hotmail.com>.
Hello Yuhendar, I'll add as much as I can at a high level from what I have learned so far about MapReduce to answer your questions:
1) The goal behind MapReduce is to perform a distributed computation: a large, computation-intensive problem is broken into smaller chunks, those chunks are solved individually, and the partial results are finally combined (in your case, the problem would be search). You have a master node and a set of slave (worker) nodes. The master (in Hadoop's MapReduce this is the JobTracker; the NameNode is the master of the HDFS file system, not of job execution) takes input from the client in the form of a job and assigns tasks to the slaves, each of which processes a piece of the input in the map phase. The intermediate results are then grouped by key and combined in the reduce phase, and the reducers write the final output (in Hadoop, to HDFS), where the client can read it. A more concrete example is distributed grep, which searches a huge dataset for a particular word or pattern. Take a look at the Hadoop examples or the Hadoop web page to learn more about this.
2) Google, to my understanding, uses its own internal implementation of MapReduce together with its datastore known as Bigtable, which is a sparse, distributed, multi-dimensional sorted map; MapReduce jobs can read their input from and write their output to Bigtable.
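A minimal sketch of distributed grep, simulated in one process: the input is cut into splits, a thread pool stands in for the slave nodes running map tasks, and a reduce step sums the per-split counts. The function names and the `num_splits` parameter are hypothetical; counting regex matches (rather than emitting matching lines) is roughly what the grep example bundled with Hadoop does, but this is not that code.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def grep_map(chunk, pattern):
    # Map task: scan one input split and count each regex match in it.
    counts = Counter()
    for line in chunk:
        counts.update(m.group(0) for m in pattern.finditer(line))
    return counts


def grep_reduce(partials):
    # Reduce: sum the per-split counts for each matched string.
    total = Counter()
    for c in partials:
        total.update(c)
    return total


def distributed_grep(lines, regex, num_splits=3):
    pattern = re.compile(regex)
    # Ceiling division so the input is cut into at most num_splits splits.
    size = max(1, -(-len(lines) // num_splits))
    splits = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads simulate workers; each processes one split independently.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        partials = list(pool.map(grep_map, splits, [pattern] * len(splits)))
    return grep_reduce(partials)


lines = [
    "hadoop runs mapreduce jobs",
    "solr builds the index",
    "mapreduce on hadoop",
    "grep for hadoop",
]
result = distributed_grep(lines, r"hadoop|solr")
print(result)
```

Because each split is scanned independently, adding workers scales the map phase; only the small per-split counts need to be combined at the end.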

My 2 cents. Regards.

> Date: Fri, 30 Jul 2010 11:53:49 +0530
> Subject: Re: MapReduce Usage in Search Engines
> From: yuhendar@tce.edu
> To: common-dev@hadoop.apache.org