Posted to dev@nutch.apache.org by Rozina Sorathia <Ro...@KPITCummins.com> on 2005/11/09 12:36:29 UTC

Distributed nutch

I have the following questions. Can anyone explain these, or tell me where
I can find a detailed explanation?

1. What is distributed Nutch?

2. How does distributed Nutch work?

3. When we say distributed, what is distributed?

4. When one server goes down, what happens?

Thanks and regards,

Rozina Sorathia

Re: Distributed nutch

Posted by Stefan Groschupf <sg...@media-style.com>.
Please do not cross-post to the user and developer lists!
Nutch uses MapReduce as its distribution mechanism.
see: http://wiki.apache.org/nutch/Presentations

mapred.pdf: "MapReduce in Nutch", 20 June 2005, Yahoo!, Sunnyvale, CA, USA
oscon05.pdf: "Scalable Computing with MapReduce", 3 August 2005, OSCON, Portland, OR, USA

HTH
Stefan


On 09.11.2005 at 12:36, Rozina Sorathia wrote:

> I have following queries..Can anyone explain this or tell me where
> I will find the detailed explanation on this:
>
> 1. What is Distributed nutch
> 2. How nutch distributed works?
> 3. When we say distributed, what is distributed?
> 4. When one server goes down, what happens?
>
> Thanks and regards,
>
> Rozina Sorathia,


Re: Distributed nutch

Posted by Paul Baclace <pe...@baclace.net>.
In addition to Stefan Groschupf's detailed references, here are some short, high-level answers to your questions:

Rozina Sorathia wrote:
 >  1. What is Distributed nutch

  Nutch combines Lucene-based search with large-scale web crawling. Distributed Nutch spreads that work across a cluster of machines.

 >2. How nutch distributed works?

  It is modeled after Google's MapReduce and the Google File System (GFS): a single-master, multiple-slave design tuned for 100 to 1000 nodes.
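
  To make the MapReduce model concrete, here is a minimal single-process sketch in Python. It is not Nutch code: the job (counting pages per host) and every function name are made up for illustration, and a real cluster would run the map and reduce calls on many slave nodes under the master's coordination.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_phase(records, map_fn):
    """Apply map_fn to every input record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Collapse each key's list of values into a single result."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical job: count how many URLs belong to each host.
def host_mapper(url):
    yield (urlparse(url).netloc, 1)


def sum_reducer(key, values):
    return sum(values)


def count_pages_per_host(urls):
    return reduce_phase(shuffle(map_phase(urls, host_mapper)), sum_reducer)
```

  Because each map call sees only one record and each reduce call sees only one key, the work can be split across machines without the pieces needing to talk to each other.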

 >3. When we say distributed, what is distributed?

  The filesystem is distributed, with multiple copies of each file kept on separate machines. Crawling, parsing, sorting, and indexing are also distributed.
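
  As a rough illustration of keeping multiple copies on separate machines, the Python sketch below places each block on several distinct nodes and checks which blocks stay readable when nodes die. The round-robin policy, the replication factor of 3, and the function names are assumptions for the example, not how the Nutch filesystem actually chooses nodes.

```python
def place_replicas(block_id, nodes, replication=3):
    """Pick `replication` distinct nodes for a block (hypothetical round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def readable_blocks(placement, dead_nodes):
    """Return the ids of blocks that still have at least one live replica."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in dead_nodes for node in holders)
    }
```

  With four nodes and three copies per block, losing any single node leaves every block readable; data only becomes unavailable once all holders of some block are down.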

 >4. When one server goes down, what happens?

  If the master goes down, it can be restarted from a checkpointed state file.
  If a slave goes down, redundancy ensures that operations continue and no data is lost; work in progress that depended on the dead node is restarted automatically.
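
  The automatic restart of work from a dead slave can be pictured with the Python sketch below. It is only a toy model: the task names, the round-robin reassignment, and the function itself are hypothetical, and the real master also has to detect the failure (for example through missed heartbeats) before it reschedules anything.

```python
def reassign_tasks(assignments, dead_node, live_nodes):
    """Return a new task -> node mapping with the dead node's tasks moved to live nodes."""
    if not live_nodes:
        raise RuntimeError("no live nodes left to take over the work")
    new_assignments = {}
    moved = 0
    for task, node in sorted(assignments.items()):
        if node == dead_node:
            # Restart the task elsewhere, spreading the load round-robin.
            new_assignments[task] = live_nodes[moved % len(live_nodes)]
            moved += 1
        else:
            new_assignments[task] = node
    return new_assignments
```

  Tasks that were running on healthy nodes keep their assignment; only the work tied to the failed node is run again.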

Nutch version 0.8 is distributed (it is still under development in the "mapred" branch); earlier versions are not.

