Posted to dev@nutch.apache.org by Mehdi Alemi <al...@comp.iust.ac.ir> on 2011/01/26 13:22:00 UTC

How nutch-2.0 is handled by hbase

Dear developers,
I would like to know how Nutch 2 uses HBase. Of course, I know that 
Gora stores data persistently, but I am confused about how its 
MapReduce part works.
What is the number of reducers? What is the number of mappers?
As I understand it, the mapper prepares a document for the reducer, and the 
reducer indexes it with Solr. Solr indexes that document along with other 
documents and stores it with Gora. How is the merging of indexes handled? 
For each term in the inverted index, there should be one posting list that 
identifies the documents containing that term. Is each term in the inverted 
index updated by merging the new posting list with the old posting list?
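
To make the last question concrete, here is a minimal Python sketch of what I 
mean by merging an old and a new posting list. It only illustrates the idea 
and is not a claim about how Nutch, Solr, or Lucene actually do it. The 
function names and the plain dict used as the index are hypothetical.

```python
from heapq import merge


def merge_postings(old, new):
    """Merge two sorted lists of document IDs, dropping duplicates."""
    merged = []
    for doc_id in merge(old, new):  # lazy merge of two sorted inputs
        if not merged or merged[-1] != doc_id:
            merged.append(doc_id)
    return merged


def update_index(index, new_postings):
    """Fold newly built posting lists into an existing term -> doc IDs map.

    `index` is a hypothetical in-memory stand-in for the stored inverted
    index; it is updated in place and also returned.
    """
    for term, postings in new_postings.items():
        index[term] = merge_postings(index.get(term, []), postings)
    return index
```

For example, if the old index maps a term to documents 1 and 3, and the new 
batch maps the same term to documents 2 and 3, the updated posting list for 
that term would contain documents 1, 2, and 3.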

Best Regards,
Mehdi Alemi