Posted to general@lucene.apache.org by Yali Hu <Ar...@csk.com> on 2011/08/22 08:35:04 UTC

Question about creating an index using Map-Reduce (Hadoop contrib 2951)

Hi, I would like to ask about a problem with creating an index using
map-reduce (Hadoop contrib 2951).

HADOOP-2951 implements index creation and updating using map-reduce:
https://issues.apache.org/jira/browse/HADOOP-2951

However, this package uses Lucene 2.3.

I updated Lucene to 3.2 and modified some of the source code to match the
new classes and methods in Lucene 3.2.

But I ran into one problem related to the creation of the TermVector files
(tvd, tvf, tvx).

*These three TermVector files (tvd, tvf, tvx) are not created correctly
when segments are merged.*

The maximum number of segments can be set in the configuration. If the
value is not set, a separate index segment is created for each document.
For example, with 12 documents as input, the output index had 10 segments.
In that case there is no problem.

But if the maximum number of segments is set to 5, for example, the result
is of course 5 segments. Retrieval still works with no other problems.
However, in some of the segments that were merged from multiple documents,
*these three files are 0 bytes (checked with Luke).*
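For comparison, here is a minimal sketch (not taken from the contrib-2951 code; the class name and field name are my own) of how term vectors are enabled per field in plain Lucene 3.2. The tvd/tvf/tvx data is only written for fields added with TermVector.YES, so one thing worth checking is whether the documents fed into the merge path of your ported code still carry that setting:

```java
// Sketch of per-field term vectors in Lucene 3.2.
// Assumption: if documents reach the index writer without
// Field.TermVector.YES, the segment's .tvd/.tvf/.tvx files stay empty.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class TermVectorSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig conf = new IndexWriterConfig(
                Version.LUCENE_32, new StandardAnalyzer(Version.LUCENE_32));
        IndexWriter writer = new IndexWriter(dir, conf);

        Document doc = new Document();
        // TermVector.YES is what causes tvd/tvf/tvx data to be written.
        doc.add(new Field("content", "create index using map-reduce",
                Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.YES));
        writer.addDocument(doc);

        // Counterpart of the "max num of segments" setting:
        // merge the index down to at most 1 segment.
        writer.optimize(1);
        writer.close();
    }
}
```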

I checked all of my modifications related to Lucene 3.2 and could not find
the cause.

Has anybody faced the same issue, or can anyone give me some advice?

Thanks in advance.

Yali Hu    




--
View this message in context: http://lucene.472066.n3.nabble.com/The-question-about-create-index-using-Map-Reduce-Hadoop-contrib-2951-tp3274205p3274205.html
Sent from the Lucene - General mailing list archive at Nabble.com.