Posted to solr-user@lucene.apache.org by Phanindra Reva <re...@gmail.com> on 2009/12/02 18:09:58 UTC

manipulation of the Document ( org.apache.lucene.document.Document )

Hello,
     Our objective is to modify the Document (
org.apache.lucene.document.Document ) before it is indexed. I am
wondering at which points we can get a reference to the Document
object, so that we can perform the modification.
By modifying, I just mean changing the names of some fields, etc.
There is one more point to consider. During the modification of
the Document object, we create one more Java object, call it "XYZ" (
specific to our project ), and populate it with some information that
will be needed later when assigning payloads to each Term.
     What do you think is the right place to perform the above task?
     I have tried to solve this, but I found only two options: either
before control reaches IndexWriter, or during analysis. But
if we do it before control reaches IndexWriter, we lose the
reference to the XYZ object, which is needed while assigning payloads.
If we do it during the analysis phase, the problem is that we
are not given a reference to the Document object but only the
Reader (as in tokenStream(String fieldName, Reader reader) ). This is
where I got stuck and could make no further progress.
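
To make the problem concrete, here is a minimal sketch in plain Java.
It does not use the real Lucene classes: Xyz, renameFields, analyze and
the "x_" renaming rule are all hypothetical stand-ins, and analyze()
only mirrors the parameter shape of
tokenStream(String fieldName, Reader reader).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DocumentContextSketch {

    /** Hypothetical stand-in for our project-specific "XYZ" object. */
    public static final class Xyz {
        // renamed field name -> information needed later to build payloads
        public final Map<String, String> payloadInfo = new LinkedHashMap<>();
    }

    /** Stage 1 (before IndexWriter): rename fields and fill XYZ along the way. */
    public static Map<String, String> renameFields(Map<String, String> fields, Xyz xyz) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String newName = "x_" + e.getKey(); // hypothetical renaming rule
            renamed.put(newName, e.getValue());
            xyz.payloadInfo.put(newName, "was:" + e.getKey());
        }
        return renamed;
    }

    /** Stage 2 (analysis): same parameter shape as tokenStream(fieldName, reader). */
    public static String analyze(String fieldName, Reader reader) {
        // Only the field name and the field text are reachable here;
        // neither the whole document nor the XYZ built in stage 1 is passed in.
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return fieldName + "=" + text;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "hello world");

        Xyz xyz = new Xyz();
        Map<String, String> renamed = renameFields(fields, xyz);

        for (Map.Entry<String, String> e : renamed.entrySet()) {
            // xyz is in scope in this caller, but analyze() has no
            // parameter through which it could receive it.
            System.out.println(analyze(e.getKey(), new StringReader(e.getValue())));
        }
        System.out.println("XYZ holds: " + xyz.payloadInfo);
    }
}
```

The sketch shows the gap: the XYZ object exists in stage 1, while the
code that would assign payloads runs in stage 2 and sees only the field
name and the Reader.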
Please let me know if there are any alternatives.
I am sorry if the explanation is complex or unclear.
Thanks.