Posted to java-user@lucene.apache.org by Deepan Chakravarthy <co...@gmail.com> on 2006/08/09 19:44:13 UTC

updating document

Hi,
We have to update a few documents in our index: we have to add an additional
field to them. We did it as follows:

1) read the documents of interest using IndexReader
2) copy them to a temporary Document object (temp_doc)
3) delete the documents from the index
4) close the IndexReader
5) open the IndexWriter
6) add a new field to temp_doc
7) add temp_doc to the index using IndexWriter
8) close the IndexWriter


The problem:
1) The documents that we updated are no longer searchable. When we
perform a search (using IndexSearcher), we do not find any of the
documents we updated.

2) But we are still able to read the updated documents using IndexReader.


Questions
1) When I want to update a document by adding a field, do I have to reindex
all the fields again? Won't copying the existing document and adding the
new field work?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: updating document

Posted by Karel Tejnora <ka...@tejnora.cz>.

I'm sending a snippet of code showing how to reconstruct UNSTORED fields.
It has two parts; the first one, below, dumps the terms into a DB (DB+terms):

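import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermPositions;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class TermDumper {
    // Part 1 (DB+terms): copy every (term, doc, position) of the "company"
    // field of the index at "source" into the table term(name, value, doc, pos).
    // The target IndexWriter and per-language analyzer of the original snippet
    // are only needed for part 2 (re-indexing), so they are left out here.
    static void dumpCompanyTerms(String source) throws Exception {
        Directory sourceDir = FSDirectory.getDirectory(source, false);
        if (!IndexReader.indexExists(sourceDir)) {
            System.err.println("Source index doesn't exist at the specified path");
            return;
        }

        Class.forName("org.postgresql.Driver");
        Connection con = DriverManager.getConnection("jdbc:postgresql:lucene",
                "lucene", "lucene");
        try {
            IndexReader ir = IndexReader.open(sourceDir);
            try {
                PreparedStatement psCompany = con.prepareStatement(
                        "INSERT INTO term (name,value,doc,pos) VALUES ('company',?,?,?)");
                TermEnum terms = ir.terms();
                while (terms.next()) {
                    Term t = terms.term();
                    if (!"company".equals(t.field())) {
                        continue;
                    }
                    psCompany.setString(1, t.text());
                    TermPositions tp = ir.termPositions(t);
                    // Loop on tp.next() rather than docFreq(t): docFreq() also
                    // counts deleted documents, which TermPositions skips.
                    while (tp.next()) {
                        int docId = tp.doc();
                        for (int j = 0, len = tp.freq(); j < len; j++) {
                            psCompany.setInt(2, docId);
                            psCompany.setInt(3, tp.nextPosition());
                            psCompany.executeUpdate();
                        }
                    }
                    tp.close();
                }
                terms.close();
                psCompany.close();
            } finally {
                ir.close();
            }
        } finally {
            con.close();
        }
    }
}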

For the indexing part you need, I suppose, the length of the field
(max(pos) + maxPosTerm.length()), and the field is reconstructed by
  select pos, value, length(value) from term where name=? and doc=? order by pos asc;
and the (analyzer) TokenStream is just a wrapper around the ResultSet.
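The reconstruction step can be sketched in plain Java. This is a hypothetical helper, not Karel's code: it assumes the rows of the select above have already been read into TermRow objects, and it simply orders the terms by position. Karel's TokenStream wrapper would keep the exact positions; this simplification joins the terms with spaces instead, so gaps and several terms at one position are not preserved. Either way the result is the analyzed terms (lower-cased, stemmed, stop words dropped, depending on the analyzer), not the original text, which is why the reconstruction is lossy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper (not Karel's code): rebuilds the term sequence of one
// field of one document from (pos, value) rows like those returned by
//   select pos, value from term where name=? and doc=? order by pos asc
final class FieldReconstructor {
    record TermRow(int pos, String value) {}

    // Terms ordered by position; terms sharing a position keep row order.
    static List<String> tokens(List<TermRow> rows) {
        SortedMap<Integer, List<String>> byPos = new TreeMap<>();
        for (TermRow r : rows) {
            byPos.computeIfAbsent(r.pos(), p -> new ArrayList<>()).add(r.value());
        }
        List<String> out = new ArrayList<>();
        for (List<String> atPos : byPos.values()) {
            out.addAll(atPos);
        }
        return out;
    }

    // Space-joined terms. They are already analyzed, so they should be
    // re-indexed with an analyzer that only splits on whitespace.
    static String text(List<TermRow> rows) {
        return String.join(" ", tokens(rows));
    }

    public static void main(String[] args) {
        List<TermRow> rows = List.of(
                new TermRow(2, "report"), new TermRow(0, "acme"), new TermRow(1, "quarterly"));
        System.out.println(text(rows)); // acme quarterly report
    }
}
```

The joined text could then be added back as an indexed, unstored field, for example analyzed with Lucene's WhitespaceAnalyzer.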


If you would like to check that the table is correct, then:
  select t.value, count(*) as co from (select distinct doc, value from term)
  as t group by t.value order by co desc;

Issues: you can store the reconstructed field, but the stored information is
not correct; this only clones UN_STORED fields well.

PS: PostgreSQL with a proper primary key is pretty fast.

Re: updating document

Posted by Karel Tejnora <ka...@tejnora.cz>.

Hi,
    I'm facing a similar problem. I found a possible way to copy part of an
index (without copying the whole index, deleting and optimizing), but I don't
know how to change/add/remove a field (or add a term vector, in my case) in an
existing index.

To copy part of an index, override these methods in IndexReader:

  /** Returns true if document <i>n</i> has been deleted */
  public abstract boolean isDeleted(int n);

  /** Returns true if any documents have been deleted */
  public abstract boolean hasDeletions();

and use addIndexes() when the index has just one segment.
I have not tested it yet; it's just an idea from reading the source code.

Karel

Re: updating document

Posted by Karel Tejnora <ka...@tejnora.cz>.

Well, you can have it all! :-) I haven't tested it, though; it's just an idea.

You can get the document id after adding it (numDocs()), and if the DB insert
fails, you can delete the document from the RAMDirectory.

Or, in my case of batches: I'm adding documents to the DB with a savepoint,
then creating a clean index (create=true), and at the end, if everything is
all right, I use addIndexes(); if not, I roll back to the savepoint and send an SMS.
> Unfortunately you may have transactional problems with this approach.  That
> is, if you write to lucene and in the same logical transaction you write to
> external storage you may have atomicity problems if one of these actions
> fails.  But you can't have everything!
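Karel's first idea, undoing the Lucene write when the database write fails, is a compensating action rather than a real transaction. A minimal sketch, with hypothetical IndexSink and RecordStore interfaces standing in for the Lucene writer and the JDBC code. Note that a crash between the two writes can still leave them out of sync, which is why being able to repair the index later still matters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces standing in for the Lucene side and the DB side.
interface IndexSink {
    void add(String id, String text);
    void delete(String id);
}

interface RecordStore {
    void insert(String id, String text) throws Exception;
}

final class CompensatingWrite {
    // Writes to the index first, then to the store; if the store write
    // fails, the index entry is removed again so the two stay consistent.
    static boolean save(IndexSink index, RecordStore store, String id, String text) {
        index.add(id, text);
        try {
            store.insert(id, text);
            return true;
        } catch (Exception e) {
            index.delete(id); // compensate: undo the index write
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> indexed = new HashMap<>();
        IndexSink index = new IndexSink() {
            public void add(String id, String text) { indexed.put(id, text); }
            public void delete(String id) { indexed.remove(id); }
        };
        RecordStore failing = (id, text) -> { throw new Exception("DB down"); };
        System.out.println(save(index, failing, "42", "hello")); // false
        System.out.println(indexed.isEmpty());                   // true
    }
}
```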

Re: updating document

Posted by Jason Polites <ja...@gmail.com>.

This strategy can also be nicely abstracted from your main app.  Whilst I
haven't yet implemented it, my plan is to create a template-style structure
which tells me which fields are in lucene, and which are externalized.  This
way I don't bother storing data in lucene that is stored elsewhere, but I
also don't bother storing data elsewhere that is available in lucene.

Of course I always have the source data from which I can do a total re-index
if necessary.  The end result is an abstraction of a "document" which (by
using a template) will automatically retrieve data from the index or from
external storage based on the meta information in the template.  This means
the app that uses the index no longer cares where the data is, and it can
re-index whenever it likes with confidence that everything will be handled
appropriately.
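The template idea could look roughly like this. Everything here is hypothetical (Jason had not implemented it): a template maps each field name to where its value lives, and the document object fetches the value either from the stored fields returned by the index or from an external store keyed by document id.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a "document template": for each field it records
// whether the value is stored in the Lucene index or kept externally.
final class TemplateDocument {
    enum Location { INDEX, EXTERNAL }

    private final Map<String, Location> template;
    private final Map<String, String> storedFields;               // from the index hit
    private final Map<String, Map<String, String>> externalStore; // docId -> field -> value
    private final String docId;

    TemplateDocument(Map<String, Location> template, Map<String, String> storedFields,
                     Map<String, Map<String, String>> externalStore, String docId) {
        this.template = template;
        this.storedFields = storedFields;
        this.externalStore = externalStore;
        this.docId = docId;
    }

    // Callers ask for a field by name and never care where it lives.
    Optional<String> get(String field) {
        Location loc = template.get(field);
        if (loc == null) {
            return Optional.empty();
        }
        return switch (loc) {
            case INDEX -> Optional.ofNullable(storedFields.get(field));
            case EXTERNAL -> Optional.ofNullable(
                    externalStore.getOrDefault(docId, Map.of()).get(field));
        };
    }

    public static void main(String[] args) {
        Map<String, Location> template = Map.of("title", Location.INDEX, "body", Location.EXTERNAL);
        Map<String, String> stored = Map.of("id", "42", "title", "Quarterly report");
        Map<String, Map<String, String>> external = Map.of("42", Map.of("body", "revenue grew"));
        TemplateDocument doc = new TemplateDocument(template, stored, external, "42");
        System.out.println(doc.get("title").orElse("?")); // Quarterly report
        System.out.println(doc.get("body").orElse("?"));  // revenue grew
    }
}
```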

Unfortunately you may have transactional problems with this approach.  That
is, if you write to lucene and in the same logical transaction you write to
external storage you may have atomicity problems if one of these actions
fails.  But you can't have everything!

As long as you can "fix" the index if it gets out of sync you should be ok.
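One way to "fix" the index when it gets out of sync is a periodic reconciliation pass that compares the ids present in the index with the ids in the external store. A hypothetical sketch, assuming both sides can list their document ids:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reconciliation check between the index and external storage.
final class Reconcile {
    record Report(Set<String> missingFromIndex, Set<String> missingFromStore) {}

    static Report compare(Set<String> indexIds, Set<String> storeIds) {
        Set<String> notIndexed = new TreeSet<>(storeIds);
        notIndexed.removeAll(indexIds);   // in the store but not the index: re-index these
        Set<String> orphans = new TreeSet<>(indexIds);
        orphans.removeAll(storeIds);      // in the index but not the store: delete these
        return new Report(notIndexed, orphans);
    }

    public static void main(String[] args) {
        Report r = compare(Set.of("1", "2", "4"), Set.of("1", "2", "3"));
        System.out.println(r.missingFromIndex()); // [3]
        System.out.println(r.missingFromStore()); // [4]
    }
}
```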

On 8/11/06, Karel Tejnora <ka...@tejnora.cz> wrote:
>
> Jason is right. I think (even though I'm not a Lucene expert either) that your
> newly added document can't recreate the terms of the field with the analyzer,
> because the field text is empty.
> There is a very hairy solution: hack IndexReader and FieldInfosWriter and
> use addIndexes().
>
> Lucene is "only" a full-text search library, not a datastore. I ended up
> with the same design as he suggested. At the beginning I chose to store
> fields in the Lucene index, because Lucene has no limit on field length,
> unlike DB2.
>
> Jason's strategy is very useful in all cases: *before adding the document*
> to the Lucene index, obtain a unique id from a sequence (e.g. in the DB),
> add the document with this synthetic id (e.g. as an "id" field stored
> within the index), and after adding it, save the document *with its id*
> (if it has not been saved yet) to XML or another storage.
>
> Well, I haven't stored synthetic ids with my indexed documents, and now
> I'm about to reassign them.

Re: updating document

Posted by Karel Tejnora <ka...@tejnora.cz>.

Jason is right. I think (even though I'm not a Lucene expert either) that your
newly added document can't recreate the terms of the field with the analyzer,
because the field text is empty.
There is a very hairy solution: hack IndexReader and FieldInfosWriter and
use addIndexes().

Lucene is "only" a full-text search library, not a datastore. I ended up
with the same design as he suggested. At the beginning I chose to store
fields in the Lucene index, because Lucene has no limit on field length,
unlike DB2.

Jason's strategy is very useful in all cases: *before adding the document*
to the Lucene index, obtain a unique id from a sequence (e.g. in the DB),
add the document with this synthetic id (e.g. as an "id" field stored
within the index), and after adding it, save the document *with its id*
(if it has not been saved yet) to XML or another storage.
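The "id first" ordering described above, as a toy sketch. An AtomicLong stands in for the database sequence and two maps stand in for the Lucene index and the external storage; all names are hypothetical. In a real index the body text would be indexed but not stored, so only the id and title appear among the stored fields here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the "id first" ordering: the id comes from a
// sequence before the document is indexed, is stored as a field in the
// index, and is then used as the key in external storage.
final class IdFirstIndexer {
    private final AtomicLong sequence = new AtomicLong();                 // stands in for a DB sequence
    final Map<String, Map<String, String>> index = new LinkedHashMap<>(); // id -> stored fields
    final Map<String, String> storage = new HashMap<>();                  // id -> full text

    String add(String title, String body) {
        String id = Long.toString(sequence.incrementAndGet()); // 1. obtain the id
        index.put(id, Map.of("id", id, "title", title));       // 2. index it as a stored field
        storage.putIfAbsent(id, body);                          // 3. save externally, if not saved yet
        return id;
    }

    public static void main(String[] args) {
        IdFirstIndexer ix = new IdFirstIndexer();
        String id = ix.add("Quarterly report", "revenue grew");
        System.out.println(id + " -> " + ix.storage.get(id)); // 1 -> revenue grew
    }
}
```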

Well, I haven't stored synthetic ids with my indexed documents, and now
I'm about to reassign them.

Re: updating document

Posted by Jason Polites <ja...@gmail.com>.

Unfortunately yes.  It doesn't really have anything to do with the way you
access the index (I don't think).  The fact is that the data is simply not
in the document.  When you add the document again it is effectively
"re-indexed", so if the raw data of the field is empty, then it won't be
indexed.

I'm in no way a Lucene expert, but this has been my experience.

In my case, I too have important data which I need to preserve during
updates, but it raises an important architecture point.  If you are not
storing the data, then you obviously don't need it for display (no doubt you
have it elsewhere when it is needed for display).  Storing the data simply
for the purposes of preserving it during updates will work, but has the
secondary effect of slowing down searches.  More specifically, slowing down
the retrieval of Hits.  When you call hits.doc(n), the stored values of
fields in the document will be loaded into memory.  The larger the amount of
data stored, the larger the memory consumption.  If you are compressing the
field (using the Field.Store.COMPRESS option) you will see even slower
performance as the field data needs to be decompressed before it is
retrieved.

There are apparently plans to introduce lazy loading of fields in subsequent
releases, which will be a great feature and will solve some of this, but you
still have the issue of needing to store data in the index which doesn't
really need to be there.

In my case, I elected to store this data elsewhere.  So, I have a "document"
(actually an email) which may contain large amounts of text which I want to
search on, but don't need to display from the index.  I index this text, but
don't store it.  When I want to update a document, I retrieve this text from
wherever I put it, and re-insert it into the lucene document.
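That update path can be sketched with plain maps. The field names "id" and "body" and the rebuild helper are hypothetical; in real code the result would be turned into a Lucene Document with "body" indexed but not stored (Field.Store.NO) before it is re-added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the update path: stored fields come from the
// index, the unstored text is fetched again from wherever it is kept, and
// the complete document is rebuilt before it is re-added.
final class UpdateWithExternalText {
    static Map<String, String> rebuild(Map<String, String> storedFields,
                                       Map<String, String> externalText,
                                       Map<String, String> newFields) {
        String id = storedFields.get("id");
        String body = externalText.get(id);
        if (body == null) {
            throw new IllegalStateException("no external text for id " + id);
        }
        Map<String, String> doc = new LinkedHashMap<>(storedFields);
        doc.put("body", body); // re-insert the indexed-but-unstored text
        doc.putAll(newFields); // then apply the update itself
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("id", "42");
        stored.put("title", "Quarterly report");
        Map<String, String> rebuilt = rebuild(stored, Map.of("42", "revenue grew"),
                Map.of("category", "finance"));
        System.out.println(rebuilt); // {id=42, title=Quarterly report, body=revenue grew, category=finance}
    }
}
```

Failing loudly when the external text is missing avoids silently re-adding a document without its searchable body, which is exactly the symptom in the original question.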

It's effectively the same process, but it just alleviates the burden of
maintaining this data within the index.  Of course you then have to maintain
two sets of information, but the benefits will probably outweigh the costs.

It can be as simple as storing the text content in a compressed file on your
file-system somewhere, keyed with an ID which is unique to the lucene
document it belongs to.  Or you could use a CLOB field (Text type in SQL
Server) in a database etc.
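The compressed-file option needs nothing beyond the JDK. A minimal sketch, assuming ids are safe to use as file names (for example numeric ids from a sequence); the class name and the "<id>.txt.gz" layout are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: keep the unstored text of each document as a
// gzip-compressed file named after the document's unique id.
final class CompressedTextStore {
    private final Path dir;

    CompressedTextStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    private Path fileFor(String id) {
        // Assumes ids are safe file names (e.g. numeric ids from a sequence).
        return dir.resolve(id + ".txt.gz");
    }

    void put(String id, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(fileFor(id))), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    String get(String id) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(fileFor(id)))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        CompressedTextStore store = new CompressedTextStore(Files.createTempDirectory("texts"));
        store.put("42", "a long email body that is indexed but not stored");
        System.out.println(store.get("42"));
    }
}
```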

The real advantage of this is the speed gain from the index.  Lucene works
best when the index is light-weight.  My recommendation is to think
carefully about the "role" of the index, vs the role of your data storage
approach.

On 8/11/06, Deepan Chakravarthy <co...@gmail.com> wrote:
>
> Yes, we have some important fields that are not stored in the index. Is
> there a way to overcome this problem while updating a document? Will I
> face the same problem with IndexModifier? (Now I am using IndexReader
> and IndexWriter.)
> Thanks
> Deepan
> www.codeshepherd.com

Re: updating document

Posted by Deepan Chakravarthy <co...@gmail.com>.

On Fri, 2006-08-11 at 01:58 +1000, Jason Polites wrote:
> Are you storing the contents of the fields in the index?  That is,
> specifying Field.Store.YES when creating the field?
> 
> In my experience fields which are not stored are not recoverable from the
> index (well.. they can be reconstructed but it's a lossy process).  So when
> you retrieve the document, you lose non-stored fields.
> 

Yes, we have some important fields that are not stored in the index. Is
there a way to overcome this problem while updating a document? Will I
face the same problem with IndexModifier? (Now I am using IndexReader
and IndexWriter.)
Thanks
Deepan 
www.codeshepherd.com


> If you are searching on these fields then it would explain why you are
> losing results.

Re: updating document

Posted by Jason Polites <ja...@gmail.com>.

Are you storing the contents of the fields in the index?  That is,
specifying Field.Store.YES when creating the field?

In my experience fields which are not stored are not recoverable from the
index (well.. they can be reconstructed but it's a lossy process).  So when
you retrieve the document, you lose non-stored fields.

If you are searching on these fields then it would explain why you are
losing results.

On 8/10/06, Deepan Chakravarthy <co...@gmail.com> wrote:
>
> I have Luke. When I inspect the index with Luke I find the same result,
> i.e. the updated documents are not searchable in the new index.
>
> I guess IndexModifier uses IndexReader and IndexWriter internally. I am
> adding more fields to existing documents in the index, so some of my
> documents will have n fields and others n+m fields after updating. Does
> the difference in the number of fields affect search in any manner?

Re: updating document

Posted by Deepan Chakravarthy <co...@gmail.com>.

On Thu, 2006-08-10 at 09:16 -0400, Erick Erickson wrote:
> You say "Those documents that we updated are not searchable now". I've got
> to ask the obvious question, did you close and re-open the *searcher*
> (really, the indexreader you use in your searcher)? I suspect you have, but
> thought I'd ask explicitly.
> 
> I'd also get a copy of Luke (http://www.getopt.org/luke/) and inspect my
> index after you drop/re-add the data.
I have Luke. When I inspect the index with Luke I find the same result,
i.e. the updated documents are not searchable in the new index.

I guess IndexModifier uses IndexReader and IndexWriter internally. I am
adding more fields to existing documents in the index, so some of my
documents will have n fields and others n+m fields after updating. Does
the difference in the number of fields affect search in any manner?


> 
> Actually, have you thought about IndexModifier (I'm using Lucene 2.0). From
> the javadoc....
> 
> <<< A class to modify an index, i.e. to delete and add documents. This class
> hides IndexReader and
> IndexWriter so
> that you do not need to care about implementation details such as that
> adding documents is done via IndexWriter and deletion is done via
> IndexReader.>>>
> 
> Best
> Erick

Re: updating document

Posted by Erick Erickson <er...@gmail.com>.

You say "Those documents that we updated are not searchable now". I've got
to ask the obvious question, did you close and re-open the *searcher*
(really, the indexreader you use in your searcher)? I suspect you have, but
thought I'd ask explicitly.

I'd also get a copy of Luke (http://www.getopt.org/luke/) and inspect my
index after you drop/re-add the data.
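Erick's question matters because, as I understand Lucene 2.0, an IndexReader (and hence the IndexSearcher built on it) sees the index as it was when the reader was opened. A toy model of that point-in-time behaviour, with hypothetical class names and no Lucene dependency:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a reader takes a point-in-time snapshot when it is opened,
// so changes made afterwards are invisible until a new reader is opened.
final class SnapshotDemo {
    static final class Store {
        private final List<String> docs = new ArrayList<>();

        void add(String doc) { docs.add(doc); }

        SnapshotReader open() { return new SnapshotReader(List.copyOf(docs)); }
    }

    static final class SnapshotReader {
        private final List<String> snapshot;

        SnapshotReader(List<String> snapshot) { this.snapshot = snapshot; }

        boolean contains(String doc) { return snapshot.contains(doc); }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.add("old");
        SnapshotReader searcher = store.open();               // opened before the update
        store.add("updated");
        System.out.println(searcher.contains("updated"));     // false: stale view
        System.out.println(store.open().contains("updated")); // true: re-opened
    }
}
```

With real Lucene the equivalent is to close the old IndexSearcher and open a new one after the IndexWriter has been closed.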

Actually, have you thought about IndexModifier (I'm using Lucene 2.0). From
the javadoc....

<<< A class to modify an index, i.e. to delete and add documents. This class
hides IndexReader and
IndexWriter so
that you do not need to care about implementation details such as that
adding documents is done via IndexWriter and deletion is done via
IndexReader.>>>

Best
Erick

On 8/9/06, Deepan Chakravarthy <co...@gmail.com> wrote:
>
> Hi,
> We have to update a few documents in our index: we have to add an additional
> field to them. We did it as follows:
>
> 1) read the documents of interest using IndexReader
> 2) copy them to a temporary Document object (temp_doc)
> 3) delete the documents from the index
> 4) close the IndexReader
> 5) open the IndexWriter
> 6) add a new field to temp_doc
> 7) add temp_doc to the index using IndexWriter
> 8) close the IndexWriter
>
>
> The problem:
> 1) The documents that we updated are no longer searchable. When we
> perform a search (using IndexSearcher), we do not find any of the
> documents we updated.
>
> 2) But we are still able to read the updated documents using IndexReader.
>
>
> Questions
> 1) When I want to update a document by adding a field, do I have to reindex
> all the fields again? Won't copying the existing document and adding the
> new field work?

Re: updating document

Posted by Doron Cohen <DO...@il.ibm.com>.

Hi Deepan, the steps below seem correct, provided that all the fields of the
original document are also stored. The javadoc for
IndexReader.document(int n) (which I assume is what you are using) says:
"Returns the stored fields of the nth Document in this index." So only
stored fields exist in the document returned by this call, and hence
only those stored fields are added back to the index after you
update this temp_doc object with new fields.
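Doron's point can be made concrete with a toy in-memory model. This is plain Java written for illustration, not Lucene's API; ToyIndex and its methods are hypothetical. Its document(id) imitates the documented behaviour of IndexReader.document(int) by returning only stored fields, so following the eight steps from the original post re-adds a document whose unstored "body" text never reaches the inverted index again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: mimics the stored/indexed split, not Lucene's real API.
final class ToyIndex {
    record Field(String name, String value, boolean stored, boolean indexed) {}

    private final Map<Integer, List<Field>> docs = new HashMap<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // "field:term" -> doc ids
    private int nextId = 0;

    int add(List<Field> fields) {
        int id = nextId++;
        docs.put(id, new ArrayList<>(fields));
        for (Field f : fields) {
            if (!f.indexed()) continue;
            for (String term : f.value().toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(f.name() + ":" + term, k -> new HashSet<>()).add(id);
            }
        }
        return id;
    }

    void delete(int id) {
        docs.remove(id);
        postings.values().forEach(s -> s.remove(id));
    }

    // Like IndexReader.document(n): only the STORED fields come back.
    List<Field> document(int id) {
        List<Field> out = new ArrayList<>();
        for (Field f : docs.get(id)) {
            if (f.stored()) out.add(f);
        }
        return out;
    }

    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int id = index.add(List.of(
                new Field("title", "Quarterly report", true, true),
                new Field("body", "revenue grew", false, true))); // indexed, not stored

        // The update procedure from the original post: read, delete, add field, re-add.
        List<Field> temp = index.document(id);
        index.delete(id);
        temp.add(new Field("category", "finance", true, true));
        index.add(temp);

        System.out.println(index.search("title", "report")); // [1]: stored field survives
        System.out.println(index.search("body", "revenue")); // []: unstored field is gone
    }
}
```

The fix, as discussed earlier in the thread, is to re-supply the unstored text from its original source before re-adding the document.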

Regards,
Doron

Deepan Chakravarthy <co...@gmail.com> wrote on 09/08/2006 10:44:13:

> Hi,
> We have to update a few documents in our index: we have to add an additional
> field to them. We did it as follows:
>
> 1) read the documents of interest using IndexReader
> 2) copy them to a temporary Document object (temp_doc)
> 3) delete the documents from the index
> 4) close the IndexReader
> 5) open the IndexWriter
> 6) add a new field to temp_doc
> 7) add temp_doc to the index using IndexWriter
> 8) close the IndexWriter
>
>
> The problem:
> 1) The documents that we updated are no longer searchable. When we
> perform a search (using IndexSearcher), we do not find any of the
> documents we updated.
>
> 2) But we are still able to read the updated documents using IndexReader.
>
>
> Questions
> 1) When I want to update a document by adding a field, do I have to reindex
> all the fields again? Won't copying the existing document and adding the
> new field work?