Posted to java-user@lucene.apache.org by Chris and Helen Bamford <ch...@bammers.net> on 2017/10/20 09:28:16 UTC

DocValues and SearcherManager

Hi,

I am using Lucene 4.10.3 and have a problem retrieving a doc values field 
of a document through a SearcherManager after I have updated a stored 
field value.

The document has two key fields: 'state' (a stored Field) and 'id' 
(a BinaryDocValuesField).

After the document is indexed, it undergoes the following chain of events:

  - it is retrieved from the index by 'state', using an IndexSearcher 
obtained via searcherManager.maybeRefresh() and searcherManager.acquire()
  - the 'state' field's value is changed and the document is updated 
using the IndexWriter the SearcherManager was created from 
(indexWriter.updateDocument(Term, Document))

This all works fine.

The problem comes when I then try to match on the doc values field 'id' 
using a searcher from the same SearcherManager 
(searcherManager.maybeRefresh() + searcherManager.acquire()): the field 
is no longer there, and MultiDocValues.getBinaryValues() returns null.

I'm no expert, but it seems that the document retrieved by 'state' 
contains only the stored fields, so when it is re-indexed, 
FieldInfos.addOrUpdate ends up discarding the FieldInfo of the doc 
values field 'id'. Afterwards it is impossible to retrieve the doc value 
through a searcher from the same SearcherManager 
(searcherManager.maybeRefresh() + searcherManager.acquire()).
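
The first half of this hypothesis, that a document fetched with IndexSearcher.doc() carries only stored fields, can be checked in isolation. A minimal sketch, assuming Lucene 4.10.x (lucene-core and lucene-analyzers-common) on the classpath; it uses an in-memory RAMDirectory and StringField instead of the custom FieldType for brevity, and the class and method names (RetrievedFieldsDemo, indexAndRetrieve) are hypothetical:

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class RetrievedFieldsDemo {

    /** Indexes one document, reads it back, and returns the Document the reader hands out. */
    static Document indexAndRetrieve() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_4_10_3, new WhitespaceAnalyzer());
        IndexWriter writer = new IndexWriter(dir, cfg);
        try {
            Document doc = new Document();
            doc.add(new StringField("docId", "doc1", Field.Store.YES));      // indexed + stored
            doc.add(new StringField("state", "added", Field.Store.YES));     // indexed + stored
            doc.add(new BinaryDocValuesField("id", new BytesRef("first")));  // doc values only, NOT stored
            writer.addDocument(doc);
            writer.commit();
        } finally {
            writer.close();
        }

        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("state", "added")), 1);
            // searcher.doc() rebuilds a Document from the stored fields only
            return searcher.doc(hits.scoreDocs[0].doc);
        } finally {
            reader.close();
            dir.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Document retrieved = indexAndRetrieve();
        System.out.println("state = " + retrieved.get("state"));      // stored, so present
        System.out.println("id field = " + retrieved.getField("id")); // doc values only, so absent
    }
}
```

Here retrieved.get("state") returns "added" while retrieved.getField("id") returns null, since a BinaryDocValuesField is not a stored field. Passing such a Document straight back to updateDocument() therefore indexes a document without 'id'.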

If I open a new reader directly on the Directory, the doc value can be 
matched/updated, but this is a performance-critical piece of code and I 
was hoping to keep reusing the same SearcherManager.

The following unit test shows the problem:

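import java.io.File;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiDocValues;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

public class SearcherManagerFailureTest {
    private static final String indexPath = "/tmp/mytestindex";

    private IndexWriter indexWriter;
    private SearcherManager searcherManager;
    public Directory directory;

    @Before
    public void beforeTest() throws Exception {
        // Setup
        directory = FSDirectory.open(new File(indexPath));
        IndexWriterConfig idxCfg = new IndexWriterConfig(Version.LUCENE_4_10_3, new WhitespaceAnalyzer());
        idxCfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        indexWriter = new IndexWriter(directory, idxCfg);
        // Near-real-time searchers from this writer; applyAllDeletes = true
        searcherManager = new SearcherManager(indexWriter, true, new SearcherFactory());
    }

    @After
    public void afterTest() throws Exception {
        if (indexWriter != null) {
            indexWriter.commit();
            indexWriter.close();
        }

        if (searcherManager != null) {
            searcherManager.close();
        }

        if (directory != null) {
            directory.close();
        }

        for (File file : new File(indexPath).listFiles())
            if (!file.isDirectory())
                file.delete();
    }

    @Test
    public void TestSearcherManagerFails() throws Exception {

        //Indexing
        Document doc = new Document();
        FieldType ft = new FieldType(TextField.TYPE_STORED);
        ft.setTokenized(false);
        doc.add(new Field("docId", "doc1", ft));
        doc.add(new Field("state", "added", ft));
        doc.add(new BinaryDocValuesField("id", new BytesRef("first")));
        indexWriter.addDocument(doc);
        indexWriter.commit();

        //Search by state
        searcherManager.maybeRefresh();
        IndexSearcher searcher = searcherManager.acquire();
        Document indexedDoc;
        try {
            TopDocs topDocs = searcher.search(new TermQuery(new Term("state", "added")), null, 1);
            indexedDoc = searcher.doc(topDocs.scoreDocs[0].doc);
        } finally {
            // Every acquire() must be paired with a release()
            searcherManager.release(searcher);
        }

        //Update document
        String docId = indexedDoc.get("docId");
        Term term = new Term("docId", docId);
        Field stateField = (Field) indexedDoc.getField("state");
        stateField.setStringValue("processed");
        indexWriter.updateDocument(term, indexedDoc);

        //Try get docValue
        searcherManager.maybeRefresh();
        IndexSearcher newSearcher = searcherManager.acquire();
        try {
            BinaryDocValues docValues = MultiDocValues.getBinaryValues(newSearcher.getIndexReader(), "id");
            Assert.assertNull(docValues);
        } finally {
            searcherManager.release(newSearcher);
        }

        // Opens the last commit, which predates the uncommitted updateDocument() call
        Directory newDirectory = FSDirectory.open(new File(indexPath));
        DirectoryReader newReader = DirectoryReader.open(newDirectory);
        try {
            BinaryDocValues docValues2 = MultiDocValues.getBinaryValues(newReader, "id");
            Assert.assertNotNull(docValues2);
        } finally {
            newReader.close();
            newDirectory.close();
        }
    }
}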


Can anyone advise?

Thanks

- Chris


Re: DocValues and SearcherManager

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmm, the document you get back from a reader is NOT the same document you
had indexed, so you cannot retrieve a doc from a reader, tweak it, re-index
it, and expect everything to have survived.

In particular, your doc values field "id" is not stored, so when you
retrieve the document from the reader, there is no id field, and when you
then replace that document, the new document has no id field either.

You could also add a StoredField("id", id) at index time so the id comes
back with the retrieved document, but it will be a plain stored field, not
a doc values field. You must then build a new Document instance for
indexing, converting that stored id back into a doc values field.
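
The approach above can be sketched as follows, assuming Lucene 4.10.x (lucene-core and lucene-analyzers-common) on the classpath and an in-memory RAMDirectory. 'id' is added twice under the same name, once as a StoredField (so IndexSearcher.doc() returns it) and once as a BinaryDocValuesField; the hypothetical helper rebuildForUpdate then builds a fresh Document from the retrieved one instead of re-indexing the retrieved instance. The class and method names are illustrative, not part of Lucene:

```java
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiDocValues;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;

public class RebuildForUpdateDemo {

    /** Hypothetical helper: builds a new Document for indexing from a retrieved (stored-fields-only) one. */
    static Document rebuildForUpdate(Document retrieved, String newState) {
        BytesRef id = retrieved.getBinaryValue("id"); // present only because "id" was also stored
        Document fresh = new Document();
        fresh.add(new StringField("docId", retrieved.get("docId"), Field.Store.YES));
        fresh.add(new StringField("state", newState, Field.Store.YES));
        fresh.add(new StoredField("id", id));          // keep it retrievable next time
        fresh.add(new BinaryDocValuesField("id", id)); // and convert it back into doc values
        return fresh;
    }

    /** True if "id" doc values are still visible through the same SearcherManager after the update. */
    static boolean docValuesSurviveUpdate() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_4_10_3, new WhitespaceAnalyzer()));
        SearcherManager manager = new SearcherManager(writer, true, new SearcherFactory());
        try {
            BytesRef id = new BytesRef("first");
            Document doc = new Document();
            doc.add(new StringField("docId", "doc1", Field.Store.YES));
            doc.add(new StringField("state", "added", Field.Store.YES));
            doc.add(new StoredField("id", id));
            doc.add(new BinaryDocValuesField("id", id));
            writer.addDocument(doc);

            // Find the document by state, then release the searcher
            manager.maybeRefresh();
            IndexSearcher searcher = manager.acquire();
            Document retrieved;
            try {
                TopDocs hits = searcher.search(new TermQuery(new Term("state", "added")), 1);
                retrieved = searcher.doc(hits.scoreDocs[0].doc);
            } finally {
                manager.release(searcher);
            }

            // Re-index a freshly built Document, not the retrieved instance
            writer.updateDocument(new Term("docId", retrieved.get("docId")),
                    rebuildForUpdate(retrieved, "processed"));

            // Reuse the same SearcherManager: refresh and look for the doc values again
            manager.maybeRefresh();
            IndexSearcher newSearcher = manager.acquire();
            try {
                BinaryDocValues values = MultiDocValues.getBinaryValues(newSearcher.getIndexReader(), "id");
                return values != null;
            } finally {
                manager.release(newSearcher);
            }
        } finally {
            manager.close();
            writer.close();
            dir.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("doc values visible after update: " + docValuesSurviveUpdate());
    }
}
```

Storing the bytes twice costs some index space; the trade-off follows from the reply above: stored fields are all that IndexSearcher.doc() can give back, so anything that must survive a retrieve-and-reindex cycle has to be stored and explicitly re-added as doc values.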

Mike McCandless

http://blog.mikemccandless.com
