
Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2021/06/18 21:27:56 UTC

[GitHub] [lucenenet] mikeplacko opened a new issue #494: IndexReader locks files

mikeplacko opened a new issue #494:
URL: https://github.com/apache/lucenenet/issues/494


   I've got an odd file lock problem after updating from v3 to v4. I'm not sure if this is just poor design choices or an actual issue but the behavior is definitely different now.
   
   Here is a simple scenario that doesn't seem to work the way I think it should.
   
   1. Create an application that creates an IndexWriter on a path, adds a bunch of documents, and commits. You should get some index files on disk at that path.
   2. Create an IIS website that opens an IndexReader on that same path, wraps it in an IndexSearcher, and executes a search for one of those docs.
   3. Go to disk and attempt to delete all the files Lucene.NET created in the index path.
   
   Expected behavior: the files are deleted
   Actual behavior: some files are locked by the w3wp.exe process in IIS that used the reader and cannot be deleted. 
   
   Negative side effect: if the writer runs again it can also run into this issue where it can't update some of those files.
   Workaround: dispose of the reader every time you search. Not necessarily a bad thing, but uses more memory and is slower.
   
   Why is the reader locking any files? 
   The reader is specifically a static variable created one time and disposed on a timed basis for efficiency. My understanding is the reader is thread safe and since we don't care about real-time updates, this is more efficient than opening a new reader every time we do a search. 
   
   Posting the relevant code:
   
   ```
   private static DirectoryReader _reader;
   private static string indexPath = @"C:\index\";
   
   public void Search(){
       // The reader is opened once and shared by all searches.
       if(_reader==null){
           _reader = DirectoryReader.Open(FSDirectory.Open(new DirectoryInfo(indexPath)), 8);
       }
   
       BooleanQuery mainQuery; // construction not shown in this post
       Sort sort;              // construction not shown in this post
   
       IndexSearcher searcher = new IndexSearcher(_reader);
   
       int numDocs = _reader.NumDocs;
   
       var collector = TopFieldCollector.Create(sort, numDocs, true, true, true, false);
   
       searcher.Search(mainQuery, collector);
   }
   
   public void Index(){
       IndexWriterConfig indexConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, new StandardAnalyzer(LuceneVersion.LUCENE_48));
   
       // Dispose the writer when done so that it releases its write lock.
       using (var writer = new IndexWriter(FSDirectory.Open(indexPath), indexConfig))
       {
           var luceneDocument = new Lucene.Net.Documents.Document();
           luceneDocument.Add(new TextField("DocumentId", "Unique ID", Field.Store.YES));
   
           writer.UpdateDocument(new Term("DocumentId", "Unique ID Here"), luceneDocument);
   
           writer.Commit();
       }
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] NightOwl888 commented on issue #494: IndexReader locks files

Posted by GitBox <gi...@apache.org>.

NightOwl888 commented on issue #494:
URL: https://github.com/apache/lucenenet/issues/494#issuecomment-945181099


   Closing as this question appears to be answered.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] jeme commented on issue #494: IndexReader locks files

Posted by GitBox <gi...@apache.org>.

jeme commented on issue #494:
URL: https://github.com/apache/lucenenet/issues/494#issuecomment-868469000

@rclabo I don't think either of those is directly applicable here, since he mentions using two different applications/processes, one for the updating and one for the reading.

@mikeplacko I am not sure much has really changed. I see the same locked-file issue in v3: every time we redeploy our software there tends to be an overlap between the new version starting up and the old one shutting down, and we run into locked files then as well. We don't see that as an issue, though; we just have a retry loop that polls until the files are no longer locked, at which point we know the old process has fully shut down.

Some options as I see them:
A) If you're trying to use a single process to maintain (update) the index and then push it out so that one or more applications can access it, I guess the Replicator is the way to go. This could allow you to update the index in one place and distribute it to multiple other nodes for searching.

B) Move your indexing into your IIS application and switch to NRT. @rclabo outlines the Lucene part of this in more detail, so look at his response. You need to consider running as "Always on" in IIS, and consider carefully how you recycle the process, or disable recycling altogether.

C) Move your searching/reading into your indexing process (Windows process?) and then forward queries from your IIS site to that process, again utilizing NRT.
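
For options B and C, where the writer and the searches live in the same process, a minimal NRT sketch could look like the following. It assumes Lucene.NET 4.8's `SearcherManager` is an acceptable way to do this; the class name `NrtSearchService` and its method names are hypothetical, and this is not necessarily the code from @rclabo's response.

```csharp
using System;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical wrapper: one long-lived IndexWriter plus a SearcherManager
// that hands out near-real-time searchers backed by that writer.
public sealed class NrtSearchService : IDisposable
{
    private readonly IndexWriter _writer;
    private readonly SearcherManager _manager;

    public NrtSearchService(IndexWriter writer)
    {
        _writer = writer;
        // applyAllDeletes: true, searcherFactory: null (use the default factory)
        _manager = new SearcherManager(writer, true, null);
    }

    public void Index(Term idTerm, Document doc)
    {
        _writer.UpdateDocument(idTerm, doc);
        // Make the change visible to searchers acquired after this call.
        _manager.MaybeRefresh();
    }

    public TopDocs Search(Query query, int maxHits)
    {
        IndexSearcher searcher = _manager.Acquire();
        try
        {
            return searcher.Search(query, maxHits);
        }
        finally
        {
            // Always release, so the manager can close old readers.
            _manager.Release(searcher);
        }
    }

    public void Dispose()
    {
        _manager.Dispose();
    }
}
```

The caller still owns the `IndexWriter` and is responsible for committing and disposing it; refreshing after every update is the simplest choice here, and a timer-driven `MaybeRefresh()` is probably cheaper under heavy indexing.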


[GitHub] [lucenenet] mikeplacko commented on issue #494: IndexReader locks files

Posted by GitBox <gi...@apache.org>.

mikeplacko commented on issue #494:
URL: https://github.com/apache/lucenenet/issues/494#issuecomment-871775376


   For now, I went with Approach 1 - Point in time Reader. I added a retry loop around the delete-all, just in case something is locked for a backup, waiting a few seconds between attempts. I don't consider this the ideal solution, but it was the easiest to implement quickly. I'll be testing it more to see how it works, but at first look it seems OK.
   I'll explore Approach 2 more later when I have time. I also like the idea of moving all searching to the Windows service and out of IIS. IIS has so many other problems that it seems to make sense to run anything I can in the service.
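
   A retry loop of that kind might be sketched roughly as follows. It uses only `System.IO`; the helper name `TryDeleteAllFiles` and its parameters are hypothetical, and the attempt count and delay are left to the caller.

   ```csharp
   using System;
   using System.IO;
   using System.Threading;

   public static class IndexFolderCleaner
   {
       // Hypothetical helper: tries to delete every file in indexPath,
       // sleeping between passes while some files are still in use.
       // Returns true once a pass deletes everything without errors,
       // false if the attempts run out first.
       public static bool TryDeleteAllFiles(string indexPath, int maxAttempts, TimeSpan delay)
       {
           for (int attempt = 1; attempt <= maxAttempts; attempt++)
           {
               bool anyFailed = false;
               foreach (string file in Directory.GetFiles(indexPath))
               {
                   try
                   {
                       File.Delete(file);
                   }
                   catch (IOException)
                   {
                       anyFailed = true; // file is in use by another process
                   }
                   catch (UnauthorizedAccessException)
                   {
                       anyFailed = true; // file is read-only or access was denied
                   }
               }

               if (!anyFailed) return true;
               if (attempt < maxAttempts) Thread.Sleep(delay);
           }
           return false;
       }
   }
   ```

   For example, `IndexFolderCleaner.TryDeleteAllFiles(@"C:\index\", 10, TimeSpan.FromSeconds(3))` would give a reader up to roughly half a minute to release its files before giving up.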



[GitHub] [lucenenet] NightOwl888 closed issue #494: IndexReader locks files

Posted by GitBox <gi...@apache.org>.

NightOwl888 closed issue #494:
URL: https://github.com/apache/lucenenet/issues/494


   



[GitHub] [lucenenet] rclabo edited a comment on issue #494: IndexReader locks files

Posted by GitBox <gi...@apache.org>.

rclabo edited a comment on issue #494:
URL: https://github.com/apache/lucenenet/issues/494#issuecomment-864095534







[GitHub] [lucenenet] mikeplacko commented on issue #494: IndexReader locks files

Posted by GitBox <gi...@apache.org>.

mikeplacko commented on issue #494:
URL: https://github.com/apache/lucenenet/issues/494#issuecomment-864061910


   I think the negative side effect comes into play when you want to reset the search index programmatically by deleting all the index files. IIS can still be generating traffic (searches from users) while the writer process is trying to delete all those files before re-indexing.
   My example was manual, as it's easy to reproduce by just leaving the reader alive and hard to reproduce in normal operation, because the reader is only alive for a very short time. But it is happening under load in our application. The reader has those files locked and the other process cannot delete them to reset the index.
   Maybe I can come up with a way to turn off searching in IIS while it's trying to reset as a workaround, but I think the better solution would be if Lucene.NET had a command for this: literally Reset, which deletes everything in a thread-safe way. Maybe that's already available and I missed it?
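
   One thing that may be close to such a command is `IndexWriter.DeleteAll()`, which exists in Lucene.NET 4.8 and empties the index through the writer instead of deleting files by hand. The sketch below is only an assumption about how it could be used here; the helper name `IndexReset.Reset` is hypothetical, and whether it avoids the locked-file problem in this setup has not been verified in this thread.

   ```csharp
   using Lucene.Net.Analysis.Standard;
   using Lucene.Net.Index;
   using Lucene.Net.Store;
   using Lucene.Net.Util;

   public static class IndexReset
   {
       // Hypothetical helper: empties the index at indexPath via the writer.
       public static void Reset(string indexPath)
       {
           using (var directory = FSDirectory.Open(indexPath))
           using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
           using (var writer = new IndexWriter(directory,
               new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
           {
               writer.DeleteAll(); // drops all documents and segments from the index
               writer.Commit();    // readers opened after this see an empty index
           }
       }
   }
   ```

   A reader that is already open keeps seeing its point-in-time view until it is reopened, so the searching side still has to refresh or replace its reader after the reset.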



[GitHub] [lucenenet] rclabo commented on issue #494: IndexReader locks files

Posted by GitBox <gi...@apache.org>.

rclabo commented on issue #494:
URL: https://github.com/apache/lucenenet/issues/494#issuecomment-864009498







[GitHub] [lucenenet] rclabo commented on issue #494: IndexReader locks files

Posted by GitBox <gi...@apache.org>.

rclabo commented on issue #494:
URL: https://github.com/apache/lucenenet/issues/494#issuecomment-868521605


   @jeme Thanks for the feedback. Approach 1 as I laid it out has no requirement that the updating and the reading occur in the same process/application. In fact, the processes/applications don't even have to run on the same machine, provided they have access to common disk storage. I supplied Approach 2 as a "bonus" to show another way that this could be handled (some might say a more modern way), but you are totally right that it may not be applicable if he has a hard requirement to have the updating in a different process/application than the reading.
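
   The full description of Approach 1 is not reproduced in this message. As a rough illustration only, and assuming that "point in time reader" refers to a `DirectoryReader` that is periodically swapped for a newer one, the refresh step might look like this; the helper name `ReaderRefresher.Refresh` is hypothetical.

   ```csharp
   using Lucene.Net.Index;

   public static class ReaderRefresher
   {
       // Hypothetical helper: returns a reader on the latest commit.
       // If nothing changed, the same reader instance is returned.
       // If the index changed, the old reader is disposed (releasing its
       // file handles) and a new point-in-time reader is returned.
       // Assumption: no search is still running on 'current' when this
       // is called; concurrent use would need reference counting.
       public static DirectoryReader Refresh(DirectoryReader current)
       {
           DirectoryReader newer = DirectoryReader.OpenIfChanged(current);
           if (newer == null)
           {
               return current; // index unchanged since 'current' was opened
           }

           current.Dispose();
           return newer;
       }
   }
   ```

   A caller would use it as `_reader = ReaderRefresher.Refresh(_reader);`, for example on a timer, which works whether the writer runs in the same process or in a different one.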

