
Posted to java-user@lucene.apache.org by ki...@epiphany.com on 2002/10/17 00:44:42 UTC

Concurrency in Lucene

My company, Epiphany, has decided to integrate our products with Lucene.
I'm leading this effort, and as part of it I have developed a solution around
Lucene that allows concurrent processes to search, insert, update and delete
documents.
This solution addresses the following:
	- concurrent writing (insert, update, delete) to the index (see
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12588 and
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01795.html)
	- the non-transactional nature of Lucene. The solution wraps every
insert, update and delete in a transaction, and all writes are guaranteed to
reach the index eventually.
	- running out of file handles.
	- bookkeeping: the solution decides when to open and close
IndexReader/IndexWriter, so clients do not have to. Technically one can do
this after every operation, but creating and deleting the .lock file slows
things down.
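
The guarantee that every write eventually reaches the index usually rests on making each write durable in a log before acknowledging it. Below is a minimal sketch of such a log in Java, assuming a one-line-per-operation text format and document ids without tabs or newlines; OperationLog and its methods are hypothetical names for illustration, not Lucene API and not the actual Epiphany code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical append-only operation log: one tab-separated line per write. */
public final class OperationLog implements AutoCloseable {
    private final Path file;
    private final FileChannel channel;

    public OperationLog(Path file) throws IOException {
        this.file = file;
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Appends e.g. "DELETE\tdoc-42\t" and forces it to disk before returning. */
    public synchronized void append(String op, String docId, String body) throws IOException {
        // Assumption: docId has no tabs/newlines; body is flattened to keep one op per line.
        String clean = body.replace('\n', ' ').replace('\t', ' ');
        String line = op + '\t' + docId + '\t' + clean + '\n';
        ByteBuffer buf = ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(true); // the write counts as committed only after this returns
    }

    /** Returns every logged operation, e.g. to replay into the index after a crash. */
    public synchronized List<String> replay() throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Forcing the channel on every append is what makes each acknowledged write survive a crash; it is also the main cost, which a real implementation might amortize by forcing once per batch of writes.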


In summary, every write (insert, update, delete) is first recorded in a log
file. A worker thread wakes up periodically, examines the logs, and decides
whether to propagate the changes (this policy is configurable). If it decides
to propagate, the thread creates new log files, locks the current log files,
makes a new copy of the index, merges the changes from the logs into that
copy, hot-swaps the newly created index into place, and deletes the old logs
and index. At any given time, search results will not contain deleted
documents, but newly created or updated documents will not appear in search
results until the merge has finished. The worker thread also records the
state of the logs and index in case of a crash.
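
The flow above can be modelled in memory to show why deletes are hidden immediately while inserts and updates wait for the merge. This is a hypothetical sketch, not Lucene classes and not the actual Epiphany code: LogMergedIndex, upsert, mergeOnce and the naive substring search are illustrative names, a Map stands in for the on-disk index, and the configurable propagation decision is reduced to "merge whenever the log is non-empty":

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical in-memory model of the log-then-merge design (not Lucene API). */
public final class LogMergedIndex {
    /** One logged write; body == null means delete. */
    private record Op(long seq, String docId, String body) {}

    private final Object logLock = new Object();
    private List<Op> log = new ArrayList<>();      // current "log file", guarded by logLock
    private long seq = 0;                          // guarded by logLock
    private volatile Map<String, String> live = Map.of();                // searchable snapshot
    private final Map<String, Long> pending = new ConcurrentHashMap<>(); // docId -> latest unmerged seq
    private final ReadWriteLock swapLock = new ReentrantReadWriteLock();

    public void upsert(String docId, String body) { append(docId, body); }

    public void delete(String docId) { append(docId, null); }

    private void append(String docId, String body) {
        synchronized (logLock) {
            long s = ++seq;
            pending.put(docId, s);   // hide any existing version right away
            log.add(new Op(s, docId, body));
        }
    }

    /** Naive substring "query" over the live snapshot, skipping docs with unmerged writes. */
    public List<String> search(String term) {
        swapLock.readLock().lock();
        try {
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> e : live.entrySet()) {
                if (e.getValue().contains(term) && !pending.containsKey(e.getKey())) {
                    hits.add(e.getKey());
                }
            }
            return hits;
        } finally {
            swapLock.readLock().unlock();
        }
    }

    /** One worker pass: rotate the log, copy the index, apply the ops, hot-swap. */
    public synchronized void mergeOnce() {
        List<Op> batch;
        synchronized (logLock) {
            if (log.isEmpty()) return;   // simplified propagation policy
            batch = log;
            log = new ArrayList<>();     // new writes go to a fresh log
        }
        Map<String, String> copy = new HashMap<>(live);
        for (Op op : batch) {
            if (op.body() == null) copy.remove(op.docId());
            else copy.put(op.docId(), op.body());
        }
        swapLock.writeLock().lock();
        try {
            live = Map.copyOf(copy);     // hot-swap the new snapshot
            for (Op op : batch) {
                pending.remove(op.docId(), op.seq()); // keeps marks left by newer writes
            }
        } finally {
            swapLock.writeLock().unlock();
        }
    }
}
```

A deployment would presumably call mergeOnce from a single background thread, for example one scheduled with Executors.newSingleThreadScheduledExecutor() and scheduleWithFixedDelay. Marking every written id as pending is what hides the old version of an updated document immediately while its new version waits for the hot-swap; the read/write lock stops a search from pairing an old snapshot with pending marks that the worker has already cleared.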

These were the factors that drove this solution:
	Need for concurrent, non-blocking writes (insert/update/delete).
	Need for deleted documents to stop showing up in query results (Hits)
as soon as they are deleted.
	Lucene does not handle crashes well. The prevailing mentality is "if in
doubt, redo the index", which does not work in some cases. Rebuilding the
index is fast, but in our case a) it consumes too many non-Lucene resources
(documents may be stored in a database), and b) high availability of search
is a requirement.
		- Lucene can leave .lock files behind.
		- Lucene keeps state (documents) in memory.


I wanted to see how much interest is out there for such a solution and
whether Lucene developers feel that this should be part of Lucene. If there
is enough interest I would like to donate this code to Lucene.

Thanks,

Kiril Zack

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Concurrency in Lucene

Posted by Maurits van Wijland <m....@quicknet.nl>.

Hi Kiril,

>
> I wanted to see how much interest is out there for such a solution and
> whether Lucene developers feel that this should be part of Lucene. If there
> is enough interest I would like to donate this code to Lucene.

I think that this would be a very good addition to Lucene. In case the
developers group doesn't think so, please consider sharing this code with us
users, because these are features that Lucene lacks.

We all face these problems and have our workarounds. I use a staging
server where ALL documents are indexed, and then a new index is published.
I would love to have some sort of transactions and fewer problems with
crashes.
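
Publishing a staged index can be made atomic by having searchers open whatever directory a small pointer file names, and replacing that pointer file with a rename. A sketch under those assumptions; the CURRENT pointer file and the IndexPublisher class are hypothetical and not part of Lucene:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: publishes a fully built index directory by repointing CURRENT. */
public final class IndexPublisher {
    private final Path root;

    public IndexPublisher(Path root) {
        this.root = root;
    }

    /** Directory searchers should open, or null if nothing has been published yet. */
    public Path current() throws IOException {
        Path pointer = root.resolve("CURRENT");
        if (!Files.exists(pointer)) {
            return null;
        }
        return root.resolve(Files.readString(pointer, StandardCharsets.UTF_8).trim());
    }

    /** Makes stagedDir (a subdirectory of root, already fully indexed) the live index. */
    public void publish(Path stagedDir) throws IOException {
        Path tmp = root.resolve("CURRENT.tmp");
        Files.writeString(tmp, root.relativize(stagedDir).toString(), StandardCharsets.UTF_8);
        // Readers see either the old name or the new one, never a half-written pointer.
        Files.move(tmp, root.resolve("CURRENT"), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

On POSIX systems an atomic move is a rename(2), which replaces an existing CURRENT atomically; on other platforms whether an existing target is replaced is implementation-specific. Old index directories would still need to be deleted once no searcher has them open.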

So, Kiril, let us in on it, please.

regards,
maurits.



Re: Concurrency in Lucene

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Moved to lucene-dev.

Wow, this sounds good, but I'd love to see the code first and find out how
this log file business works.

Also, I don't think we ever got any tests that use multiple threads and
demonstrate the problems with simultaneous reads, writes, deletes, etc.
It would be nice to have something like that, so that we can verify
that this contribution really fixes these problems.
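
A multi-threaded test of this kind might start all threads at the same instant and collect whatever they throw. Below is a minimal, hypothetical harness (ConcurrencyHarness is not an existing Lucene test utility). The operation passed in would perform the inserts, deletes and searches under test and throw when it observes a violated invariant, such as a deleted document appearing in the results:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

/** Hypothetical stress harness: all workers are released at the same instant. */
public final class ConcurrencyHarness {
    /**
     * Each of `threads` workers calls op.accept(workerIndex) opsPerThread times.
     * Returns whatever the workers threw; an empty list means no problem was observed.
     */
    public static List<Throwable> run(int threads, int opsPerThread, IntConsumer op)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<?>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            futures.add(pool.submit(() -> {
                start.await();                 // maximize overlap between workers
                for (int i = 0; i < opsPerThread; i++) {
                    op.accept(id);
                }
                return null;
            }));
        }
        start.countDown();                     // release every worker at once
        List<Throwable> failures = new ArrayList<>();
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                failures.add(e.getCause());    // the exception the worker actually threw
            }
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return failures;
    }
}
```

A passing run only shows that no failure was observed in that interleaving, so such tests are typically repeated many times with high operation counts.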

Thanks,
Otis





Re: Concurrency in Lucene

Posted by petite_abeille <pe...@mac.com>.

On Thursday, Oct 17, 2002, at 00:44 Europe/Zurich, 
kiril.zack@epiphany.com wrote:

> If there
> is enough interest I would like to donate this code to Lucene.

Please do :-) I ran into exactly the same kinds of problems, and while I
seem to have hammered them out, I would love to see your take on it.

PA.
