Posted to solr-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2009/02/17 13:01:18 UTC

2 strange behaviours with DIH full-import.

Hey, I have 2 problems that I think are really important and can be useful
for other users:

1.) I am running 3 cores in a Solr instance. Each core contains about a
million and a half docs. Once a full-import is run in a core, it frees
only a little of the Java memory it used. If I then run a full-import in
another core, the memory used by the first full-import is never freed.
Once the second full-import is done I run the third... and I run out of
memory! Is this a Solr bug in freeing memory, or am I missing something?
Is there any way to tell Solr to free all memory after a full-import? It
is a really severe problem in my case, as I cannot restart the Tomcat
server (I have other cron actions synchronized with it).

2.) I run a full-import and everything works fine... I run another
full-import in the same core and everything seems to work fine. But I have
noticed that the index in the /data/index dir is twice as big. I have seen
that Solr uses this IndexWriter constructor when it executes a deleteAll at
the beginning of the full-import:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/IndexWriter.html#IndexWriter(org.apache.lucene.store.Directory,%20org.apache.lucene.analysis.Analyzer,%20boolean,%20org.apache.lucene.index.IndexDeletionPolicy,%20org.apache.lucene.index.IndexWriter.MaxFieldLength)

Why is Lucene not deleting the data of the old index if the boolean var of
the constructor is set to true? (The results are not duplicated, but
physically the /index directory is double the size.) Does this have
something to do with the deletionPolicy that is saving commits, or with a
Lucene 2.9-dev bug, or something like that?
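
To put numbers on the "double size" observation, the index directory can be
measured before and after each full-import and commit. The following is a
minimal sketch using only the JDK (Java 8 or later is assumed); the class
name IndexDirSize and the default path "data/index" are hypothetical, so
pass the real path to the core's index directory as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexDirSize {
    /** Sums the sizes of all regular files under dir, recursively. */
    static long sizeInBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                // Files can vanish mid-walk while Lucene cleans up.
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; replace with the core's real index directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "data/index");
        if (!Files.isDirectory(indexDir)) {
            System.out.println(indexDir + " is not a directory");
            return;
        }
        System.out.println(indexDir + ": " + sizeInBytes(indexDir) + " bytes");
    }
}
```

Running it once after the second full-import and again after a later commit
would show whether the extra files are eventually removed.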

I am running a nightly build (from the beginning of January, with some
patches that have been appearing for concurrent indexing problems) with
Lucene 2.9-dev.
I would appreciate any advice, as these two problems are really driving me
crazy and I don't know how to sort them out... especially the first one.
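
For the first problem, it may help to separate memory that is still
referenced from garbage that the JVM has simply not collected yet. The
sketch below reads heap figures through the standard java.lang.management
API. It reports on the JVM it runs in, so to say anything about Solr it
would have to run inside the Tomcat JVM (for example from a small servlet or
JSP, which is an assumption about the setup). The class name HeapReport is
hypothetical, and System.gc() is only a hint that the JVM may ignore.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {
    /** Heap bytes in use right now, as reported by the JVM. */
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Requests a GC (a hint only), then reports the heap bytes in use. */
    static long usedHeapBytesAfterGcHint() {
        System.gc();
        return usedHeapBytes();
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        // getMax() returns -1 when the JVM defines no maximum heap size.
        System.out.println("used=" + heap.getUsed() / mb + " MB"
                + " committed=" + heap.getCommitted() / mb + " MB"
                + " max=" + (heap.getMax() < 0 ? "undefined" : heap.getMax() / mb + " MB"));
        System.out.println("used after GC hint=" + usedHeapBytesAfterGcHint() / mb + " MB");
    }
}
```

If the used figure stays high after a GC hint once an import has finished,
something is still holding references; if it drops, the memory was only
waiting to be collected.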

Thanks in advance!!



Re: 2 strange behaviours with DIH full-import.

Posted by Chris Hostetter <ho...@fucit.org>.
: 2.) I run a full-import and everything works fine... I run another
: full-import in the same core and everything seems to work fine. But I have
: noticed that the index in the /data/index dir is twice as big. I have seen
: that Solr uses this IndexWriter constructor when it executes a deleteAll at
: the beginning of the full-import:
: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/IndexWriter.html#IndexWriter(org.apache.lucene.store.Directory,%20org.apache.lucene.analysis.Analyzer,%20boolean,%20org.apache.lucene.index.IndexDeletionPolicy,%20org.apache.lucene.index.IndexWriter.MaxFieldLength)
: 
: Why is Lucene not deleting the data of the old index if the boolean var of
: the constructor is set to true? (The results are not duplicated, but
: physically the /index directory is double the size.) Does this have
: something to do with the deletionPolicy that is saving commits, or with a
: Lucene 2.9-dev bug, or something like that?

this is not unusual: the documents have logically been deleted, but the
files containing them are still on disk because the "old searcher" is
still referencing them; when the "new searcher" is swapped in for the old
searcher, those files can be deleted.

on unix filesystems, the old files will actually get deleted immediately
(even while the old searcher is still open) because unix filesystems let
you do that.

windows filesystems won't let you delete files while they are open, so
Lucene keeps track of the fact that the files *can* be deleted, and then
the next time you do a commit, it cleans them up.
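
The filesystem behaviour described above can be tried out with a small
JDK-only sketch (this is not Lucene code, and the class name DeleteWhileOpen
is hypothetical). It opens a temporary file, attempts to delete it while the
stream is still open, and then reads through the open stream. On unix-like
systems the delete is expected to succeed, with the disk space reclaimed
only once the last open handle is closed; on Windows the delete may be
refused or deferred, depending on how the file was opened.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteWhileOpen {
    /**
     * Writes content to a temp file, tries to delete the file while it is
     * open, and returns whatever can still be read from the open stream.
     */
    static String readAfterDeleteAttempt(String content) throws IOException {
        Path file = Files.createTempFile("segment", ".tmp");
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        try (InputStream in = Files.newInputStream(file)) {
            boolean deleted;
            try {
                Files.delete(file);
                deleted = true;
            } catch (IOException e) {
                deleted = false; // the OS refused to delete an open file
            }
            System.out.println("deleted while open: " + deleted);

            // The open stream keeps working whether or not the delete succeeded.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(file); // clean up if the early delete was refused
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read back: " + readAfterDeleteAttempt("old segment data"));
    }
}
```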



-Hoss


Re: 2 strange behaviours with DIH full-import.

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Feb 17, 2009 at 5:31 PM, Marc Sturlese <ma...@gmail.com> wrote:

>
> 2.) I run a full-import and everything works fine... I run another
> full-import in the same core and everything seems to work fine. But I have
> noticed that the index in the /data/index dir is twice as big. I have seen
> that Solr uses this IndexWriter constructor when it executes a deleteAll at
> the beginning of the full-import:
>
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/IndexWriter.html#IndexWriter(org.apache.lucene.store.Directory,%20org.apache.lucene.analysis.Analyzer,%20boolean,%20org.apache.lucene.index.IndexDeletionPolicy,%20org.apache.lucene.index.IndexWriter.MaxFieldLength)
>
> Why is Lucene not deleting the data of the old index if the boolean var of
> the constructor is set to true? (The results are not duplicated, but
> physically the /index directory is double the size.) Does this have
> something to do with the deletionPolicy that is saving commits, or with a
> Lucene 2.9-dev bug, or something like that?
>

I think this is due to the IndexDeletionPolicy. The problem is that on a
commit, the IndexWriter is closed. It is re-opened only when you send
another add/delete command. While the index writer is closed, the deletion
policy does not take effect and unused commit points are not marked for
deletion. Replication hit a similar problem, where the index files on
the slave were not getting cleaned up. The solution is the same: we need to
re-open the index writer after the commit closes it.
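
The sequence described above can be illustrated with a toy model. This is
not Solr or Lucene code: the class ToyCommitCleanup and all of its methods
are hypothetical, and the model assumes a keep-only-the-latest policy that
runs solely when the writer is opened.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these names are Solr or Lucene APIs. */
public class ToyCommitCleanup {
    private final List<Integer> commitPoints = new ArrayList<>();
    private boolean writerOpen = false;
    private int nextGeneration = 1;

    /** In this model, old commit points are only cleaned up when the writer opens. */
    void openWriter() {
        writerOpen = true;
        while (commitPoints.size() > 1) {
            commitPoints.remove(0); // drop the oldest commit point
        }
    }

    /** Writes a new commit point, then closes the writer (as described above). */
    void commit() {
        if (!writerOpen) {
            openWriter();
        }
        commitPoints.add(nextGeneration++);
        writerOpen = false;
    }

    /** A full-import: the first add/delete reopens the writer, then a commit follows. */
    void fullImport() {
        openWriter();
        commit();
    }

    int commitPointCount() {
        return commitPoints.size();
    }

    public static void main(String[] args) {
        ToyCommitCleanup index = new ToyCommitCleanup();
        index.fullImport();
        index.fullImport();
        // Two generations remain on disk: the "double size" symptom.
        System.out.println("after two imports: " + index.commitPointCount());
        // Reopening the writer after the commit lets the cleanup run.
        index.openWriter();
        System.out.println("after reopening writer: " + index.commitPointCount());
    }
}
```

In the model, two commit points remain after the second import, and
reopening the writer drops the older one, which mirrors the proposed fix.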

I'll open an issue and attach a fix.

-- 
Regards,
Shalin Shekhar Mangar.