Posted to solr-user@lucene.apache.org by Peter Sturge <pe...@gmail.com> on 2010/07/22 15:43:40 UTC

Getting FileNotFoundException with repl command=backup?

Informational

Hi,

This information is for anyone who might be running into problems when
performing explicit periodic backups of Solr indexes. I encountered this
problem, and hopefully this might be useful to others.
A related Jira issue is: SOLR-1475.

The issue is: When you execute a 'command=backup' request, the snapshot
starts, but then fails later on with file not found errors. This aborts the
snapshot, and you end up with no backup.
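
For reference, here is a minimal Python sketch of issuing such a request.
It assumes the replication handler is registered at the usual
'/replication' path; the base URL and both function names are
placeholders for illustration, not anything from my setup. The request
only starts the snapshot. As described above, the failure shows up
later, so a successful HTTP response does not mean the backup finished.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def backup_url(base_url):
    """Build the 'command=backup' URL for a Solr core.

    Assumes the replication handler is mapped to '/replication'
    (this depends on solrconfig.xml, so adjust if yours differs).
    """
    query = urlencode({"command": "backup"})
    return f"{base_url.rstrip('/')}/replication?{query}"


def request_backup(base_url, timeout=30):
    """Ask Solr to start a snapshot and return the raw response body.

    This only triggers the backup; it does not wait for it to complete.
    """
    with urlopen(backup_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical host and port, shown only to illustrate the URL shape.
    print(backup_url("http://localhost:8983/solr"))
```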

This error occurs if, during the backup process, Solr performs more commits
than its 'maxCommitsToKeep' setting in solrconfig.xml. If you don't commit
very often, you probably won't see this problem.
If, however, like me, you have Solr committing very often, the commit point
files for the backup can get deleted before the backup finishes. This is
particularly true of larger indexes, where the backup can take some time.

Workaround 1:
One workaround to this is to set 'maxCommitsToKeep' to a number higher than
the total number of commits that can occur during the time it takes to do a
backup. Sounds like a 'finger-in-the-air' number? Well, yes it is.
If you commit every 20secs, and a full backup takes 10mins, you'll want a
value of at least 31. The trouble is, how long will a backup take? This can
vary hugely with index growth, system load, disk fragmentation, etc.
(my environment takes ~13mins to backup a 5.5GB index to a local folder)
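
The arithmetic behind that number can be sketched in a few lines of
Python. The function name and the optional safety factor are my own
additions for illustration; the rule itself is just "commits that can
land during one backup, plus the commit point being backed up".

```python
import math


def min_commits_to_keep(backup_seconds, commit_interval_seconds, safety_factor=1.0):
    """Estimate the smallest safe 'maxCommitsToKeep' value.

    backup_seconds: worst-case duration of one backup.
    commit_interval_seconds: time between commits.
    safety_factor: hypothetical padding for backups that run long.
    """
    if commit_interval_seconds <= 0:
        raise ValueError("commit interval must be positive")
    # Commits that can happen while the backup is still copying files.
    commits_during_backup = math.ceil(
        backup_seconds * safety_factor / commit_interval_seconds
    )
    # Plus one for the commit point the backup itself is reading.
    return commits_during_backup + 1


if __name__ == "__main__":
    # Commit every 20secs, backup takes 10mins.
    print(min_commits_to_keep(10 * 60, 20))  # 31
```

Passing a safety_factor above 1.0 pads the estimate for the case where
the backup runs slower than expected.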

An inefficiency of this approach that needs to be considered is that the
higher the 'maxCommitsToKeep' number is, the more files you're going to have
lounging around in your index data folder - the majority of which never get
used. The collective size of these commit point files can be significant.
If you have a high mergeFactor, the number of files will increase as well.
You can set 'maxCommitAge' to delete old commit points after a certain time
- as long as it's not shorter than the 'worst-case' backup time.

I set my 'maxCommitsToKeep' to 2400, and the file not found errors
disappeared (note that 2400 is a hugely conservative number to cater for a
backup taking 24hrs). My mergeFactor is 25, so I get a high number of files
in the index folder. They are generally small, but together they can require
significant extra storage.

If you're willing to trade off some (ok, potentially a lot of) extraneous
disk usage to keep commit points around waiting for a backup command, this
approach addresses the problem.

Workaround 2:
A preferable method (IMHO), if you have an extra box, is to set up a
read-only replica and back up from the replica. You can then tune the slave
to suit your needs.

Coding:
I'm not very familiar with the repl/backup code, but a coded way to address
this might be to save a commit point's index version files when a backup
command is received, then release them for deletion when complete.
Perhaps someone with good knowledge of this part of Solr could comment more
succinctly.


Thanks,
Peter

Re: Getting FileNotFoundException with repl command=backup?

Posted by Alexander Rothenberg <a....@fotofinder.net>.
Thanks for the info Peter, I think I ran into the same issue some time ago
and could not find out why the backup stopped and also got deleted by Solr.

I decided to stop any running updates to Solr while a backup is running and
wrote my own backup handler that simply copies the index files to some
location and rotates out older, unneeded backups.

I thought about a cleaner solution where the backup handler would take a
LOCK on the index to prevent incoming updates from being written to the
index (the same happens while an index optimize is running). With the LOCK
set, a backup could run without any problems and release the LOCK when
done. I was not able to create a working LOCK that blocks incoming
updates, and never found out how...

-- 
Alexander Rothenberg
Fotofinder GmbH		USt-IdNr. DE812854514
Software Entwicklung	Web: http://www.fotofinder.net/
Potsdamer Str. 96	Tel: +49 30 25792890
10785 Berlin		Fax: +49 30 257928999

Geschäftsführer:	Ali Paczensky
Amtsgericht:		Berlin Charlottenburg (HRB 73099)
Sitz:			Berlin