Posted to solr-user@lucene.apache.org by Qwerky <ne...@hmv.co.uk> on 2010/07/22 18:33:30 UTC

Delta import processing duration

I'm using Solr to index data from our data warehouse. The data is imported
through text files. I've written a custom FileImportDataImportHandler that
extends DataSource and it works fine - I've tested it with 280,000 records
and it manages to build the index in about 3 minutes. My problem is that
doing a delta update seems to take a really long time.

I've written a custom FileUpdateDataImportHandler which takes two files,
one for deletes and one for updates. I've tested with an update file
containing 18,000 records and a delete file containing 30 records. My
custom handler whizzed through them in a few seconds, but the page at
/solr/admin/dataimport.jsp says the command is still running (it's been
running for nearly an hour).

What's taking so long? Could there be some kind of inefficiency in the way
my update handler works?
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Delta-import-processing-duration-tp987562p987562.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Delta import processing duration

Posted by Qwerky <ne...@hmv.co.uk>.
I found my problem! It was a bad custom EntityProcessor I wrote.

There were two bugs. First, my EntityProcessor wasn't checking hasNext() on
the Iterator from my FileImportDataImportHandler; it was just returning
next(). Second, when the Iterator ran out of records it was returning an
empty Map<String,Object> (it now returns null).
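
For anyone hitting the same symptom, here is a minimal sketch of the fix
described above. It is not the actual code from this thread: FileRowReader and
countRows are hypothetical names, and the class is a standalone stand-in rather
than a real Solr EntityProcessor subclass. It assumes the DataImportHandler
contract that nextRow() returns null once there are no more rows, so an empty
map would presumably look like one more row and the import would never finish.

```java
import java.util.Iterator;
import java.util.Map;

/** Hypothetical stand-in for the row-producing part of a custom EntityProcessor. */
class FileRowReader {
    private final Iterator<Map<String, Object>> rows;

    FileRowReader(Iterator<Map<String, Object>> rows) {
        this.rows = rows;
    }

    /** Returns the next row, or null once the underlying iterator is exhausted. */
    Map<String, Object> nextRow() {
        // Check hasNext() before calling next(); calling next() blindly was the first bug.
        if (rows == null || !rows.hasNext()) {
            // Return null, not an empty map: null is the end-of-data signal.
            return null;
        }
        return rows.next();
    }

    /** Illustrative caller loop (an assumption about how rows are consumed). */
    static int countRows(FileRowReader reader) {
        int count = 0;
        // This loop only terminates because nextRow() eventually returns null.
        while (reader.nextRow() != null) {
            count++;
        }
        return count;
    }
}
```

With a caller like countRows, an implementation that kept returning an empty
map at the end of the file would spin indefinitely, which would match a
dataimport status page that reports the command as still running.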
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Delta-import-processing-duration-tp987562p989425.html