Posted to solr-user@lucene.apache.org by Ramo Karahasan <ra...@googlemail.com> on 2012/03/07 11:02:21 UTC

DIH Delta index takes much time

Hi,

I've indexed my 2 million documents with DIH on Solr. It uses a simple
select without joins that fetches the distinct titles along with the
ids, descriptions and urls. The first time I indexed this, it took
about 1 hour. Every 1-2 days I get new entries which I want to index.
I'm doing a delta index as described here:
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
with the command: .dataimport?command=full-import&clean=false
Now I've added 2 more documents to the database and run the command
again. Solr has now been indexing for over an hour. The last time I
indexed was two weeks ago, but in those two weeks nothing else has
changed.

Any ideas how I can speed that up?


Thanks,

Ramo
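
The wiki page linked in the message above relies on a WHERE clause that selects every row unless clean=false is passed, and otherwise only rows modified since the last import. If that clause is missing from the query, full-import&clean=false would re-read all rows each time. A minimal Python sketch of that condition follows, using an in-memory SQLite table as a stand-in. The table name item, the column last_modified and the sample rows are assumptions and are not taken from the poster's setup; in a DIH config the two parameters would correspond to ${dataimporter.request.clean} and ${dataimporter.last_index_time}.

```python
import sqlite3

# Hypothetical schema: the thread does not show the real table or columns.
# The WHERE clause mirrors the "delta via full import" idea: fetch everything
# unless clean=false was requested, otherwise fetch only recently modified rows.
DELTA_VIA_FULL_IMPORT_SQL = """
    SELECT id, title, description, url
    FROM item
    WHERE :clean != 'false'
       OR last_modified > :last_index_time
"""


def build_demo_db():
    """Create an in-memory table with three old rows and two new rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "id INTEGER PRIMARY KEY, title TEXT, description TEXT, "
        "url TEXT, last_modified TEXT)"
    )
    rows = [
        (1, "old a", "desc", "http://example.com/1", "2012-02-20 10:00:00"),
        (2, "old b", "desc", "http://example.com/2", "2012-02-20 10:00:00"),
        (3, "old c", "desc", "http://example.com/3", "2012-02-20 10:00:00"),
        (4, "new a", "desc", "http://example.com/4", "2012-03-07 09:00:00"),
        (5, "new b", "desc", "http://example.com/5", "2012-03-07 09:00:00"),
    ]
    conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def rows_to_index(conn, clean, last_index_time):
    """Return the rows the import query would fetch for the given flags."""
    cur = conn.execute(
        DELTA_VIA_FULL_IMPORT_SQL,
        {"clean": clean, "last_index_time": last_index_time},
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    last_run = "2012-02-22 00:00:00"
    print(len(rows_to_index(db, "false", last_run)), "rows with clean=false")
    print(len(rows_to_index(db, "true", last_run)), "rows with clean=true")
```

With clean=false only the two recently modified rows are selected; with any other value the query returns the whole table, which is the behaviour to rule out when a small delta takes as long as a full import.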


RE: DIH Delta index takes much time

Posted by "Dyer, James" <Ja...@ingrambook.com>.
As a sanity check, you might want to take the query that it is executing for delta updates and run it manually through a SQL tool, or do an explain plan or something.  It almost sounds like there could be a silly error in the query you're using and it's doing a cartesian join or something like that.

You might also want to try to put your delta data in a text file and use the CSV Request Handler to update the data.  Is it still taking a long time?  If so, you've eliminated both your database and DIH as the problems, pointing to possible resource constraints with your index.  (See http://wiki.apache.org/solr/UpdateCSV for detailed instructions on how to do this.)

If the query runs just fine when run manually, AND if the CSV loader test is fast too, then maybe you've stumbled on a new DIH bug nobody has reported before?

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
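
The CSV test suggested above could be scripted roughly as follows. This is only a sketch under assumptions: the base URL, the /update/csv path relative to it, the field names and the sample rows are placeholders, and the content type follows the curl example on the UpdateCSV wiki page. It writes the delta rows as CSV, builds a POST request for the CSV handler, and times the upload.

```python
import csv
import io
import time
import urllib.request


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_csv_update_request(solr_base_url, csv_text, commit=True):
    """Build (but do not send) a POST request for the CSV update handler.

    solr_base_url is an assumption, e.g. "http://localhost:8983/solr".
    """
    url = solr_base_url.rstrip("/") + "/update/csv"
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def timed_post(request, timeout=600):
    """Send the request and return (HTTP status, elapsed seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        status = resp.status
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical delta rows; the real field names come from schema.xml.
    delta = [
        {"id": "2000001", "title": "New doc A"},
        {"id": "2000002", "title": "New doc B"},
    ]
    req = build_csv_update_request(
        "http://localhost:8983/solr", rows_to_csv(delta, ["id", "title"])
    )
    print(timed_post(req))
```

If two rows uploaded this way still take many minutes, the database and DIH are unlikely to be the bottleneck.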




Re: DIH Delta index takes much time

Posted by Ramo Karahasan <ra...@googlemail.com>.
Hi,

thank you for the help. I've tried:

dataimport?command=full-import&clean=false&optimize=false

and this takes only 19 minutes. The first run with optimize=true took
about 3 hours. The Tomcat logs don't show any errors.

And 19 minutes is still too long, isn't it?

Thanks,
Ramo

-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com]
Sent: Wednesday, March 7, 2012 12:41
To: solr-user@lucene.apache.org
Subject: Re: DIH Delta index takes much time

> I've indexed my 2 million documents with DIH on Solr. It uses a simple
> select without joins that fetches the distinct titles along with the
> ids, descriptions and urls. The first time I indexed this, it took
> about 1 hour. Every 1-2 days I get new entries which I want to index.
> I'm doing a delta index as described here:
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
> with the command: .dataimport?command=full-import&clean=false
> Now I've added 2 more documents to the database and run the command
> again. Solr has now been indexing for over an hour. The last time I
> indexed was two weeks ago, but in those two weeks nothing else has
> changed.

By default, both full-import and delta-import issue an optimize at the end.
What happens if you disable it?

.dataimport?command=full-import&clean=false&optimize=false
.dataimport?command=delta-import&optimize=false



Re: DIH Delta index takes much time

Posted by Ahmet Arslan <io...@yahoo.com>.
> I've indexed my 2 million documents with DIH on Solr. It uses a simple
> select without joins that fetches the distinct titles along with the
> ids, descriptions and urls. The first time I indexed this, it took
> about 1 hour. Every 1-2 days I get new entries which I want to index.
> I'm doing a delta index as described here:
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
> with the command: .dataimport?command=full-import&clean=false
> Now I've added 2 more documents to the database and run the command
> again. Solr has now been indexing for over an hour. The last time I
> indexed was two weeks ago, but in those two weeks nothing else has
> changed.

By default, both full-import and delta-import issue an optimize at the end. What happens if you disable it?

.dataimport?command=full-import&clean=false&optimize=false
.dataimport?command=delta-import&optimize=false
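
For scripted runs, the two commands above could be assembled and triggered from Python along these lines. This is a sketch, not a definitive client: the base URL and the /dataimport handler path are assumptions about the local setup, and only the command, clean and optimize parameters mentioned in this thread are used.

```python
import urllib.parse
import urllib.request


def build_dih_url(solr_base_url, command, clean=None, optimize=False):
    """Build a DataImportHandler URL with optimize disabled by default.

    solr_base_url is an assumption, e.g. "http://localhost:8983/solr".
    clean is only added when given, since it was not part of the
    delta-import command shown in the thread.
    """
    params = {"command": command}
    if clean is not None:
        params["clean"] = "true" if clean else "false"
    params["optimize"] = "true" if optimize else "false"
    query = urllib.parse.urlencode(params)
    return solr_base_url.rstrip("/") + "/dataimport?" + query


def trigger_import(url, timeout=30):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # placeholder
    print(build_dih_url(base, "full-import", clean=False))
    print(build_dih_url(base, "delta-import"))
```

The import itself normally runs in the background after the request returns, so the elapsed time has to be read from the handler's status output (command=status) rather than from the duration of the HTTP call.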