Posted to solr-user@lucene.apache.org by Rohit Gupta <ro...@in-rev.com> on 2011/06/04 15:23:20 UTC

URGENT HELP: Improving Solr indexing time

My Solr server takes a very long time to update its index. The table it
indexes is huge, with 10 million+ records, but even so I feel this is a very
long time to index. Below is a snapshot of the /dataimport page:

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time Elapsed">1:53:39.664</str>
<str name="Total Requests made to DataSource">16276</str>
<str name="Total Rows Fetched">24237</str>
<str name="Total Documents Processed">16273</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-06-04 11:25:26</str>
</lst>

How can I determine why this is happening, and how can I improve it? During all
our tests on the local server before the migration we could index 5 million
records in 4-5 hrs, but now it's taking too long on the live server.

Regards,
Rohit
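
One way to start on the "why" is to turn the status snapshot above into rates.
The sketch below is only an illustration: it hard-codes the figures from the
snapshot and from the local test described in the post, and the helper names
are made up for this example.

```python
# Figures copied from the /dataimport status snapshot in the post.
ELAPSED = "1:53:39.664"
REQUESTS = 16276
ROWS_FETCHED = 24237
DOCS_PROCESSED = 16273


def elapsed_seconds(hms: str) -> float:
    """Convert an H:MM:SS.mmm string into seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def docs_per_second(docs: int, seconds: float) -> float:
    """Average indexing rate over the given interval."""
    return docs / seconds


live_rate = docs_per_second(DOCS_PROCESSED, elapsed_seconds(ELAPSED))

# Local test reported in the post: 5 million records in 4-5 hours.
local_rate_slow = docs_per_second(5_000_000, 5 * 3600)
local_rate_fast = docs_per_second(5_000_000, 4 * 3600)

# How many data source requests were made per processed document.
requests_per_doc = REQUESTS / DOCS_PROCESSED

print(f"live server: {live_rate:.2f} docs/s")
print(f"local test:  {local_rate_slow:.0f}-{local_rate_fast:.0f} docs/s")
print(f"data source requests per document: {requests_per_doc:.2f}")
```

On these figures the live server is running at roughly 2.4 documents per
second, against roughly 280-350 per second in the local test. The request count
is almost exactly one per processed document, which may mean the import issues
a separate query for each document; that is an inference from the numbers, not
something the snapshot states.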

Re: URGENT HELP: Improving Solr indexing time

Posted by Alexey Serba <as...@gmail.com>.
> <str name="Total Requests made to DataSource">16276</str>
...
> so I am doing a delta import of around 500,000 rows at a
> time.

http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
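
The linked page describes running a delta as an ordinary full-import with
clean=false, so the existing index is kept while the entity query itself
filters on the last index time (as far as I recall, through the
${dataimporter.request.clean} and ${dataimporter.last_index_time} variables).
A minimal sketch of building that request follows; the host, port and handler
path are assumptions, not values from this thread.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; replace with the real host, port and core path.
SOLR_DATAIMPORT_URL = "http://localhost:8983/solr/dataimport"


def full_import_url(base_url: str, clean: bool) -> str:
    """Build a DataImportHandler full-import URL.

    clean=false asks the handler not to wipe the existing index first, which
    is what lets a full-import query behave like a delta import.
    """
    params = {"command": "full-import", "clean": "true" if clean else "false"}
    return f"{base_url}?{urlencode(params)}"


delta_style_url = full_import_url(SOLR_DATAIMPORT_URL, clean=False)
print(delta_style_url)
```

The point of this approach is that the changed rows come back from a single
query, rather than from one follow-up query per changed row.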

Re: URGENT HELP: Improving Solr indexing time

Posted by Rohit Gupta <ro...@in-rev.com>.
Thanks Fuad,

I have started optimizing my database structure. Since the tables are huge
in terms of records, the optimization is taking time.

I will post the results when it is complete.

Regards,
Rohit




Re: URGENT HELP: Improving Solr indexing time

Posted by Fuad Efendi <fu...@efendi.ca>.
Hi Rohit,

I am currently working on https://issues.apache.org/jira/browse/SOLR-2233
which fixes multithreading issues.

How complex is your dataimport schema? SOLR-2233 (multithreading, better
connection handling) improves performance, especially if the SQL is
extremely complex and uses a few long-running CachedSqlEntityProcessors,
etc.

Also, check your SQL and indexes; in most cases you can _significantly_
improve performance by simply adding indexes appropriate for your specific
SQL. I have noticed that even very experienced DBAs sometimes create an index
on <KEY1, KEY2> while the developer executes a query with
"WHERE KEY2=? ORDER BY KEY1", which that index generally cannot serve
efficiently because KEY2 is not its leading column. Check everything...

Thanks,


-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca <http://www.tokenizer.ca/>
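
The index-order point can be checked by asking the database for its query
plan. The sketch below uses SQLite purely as a stand-in, since the thread does
not say which database backs the import; the table, column and index names are
hypothetical, and other databases may plan the same query differently.

```python
import sqlite3

# The query shape from the message: filter on key2, order by key1.
QUERY = "SELECT key1, key2, payload FROM records WHERE key2 = ? ORDER BY key1"


def query_plan(index_columns: tuple) -> str:
    """Return SQLite's plan for QUERY when only the given composite index exists."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE records (key1 INTEGER, key2 INTEGER, payload TEXT)"
        )
        conn.execute(
            f"CREATE INDEX idx_demo ON records ({', '.join(index_columns)})"
        )
        rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
        # The last column of each plan row is the human-readable description.
        return "; ".join(row[-1] for row in rows)
    finally:
        conn.close()


plan_key1_first = query_plan(("key1", "key2"))
plan_key2_first = query_plan(("key2", "key1"))

print("index (key1, key2):", plan_key1_first)
print("index (key2, key1):", plan_key2_first)
```

With key2 as the leading column the plan should report an index SEARCH on
key2; with key1 leading, SQLite falls back to a SCAN of the whole table or
index.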










Re: URGENT HELP: Improving Solr indexing time

Posted by Rohit Gupta <ro...@in-rev.com>.
No, I didn't double post; maybe it was in my outbox and went out again.

The queries outside Solr don't take that long: returning around 500,000 rows
takes 250 seconds, so I am doing a delta import of around 500,000 rows at a
time. I have tried turning auto commit on, and things are moving a bit faster
now. Is there any more tweaking I can do?

Also, I am planning to move to a master-slave model, but I am failing to
understand where exactly to start.

Regards,
Rohit




Re: URGENT HELP: Improving Solr indexing time

Posted by lee carroll <le...@googlemail.com>.
Rohit - you may have double posted - did Otis's answer not help with
your issue, or at least warrant a response to clarify?


Re: URGENT HELP: Improving Solr indexing time

Posted by Chris Cowan <Ch...@plus3network.com>.
How long does the query against the DB take (outside of Solr)? If that's slow, then it's going to take a while to update the index. You might need to figure out a way to break things up a bit, maybe use a delta import instead of a full import.

Chris
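
A rough way to measure the query on its own is to run it from a small script
and time how long it takes to fetch every row. The sketch below is an
assumption-laden stand-in: it uses an in-memory SQLite table with made-up data
rather than the real database, and time_query is a hypothetical helper name.

```python
import sqlite3
import time


def time_query(conn, sql: str, batch_size: int = 10_000):
    """Run sql, fetch every row in batches, and return (row_count, seconds)."""
    start = time.perf_counter()
    cursor = conn.execute(sql)
    row_count = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        row_count += len(batch)
    return row_count, time.perf_counter() - start


# Stand-in data: an in-memory table with 50,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO records (body) VALUES (?)",
    (("row %d" % i,) for i in range(50_000)),
)

rows, seconds = time_query(conn, "SELECT id, body FROM records")
rate = rows / max(seconds, 1e-9)
print(f"fetched {rows} rows in {seconds:.3f}s ({rate:.0f} rows/s)")
conn.close()
```

Running the same measurement against the real database, with the exact query
from the import configuration, shows how much of the import time is spent in
the database rather than in Solr.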
