Posted to solr-user@lucene.apache.org by Roshan Kamble <Ro...@smartstreamrdu.com> on 2016/06/25 07:19:44 UTC

SolrCloud persisting data is very slow

Hello,

I am using Solr 6.0.0 in cloud mode (3 physical nodes + one ZooKeeper) and have heavy insert/update/delete operations.

I am using CloudSolrClient and have tried batch sizes from 100 to 1000.

But I have observed that persisting at the Solr nodes is very slow. It takes around 20 seconds to store 50-100 records.

Does anyone know how to improve the speed for these operations?
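(For reference, the batching pattern described here usually looks something like the sketch below. It is only an illustration: the send step sits behind a made-up BatchSender interface standing in for CloudSolrClient.add(...), so the flush logic can be shown on its own.)

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a batched indexing loop. In real SolrJ code the sender would
// wrap CloudSolrClient.add(List<SolrInputDocument>); here it is a plain
// interface so the batching logic stands alone.
public class BatchedIndexer {
    interface BatchSender {
        void send(List<String> batch);
    }

    // Accumulate records and flush every batchSize docs, plus a final
    // partial flush. Returns the number of batches sent.
    static int index(List<String> records, int batchSize, BatchSender sender) {
        List<String> buffer = new ArrayList<>();
        int batches = 0;
        for (String record : records) {
            buffer.add(record);
            if (buffer.size() >= batchSize) {
                sender.send(new ArrayList<>(buffer));
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            sender.send(new ArrayList<>(buffer));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 250; i++) records.add("doc" + i);
        int batches = index(records, 100, batch ->
                System.out.println("sending " + batch.size() + " docs"));
        System.out.println("batches=" + batches);
    }
}
```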

Regards,
Roshan

Re: SolrCloud persisting data is very slow

Posted by Erick Erickson <er...@gmail.com>.
One thing to add to Shawn's comments: how long does
it take to _acquire_ the data? I've often seen, say,
pulling the data from an RDBMS be the bottleneck
rather than Solr per se. If you're using SolrJ (with
CloudSolrClient), try commenting out the line where you
add the docs to Solr (i.e. something like client.add(doclist)).

The other quick test is to see if Solr is very busy when
indexing. My bet is that it's just idling along and the
hangup is getting the docs assembled to send.
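(Erick's experiment can be sketched as below: run the loop twice, once with the send step disabled, and compare wall-clock times. The acquire/send names are illustrative; in real code the send would be client.add(docs) on a CloudSolrClient.)

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Sketch of the "comment out client.add()" test: time the full loop,
// then time it again with the send step replaced by a no-op. If the two
// times are close, document acquisition (not Solr) is the bottleneck.
public class AcquisitionTimer {
    // Runs the acquire step count times, optionally sending each result,
    // and returns elapsed milliseconds.
    static long run(Supplier<List<String>> acquire, Consumer<List<String>> send,
                    int count, boolean doSend) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            List<String> docs = acquire.get();
            if (doSend) send.accept(docs);  // real code: client.add(docs)
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Supplier<List<String>> acquire = () -> List.of("doc");
        Consumer<List<String>> send = docs -> { /* client.add(docs) */ };
        long withSend = run(acquire, send, 1000, true);
        long withoutSend = run(acquire, send, 1000, false);
        System.out.println("with send: " + withSend
                + " ms, without: " + withoutSend + " ms");
    }
}
```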

Best,
Erick

On Sat, Jun 25, 2016 at 8:13 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 6/25/2016 1:19 AM, Roshan Kamble wrote:
>> I am using Solr 6.0.0 in cloud mode (3 physical nodes + one ZooKeeper)
>> and have heavy insert/update/delete operations. I am using
>> CloudSolrClient and have tried batch sizes from 100 to 1000. But I
>> have observed that persisting at the Solr nodes is very slow. It takes
>> around 20 seconds to store 50-100 records. Does anyone know how to
>> improve the speed for these operations?

Re: SolrCloud persisting data is very slow

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/25/2016 1:19 AM, Roshan Kamble wrote:
> I am using Solr 6.0.0 in cloud mode (3 physical nodes + one ZooKeeper)
> and have heavy insert/update/delete operations. I am using
> CloudSolrClient and have tried batch sizes from 100 to 1000. But I
> have observed that persisting at the Solr nodes is very slow. It takes
> around 20 seconds to store 50-100 records. Does anyone know how to
> improve the speed for these operations?

Is that 20 seconds the *index* time or the *commit* time?  If it's the
commit time, then see the "slow commits" section of the link that I
provided below.  You can see how long the last commit took by looking at
the statistics in the admin UI for the searcher object.

If it's the index time, how much data is in those records?  What does
the analysis in your schema do to that data?

If you have no idea which process is taking the time, then you should
decouple indexing from committing, so you can time both separately.
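(Decoupling the two measurements can be as simple as wrapping each call in its own timer, as in the sketch below. The Client interface here is a stand-in; with SolrJ the two calls would be CloudSolrClient.add(...) and CloudSolrClient.commit().)

```java
// Sketch of timing indexing and committing separately. The returned
// long[] holds elapsed milliseconds for add and commit respectively,
// so the slow phase is obvious at a glance.
public class AddCommitTimer {
    interface Client {
        void add(java.util.List<String> docs);
        void commit();
    }

    static long[] timeBoth(Client client, java.util.List<String> docs) {
        long t0 = System.nanoTime();
        client.add(docs);                       // real code: client.add(batch)
        long addMs = (System.nanoTime() - t0) / 1_000_000;

        long t1 = System.nanoTime();
        client.commit();                        // real code: client.commit()
        long commitMs = (System.nanoTime() - t1) / 1_000_000;
        return new long[] { addMs, commitMs };
    }

    public static void main(String[] args) {
        Client noop = new Client() {
            public void add(java.util.List<String> docs) { }
            public void commit() { }
        };
        long[] times = timeBoth(noop, java.util.List.of("doc1", "doc2"));
        System.out.println("add: " + times[0] + " ms, commit: " + times[1] + " ms");
    }
}
```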

Very slow indexing usually has one or more of these causes:

1) The data is very large and is heavily analyzed.
2) It is only being sent to Solr by a single thread.
3) Your Solr machine does not have enough memory for effective operation.

That last item is a somewhat complex topic.  It is one of the things
discussed here:

https://wiki.apache.org/solr/SolrPerformanceProblems

There could be other problems, but these are the most common.  The
solutions for these issues are, in the same order:

1a) Reduce the amount of data per record.
1b) Change the schema so analysis is not as heavy.
1c) Handle rich document processing in your indexing program, not Solr.
2) Use multiple threads/processes in your indexing program.
3) Add memory to the server, and sometimes increase the max heap size.
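(Point 2 above is often the easiest fix. A minimal multi-threaded sketch, with the Solr call again stubbed out: several workers drain batches from a shared queue. SolrJ client instances are thread-safe, so in real code the workers could share one CloudSolrClient.)

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of multi-threaded indexing: a shared queue of batches drained by
// a fixed pool of workers. Returns the total number of docs "sent".
public class ParallelIndexer {
    static int indexAll(List<List<String>> batches, int threads) {
        BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>(batches);
        AtomicInteger sent = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                List<String> batch;
                while ((batch = queue.poll()) != null) {
                    // real code: client.add(batch);
                    sent.addAndGet(batch.size());
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
                List.of("a", "b"), List.of("c", "d"), List.of("e"));
        System.out.println("indexed " + indexAll(batches, 4) + " docs");
    }
}
```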

Thanks,
Shawn