Posted to solr-user@lucene.apache.org by gigo314 <gi...@gmail.com> on 2017/06/01 11:16:30 UTC

Configuration of parallel indexing threads

During performance testing, a question was raised whether Solr indexing
performance could be improved by adding more concurrent index writer
threads. I discovered traces of such functionality here
<https://issues.apache.org/jira/browse/SOLR-3929>, but I am not sure how to
use it in Solr 6.2. Hopefully there is a setting in the Solr configuration
file, but I cannot find it.



--
View this message in context: http://lucene.472066.n3.nabble.com/Configuration-of-parallel-indexing-threads-tp4338466.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Configuration of parallel indexing threads

Posted by Susheel Kumar <su...@gmail.com>.
How are you indexing currently? Are you using DIH or SolrJ/Java? And
are you indexing with multiple threads/machines simultaneously, or with
just one thread/machine?

Thanks
Susheel

On Thu, Jun 1, 2017 at 11:45 AM, Erick Erickson <er...@gmail.com>
wrote:

> That's been removed in LUCENE-6659. I regularly max out my CPUs by
> having multiple _clients_ send updates simultaneously rather than
> trying to up the number of threads the indexing process takes.
>
> But Mike McCandless can answer authoritatively...
>
> Best,
> Erick

Re: Configuration of parallel indexing threads

Posted by gigo314 <gi...@gmail.com>.
Thanks a lot!




Re: Configuration of parallel indexing threads

Posted by Erick Erickson <er...@gmail.com>.
That's pretty much my strategy.

I'll add parenthetically that I often see the bottleneck for indexing
to be acquiring the data from the system of record in the first place
rather than Solr. Assuming you're using SolrJ, an easy test is to
comment out the line that sends to Solr. There's usually some kind of
loop like:

while (more docs) {
    gather 1,000 docs into docList
    cloudSolrClient.add(docList);
    docList.clear();
}

So just comment out the cloudSolrClient.add line. I've seen situations
where the program then still takes 95% of the time it took when actually
indexing to Solr, in which case you need to focus on acquiring the data
faster in the first place.
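
The test described above could be sketched in Java roughly as follows.
This is a minimal sketch under assumptions, not the code from the post:
FetchVsSendTimer, run, source and sink are hypothetical names, and the sink
callback stands in for the cloudSolrClient.add(docList) call so the example
runs without a Solr cluster. Running the loop once with sending enabled and
once with it disabled shows how much of the elapsed time goes to acquiring
the data alone.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class FetchVsSendTimer {
    /**
     * Drains source in batches of batchSize and returns the elapsed nanoseconds.
     * When send is false the sink is never called, which mirrors commenting
     * out the cloudSolrClient.add(docList) line.
     */
    static <T> long run(Iterator<T> source, int batchSize, boolean send,
                        Consumer<List<T>> sink) {
        long start = System.nanoTime();
        List<T> docList = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            docList.add(source.next());
            // Flush when the batch is full or the source is exhausted.
            if (docList.size() == batchSize || !source.hasNext()) {
                if (send) {
                    // Real code would call cloudSolrClient.add(docList) here.
                    sink.accept(new ArrayList<>(docList));
                }
                docList.clear();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add("doc-" + i);
        }
        // Hypothetical stand-in for the Solr call; it does nothing.
        Consumer<List<String>> fakeSolr = batch -> { };
        long withSend = run(docs.iterator(), 1_000, true, fakeSolr);
        long fetchOnly = run(docs.iterator(), 1_000, false, fakeSolr);
        System.out.printf("with send: %d ms, fetch only: %d ms%n",
                withSend / 1_000_000, fetchOnly / 1_000_000);
    }
}
```

If the fetch-only run takes nearly as long as the full run, the time is
going into reading from the system of record rather than into Solr.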

And you need to batch updates, see:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
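
As a rough illustration of what batching means on the client side, the
sketch below splits a document list into fixed-size batches so that one
request is made per batch instead of one per document. BatchingSketch and
partition are hypothetical names chosen for this example; the batch size of
1,000 is taken from the loop above, not from a measured optimum.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {
    /** Splits docs into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(from + batchSize, docs.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2_500; i++) {
            docs.add(i);
        }
        // Each batch would be passed to a single add call in real code.
        List<List<Integer>> batches = partition(docs, 1_000);
        System.out.println(docs.size() + " docs -> " + batches.size() + " requests");
    }
}
```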

Good Luck!
Erick

On Fri, Jun 2, 2017 at 2:59 AM, gigo314 <gi...@gmail.com> wrote:
> Thanks for the replies. Just to confirm that I got it right:
> 1. Since there is no setting to control index writers, is it fair to assume
> that Solr always indexes at the maximum possible speed?
> 2. The way to control write speed is to control the number of clients that
> are simultaneously posting data, right?

Re: Configuration of parallel indexing threads

Posted by gigo314 <gi...@gmail.com>.
Thanks for the replies. Just to confirm that I got it right:
1. Since there is no setting to control index writers, is it fair to assume
that Solr always indexes at the maximum possible speed?
2. The way to control write speed is to control the number of clients that
are simultaneously posting data, right?




Re: Configuration of parallel indexing threads

Posted by Erick Erickson <er...@gmail.com>.
That's been removed in LUCENE-6659. I regularly max out my CPUs by
having multiple _clients_ send updates simultaneously rather than
trying to up the number of threads the indexing process takes.

But Mike McCandless can answer authoritatively...

Best,
Erick
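
One way to approximate several clients sending updates simultaneously is a
fixed pool of sender threads inside a single Java program. The sketch below
is an assumption-laden illustration, not a method given in this thread:
ParallelClientsSketch, sendAll and sender are hypothetical names, and the
sender callback stands in for a call such as cloudSolrClient.add(batch),
which is assumed here to be safe to call from several threads. Separate
processes or machines would serve the same purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ParallelClientsSketch {
    /**
     * Passes every batch to sender using clientCount worker threads and
     * returns the number of batches sent. The sender must be thread-safe.
     */
    static <T> int sendAll(List<List<T>> batches, int clientCount,
                           Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(clientCount);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : batches) {
                pending.add(pool.submit(() -> sender.accept(batch)));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait, and surface any failure from a worker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int b = 0; b < 8; b++) {
            List<Integer> batch = new ArrayList<>();
            for (int i = 0; i < 1_000; i++) {
                batch.add(b * 1_000 + i);
            }
            batches.add(batch);
        }
        AtomicInteger docsSent = new AtomicInteger();
        // Hypothetical stand-in for the Solr call; it only counts documents.
        int sent = sendAll(batches, 4, batch -> docsSent.addAndGet(batch.size()));
        System.out.println(sent + " batches, " + docsSent.get() + " docs");
    }
}
```

Raising or lowering clientCount is then the knob for how hard the indexing
side is driven, which matches the idea of controlling write speed through
the number of simultaneous senders.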

On Thu, Jun 1, 2017 at 4:16 AM, gigo314 <gi...@gmail.com> wrote:
> During performance testing, a question was raised whether Solr indexing
> performance could be improved by adding more concurrent index writer
> threads. I discovered traces of such functionality here
> <https://issues.apache.org/jira/browse/SOLR-3929>, but I am not sure how to
> use it in Solr 6.2. Hopefully there is a setting in the Solr configuration
> file, but I cannot find it.