Posted to solr-user@lucene.apache.org by saiks <ka...@gmail.com> on 2017/09/21 04:19:10 UTC

Seeing very low ingestion performance for a single non-cloud Solr core

Hi,

Environment:
- Solr is running in non-cloud mode on 6.4.2, Sun Java8, Linux
4.4.0-31-generic x86_64
- Ingesting into a single core
- SoftCommit = 5 seconds, HardCommit = 10 seconds
- System has 16 CPUs and 32 GB of memory (Solr is given 20 GB of JVM heap)
- text = StandardTokenizer, id = solr.StrField/docValues, hostname =
solr.StrField/docValues, app = solr.StrField/docValues, epoch =
solr.TrieLongField/docValues

I am using JMeter to ingest into the Solr core through the
UpdateRequestHandler ("/update/json"), sending a batch of 1000 messages (the
same message repeated) in a single JSON array per request.

Sample message
[{"text":"May 11 10:18:22 scrooge Web-Requests: May 11 10:18:22
@IunAIir1----7k-- EVENT_WR-Y-attack-600 SG_child[823]: [event.error]
Possible attack - 5 blocked requests within 120 seconds",
 "id":"id1",
 "hostname": "xxxxxxxxxx.com",
 "app": "yyyy",
 "epoch": 1483667347941
 },
....]

JMeter is configured to run 10 threads in parallel, each repeating the request
1000 times, which should ingest 10,000,000 messages in total
(10 threads x 1000 requests x 1000 messages per request).
JMeter POST URL:
"/solr/mycore/update/json?overwrite=false&wt=json&commit=false"
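A rough sketch of the request body described above, in Python rather than the actual JMeter test plan. The helper names (build_batch, total_documents) are made up for illustration, and numbering the ids is an assumption: the post only shows "id1".

```python
import json

# Message text copied from the sample in the post.
SAMPLE_TEXT = (
    "May 11 10:18:22 scrooge Web-Requests: May 11 10:18:22 "
    "@IunAIir1----7k-- EVENT_WR-Y-attack-600 SG_child[823]: [event.error] "
    "Possible attack - 5 blocked requests within 120 seconds"
)


def build_batch(batch_size=1000, text=SAMPLE_TEXT):
    """Return one request body: a JSON array holding batch_size documents."""
    docs = [
        {
            "text": text,
            "id": f"id{i}",  # assumption: numbered ids; the post shows only "id1"
            "hostname": "xxxxxxxxxx.com",
            "app": "yyyy",
            "epoch": 1483667347941,
        }
        for i in range(1, batch_size + 1)
    ]
    return json.dumps(docs)


def total_documents(threads, loops_per_thread, batch_size):
    """Documents sent by the whole test: threads x loops x batch size."""
    return threads * loops_per_thread * batch_size


if __name__ == "__main__":
    body = build_batch()
    print(len(json.loads(body)), "documents per request")
    print(total_documents(10, 1000, 1000), "documents in total")
```

Each such body would be POSTed to the URL above with a JSON content type.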

JMeter summary:
summary =   5000 in 00:03:07 =   26.7/s Avg:   370 Min:    27 Max:  1734 Err:     0 (0.00%)
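Note that the summary counts HTTP requests, not documents. A small sketch of the conversion to documents per second, assuming every request carried the full 1000-document batch (the function names are illustrative only):

```python
def elapsed_seconds(hms):
    """Convert a JMeter-style HH:MM:SS elapsed time into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def documents_per_second(requests, elapsed_hms, docs_per_request):
    """Document throughput implied by a request count and an elapsed time."""
    return requests * docs_per_request / elapsed_seconds(elapsed_hms)


if __name__ == "__main__":
    # 5000 requests in 00:03:07, 1000 documents per request
    rate = documents_per_second(5000, "00:03:07", 1000)
    print(f"{rate:.0f} documents per second")
```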

I am only able to ingest about 26,000 messages per second (26.7 requests/s
x 1000 messages per request). Looking at system resources, only one or two
CPUs are at 25-30% and the rest are sitting idle. The Solr heap is flat at
3 GB and there is no iowait on the devices.
Increasing parallelism in JMeter to 20 threads did not increase the number of
messages ingested per second, but it doubled the latency of each request.
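One way to read these numbers is Little's law for a closed-loop load test: throughput is roughly threads divided by average latency. The sketch below assumes the JMeter threads send requests back to back with no think time, which the post does not state. With 10 threads and a 370 ms average, that gives about 27 requests/s, close to the 26.7/s in the summary, and doubling both threads and latency leaves the figure unchanged. That would be consistent with extra threads simply waiting on Solr, though it does not by itself show where the time goes.

```python
def closed_loop_throughput(threads, avg_latency_ms):
    """Requests per second for a closed loop: threads / average latency.

    Assumes no think time between requests (an assumption, not from the post).
    """
    return threads / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    print(f"10 threads, 370 ms: {closed_loop_throughput(10, 370):.1f} req/s")
    # Hypothetical 20-thread run with doubled latency, as described in the post
    print(f"20 threads, 740 ms: {closed_loop_throughput(20, 740):.1f} req/s")
```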

I don't understand why Solr is not able to use all the CPUs on the host when I
increase JMeter parallelism from 10 -> 20 -> 40. What can I do to improve
ingestion performance and make Solr use the system resources fully?

Please help.

Thank you






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Seeing very low ingestion performance for a single non-cloud Solr core

Posted by saiks <ka...@gmail.com>.
Hi All,

Thanks for the response.

Increasing the hard/soft commit intervals did not help.
But changing the "text" field in the ingestion input from the same repeated
message to random messages of similar length improved performance by about 60%.

I'm now able to ingest 40k - 45k messages per second; earlier I reached 26k.
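For anyone repeating the test, here is a minimal sketch of generating random "text" values of a fixed length. This is not the generator actually used for the numbers above; the function name, alphabet and word lengths are assumptions chosen for illustration.

```python
import random
import string


def random_message(length, rng):
    """Return a string of exactly `length` characters made of random words."""
    alphabet = string.ascii_lowercase + string.digits
    parts = []
    size = 0
    # Keep adding words until the space-joined text is at least `length` long.
    while size <= length:
        word = "".join(rng.choice(alphabet) for _ in range(rng.randint(3, 10)))
        parts.append(word)
        size += len(word) + 1  # word plus one separating space
    return " ".join(parts)[:length]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so runs are repeatable
    print(random_message(150, rng))
```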

Thanks a lot.






Re: Seeing very low ingestion performance for a single non-cloud Solr core

Posted by Walter Underwood <wu...@wunderwood.org>.
5 seconds and 10 seconds is very short for auto commit.

20 Gb is probably too much heap.

Sending the exact same message for every update will create a few very long posting lists. Not sure if that is slow, but it is not realistic.

Finally, 26,000 per second is not that slow. That is over 1.5 million/minute. We are indexing bigger documents, but seeing 1 million/minute to a cluster with four shards.
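The arithmetic behind that comparison, as a small sketch. Splitting the cluster figure evenly across the four shards is an assumption made only for illustration, and the documents differ in size, so the two rates are not directly comparable.

```python
def per_minute(per_second):
    """Convert a per-second rate into a per-minute rate."""
    return per_second * 60


def per_shard_per_second(cluster_per_minute, shards):
    """Per-shard rate, assuming an even split across shards (an assumption)."""
    return cluster_per_minute / shards / 60


if __name__ == "__main__":
    print(per_minute(26_000), "documents per minute on the single core")
    rate = per_shard_per_second(1_000_000, 4)
    print(f"{rate:.0f} documents per second per shard in the cluster")
```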

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 21, 2017, at 1:18 AM, Emir Arnautović <em...@sematext.com> wrote:
> 
> Hi,
> What are your commit configs? Maybe you are committing too frequently. 
> 
> Thanks,
> Emir
> 


Re: Seeing very low ingestion performance for a single non-cloud Solr core

Posted by Emir Arnautović <em...@sematext.com>.
Hi,
What are your commit configs? Maybe you are committing too frequently. 

Thanks,
Emir
