Posted to solr-user@lucene.apache.org by Srikanth S <sr...@gmail.com> on 2012/08/25 09:04:05 UTC

SolrCloud admin UI core/stats showing commit count even without no explicit commit

Hi,

I am doing a small test for my company to see if SolrCloud is suitable for
our indexing needs. The setup is as follows:

   - Solr version 4.0 BETA1
   - Three physical machines hosting solr servers
   - Distributed ZooKeeper setup on the same three machines
   - 2 solr cores on each server: total 6 cores
   - 3 shards (and hence 2 cores per shard: a leader plus 1 replica)
   - Machine M1 is leader of (shard 1, replica 1) and hosts (shard 3,
   replica 2)
   - M2 is leader of (shard 2, replica 1) and hosts (shard 1, replica 2)
   - M3 is leader of (shard 3, replica 1) and hosts (shard 2, replica 2)
   - Using sun java 1.6.0_27 with -server and Xms=3G and Xmx=6G
   - Two separate machines M4, M5 run separate 'create' client and 'search'
   client respectively

Config:

   - schema.xml: copied from bundled 'example/solr/collection1', removed
   all 'field' and 'copyFields' entries it came with, and added ~15 fields of
   my own (mostly strings and a few integers, all indexed, all stored, four of
   them multivalued)
   - solrconfig.xml: copied from bundled 'example/solr/collection1', set
   autocommit duration to 10mins with openSearcher=false and set
   autoSoftCommit to 3secs.

The documents being indexed are fairly small, with around 10-15
attributes, most of them short strings (like person names, street
names etc.).

I've been indexing data (with no searches in between) using a 50-threaded
'create' client for the last 17 hours, at the end of which I have
~400 million such documents indexed. For most of this time (going by
the logs), I was able to index at around 6000-7000 documents per second (to
give you some idea of the machine specs/network etc.), with each
solrServer.add() request returning in under 10 ms. And yes, I
am using SolrJ with CloudSolrServer.

Questions:
1. When I connect to the admin console of one of the servers, under the
core's 'Plugin/Stats' page and under 'UPDATEHANDLER' I see:
commits:23028
autocommit maxTime:600000ms
autocommits:115
soft autocommit maxTime:3000ms
soft autocommits:22912
optimizes:0

Two things interest me here:
a. There are very few autocommits, yet there have been a large number of
commits. However, I am not calling any explicit commit anywhere in the
client code. *Am I missing something here?* Does the SolrJ client
automatically commit after each add()? This is what is bothering me the
most, especially in light of lower-than-expected search performance (as
outlined in question 2).
b. I see that 'optimizes' is 0 here, whereas the core's main page shows a
tick mark against 'Optimized'. Which one is right? Further, how much does
'optimize' affect search performance (in light of the next question
I am going to ask)?

2. After reaching the 400 million mark, I've set the 'create' client to
index documents at a rate of ~500 documents/second (using the same
50 threads), and going by the log, that seems to be happening. At the
same time, I've started the 'search' client, which searches for random
documents using 50 threads. Most of these searches return 1 document each,
and rarely 4-5 documents, but not more than that. But I notice that
search is much slower than I expected: only around 40 searches go
through per second, and each search takes around 1000-1400 ms most of the
time. Each search query uses 1, 2, 3 or 4 fields (all ANDed). The question
is, am I messing something up (w.r.t. question 1 above), or does it really
take this much time to search an index of this size?
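
A rough sanity check on these two figures, assuming the search client is a
closed loop in which each of the 50 threads sends its next query only after
the previous one returns (the post does not say so; this is an assumption).
Under that assumption, throughput is bounded by thread count divided by
latency, and the function name below is only illustrative:

```python
# Closed-loop throughput bound: QPS = concurrent requests / latency in seconds.
# ASSUMPTION: every client thread blocks on its response before sending again.

def closed_loop_qps(threads: int, latency_ms: float) -> float:
    """Upper bound on queries/second for a fixed pool of blocking threads."""
    return threads / (latency_ms / 1000.0)

THREADS = 50                      # search client threads, from the post
for latency_ms in (1000, 1400):   # observed latency range, from the post
    print(f"{latency_ms} ms -> {closed_loop_qps(THREADS, latency_ms):.1f} QPS")
# prints:
# 1000 ms -> 50.0 QPS
# 1400 ms -> 35.7 QPS
```

If the assumption holds, the ~40 searches/sec figure follows directly from
the per-query latency, so the latency is the number worth investigating.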


Please do let me know if I need to share any more details. Thanks in
advance.

Thanks
Srikanth S

Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit

Posted by Erick Erickson <er...@gmail.com>.
Been busy the last couple of days, sorry it took so long to get back....

You have basically 2 questions:

About the 80% rate: the blog isn't quite clear. What I meant was, say you
have 20M docs on a server. You push it until you max out the QPS rate; say
that's 100 queries/second. Now configure your load test to put 80 QPS on the
hardware, and keep indexing documents until the QPS rate falls off. That
gives you a good upper bound on the number of docs you can put on your
hardware.
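
The procedure can be sketched as a loop. This is a minimal illustration, not
a load-testing tool: measure_max_qps stands in for a real load test against
the server, and toy_max_qps is a made-up model (100 QPS at 20M docs, losing
3 QPS per extra million) used only so the sketch runs. All names are
hypothetical.

```python
from typing import Callable, Tuple

def max_docs_at_target(measure_max_qps: Callable[[int], float],
                       start_docs: int, step_docs: int, limit_docs: int,
                       fraction: float = 0.8) -> Tuple[int, float]:
    """Grow the index in steps until the server can no longer sustain
    `fraction` of the peak QPS measured at `start_docs`."""
    peak_qps = measure_max_qps(start_docs)   # e.g. 100 QPS at 20M docs
    target_qps = fraction * peak_qps         # e.g. 80 QPS
    docs = start_docs
    while docs + step_docs <= limit_docs:
        if measure_max_qps(docs + step_docs) < target_qps:
            break                            # QPS "falls off" here
        docs += step_docs
    return docs, target_qps

def toy_max_qps(docs: int) -> float:
    # ASSUMPTION: invented linear model, purely for demonstration.
    return 100.0 - 3.0 * (docs - 20_000_000) / 1_000_000

docs, target = max_docs_at_target(toy_max_qps, 20_000_000,
                                  1_000_000, 40_000_000)
print(docs, target)
```

With the toy model the loop stops at 26M documents, the last size at which
the modelled server still sustains the 80% target.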

About racking more replicas. Under the covers, there's internal load
balancing. As you add more machines to your SolrCloud cluster, each one
becomes a replica of one of your shards. So if you have 2 shards, the
fifth machine you add just becomes the 3rd replica of shard 1. The sixth
machine becomes replica 3 of shard 2. The seventh machine becomes
replica 4 of shard 1. And so on.
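
The pattern described here is a simple round-robin. The sketch below only
reproduces that pattern for illustration; it is not Solr's assignment code,
and the function name is hypothetical.

```python
def placement(machine_number: int, num_shards: int) -> tuple:
    """Map the Nth machine to join (1-based) to a 1-based (shard, replica)."""
    index = machine_number - 1
    shard = index % num_shards + 1       # cycle through the shards
    replica = index // num_shards + 1    # one more replica per full cycle
    return shard, replica

for machine in (5, 6, 7):
    shard, replica = placement(machine, num_shards=2)
    print(f"machine {machine} -> replica {replica} of shard {shard}")
# prints:
# machine 5 -> replica 3 of shard 1
# machine 6 -> replica 3 of shard 2
# machine 7 -> replica 4 of shard 1
```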

Now, incoming queries only have to hit one replica of each shard. So your
QPS rate goes up. This is done internally or you can front your entire
cluster with a load balancer, either way.

BTW, another thing to look at is how memory is consumed on your machines.
Solr usually comes under memory pressure first, so if you're seeing a bunch
of swapping etc, you're probably putting too many docs on each shard.

Unfortunately, testing is really the only way to be sure.

And yeah, the whole SolrCloud is pretty new, and the docs always lag the
code.

Best
Erick



Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit

Posted by Srikanth S <sr...@gmail.com>.
Thanks for your response Erick.

Your explanation seems to make sense for the commit count. But I guess the
UI needs to be fixed.

Regarding the performance, I went through your blog (nicely written btw,
and good links to other interesting blogs too). I didn't realize that
everything that is indexed needs to be kept in memory for reasonable
performance. In that case, with 133M documents (each with several indexed
fields) per shard, and a server hosting 2 such shards, the memory we
have provided does seem far too little. I think we need to do an
evaluation of our hardware as you pointed out. I didn't get one thing in
your blog though: the paragraph that starts with "Now, take say 80% of the
QPS rate above...". I am assuming you meant "Keep adding 1M documents and
see the point where the QPS drops to 80% of the above value". Correct me if
I am wrong.

Wrt the query rate, we were able to run at around 80-90 searches/sec with
indexing off, and 50-60 searches/sec while indexing at an average rate of
500 inserts/sec.

Regarding stacking up replicas to get more QPS, I would have expected
the same, but with very little documentation on SolrCloud design (and some
of it conflicting), I was not very sure about that. So, if you can, and if
you have access to any, could you point me to some places where the
architecture of SolrCloud is explained in more detail? I'd appreciate that
greatly.

Thanks again.


Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit

Posted by Erick Erickson <er...@gmail.com>.
The autocommits are about what I'd expect. 17 hours
== 102 ten minute blocks, which is roughly your
115 autocommits. I'm _guessing_ that the total
commits are a combination of soft and hard. You'll
have 20,400 soft commits in that time frame, so this
works as a rough estimate....
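
This estimate can be reproduced with a few lines of arithmetic. The sketch
assumes a commit fires at every interval for the full 17 hours, which is a
simplification; the variable names are illustrative. It also compares the
reported 'commits' counter with the sum of the two autocommit counters from
the original post.

```python
RUN_HOURS = 17
HARD_COMMIT_MS = 600_000   # autocommit maxTime from the stats page
SOFT_COMMIT_MS = 3_000     # soft autocommit maxTime from the stats page

run_ms = RUN_HOURS * 60 * 60 * 1000
expected_hard = run_ms // HARD_COMMIT_MS   # ten-minute blocks in 17 hours
expected_soft = run_ms // SOFT_COMMIT_MS   # three-second blocks in 17 hours
print(expected_hard, expected_soft)        # prints: 102 20400

# Counters reported on the UPDATEHANDLER stats page.
reported_autocommits = 115
reported_soft_autocommits = 22_912
reported_commits = 23_028
print(reported_autocommits + reported_soft_autocommits, reported_commits)
# prints: 23027 23028
```

The sum of hard and soft autocommits is within one of the reported total,
which is consistent with 'commits' counting both kinds.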

And SolrJ doesn't do a commit after an add unless
you tell it to.

As for search performance, it's quite hard to tell. But
you have about 133M documents/shard, and two
replicas. You have a relatively small amount of
memory allocated for indexes that size. It's time to
just dig into what you can expect out of your boxes.
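
For scale, the figures from the original post work out as follows. This is
back-of-the-envelope arithmetic only: it uses the Xmx=6G setting from the
original post and ignores the operating system's file cache, which also
matters for index performance.

```python
TOTAL_DOCS = 400_000_000     # documents indexed, from the original post
SHARDS = 3
CORES_PER_MACHINE = 2        # each machine hosts two shard cores
MAX_HEAP_BYTES = 6 * 1024**3 # Xmx=6G, from the original post

docs_per_shard = TOTAL_DOCS / SHARDS
docs_per_machine = docs_per_shard * CORES_PER_MACHINE
heap_bytes_per_doc = MAX_HEAP_BYTES / docs_per_machine

print(f"{docs_per_shard / 1e6:.0f}M docs per shard")
print(f"{docs_per_machine / 1e6:.0f}M docs per machine")
print(f"{heap_bytes_per_doc:.1f} heap bytes per document")
```

That is roughly 133M documents per shard, 267M per machine, and about
24 bytes of maximum heap per document on each machine.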

Here's a blog that outlines a way to understand more
about the capacity of your hardware that might help.
I'd take the SolrCloud bits out for right now, and just
concentrate on the capacity of the machine in your
situation, then add SolrCloud back into the mix.

http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

It'd be interesting to see what your query rate
was if you stop the indexing process. Mostly I'm
just looking for which factors change performance,
not recommending that you go with that approach.

The good news is that you can get virtually whatever
QPS rate you need by simply racking in more replicas
for each shard....

Best
Erick





Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit

Posted by Srikanth S <sr...@gmail.com>.
As explained in my original message, the servers are started with a minimum
heap size of 3G and max heap size of 4G, though I've never seen the heap
grow beyond 3G.

Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit

Posted by Lance Norskog <go...@gmail.com>.
How much memory is allocated? There is a feature in modern Unix
systems called 'Large Pages' or 'Huge Pages'. This is an operating
system feature that lets very large processes run with more efficient
virtual memory tracking in the CPU's memory subsystem. Search for 'Large
Pages' and 'Translation Lookaside Buffer' to learn the gory details.
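
To give a feel for why page size matters, the sketch below counts how many
pages are needed to map a 3G heap (the minimum heap size reported in this
thread). The 4 KiB and 2 MiB page sizes are assumptions, typical of x86-64
Linux; the actual sizes depend on the platform.

```python
HEAP_BYTES = 3 * 1024**3        # 3G heap, as reported in this thread
SMALL_PAGE = 4 * 1024           # ASSUMPTION: 4 KiB default page size
LARGE_PAGE = 2 * 1024 * 1024    # ASSUMPTION: 2 MiB large page size

def pages_needed(total_bytes: int, page_bytes: int) -> int:
    """Number of pages required to cover total_bytes (ceiling division)."""
    return -(-total_bytes // page_bytes)

print(pages_needed(HEAP_BYTES, SMALL_PAGE))   # prints: 786432
print(pages_needed(HEAP_BYTES, LARGE_PAGE))   # prints: 1536
```

Fewer pages means fewer address translations for the Translation Lookaside
Buffer to track.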




-- 
Lance Norskog
goksron@gmail.com