Posted to solr-user@lucene.apache.org by Luca Quarello <lu...@gmail.com> on 2015/12/30 02:03:15 UTC

SOLR replicas performance

Hi,

I have a 260M-document index (90 GB) with this structure:


<field name="fragment" type="text_general" indexed="true" stored="true"
multiValued="false" termVectors="false" termPositions="false"
termOffsets="false" />

  <field name="parentId" type="long" indexed="false" stored="true"
multiValued="false"/>

  <field name="fragmentContentType" type="string" indexed="false"
stored="true" multiValued="false"/>

  <field name="creationDate" type="date" indexed="true" stored="true"
multiValued="false"/>

  <field name="creationTimestamp" type="date" indexed="true" stored="true"
multiValued="false"/>

  <field name="visibility" type="string" indexed="true" stored="true"
multiValued="false"/>

  <field name="category" type="string" indexed="true" stored="true"
multiValued="false"/>

  <field name="marked" type="string" indexed="true" stored="true"
multiValued="false"/>

   <!-- catchall field, containing all other searchable text fields
(implemented

   via copyField further on in this schema  -->

  <field name="text" type="text_general" indexed="true" stored="false"
multiValued="true"/>

  <copyField source="fragment" dest="text"/>

  <copyField source="parentId" dest="text"/>

  <copyField source="fragmentContentType" dest="text"/>

  <copyField source="creationDate" dest="text"/>

  <copyField source="visibility" dest="text"/>

  <copyField source="category" dest="text"/>

  <copyField source="marked" dest="text"/>


where the fragment field contains XML messages.

There is a search function that returns the messages satisfying a search
criterion.
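
For illustration, a document matching this schema could be indexed with a
plain update call like the sketch below (the collection name "sepa" is taken
from the query URL further down; all field values are invented, and the
catchall "text" field is filled by the copyField rules above):

  # Hedged sketch: post one made-up document to the 'sepa' collection.
  curl -s 'http://localhost:8983/solr/sepa/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{
             "id": "doc-1",
             "fragment": "<msg><body>AAA sample payment</body></msg>",
             "parentId": 42,
             "fragmentContentType": "SINGLE",
             "creationDate": "2015-12-01T00:00:00Z",
             "creationTimestamp": "2015-12-01T10:15:30Z",
             "visibility": "PUBLIC",
             "category": "payments",
             "marked": "T"
           }]'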


TARGET:

To find the configuration that gives the best response times for a SolrCloud
cluster of two Solr instances, running on 2 VMs with 8 cores and 32 GB of RAM
each.


TEST RESULTS:


   1. Configurations:

      - the best configuration found without replicas:
        - CONF1: 16 shards of 17M documents (8 per VM)
      - configurations with replicas (example create calls are sketched after
        this list):
        - CONF2: 8 shards of 35M documents with a replication factor of 1
        - CONF3: 16 shards of 35M documents with a replication factor of 1

   2. Executed tests:

      - sequential requests
      - 5 parallel requests
      - 10 parallel requests
      - 20 parallel requests

      in two scenarios: during an indexing phase and without indexing
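
For reference, collections shaped like the configurations above are normally
created through the SolrCloud Collections API; a minimal sketch, assuming a
collection named "sepa" and an uploaded configset "sepa_conf" (note that the
API's replicationFactor counts total copies per shard, so "replication factor
of 1" in the sense used above corresponds to replicationFactor=2):

  # CONF1 sketch: 16 shards, one copy each, at most 8 shards per VM.
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=sepa&numShards=16&replicationFactor=1&maxShardsPerNode=8&collection.configName=sepa_conf'

  # CONF3 sketch: 16 shards, two copies each (leader + replica), 16 cores per VM.
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=sepa&numShards=16&replicationFactor=2&maxShardsPerNode=16&collection.configName=sepa_conf'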


Calls are of the form: http://localhost:8983/solr/sepa/select?
q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType
%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
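
URL-decoded, that call is equivalent to the curl sketch below; note that
fragment:*AAA* is a leading-and-trailing wildcard term, which is expensive to
evaluate:

  curl -sG 'http://localhost:8983/solr/sepa/select' \
       --data-urlencode 'q=+fragment:*AAA*' \
       --data-urlencode 'fq=marked:T' \
       --data-urlencode 'fq=-fragmentContentType:BULK' \
       --data-urlencode 'start=0' \
       --data-urlencode 'rows=100' \
       --data-urlencode 'sort=creationTimestamp desc,id asc'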


   3. Test results

      All the tests have shown an I/O utilization of 100 MB/s while loading
      data into the disk cache, a disk cache utilization of 20 GB, and a core
      utilization of 100% (all 8 cores).



   - No indexing
     - CONF1 (average time and maximum time, in seconds):
       - sequential: 4,1 6,9
       - 5 parallel: 15,6 19,1
       - 10 parallel: 23,6 30,2
       - 20 parallel: 48 52,2
     - CONF2:
       - sequential: 12,3 17,4
       - 5 parallel: 32,5 34,2
       - 10 parallel: 45,4 49
       - 20 parallel: 64,6 74
     - CONF3:
       - sequential: 6,9 9,9
       - 5 parallel: 33,2 37,5
       - 10 parallel: 46 51
       - 20 parallel: 68 83



   - Indexing (is it possible to view the total indexing throughput in the
     Solr admin console? I can only find it for a single shard)
     - CONF1:
       - sequential: 7,7 9,5
       - 5 parallel: 26,8 28,4
       - 10 parallel: 31,8 37,8
       - 20 parallel: 42 52,5
     - CONF2:
       - sequential: 12,3 19
       - 5 parallel: 39 40,8
       - 10 parallel: 56,6 62,9
       - 20 parallel: 79 116
     - CONF3:
       - sequential: 10 18,9
       - 5 parallel: 36,5 41,9
       - 10 parallel: 63,7 64,1
       - 20 parallel: 85 120



I have two questions:

   - The response times of the configurations with replicas are worse (about
   three times worse in the sequential-request case) than the response times
   of the configuration without replicas. Is this an expected result?
   - Why don't the replicas help to reduce the response time during index
   inserts and updates?
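
For completeness, the sequential and N-parallel request tests above can be
reproduced with plain curl; a rough sketch (the URL is the one quoted above,
responses are discarded and only the total time per request is printed):

  URL='http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc'

  # Sequential requests, one after another.
  for i in $(seq 20); do
    curl -s -o /dev/null -w '%{time_total}\n' "$URL"
  done

  # N parallel requests (here 10; use -P 5 or -P 20 for the other scenarios).
  seq 10 | xargs -P 10 -I{} curl -s -o /dev/null -w '%{time_total}\n' "$URL"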

Re: SOLR replicas performance

Posted by Luca Quarello <lu...@gmail.com>.
Hi Shawn,
I expect indexing to be a little slower with replication, but in my case it
is 3 times worse. I can't explain this.

The monitored resource consumption was:

           All the tests have shown an I/O utilization of 100 MB/s while
loading data into the disk cache, a disk cache utilization of 20 GB, and a
core utilization of 100% (all 8 cores).

so it seems that the bottleneck is the CPU cores, not RAM. I don't expect a
performance improvement from adding RAM. Am I wrong?
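
One way to double-check where the bottleneck is during a test run, with
standard Linux tools (nothing Solr-specific is assumed here):

  # If all cores show ~100% user/sys time while %iowait stays low, the test is
  # CPU-bound; a high %iowait would instead point at the disks / page cache.
  mpstat -P ALL 1

  # Per-device throughput and utilization, to compare with the observed ~100 MB/s.
  iostat -xm 1

  # How much RAM is actually free for the OS page cache that holds the index.
  free -h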


Thanks,
Luca

On Fri, Jan 8, 2016 at 4:40 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 1/8/2016 7:55 AM, Luca Quarello wrote:
> > I used solr5.3.1 and I sincerely expected response times with replica
> > configuration near to response times without replica configuration.
> >
> > Do you agree with me?
> >
> > I read here
> >
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
> > that "Queries do not need to be routed to leaders; they can be handled by
> > any replica in a shard. Leaders are only needed for handling update
> > requests. "
> >
> > I haven't found this behaviour. In my case CONF2 e CONF3 have all
> replicas
> > on VM2 but analyzing core utilization during a request is 100% on both
> > machines. Why?
>
> Indexing is a little bit slower with replication -- the update must
> happen on all replicas.
>
> If your index is sharded (which I believe you did indicate in your
> initial message), you may find that all replicas get used even for
> queries.  It is entirely possible that some of the shard subqueries will
> be processed on one replica and some of them will be processed on other
> replicas.  I do not know if this commonly happens, but I would not be
> surprised if it does.  If the machines are sized appropriately for the
> index, this separation should speed up queries, because you have the
> resources of multiple machines handling one query.
>
> That phrase "sized appropriately" is very important.  Your initial
> message indicated that you have a 90GB index, and that you are running
> in virtual machines.  Typically VMs have fairly small memory sizes.  It
> is very possible that you simply don't have enough memory in the VM for
> good performance with an index that large.  With 90GB of index data on
> one machine, I would hope for at least 64GB of RAM, and I would prefer
> to have 128GB.  If there is more than 90GB of data on one machine, then
> even more memory would be needed.
>
> Thanks,
> Shawn
>
>

Re: SOLR replicas performance

Posted by Luca Quarello <lu...@gmail.com>.
Hi Tomas,
here are some more details:


   - The fragment field contains 3 KB XML messages.
   - The queries used for the tests are of the form (I only change the word
   searched for inside the fragment field between requests): curl "
   http://localhost:8983/solr/sepa/select?q=+fragment%3A*A*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc"

   - All the tests were executed inside VMs on dedicated hardware; in detail:

2 ESX 5.5 hypervisors on:


   - PowerEdge T420 servers - dual Xeon E5-2420 with 128 GB of RAM
   - RAID10 local storage, 4x Near Line SAS 7,200 rpm disks (about 100 MB/s
   guaranteed bandwidth)


I executed another test with the following configuration: 8 shards of 35M
documents on VM1 and 8 empty shards on VM2 (CONF4). This configuration has
no replicas.

We can now compare the response times (in seconds) for CONF2 and CONF4:


   - without indexing operations

     - CONF2:
       - *sequential: 12,3 17,4*
       - 5 parallel: 32,5 34,2
       - 10 parallel: 45,4 49
       - 20 parallel: 64,6 74
     - CONF4:
       - sequential: 5 9,1
       - 5 parallel: 25 31
       - 10 parallel: 41 49
       - 20 parallel: 60 73

   - with indexing operations

     - CONF2:
       - sequential: 12,3 19
       - 5 parallel: 39 40,8
       - 10 parallel: 56,6 62,9
       - *20 parallel: 79 116*
     - CONF4:
       - sequential: 15,5 17,5
       - 5 parallel: 30,7 38,3
       - 10 parallel: 57,5 64,2
       - 20 parallel: 60 81,4


During the tests:

   - CONF2: all 8 cores on VM1 and all 8 cores on VM2 were 100% used (except
   for the sequential test without indexing operations, where the usage was
   about 80%).
   - CONF4: all 8 cores on VM1 were 100% used.


As you can see, performance is similar for the tests with 5 and 10 parallel
requests, both with and without indexing operations, but very different for
sequential requests and for 20 parallel requests. I don't understand why.
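
One way to narrow this down could be to time a single shard in isolation on
each VM, with distributed search switched off; that separates the per-shard
query cost from the distributed merge and from how subqueries are scheduled
across VMs. A sketch (host and core names are placeholders; distrib=false and
debug=timing are standard parameters):

  # Query one core directly, skipping the distributed fan-out.
  curl -sG 'http://vm1:8983/solr/sepa_shard1_replica1/select' \
       --data-urlencode 'q=+fragment:*A*' \
       --data-urlencode 'fq=marked:T' \
       --data-urlencode 'fq=-fragmentContentType:BULK' \
       --data-urlencode 'rows=0' \
       --data-urlencode 'distrib=false' \
       --data-urlencode 'debug=timing'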

Thanks,
Luca

On Fri, Jan 8, 2016 at 6:47 PM, Tomás Fernández Löbbe <tomasflobbe@gmail.com
> wrote:

> Hi Luca,
> It looks like your queries are complex wildcard queries. My theory is that
> you are CPU-bounded, for a single query one CPU core for each shard will be
> at 100% for the duration of the sub-query. Smaller shards make these
> sub-queries faster which is why 16 shards is better than 8 in your case.
> * In your 16x1 configuration, you have exactly one shard per CPU core, so
> in a single query, 16 subqueries will go to both nodes evenly and use one
> of the CPU cores.
> * In your 8x2 configuration, you still get to use one CPU core per shard,
> but the shards are bigger, so maybe each subquery takes longer (for the
> single query thread and 8x2 scenario I would expect CPU utilization to be
> lower?).
> * In your 16x2 case 16 subqueries will be distributed un-evenly, and some
> node will get more than 8 subqueries, which means that some of the
> subqueries will have to wait for their turn for a CPU core. In addition,
> more Solr cores will be competing for resources.
> If this theory is correct, adding more replicas won't speedup your queries,
> you need to either get faster CPU or simplify your queries/configuration in
> some way. Adding more replicas should improve your query throughput, but
> only if you add them in more HW, not the same one.
>
> ...anyway, just a theory
>
> Tomás
>
> On Fri, Jan 8, 2016 at 7:40 AM, Shawn Heisey <ap...@elyograg.org> wrote:
>
> > On 1/8/2016 7:55 AM, Luca Quarello wrote:
> > > I used solr5.3.1 and I sincerely expected response times with replica
> > > configuration near to response times without replica configuration.
> > >
> > > Do you agree with me?
> > >
> > > I read here
> > >
> >
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
> > > that "Queries do not need to be routed to leaders; they can be handled
> by
> > > any replica in a shard. Leaders are only needed for handling update
> > > requests. "
> > >
> > > I haven't found this behaviour. In my case CONF2 e CONF3 have all
> > replicas
> > > on VM2 but analyzing core utilization during a request is 100% on both
> > > machines. Why?
> >
> > Indexing is a little bit slower with replication -- the update must
> > happen on all replicas.
> >
> > If your index is sharded (which I believe you did indicate in your
> > initial message), you may find that all replicas get used even for
> > queries.  It is entirely possible that some of the shard subqueries will
> > be processed on one replica and some of them will be processed on other
> > replicas.  I do not know if this commonly happens, but I would not be
> > surprised if it does.  If the machines are sized appropriately for the
> > index, this separation should speed up queries, because you have the
> > resources of multiple machines handling one query.
> >
> > That phrase "sized appropriately" is very important.  Your initial
> > message indicated that you have a 90GB index, and that you are running
> > in virtual machines.  Typically VMs have fairly small memory sizes.  It
> > is very possible that you simply don't have enough memory in the VM for
> > good performance with an index that large.  With 90GB of index data on
> > one machine, I would hope for at least 64GB of RAM, and I would prefer
> > to have 128GB.  If there is more than 90GB of data on one machine, then
> > even more memory would be needed.
> >
> > Thanks,
> > Shawn
> >
> >
>

Re: SOLR replicas performance

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Hi Luca,
It looks like your queries are complex wildcard queries. My theory is that
you are CPU-bound: for a single query, one CPU core per shard will be at
100% for the duration of the sub-query. Smaller shards make these
sub-queries faster, which is why 16 shards is better than 8 in your case.
* In your 16x1 configuration, you have exactly one shard per CPU core, so
for a single query, 16 subqueries will go to both nodes evenly and each use
one of the CPU cores.
* In your 8x2 configuration, you still get to use one CPU core per shard,
but the shards are bigger, so each subquery may take longer (for the
single query thread and the 8x2 scenario I would expect CPU utilization to
be lower?).
* In your 16x2 case, 16 subqueries will be distributed unevenly, and some
node will get more than 8 subqueries, which means that some of the
subqueries will have to wait for their turn on a CPU core. In addition,
more Solr cores will be competing for resources.
If this theory is correct, adding more replicas won't speed up your queries;
you need to either get faster CPUs or simplify your queries/configuration in
some way. Adding more replicas should improve your query throughput, but
only if you add them on more hardware, not the same machines.

...anyway, just a theory

Tomás
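
A way to check this against the test setup: Solr can report which replica
answered each subquery and how long it took. Adding shards.info=true, and
optionally debug=timing, to the benchmark query shows the per-shard
breakdown, e.g.:

  curl -sG 'http://localhost:8983/solr/sepa/select' \
       --data-urlencode 'q=+fragment:*AAA*' \
       --data-urlencode 'fq=marked:T' \
       --data-urlencode 'fq=-fragmentContentType:BULK' \
       --data-urlencode 'rows=0' \
       --data-urlencode 'shards.info=true' \
       --data-urlencode 'debug=timing'

The shards.info section of the response lists, for every shard, the replica
address that served the subquery and its elapsed time, which shows how the
subqueries spread over the two VMs and which shards are slow.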

On Fri, Jan 8, 2016 at 7:40 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 1/8/2016 7:55 AM, Luca Quarello wrote:
> > I used solr5.3.1 and I sincerely expected response times with replica
> > configuration near to response times without replica configuration.
> >
> > Do you agree with me?
> >
> > I read here
> >
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
> > that "Queries do not need to be routed to leaders; they can be handled by
> > any replica in a shard. Leaders are only needed for handling update
> > requests. "
> >
> > I haven't found this behaviour. In my case CONF2 e CONF3 have all
> replicas
> > on VM2 but analyzing core utilization during a request is 100% on both
> > machines. Why?
>
> Indexing is a little bit slower with replication -- the update must
> happen on all replicas.
>
> If your index is sharded (which I believe you did indicate in your
> initial message), you may find that all replicas get used even for
> queries.  It is entirely possible that some of the shard subqueries will
> be processed on one replica and some of them will be processed on other
> replicas.  I do not know if this commonly happens, but I would not be
> surprised if it does.  If the machines are sized appropriately for the
> index, this separation should speed up queries, because you have the
> resources of multiple machines handling one query.
>
> That phrase "sized appropriately" is very important.  Your initial
> message indicated that you have a 90GB index, and that you are running
> in virtual machines.  Typically VMs have fairly small memory sizes.  It
> is very possible that you simply don't have enough memory in the VM for
> good performance with an index that large.  With 90GB of index data on
> one machine, I would hope for at least 64GB of RAM, and I would prefer
> to have 128GB.  If there is more than 90GB of data on one machine, then
> even more memory would be needed.
>
> Thanks,
> Shawn
>
>

Re: SOLR replicas performance

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/8/2016 7:55 AM, Luca Quarello wrote:
> I used solr5.3.1 and I sincerely expected response times with replica
> configuration near to response times without replica configuration.
> 
> Do you agree with me?
> 
> I read here
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
> that "Queries do not need to be routed to leaders; they can be handled by
> any replica in a shard. Leaders are only needed for handling update
> requests. "
> 
> I haven't found this behaviour. In my case CONF2 e CONF3 have all replicas
> on VM2 but analyzing core utilization during a request is 100% on both
> machines. Why?

Indexing is a little bit slower with replication -- the update must
happen on all replicas.

If your index is sharded (which I believe you did indicate in your
initial message), you may find that all replicas get used even for
queries.  It is entirely possible that some of the shard subqueries will
be processed on one replica and some of them will be processed on other
replicas.  I do not know if this commonly happens, but I would not be
surprised if it does.  If the machines are sized appropriately for the
index, this separation should speed up queries, because you have the
resources of multiple machines handling one query.

That phrase "sized appropriately" is very important.  Your initial
message indicated that you have a 90GB index, and that you are running
in virtual machines.  Typically VMs have fairly small memory sizes.  It
is very possible that you simply don't have enough memory in the VM for
good performance with an index that large.  With 90GB of index data on
one machine, I would hope for at least 64GB of RAM, and I would prefer
to have 128GB.  If there is more than 90GB of data on one machine, then
even more memory would be needed.

Thanks,
Shawn
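
To put rough numbers on that sizing point for this setup, a quick sketch for
checking the on-disk index size per node against the memory left for the OS
page cache (the data path is the default install layout and the 8 GB heap is
an assumption; the thread only states 32 GB per VM and a 90 GB total index):

  # Index size actually held by this node (default SOLR_HOME layout assumed).
  du -sh /var/solr/data/*/data/index

  # RAM available for the page cache = total RAM - JVM heap - other processes.
  # With 32 GB per VM and an assumed 8 GB Solr heap, roughly 20-24 GB of page
  # cache is left to serve ~45 GB of index per node in CONF1, and more once
  # replicas are added -- consistent with the ~20 GB cache usage observed.
  free -h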


Re: SOLR replicas performance

Posted by Luca Quarello <lu...@xeffe.it>.
Hi Erick,
I used Solr 5.3.1 and I honestly expected the response times of the replica
configurations to be close to the response times without replicas.

Do you agree with me?

I read here
http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
that "Queries do not need to be routed to leaders; they can be handled by
any replica in a shard. Leaders are only needed for handling update
requests."

I haven't observed this behaviour. In my case CONF2 and CONF3 have all the
replicas on VM2, but core utilization during a request is 100% on both
machines. Why?

Best,
Luca


Luca Quarello

M: +39 347 018 3855

luca.quarello@xeffe.it


XEFFE s.r.l.

C.so Giovanni Lanza 72, 10131 Torino

T: +39 011 660 5039

F: +39 011 198 26822

www.xeffe.it

On Tue, Jan 5, 2016 at 5:08 PM, Erick Erickson <er...@gmail.com>
wrote:

> What version of Solr? Prior to 5.2 the replicas were doing lots of
> unnecessary work/being blocked, see:
>
> https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
>
> Best,
> Erick
>
> On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla <ma...@gmail.com>
> wrote:
> > Hi Luca,
> >       not sure if I understood well. Your question is
> > "Why are index times on a solr cloud collecton with 2 replicas higher
> than
> > on solr cloud with 1 replica" right?
> > Well with 2 replicas all docs have to be deparately indexed in 2 places
> and
> > solr has to confirm that both indexing went well.
> > Indexing times are lower on a solrcloud collection with 2 shards (just
> one
> > replica, the leader, per shard) because docs are indexed just once and
> the
> > load is spread on 2 servers instead of one
> >

Re: SOLR replicas performance

Posted by Luca Quarello <lu...@gmail.com>.
Hi Matteo,
there are two questions:

   - "Why are response times on a SolrCloud collection with 1 replica higher
   than on a SolrCloud collection without replicas?"

           Configuration1: SolrCloud with two 8-core VMs, each with 8
shards of 17M docs
           Configuration2: SolrCloud with two 8-core VMs, each with 8
shards of 17M docs (8 masters and 8 replicas)

I measured worse response times for the replica configuration (conf2) in both
of the following scenarios:

   - Scenario1: queries without inserting records into the index
   - Scenario2: queries while inserting records into the index

I expect similar response times in Scenario1 and better response times for
Configuration2 in Scenario2.

Is that correct?

Thanks,
Luca

On Fri, Jan 8, 2016 at 3:56 PM, Luca Quarello <lu...@gmail.com>
wrote:

> Hi Erick,
> I used solr5.3.1 and I sincerely expected response times with replica
> configuration near  to response times without replica configuration.
>
> Do you agree with me?
>
> I read here
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html that
> "Queries do not need to be routed to leaders; they can be handled by any
> replica in a shard. Leaders are only needed for handling update requests.
>  "
>
> I haven't found this behaviour. In my case CONF2 e CONF3 have all replicas
> on VM2 but analyzing core utilization during a request is 100% on both
> machines. Why?
>
> Best,
> Luca
>
> On Tue, Jan 5, 2016 at 5:08 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
>> What version of Solr? Prior to 5.2 the replicas were doing lots of
>> unnecessary work/being blocked, see:
>>
>> https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
>>
>> Best,
>> Erick
>>
>> On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla <ma...@gmail.com>
>> wrote:
>> > Hi Luca,
>> >       not sure if I understood well. Your question is
>> > "Why are index times on a solr cloud collecton with 2 replicas higher
>> than
>> > on solr cloud with 1 replica" right?
>> > Well with 2 replicas all docs have to be deparately indexed in 2 places
>> and
>> > solr has to confirm that both indexing went well.
>> > Indexing times are lower on a solrcloud collection with 2 shards (just
>> one
>> > replica, the leader, per shard) because docs are indexed just once and
>> the
>> > load is spread on 2 servers instead of one
>> >

Re: SOLR replicas performance

Posted by Luca Quarello <lu...@gmail.com>.
Hi Erick,
I used Solr 5.3.1 and I honestly expected the response times of the replica
configurations to be close to the response times without replicas.

Do you agree with me?

I read here
http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html that
"Queries do not need to be routed to leaders; they can be handled by any
replica in a shard. Leaders are only needed for handling update requests."

I haven't observed this behaviour. In my case CONF2 and CONF3 have all the
replicas on VM2, but core utilization during a request is 100% on both
machines. Why?

Best,
Luca

On Tue, Jan 5, 2016 at 5:08 PM, Erick Erickson <er...@gmail.com>
wrote:

> What version of Solr? Prior to 5.2 the replicas were doing lots of
> unnecessary work/being blocked, see:
>
> https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
>
> Best,
> Erick
>
> On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla <ma...@gmail.com>
> wrote:
> > Hi Luca,
> >       not sure if I understood well. Your question is
> > "Why are index times on a solr cloud collecton with 2 replicas higher
> than
> > on solr cloud with 1 replica" right?
> > Well with 2 replicas all docs have to be deparately indexed in 2 places
> and
> > solr has to confirm that both indexing went well.
> > Indexing times are lower on a solrcloud collection with 2 shards (just
> one
> > replica, the leader, per shard) because docs are indexed just once and
> the
> > load is spread on 2 servers instead of one
> >

Re: SOLR replicas performance

Posted by Erick Erickson <er...@gmail.com>.
What version of Solr? Prior to 5.2 the replicas were doing lots of
unnecessary work/being blocked, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/

Best,
Erick

On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla <ma...@gmail.com> wrote:
> Hi Luca,
>       not sure if I understood well. Your question is
> "Why are index times on a solr cloud collecton with 2 replicas higher than
> on solr cloud with 1 replica" right?
> Well with 2 replicas all docs have to be deparately indexed in 2 places and
> solr has to confirm that both indexing went well.
> Indexing times are lower on a solrcloud collection with 2 shards (just one
> replica, the leader, per shard) because docs are indexed just once and the
> load is spread on 2 servers instead of one
>

Re: SOLR replicas performance

Posted by Matteo Grolla <ma...@gmail.com>.
Hi Luca,
      not sure if I understood correctly. Your question is
"Why are index times on a SolrCloud collection with 2 replicas higher than
on a SolrCloud collection with 1 replica", right?
Well, with 2 replicas all docs have to be separately indexed in 2 places, and
Solr has to confirm that both indexing operations went well.
Indexing times are lower on a SolrCloud collection with 2 shards (just one
replica, the leader, per shard) because docs are indexed just once and the
load is spread over 2 servers instead of one.
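
As a concrete way to see this from the client side: when a shard has two
copies, the leader forwards each update to its replica and acknowledges only
after both have applied it. A hedged sketch using the standard min_rf update
parameter (collection name and document values are just placeholders):

  # Ask Solr to report the achieved replication factor for this update; with
  # two copies per shard the "rf" value in the response should come back as 2.
  curl -s 'http://localhost:8983/solr/sepa/update?commit=true&min_rf=2' \
       -H 'Content-Type: application/json' \
       -d '[{"id":"probe-1","marked":"T"}]'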
