
Posted to solr-user@lucene.apache.org by Ash Ramesh <as...@canva.com> on 2018/07/18 06:04:47 UTC

Memory requirements for TLOGs (7.3.1)

Hi everybody,

I have a quick question about the memory requirements for TLOG
machines on 7.3.1. We currently run 3 TLOG replicas on machines with
8 GB RAM (2 GB heap) and N PULL replicas on machines with 32 GB RAM
(4 GB heap). We have > 10M documents (1 collection) and the index is
~17 GB. We send all read traffic to the PULLs and send updates and
additions to the leader TLOG.
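
For readers unfamiliar with replica types, here is a minimal sketch of how a layout like the one described could be requested through the Collections API. The host, collection name, and PULL replica count are hypothetical placeholders (the thread only says "N"). The nrtReplicas, tlogReplicas and pullReplicas parameter names come from the Solr reference guide, not from this thread.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with a real Solr node.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, tlog_replicas, pull_replicas):
    """Build a Collections API CREATE request with explicit replica types."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,  # no NRT replicas, only TLOG and PULL
        "tlogReplicas": tlog_replicas,
        "pullReplicas": pull_replicas,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


# 3 TLOG replicas as in the thread; 4 PULL replicas is an assumed value for "N".
url = create_collection_url("mycollection", 1, 3, 4)
print(url)
```

This only builds the request URL; sending it (for example with urllib.request.urlopen) would create the collection on a running cluster.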

We are wondering how this setup can affect performance for replication,
etc. We are thinking of increasing the heap of the TLOG to 4 GB but leaving
the total memory on the machine at 8 GB. What will that do to performance?
We also expect our index to grow 3-4x in the next 6 months.

Any assistance would be much appreciated :)

Regards,

Ash

-- 
*P.S. We've launched a new blog to share the latest ideas and case studies 
from our team. Check it out here: product.canva.com 
<http://product.canva.com/>. ***
** <https://canva.com>Empowering the world 
to design
Also, we're hiring. Apply here! 
<https://about.canva.com/careers/>
 <https://twitter.com/canva> 
<https://facebook.com/canva> <https://au.linkedin.com/company/canva> 
<https://instagram.com/canva>

Re: Memory requirements for TLOGs (7.3.1)

Posted by Shawn Heisey <ap...@elyograg.org>.

On 7/18/2018 6:33 PM, Ash Ramesh wrote:
> Thanks for the quick responses Shawn & Erick! Just to clarify another few
> points:
>  1. Does a larger heap size affect ingesting documents into the index
> (all CRUD operations) on a TLOG?

It's extremely difficult, maybe even impossible, for anyone on this list
to predict whether increasing the heap will improve performance without
some really concrete information from the system.  If you shared a GC log
covering a period when the activity you want to improve was happening,
I could probably answer that question for your specific server.

>  2. Does a machine with more RAM (in this case 32 GB) also affect
> ingestion on TLOGs?

Having more memory for the OS disk cache does not usually improve
indexing performance.  The only kind of memory that is likely to matter
for that is heap memory.  Once you reach a sufficient heap size,
increasing it further won't help and might actually hurt performance.

>  3. We are currently routing queries via an Amazon ASG / load balancer.
> Is this one of the recommended ways to set up Solr infrastructure?

If your client software is not cloud-aware, you'll want an external load
balancer.  The only cloud-aware client that I know for sure exists is
the Java client (SolrJ's CloudSolrClient), which ships with Solr and is
also available as a standalone library.  I did hear once about a
cloud-aware client under development for Python, but I do not know its
status, and it would be third-party software.

Because you're using an external load balancer, you could list only the
PULL replicas in the load balancer back end configuration and include
preferLocalShards=true on each request, so that SolrCloud will
not load balance the requests further.
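
As a rough illustration of that suggestion, the sketch below builds a select URL that carries preferLocalShards=true. The load balancer address and collection name are hypothetical; the assumption is that the load balancer forwards the request unchanged to one of the PULL replicas.

```python
from urllib.parse import urlencode


def select_url(base, collection, query, prefer_local=True):
    """Build a /select URL, optionally asking Solr to prefer local replicas."""
    params = {"q": query}
    if prefer_local:
        # Ask the receiving node to use its own replica when it can
        # satisfy the query, instead of load balancing across the cloud.
        params["preferLocalShards"] = "true"
    return f"{base}/{collection}/select?{urlencode(params)}"


# Hypothetical load balancer address fronting only the PULL replicas.
url = select_url("http://solr-lb.example.com:8983/solr", "mycollection", "*:*")
print(url)
```

With one shard and a full replica on every PULL node, each request would then be answered by whichever node the load balancer picked.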

Thanks,
Shawn

Re: Memory requirements for TLOGs (7.3.1)

Posted by Ash Ramesh <as...@canva.com>.

Thanks for the quick responses Shawn & Erick! Just to clarify another few
points:
 1. Does a larger heap size affect ingesting documents into the index
(all CRUD operations) on a TLOG?
 2. Does a machine with more RAM (in this case 32 GB) also affect
ingestion on TLOGs?
 3. We are currently routing queries via an Amazon ASG / load balancer.
Is this one of the recommended ways to set up Solr infrastructure?

Best Regards,

Ash


On Thu, Jul 19, 2018 at 12:56 AM Erick Erickson <er...@gmail.com>
wrote:

> There's little good reason to _not_ route searches to your TLOG
> replicas. The only difference between the PULL and TLOG replicas is
> that the TLOG replicas get a raw copy of the incoming documents from
> the leader and write them to the TLOG. I.e. there's some additional
> I/O.
>
> It's possible that if you have extremely heavy indexing you might
> notice some additional load on the TLOG vs. PULL replicas, but from
> what you've said I doubt you have that much indexing traffic.
>
> So basically I'd configure my TLOG and PULL replicas pretty much
> identically and search them both.
>
> Best,
> Erick


Re: Memory requirements for TLOGs (7.3.1)

Posted by Erick Erickson <er...@gmail.com>.

There's little good reason to _not_ route searches to your TLOG
replicas. The only difference between the PULL and TLOG replicas is
that the TLOG replicas get a raw copy of the incoming documents from
the leader and write them to the TLOG. I.e. there's some additional
I/O.

It's possible that if you have extremely heavy indexing you might
notice some additional load on the TLOG vs. PULL replicas, but from
what you've said I doubt you have that much indexing traffic.

So basically I'd configure my TLOG and PULL replicas pretty much
identically and search them both.

Best,
Erick

On Wed, Jul 18, 2018 at 7:46 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 7/18/2018 12:04 AM, Ash Ramesh wrote:
>>
>> I have a quick question about what the memory requirements for TLOG
>> machines are on 7.3.1. We currently run replication where there are 3
>> TLOGs
>> with 8gb ram (2gb heap) and N PULL replicas with 32gb ram (4gb heap). We
>> have > 10M documents (1 collection) with the index size being ~ 17gb. We
>> send all read traffic to the PULLs and send Updates and Additions to the
>> Leader TLOG.
>>
>> We are wondering how this setup can affect performance for replication,
>> etc. We are thinking of increasing the heap of the TLOG to 4gb but leaving
>> the total memory on the machine at 8gb. What will that do to performance?
>> We also expect our index to grow 3/4x in the next 6 months.
>
>
> Performance has more to do with index size and memory size than the type of
> replication you're doing.
>
> SolrCloud will load balance queries across the cloud, so your low-memory
> TLOG replicas are most likely handling queries as well.  In a SolrCloud
> cluster, a query is not necessarily handled by the machine that you send the
> query to.
>
> With memory resources that low compared to index size, the 8GB machines
> probably do not perform queries as well as the 32GB machines.  If you
> increase the heap to 4GB, that will only leave 4GB available for the OS disk
> cache, and that's going to drop query performance even further.
>
> There is a feature in Solr 7.4 that will allow you to prefer certain replica
> types, so you can tell Solr that it should prefer PULL replicas.  But since
> you're running 7.3.1, you don't have that feature.
>
> https://issues.apache.org/jira/browse/SOLR-11982
>
> There is also a "preferLocalShards" parameter that has existed for longer
> than the new feature mentioned above.  This tells Solr that it should not
> load balance queries in the cloud if there is a local index that can satisfy
> the query.  This parameter should only be used if you have an external load
> balancer.
>
> Indexing is a heap-intensive operation that doesn't benefit much from having
> a lot of extra memory for the operating system. I have no idea whether 2GB
> of heap is enough or not.  Increasing the heap size MIGHT make performance
> better, or it might make no difference at all.
>
> Thanks,
> Shawn
>

Re: Memory requirements for TLOGs (7.3.1)

Posted by Shawn Heisey <ap...@elyograg.org>.

On 7/18/2018 12:04 AM, Ash Ramesh wrote:
> I have a quick question about what the memory requirements for TLOG
> machines are on 7.3.1. We currently run replication where there are 3 TLOGs
> with 8gb ram (2gb heap) and N PULL replicas with 32gb ram (4gb heap). We
> have > 10M documents (1 collection) with the index size being ~ 17gb. We
> send all read traffic to the PULLs and send Updates and Additions to the
> Leader TLOG.
>
> We are wondering how this setup can affect performance for replication,
> etc. We are thinking of increasing the heap of the TLOG to 4gb but leaving
> the total memory on the machine at 8gb. What will that do to performance?
> We also expect our index to grow 3/4x in the next 6 months.

Performance has more to do with index size and memory size than the type 
of replication you're doing.

SolrCloud will load balance queries across the cloud, so your low-memory 
TLOG replicas are most likely handling queries as well.  In a SolrCloud 
cluster, a query is not necessarily handled by the machine that you send 
the query to.

With memory resources that low compared to index size, the 8GB machines 
probably do not perform queries as well as the 32GB machines.  If you 
increase the heap to 4GB, that will only leave 4GB available for the OS 
disk cache, and that's going to drop query performance even further.
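
The arithmetic behind that point can be sketched as follows, using the figures from the original post. This is a rough estimate only: it assumes each machine holds one full copy of the 17 GB index, and it ignores JVM non-heap overhead and anything else running on the machine.

```python
def os_cache_gb(total_ram_gb, heap_gb):
    """RAM left for the OS disk cache once the Java heap is taken out.

    Rough upper bound: ignores JVM non-heap overhead and other processes.
    """
    return total_ram_gb - heap_gb


def cached_fraction(total_ram_gb, heap_gb, index_gb):
    """Fraction of the index that could fit in the OS disk cache (capped at 1)."""
    return min(1.0, os_cache_gb(total_ram_gb, heap_gb) / index_gb)


INDEX_GB = 17  # index size stated in the thread

for label, ram, heap in [
    ("TLOG today", 8, 2),
    ("TLOG with 4 GB heap", 8, 4),
    ("PULL today", 32, 4),
]:
    for growth in (1, 3, 4):  # current size, then the expected 3-4x growth
        frac = cached_fraction(ram, heap, INDEX_GB * growth)
        print(f"{label:22s} index x{growth}: {frac:.0%} of index cacheable")
```

Raising the heap on an 8 GB machine from 2 GB to 4 GB shrinks the space available for caching from 6 GB to 4 GB, against an index that is already much larger than either figure.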

There is a feature in Solr 7.4 that will allow you to prefer certain 
replica types, so you can tell Solr that it should prefer PULL 
replicas.  But since you're running 7.3.1, you don't have that feature.

https://issues.apache.org/jira/browse/SOLR-11982
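
For anyone on 7.4 or later, a minimal sketch of a request that uses this feature follows. The parameter name shards.preference and the value replica.type:PULL are taken from the Solr reference guide rather than from this thread, and the host and collection name are hypothetical.

```python
from urllib.parse import urlencode


def select_url_preferring(base, collection, query, replica_type):
    """Build a /select URL that asks Solr (7.4+) to prefer one replica type."""
    params = {
        "q": query,
        # Preference, not a hard filter: other replica types can still be
        # used when no replica of the preferred type is available.
        "shards.preference": f"replica.type:{replica_type}",
    }
    return f"{base}/{collection}/select?{urlencode(params)}"


# Hypothetical node address and collection name.
url = select_url_preferring("http://localhost:8983/solr", "mycollection", "*:*", "PULL")
print(url)
```

On 7.3.1 this parameter is not available, which is why the thread falls back to preferLocalShards behind an external load balancer.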

There is also a "preferLocalShards" parameter that has existed for 
longer than the new feature mentioned above.  This tells Solr that it 
should not load balance queries in the cloud if there is a local index 
that can satisfy the query.  This parameter should only be used if you 
have an external load balancer.

Indexing is a heap-intensive operation that doesn't benefit much from 
having a lot of extra memory for the operating system. I have no idea 
whether 2GB of heap is enough or not.  Increasing the heap size MIGHT 
make performance better, or it might make no difference at all.

Thanks,
Shawn