Posted to solr-user@lucene.apache.org by Ron Buchanan <bu...@gmail.com> on 2020/12/23 15:00:36 UTC
Indexing performance 7.3 vs 8.7
(this is long, just trying to be thorough)
I'm working on upgrading from Solr 7.3 to Solr 8.7, and I am seeing a
significant drop in indexing throughput during a full index reload: from
~1300 documents/sec to ~450 documents/sec.
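As a rough sanity check, the two throughput figures can be converted into reload times. This minimal sketch assumes the ~10 million document count given under Data Loading and a constant indexing rate for the whole run, both of which are approximations:

```java
// Rough consistency check: reload time = documents / throughput.
// Assumes ~10 million documents and a constant indexing rate (both approximations).
public class ReloadEstimate {
    static double hours(long documents, double docsPerSecond) {
        return documents / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long documents = 10_000_000L;
        System.out.printf("Solr 7.3 @ 1300 docs/sec: %.2f h%n", hours(documents, 1300));
        System.out.printf("Solr 8.7 @  450 docs/sec: %.2f h%n", hours(documents, 450));
    }
}
```

This prints roughly 2.14 h and 6.17 h, in the same ballpark as the reported 2.5 and 6.25 hour reloads, so the documents/sec figures and the wall-clock times describe the same slowdown.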
Background:
VM hosts (these are configured identically):
- Our Solr clusters run in a virtualized environment.
- Each virtual machine has 8 CPUs and 64 GB of RAM.
- The hosts are organized into two 4-host clusters: one for 7.3 and
one for 8.7.
- Each cluster has its own 3-VM ZooKeeper cluster (running the
version that was current at the time of install).
JVM:
- all the JVMs are set up with -Xms28g and -Xmx28g
- the Solr 8.7 cluster is running with the default JVM settings
(i.e., as configured by the Solr install script) **other than memory**
- the Solr 7.3 cluster was configured a while ago, but I'm fairly sure
it's running pretty vanilla JVM settings (if not outright default)
**other than memory**
- the most obvious difference between the JVM settings for the
environments is the garbage collector: ConcurrentMarkSweep for 7.3
and G1GC for 8.7
- both run Java 1.8, but 7.3 is running HotSpot and 8.7 is running
OpenJDK (and a bit newer)
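Since the two clusters differ in JVM vendor, Java build and collector, it may be worth confirming what a given java binary and set of flags actually select, rather than relying on the install-time configuration. The following is a minimal sketch using only the standard `java.lang.management` API; `JvmReport` is a hypothetical helper, and it reports on the JVM it runs in, so it would be launched separately with the same java binary and heap/GC flags as the Solr node being checked:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.ArrayList;
import java.util.List;

// Prints the VM name/vendor, Java version, max heap, active garbage collectors
// and input arguments of the JVM this class runs in.
public class JvmReport {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        RuntimeMXBean rt = ManagementFactory.getRuntimeMXBean();
        System.out.println("VM:         " + rt.getVmName() + " (" + rt.getVmVendor() + ")");
        System.out.println("Java:       " + System.getProperty("java.version"));
        // May report slightly less than -Xmx, depending on the collector.
        System.out.println("Max heap:   " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        System.out.println("Collectors: " + collectorNames());
        System.out.println("JVM args:   " + rt.getInputArguments());
    }
}
```

On Java 8, `-XX:+UseConcMarkSweepGC` should list ParNew and ConcurrentMarkSweep, while `-XX:+UseG1GC` should list G1 Young Generation and G1 Old Generation. Seeing PS Scavenge / PS MarkSweep (the Java 8 parallel default) instead would suggest the intended GC flags are not being applied.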
Solr:
- 1 shard, 1 replica per host - all NRT (both clusters)
- Both the Solr 7.3 and 8.7 clusters are running the same schema
- with one exception, only the most minimal changes were made to the
default Solr 8.7 solrconfig.xml to keep it in line with the 7.3
solrconfig (mostly around cache settings)
- the exception: running with luceneMatchVersion=7.3.0
Data Loading:
- Data is loaded by a completely separate VM running a custom Java
process that collects data from the source, generates SolrInputDocuments
from it, and sends them via CloudSolrClient
- this Java process is multi-threaded, with an upper limit on both the
number of simultaneous threads sending documents and the size of the
document payload
- we are loading ~10 million documents during a full-reload - this is
a product catalog, so the documents actually represent data about SKUs we
sell (and they aren't particularly large, though the size is variable)
- the existing Solr 7.3 cluster has a full-reload time of around 2.5
hours, the Solr 8.7 cluster requires around 6.25 hours
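The loader described above is custom code, but its general shape (a fixed number of sender threads, batches capped by payload size, and a bound on how many batches can be pending) can be sketched with the JDK alone. In this minimal sketch, `BoundedLoader` and `BatchSender` are hypothetical names, documents are generic, and payload size is approximated by a caller-supplied function; in a SolrJ loader the sender might wrap `SolrClient#add(String collection, Collection<SolrInputDocument> docs)` on a `CloudSolrClient`. Timing the same loader against each cluster gives a client-side docs/sec figure to compare with the ~1300 and ~450 numbers:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToIntFunction;

// Sketch of a bounded, multi-threaded batch loader (hypothetical, not the original code).
public class BoundedLoader<T> {
    // Hypothetical stand-in for the code that actually sends a batch to Solr.
    public interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    private final ExecutorService pool;
    private final Semaphore pending;          // caps batches queued or being sent
    private final BatchSender<T> sender;
    private final ToIntFunction<T> sizeOf;    // approximate payload size of one document
    private final int maxBatchBytes;
    private final AtomicLong sentDocs = new AtomicLong();
    private List<T> current = new ArrayList<>();   // touched only by the producer thread
    private int currentBytes = 0;

    public BoundedLoader(int threads, BatchSender<T> sender,
                         ToIntFunction<T> sizeOf, int maxBatchBytes) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.pending = new Semaphore(threads * 2);
        this.sender = sender;
        this.sizeOf = sizeOf;
        this.maxBatchBytes = maxBatchBytes;
    }

    // Called from a single producer thread; starts a new batch when the byte cap is reached.
    public void add(T doc) throws InterruptedException {
        int size = sizeOf.applyAsInt(doc);
        if (!current.isEmpty() && currentBytes + size > maxBatchBytes) {
            flush();
        }
        current.add(doc);
        currentBytes += size;
    }

    private void flush() throws InterruptedException {
        if (current.isEmpty()) return;
        final List<T> batch = current;
        current = new ArrayList<>();
        currentBytes = 0;
        pending.acquire();                    // producer blocks while too many batches are pending
        pool.execute(() -> {
            try {
                sender.send(batch);
                sentDocs.addAndGet(batch.size());
            } catch (Exception e) {
                e.printStackTrace();          // a real loader would retry or record the failure
            } finally {
                pending.release();
            }
        });
    }

    // Sends the last partial batch, waits for all sends to finish, returns docs sent.
    public long finish() throws InterruptedException {
        flush();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return sentDocs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong received = new AtomicLong();
        BoundedLoader<String> loader = new BoundedLoader<>(4,
                batch -> received.addAndGet(batch.size()), String::length, 1_000);
        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {
            loader.add("sku-" + i);
        }
        long sent = loader.finish();
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("sent %d docs (%.0f docs/sec)%n", sent, sent / secs);
    }
}
```

Keeping the thread count and byte cap identical for both clusters is what makes the comparison meaningful; with the semaphore in place, a slower cluster shows up as the producer blocking rather than as unbounded memory growth in the loader.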
Efforts so far:
- checked network speed between the VM generating updates (it's the same
server for both 7.3 and 8.7) and the clusters
  - performance to the 8.7 cluster is actually better
- controlled for VM topology as best as possible (i.e., the distribution
of the VMs across hosts within the VM cluster)
- real-time JVM monitoring with VisualVM during indexing on the 8.7 cluster
  - looked nice - same as I've always seen for the 7.3 cluster
- checked the GC logs with GCEasy
  - reported as healthy
Thoughts/questions/considerations:
- could running an older luceneMatchVersion affect indexing performance?
- I'm still a little concerned that the VM topology is affecting things (our
VM crew split the 7.3 cluster across VM clusters to improve resiliency in
case of a VM cluster failure, and that's not something we can or want to
replicate). That said, the performance difference is consistent with what
I've seen in our QA environment, and that environment has a less even
spread of VMs across hosts (e.g., multiple Solr VMs on the same VM host)
- we have a couple of custom tokenizers and token filters; those were
rebuilt against the 8.7.0 versions of solr-core and lucene-core. They're
pretty simple and I'm not terribly concerned about them, but it is
non-standard
- query performance is comparable between 7.3 and 8.7 and documents
returned are reasonably consistent (few really big differences, mostly just
scoring differences that affect ordering)
- after watching the 8.7 JVMs in real time during indexing, I decided to
drop the heap to -Xms20g and -Xmx20g. This had no effect on indexing
speed (or GC impact), so I think it's at least safe to say this is not
memory-bound
Final question:
Is it simply typical to see significantly worse indexing performance on 8.7
than on 7.3?
Any suggestions on where to look would be highly appreciated.
Thanks,
Ron
Re: Indexing performance 7.3 vs 8.7
Posted by Bram Van Dam <br...@intix.eu>.
On 23/12/2020 16:00, Ron Buchanan wrote:
> - both run Java 1.8, but 7.3 is running HotSpot and 8.7 is running
> OpenJDK (and a bit newer)
If you're using G1GC, you probably want to give Java 11 a go. It's an
easy thing to test, and it's had a positive impact for us. Your mileage
may vary.
- Bram