Posted to solr-user@lucene.apache.org by Ron Buchanan <bu...@gmail.com> on 2020/12/23 15:00:36 UTC

Indexing performance 7.3 vs 8.7

(this is long, just trying to be thorough)

I'm working on upgrading from Solr 7.3 to Solr 8.7 and I am seeing a
significant drop in indexing throughput during a full index reload - from
~1300 documents per second to ~450 documents per second.

Background:

VM hosts (these are configured identically):


   - Our Solr clusters run in a virtualized environment.
      - Each virtual machine has 8 CPUs and 64 GB of RAM.
      - The hosts are organized into two 4-host clusters - one for 7.3 and
      one for 8.7.
      - Each cluster has its own 3-VM ZooKeeper cluster (running the
      version that was current at the time of install).


JVM:


   - all the JVMs are set up with -Xms28G and -Xmx28G
      - the Solr 8.7 cluster is running with the default JVM settings
      (i.e., as configured by the Solr install script) **other than memory**
      - the Solr 7.3 cluster was configured a while ago, but I'm fairly sure
      it's running pretty vanilla JVM settings (if not outright default)
      **other than memory**
      - the most obvious difference between the JVM settings for the
      environments is the garbage collector: ConcurrentMarkSweep for 7.3
      and G1GC for 8.7
      - both run Java 1.8, but 7.3 is running HotSpot and 8.7 is running
      OpenJDK (and a bit newer)


Solr:


   - 1 shard, 1 replica per host - all NRT (both clusters)
      - Both the Solr 7.3 and 8.7 clusters are running the same schema
      - with one exception, only the most minimal changes were made to the
      default Solr 8.7 solrconfig.xml to keep it in line with the 7.3
      solrconfig (mostly around cache settings)
         - the exception: running with luceneMatchVersion=7.3.0


Data Loading:


   - Data is loaded by a completely separate VM running a custom Java
      process that collects data from the source, generates
      SolrInputDocuments from it, and sends them via CloudSolrClient
      - this Java process is multi-threaded with an upper-limit on the
      number of simultaneous threads sending documents and the size of the
      document payload
      - we are loading ~10 million documents during a full-reload - this is
      a product catalog, so the documents actually represent data about SKUs we
      sell (and they aren't particularly large, though the size is variable)
      - the existing Solr 7.3 cluster has a full-reload time of around 2.5
      hours; the Solr 8.7 cluster requires around 6.25 hours
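
As a rough cross-check of the figures above, here is a minimal sketch that
derives average throughput from the reload times. It assumes exactly
10,000,000 documents (the real count is only "~10 million"), and the function
name is just for illustration. The averages it gives will not match the
~1300 and ~450 documents per second quoted earlier exactly, since those are
approximate figures too.

```python
# Rough cross-check of the figures quoted in this post.
# Assumption: exactly 10,000,000 documents (the post only says "~10 million").
DOCS = 10_000_000


def docs_per_second(total_docs: int, hours: float) -> float:
    """Average indexing throughput over one full reload."""
    return total_docs / (hours * 3600)


rate_73 = docs_per_second(DOCS, 2.5)    # Solr 7.3: ~2.5 hour reload
rate_87 = docs_per_second(DOCS, 6.25)   # Solr 8.7: ~6.25 hour reload
slowdown = rate_73 / rate_87            # how many times slower 8.7 is

print(f"7.3: {rate_73:.0f} docs/s, 8.7: {rate_87:.0f} docs/s, "
      f"slowdown: {slowdown:.1f}x")
```

Under those assumptions the reload times alone imply that 8.7 is indexing
about 2.5 times slower on average.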


Efforts so far:

   - checked network speed between the VM generating updates (it's the same
   server for both 7.3 and 8.7) and the clusters
      - network performance to the 8.7 cluster is actually better
   - controlled for VM topology as best as possible (i.e., distribution
   of the VMs across hosts within the VM cluster)
   - real-time JVM monitoring with VisualVM during indexing on 8.7 cluster
      - looked nice - same as I've always seen for the 7.3 cluster
   - checked the GC logs with GCEasy
      - reported as healthy


Thoughts/questions/considerations:

   - could running an older luceneMatchVersion affect indexing performance?
   - still a little concerned that the VM topology is affecting things (our
   VM crew split the 7.3 cluster across VM clusters in an attempt to improve
   resiliency in case of a VM cluster failure, and that's not something we
   can or want to replicate) - that said, the performance difference is
   consistent with what I've seen in our QA environment, and that environment
   has a less even spread of VMs across hosts (e.g., multiple Solr VMs on the
   same VM host)
   - we have a couple of custom tokenizers and tokenFilters - those were
   rebuilt using the 8.7.0 versions of solr-core and lucene-core - they're
   pretty simple and I'm not terribly concerned about this, but it is
   non-standard
   - query performance is comparable between 7.3 and 8.7 and documents
   returned are reasonably consistent (few really big differences, mostly just
   scoring differences that affect ordering)
   - after watching the 8.7 JVMs in real-time during indexing, I decided to
   drop the memory to -Xms20g and -Xmx20g - this had no effect on indexing
   speed (or GC impacts) - so, I think it's at least safe to say this is not
   memory-bound


Final question:

Is it simply typical to see significantly worse indexing performance on 8.7
than 7.3?

Any suggestions on where to look would be highly appreciated.

Thanks,

Ron

Re: Indexing performance 7.3 vs 8.7

Posted by Bram Van Dam <br...@intix.eu>.
On 23/12/2020 16:00, Ron Buchanan wrote:
>       - both run Java 1.8, but 7.3 is running HotSpot and 8.7 is running
>       OpenJDK (and a bit newer)

If you're using G1GC, you probably want to give Java 11 a go. It's an
easy thing to test, and it's had a positive impact for us. Your mileage
may vary.

 - Bram