Posted to solr-user@lucene.apache.org by Roman Ivanov <no...@gmail.com> on 2020/12/09 13:45:23 UTC

How can I poll SolrCloud via the API to get the total index size of all shards and replicas?

Hello! We have a SolrCloud (7.4) cluster consisting of 90+ hosts (each of them
running multiple Solr nodes, e.g. on ports 8983, 8984, 8985), numerous
shards (each having several replicas) and numerous collections.

I was given the task of totalling the on-disk index size of a certain
collection. At first I calculated it manually, by copy-pasting from the web
interface (the Cloud - Nodes tab of the HTTP interface on port 8983), which
came to thousands of lines and took several hours. I now think this task
needs to be automated. I have read the API documentation and googled, but
still no luck... Any solution could also help somebody else in the future.

What I tried:
   1) If I poll one of the Solr nodes via

    http://solrhost1.somecorporatesite.org:8983/solr/admin/metrics?wt=JSON&prefix=INDEX

I get output like (**cores.json**):

    "responseHeader":{
      "status":0,
      "QTime":2004},
    "metrics":{
      "solr.core.collectionname1-2020-12-05.shard12.replica_n240":{
        "INDEX.size":"456 bytes",
        "INDEX.sizeInBytes":456},
      "solr.core.collectionname2-2020-12-04.shard74.replica_n650":{
        "INDEX.size":"2.88 GB",
        "INDEX.sizeInBytes":3088933801},

... and so on, which is what I need, BUT only for the one node I queried
(local). And there are more than 200 of them.
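For reference, a minimal Python sketch of how one node's metrics reply could be totalled for a single collection. It assumes the reply has the shape shown above and that registry names follow the pattern solr.core.<collection>.<shard>.<replica>, as in the sample. The function name sum_index_bytes is made up for illustration.

```python
def sum_index_bytes(metrics_response, collection):
    """Sum INDEX.sizeInBytes over the cores of one collection in one node's reply.

    Assumes registry names look like solr.core.<collection>.<shard>.<replica>,
    as in the sample output in this post.
    """
    # The trailing dot keeps "collectionname1" from matching "collectionname1-2020-12-05".
    prefix = "solr.core." + collection + "."
    total = 0
    for registry, values in metrics_response.get("metrics", {}).items():
        if registry.startswith(prefix):
            total += int(values.get("INDEX.sizeInBytes", 0))
    return total


# Sample data copied from the output above.
sample = {
    "responseHeader": {"status": 0, "QTime": 2004},
    "metrics": {
        "solr.core.collectionname1-2020-12-05.shard12.replica_n240": {
            "INDEX.size": "456 bytes",
            "INDEX.sizeInBytes": 456,
        },
        "solr.core.collectionname2-2020-12-04.shard74.replica_n650": {
            "INDEX.size": "2.88 GB",
            "INDEX.sizeInBytes": 3088933801,
        },
    },
}

print(sum_index_bytes(sample, "collectionname1-2020-12-05"))
```

This only covers the node that was queried, so the same call would still have to be repeated for every other node and the results added up.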

   2) I can get a list of all collections, shards and replicas via:

    http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json

and it looks like this (**collections.json**):

    "responseHeader":{
      "status":0,
      "QTime":184},
    "cluster":{
      "collections":{
        "collectionname1":{
          "pullReplicas":"0",
          "replicationFactor":"1",
          "shards":{
            "shard1":{
              "range":"800000000-80e0ffff",
              "state":"active",
              "replicas":{
                "core_node67":{
                  "core":"collectionname123-2020-11-30_shard1_replica_n54",
                  "node_name":"solrhost99.somecorporatesite.org:8985/solr",
                  "state":"active",
                  "type":"NRT",
                  "force_set_state":"false",
                  "leader":"true"},
                "core_node548":{
                  "core":"collectionname223-2020-11-29_shard1_replica_n448",
                  "node_name":"solrhost77.somecorporatesite.org:8984/solr",
                  "state":"active",
                  "type":"NRT",
                  "force_set_state":"false"}}},
            "shard2":{
              "range":

... and so on, for 117,156 lines.
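As a rough sketch, the clusterstatus output could be used to list every node that hosts a replica of a given collection, so that each one can then be polled for its metrics. This assumes node_name has the host:port/solr form printed above and that the nodes answer over plain http. The function names node_urls and metrics_url are made up for illustration.

```python
def node_urls(clusterstatus, collection, scheme="http"):
    """Return the sorted, de-duplicated base URLs of all nodes hosting the collection.

    Assumes node_name looks like "host:port/solr", as printed in this post.
    """
    shards = clusterstatus["cluster"]["collections"][collection]["shards"]
    urls = set()
    for shard in shards.values():
        for replica in shard["replicas"].values():
            urls.add(scheme + "://" + replica["node_name"])
    return sorted(urls)


def metrics_url(base_url):
    """Build the metrics URL from step 1 for one node."""
    return base_url + "/admin/metrics?wt=json&prefix=INDEX"


# Sample data trimmed from the output above.
status = {
    "cluster": {
        "collections": {
            "collectionname1": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node67": {
                                "core": "collectionname123-2020-11-30_shard1_replica_n54",
                                "node_name": "solrhost99.somecorporatesite.org:8985/solr",
                            },
                            "core_node548": {
                                "core": "collectionname223-2020-11-29_shard1_replica_n448",
                                "node_name": "solrhost77.somecorporatesite.org:8984/solr",
                            },
                        }
                    }
                }
            }
        }
    }
}

for url in node_urls(status, "collectionname1"):
    print(metrics_url(url))
```

Collecting the nodes into a set means a node that hosts many replicas of the collection is queried only once.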

The question is: how can I insert the INDEX.size fields into the second
output (clusterstatus), so that I can sum the disk space used by the indices?

In other words, I need the corresponding INDEX.size fields in the replicas
sections of **collections.json**.
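To make the desired result concrete, here is a hedged Python sketch of such a join. It assumes the "metrics" maps from all nodes have already been merged into one dict, and that a core name like collection_shardN_replica_nM corresponds to the registry solr.core.collection.shardN.replica_nM. That mapping is only inferred from the two sample outputs in this post, not confirmed. The function names registry_name and attach_sizes are made up for illustration.

```python
def registry_name(core):
    """Map a core name to its metrics registry name.

    Assumption inferred from the samples in this post:
    "<collection>_<shard>_replica_<id>" -> "solr.core.<collection>.<shard>.replica_<id>".
    Raises ValueError if the core name has fewer than three underscores.
    """
    collection, shard, replica, suffix = core.rsplit("_", 3)
    return "solr.core.{}.{}.{}_{}".format(collection, shard, replica, suffix)


def attach_sizes(clusterstatus, collection, metrics_by_registry):
    """Add INDEX.sizeInBytes to every replica of the collection and return the sum.

    metrics_by_registry is the merged "metrics" map from all polled nodes.
    Replicas with no matching registry get None and are left out of the sum.
    """
    total = 0
    shards = clusterstatus["cluster"]["collections"][collection]["shards"]
    for shard in shards.values():
        for replica in shard["replicas"].values():
            values = metrics_by_registry.get(registry_name(replica["core"]), {})
            size = values.get("INDEX.sizeInBytes")
            replica["INDEX.sizeInBytes"] = size
            if size is not None:
                total += size
    return total


# Made-up sample data in the shape of the two outputs in this post.
status = {
    "cluster": {"collections": {"collectionname1": {"shards": {"shard1": {"replicas": {
        "core_node67": {"core": "collectionname1_shard1_replica_n54"},
        "core_node548": {"core": "collectionname1_shard1_replica_n448"},
    }}}}}}
}
merged_metrics = {
    "solr.core.collectionname1.shard1.replica_n54": {"INDEX.sizeInBytes": 456},
    "solr.core.collectionname1.shard1.replica_n448": {"INDEX.sizeInBytes": 1000},
}

print(attach_sizes(status, "collectionname1", merged_metrics))
```

The sum counts every replica, so it reflects total disk usage across the cluster rather than the size of one copy of the data.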

Currently the whole Solr system consumes 100TB+ and is still growing, and we
need to know its rate of growth. Many thanks in advance!