Posted to solr-user@lucene.apache.org by Roman Ivanov <no...@gmail.com> on 2020/12/09 13:45:23 UTC
How can I poll SolrCloud via the API to get the total index size of all
shards and replicas?
Hello! We have a SolrCloud (7.4) cluster consisting of 90+ hosts (each of
them running multiple Solr nodes, e.g. on ports 8983, 8984, 8985), numerous
shards (each having several replicas) and numerous collections.
I was given the task of summing up the total on-disk index size of a certain
collection. At first I calculated it manually by copy-pasting from the web
interface (the Cloud - Nodes tab of the HTTP interface on port 8983); there
were thousands of lines and it took several hours. This task clearly needs
some automation. I have read the API documentation and googled, but no luck
so far... Any possible solution could also help somebody else in the future.
What I tried:
1) If I poll one of the Solr nodes via
"
http://solrhost1.somecorporatesite.org:8983/solr/admin/metrics?wt=JSON&prefix=INDEX
"
I get output like (**cores.json**):
"responseHeader":{
  "status":0,
  "QTime":2004},
"metrics":{
  "solr.core.collectionname1-2020-12-05.shard12.replica_n240":{
    "INDEX.size":"456 bytes",
    "INDEX.sizeInBytes":456},
  "solr.core.collectionname2-2020-12-04.shard74.replica_n650":{
    "INDEX.size":"2.88 GB",
    "INDEX.sizeInBytes":3088933801},
... and so on. This is exactly what I need, BUT it only covers the cores of
the one (local) node I query, and there are more than 200 of them.
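
For illustration, summing the sizes reported by a single node might look like the sketch below. It assumes the response has exactly the shape shown above (a "metrics" map keyed by "solr.core.<collection>.<shard>.<replica>"). The function names fetch_index_metrics and sum_index_bytes are hypothetical helpers, not part of Solr.

```python
import json
from urllib.request import urlopen


def fetch_index_metrics(base_url):
    """Fetch the INDEX.* metrics of one Solr node (performs an HTTP request).

    base_url is assumed to look like "http://host:8983/solr".
    """
    url = base_url.rstrip("/") + "/admin/metrics?wt=json&prefix=INDEX"
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def sum_index_bytes(metrics_response, collection=None):
    """Sum INDEX.sizeInBytes over the cores listed in one metrics response.

    If collection is given, only registries named
    "solr.core.<collection>.<shard>.<replica>" are counted.
    """
    total = 0
    for registry, values in metrics_response.get("metrics", {}).items():
        if not registry.startswith("solr.core."):
            continue
        # Split from the right so that dots inside the collection name survive.
        parts = registry[len("solr.core."):].rsplit(".", 2)
        if collection is not None and (len(parts) != 3 or parts[0] != collection):
            continue
        total += int(values.get("INDEX.sizeInBytes", 0))
    return total
```

Calling sum_index_bytes(fetch_index_metrics("http://solrhost1.somecorporatesite.org:8983/solr")) would then give the total for that one node only, which is the limitation described above.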
2) I can get a list of all collections, shards and replicas via:
http://localhost:8983/solr/admin/collections?action=clusterstatus&wt=json
and it looks like (**collections.json**):
"responseHeader":{
  "status":0,
  "QTime":184},
"cluster":{
  "collections":{
    "collectionname1":{
      "pullReplicas":"0",
      "replicationFactor":"1",
      "shards":{
        "shard1":{
          "range":"800000000-80e0ffff",
          "state":"active",
          "replicas":{
            "core_node67":{
              "core":"collectionname123-2020-11-30_shard1_replica_n54",
              "node_name":"solrhost99.somecorporatesite.org:8985/solr",
              "state":"active",
              "type":"NRT",
              "force_set_state":"false",
              "leader":"true"},
            "core_node548":{
              "core":"collectionname223-2020-11-29_shard1_replica_n448",
              "node_name":"solrhost77.somecorporatesite.org:8984/solr",
              "state":"active",
              "type":"NRT",
              "force_set_state":"false"}}},
        "shard2":{
          "range":
... and so on, 117,156 lines in total
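
As a rough sketch, the set of nodes that host a given collection could be pulled out of this output as follows. It assumes the structure shown above. The "base_url" field and the "_solr" suffix handling are assumptions about what other Solr versions may return, and node_urls_for_collection is a hypothetical helper name.

```python
def node_urls_for_collection(cluster_status, collection):
    """Return the sorted, de-duplicated node URLs hosting replicas of a collection.

    cluster_status is the parsed JSON of the clusterstatus call.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    urls = set()
    for shard in shards.values():
        for replica in shard["replicas"].values():
            # Assumption: use "base_url" when the response provides one.
            url = replica.get("base_url")
            if url is None:
                node = replica["node_name"]
                # Assumption: some outputs write "host:port_solr" instead of
                # "host:port/solr"; normalise that form.
                if node.endswith("_solr"):
                    node = node[: -len("_solr")] + "/solr"
                url = "http://" + node
            urls.add(url)
    return sorted(urls)
```

Each returned URL can then be polled once at /admin/metrics, instead of once per replica.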
The question is: how can I insert the INDEX.size fields into the second
output (clusterstatus), so that I can sum up the disk space used by the
indices? In other words, I need the corresponding INDEX.size fields in the
replicas sections of **collections.json**.
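
To make the desired result concrete, here is a minimal sketch of such a merge. It assumes that a replica's "core" value and its metrics registry name differ only in the "solr.core." prefix and in using "_" versus "." as separators, as in the two samples above; this is an assumption, not a documented guarantee. annotate_sizes is a hypothetical helper name.

```python
def _normalise(name):
    # Make "coll.shard1.replica_n1" and "coll_shard1_replica_n1" comparable.
    return name.replace(".", "_")


def annotate_sizes(cluster_status, metrics_responses, collection):
    """Add "INDEX.sizeInBytes" to every replica of a collection; return the sum.

    metrics_responses is an iterable of parsed /admin/metrics?prefix=INDEX
    responses, one per polled node. Replicas for which no metrics were found
    get None and count as 0 in the sum.
    """
    sizes = {}
    for response in metrics_responses:
        for registry, values in response.get("metrics", {}).items():
            if registry.startswith("solr.core."):
                key = _normalise(registry[len("solr.core."):])
                sizes[key] = values.get("INDEX.sizeInBytes")

    total = 0
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    for shard in shards.values():
        for replica in shard["replicas"].values():
            size = sizes.get(_normalise(replica["core"]))
            replica["INDEX.sizeInBytes"] = size
            total += size or 0
    return total
```

The returned total is the on-disk size of all replicas of the collection in bytes, and the annotated cluster_status can be dumped back to JSON for inspection.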
Currently the whole Solr system consumes 100TB+ and is still growing, and we
need to know its rate of growth. Many thanks in advance!