Posted to solr-user@lucene.apache.org by Kudrettin Güleryüz <ku...@gmail.com> on 2018/08/27 21:17:05 UTC
cloud disk space utilization
Hi,
We have six Solr nodes, each with ~1TiB of disk space mounted as ext4. The
indexers sometimes update existing collections, and create new ones when an
update would not be faster than indexing from scratch (up to around 5 million
documents are indexed per collection). On average there are around 130
collections on this SolrCloud. Collection sizes vary from 1GiB to 150GiB.
Preferences set:
"cluster-preferences":[
  {"maximize":"freedisk",
   "precision":10},
  {"minimize":"cores",
   "precision":1},
  {"minimize":"sysLoadAvg",
   "precision":3}],
* Is it possible to run out of disk space on one of the nodes while
others still have plenty? I observe some getting close to ~80%
utilization while others stay at ~60%.
* Is this difference due to differences in collection index sizes, or due
to an error on my side in coming up with a useful policy/preferences?
Thank you
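For reference, preferences like these are typically applied by POSTing a
set-cluster-preferences command to the autoscaling endpoint (in Solr 7.x,
/solr/admin/autoscaling); a sketch of the request body, mirroring the
preferences above:

```json
{
  "set-cluster-preferences": [
    {"maximize": "freedisk",   "precision": 10},
    {"minimize": "cores",      "precision": 1},
    {"minimize": "sysLoadAvg", "precision": 3}
  ]
}
```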
Re: cloud disk space utilization
Posted by Kudrettin Güleryüz <ku...@gmail.com>.
Thank you Shalin. I'll try creating a policy with practically zero effect
for now.
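A cluster policy with practically zero effect might look like the following;
the wide replica bound is an assumption of mine, chosen so the rule can never
be violated, and it would be POSTed as set-cluster-policy to the same
autoscaling endpoint:

```json
{
  "set-cluster-policy": [
    {"replica": "<1000", "shard": "#EACH", "node": "#ANY"}
  ]
}
```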
On Wed, Aug 29, 2018 at 11:31 PM Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:
Re: cloud disk space utilization
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
There is a bad oversight on our part which causes preferences to not be
used for placing replicas unless a cluster policy also exists. We hope to
fix it in the next release (Solr 7.5). See
https://issues.apache.org/jira/browse/SOLR-12648
You may also be interested in
https://issues.apache.org/jira/browse/SOLR-12592
On Tue, Aug 28, 2018 at 2:47 AM Kudrettin Güleryüz <ku...@gmail.com>
wrote:
--
Regards,
Shalin Shekhar Mangar.
Re: cloud disk space utilization
Posted by Walter Underwood <wu...@wunderwood.org>.
As a minimum, you need free disk space equal to at least half the size of your
collections, and you might need more. We have a 23 GB collection in SolrCloud;
when we reload all the content and wait until the end to do a commit, it grows to 51 GB.
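Those numbers suggest peak disk usage during a full reload can approach 2-3x
the steady-state index size, since old segments are only freed after the final
commit and merge. A rough sketch of the arithmetic; the 2.25 growth factor is
just the observed 23 GB -> 51 GB ratio above, not a Solr constant:

```python
def peak_disk_gb(index_gb, growth_factor=2.25):
    """Rough worst-case on-disk size during a full reload followed by one
    big commit; growth_factor ~2.25 matches the 23 GB -> 51 GB observation."""
    return index_gb * growth_factor

print(peak_disk_gb(23))   # prints 51.75, close to the observed 51 GB
print(peak_disk_gb(150))  # a 150 GiB collection could peak near 337.5 GiB
```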
wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Aug 29, 2018, at 1:41 PM, Kudrettin Güleryüz <ku...@gmail.com> wrote:
Re: cloud disk space utilization
Posted by Kudrettin Güleryüz <ku...@gmail.com>.
Given the set of preferences above, I would expect the difference between
the largest freedisk (test-43 currently) and the smallest freedisk (test-45
currently) to be smaller than what is below. Below is the output from
reading the diagnostics endpoint of the autoscaling API. According to this
output, the variation between freedisk values is currently as large as 220GiB.
I am concerned because I cannot tell whether the variation is expected or
due to a configuration error. Also, it would be great to keep track of a
single pool of disk space, rather than six separate ones, if possible.
What policy/preferences options would you suggest exploring, specifically
for evening out freedisk across the Solr nodes?
{
  "responseHeader":{
    "status":0,
    "QTime":284},
  "diagnostics":{
    "sortedNodes":[
      {"node":"test-43:8983_solr",
       "cores":137,
       "freedisk":447.0913887023926,
       "sysLoadAvg":117.0},
      {"node":"test-42:8983_solr",
       "cores":137,
       "freedisk":369.33697509765625,
       "sysLoadAvg":93.0},
      {"node":"test-46:8983_solr",
       "cores":137,
       "freedisk":361.7615737915039,
       "sysLoadAvg":93.0},
      {"node":"test-41:8983_solr",
       "cores":137,
       "freedisk":347.91234970092773,
       "sysLoadAvg":86.0},
      {"node":"test-44:8983_solr",
       "cores":137,
       "freedisk":341.1301383972168,
       "sysLoadAvg":160.0},
      {"node":"test-45:8983_solr",
       "cores":137,
       "freedisk":227.17399215698242,
       "sysLoadAvg":118.0}],
    "violations":[]},
  "WARNING":"This response format is experimental. It is likely to change
in the future."}
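The ~220GiB gap can be read straight off sortedNodes; a small sketch that
computes the spread from the freedisk values in the diagnostics output above:

```python
import json

# freedisk values (GiB) copied from the diagnostics output above
diag = json.loads("""
{"sortedNodes": [
  {"node": "test-43:8983_solr", "freedisk": 447.0913887023926},
  {"node": "test-42:8983_solr", "freedisk": 369.33697509765625},
  {"node": "test-46:8983_solr", "freedisk": 361.7615737915039},
  {"node": "test-41:8983_solr", "freedisk": 347.91234970092773},
  {"node": "test-44:8983_solr", "freedisk": 341.1301383972168},
  {"node": "test-45:8983_solr", "freedisk": 227.17399215698242}]}
""")
free = {n["node"]: n["freedisk"] for n in diag["sortedNodes"]}
spread = max(free.values()) - min(free.values())
print(round(spread, 2))  # prints 219.92, the gap between test-43 and test-45
```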