Posted to solr-user@lucene.apache.org by Toke Eskildsen <te...@statsbiblioteket.dk> on 2014/11/29 19:30:29 UTC

Standardized index metrics (Was: Constantly high disk read access (40-60M/s))

Michael Sokolov [msokolov@safaribooksonline.com] wrote:
> I wonder if there's any value in providing this metric (total index size
> - stored field size - term vector size) as part of the admin panel?  Is
> it meaningful?  It seems like there would be a lot of cases where it
> could give a good rule of thumb for memory sizing, and it would save
> having to root around in the index folder.
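The metric Michael describes can be approximated straight from the segment
files on disk. A minimal sketch in Java, assuming the standard Lucene codec
file extensions (.fdt/.fdx for stored fields, .tvd/.tvx/.tvf for term
vectors); note that segments packed into compound files (.cfs) hide this
per-type breakdown:

import java.io.File;

// Rough estimate of the "RAM-relevant" part of a Lucene index:
// total size minus stored fields and term vectors, which are usually
// read per-document rather than kept hot in the page cache.
public class IndexSizeEstimate {
    public static void main(String[] args) {
        File indexDir = new File(args[0]); // e.g. .../data/index
        long total = 0, stored = 0, vectors = 0;
        for (File f : indexDir.listFiles()) {
            if (!f.isFile()) continue;
            String name = f.getName();
            total += f.length();
            if (name.endsWith(".fdt") || name.endsWith(".fdx")) {
                stored += f.length();                      // stored fields
            } else if (name.endsWith(".tvd") || name.endsWith(".tvx")
                    || name.endsWith(".tvf")) {
                vectors += f.length();                     // term vectors
            }
        }
        System.out.printf("total=%dMB stored=%dMB vectors=%dMB remainder=%dMB%n",
                total >> 20, stored >> 20, vectors >> 20,
                (total - stored - vectors) >> 20);
    }
}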

At Lucene/Solr Revolution, I talked with Alexandre Rafalovitch about this. We know (https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/) that we cannot get the full picture of an index, but it is a weekly occurrence on this mailing list that people ask questions where it would help to have a gist of the index metrics and how the index is used.

Some sort of "copy the content of this concentrated metrics box when you need to talk with other people about your index" functionality in the admin panel might help with this. To get an idea of usage, it could also contain a few fields to fill in by hand, such as "peak queries per second" or "typical queries".
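As a sketch of what such a box could pull together automatically, here is a
hedged example that fetches basic index stats from the Luke request handler;
the host and core name are placeholders for a stock Solr setup:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Prints a paste-able summary: index stats fetched from the Luke
// handler, plus the usage fields that only a human can fill in.
public class MetricsBox {
    public static void main(String[] args) throws Exception {
        // Placeholder host/core; point this at your own instance.
        URL luke = new URL(
            "http://localhost:8983/solr/collection1/admin/luke?numTerms=0&wt=json");
        System.out.println("=== index metrics ===");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(luke.openStream(), "UTF-8"))) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // numDocs, deletedDocs, segment info, ...
            }
        }
        // Usage fields that cannot be derived from the index itself:
        System.out.println("peak queries per second: <fill in>");
        System.out.println("typical queries: <fill in>");
    }
}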

- Toke Eskildsen

Re: Standardized index metrics (Was: Constantly high disk read access (40-60M/s))

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Sat, Nov 29, 2014 at 2:27 PM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:

> On 11/29/14 1:30 PM, Toke Eskildsen wrote:
>
>> Michael Sokolov [msokolov@safaribooksonline.com] wrote:
>>
>>> I wonder if there's any value in providing this metric (total index size
>>> - stored field size - term vector size) as part of the admin panel?  Is
>>> it meaningful?  It seems like there would be a lot of cases where it
>>> could give a good rule of thumb for memory sizing, and it would save
>>> having to root around in the index folder.
>>>
>> At Lucene/Solr Revolution, I talked with Alexandre Rafalovitch about
>> this. We know (https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/)
>> that we cannot get the full picture of an index, but it is a weekly
>> occurrence on this mailing list that people ask questions where it would
>> help to have a gist of the index metrics and how the index is used.
>>
>> Some sort of "copy the content of this concentrated metrics box when you
>> need to talk with other people about your index" functionality in the admin
>> panel might help with this. To get an idea of usage, it could also contain
>> a few fields to fill in by hand, such as "peak queries per second" or
>> "typical queries".
>>
>> - Toke Eskildsen
>>
> Yes - the cautions about the need for prototyping are all very well, but
> even if you take that advice and build a prototype, it's not clear how to
> tell whether your setup has enough memory or not. You can add more memory
> and measure response times, but even then you only have a gross measurement
> and no way of knowing where, in detail, the memory is being used. Also, you
> might be able to improve your system to make better use of memory with more
> precise information. It seems like we ought to be able to monitor a running
> system, observe its memory requirements over time, and report on those.
>

+1 to that!
I haven't been following this aspect of development super closely, but I
believe there are memory/size estimators for various things at the Lucene
level that Elasticsearch nicely exposes via its stats API. I don't know the
specifics around those estimators without digging in; otherwise I'd open a
JIRA, because I think this is valuable information. At Sematext we regularly
deal with hardware sizing, memory/CPU usage estimates, and so on, so the
more of this info is surfaced, the easier it will be for people to work
with Solr.
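For reference, a minimal sketch of the kind of estimator meant here, assuming
Lucene 5.x-style APIs: segment readers implement Accountable, so their heap
footprint can be read off directly:

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.RamUsageEstimator;

// Reports the heap held by each open segment reader of an index.
public class SegmentRamReport {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader =
                DirectoryReader.open(FSDirectory.open(Paths.get(args[0])))) {
            long totalHeap = 0;
            for (LeafReaderContext ctx : reader.leaves()) {
                if (ctx.reader() instanceof Accountable) {
                    long bytes = ((Accountable) ctx.reader()).ramBytesUsed();
                    totalHeap += bytes;
                    System.out.println(ctx.reader() + " -> "
                            + RamUsageEstimator.humanReadableUnits(bytes));
                }
            }
            System.out.println("heap held by segment readers: "
                    + RamUsageEstimator.humanReadableUnits(totalHeap));
        }
    }
}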

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

Re: Standardized index metrics (Was: Constantly high disk read access (40-60M/s))

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
On 11/29/14 1:30 PM, Toke Eskildsen wrote:
> Michael Sokolov [msokolov@safaribooksonline.com] wrote:
>> I wonder if there's any value in providing this metric (total index size
>> - stored field size - term vector size) as part of the admin panel?  Is
>> it meaningful?  It seems like there would be a lot of cases where it
>> could give a good rule of thumb for memory sizing, and it would save
>> having to root around in the index folder.
> At Lucene/Solr Revolution, I talked with Alexandre Rafalovitch about this. We know (https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/) that we cannot get the full picture of an index, but it is a weekly occurrence on this mailing list that people ask questions where it would help to have a gist of the index metrics and how the index is used.
>
> Some sort of "copy the content of this concentrated metrics box when you need to talk with other people about your index" functionality in the admin panel might help with this. To get an idea of usage, it could also contain a few fields to fill in by hand, such as "peak queries per second" or "typical queries".
>
> - Toke Eskildsen
Yes - the cautions about the need for prototyping are all very well, but 
even if you take that advice and build a prototype, it's not clear how 
to tell whether your setup has enough memory or not. You can add more 
memory and measure response times, but even then you only have a gross 
measurement and no way of knowing where, in detail, the memory is being 
used. Also, you might be able to improve your system to make better use 
of memory with more precise information. It seems like we ought to be 
able to monitor a running system, observe its memory requirements over 
time, and report on those.

-Mike
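A minimal sketch of that kind of monitoring, using only the standard JVM
management beans (run it inside the Solr JVM, or adapt it to poll the same
beans over JMX); the "mapped" buffer pool is where MMapDirectory-backed
index files show up:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Samples heap and buffer-pool usage once a minute and prints it,
// so memory requirements can be observed over time.
public class MemoryWatcher {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap: %d/%d MB%n",
                    heap.getUsed() >> 20, heap.getMax() >> 20);
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                // Pools are typically "direct" and "mapped".
                System.out.printf("%s: %d MB in %d buffers%n",
                        pool.getName(), pool.getMemoryUsed() >> 20, pool.getCount());
            }
            Thread.sleep(60_000); // sample once a minute
        }
    }
}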