Posted to users@solr.apache.org by "Natarajan, Rajeswari" <ra...@sap.com> on 2021/04/05 04:59:41 UTC

Index Size of a tenant

Hi,

We plan to store multiple tenants in a single collection (multiple shards) using the compositeId router, with the tenant ID as the docId prefix.
In this setup, how can a tenant’s index size be found? The Solr Metrics API gives the core’s index size, but multiple tenants might be present in the same core.
We would like to know if there is any out-of-the-box Solr API available for this case.


Thanks,
Rajeswari
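For reference, the per-core figure mentioned above can be read from the Metrics API. This is a minimal sketch with several assumptions. It assumes a Solr 8.x-style JSON response from `/solr/admin/metrics?group=core&prefix=INDEX.sizeInBytes`. It also assumes SolrCloud core registry names of the form `solr.core.<collection>.<shard>.<replica>`. The base URL and collection name are placeholders. The sketch keeps one replica per shard, so the result approximates one copy of the collection on disk. It still cannot be split by tenant.

```python
import json
import urllib.parse
import urllib.request

CORE_PREFIX = "solr.core."


def fetch_core_index_metrics(base_url):
    """Fetch INDEX.sizeInBytes for every core hosted on one Solr node."""
    params = urllib.parse.urlencode(
        {"group": "core", "prefix": "INDEX.sizeInBytes", "wt": "json"}
    )
    with urllib.request.urlopen(f"{base_url}/admin/metrics?{params}") as resp:
        return json.load(resp)


def collection_index_bytes(metrics_json, collection):
    """Sum index bytes for one collection, counting one replica per shard.

    Assumes registry names like 'solr.core.<collection>.<shard>.<replica>'.
    Replicas of a shard hold roughly the same data, so only the largest
    replica of each shard is counted instead of adding every copy.
    """
    per_shard = {}
    for registry, values in metrics_json.get("metrics", {}).items():
        if not registry.startswith(CORE_PREFIX):
            continue
        parts = registry[len(CORE_PREFIX):].rsplit(".", 2)
        if len(parts) != 3 or parts[0] != collection:
            continue
        shard = parts[1]
        size = values.get("INDEX.sizeInBytes", 0)
        per_shard[shard] = max(per_shard.get(shard, 0), size)
    return sum(per_shard.values())
```

Each node reports only the cores it hosts. In a multi-node cluster you would therefore merge the `metrics` dictionaries from every node before calling `collection_index_bytes`.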

Re: Index Size of a tenant

Posted by "Natarajan, Rajeswari" <ra...@sap.com>.
Thanks much for your reply.
Thanks,
Rajeswari

On 4/7/21, 1:16 PM, "Shawn Heisey" <ap...@elyograg.org> wrote:

    There isn't any way to do that.  The way that Lucene's indexes are 
    designed, obtaining that information is currently impossible, and it 
    would likely take a VERY large amount of development effort to make it 
    possible.  I would guess that even if it were possible, obtaining that 
    information would be very expensive in terms of system resources and time.

    The best you can do with current technology is estimate the size based 
    on document count compared to the whole index.  But if each tenant has 
    very different kinds of data in the index, that method would probably 
    give you inaccurate information.

    If you want each tenant to have its own collection, one thing you could do 
    is set up multiple SolrCloud installs, which can share one ZooKeeper 
    ensemble by using a different chroot for each one, and only put a few 
    hundred collections in each cloud.  This would probably require a lot of 
    additional hardware, and because of Lucene's economies of scale that 
    Walter was talking about, multiple collections WILL be larger on disk 
    than multiple tenants in one collection.

    Thanks,
    Shawn


Re: Index Size of a tenant

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/7/2021 1:41 PM, Natarajan, Rajeswari wrote:
> If there is any way to get the size of a tenant's index in a collection where multiple tenants co-exist under the compositeId router scheme, please let me know.
> We need some way to track each tenant's index size to see if it grows too big; in our case, document count is not proportional to index size.

There isn't any way to do that.  The way that Lucene's indexes are 
designed, obtaining that information is currently impossible, and it 
would likely take a VERY large amount of development effort to make it 
possible.  I would guess that even if it were possible, obtaining that 
information would be very expensive in terms of system resources and time.

The best you can do with current technology is estimate the size based 
on document count compared to the whole index.  But if each tenant has 
very different kinds of data in the index, that method would probably 
give you inaccurate information.

If you want each tenant to have its own collection, one thing you could do 
is set up multiple SolrCloud installs, which can share one ZooKeeper 
ensemble by using a different chroot for each one, and only put a few 
hundred collections in each cloud.  This would probably require a lot of 
additional hardware, and because of Lucene's economies of scale that 
Walter was talking about, multiple collections WILL be larger on disk 
than multiple tenants in one collection.

Thanks,
Shawn
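Shawn's multi-cloud layout can be illustrated with a small planning helper. This is a hypothetical sketch. The ensemble addresses, the `/solr-cloud-N` chroot naming and the 300-collection cap are all assumptions; the cap stands in for "a few hundred". The only Solr convention it relies on is that a chroot is appended to the ZooKeeper host list, as in `zk1:2181,zk2:2181,zk3:2181/solr-cloud-1`.

```python
from math import ceil

# Hypothetical shared ensemble and per-cloud cap ("a few hundred").
ZK_ENSEMBLE = "zk1:2181,zk2:2181,zk3:2181"
MAX_COLLECTIONS_PER_CLOUD = 300


def zk_host_for_cloud(index, ensemble=ZK_ENSEMBLE):
    """zkHost string for the Nth cloud: the shared ensemble plus its own chroot."""
    return f"{ensemble}/solr-cloud-{index + 1}"


def assign_tenants(tenant_ids, cap=MAX_COLLECTIONS_PER_CLOUD):
    """Map each tenant's collection to a cloud, filling clouds in order."""
    return {
        tenant: zk_host_for_cloud(i // cap)
        for i, tenant in enumerate(tenant_ids)
    }


def clouds_needed(num_tenants, cap=MAX_COLLECTIONS_PER_CLOUD):
    """Number of separate clouds required to stay under the cap."""
    return ceil(num_tenants / cap)
```

Filling clouds sequentially keeps the plan simple. A real rollout would more likely rebalance by load or by disk usage rather than by tenant count alone.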

Re: Index Size of a tenant

Posted by "Natarajan, Rajeswari" <ra...@sap.com>.
If there is any way to get the size of a tenant's index in a collection where multiple tenants co-exist under the compositeId router scheme, please let me know.
We need some way to track each tenant's index size to see if it grows too big; in our case, document count is not proportional to index size.

Thanks,
Rajeswari
 

On 4/5/21, 1:52 PM, "Natarajan, Rajeswari" <ra...@sap.com> wrote:

    Thanks for your reply. We are looking for a strategy for adding tenants to a collection. Initially we thought we would go by the
    number of documents. But we saw that some tenants have fewer docs yet a larger index size than tenants with
    more documents, meaning the number of docs and the index size are not proportional. So we are looking to see whether any way exists to
    get the size of a tenant's index.

    Thanks,
    Rajeswari


Re: Index Size of a tenant

Posted by "Natarajan, Rajeswari" <ra...@sap.com>.
Thanks for your reply. We are looking for a strategy for adding tenants to a collection. Initially we thought we would go by the
number of documents. But we saw that some tenants have fewer docs yet a larger index size than tenants with
more documents, meaning the number of docs and the index size are not proportional. So we are looking to see whether any way exists to
get the size of a tenant's index.

Thanks,
Rajeswari

On 4/5/21, 1:35 PM, "Walter Underwood" <wu...@wunderwood.org> wrote:

    Some index structures are statistics of the entire index, so they don’t belong to one part of it.

    So the number you are asking for doesn’t exist. Lucene indexes don’t work like that. If you
    made an index with the documents from one tenant, it would not be the same size as the
    fraction of a shared index.

    Your best approach is to get the entire disk usage and assign a portion of that to each tenant in
    proportion to its share of the docs.

    But to back up one step, what are you doing with that information? Disk space is not a useful
    or stable metric for indexes. It varies with the number of deleted documents, changes during 
    and after merges, and you need extra unused disk space for Solr to function. That unused space
    must be dedicated to Solr, so should be counted even though it doesn’t have index files on it
    right now. SolrCloud needs transaction logs even though those aren’t officially part of the index.

    All of that means that there is no API for one tenant’s part of the disk space and there won’t be
    an API for it. The question doesn’t make sense for a Solr system.

    wunder
    Walter Underwood
    wunder@wunderwood.org
    http://observer.wunderwood.org/  (my blog)


Re: Index Size of a tenant

Posted by Walter Underwood <wu...@wunderwood.org>.
Some index structures are statistics of the entire index, so they don’t belong to one part of it.

So the number you are asking for doesn’t exist. Lucene indexes don’t work like that. If you
made an index with the documents from one tenant, it would not be the same size as the
fraction of a shared index.

Your best approach is to get the entire disk usage and assign a portion of that to each tenant in
proportion to its share of the docs.
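Walter's proportional estimate can be computed for every tenant in one pass. This is a minimal sketch. It assumes a `tenant_id` field, as in his earlier suggestion, and a facet query such as `q=*:*&rows=0&facet=true&facet.field=tenant_id&facet.limit=-1`. The default JSON response to that query lists facet values as a flat `[value, count, value, count, ...]` array. The total byte count would come from the disk usage or the Metrics API. As noted elsewhere in this thread, the result is only an estimate and will be off when tenants' documents differ in shape.

```python
def facet_counts(select_json, field="tenant_id"):
    """Turn Solr's flat facet list [v1, c1, v2, c2, ...] into {value: count}."""
    flat = select_json["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def estimate_tenant_bytes(total_index_bytes, counts):
    """Split total_index_bytes across tenants in proportion to their doc counts."""
    total_docs = sum(counts.values())
    if total_docs == 0:
        return {tenant: 0 for tenant in counts}
    return {
        tenant: total_index_bytes * docs // total_docs
        for tenant, docs in counts.items()
    }
```

Integer division keeps the estimates in whole bytes. The per-tenant values may sum to slightly less than the total because of rounding.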

But to back up one step, what are you doing with that information? Disk space is not a useful
or stable metric for indexes. It varies with the number of deleted documents, changes during 
and after merges, and you need extra unused disk space for Solr to function. That unused space
must be dedicated to Solr, so should be counted even though it doesn’t have index files on it
right now. SolrCloud needs transaction logs even though those aren’t officially part of the index.

All of that means that there is no API for one tenant’s part of the disk space and there won’t be
an API for it. The question doesn’t make sense for a Solr system.
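Walter's point that disk usage varies with deleted documents can be checked per core. This is a minimal sketch. It assumes the core-level metrics `SEARCHER.searcher.deletedDocs` and `SEARCHER.searcher.maxDoc` as exposed by the Metrics API, for example with `group=core&prefix=SEARCHER.searcher`. A high fraction means a noticeable share of the core's disk usage belongs to deleted documents that only later merges will reclaim.

```python
def deleted_fraction(core_metrics):
    """Share of a core's documents that are deleted but not yet merged away.

    Assumes the core's metrics dict carries 'SEARCHER.searcher.deletedDocs'
    and 'SEARCHER.searcher.maxDoc'; returns 0.0 for an empty core.
    """
    max_doc = core_metrics.get("SEARCHER.searcher.maxDoc", 0)
    if max_doc == 0:
        return 0.0
    return core_metrics.get("SEARCHER.searcher.deletedDocs", 0) / max_doc
```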

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 5, 2021, at 1:17 PM, Natarajan, Rajeswari <ra...@sap.com> wrote:
> 
> I guess you mean the number of documents, not the size of the index on disk. We are looking for the size of the index on disk.
> 
> Thanks,
> Rajeswari

Re: Index Size of a tenant

Posted by "Natarajan, Rajeswari" <ra...@sap.com>.
I guess you mean the number of documents, not the size of the index on disk. We are looking for the size of the index on disk.

Thanks,
Rajeswari

On 4/5/21, 10:32 AM, "Walter Underwood" <wu...@wunderwood.org> wrote:

    Assuming each tenant has an ID, you can get the size by searching for tenant_id:1234 and requesting zero rows. We do that for metrics for different document types in the same collection.

    wunder
    Walter Underwood
    wunder@wunderwood.org
    http://observer.wunderwood.org/  (my blog)


Re: Index Size of a tenant

Posted by Walter Underwood <wu...@wunderwood.org>.
Assuming each tenant has an ID, you can get the size by searching for tenant_id:1234 and requesting zero rows. We do that for metrics for different document types in the same collection.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
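Walter's count query can be sketched in Python. The collection name `tenants` and the field `tenant_id` are assumptions. `rows=0` asks Solr to return no documents, so only `response.numFound` matters. The optional `_route_` parameter assumes document IDs carry a `<tenant_id>!` compositeId prefix; it limits the request to the shard(s) holding that tenant. This yields a document count, not bytes on disk, which is exactly the distinction Rajeswari raises in her reply.

```python
import json
import urllib.parse
import urllib.request


def count_query_url(base_url, collection, tenant_id):
    """Build a rows=0 select URL that counts one tenant's documents.

    Assumes a 'tenant_id' field and compositeId ids prefixed '<tenant_id>!'.
    """
    params = urllib.parse.urlencode({
        "q": f"tenant_id:{tenant_id}",
        "rows": 0,
        "_route_": f"{tenant_id}!",
        "wt": "json",
    })
    return f"{base_url}/{collection}/select?{params}"


def num_found(select_json):
    """Extract the hit count from a standard select response."""
    return select_json["response"]["numFound"]


def tenant_doc_count(base_url, collection, tenant_id):
    """Run the count query against a live Solr (not exercised in tests)."""
    with urllib.request.urlopen(count_query_url(base_url, collection, tenant_id)) as resp:
        return num_found(json.load(resp))
```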

> On Apr 5, 2021, at 10:02 AM, Natarajan, Rajeswari <ra...@sap.com> wrote:
> 
> Yes, that's correct.
> 
> Thanks,
> Rajeswari

Re: Index Size of a tenant

Posted by "Natarajan, Rajeswari" <ra...@sap.com>.
Yes, that's correct.

Thanks,
Rajeswari

On 4/5/21, 6:21 AM, "Jan Høydahl" <ja...@cominvent.com> wrote:

    Why not the obvious design choice of one collection per tenant? Are you afraid of Solr not handling a large number of collections?

    Jan


Re: Index Size of a tenant

Posted by Jan Høydahl <ja...@cominvent.com>.
Why not the obvious design choice of one collection per tenant? Are you afraid of Solr not handling a large number of collections?

Jan

> On 5 Apr 2021, at 06:59, Natarajan, Rajeswari <ra...@sap.com> wrote:
> 
> Hi,
> 
> We plan to store multiple tenants in a single collection (multiple shards) using the compositeId router, with the tenant ID as the docId prefix.
> In this setup, how can a tenant’s index size be found? The Solr Metrics API gives the core’s index size, but multiple tenants might be present in the same core.
> We would like to know if there is any out-of-the-box Solr API available for this case.
> 
> 
> Thanks,
> Rajeswari
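Jan's one-collection-per-tenant design would mean creating a collection when a tenant is onboarded. This is a minimal sketch using the Collections API `CREATE` action. The naming scheme `tenant_<id>`, the configset name `tenant_config` and the shard and replica counts are assumptions. With that layout, each tenant's disk usage is simply the index size of its own collection, for example as reported by the Metrics API.

```python
import urllib.parse


def create_collection_url(base_url, tenant_id, config_name="tenant_config",
                          num_shards=1, replication_factor=2):
    """Collections API CREATE call for a hypothetical per-tenant collection."""
    params = urllib.parse.urlencode({
        "action": "CREATE",
        "name": f"tenant_{tenant_id}",
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
        "wt": "json",
    })
    return f"{base_url}/admin/collections?{params}"
```

Sending the request, for example with `urllib.request.urlopen(create_collection_url("http://localhost:8983/solr", "1234"))`, is left to the caller. That keeps URL construction separate from cluster side effects.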