Posted to solr-user@lucene.apache.org by ca...@libero.it on 2017/07/06 18:59:18 UTC

Max documents per shard (including deleted documents)

Hi,

I'm working on an application that indexes CDRs (Call Detail Records) into SolrCloud, with 1 collection and 3 shards.

Every day the application indexes 30 million CDRs.

I have a purge application that deletes records older than 10 days and then calls OPTIMIZE, so the collection keeps only about 300 million CDRs.
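
For reference, the purge is essentially a delete-by-query; a minimal SolrJ (6.x-style) sketch of it might look like this (the collection name, date field, and ZooKeeper addresses are made up):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class NightlyPurge {
        public static void main(String[] args) throws Exception {
            // ZooKeeper ensemble, collection name and date field are made up.
            try (CloudSolrClient solr = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
                solr.setDefaultCollection("cdr");
                // Delete every CDR older than 10 days; ordinary segment
                // merging reclaims the space afterwards.
                solr.deleteByQuery("call_ts:[* TO NOW-10DAYS]");
                solr.commit();
            }
        }
    }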

Do you know if there is a limit on the maximum number of documents per shard, including deleted documents?

I read in some blogs that there is a limit of 2 billion documents per shard, including deleted documents; that is, even if the collection is currently empty, if I have already indexed 6 billion CDRs in it (2 billion on each of the 3 shards), I'll get an error. Is that true? Do I have to recreate the collection?

I do see that when I delete records, Apache Solr frees space on disk.

Thanks.

Agostino


 

Re: Max documents per shard (including deleted documents)

Posted by Walter Underwood <wu...@wunderwood.org>.
The deleted records will be automatically cleaned up in the background. You don’t have to do anything.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re[2]: Re[2]: Re[2]: Max documents per shard (including deleted documents)

Posted by ca...@libero.it.
Sorry, I know that the limit is per shard and not per collection. My doubt is: if every day I insert 10M documents into a shard and delete 10M documents (the old ones), do I have to add a new shard after 20 days or not? The number of undeleted documents is always the same (100M, for example).
Thanks.
Agos.

Re: Re[2]: Re[2]: Max documents per shard (including deleted documents)

Posted by Erick Erickson <er...@gmail.com>.
You seem to be confusing shards with collections.

You can have 100 shards each with 100M documents for a total of 10B
documents in the _collection_, but no individual shard has more than
100M docs.

Best,
Erick


Re[2]: Re[2]: Max documents per shard (including deleted documents)

Posted by ca...@libero.it.
OK. I will never have more than 100 million documents per shard at any one time, because I delete old documents every night to keep only the last 10 days. What I don't understand is whether I have to add shards after months of indexing (inserts plus deletes can reach 2B after a few months) or whether I can leave the same shards forever.

Re: Re[2]: Max documents per shard (including deleted documents)

Posted by Erick Erickson <er...@gmail.com>.
Stop: 2 billion is _per shard_, not per collection. You'll probably
never have that many in practice, as the search performance would be
pretty iffy. Every filterCache entry would occupy up to 0.25 GB, for
instance. So don't expect to fit 2B docs per shard unless you've
tested the heck out of it and are doing totally simple searches.
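
Where that 0.25 GB figure comes from: in the worst case a cached filter is a bitset with one bit per document in the index, deleted docs included. A quick back-of-the-envelope sketch (a rough model, not Solr's exact bookkeeping):

    public class FilterCacheSizing {
        public static void main(String[] args) {
            // Worst case: one bit per doc in the shard, deleted docs included.
            long maxDocPerShard = 2_000_000_000L;      // the per-shard ceiling
            long bytesPerEntry  = maxDocPerShard / 8;  // 8 bits per byte
            System.out.printf("~%.2f GB per cached filter%n",
                    bytesPerEntry / (1024.0 * 1024 * 1024)); // ~0.23 GB
        }
    }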

I've seen between 10M and 300M docs on a shard give reasonable
performance. I've never seen 1B docs on a single shard work well in
production. It's possible, but I sure wouldn't plan on it.

You have to test to see what _your_ data and _your_ query patterns
allow. See: https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick


Re[2]: Max documents per shard (including deleted documents)

Posted by ca...@libero.it.
Thanks, Erick. I used implicit shards. So the right maintenance could be: add other shards after a period of time, change the rule that fills the partition field in the collection, and drop the old shards when they are empty. Is that right? How can I see when the 2 billion records limit is reached? Is there an API?
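
In SolrJ terms, that rotation could be sketched roughly like this (only valid for collections created with router=implicit; the collection, shard, and ZooKeeper names are all made up):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class RotateShards {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient cloud = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
                // Add a fresh shard for the next time window...
                CollectionAdminRequest.createShard("cdr", "cdr_2017_07_08")
                        .process(cloud);
                // ...and drop the oldest shard once its documents have expired.
                CollectionAdminRequest.deleteShard("cdr", "cdr_2017_06_28")
                        .process(cloud);
            }
        }
    }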

Re: Max documents per shard (including deleted documents)

Posted by Erick Erickson <er...@gmail.com>.
Right, every individual shard is limited to 2B records, and that does
include deleted docs. But I've never seen a shard (a Lucene index,
actually) perform satisfactorily at that scale, so while this is the
hard limit, people usually add shards long before reaching it.
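
If you want to watch where a shard stands relative to that limit, the Luke request handler reports numDocs and maxDoc per core; a small SolrJ sketch (the core URL is made up):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.LukeRequest;
    import org.apache.solr.client.solrj.response.LukeResponse;

    public class ShardDocCount {
        public static void main(String[] args) throws Exception {
            // Point at one core (replica) of the shard to inspect.
            try (HttpSolrClient core = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/cdr_shard1_replica1").build()) {
                LukeRequest luke = new LukeRequest();
                luke.setNumTerms(0);            // index totals only, no field stats
                LukeResponse rsp = luke.process(core);
                int numDocs = rsp.getNumDocs(); // live documents
                int maxDoc  = rsp.getMaxDoc();  // live + deleted-but-not-yet-merged
                System.out.printf("numDocs=%d maxDoc=%d deleted=%d%n",
                        numDocs, maxDoc, maxDoc - numDocs);
            }
        }
    }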

There is no technical reason to optimize every time; normal segment
merging will eventually remove the data associated with deleted
documents. You'll carry forward a number of deleted docs, but I
usually see it stabilize around 10%-15%.

You don't necessarily have to re-index; you can split existing shards.
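
For a compositeId-routed collection, a split can be done through the Collections API; a SolrJ sketch (collection, shard, and ZooKeeper names are made up):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class SplitBusyShard {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient cloud = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
                // SPLITSHARD cuts shard1 into two sub-shards on the same
                // nodes; move the new replicas elsewhere afterwards if needed.
                CollectionAdminRequest.splitShard("cdr")
                        .setShardName("shard1")
                        .process(cloud);
            }
        }
    }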

But from your e-mail, it looks like you think you have to do something
explicit to reclaim the resources associated with deleted documents.
You do not have to do this. Optimize is really a special heavyweight
merge. Normal merging happens when you do a commit and that process
also reclaims the deleted document resources.

Best,
Erick
