Posted to solr-user@lucene.apache.org by Pushkar Mishra <pu...@gmail.com> on 2020/12/01 14:46:52 UTC

Re: Need help to configure automated deletion of shard in solr

Hi Team,
As I have explained the use case, can someone help me find a
configuration-based way to delete the shard?
A quick response will be greatly appreciated.

Regards
Pushkar


On Mon, Nov 30, 2020 at 11:32 PM Pushkar Mishra <pu...@gmail.com>
wrote:

>
>
> On Mon, Nov 30, 2020, 9:15 PM Pushkar Mishra <pu...@gmail.com>
> wrote:
>
>> Hi Erick,
>> First of all, thanks for your response. I will check the possibility.
>> Let me explain my problem in detail:
>>
>> 1. We have other use cases where we use a listener on postCommit to
>> delete/shift/split the shards, so we have the capability to delete
>> shards.
>> 2. In the current use case, we have to delete documents from the
>> shard, and if during the deletion process (a scheduled process, maybe
>> hourly or daily) a shard becomes empty (or, let's say, only a nominal
>> number of documents is left), then delete the shard. I am exploring
>> how to do this using configuration.
>>
> 3. Also, it will certainly not be a live shard, as only documents whose
> TTL has expired are deleted. The TTL could be a month or a year.
>
> Please assist if you have any config-based idea for this.
>
>> Regards
>> Pushkar
>>
>> On Mon, Nov 30, 2020, 8:48 PM Erick Erickson <er...@gmail.com>
>> wrote:
>>
>>> Are you using the implicit router? Otherwise you cannot delete a shard.
>>> And you won’t have any shards that have zero documents anyway.
>>>
>>> It’d be a little convoluted, but you could use the Collections API
>>> COLSTATUS command to find the names of all your replicas. Then query
>>> _one_ replica of each shard with something like
>>> solr/collection1_shard1_replica_n1/select?q=*:*&distrib=false
>>>
>>> that’ll return the number of live docs (i.e. non-deleted docs) and if
>>> it’s zero you can delete the shard.
>>>
>>> But the implicit router requires you take complete control of where
>>> documents go, i.e. which shard they land on.
>>>
>>> This really sounds like an XY problem. What’s the use case you’re trying
>>> to support where you expect a shard’s number of live docs to drop to
>>> zero?
>>>
>>> Best,
>>> Erick
>>>
>>> > On Nov 30, 2020, at 4:57 AM, Pushkar Mishra <pu...@gmail.com>
>>> wrote:
>>> >
>>> > Hi Solr team,
>>> >
>>> > I am using SolrCloud (version 8.5.x). I need to find a configuration
>>> > that lets me delete a shard when the number of documents in the
>>> > shard reaches zero. Can someone help me achieve that?
>>> >
>>> >
>>> > It is urgent, so a quick response will be highly appreciated.
>>> >
>>> > Thanks
>>> > Pushkar
>>> >
>>> > --
>>> > Pushkar Kumar Mishra
>>> > "Reactions are always instinctive whereas responses are always well
>>> > thought of... So start responding rather than reacting in life"
>>>
>>>

-- 
Pushkar Kumar Mishra
"Reactions are always instinctive whereas responses are always well thought
of... So start responding rather than reacting in life"

Re: Need help to configure automated deletion of shard in solr

Posted by Erick Erickson <er...@gmail.com>.
You can certainly use the TTL logic. Not TimeRoutedAlias, but
DocExpirationUpdateProcessorFactory. DocExpirationUpdateProcessorFactory
operates on each document individually so you can mix-n-match
if you want.

As for knowing when a shard is empty, I suggested a method for that
in one of the earlier e-mails.
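
A minimal sketch of that check, using only the Python standard library. It assumes each replica's /select handler returns JSON with a response.numFound value. The helper names are made up for illustration, and the base URL, collection and core names are placeholders. The HTTP calls themselves are left out, so this only builds the URLs and interprets the responses.

```python
import json
from urllib.parse import urlencode


def count_url(base_url, core):
    """URL that counts live docs on one replica only (distrib=false)."""
    params = urlencode({"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


def live_docs(select_response_text):
    """Extract numFound from the JSON body of a /select response."""
    return json.loads(select_response_text)["response"]["numFound"]


def delete_shard_url(base_url, collection, shard):
    """Collections API DELETESHARD call (per the thread: implicit router only)."""
    params = urlencode(
        {"action": "DELETESHARD", "collection": collection, "shard": shard}
    )
    return f"{base_url}/admin/collections?{params}"


def shards_to_delete(counts_by_shard, threshold=0):
    """Return the shards whose live doc count is at or below the threshold."""
    return sorted(s for s, n in counts_by_shard.items() if n <= threshold)
```

A scheduled job could call count_url for one replica per shard, feed each response body to live_docs, and issue delete_shard_url only for the names returned by shards_to_delete. The threshold parameter covers the "nominal documents left" case mentioned earlier in the thread.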

If you have a collection per customer, and assuming that a customer
has the same retention policy for all docs, then TimeRoutedAlias would
work.

Best,
Erick

> On Dec 2, 2020, at 12:19 AM, Pushkar Mishra <pu...@gmail.com> wrote:
> 
> Hi Erick,
> It is implicit.
> I have explored the TTL option, but due to some complications we can't use it.
> Let me explain the actual use case.
> 
> We have limited space, so we can't keep storing documents indefinitely.
> Based on the customer's retention policy, I need to delete the
> documents. And if any shard becomes empty in this process, I need to
> delete the shard as well.
> 
> So, let's say: is there a way to know when Solr has finished purging the
> deleted documents? Based on that flag, we could then configure shard deletion.
> 
> Thanks
> Pushkar
> 


Re: Need help to configure automated deletion of shard in solr

Posted by Pushkar Mishra <pu...@gmail.com>.
Hi Erick,
It is implicit.
I have explored the TTL option, but due to some complications we can't use it.
Let me explain the actual use case.

We have limited space, so we can't keep storing documents indefinitely.
Based on the customer's retention policy, I need to delete the
documents. And if any shard becomes empty in this process, I need to
delete the shard as well.

So, let's say: is there a way to know when Solr has finished purging the
deleted documents? Based on that flag, we could then configure shard deletion.

Thanks
Pushkar


-- 
Pushkar Kumar Mishra
"Reactions are always instinctive whereas responses are always well thought
of... So start responding rather than reacting in life"

Re: Need help to configure automated deletion of shard in solr

Posted by Erick Erickson <er...@gmail.com>.
This is still confusing. You haven’t told us what router you are using, 
compositeId or implicit?

If you’re using compositeId (the default), you will never have empty shards
because docs get assigned to shards via a hashing algorithm that distributes
them very evenly across all available shards. You cannot delete any
shard when using compositeId as your routing method.
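
To illustrate why hash-based routing leaves no shard empty, here is a toy sketch. It uses MD5 from the Python standard library as a stand-in hash. Solr's compositeId router uses its own hash function and per-shard hash ranges, so this shows only the general idea, not Solr's actual routing.

```python
import hashlib
from collections import Counter


def toy_shard_for(doc_id, num_shards):
    """Map a document id to a shard index via a hash (stand-in for Solr's router)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def distribution(doc_ids, num_shards):
    """Count how many ids land on each shard."""
    return Counter(toy_shard_for(d, num_shards) for d in doc_ids)


# 10,000 hypothetical ids spread over 4 shards.
counts = distribution((f"doc-{i}" for i in range(10_000)), num_shards=4)
```

Because the shard is chosen by the hash and not by anything about the document's age or owner, deleting expired documents thins every shard roughly equally and does not empty any one of them.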

If you don’t know which router you’re using, then you’re using compositeId.

NOTE: for the rest, “documents” means non-deleted documents. Solr will
take care of purging the deleted documents automatically.

I think you’re making this much more difficult than you need to. Assuming
that the total number of documents remains relatively constant, you can just
use the default compositeId routing, let Solr take care of it all, and not
bother trying to manage shards individually.

If the number of docs increases you might need to use splitshard. But it
sounds like the total number of “live” documents isn’t going to increase.

For TTL, if you have a _fixed_ TTL, i.e. the docs should always expire after,
say, 30 days (which it doesn’t sound like you do), you can use
the “Time Routed Alias” option, see:
https://lucene.apache.org/solr/guide/7_5/time-routed-aliases.html
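
As a rough sketch of what setting up such an alias might look like, the helper below builds a Collections API CREATEALIAS request with a time router. The helper name, base URL, alias name, field name and configset are placeholders, and the parameter values (daily collections, 30-day retention) are assumptions chosen for illustration. Check the parameter names against the guide for your Solr version before relying on them.

```python
from urllib.parse import urlencode


def create_tra_url(base_url, alias, time_field, keep="/DAY-30DAYS"):
    """Build a CREATEALIAS URL for a time routed alias (illustrative values)."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "router.name": "time",
        "router.field": time_field,        # date field used for routing
        "router.start": "NOW/DAY",         # start of the first collection
        "router.interval": "+1DAY",        # one collection per day
        "router.autoDeleteAge": keep,      # drop collections older than this
        "create-collection.collection.configName": "_default",
        "create-collection.numShards": 1,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"
```

With a setup like this, expiry happens by dropping whole time-sliced collections, which is why it only fits a fixed retention period.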

Assuming your TTL isn’t a fixed interval, you can configure
DocExpirationUpdateProcessorFactory to deal with TTL automatically.
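
A small sketch of the indexing side of that approach, assuming the update chain already includes DocExpirationUpdateProcessorFactory with its TTL field set to `_ttl_`. The customer names, retention values and the `customer_s` field are hypothetical. Each document carries its own TTL as a date-math offset, so retention can differ per customer.

```python
import json

# Hypothetical retention policies per customer, as date-math offsets.
RETENTION = {"acme": "+1MONTH", "globex": "+1YEAR"}


def with_ttl(doc, customer, ttl_field="_ttl_"):
    """Return a copy of the doc carrying the customer's TTL."""
    tagged = dict(doc)
    tagged[ttl_field] = RETENTION[customer]
    return tagged


docs = [
    with_ttl({"id": "a-1", "customer_s": "acme"}, "acme"),
    with_ttl({"id": "g-1", "customer_s": "globex"}, "globex"),
]
payload = json.dumps(docs)  # body for a POST to the collection's /update handler
```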

And if you still think you need to handle this, you need to explain exactly
what problem you’re trying to solve because so far it appears that 
you’re simply taking on way more work than you need to.

Best,
Erick
