Posted to user@couchdb.apache.org by Stephan Bardubitzki <st...@bardubitzki.com> on 2012/08/30 17:44:58 UTC

auto delete docs

I have a db with docs that have a field expire_date. I want to auto 
delete those docs when expire_date has passed, but have no clue how to 
do that.

Any advice would be greatly appreciated.

Thanks,
Stephan

Re: auto delete docs

Posted by Matthieu Rakotojaona <ma...@gmail.com>.

I don't think there is an internal way in CouchDB to do this.

I wrote my own approach a long time ago:
https://github.com/rakoo/MultiBin-burner. It's in Erlang, though.

Basically, you build an FSM that fetches all the expire_dates, keeps
the earliest one, and sleeps until that moment. If a new doc arrives
and its expire_date is before the one you are currently watching, it
is set as the new one to watch. When the expire_date arrives, the doc
is burnt and the remaining dates are processed again.
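
The bookkeeping described above could be sketched roughly as follows.
This is not the Erlang code from the repository: it is a Python
illustration that swaps the FSM for a plain min-heap, and the names
ExpiryQueue, add, seconds_until_next and pop_expired are made up for
this example. Fetching docs from CouchDB and deleting them is left out.

```python
import heapq
from datetime import datetime, timedelta, timezone


class ExpiryQueue:
    """Tracks (expire_date, doc_id) pairs and exposes the earliest one."""

    def __init__(self):
        self._heap = []

    def add(self, doc_id, expire_date):
        # A new doc with an earlier date automatically becomes the one
        # being watched, because the heap keeps the minimum on top.
        heapq.heappush(self._heap, (expire_date, doc_id))

    def seconds_until_next(self, now):
        # How long the watcher should sleep; None when nothing is pending.
        if not self._heap:
            return None
        return max(0.0, (self._heap[0][0] - now).total_seconds())

    def pop_expired(self, now):
        # Remove and return the ids of all docs whose date has passed.
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired
```

A watcher loop would sleep for seconds_until_next(now), then delete
whatever pop_expired(now) returns.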

What's missing:
* an actual connection to CouchDB. I planned to use couchbeam_changes
instead of polling.
* the actual burning. I believe couchbeam will give you everything you need.
* cleanup of the code and proper OTP. I did not know much at the time.

Note: it looks like your question was more about "How do I bulk delete?".
I hope I wasn't too off-topic.

-- 
Matthieu RAKOTOJAONA

Re: auto delete docs

Posted by Simon Metson <si...@cloudant.com>.
For completeness: you can limit the number of docs returned (to 100, for example) with ?limit=100 in the query, and keep hitting the view until it returns no rows. That recalculates the view each time, because your data is changing. If you want to page through instead (rather than recalculate the view), you can use ?limit=100&skip=100&stale=ok.
Cheers
Simon
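
The "hit the view until it is empty" loop could look roughly like the
Python sketch below. It is only an illustration: fetch_expired and
bulk_delete are hypothetical callables standing in for the two HTTP
requests (a view query with ?limit=N and a POST to _bulk_docs), and
the zero-progress guard is an addition that is not part of the advice
above.

```python
def purge_expired(fetch_expired, bulk_delete, batch_size=100):
    """Delete expired docs in batches until the view returns no rows.

    fetch_expired(limit) -> list of view rows (hypothetical callable)
    bulk_delete(rows)    -> number of docs deleted (hypothetical callable)
    """
    total = 0
    while True:
        # Always ask for the first page: rows deleted in the previous
        # round have dropped out of the view, so no skip is needed.
        rows = fetch_expired(batch_size)
        if not rows:
            break
        deleted = bulk_delete(rows)
        if deleted == 0:
            # Stop rather than loop forever on rows that cannot be deleted.
            break
        total += deleted
    return total
```

Because each request carries at most batch_size rows, the memory used
by the script stays bounded no matter how many docs have expired.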



Re: auto delete docs

Posted by Stephan Bardubitzki <st...@bardubitzki.com>.
Hi Stephen,

I will keep that in mind, thanks for sharing.

Stephan



Re: auto delete docs

Posted by stephen bartell <sn...@gmail.com>.
What Simon said. I'm doing this in production. I have a Python script set up as a cron job which queries the expire_time view. Docs that get returned get deleted. It's a really small, simple script and works like a charm.

I'm pretty sure you know this, but make sure to page through your view results. If your database hasn't been 'cleaned out' for a while and the database is huge, you're going to get quite a massive response. I once made the mistake of not paging my results in a Node program. V8's memory limit is 1 GB on a 64-bit machine, so once this limit was exceeded, the program would crash and never complete its job. You can see where this leads if that program is responsible for getting rid of old docs and thus keeping the database size under control.

Stephen Bartell

Look ahead, understand the shift, and imagine the right solution five years from now. Then do it!
-baekdal



Re: auto delete docs

Posted by Stephan Bardubitzki <st...@bardubitzki.com>.
Okay, the link works now. Must have been an issue at Apache.



Re: auto delete docs

Posted by Stephan Bardubitzki <st...@bardubitzki.com>.
@Simon:

Thanks again. Could you please double-check the link? I can't open it.

@Matthieu

Unfortunately, I'm not familiar with Erlang. I need to do this task in
Node.js with nano (https://github.com/dscape/nano).




Re: auto delete docs

Posted by Simon Metson <si...@cloudant.com>.
Sure, you can POST to _bulk_docs with a JSON document like:

{
  "docs": [
    {"_id": "expired_doc_0", "_rev": "1-62657917", "_deleted": true},
    {"_id": "expired_doc_1", "_rev": "1-2089673485", "_deleted": true},
    {"_id": "expired_doc_2", "_rev": "1-2063452834", "_deleted": true}
  ]
}


http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Modify_Multiple_Documents_With_a_Single_Request for more info 
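
A request body of that shape could be built from view rows with a few
lines of Python. This is a sketch under one assumption that does not
come from the thread: each row carries the doc's current revision in
its "value" field (that is, the view emits doc._rev as its value). The
function name build_bulk_delete is made up for this example.

```python
import json


def build_bulk_delete(rows):
    """Turn view rows into a _bulk_docs body that deletes each doc.

    Assumes row["id"] is the doc id and row["value"] is its current
    _rev (an assumption about how the view was written).
    """
    docs = [
        {"_id": row["id"], "_rev": row["value"], "_deleted": True}
        for row in rows
    ]
    return json.dumps({"docs": docs})
```

The resulting string is what gets sent as the body of the POST to
/<db>/_bulk_docs with a Content-Type of application/json.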





Re: auto delete docs

Posted by Stephan Bardubitzki <st...@bardubitzki.com>.
Thanks Simon,

that's what I was thinking too, but I'm having trouble figuring out how
a bulk delete should be implemented. Do you have any advice or a code
example for that?

Stephan




Re: auto delete docs

Posted by Simon Metson <si...@cloudant.com>.
You need a view keyed by expire_date and an external process (some cron script, say) that queries it for docs whose expire_date has passed and (bulk) deletes them.
Cheers
Simon
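
As a rough illustration of such a view and the query against it, here
is a Python sketch. Several things in it are assumptions rather than
anything stated in the thread: the design doc name "expiry", the view
name "by_expire_date", the choice to emit doc._rev as the value, and
that expire_date is stored as an ISO 8601 string in one consistent
format, so that string order matches date order.

```python
import json
from urllib.parse import urlencode

# Design document with a view keyed by expire_date. The map function is
# JavaScript stored as a string. Emitting doc._rev as the value lets a
# cleanup script delete rows without fetching each doc again.
DESIGN_DOC = {
    "_id": "_design/expiry",  # hypothetical name
    "views": {
        "by_expire_date": {  # hypothetical name
            "map": "function (doc) { if (doc.expire_date) "
                   "{ emit(doc.expire_date, doc._rev); } }"
        }
    },
}


def expired_view_url(base_url, db, now_iso, limit=100):
    """Build the view URL for docs whose expire_date is <= now_iso."""
    # View keys are JSON values, so the string must be JSON-encoded
    # (wrapped in double quotes) before it is URL-encoded.
    query = urlencode({"endkey": json.dumps(now_iso), "limit": limit})
    return "%s/%s/_design/expiry/_view/by_expire_date?%s" % (
        base_url, db, query)
```

A cron script would GET that URL with the current time as now_iso and
pass the returned rows on to the bulk delete.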

