Posted to solr-user@lucene.apache.org by Jonathan Ariel <io...@gmail.com> on 2008/07/04 16:52:47 UTC

Bulk delete

Hi,
Is there any good way to do a bulk delete of several documents?
I have more than 1000 documents to delete... and I don't want to send N
requests with <delete><id>X</id></delete>.
Doing a query delete isn't a good solution because I have a maximum amount
of terms that I can use in the query. For example:
<delete><query>id:(X1 OR X2 OR .... OR Xn)</query></delete> where n could be
more than 1000

Thanks,

Jonathan

Re: Bulk delete

Posted by Jonathan Ariel <io...@gmail.com>.
this is a really nice feature. any known limit for the amount of Ids that I
can add there?


On Fri, Jul 4, 2008 at 12:40 PM, Jonathan Ariel <io...@gmail.com> wrote:

> yeah I know. the problem with a query is that there is a maximum amount of
> query terms that I can add, which is reasonable. The problem is that I have
> thousands of Ids.
>
>
> On Fri, Jul 4, 2008 at 12:29 PM, Noble Paul നോബിള്‍ नोब्ळ् <
> noble.paul@gmail.com> wrote:
>
>> You can either delete by a query or by an id. It is like you use any
>> database . If you can find a condition by which you can identify these
>> docs then you can delete by a query .
>> --Noble
>> On Fri, Jul 4, 2008 at 8:22 PM, Jonathan Ariel <io...@gmail.com>
>> wrote:
>> > Hi,
>> > Is there any good way to do a bulk delete of several documents?
>> > I have more than 1000 documents to delete... and I don't want to send N
>> > request with <delete><id>X</id></delete>.
>> > Doing a query delete isn't a good solution because I have a maximum amount
>> > of terms that I can use in the query. For example:
>> > <delete><query>id:(X1 OR X2 OR .... OR Xn)</query></delete> where n could be
>> > more than 1000
>> >
>> > Thanks,
>> >
>> > Jonathan
>> >
>>
>>
>>
>> --
>> --Noble Paul
>>
>
>

Re: Bulk delete

Posted by Walter Underwood <wu...@netflix.com>.
Sending chunked requests is more work, but as Mike points out,
it is good design. A design without limits on request size
can fail by overloading the server, hitting security triggers
in firewall software, etc. That is a fragile design.

I doubt that there is any difference in performance between
batched deletes and one mongo delete. Most of the time is in
the commit.

wunder
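
For illustration only, here is a minimal Python sketch of the batched approach: split the known ids into bounded chunks, build one multi-id delete message per chunk, and finish with a single commit. The function names and the batch size of 500 are assumptions made for this sketch, not anything Solr prescribes, and the multi-id <delete> form relies on Solr 1.3 or later as noted elsewhere in this thread.

```python
from xml.sax.saxutils import escape


def chunked(ids, batch_size):
    """Yield successive lists of at most batch_size ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def delete_messages(ids, batch_size=500):
    """Build one <delete> body per batch, followed by a single <commit/>.

    batch_size=500 is an arbitrary illustrative default (an assumption).
    Returns an empty list when there is nothing to delete.
    """
    ids = list(ids)
    if not ids:
        return []
    messages = []
    for batch in chunked(ids, batch_size):
        id_tags = "".join("<id>" + escape(str(i)) + "</id>" for i in batch)
        messages.append("<delete>" + id_tags + "</delete>")
    # Commit once, after the last delete, rather than after every batch.
    messages.append("<commit/>")
    return messages


for message in delete_messages(["X1", "X2", "X3"], batch_size=2):
    print(message)
```

Each string would be sent as the body of its own update request, so the request size stays bounded no matter how many ids there are.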

On 7/4/08 3:53 PM, "Jonathan Ariel" <io...@gmail.com> wrote:

> It is reasonable, but it seems to me too much work if I already know in
> advance all the IDs that I want to delete.
> Having N Ids to delete in advance seems unnatural to execute N requests
> instead of just 1 or few, but not N.
> If I can avoid unnecessary requests grouping them, I would do it. Specially
> if I know that no one will execute deletes but just that process.
> 
> Do you think that solr performs better or the same with N delete requests
> (when N is more than 1000) than 1, 2 or 10?
> 
> 
> 
> On Fri, Jul 4, 2008 at 6:05 PM, Mike Klaas <mi...@gmail.com> wrote:
> 
>> Why?  It is not reasonable in a distributed system to perform requests of
>> unbounded size (not to say that it won't work).  If the concern is
>> throughput, large batches should be sufficient.
>> 
>> -Mike
>> 
>> 
>> On 4-Jul-08, at 9:06 AM, Jonathan Ariel wrote:
>> 
>>> Yes, I just wanted to avoid N requests and do just 2.
>>>
>>> On Fri, Jul 4, 2008 at 12:48 PM, Walter Underwood <wunderwood@netflix.com>
>>> wrote:
>>>
>>>> Send multiple deletes, with a commit after the last one. --wunder
>>>>
>>>> On 7/4/08 8:40 AM, "Jonathan Ariel" <io...@gmail.com> wrote:
>>>>
>>>>> yeah I know. the problem with a query is that there is a maximum amount of
>>>>> query terms that I can add, which is reasonable. The problem is that I have
>>>>> thousands of Ids.
>> 


Re: Bulk delete

Posted by Jonathan Ariel <io...@gmail.com>.
It is reasonable, but it seems to me too much work if I already know in
advance all the IDs that I want to delete.
Having N Ids to delete in advance seems unnatural to execute N requests
instead of just 1 or few, but not N.
If I can avoid unnecessary requests grouping them, I would do it. Specially
if I know that no one will execute deletes but just that process.

Do you think that solr performs better or the same with N delete requests
(when N is more than 1000) than 1, 2 or 10?



On Fri, Jul 4, 2008 at 6:05 PM, Mike Klaas <mi...@gmail.com> wrote:

> Why?  It is not reasonable in a distributed system to perform requests of
> unbounded size (not to say that it won't work).  If the concern is
> throughput, large batches should be sufficient.
>
> -Mike
>
>
> On 4-Jul-08, at 9:06 AM, Jonathan Ariel wrote:
>
>> Yes, I just wanted to avoid N requests and do just 2.
>>
>> On Fri, Jul 4, 2008 at 12:48 PM, Walter Underwood <wunderwood@netflix.com>
>> wrote:
>>
>>> Send multiple deletes, with a commit after the last one. --wunder
>>>
>>> On 7/4/08 8:40 AM, "Jonathan Ariel" <io...@gmail.com> wrote:
>>>
>>>> yeah I know. the problem with a query is that there is a maximum amount of
>>>> query terms that I can add, which is reasonable. The problem is that I have
>>>> thousands of Ids.
>

Re: Bulk delete

Posted by Mike Klaas <mi...@gmail.com>.
Why?  It is not reasonable in a distributed system to perform requests  
of unbounded size (not to say that it won't work).  If the concern is  
throughput, large batches should be sufficient.

-Mike

On 4-Jul-08, at 9:06 AM, Jonathan Ariel wrote:

> Yes, I just wanted to avoid N requests and do just 2.
>
> On Fri, Jul 4, 2008 at 12:48 PM, Walter Underwood <wunderwood@netflix.com>
> wrote:
>
>> Send multiple deletes, with a commit after the last one. --wunder
>>
>> On 7/4/08 8:40 AM, "Jonathan Ariel" <io...@gmail.com> wrote:
>>
>>> yeah I know. the problem with a query is that there is a maximum amount of
>>> query terms that I can add, which is reasonable. The problem is that I have
>>> thousands of Ids.


Re: Bulk delete

Posted by Jonathan Ariel <io...@gmail.com>.
oh. you're right! if using 1.3 and if there is no limit to the amount of
ids I can send with the delete tag.

On Fri, Jul 4, 2008 at 1:10 PM, Yonik Seeley <yo...@apache.org> wrote:

> On Fri, Jul 4, 2008 at 12:06 PM, Jonathan Ariel <io...@gmail.com>
> wrote:
> > Yes, I just wanted to avoid N requests and do just 2.
>
> Note that you can do it in a single request if you really want... just
> add ?commit=true to the URL.
>
> -Yonik
>

Re: Bulk delete

Posted by Yonik Seeley <yo...@apache.org>.
On Fri, Jul 4, 2008 at 12:06 PM, Jonathan Ariel <io...@gmail.com> wrote:
> Yes, I just wanted to avoid N requests and do just 2.

Note that you can do it in a single request if you really want... just
add ?commit=true to the URL.

-Yonik
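
As a rough sketch of the single-request variant, the Python below builds the update URL with commit=true appended and shows how the delete body might be POSTed. The host, port and path in SOLR_UPDATE_URL are assumptions (a local instance at the conventional default location), and update_url and post_update are hypothetical helper names, not part of Solr.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a local Solr instance at its conventional default update URL.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def update_url(base_url, commit=False):
    """Return the update URL, adding commit=true when a commit is wanted."""
    if not commit:
        return base_url
    return base_url + "?" + urlencode({"commit": "true"})


def post_update(body, commit=False, base_url=SOLR_UPDATE_URL):
    """POST an XML update message and return the HTTP status code.

    Not called in this sketch because it needs a running server.
    """
    request = Request(
        update_url(base_url, commit),
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urlopen(request) as response:
        return response.status


print(update_url(SOLR_UPDATE_URL, commit=True))
```

With a server available, post_update("<delete><id>1</id><id>2</id></delete>", commit=True) would delete and commit in one round trip.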

Re: Bulk delete

Posted by Jonathan Ariel <io...@gmail.com>.
Yes, I just wanted to avoid N requests and do just 2.

On Fri, Jul 4, 2008 at 12:48 PM, Walter Underwood <wu...@netflix.com>
wrote:

> Send multiple deletes, with a commit after the last one. --wunder
>
> On 7/4/08 8:40 AM, "Jonathan Ariel" <io...@gmail.com> wrote:
>
> > yeah I know. the problem with a query is that there is a maximum amount of
> > query terms that I can add, which is reasonable. The problem is that I have
> > thousands of Ids.
>
>

Re: Bulk delete

Posted by Walter Underwood <wu...@netflix.com>.
Send multiple deletes, with a commit after the last one. --wunder

On 7/4/08 8:40 AM, "Jonathan Ariel" <io...@gmail.com> wrote:

> yeah I know. the problem with a query is that there is a maximum amount of
> query terms that I can add, which is reasonable. The problem is that I have
> thousands of Ids.


Re: Bulk delete

Posted by Jonathan Ariel <io...@gmail.com>.
yeah I know. the problem with a query is that there is a maximum amount of
query terms that I can add, which is reasonable. The problem is that I have
thousands of Ids.

On Fri, Jul 4, 2008 at 12:29 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.paul@gmail.com> wrote:

> You can either delete by a query or by an id. It is like you use any
> database . If you can find a condition by which you can identify these
> docs then you can delete by a query .
> --Noble
> On Fri, Jul 4, 2008 at 8:22 PM, Jonathan Ariel <io...@gmail.com> wrote:
> > Hi,
> > Is there any good way to do a bulk delete of several documents?
> > I have more than 1000 documents to delete... and I don't want to send N
> > request with <delete><id>X</id></delete>.
> > Doing a query delete isn't a good solution because I have a maximum amount
> > of terms that I can use in the query. For example:
> > <delete><query>id:(X1 OR X2 OR .... OR Xn)</query></delete> where n could be
> > more than 1000
> >
> > Thanks,
> >
> > Jonathan
> >
>
>
>
> --
> --Noble Paul
>

Re: Bulk delete

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
You can either delete by a query or by an id. It is like you use any
database . If you can find a condition by which you can identify these
docs then you can delete by a query .
--Noble
On Fri, Jul 4, 2008 at 8:22 PM, Jonathan Ariel <io...@gmail.com> wrote:
> Hi,
> Is there any good way to do a bulk delete of several documents?
> I have more than 1000 documents to delete... and I don't want to send N
> request with <delete><id>X</id></delete>.
> Doing a query delete isn't a good solution because I have a maximum amount
> of terms that I can use in the query. For example:
> <delete><query>id:(X1 OR X2 OR .... OR Xn)</query></delete> where n could be
> more than 1000
>
> Thanks,
>
> Jonathan
>



-- 
--Noble Paul

Re: Bulk delete

Posted by Yonik Seeley <yo...@apache.org>.
On Fri, Jul 4, 2008 at 10:52 AM, Jonathan Ariel <io...@gmail.com> wrote:
> Is there any good way to do a bulk delete of several documents?
> I have more than 1000 documents to delete... and I don't want to send N
> request with <delete><id>X</id></delete>.
> Doing a query delete isn't a good solution because I have a maximum amount
> of terms that I can use in the query. For example:
> <delete><query>id:(X1 OR X2 OR .... OR Xn)</query></delete> where n could be
> more than 1000

As of Solr 1.3, you can specify multiple ids

<delete><id>1</id><id>2</id></delete>

-Yonik
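
A minimal Python sketch of generating that multi-id message from a list of ids, in case it helps. The function name build_delete_xml is hypothetical; the only thing taken from the post above is the <delete><id>...</id><id>...</id></delete> shape. Ids are XML-escaped so that characters such as & or < do not break the message.

```python
from xml.sax.saxutils import escape


def build_delete_xml(ids):
    """Return one <delete> message containing an <id> element per document id."""
    parts = ["<delete>"]
    for doc_id in ids:
        # Escape &, < and > so arbitrary id strings stay well-formed XML.
        parts.append("<id>" + escape(str(doc_id)) + "</id>")
    parts.append("</delete>")
    return "".join(parts)


print(build_delete_xml([1, 2]))
```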

Re: Bulk delete

Posted by Frans Flippo <fr...@gmail.com>.
Are you sure there is not a single criterion by which these documents
are selected for deletion? Surely they're not 1000 random documents?
Perhaps the criterion cannot be described by the fields you're
currently indexing, but that's just a matter of adding the necessary
index fields.

E.g. <delete><query>date:[20080101 TO 20080330]</query></delete>

~ Frans
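
To make the delete-by-query idea concrete, here is a small Python sketch that wraps a query string in a delete message. The helper name is hypothetical, and the "date" field in the usage line is simply the one from the example above; it would need to exist in the index schema for the query to match anything.

```python
from xml.sax.saxutils import escape


def build_delete_by_query_xml(query):
    """Wrap a query string in a <delete><query>...</query></delete> message."""
    # Escaping matters here: query syntax can contain &, < or >.
    return "<delete><query>" + escape(query) + "</query></delete>"


print(build_delete_by_query_xml("date:[20080101 TO 20080330]"))
```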

On Fri, Jul 4, 2008 at 4:52 PM, Jonathan Ariel <io...@gmail.com> wrote:
> Hi,
> Is there any good way to do a bulk delete of several documents?
> I have more than 1000 documents to delete... and I don't want to send N
> request with <delete><id>X</id></delete>.
> Doing a query delete isn't a good solution because I have a maximum amount
> of terms that I can use in the query. For example:
> <delete><query>id:(X1 OR X2 OR .... OR Xn)</query></delete> where n could be
> more than 1000
>
> Thanks,
>
> Jonathan
>