Posted to solr-user@lucene.apache.org by Jay Potharaju <js...@gmail.com> on 2018/05/23 00:35:38 UTC

deletebyQuery vs deletebyId

Hi,
I have a quick question about deleteByQuery vs. deleteById. When using
deleteByQuery with the query id:123, is that the same as deleteById in
terms of performance?


Thanks
Jay
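For reference, here are the two requests being compared, written as bodies for Solr's JSON update format. This is a minimal sketch: the update URL is a hypothetical placeholder supplied through SOLR_UPDATE_URL, the helper names are made up for illustration, and ids/queries are assumed to need no JSON escaping. Both bodies remove the same document, but Solr still processes the second one as a delete-by-query.

```shell
#!/usr/bin/env bash
# Sketch of the two JSON update bodies being compared.
# Assumption: ids and queries contain no characters that need JSON escaping.

# Delete a single document by its uniqueKey value.
delete_by_id_body() {
  printf '{"delete":{"id":"%s"}}' "$1"
}

# Delete every document matching a query (still a DBQ, even if one doc matches).
delete_by_query_body() {
  printf '{"delete":{"query":"%s"}}' "$1"
}

# Post a body to an update handler. SOLR_UPDATE_URL is a hypothetical
# placeholder, e.g. http://localhost:8983/solr/mycollection/update
post_update() {
  curl -s "${SOLR_UPDATE_URL:?set SOLR_UPDATE_URL}" \
    -H 'Content-Type: application/json' \
    --data-binary "$1"
}
```

Usage would be, for example, `post_update "$(delete_by_id_body 123)"` versus `post_update "$(delete_by_query_body 'id:123')"`.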

Re: deletebyQuery vs deletebyId

Posted by Jay Potharaju <js...@gmail.com>.
Hi Erick,
Yes, I commented on the ticket after finding it while searching the Solr
JIRA for this issue.

Setup:
2 nodes, 6 shards, 3 shards on each node (no replication)
Collection uses implicit routing.
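A collection with this shape could be created through the Collections API roughly as below. This is a sketch under assumptions: the collection name test is inferred from the core name test_shardaa_replica1 in Jay's command below, shard names other than shardaa are hypothetical, the helper name is made up, and maxShardsPerNode is a Solr 6.x-era parameter that may not exist in newer versions.

```shell
#!/usr/bin/env bash
# Build a Collections API CREATE URL for an implicit-router collection.
# Arguments: host:port, collection name, comma-separated shard names.
# replicationFactor=1 mirrors "no replication"; maxShardsPerNode=3 mirrors
# "3 shards on each node" across 2 nodes.
create_implicit_collection_url() {
  local host=$1 name=$2 shards=$3
  printf 'http://%s/solr/admin/collections?action=CREATE&name=%s&router.name=implicit&shards=%s&replicationFactor=1&maxShardsPerNode=3' \
    "$host" "$name" "$shards"
}
```

With the implicit router there are no hash ranges; each shard is simply a named bucket, which is why the shard list has to be given explicitly.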

Some background: the first time I tried it, it worked, but when I went
back later and tested it again, it was only working intermittently. That
led me to believe that there was either a problem with how I was posting
the request or a Solr issue.

Based on your suggestion about using HttpClient, I just tried posting a
request directly to the shard, and it works:
curl http://solrserver:8983/solr/test_shardaa_replica1/update/json/ \
  -H 'Content-type:application/json' -d '{
  "delete": {"id":"aa:1112312:444"}
}'

Thanks
Jay
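Posting straight to the core that hosts the target shard, as in the command above, means the delete is sent to the shard that holds the document rather than to the collection as a whole. A sketch of that approach, assuming the <collection>_<shard>_replica<N> core naming visible in the command (test_shardaa_replica1); core naming differs between Solr versions, so check the real core names in your cluster first. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Build the update URL of one shard's core. Assumes the core naming seen in
# the thread (<collection>_<shard>_replica<N>); verify against your cluster.
shard_core_update_url() {
  local host=$1 collection=$2 shard=$3 replica=${4:-1}
  printf 'http://%s/solr/%s_%s_replica%s/update/json' \
    "$host" "$collection" "$shard" "$replica"
}

# Send a delete-by-id for one document to one core.
# Assumes the id needs no JSON escaping.
delete_id_on_core() {
  local url=$1 id=$2
  curl -s "$url" -H 'Content-Type: application/json' \
    --data-binary "{\"delete\":{\"id\":\"$id\"}}"
}
```

For the document in the thread this would be `delete_id_on_core "$(shard_core_update_url solrserver:8983 test shardaa)" 'aa:1112312:444'`. The catch is that the client has to know which shard holds each id.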



On Wed, May 23, 2018 at 9:03 PM Erick Erickson <er...@gmail.com>
wrote:

> Hmmm, this looks like https://issues.apache.org/jira/browse/SOLR-8889?
> And are you the "Jay" who commented there?

Re: deletebyQuery vs deletebyId

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, this looks like https://issues.apache.org/jira/browse/SOLR-8889?
And are you the "Jay" who commented there?

On Wed, May 23, 2018 at 11:28 PM, Erick Erickson
<er...@gmail.com> wrote:
> Tell us some more about your setup, particularly:
> - You mention a routing key. Does the collection use implicit
> routing or compositeId?
> - What does adding &debug=query show?
> - I'm not entirely sure, frankly, how delete by id and having a
> different routing field play together. The supposition behind
> deleteById is that the deletions can be routed to the correct leader
> by hashing on the id field.
>
> Best,
> Erick

Re: deletebyQuery vs deletebyId

Posted by Erick Erickson <er...@gmail.com>.
Tell us some more about your setup, particularly:
- You mention a routing key. Does the collection use implicit
routing or compositeId?
- What does adding &debug=query show?
- I'm not entirely sure, frankly, how delete by id and having a
different routing field play together. The supposition behind
deleteById is that the deletions can be routed to the correct leader
by hashing on the id field.

Best,
Erick
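Erick's last point can be made concrete. With compositeId routing, Solr can compute the target shard from the id itself, but an implicit-router collection has no hash ranges, so for a delete-by-id the shard has to come from somewhere else, such as the client. A hypothetical client-side helper is sketched below; the rule it encodes (the text before the first ':' in the id names the shard, so aa:1112312:444 goes to shardaa) is only a guess based on the id and core name in Jay's later message, not something stated in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from document id to shard name for an
# implicit-router collection: "aa:1112312:444" -> "shardaa".
# This convention is an illustration only, not a Solr feature.
shard_for_id() {
  local id=$1
  # ${id%%:*} strips the longest suffix starting at ':', leaving the prefix.
  printf 'shard%s' "${id%%:*}"
}
```

The resulting shard name could then pick the core to post to, as in Jay's direct-to-shard request. Passing it as the _route_ request parameter on a collection-level update may also work in some versions, but given the JIRA issue Erick links, test that on your version first.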

On Wed, May 23, 2018 at 6:02 PM, Jay Potharaju <js...@gmail.com> wrote:
> Thanks, Emir and Shawn, for chiming in!
> I am testing deleteById in Solr 6.6.3 and it does not seem to work. I have
> 6 shards in my collection, and a routing key is also passed when sending
> the query to Solr. I also tested this in Solr 5.3, with the same results.
> Any suggestions as to why that would be happening?
>
> Thanks
> Jay

Re: deletebyQuery vs deletebyId

Posted by Jay Potharaju <js...@gmail.com>.
Thanks, Emir and Shawn, for chiming in!
I am testing deleteById in Solr 6.6.3 and it does not seem to work. I have
6 shards in my collection, and a routing key is also passed when sending
the query to Solr. I also tested this in Solr 5.3, with the same results.
Any suggestions as to why that would be happening?

Thanks
Jay



On Wed, May 23, 2018 at 1:26 AM Emir Arnautović <
emir.arnautovic@sematext.com> wrote:

> Hi Jay,
> Solr does not handle it differently from any other DBQ. It will show fewer
> issues than some other DBQs because it affects fewer documents, but the
> mechanics of DBQ are the same, and DBQ does not play well with concurrent
> index changes (merges/updates), especially in SolrCloud mode. Here are some
> thoughts on DBQ: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html
>
> Thanks,
> Emir

Re: deletebyQuery vs deletebyId

Posted by Emir Arnautović <em...@sematext.com>.
Hi Jay,
Solr does not handle it differently from any other DBQ. It will show fewer issues than some other DBQs because it affects fewer documents, but the mechanics of DBQ are the same, and DBQ does not play well with concurrent index changes (merges/updates), especially in SolrCloud mode. Here are some thoughts on DBQ: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
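One commonly suggested way to avoid DBQ, in line with Shawn's remark that id-based deletes do not cause these pauses, is to query for the matching ids first and then delete them by id. A minimal sketch, assuming the uniqueKey field is named id, ids contain no commas, quotes, or newlines (the CSV output is not parsed properly), and SOLR_COLLECTION_URL is a hypothetical placeholder. It fetches a single page of at most 1000 ids; larger result sets would need paging, for example with cursorMark. The helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# DBQ alternative sketch: fetch matching ids, then delete them by id.

# Fetch up to 1000 ids matching a query, one per line, via the CSV writer.
# SOLR_COLLECTION_URL is a placeholder, e.g. http://localhost:8983/solr/mycollection
fetch_ids() {
  curl -s -G "${SOLR_COLLECTION_URL:?set SOLR_COLLECTION_URL}/select" \
    --data-urlencode "q=$1" \
    --data-urlencode 'fl=id' \
    --data-urlencode 'rows=1000' \
    --data-urlencode 'wt=csv' \
    --data-urlencode 'csv.header=false'
}

# Turn ids on stdin (one per line) into {"delete":["id1","id2",...]}.
ids_to_delete_body() {
  local first=1 id
  printf '{"delete":['
  # "|| [ -n "$id" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    [ -z "$id" ] && continue   # skip blank lines
    if [ "$first" -eq 1 ]; then first=0; else printf ','; fi
    printf '"%s"' "$id"
  done
  printf ']}'
}

# Post a JSON update body to the collection's update handler.
post_collection_update() {
  curl -s "${SOLR_COLLECTION_URL:?set SOLR_COLLECTION_URL}/update" \
    -H 'Content-Type: application/json' --data-binary "$1"
}
```

A run might look like `post_collection_update "$(fetch_ids 'category:obsolete' | ids_to_delete_body)"`, where category:obsolete is a hypothetical query; skip the post when no ids come back.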



> On 23 May 2018, at 02:35, Jay Potharaju <js...@gmail.com> wrote:
> 
> Hi,
> I have a quick question about deleteByQuery vs. deleteById. When using
> deleteByQuery with the query id:123, is that the same as deleteById in
> terms of performance?
> 
> 
> Thanks
> Jay


Re: deletebyQuery vs deletebyId

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/22/2018 6:35 PM, Jay Potharaju wrote:
> I have a quick question about deleteByQuery vs. deleteById. When using
> deleteByQuery with the query id:123, is that the same as deleteById in
> terms of performance?

If there is absolutely nothing else happening to update the index, the 
difference between the two would probably be outside normal human 
perception of time -- I think you'd only be able to see the difference 
by measuring it with software, and you might need something that can 
show time units below one millisecond.  On a query that matches a lot of 
documents, the difference might be more pronounced, but likely still 
pretty small.
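For anyone who wants to check that on their own cluster, curl can report request time with its -w '%{time_total}' option. A rough sketch under assumptions: SOLR_UPDATE_URL is a placeholder, the helper names are hypothetical, and the document has to be re-indexed between the two runs. End-to-end request latency also includes network overhead, and it will not reveal the indexing pauses caused by DBQ's interaction with merges.

```shell
#!/usr/bin/env bash
# Rough timing of a single JSON update request with curl.

# Print the total request time in seconds, as reported by curl.
time_update() {
  curl -s -o /dev/null -w '%{time_total}\n' \
    "${SOLR_UPDATE_URL:?set SOLR_UPDATE_URL}" \
    -H 'Content-Type: application/json' --data-binary "$1"
}

# Convert a seconds value such as 0.001234 to milliseconds.
secs_to_ms() {
  awk -v s="$1" 'BEGIN { printf "%.3f\n", s * 1000 }'
}
```

For example, compare `secs_to_ms "$(time_update '{"delete":{"id":"123"}}')"` with `secs_to_ms "$(time_update '{"delete":{"query":"id:123"}}')"`, repeating each several times since single samples are noisy at sub-millisecond scale.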

The issue with DBQ, which I already explained to you on another mailing 
list thread, is that DBQ can interact badly with other operations, 
segment merges in particular.  The delete itself won't take very long, 
but the simple fact that DBQ was used might result in a noticeable pause 
in your indexing operations.

http://lucene.472066.n3.nabble.com/Async-exceptions-during-distributed-update-td4388725.html#a4388787

As mentioned there, the pauses don't happen with id-based delete.

Thanks,
Shawn