Posted to solr-user@lucene.apache.org by Jay Potharaju <js...@gmail.com> on 2018/05/23 00:35:38 UTC
deletebyQuery vs deletebyId
Hi,
I have a quick question about deleteByQuery vs deleteById. When using
deleteByQuery, if the query is id:123, is that the same as deleteById in
terms of performance?
Thanks
Jay
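For reference, the two requests being compared differ only in the body of the update command. The sketch below builds both bodies in Python. It assumes Solr's JSON update syntax, does not contact a server, and uses hypothetical helper names.

```python
import json


def delete_by_id_body(doc_id):
    """Body for an id-based delete of a single document."""
    return json.dumps({"delete": {"id": doc_id}})


def delete_by_query_body(query):
    """Body for a delete-by-query (DBQ); the query may match any number of documents."""
    return json.dumps({"delete": {"query": query}})


# The two forms from the question: same target document, different mechanism.
print(delete_by_id_body("123"))
print(delete_by_query_body("id:123"))
```

Both bodies target the same document, but Solr treats the second one as a DBQ like any other, which is the point the replies in this thread turn on.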
Re: deletebyQuery vs deletebyId
Posted by Jay Potharaju <js...@gmail.com>.
Hi Erick,
Yes, I commented on the ticket after finding it while searching the Solr
JIRA for this issue.
Setup:
2 nodes, 6 shards, 3 shards on each node (no replication).
The collection uses implicit routing.
Some background: the first time I tried it, it worked, but when I went back
later and tested it again, it was only working intermittently. That led me
to believe there was either a problem with how I was posting the request or
a Solr issue.
Based on your suggestion about using httpclient, I just tried posting a
request directly to the shard, and it works:
curl http://solrserver:8983/solr/test_shardaa_replica1/update/json/ \
  -H 'Content-type:application/json/docs' -d '{
"delete": {"id":"aa:1112312:444"}
}'
Thanks
Jay
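The same request can be sketched in Python with only the standard library. This is an illustration, not what was actually run. The helper name is hypothetical, the host and core name are copied from the curl command, and the Content-Type value application/json is an assumption that differs from the header in the post. The request is built but not sent.

```python
import json
import urllib.request


def build_delete_request(base_url, core, doc_id):
    """Build (but do not send) a POST that deletes one document by id from one core."""
    url = "{}/{}/update/json".format(base_url.rstrip("/"), core)
    body = json.dumps({"delete": {"id": doc_id}}).encode("utf-8")
    # Assumption: a plain application/json content type for the JSON command body.
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


req = build_delete_request(
    "http://solrserver:8983/solr", "test_shardaa_replica1", "aa:1112312:444"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it (requires a reachable Solr node):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Addressing the core directly bypasses whatever routing the collection would otherwise apply, which is consistent with the delete working here while failing intermittently when sent to the collection.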
Re: deletebyQuery vs deletebyId
Posted by Erick Erickson <er...@gmail.com>.
Hmmm, this looks like https://issues.apache.org/jira/browse/SOLR-8889?
And are you the "Jay" who commented there?
Re: deletebyQuery vs deletebyId
Posted by Erick Erickson <er...@gmail.com>.
Tell us some more about your setup, particularly:
- You mention a routing key. Is the collection using implicit
routing or compositeId?
- What does adding &debug=query show?
- I'm not entirely sure, frankly, how delete by id and having a
different routing field play together. The supposition behind
deleteById is that the deletions can be routed to the correct leader
by hashing on the id field.
Best,
Erick
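The concern in the last bullet can be illustrated with a toy model. This is not Solr's code: the shard names are hypothetical, and crc32 stands in for whatever hash a hash-based router would use. It only shows why an id-only lookup may point at a different shard than the one a client chose under implicit routing.

```python
import zlib

# Hypothetical shard names, chosen only for illustration.
SHARDS = ["shardaa", "shardbb", "shardcc"]


def hash_route(doc_id, shards):
    """Choose a shard from the id alone. crc32 is a stand-in hash, not Solr's."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def implicit_route(shard_name, shards):
    """Implicit routing: the client names the shard; the id is not consulted."""
    if shard_name not in shards:
        raise ValueError("unknown shard: " + shard_name)
    return shard_name


doc_id = "aa:1112312:444"
indexed_on = implicit_route("shardaa", SHARDS)  # where the client put the document
guessed = hash_route(doc_id, SHARDS)            # where an id-only hash would look
print("indexed on:", indexed_on)
print("hash of id points to:", guessed)
print("delete would reach the document:", indexed_on == guessed)
```

In this model the delete reaches the document only when the two functions happen to agree, which would look like intermittent success from the client's side.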
Re: deletebyQuery vs deletebyId
Posted by Jay Potharaju <js...@gmail.com>.
Thanks Emir & Shawn for chiming in!
I am testing deleteById in Solr 6.6.3 and it does not seem to work. I have
6 shards in my collection, and a routing key is also passed when sending the
query to Solr. I also tested this in Solr 5.3, with the same results.
Any suggestions as to why that would be happening?
Thanks
Jay
Re: deletebyQuery vs deletebyId
Posted by Emir Arnautović <em...@sematext.com>.
Hi Jay,
Solr does not handle it differently from any other DBQ. It will cause fewer issues than some other DBQs because it affects fewer documents, but the mechanics of DBQ are the same and do not play well with concurrent changes to the index (merges/updates), especially in SolrCloud mode. Here are some thoughts on DBQ: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html
Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
Re: deletebyQuery vs deletebyId
Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/22/2018 6:35 PM, Jay Potharaju wrote:
> I have a quick question about deletebyQuery vs deleteById. When using
> deleteByQuery, if query is id:123 is that same as deleteById in terms of
> performance.
If there is absolutely nothing else happening to update the index, the
difference between the two would probably be outside normal human
perception of time -- I think you'd only be able to see the difference
by measuring it with software, and you might need something that can
show time units below one millisecond. On a query that matches a lot of
documents, the difference might be more pronounced, but likely still
pretty small.
The issue with DBQ, which I already explained to you on another mailing
list thread, is that DBQ can interact badly with other operations,
segment merges in particular. The delete itself won't take very long,
but the simple fact that DBQ was used might result in a noticeable pause
in your indexing operations.
http://lucene.472066.n3.nabble.com/Async-exceptions-during-distributed-update-td4388725.html#a4388787
As mentioned there, the pauses don't happen with id-based delete.
Thanks,
Shawn
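When the ids to remove are already known, one way to stay on the id-based path described above is to send them as an id list instead of a query such as id:(1 OR 2 OR 3). The sketch below only builds the request bodies. It assumes Solr's JSON update syntax accepts a list of ids under "delete"; the helper name is hypothetical and the default batch size of 500 is arbitrary.

```python
import json


def delete_by_ids_bodies(doc_ids, batch_size=500):
    """Yield JSON bodies that delete documents by id, batch_size ids per body."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(doc_ids)
    for start in range(0, len(ids), batch_size):
        yield json.dumps({"delete": ids[start:start + batch_size]})


# Three ids split into batches of two: one body per batch.
for body in delete_by_ids_bodies(["1", "2", "3"], batch_size=2):
    print(body)
```

Each body would be posted to the update handler as its own request, so a long id list does not have to travel in a single payload.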