Posted to solr-user@lucene.apache.org by sujatha sankaran <su...@gmail.com> on 2018/07/05 20:57:21 UTC

Re: Delete By Query issue followed by Delete By Id Issues

Hi Emir,

We are deleting a larger subset of docs that share a particular value, which
we know from the id, and updating only a few of the deleted ones. Our document
id is of the form
<type>_<part1>_<part2>. We need to delete all docs with the same <part1>
that are no longer in the DB, and then update only the few that have been
updated in the DB.
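
A minimal sketch of that selection step, in case it helps make the question
concrete. It assumes the ids split cleanly on the first two underscores, and
that the set of ids currently indexed and the set of ids still in the DB are
both available to the batch job. The function names (split_id, plan_batch,
delete_by_id_body) are hypothetical, not part of Solr. The body it builds
uses the list form of the JSON delete-by-id command; it does not address the
routing concerns raised elsewhere in this thread.

```python
import json


def split_id(doc_id):
    # Assumption: ids look like <type>_<part1>_<part2>, and neither
    # <type> nor <part1> contains an underscore.
    doc_type, part1, part2 = doc_id.split("_", 2)
    return doc_type, part1, part2


def plan_batch(indexed_ids, db_ids, changed_ids, part1):
    """Return (stale_ids, update_ids) for one <part1> group.

    stale_ids:  indexed docs with this <part1> that are no longer in the DB.
    update_ids: changed docs with this <part1> that still exist in the DB.
    """
    in_index = {i for i in indexed_ids if split_id(i)[1] == part1}
    in_db = {i for i in db_ids if split_id(i)[1] == part1}
    stale_ids = sorted(in_index - in_db)
    update_ids = sorted(set(changed_ids) & in_db)
    return stale_ids, update_ids


def delete_by_id_body(ids):
    # JSON update body that deletes a list of ids.
    return json.dumps({"delete": list(ids)})


if __name__ == "__main__":
    indexed = ["doc_A_1", "doc_A_2", "doc_A_3", "doc_B_1"]
    db = ["doc_A_1", "doc_A_3", "doc_B_1"]
    changed = ["doc_A_3", "doc_B_1"]
    stale, updates = plan_batch(indexed, db, changed, "A")
    print(delete_by_id_body(stale))
    print(updates)
```

Only the stale ids are deleted explicitly here; the changed docs are simply
re-sent with the same id, which replaces the old version.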

Thanks,
Sujatha



On Sun, Jun 24, 2018 at 8:59 AM, Emir Arnautović <
emir.arnautovic@sematext.com> wrote:

> Hi Sujatha,
> Did I get it right that you are deleting the same documents that will be
> updated afterward? If that’s the case, then you can simply skip deleting,
> and just send updated version of document. Solr (Lucene) does not have
> delete - it’ll just flag document as deleted. Updating document (assuming
> id is the same) will result in the same thing - old document will not be
> retrievable and will be removed from index when segments holding it are
> merged.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 21 Jun 2018, at 19:59, sujatha sankaran <su...@gmail.com> wrote:
> >
> > Thanks, Shawn.
> >
> > Our use case is something like this: in a batch load of several thousand
> > documents, we do a delete first, followed by an update. For example, we
> > delete all 1000 docs and then send an update request for 1000.
> >
> > What we see is that many docs go missing because the DBQ deletes are
> > re-ordered relative to the updates that follow them. We also saw an issue
> > with nodes going down, similar to the issue described here:
> > http://lucene.472066.n3.nabble.com/SolrCloud-Nodes-going-to-recovery-state-during-indexing-td4369396.html
> >
> > At the end of this batch process we see many (several thousand) missing
> > docs.
> >
> > Because of this, and after reading the thread above, we decided to move to
> > DBI and are now facing issues due to the custom routing or implicit routing
> > which we have in place. So I don't think DBQ was working for us, but we did
> > have several such processes (DBQ followed by updates) for different
> > activities in the collection happening at the same time.
> >
> >
> > Sujatha
> >
> > On Thu, Jun 21, 2018 at 1:21 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> >
> >> On 6/21/2018 9:59 AM, sujatha sankaran wrote:
> >>> Currently, from our business perspective, we find that we are left with
> >>> no options for deleting docs in a batch load, as:
> >>>
> >>> DBQ + batch does not work well together
> >>> DBI + custom routing (batch load / normal) would not work as well.
> >>
> >> I would expect DBQ to work, just with the caveat that if you are trying
> >> to do other indexing operations at the same time, you may run into
> >> significant delays, and if there are timeouts configured anywhere that
> >> are shorter than those delays, requests may return failure responses or
> >> log failures.
> >>
> >> If you are using DBQ, you just need to be sure that there are no other
> >> operations happening at the same time, or that your error handling is
> >> bulletproof.  Making sure that no other operations are happening at the
> >> same time as the DBQ is in my opinion a better option.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
>