Posted to solr-user@lucene.apache.org by Eric Kilby <ki...@stylefeeder.com> on 2009/11/11 17:55:54 UTC

Re: Delete of non-existent record succeeds

Rather than start a new thread, I'd like to follow up on this one.  I'm going
to oversimplify, but the basic question should be straightforward.

I currently have one very large Solr index and 5 small ones that contain
filtered subsets of the big one and are used for faceting in one area of
our site.  Deciding which documents go into the smaller ones is somewhat
expensive computationally and involves, among other things, querying a
database and a machine learning system.

The problem I'm considering is that when a document goes "inactive"
(indicated by a status field) in the big index, I'd like to remove it from
any of the small ones it happens to be in.  That may be any of the 5 or
none at all, since together they don't come close to covering the whole
space.  I don't need to keep inactive documents in the small indexes, and
I prefer to keep them small for performance reasons.

So rather than running the expensive process to figure out which, if any,
of the small indexes to issue the delete against, would it be terribly
expensive to just issue the delete against all 5 servers (cores) and let
it match nothing where the document isn't present?  What is the internal
overhead on the Solr side of processing such a (non-)delete?  I'm hoping
the main cost is the bandwidth to issue the requests, which is not a
concern since the code will run on the same machine as the Solr
instances.

I appreciate any advice on this matter, and congrats on the release of 1.4!


Yonik Seeley wrote:
> 
> delete means delete if it exists.
> 
> Due to how Lucene works, deletes are actually buffered for good
> performance: when the method returns, the deletes haven't really been
> applied yet.
> 
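To make the two points in the quoted reply concrete, here is a toy in-memory model in Python. It is purely conceptual and hypothetical (the `ToyIndex` class is invented for illustration and is not how Lucene stores or applies deletes): a delete call only records the id and returns immediately, and the buffered deletes are applied later, at which point an id that is not in the index is silently skipped. In Solr 1.4, the effect of deletes becomes visible to searches only after a commit.

```python
class ToyIndex:
    """Conceptual model only, NOT Lucene's implementation.

    Shows that deletes are buffered until a commit, and that deleting
    an id which is not present is simply a no-op.
    """

    def __init__(self, ids):
        self.live = set(ids)       # documents visible to searches
        self.pending_deletes = []  # buffered delete requests

    def delete(self, doc_id):
        # Returns immediately: nothing is looked up or removed yet,
        # so the call "succeeds" whether or not doc_id exists.
        self.pending_deletes.append(doc_id)

    def commit(self):
        # Apply the buffered deletes; ids that don't exist are ignored.
        removed = 0
        for doc_id in self.pending_deletes:
            if doc_id in self.live:
                self.live.remove(doc_id)
                removed += 1
        self.pending_deletes.clear()
        return removed


if __name__ == "__main__":
    idx = ToyIndex(["a", "b"])
    idx.delete("a")
    idx.delete("zzz")        # not in the index, still accepted
    print(sorted(idx.live))  # ['a', 'b'] (deletes not applied yet)
    print(idx.commit())      # 1
    print(sorted(idx.live))  # ['b']
```

In this model a non-matching delete costs one append to the buffer and one set-membership test at commit time, which is the intuition behind treating it as cheap; the real cost inside Lucene is not measured here.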

-- 
View this message in context: http://old.nabble.com/Delete-of-non-existent-record-succeeds-tp12060767p26304667.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delete of non-existent record succeeds

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I'd just broadcast the delete.  If I remember correctly, that's what we did at one place where we used vanilla Lucene with RMI (pre-Solr), and we didn't see any problems caused by the broadcast deletes (RMI was another story).  Whether this will work for you depends on, among other things, how often you'll need to do it.
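A minimal Python sketch of the broadcast approach, assuming Solr's XML update format (a `<delete><id>DOC_ID</id></delete>` message POSTed to each core's `/update` handler) and hypothetical core names `small1` through `small5` on `localhost:8983`. The `post` parameter is a hypothetical hook so the HTTP call can be swapped out, for example in a test. No membership check is done, because a delete for an id a core doesn't hold simply matches nothing.

```python
import urllib.request
from xml.sax.saxutils import escape

# Hypothetical endpoints: adjust host, port and core names to your setup.
CORE_UPDATE_URLS = [
    "http://localhost:8983/solr/small%d/update" % i for i in range(1, 6)
]


def http_post(url, body):
    """POST an XML update message to Solr and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def broadcast_delete(doc_id, urls=CORE_UPDATE_URLS, post=http_post):
    """Send the same delete-by-id to every core without checking membership.

    Returns {url: status_or_exception}. A failure on one core is recorded
    instead of aborting the loop, so the other cores still get the delete.
    """
    # escape() protects against ids containing &, < or >.
    body = "<delete><id>%s</id></delete>" % escape(doc_id)
    results = {}
    for url in urls:
        try:
            results[url] = post(url, body)
        except OSError as exc:  # urllib's URLError/HTTPError subclass OSError
            results[url] = exc
    return results
```

Because deletes are buffered, they only show up in searches after each core receives a commit, such as a `<commit/>` message POSTed to the same update URL or whatever commit schedule you already use. A core that failed can simply be retried later, since repeating a delete is harmless.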

Otis