Posted to solr-user@lucene.apache.org by Andy <an...@yahoo.com> on 2010/11/01 22:04:24 UTC

Which is faster -- delete or update?

My documents have a "down_vote" field. Every time a user votes down a document, I increment the "down_vote" field in my database and also re-index the document in Solr to reflect the new down_vote value.
During searches, I want to restrict the results to documents with, say, fewer than 3 down votes. There are 2 ways to implement that:
1) When a user down-votes a document, check whether the total down votes have reached 3. If they have, delete the document from the Solr index.
2) When a user down-votes a document, update the document in the Solr index to reflect the new down_vote value, even if the total is already 3 or more. At query time, add an "fq" to restrict results to documents with fewer than 3 down votes.
Which approach is better? Is it faster to delete a document from the index or to update the document to reflect the new down_vote value?
Thanks.
Andy
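
The two options above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function names and the constant are made up for the example, and it assumes every document carries an integer down_vote value (a document with no value in that field would not match the range filter).

```python
from urllib.parse import urlencode

# Threshold taken from the question; the constant name is illustrative.
DOWN_VOTE_LIMIT = 3


def action_for_option_1(total_down_votes: int) -> str:
    """Option 1: delete the document once the threshold is reached,
    otherwise re-index it with the new count."""
    return "delete" if total_down_votes >= DOWN_VOTE_LIMIT else "update"


def select_params_for_option_2(user_query: str) -> str:
    """Option 2: always re-index, and filter at query time with fq."""
    # Inclusive range: down_vote <= 2, i.e. fewer than 3 for an integer field.
    fq = f"down_vote:[* TO {DOWN_VOTE_LIMIT - 1}]"
    return urlencode({"q": user_query, "fq": fq})


if __name__ == "__main__":
    print(action_for_option_1(2))
    print(action_for_option_1(3))
    print(select_params_for_option_2("solr"))
```

With option 2 the filter string is identical for every search, which is the case Solr's filter cache is meant for.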


      

Re: Which is faster -- delete or update?

Posted by Jonathan Rochkind <ro...@jhu.edu>.
The actual time it takes to delete or update the document is unlikely to 
make a difference to you.

What might make a difference to you is the time it takes to actually 
finalize the commit, and the time it takes to re-warm your indexes after 
a commit, and especially the time it takes to run any warming queries 
you have set in newSearcher. Most of these probably won't differ between 
delete and update, but any of them could be a problem either way; the way 
to find out is to try it and measure it.

Whether you do a delete or an update, if you're planning on making 
changes to your index more often than, oh, every 10 or 20 minutes, 
you may run into trouble. Solr isn't so good at frequent changes to the 
index like that.  I haven't looked at it myself, but the Solr patches 
that get called "near real-time" seem like they're intended to deal with 
this, among other things, and allow frequent commits without killing 
performance or RAM usage.
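
One way to live with that constraint is to collect changes on the application side and send them to Solr in batches, so that commits happen no more often than some chosen interval. The sketch below is an assumption about how that could look, not something described in this thread: the class and method names are hypothetical, and actually sending the batch to Solr and committing is left to the caller.

```python
import time


class PendingChanges:
    """Collects per-document down_vote changes and releases them in batches."""

    def __init__(self, min_commit_interval_s, clock=time.monotonic):
        self._interval = min_commit_interval_s
        self._clock = clock
        self._last_flush = clock()
        self._pending = {}

    def record(self, doc_id, down_votes):
        # Only the latest count per document matters, so later votes
        # overwrite earlier ones that have not been sent yet.
        self._pending[doc_id] = down_votes

    def flush_if_due(self):
        """Return the batch to send if the interval has elapsed, else None."""
        now = self._clock()
        if not self._pending or now - self._last_flush < self._interval:
            return None
        batch, self._pending = self._pending, {}
        self._last_flush = now
        return batch


if __name__ == "__main__":
    changes = PendingChanges(min_commit_interval_s=600)
    changes.record("doc-1", 1)
    changes.record("doc-1", 2)
    # Nothing is released until 600 seconds have passed since the last flush.
    print(changes.flush_if_due())
```

The trade-off is that a down vote takes up to one interval to show up in search results.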

I am not sure how/if other people are effectively dealing with 
user-generated content that needs to be included in the index for 
filtering and searching against. Would be very curious if anyone has any 
successful strategies to share. Another example would be user-generated 
tagging.


Re: Which is faster -- delete or update?

Posted by Erick Erickson <er...@gmail.com>.
Just deleting a document is faster because all that really happens
is the document is marked as deleted. An update is really
a delete followed by an add of the same document, so by definition
an update will be slower...
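
To make the difference concrete, here is a small sketch of the two messages in Solr's XML update format. A delete carries only the document's key, while an "update" re-sends the whole document in an add, and Solr replaces the existing document that has the same unique key. The helper names are made up for the example, and it assumes the unique key field is called "id".

```python
import xml.etree.ElementTree as ET


def delete_message(doc_id: str) -> str:
    """Build a delete-by-id message: only the key is sent."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def add_message(fields: dict) -> str:
    """Build an add message: every field of the document is sent again."""
    root = ET.Element("add")
    doc = ET.SubElement(root, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(delete_message("42"))
    print(add_message({"id": "42", "down_vote": 3}))
```

Either message would be posted to the update handler, and neither becomes visible to searches until a commit.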

But... does it really make a difference? How often do you expect this to
happen? Peter Karich added a note while I was typing this, and he
makes some cogent points.

I'm starting to think that I don't care about better unless and until my
users notice (or I have a reasonable expectation that they #will# notice).
I'm far more interested in simpler code that I can maintain than I am in
shaving off another 4 milliseconds from the response time. That gives
me more chance to put in cool new features that the user will notice...

Best
Erick


Re: Which is faster -- delete or update?

Posted by Peter Karich <pe...@yahoo.de>.
From the user perspective I wouldn't delete it, because the down-votes 
could have been cast by mistake or as spam or something, and up-voting 
could resurrect the document.
It could also be wise to keep the docs so you can see which content (from 
which users?) gets down-voted, which might help you find spam accounts.

From the dev perspective you should benchmark it, if really necessary. 
(I guess updating is more expensive because I think it is 
delete+completely-new-add.)

Regards,
Peter.
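
A benchmark along the lines suggested above could start from a small timing helper like this one. It is only a sketch: the helper name is made up, and the two operations timed in the usage part are placeholders standing in for whatever code actually sends the delete or the update (plus the commit) to Solr.

```python
import time
from statistics import median


def time_operation(operation, repetitions=5, timer=time.perf_counter):
    """Run operation several times and return the median duration in seconds."""
    durations = []
    for _ in range(repetitions):
        start = timer()
        operation()
        durations.append(timer() - start)
    return median(durations)


if __name__ == "__main__":
    # Placeholders: swap in real calls that delete or re-index a document
    # and then commit, so that commit and warming time are measured too.
    def fake_delete():
        time.sleep(0.01)

    def fake_update():
        time.sleep(0.02)

    print("delete:", time_operation(fake_delete))
    print("update:", time_operation(fake_update))
```

Measuring against a copy of the real index, with the real warming queries configured, gives numbers that mean more than a test against an empty core.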
