Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2015/05/01 00:11:06 UTC

[jira] [Commented] (SOLR-7490) Update by query feature

    [ https://issues.apache.org/jira/browse/SOLR-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522371#comment-14522371 ] 

Erick Erickson commented on SOLR-7490:
--------------------------------------

bq: Considering a query that qualifies everything, Solr ends up re-importing the whole data from itself which is basically an optimize operation I think

Not at all. An optimize does not, for instance, re-analyze all of the documents. How could it? Unless the data has stored="true", the original content is just _gone_. It just copies some binary bits around, a much simpler task. Perhaps not fast on a large corpus, but much faster than re-analyzing everything.
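
To make the stored-vs-indexed distinction concrete, here's a minimal SolrJ sketch (collection name, URL, and field names are all made up; it assumes a schema where "title" is stored="true" and "body" is indexed but stored="false"):

{code:java}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class StoredVsIndexed {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/demo");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("title", "a stored field");          // stored="true" in the schema
    doc.addField("body", "indexed only, not stored"); // stored="false" in the schema
    client.add(doc);
    client.commit();

    // The unstored field is still searchable...
    System.out.println(client.query(new SolrQuery("body:indexed")).getResults().getNumFound()); // 1

    // ...but its original text is gone, so there is nothing to re-analyze it from.
    SolrDocument fetched = client.query(new SolrQuery("id:1")).getResults().get(0);
    System.out.println(fetched.getFieldValue("title")); // "a stored field"
    System.out.println(fetched.getFieldValue("body"));  // null
    client.close();
  }
}
{code}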


bq: With atomic updates, as you say, we will be exposing the freedom of updating a huge set of documents in one request. We will be pushing Solr too much unless it is used wisely.

Not really the same thing at all IMO. It's much less surprising to write a program that re-indexes a bunch of data than to write a single statement that's the equivalent of SQL "update blah where blah" and doesn't return for, perhaps, hours.
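
In other words, the program you'd write today is an explicit fetch-then-atomic-update loop, and its cost is plainly visible. A minimal sketch (again with a made-up URL and field names; real code would page through the results with cursorMark rather than grabbing a single batch):

{code:java}
import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ManualUpdateByQuery {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/demo");

    // The "where" clause: fetch the ids of the docs to change.
    SolrQuery query = new SolrQuery("category:books");
    query.setFields("id");
    query.setRows(500);

    for (SolrDocument match : client.query(query).getResults()) {
      SolrInputDocument update = new SolrInputDocument();
      update.addField("id", match.getFieldValue("id"));
      // Atomic "set": Solr rebuilds the rest of the doc from its stored fields,
      // so every other field must be stored="true".
      update.addField("in_stock", Collections.singletonMap("set", false));
      client.add(update);
    }
    client.commit();
    client.close();
  }
}
{code}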

bq: But it seems to make it easy to change the schema without having to do anything after (basically change the schema and issue an update by query qualifying the whole index) which basically supports uptime re-indexing of a solr collection with new schema I guess.

I think you're still missing the point. There's no data to re-index _from_ unless the fields have stored="true".

And on any large corpus, this will essentially mean down-time anyway. Your server will be so hammered that it won't be serving any queries, or at least not quickly. But it is an interesting idea _if_ (and only if) you have all the data stored.

If you're making the argument that _if_ all fields are stored and _if_ you want to update a particular value for all docs that satisfy a query and _if_ you're willing to accept the risk of huge operations, then the work involved in an update-by-query and in atomic updates is roughly equal, I'll agree with you. But frankly the benefit is very marginal in my view, so specialized that I'd be reluctant to push it forward.
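
To put the two side by side: the existing one-request operation is deleteByQuery, and the proposal is essentially its update-shaped twin. Something like this (the updateByQuery call is purely hypothetical, no such SolrJ method exists; URL and field names are made up):

{code:java}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class OneRequestOperations {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/demo");

    // Exists today: a single request that deletes every matching document.
    client.deleteByQuery("category:books");

    // The proposal, sketched -- NOT a real SolrJ method. A single request like
    // this could grind away for hours on a large corpus, and can only work at
    // all if every field is stored="true":
    // client.updateByQuery("category:books", "set", "in_stock", "false");

    client.commit();
    client.close();
  }
}
{code}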

Feel free to disagree, of course; maybe others have a different opinion. 

> Update by query feature
> -----------------------
>
>                 Key: SOLR-7490
>                 URL: https://issues.apache.org/jira/browse/SOLR-7490
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Praneeth
>            Priority: Minor
>
> An update feature similar to the {{deleteByQuery}} would be very useful. Say the user wants to update a field of all documents in the index that match a given criterion. I have encountered this use case in my project, and it looks like it could be a useful first-class Solr/Lucene feature. I want to check whether this is something we would want to support in coming releases of Solr and Lucene, whether there are scenarios that would prevent us from doing this, feasibility, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org