Posted to solr-user@lucene.apache.org by farhanali <ar...@gmail.com> on 2008/01/10 14:28:52 UTC

Delte by multiple id problem

I have deleted a single id by sending a delete command to the solr server,
but I get an error when I try to delete multiple ids.
<delete>
<id>2</id>
<id>3</id>
<id>4</id>
</delete>

Is this the right syntax?
Does anybody have an idea?
-- 
View this message in context: http://www.nabble.com/Delte-by-multiple-id-problem-tp14733135p14733135.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Transactions and Solr Was: Re: Delte by multiple id problem

Posted by Chris Hostetter <ho...@fucit.org>.
: Suppose I wanted to use this log approach. Does anyone have
: suggestions about the best way to do it? The approach that first comes
: to mind is to store the log as a separate DB table, and to maintain

it largely depends on your DB schema and the mechanisms you use to update 
your data.  In most instances i deal with, we never actually "delete" rows
from the authoritative database table that controls the ID space -- we 
just mark those logical objects as deleted (using an enumerated status 
field) and we have another field that records the lastModified time of 
any logical object.

each batch run just looks for any logical object whose lastModified time 
is greater than the timestamp of the last batch run -- for each object, 
either reindex or delete depending on the status (in rare cases an object 
is modified after it is deleted, but sending a superfluous delete is 
almost inconsequential)

we have some other more complex datamodels that we deal with in more 
complex ways ... but the underlying theme is the same ... know when stuff 
changed, know what stuff is "live" and what stuff is "dead" ... keep a 
"lastRunTime" and compare everything to it.
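
A minimal sketch of that batch pass, for illustration only. It assumes a hypothetical "objects" table with "status" and "last_modified" columns (these names are invented here, not taken from any real schema) and uses an in-memory SQLite database just to stay self-contained:

```python
import sqlite3


def plan_batch(conn, last_run_time):
    """Return (to_reindex, to_delete) id lists for objects changed since last_run_time."""
    rows = conn.execute(
        "SELECT id, status FROM objects WHERE last_modified > ? ORDER BY id",
        (last_run_time,),
    ).fetchall()
    # Rows are never removed from the table; "deleted" is just a status value.
    to_reindex = [obj_id for obj_id, status in rows if status != "deleted"]
    to_delete = [obj_id for obj_id, status in rows if status == "deleted"]
    return to_reindex, to_delete


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, status TEXT, last_modified INTEGER)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        (1, "live", 100),     # untouched since the last run
        (2, "live", 250),     # modified since the last run: reindex
        (3, "deleted", 300),  # marked deleted since the last run: delete from Solr
    ],
)
to_reindex, to_delete = plan_batch(conn, last_run_time=200)
```

The two lists would then be turned into add and delete messages; after a successful run the stored lastRunTime is advanced.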



-Hoss


Re: Transactions and Solr Was: Re: Delte by multiple id problem

Posted by Chris Harris <ry...@gmail.com>.
Suppose I wanted to use this log approach. Does anyone have
suggestions about the best way to do it? The approach that first comes
to mind is to store the log as a separate DB table, and to maintain
that table using a DB trigger attached to the underlying source data
table. This is clearly not the only option, though. Perhaps sometimes
it would be better to maintain the log as a text file, rather than in
the DB. Or perhaps the log should be maintained at the application
layer, not the DB layer. Or with a stored procedure, rather than a
trigger.

On Jan 16, 2008 9:50 PM, Chris Hostetter <ho...@fucit.org> wrote:
>
>> Does anyone have more experience doing [transactional Solr] and
>> want to share?
>
> My advice: don't.
>
> I work with (or work with people who work with) about two dozen Solr
> indexes -- we don't attempt to update a single one of them in any sort of
> transactional way. ... Some of them are updated in batch (ie: once every N minutes
> code checks a log of all logical objects modified/deleted from the DB and
> sends the adds/deletes to Solr); ...

Re: Transactions and Solr Was: Re: Delte by multiple id problem

Posted by Chris Hostetter <ho...@fucit.org>.
: Does anyone have more experience doing this kind of stuff and want to share?

My advice: don't.

I work with (or work with people who work with) about two dozen Solr 
indexes -- we don't attempt to update a single one of them in any sort of 
transactional way.  Some of them are updated "real time" (ie: as soon as 
the authoritative DB is updated by some code, the same code updates the 
Solr index); some of them are updated in batch (ie: once every N minutes 
code checks a log of all logical objects modified/deleted from the DB and 
sends the adds/deletes to Solr); and some are only ever rebuilt from scratch 
every N hours (because the data in them isn't very time sensitive and 
rebuilding from scratch is easier than dealing with incremental or batch 
updates).

But as i said: we never attempt to be transactional about it, for a few 
reasons:
  1) why should it be part of the transaction?  a Solr index is a 
denormalized/inverted index of data .. why should a tool (or any other 
process) be prevented from writing to an authoritative data store just 
because a non authoritative copy of that data can't be updated?  ... if 
you used MySQL with replication, would you really want to block all writes 
to the master just because there's a glitch in replicating to a slave?
  2) why worry about it?  It's really a non issue.  If an add or 
delete fails it's usually either developer error (ie: the code 
generating your add statements thinks there's a field that doesn't 
exist), a transient timeout (maybe because of a commit in progress) or 
network glitch (have the client retry once or twice), or in very rare 
instances the whole Solr index was completely jacked (either from disk 
failure, or OOM due to a huge spike in load) and we want to revert 
to a backup of the index in the short term and rebuild the index from 
scratch to play it safe.
  3) why limit yourself?  you're going to want the ability to trigger 
arbitrary indexing of your data objects at anytime -- if for no other 
reason than so when you decide to add a field to your index you can 
reindex them all -- so why make your index updating code inherently tied 
to your DB updating code?
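
The "have the client retry once or twice" advice in point 2 could look roughly like the sketch below. The function name, the "send" callable and the retry counts are all assumptions made for illustration; "send" stands in for whatever function actually posts a message to Solr:

```python
import time


def send_with_retry(send, message, retries=2, delay_seconds=1.0):
    """Call send(message); on a network-level error, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return send(message)
        except OSError:
            # TimeoutError and ConnectionError are both subclasses of OSError.
            if attempt == retries:
                raise  # out of retries: let the caller log it and move on
            time.sleep(delay_seconds)
```

Developer errors (such as a field that doesn't exist) are deliberately not retried here, since sending the same bad message again would fail the same way.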


As for your specific question along the lines of "why can't we do a 
mix of <add>s and <delete>s all as part of one update message?" the answer 
is "because no one ever wrote any code to parse messages like that."  BUT! 
... that's not the question you really want to ask.  the question you 
really want to ask is: "*IF* someone wrote code to allow a mix of <add>s 
and <delete>s all as part of one update message, would it solve my problem 
of wanting to be able to modify my solr index transactionally?" and the 
answer is "No."  Even if Solr accepted update messages that looked 
like this...

    <update>
       <delete><id>42</id></delete>
       <add><field name="id">7</field><field name="a">bb</field></add>
       <add><field name="id">666</field><field name="a">cccc</field></add>
    </update>

...the low level lucene calls that it would be doing internally still 
aren't transactional, so the first "delete" and "add" might succeed, but 
if there was then some kind of internal error, or a timeout because the 
first add took a while (maybe it triggered a segment merge) and the second 
add didn't happen -- the first two commands would have still been 
executed, and there would be no way to "rollback".

In a nutshell: you would be no better off than if your client code had 
sent all three as separate update messages.
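
To make the point concrete, here is a toy simulation (plain Python, no Solr or Lucene involved) of commands applied one after another with no rollback. The dict standing in for the index and the fail_at parameter are purely illustrative assumptions:

```python
def apply_commands(index, commands, fail_at=None):
    """Apply (op, doc_id, doc) commands in order; nothing is undone on failure."""
    for position, (op, doc_id, doc) in enumerate(commands):
        if position == fail_at:
            raise RuntimeError("simulated internal error")
        if op == "delete":
            index.pop(doc_id, None)
        elif op == "add":
            index[doc_id] = doc


# A dict plays the role of the index; it starts with one document.
index = {"42": {"a": "old"}}
commands = [
    ("delete", "42", None),
    ("add", "7", {"a": "bb"}),
    ("add", "666", {"a": "cccc"}),
]
try:
    apply_commands(index, commands, fail_at=2)  # the third command fails
except RuntimeError:
    pass
# The delete and the first add have already taken effect.
```

After the failure the index holds document 7 and no longer holds document 42, exactly as if the three commands had been sent as separate messages.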


-Hoss


Transactions and Solr Was: Re: Delte by multiple id problem

Posted by Leonardo Santagada <sa...@gmail.com>.
On 12/01/2008, at 11:24, Norberto Meijome wrote:

> On Fri, 11 Jan 2008 00:43:19 -0200
> Leonardo Santagada <sa...@gmail.com> wrote:
>
>> No, actually my problem is that the solr index is mirroring data on a
>> database (a Zope app to be more accurate) so it would be better if I
>> could send the whole transaction together so I don't have to keep it
>> on separate files... which I have to do so I can not send anything if
>> the transaction is aborted (I can't abort a solr add right?).
>>
>> Maybe I should explain more, but I think this is pretty common to
>> anyone trying to keep database transactions and a solr index in sync,
>> as solr doesn't support two phase commit or anything like that.
>
> Hola Leonardo,
> I haven't had to do this, but I am starting to design something  
> along these lines.
>
> if you execute your 'add' and 'deletes' from a stored proc, inside a  
> transaction, you can simply have an extra table with Solr doc ids  
> and the action to perform (add / delete).
> eg,
> exec(delete_from_my_db('xyz')) ->
> begin transaction
> {do here all your DB work}
> {add to tblSolrWork the ID to delete}
> end transaction
> Hence, if the transaction fails, those records will never actually  
> exist.

It's not that simple. For example, another add with the same unique key  
should remove that key from the pending deletes, and then you store the  
whole data twice so you know what to send to solr. You also have to save  
a serial number for each transaction so the adds and the deletes are  
sent in the right order. And having one table that manages this in the  
same relational database could mean a big drop in performance, as  
everything you do on your db would lock, write to and read from a  
single table or a couple of tables, which makes your life a living hell  
too :).
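
One way to picture the bookkeeping described above is the sketch below, which keeps only the most recent operation per unique key and emits the survivors in serial order. The (serial, op, doc_id) tuple layout and the function name are assumptions for illustration, not something any existing tool provides:

```python
def collapse(log):
    """Keep the highest-serial operation per doc_id, returned in serial order.

    `log` is an iterable of (serial, op, doc_id) tuples with unique serials.
    """
    latest = {}
    for serial, op, doc_id in sorted(log):
        latest[doc_id] = (serial, op)  # a later entry replaces an earlier one
    ordered = sorted(latest.items(), key=lambda item: item[1][0])
    return [(op, doc_id) for doc_id, (serial, op) in ordered]


pending = collapse([
    (1, "delete", "a"),
    (2, "add", "b"),
    (3, "add", "a"),  # supersedes the delete of "a" logged at serial 1
])
```

Here the delete of "a" disappears because a later add reuses the same key, which is the case that makes a naive work table awkward.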

What I am doing on Zope is firing some events when new documents are  
added, updated or removed, and then I join the transaction with my  
transaction manager, which orders the adds to solr and saves an  
xml file to be sent to solr. The problems with this are the ones  
mentioned; it would be simpler if the same file could send all types  
of commands to solr (add and delete are the ones I am using).

> Whether and how you could do this in Zope, I have no idea, but if  
> you solve it it would be great if you could share it here .
>
> You could also make use of  triggers (on insert / update and  
> onDelete triggers), but I suppose that is a bit more DB dependent  
> than plain SP work - though it may be simpler to implement than  
> changing all your code to call the SP instead of direct SQL cmds...

Probably, but I think it would still hit performance really hard on a  
relational database that has a lot more than documents in it.

Does anyone have more experience doing this kind of stuff and want  
to share?

--
Leonardo Santagada




Re: Delte by multiple id problem

Posted by Norberto Meijome <fr...@meijome.net>.
On Fri, 11 Jan 2008 00:43:19 -0200
Leonardo Santagada <sa...@gmail.com> wrote:

> No, actually my problem is that the solr index is mirroring data on a  
> database (a Zope app to be more accurate) so it would be better if I  
> could send the whole transaction together so I don't have to keep it  
> on separate files... which I have to do so I can not send anything if  
> the transaction is aborted (I can't abort a solr add right?).
> 
> Maybe I should explain more, but I think this is pretty common to  
> anyone trying to keep database transactions and a solr index in sync,  
> as solr doesn't support two phase commit or anything like that.

Hola Leonardo,
I haven't had to do this, but I am starting to design something along these lines.

if you execute your 'add' and 'deletes' from a stored proc, inside a transaction, you can simply have an extra table with Solr doc ids and the action to perform (add / delete). 
eg, 
exec(delete_from_my_db('xyz')) ->
begin transaction
 {do here all your DB work}
 {add to tblSolrWork the ID to delete}
end transaction
Hence, if the transaction fails, those records will never actually exist.

You can then have a process that every x seconds/minutes/hours (depending on your needs) scans this tblSolrWork table and performs whatever adds or deletes are needed. Of course, for the add, you'll also need to get the information to add from somewhere, but I imagine you already do that.
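
A rough sketch of the work-table idea, using SQLite in place of a stored procedure so it can run anywhere. The "docs" table, the columns of tblSolrWork and the "fail" flag are assumptions invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE tblSolrWork ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, doc_id TEXT, action TEXT)"
)
conn.execute("INSERT INTO docs VALUES ('xyz')")
conn.commit()


def delete_from_my_db(conn, doc_id, fail=False):
    """Delete a row and queue the matching Solr delete in one DB transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("DELETE FROM docs WHERE id = ?", (doc_id,))
            conn.execute(
                "INSERT INTO tblSolrWork (doc_id, action) VALUES (?, 'delete')",
                (doc_id,),
            )
            if fail:
                raise RuntimeError("simulated failure inside the transaction")
    except RuntimeError:
        return False
    return True


aborted = delete_from_my_db(conn, "xyz", fail=True)
queued_after_abort = conn.execute("SELECT COUNT(*) FROM tblSolrWork").fetchone()[0]
committed = delete_from_my_db(conn, "xyz")
queued = conn.execute("SELECT doc_id, action FROM tblSolrWork ORDER BY seq").fetchall()
```

The aborted call leaves nothing in tblSolrWork, so the scanning process never sees work for a transaction that did not commit.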

Whether and how you could do this in Zope, I have no idea, but if you solve it it would be great if you could share it here .

You could also make use of  triggers (on insert / update and onDelete triggers), but I suppose that is a bit more DB dependent than plain SP work - though it may be simpler to implement than changing all your code to call the SP instead of direct SQL cmds...

good luck,
B
_________________________
{Beto|Norberto|Numard} Meijome

"He can compress the most words into the smallest idea of any man I know."
  Abraham Lincoln

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.

Re: Delte by multiple id problem

Posted by Leonardo Santagada <sa...@gmail.com>.
On 10/01/2008, at 22:51, Chris Hostetter wrote:

>
> : Can this be done on the same request? Because I am sending them on
> : different requests... which sucks... already dividing inserts and
> : deletes in 2 requests is bad enough.
>
> if you use persistent HTTP connections (ie: keep-alive) the network
> overhead goes away, and there should be negligible difference between
> one request per delete and 'batched' deletes (let alone 2 requests:
> one for inserts and 1 for deletes).
>
> incidentally: you aren't deleting docs just so you can add new
> versions of them are you?  assuming you have a uniqueKey field you can
> just re-add the docs and Solr will take care of deleting the old
> version.


No, actually my problem is that the solr index is mirroring data on a  
database (a Zope app to be more accurate) so it would be better if I  
could send the whole transaction together so I don't have to keep it  
on separate files... which I have to do so I can not send anything if  
the transaction is aborted (I can't abort a solr add right?).

Maybe I should explain more, but I think this is pretty common to  
anyone trying to keep database transactions and a solr index in sync,  
as solr doesn't support two phase commit or anything like that.


--
Leonardo Santagada




Re: Delte by multiple id problem

Posted by Chris Hostetter <ho...@fucit.org>.
: Can this be done on the same request? Because I am sending them on different
: requests... which sucks... already dividing inserts and deletes in 2 requests is
: bad enough.

if you use persistent HTTP connections (ie: keep-alive) the network 
overhead goes away, and there should be negligible difference between one 
request per delete and 'batched' deletes (let alone 2 requests: one for 
inserts and 1 for deletes).

incidentally: you aren't deleting docs just so you can add new versions of 
them are you?  assuming you have a uniqueKey field you can just re-add 
the docs and Solr will take care of deleting the old version.
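
As a sketch of the keep-alive suggestion: Python's http.client reuses a single connection for consecutive requests as long as each response is read completely. The host, port and the "/solr/update" path below are assumptions about a default local install, and the function names are made up for the example:

```python
import http.client


def open_solr(host="localhost", port=8983):
    """Create a connection object; the socket is opened on the first request."""
    return http.client.HTTPConnection(host, port)


def post_updates(conn, messages, path="/solr/update"):
    """POST each XML message over the same connection; return the status codes."""
    statuses = []
    for message in messages:
        conn.request(
            "POST",
            path,
            body=message.encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        response = conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        statuses.append(response.status)
    return statuses


# Usage against a running server (not executed here):
#   conn = open_solr()
#   post_updates(conn, ["<delete><id>2</id></delete>", "<commit/>"])
#   conn.close()
```

Each message still goes out as its own request, but the TCP connection is set up only once.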



-Hoss


Re: Delte by multiple id problem

Posted by Ryan McKinley <ry...@gmail.com>.
> 
> Has Solr 1.3 been released, or do we have to wait?
> 

not released yet -- it is the nightly build....


Re: Delte by multiple id problem

Posted by farhanali <ar...@gmail.com>.


ryantxu wrote:
> 
> 
>>> eg
>>> <delete><id>2</id></delete>
>>> <delete><id>3</id></delete>
>>> <delete><id>4</id></delete>
>> 
>> Can this be done on the same request? Because I am sending them on 
>> different requests... which sucks... already dividing inserts and deletes 
>> in 2 requests is bad enough.
>> 
> 
> In 1.2, you have to do one command per request; in 1.3 you can do multiple:
> 
> See SOLR-133 in
> http://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt
> 
> 


Has Solr 1.3 been released, or do we have to wait?



Re: Delte by multiple id problem

Posted by Ryan McKinley <ry...@gmail.com>.
>> eg
>> <delete><id>2</id></delete>
>> <delete><id>3</id></delete>
>> <delete><id>4</id></delete>
> 
> Can this be done on the same request? Because I am sending them on 
> different requests... which sucks... already dividing inserts and deletes 
> in 2 requests is bad enough.
> 

In 1.2, you have to do one command per request; in 1.3 you can do multiple:

See SOLR-133 in
http://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt

Re: Delte by multiple id problem

Posted by Leonardo Santagada <sa...@gmail.com>.
On 10/01/2008, at 17:04, Norberto Meijome wrote:

> On Thu, 10 Jan 2008 05:28:52 -0800 (PST)
> farhanali <ar...@gmail.com> wrote:
>
>> I have deleted a single id by sending a delete command to the solr server,
>> but I get an error when I try to delete multiple ids.
>> <delete>
>> <id>2</id>
>> <id>3</id>
>> <id>4</id>
>> </delete>
>>
>> Is this the right syntax?
>> Does anybody have an idea?
>
> AFAIK, at this time, you need to issue deletes with only 1 id per  
> command.
> eg
> <delete><id>2</id></delete>
> <delete><id>3</id></delete>
> <delete><id>4</id></delete>

Can this be done on the same request? Because I am sending them on  
different requests... which sucks... already dividing inserts and deletes  
in 2 requests is bad enough.

--
Leonardo Santagada




Re: Delte by multiple id problem

Posted by Norberto Meijome <fr...@meijome.net>.
On Thu, 10 Jan 2008 05:28:52 -0800 (PST)
farhanali <ar...@gmail.com> wrote:

> I have deleted a single id by sending a delete command to the solr server,
> but I get an error when I try to delete multiple ids.
> <delete>
> <id>2</id>
> <id>3</id>
> <id>4</id>
> </delete>
> 
> Is this the right syntax?
> Does anybody have an idea?

AFAIK, at this time, you need to issue deletes with only 1 id per command.
eg
<delete><id>2</id></delete>
<delete><id>3</id></delete>
<delete><id>4</id></delete>
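
If the ids come from code, a small helper along these lines could generate one delete command per id. The function name is made up for the example, and escaping is included in case an id contains XML special characters:

```python
from xml.sax.saxutils import escape


def delete_messages(ids):
    """Build one <delete> command per id, escaping XML special characters."""
    return ["<delete><id>%s</id></delete>" % escape(str(doc_id)) for doc_id in ids]


messages = delete_messages([2, 3, 4])
```

Each string in the resulting list is then sent as its own command.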

_________________________
{Beto|Norberto|Numard} Meijome

Life is not measured by the number of breaths we take, but by the moments that take our breath away.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.