Posted to solr-user@lucene.apache.org by Harold Frayman <ha...@guardian.co.uk> on 2012/02/15 21:47:35 UTC

update extracted docs

Hi

I have a Solr 3.5 database which is populated by using /update/extract
(configured pretty much as per the examples) and additional metadata. The
uploads are handled by a Perl-driven web app which uses WebService::Solr
(which POSTs behind the scenes). That all works fine.

When I come to update the metadata associated with the stored docs, again
using my Perl web app, I find the Solr doc (by id), amend or append all the
changed metadata and use /update to re-post the whole doc. Again that works
fine ... but I'm getting nervous because I'm not sure why it works.

If I try to update only the changed fields for a single doc, the unchanged
fields are removed. Slightly surprising, but if that's what I should
expect, it's not difficult to accept.

So how come using /update doesn't remove the text content (and the indexing
on it) which was originally obtained using /update/extract? And can I
depend on it being there in future, after optimization, for example?

And if I can't, what is the best technique for updating metadata under
these circumstances?

Harold Frayman

Please consider the environment before printing this email.
------------------------------------------------------------------
Visit guardian.co.uk - newspaper of the year

www.guardian.co.uk    www.observer.co.uk     www.guardiannews.com 

On your mobile, visit m.guardian.co.uk or download the Guardian
iPhone app www.guardian.co.uk/iphone
 
To save up to 30% when you subscribe to the Guardian and the Observer
visit www.guardian.co.uk/subscriber 
---------------------------------------------------------------------
This e-mail and all attachments are confidential and may also
be privileged. If you are not the named recipient, please notify
the sender and delete the e-mail and all attachments immediately.
Do not disclose the contents to another person. You may not use
the information for any purpose, or store, or copy, it in any way.
 
Guardian News & Media Limited is not liable for any computer
viruses or other material transmitted with or as part of this
e-mail. You should employ virus checking software.

Guardian News & Media Limited

A member of Guardian Media Group plc
Registered Office
PO Box 68164
Kings Place
90 York Way
London
N1P 2AP

Registered in England Number 908396

Re: update extracted docs

Posted by Emmanuel Espina <es...@gmail.com>.

Neither Solr nor Lucene updates documents in place. When you add a
document with the same id as an existing one, the old document is
deleted and replaced by the new one.
So if you create a document containing only the changed fields, with
the same id, and upload it, the old one will be erased and replaced
with the new one. So THAT behaviour is to be expected.
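
As a rough illustration of that replace-by-id behaviour, here is a toy
in-memory model in Python. It is not Solr or Lucene code: the ToyIndex
class and the field names "title" and "text" are made up for the sketch,
and it only mimics the "same id means whole-document replacement" rule.

```python
class ToyIndex:
    """Toy stand-in for an index keyed by unique id (hypothetical, not Solr code)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        # Adding a doc whose id already exists discards the old doc entirely;
        # nothing is merged field by field.
        self.docs[doc["id"]] = dict(doc)

    def get(self, doc_id):
        return self.docs.get(doc_id)


index = ToyIndex()
# First add: the full document, including the extracted text.
index.add({"id": "doc1", "title": "Old title", "text": "extracted body"})
# Second add: same id, but only the changed field.
index.add({"id": "doc1", "title": "New title"})
remaining = index.get("doc1")
```

After the second add, the "text" field is gone from the toy document,
which is the effect Harold saw when posting only the changed fields.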

To update a document you simply add the entire document again with
the modified fields. If that is an expensive procedure and you want to
avoid re-running the extraction, you can store all the fields,
retrieve the full document, create a new document with all the fields,
including the unmodified ones, and use the /update handler to add it
again.
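
A minimal sketch of that retrieve, merge and re-add cycle, in Python
rather than Perl. It assumes every field (including the extracted text)
is stored, so the retrieved document is complete. The helper names
merge_fields and build_add_xml and the field names "title" and "text" are
hypothetical; the sketch only builds the XML body that would be POSTed to
/update and makes no network calls.

```python
import xml.etree.ElementTree as ET


def merge_fields(stored_doc, changes):
    """Return a full copy of stored_doc with the changed fields overwritten."""
    merged = dict(stored_doc)
    merged.update(changes)
    return merged


def build_add_xml(doc):
    """Serialize one document as an XML <add> message for the /update handler."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A multi-valued field becomes one <field> element per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# 'stored' stands in for the document as retrieved from Solr by id.
stored = {"id": "doc1", "title": "Old title", "text": "extracted body"}
payload = build_add_xml(merge_fields(stored, {"title": "New title"}))
```

The resulting payload carries the unchanged "text" field along with the
new "title", so the replacement document loses nothing.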

Does that answer your question?

Thanks
Emmanuel





