Posted to solr-user@lucene.apache.org by Thomas Scheffler <th...@uni-jena.de> on 2013/11/27 09:13:14 UTC
weak documents
Hi,
I am relatively new to SOLR and I am looking for a neat way to implement
weak documents with SOLR.
Whenever a document is updated or deleted, all its dependent documents
should be removed from the index. In other words, they exist only as long as
the document they refer to exists in the specific version it had when they
were indexed. On "update" they will be indexed after their master
document.
I would like to have some kind of "dependsOn" field that carries the
uniqueKey value of the master document.
Can this be done efficiently with SOLR?
I need this technique because on update and on delete I don't know how
many dependent documents exist in the SOLR index. Especially for batch
index processes, I need a more efficient way than querying before every
update or delete.
kind regards,
Thomas
Re: weak documents
Posted by Walter Underwood <wu...@wunderwood.org>.
Right. Delete by query "id:foo OR dependsOn:foo". --wunder
On Nov 27, 2013, at 6:23 AM, "Jack Krupansky" <ja...@basetechnology.com> wrote:
> Just bite the bullet and do the query at your application level. I mean, Solr/Lucene would have to do the same amount of work internally anyway. If the perceived performance overhead is too great, get beefier hardware.
>
> -- Jack Krupansky
--
Walter Underwood
wunder@wunderwood.org
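
A minimal sketch of the delete-by-query suggestion above, in Python. It only builds the XML update message; sending it to a Solr update handler is left out. The function names and the default field names "id" and "dependsOn" are assumptions for illustration, not something the thread prescribes.

```python
from xml.etree import ElementTree as ET


def quote_term(value: str) -> str:
    """Wrap a value in double quotes for a Solr query, escaping \\ and "."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def delete_master_and_dependents(master_id: str,
                                 id_field: str = "id",
                                 depends_field: str = "dependsOn") -> str:
    """Build a <delete> message that removes a master document and every
    document whose depends_field points at it."""
    phrase = quote_term(master_id)
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    # ElementTree escapes &, < and > in the text when serializing.
    query.text = f"{id_field}:{phrase} OR {depends_field}:{phrase}"
    return ET.tostring(delete, encoding="unicode")


if __name__ == "__main__":
    print(delete_master_and_dependents("foo"))
```

Quoting the id as a phrase keeps ids containing spaces or colons from being parsed as query syntax; whether that is needed depends on what the uniqueKey values look like.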
Re: weak documents
Posted by Jack Krupansky <ja...@basetechnology.com>.
Just bite the bullet and do the query at your application level. I mean,
Solr/Lucene would have to do the same amount of work internally anyway. If
the perceived performance overhead is too great, get beefier hardware.
-- Jack Krupansky
Re: weak documents
Posted by Thomas Scheffler <th...@uni-jena.de>.
On 27.11.2013 09:58, Paul Libbrecht wrote:
> Thomas,
>
> our experience with Curriki.org is that evaluating what I call the
> "related documents" is a procedure that needs access to the complete
> content and thus is run at the DB level and not the Solr level.
>
> For example, if a user changes a part of its name, we need to reindex
> all of his resources. Sure we could try to run a solr query for this,
> and maybe add index fields for it, but we felt it better to run this
> on the index-trigger side, the thing in our (XWiki) wiki which
> listens to changes and requests the reindexing of a few documents
> (including deletions).
>
> For the maintenance operation, the same issue has appeared. So, if
> the indexer or listener or solr has been down for a few minutes or
> hours, we'd need to reindex not only all changed documents but all
> changed documents and their related documents.
>
> If you are able to work through your solution that would be
> solr-only, to write down all depends-on at index time, it means you
> would index-update all "inverse related" documents every time that
> changes. For the relation above (documents of a user), it means the
> user documents need reindexing every time a new document is added. I
> wonder if this makes a difference at scale.
I think the two use cases differ a bit. At index time of my master document
I have all information about the dependent documents ready. So instead of
committing one document I commit, let's say, four.
In your case you have to query to get all documents of a user first.
Here is a more detailed use-case. I have metadata in 1 to n languages to
describe a document (e.g. journal article).
I commit a master document in a specified default language to SOLR and
one document for every language I have metadata for. If a user adds or
removes metadata (e.g. abstract in French) there is one document more or
one document less in SOLR. So their number changes and I don't want stale
data to be kept in the index.
A similar use case: I have article documents with authors. I create
"author" documents for every article. If someone adds or removes an
author I need to track that change. These "dumb" author documents are
used for an alphabetical person index and hold a unique field that is
used to group them, but these documents exist only as long as their
master documents do.
My two use cases are quite similar, so I would like this "weak
documents" functionality somehow.
SOLR knows that if a document is added with id=foo it has to replace a
document that matches id:"foo". If I can change this behavior to
dependsOn:"foo" I am done. :-D
regards
Thomas
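
The workflow described above (all dependent documents are known when the master is indexed) could be sketched in Python as two update messages sent in order: a delete-by-query on the "dependsOn" field, followed by an add of the master and its current dependents. This is a sketch under assumptions, not a built-in Solr feature: the helper names, the sample field names and the choice to send two separate messages are all hypothetical, and it relies on the master itself being replaced through the usual uniqueKey overwrite.

```python
from xml.etree import ElementTree as ET


def build_add(docs):
    """Serialize a list of {field: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def reindex_messages(master, dependents,
                     id_field="id", depends_field="dependsOn"):
    """Return (delete_message, add_message) for one master document.

    The delete removes every previously indexed dependent; the add then
    writes the master plus the dependents known at index time.
    """
    master_id = str(master[id_field])
    escaped = master_id.replace("\\", "\\\\").replace('"', '\\"')
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = f'{depends_field}:"{escaped}"'
    # Tag each dependent with the master's id so the next update finds it.
    tagged = [{**dep, depends_field: master_id} for dep in dependents]
    return ET.tostring(delete, encoding="unicode"), build_add([master] + tagged)


if __name__ == "__main__":
    delete_msg, add_msg = reindex_messages(
        {"id": "article1", "title": "Example article"},
        [{"id": "article1_fr", "abstract": "Résumé"}],
    )
    print(delete_msg)
    print(add_msg)
```

No query is needed before the update, because the delete matches however many dependents happen to be in the index. The cost is one extra delete-by-query per master document.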
Re: weak documents
Posted by Paul Libbrecht <pa...@hoplahup.net>.
Thomas,
our experience with Curriki.org is that evaluating what I call the "related documents" is a procedure that needs access to the complete content and thus is run at the DB level and not the Solr level.
For example, if a user changes a part of its name, we need to reindex all of his resources. Sure we could try to run a solr query for this, and maybe add index fields for it, but we felt it better to run this on the index-trigger side, the thing in our (XWiki) wiki which listens to changes and requests the reindexing of a few documents (including deletions).
For the maintenance operation, the same issue has appeared.
So, if the indexer or listener or solr has been down for a few minutes or hours, we'd need to reindex not only all changed documents but all changed documents and their related documents.
If you are able to work through your solution that would be solr-only, to write down all depends-on at index time, it means you would index-update all "inverse related" documents every time that changes. For the relation above (documents of a user), it means the user documents need reindexing every time a new document is added. I wonder if this makes a difference at scale.
Paul
Re: weak documents
Posted by Upayavira <uv...@odoko.co.uk>.
Just a guess, I haven't investigated them fully yet, but I wonder if
block joins could serve you here, as they involve creating docs in a
parent-child relationship.
Or, you could easily fake it:
<delete>
<id>abcd</id>
<query>parent:abcd</query>
</delete>
Not sure if that syntax is completely right, but using that sort of
thing would get you there for deletes, I think.
There isn't yet an update by query (batch update) feature, which
would be very useful.
Upayavira