Posted to dev@jackrabbit.apache.org by Bertrand Delacretaz <bd...@apache.org> on 2008/04/01 08:08:29 UTC

Re: JCR & thesis

On Mon, Mar 31, 2008 at 10:49 PM, Andreas Hartmann <an...@apache.org> wrote:

> ... Find all documents containing the XPath
>  //a[local-name() = 'xhtml' and namespace-uri() = 'http://...' and
>  starts-with(@href,'lenya-document:c2c38f30-ff68-11dc-9682-9dea3e2477d4')]
>  That would be typical to find links that would be broken after a
>  document is removed from the live site. I know that JCR doesn't support
>  this directly - I guess this is where XML DBs shine. With JCR, is it
>  necessary to traverse all documents and query the content using XPath,
>  or is there a better solution?...

That's a typical case where the content model makes all the
difference: if each link is a JCR Item (a soft or hard reference
property for example), instead of being embedded in the content,
finding them is very efficient.
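
To illustrate the lookup side, here is a minimal sketch that only builds a JCR 1.0 XPath statement. It assumes the links were stored in a multi-valued string property; the name "outgoingLinks" and the class and method names are hypothetical, chosen for illustration. In JCR XPath a comparison against a multi-valued property should match when any one of its values equals the literal.

```java
final class BrokenLinkQuery {

    // Hypothetical property name; any multi-valued string property would do.
    static final String LINKS_PROPERTY = "outgoingLinks";

    /** Quotes a value as an XPath string literal; an embedded apostrophe is doubled. */
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds a statement selecting every node whose link property contains targetUri. */
    static String incomingLinksQuery(String targetUri) {
        return "//*[@" + LINKS_PROPERTY + " = " + literal(targetUri) + "]";
    }

    // With a javax.jcr.Session at hand, the statement could be run roughly like this
    // (not executed here, since it needs a repository):
    //   QueryManager qm = session.getWorkspace().getQueryManager();
    //   NodeIterator hits = qm.createQuery(statement, Query.XPATH).execute().getNodes();
}
```

If the links were stored as real reference properties instead, Node.getReferences() on the target node would return the referring properties without any query.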

That might require some processing when saving documents, with the
benefit of a much richer content structure.

Such an example shows how hard it is to compare storage technologies,
and how important it is to publish the complete source code used for
tests, so that experts of each technology can have a look and comment
on what could be improved.

-Bertrand

Re: JCR & thesis

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Andreas,

On Tue, Apr 1, 2008 at 9:02 AM, Andreas Hartmann <an...@apache.org> wrote:
> ... just for my understanding: Before saving I would parse the document,
>  extract all internal links and add them to an "outgoingLinks" multi-value
>  property? This makes a lot of sense. We could even add this feature to
>  our current Lenya repository (we have multi-value meta data)....

That's how I would do it, either with one node per link, which would
allow for richer link metadata to be added later if needed, or with a
multi-value property as you suggest.
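
A minimal sketch of the save-time step for the multi-value variant, under stated assumptions: the document is well-formed XML, internal links carry the "lenya-document:" prefix seen in the example above, and the namespace URI is passed in by the caller because the thread elides it. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class OutgoingLinks {

    // Prefix taken from the example link in this thread.
    static final String INTERNAL_PREFIX = "lenya-document:";

    /** Returns the distinct internal link targets of all <a> elements in the given namespace. */
    static String[] extract(String xml, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Set<String> links = new LinkedHashSet<>(); // keeps document order, drops repeats
            NodeList anchors = doc.getElementsByTagNameNS(namespaceUri, "a");
            for (int i = 0; i < anchors.getLength(); i++) {
                // getAttribute returns "" when the attribute is absent
                String href = ((Element) anchors.item(i)).getAttribute("href");
                if (href.startsWith(INTERNAL_PREFIX)) {
                    links.add(href);
                }
            }
            return links.toArray(new String[0]);
        } catch (Exception e) {
            throw new IllegalArgumentException("Document could not be parsed", e);
        }
    }

    // Before saving, the result could be written with the JCR API, for example:
    //   node.setProperty("outgoingLinks", OutgoingLinks.extract(xml, namespaceUri));
}
```

For the one-node-per-link variant, the same loop would add a child node per link rather than collecting strings, which leaves room for per-link metadata.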

-Bertrand

Re: JCR & thesis

Posted by Andreas Hartmann <an...@apache.org>.

Hi Bertrand,

Bertrand Delacretaz wrote:
> On Mon, Mar 31, 2008 at 10:49 PM, Andreas Hartmann <an...@apache.org> wrote:
> 
>> ... Find all documents containing the XPath
>>  //a[local-name() = 'xhtml' and namespace-uri() = 'http://...' and
>>  starts-with(@href,'lenya-document:c2c38f30-ff68-11dc-9682-9dea3e2477d4')]
>>  That would be typical to find links that would be broken after a
>>  document is removed from the live site. I know that JCR doesn't support
>>  this directly - I guess this is where XML DBs shine. With JCR, is it
>>  necessary to traverse all documents and query the content using XPath,
>>  or is there a better solution?...
> 
> That's a typical case where the content model makes all the
> difference: if each link is a JCR Item (a soft or hard reference
> property for example), instead of being embedded in the content,
> finding them is very efficient.
> 
> That might require some processing when saving documents, with the
> benefit of a much richer content structure.

just for my understanding: Before saving I would parse the document, 
extract all internal links and add them to an "outgoingLinks" multi-value
property? This makes a lot of sense. We could even add this feature to 
our current Lenya repository (we have multi-value meta data). Thanks for 
the hint!

-- Andreas


-- 
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01