Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2003/11/30 21:11:43 UTC

[RT] Converging the repository concept in cocoon

I'm working on Doco and I finished my first phase: I have a repository 
that I like and does what I need. It's Slide, in case you haven't 
noticed ;-)

So, now I have a WebDAV/DeltaV/DASL/ACL repository and I want to 
connect to it.

There has been a lot of work in the area of "Repository API" lately, 
both inside and outside cocoon.

Cocoon currently hosts four different repository concepts:

  1) two in the linotype block
  2) one in the slide block
  3) one in the repository block (which is a refactoring of the 
SourceRepository in linotype)

the linotype repository is a big-time hack: it does what linotype 
needed, but it's not reusable outside of it (concerns overlap in its 
interface). The SourceRepository is an implementation of the linotype 
Repository over a source instead of over a file system. While nicer, 
it inherits all the problems of the original interface. It does 
versioning but it doesn't do properties or property querying.

the repository in the slide block uses slide directly and, mostly, for 
authentication purposes... it's based on an older version of slide, 
doesn't handle versioning, and doesn't handle file properties. It's based 
on actions, generators and transformers. To me, it looks old, and the need 
to have the repository on the local machine (and keep it opaque to the 
outside world) makes it impossible to use for what I need.

the one in the repository block is the cleanest one, but IMO, its 
design is backwards. I'll explain what I mean in a second.

For now, I think it's a must that, just as we did with forms, we take a 
look at the various approaches and choose one to follow and ignore the 
other ones.

I think the repository block is the best effort, but it needs 
substantial redesign.

                                          - o -

First of all, let me introduce what I mean by a "repository".

A repository is a place where I store my content.

Functionality I need is:

  1) open/save document
  2) create collection of documents
  3) attach metadata to documents (externally to them!!)
  4) query the repository against document metadata
  5) versioning (autoversioning on saving and version update)

how all these functionalities are implemented should *NOT* be my 
concern, nor do I want it to be when I'm using the repository.
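
To make the five points above concrete, here is a minimal sketch of what 
such a contract could look like. Everything in it is an assumption made 
for illustration: the interface name, the method names and the toy 
in-memory implementation are hypothetical and do not come from linotype, 
the slide block or the repository block. The point is only that the caller 
sees the operations and never the storage behind them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical contract: these names are illustrative, not an existing Cocoon API.
interface Repository {
    void createCollection(String path);                        // 2) collections
    boolean isCollection(String path);
    int save(String path, String content);                     // 1) + 5) returns the new version number
    String open(String path);                                  // 1) latest version
    String open(String path, int version);                     // 5) a specific version
    void setProperty(String path, String name, String value);  // 3) metadata kept outside the document
    List<String> query(String name, String value);             // 4) paths whose property matches
}

// Toy in-memory implementation; a real one would delegate to Slide, a file system, etc.
class InMemoryRepository implements Repository {
    private final Set<String> collections = new HashSet<>();
    private final Map<String, List<String>> versions = new TreeMap<>();
    private final Map<String, Map<String, String>> properties = new HashMap<>();

    public void createCollection(String path) { collections.add(path); }

    public boolean isCollection(String path) { return collections.contains(path); }

    public int save(String path, String content) {
        List<String> history = versions.computeIfAbsent(path, p -> new ArrayList<>());
        history.add(content);      // autoversioning: every save appends a new version
        return history.size();     // versions are numbered from 1
    }

    public String open(String path) {
        return open(path, history(path).size());
    }

    public String open(String path, int version) {
        List<String> history = history(path);
        if (version < 1 || version > history.size()) {
            throw new IllegalArgumentException("no version " + version + " of " + path);
        }
        return history.get(version - 1);
    }

    public void setProperty(String path, String name, String value) {
        properties.computeIfAbsent(path, p -> new HashMap<>()).put(name, value);
    }

    public List<String> query(String name, String value) {
        List<String> hits = new ArrayList<>();
        for (String path : versions.keySet()) {   // TreeMap keeps paths sorted
            Map<String, String> props = properties.get(path);
            if (props != null && value.equals(props.get(name))) {
                hits.add(path);
            }
        }
        return hits;
    }

    private List<String> history(String path) {
        List<String> history = versions.get(path);
        if (history == null) {
            throw new IllegalArgumentException("no such document: " + path);
        }
        return history;
    }
}
```

Code written against Repository would not change if InMemoryRepository 
were swapped for another implementation, which is the property asked for 
above.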

The linotype repository uses this design, while the one in the 
repository block does not.

Why not? well, it's fully based on sources and tries to obtain the 
above functionalities from the source abstractions. This means that the 
contract is not on the API but on the source URL.... but this also 
means that we cannot fully separate concerns since it's the driver of 
the repository who chooses which source the repository needs to write 
on.

I strongly dislike this design because I think it got it all backwards: 
it should be the Repository that implements Source and gives source 
access to those components that want to access content (say a 
FileGenerator or even a TraxTransformer).

I looked into the repository block and I found a *lot* of things 
(locking, permissions, properties) that look very much like a 
duplication of effort. The Slide project spent years optimizing and 
polishing issues like transactionality and locking. Do you really want 
to implement a layer to "emulate" those things in case the given source 
is not capable of handling them itself?

I think a much better approach would be to come up with a

  Repository.java

interface and a few implementations that I can choose when I install 
cocoon. This implementation would also implement Source.java and 
provide its functionality thru a URL protocol.

This allows:

  - clear separation of concerns: cocoon should *NOT* be doing 
repository stuff, which is already big and complex enough

  - complete IoC: you choose the implementation and the implementation 
decides what to do and how to do it. Your contract remains the same 
(thru the source-provided URL protocol and thru the component 
interface)

  - transparent polymorphism: you can have different implementations of 
a repository... file system, webdav, CVS, JCR, ... without having to 
change any code in your application
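
A rough sketch of the direction described above, under stated 
assumptions: SimpleSource is a deliberately simplified stand-in and NOT 
the real Source interface, and ContentRepository, MapRepository, 
RepositorySourceFactory, TextConsumer and the "repo:" protocol name are 
all hypothetical. It shows the repository side handing out sources, so 
that a consumer only knows a URL and never picks the storage itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a source abstraction; not the real Source interface.
interface SimpleSource {
    String getURI();
    InputStream getInputStream();
}

// Hypothetical repository contract, reduced to what this sketch needs.
interface ContentRepository {
    byte[] read(String path);
}

// One possible implementation; a WebDAV or CVS one would plug in the same way.
class MapRepository implements ContentRepository {
    private final Map<String, byte[]> docs = new HashMap<>();

    void put(String path, String content) {
        docs.put(path, content.getBytes(StandardCharsets.UTF_8));
    }

    public byte[] read(String path) {
        byte[] data = docs.get(path);
        if (data == null) {
            throw new IllegalArgumentException("not found: " + path);
        }
        return data;
    }
}

// The repository side resolves URLs into sources; "repo:" is an assumed protocol name.
class RepositorySourceFactory {
    static final String SCHEME = "repo:";
    private final ContentRepository repository;

    RepositorySourceFactory(ContentRepository repository) {
        this.repository = repository;
    }

    SimpleSource resolve(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("unsupported protocol: " + uri);
        }
        final byte[] data = repository.read(uri.substring(SCHEME.length()));
        return new SimpleSource() {
            public String getURI() { return uri; }
            public InputStream getInputStream() { return new ByteArrayInputStream(data); }
        };
    }
}

// Stand-in for a consumer such as a generator: it sees a source, not a repository.
class TextConsumer {
    static String read(SimpleSource source) {
        try (InputStream in = source.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swapping MapRepository for another ContentRepository leaves TextConsumer 
and every "repo:" URL untouched.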

Thoughts?

--
Stefano.

Re: [RT] Converging the repository concept in cocoon

Posted by Guido Casper <gc...@s-und-n.de>.

Stefano Mazzocchi wrote:
> On 1 Dec 2003, at 10:35, Guido Casper wrote:
>
>> The challenge is to find the right balance.
>
> That's *always* tough. On anything. ;-)

Very true.
This reminds me (again) of Rickard Öberg's second principle for making
software frameworks:
"If you make a decision, be sure that it counts"
leaving me wondering what that particularly has to do with software :-)

Guido

Re: [RT] Converging the repository concept in cocoon

Posted by Stefano Mazzocchi <st...@apache.org>.

On 1 Dec 2003, at 10:35, Guido Casper wrote:

> The challenge is to find the right balance.

That's *always* tough. On anything. ;-)

--
Stefano.

Re: [RT] Converging the repository concept in cocoon

Posted by Guido Casper <gc...@s-und-n.de>.

Gianugo Rabellino wrote:
> Stefano Mazzocchi wrote:
>> I think a much better approach would be to come up with a
>>
>>  Repository.java
>>
>> interface and a few implementations that I can choose when I install
>> cocoon. This implementation would also implement Source.java and
>> provide its functionality thru a URL protocol.
>>
>> This allows:
>>
>>  - clear separation of concerns: cocoon should *NOT* be doing
>> repository stuff, which is already big and complex enough
>>
>>  - complete IoC: you choose the implementation and the implementation
>> decides what to do and how to do it. Your contract remains the same
>> (thru the source-provided URL protocol and thru the component
>> interface)
>>
>>  - transparent polymorphism: you can have different implementations
>> of a repository... file system, webdav, CVS, JCR, ... without having
>> to change any code in your application
>>
>> Thoughts?
>
> A couple:
>
> 1. How do you plan to deal with a Source (which becomes a URL in the
> end) with complex stuff such as versioning or, even worse, searching?
> I'm afraid we'll come up with a very ugly URL design, I can't really
> think of a way to express searches in a URL, where a search has at
> least four parameters (what, scope, conditions, ordering) without
> resorting to URL parameters which are IMO very bad;

Yes, you don't necessarily want to access all functionality through the
Source interface but want to have another access path through the
Repository interface (to be used by the flow layer etc.).
IIUC this is how Stefano wants it to be as well.

>
> 2. Though I'd just _love_ to see it happen, I'm afraid that it will be
> practically impossible to have different, pluggable, implementations.
> I cannot think, apart from JCR and WebDAV, of a repository
> implemented on
> top of other stuff without hacks or heavy implementation (metadata,
> searching and versioning are all hard stuff to do).
>
> Note: I'm definitely +1 on having one standardized approach to the
> repository issue. I'm just not that sure that we can really make it
> with what we have today without mimicking the JCR API, which could be
> suboptimal.

I second that.
If we want to have pluggable implementations we need an
abstraction/wrapper of a higher level than just reusing what's already
there, like the JCR. JCR may be another plug, but IMO not the only one we
are relying on (don't know whether the JCR will be the ideal WebDAV API
;-).
Still some things will always be specific to a particular
implementation. The challenge is to find the right balance.

Guido

Re: [RT] Converging the repository concept in cocoon

Posted by Stefano Mazzocchi <st...@apache.org>.

On 30 Nov 2003, at 22:10, Gianugo Rabellino wrote:

> Stefano Mazzocchi wrote:
>> I think a much better approach would be to come up with a
>>  Repository.java
>> interface and a few implementations that I can choose when I install 
>> cocoon. This implementation would also implement Source.java and 
>> provide its functionality thru a URL protocol.
>> This allows:
>>  - clear separation of concerns: cocoon should *NOT* be doing 
>> repository stuff, which is already big and complex enough
>>  - complete IoC: you choose the implementation and the implementation 
>> decides what to do and how to do it. Your contract remains the same 
>> (thru the source-provided URL protocol and thru the component 
>> interface)
>>  - transparent polymorphism: you can have different implementations 
>> of a repository... file system, webdav, CVS, JCR, ... without having 
>> to change any code in your application
>> Thoughts?
>
> A couple:
>
> 1. How do you plan to deal with a Source (which becomes a URL in the 
> end) with complex stuff such as versioning or, even worse, searching?

eheh, I have an idea on how to do this nicely.

> I'm afraid we'll come up with a very ugly URL design, I can't really 
> think of a way to express searches in a URL, where a search has at 
> least four parameters (what, scope, conditions, ordering) without 
> resorting to URL parameters which are IMO very bad;

Very true. In fact, I'm *not* proposing this.

> 2. Though I'd just _love_ to see it happen, I'm afraid that it will be 
> practically impossible to have different, pluggable, implementations. 
> I cannot think, apart from JCR and WebDAV, of a repository implemented 
> on top of other stuff without hacks or heavy implementation (metadata, 
> searching and versioning are all hard stuff to do).

I need *one* implementation and I want to keep concerns separate. What 
happens next, well, it's not really my problem. If there will be only one 
implementation... well, that's because Darwin didn't need another 
one... but as long as the architecture allows it, well, it's fine.

> Note: I'm definitely +1 on having one standardized approach to the 
> repository issue. I'm just not that sure that we can really make it 
> with what we have today without mimicking the JCR API, which could be 
> suboptimal.

See my followup.

--
Stefano.

Re: [RT] Converging the repository concept in cocoon

Posted by Gianugo Rabellino <gi...@apache.org>.

Stefano Mazzocchi wrote:
> I think a much better approach would be to come up with a
> 
>  Repository.java
> 
> interface and a few implementations that I can choose when I install 
> cocoon. This implementation would also implement Source.java and provide 
> its functionality thru a URL protocol.
> 
> This allows:
> 
>  - clear separation of concerns: cocoon should *NOT* be doing repository 
> stuff, which is already big and complex enough
> 
>  - complete IoC: you choose the implementation and the implementation 
> decides what to do and how to do it. Your contract remains the same 
> (thru the source-provided URL protocol and thru the component interface)
> 
>  - transparent polymorphism: you can have different implementations of a 
> repository... file system, webdav, CVS, JCR, ... without having to 
> change any code in your application
> 
> Thoughts?

A couple:

1. How do you plan to deal with a Source (which becomes a URL in the 
end) with complex stuff such as versioning or, even worse, searching? 
I'm afraid we'll come up with a very ugly URL design. I can't really 
think of a way to express searches in a URL, where a search has at least 
four parameters (what, scope, conditions, ordering), without resorting to 
URL parameters, which are IMO very bad.

2. Though I'd just _love_ to see it happen, I'm afraid that it will be 
practically impossible to have different, pluggable, implementations. I 
cannot think, apart from JCR and WebDAV, of a repository implemented on 
top of other stuff without hacks or heavy implementation (metadata, 
searching and versioning are all hard stuff to do).

Note: I'm definitely +1 on having one standardized approach to the 
repository issue. I'm just not that sure that we can really make it with 
what we have today without mimicking the JCR API, which could be suboptimal.

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)

Re: [RT] Converging the repository concept in cocoon

Posted by Michael Wechner <mi...@wyona.com>.

Stefano Mazzocchi wrote:

> 
> On 5 Dec 2003, at 02:17, Andreas Hartmann wrote:
> 
>> Stefano Mazzocchi wrote:
>>
>>> I looked into the repository block and I find a *lot* of things 
>>> (locking, permissions, properties) that look very much like a 
>>> duplication of effort. The Slide project spent years optimizing and 
>>> polishing issues like transactionality and locking, do you really 
>>> want to implement a layer to "emulate" those things in case the given 
>>> source is not capable of handling it itself?
>>
>>
>> I have a basic question: When you talk about a
>> "repository", does this imply that operations on
>> resources are transactional, or is this an optional
>> feature?
> 
> 
> That's up to us to define what level of transactionality we 
> want/need/able-to-implement-shortly.
> 
> For me, now, transactionality is not that important, but it would be a 
> good thing to have (for example, making the saving of one document that 
> contains images an atomic thing)

I think for Lenya it is quite important. As soon as we enter a 
multi-user environment where several "documents" are being modified
within a single transaction, things can become very tricky without the 
actual support of transactions.

I hope I will find more time to really join this thread. On the other 
hand I think things are heading in the right direction.

I am currently preparing some thoughts about introspection and workflow 
instances, but I need some more time to write it down. But it should be 
ready before X'Mas.

Anyway, I am going to fix the rest of these "michi"s now within Lenya.

Michi

> 
> -- 
> Stefano.

-- 
Michael Wechner
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com              http://cocoon.apache.org/lenya/
michael.wechner@wyona.com                        michi@apache.org

Re: [RT] Converging the repository concept in cocoon

Posted by Stefano Mazzocchi <st...@apache.org>.

On 5 Dec 2003, at 02:17, Andreas Hartmann wrote:

> Stefano Mazzocchi wrote:
>
>> I looked into the repository block and I find a *lot* of things 
>> (locking, permissions, properties) that look very much like a 
>> duplication of effort. The Slide project spent years optimizing and 
>> polishing issues like transactionality and locking, do you really 
>> want to implement a layer to "emulate" those things in case the given 
>> source is not capable of handling it itself?
>
> I have a basic question: When you talk about a
> "repository", does this imply that operations on
> resources are transactional, or is this an optional
> feature?

That's up to us to define what level of transactionality we 
want/need/able-to-implement-shortly.

For me, now, transactionality is not that important, but it would be a 
good thing to have (for example, making the saving of one document that 
contains images an atomic thing)

--
Stefano.

Re: [RT] Converging the repository concept in cocoon

Posted by Andreas Hartmann <an...@apache.org>.

Stefano Mazzocchi wrote:

> 
> I looked into the repository block and I find a *lot* of things 
> (locking, permissions, properties) that look very much like a 
> duplication of effort. The Slide project spent years optimizing and 
> polishing issues like transactionality and locking, do you really want 
> to implement a layer to "emulate" those things in case the given source 
> is not capable of handling it itself?

I have a basic question: When you talk about a
"repository", does this imply that operations on
resources are transactional, or is this an optional
feature?

Thanks,
-- Andreas

Re: [RT] Converging the repository concept in cocoon

Posted by Stefano Mazzocchi <st...@apache.org>.

On 1 Dec 2003, at 13:04, Mats Norén wrote:

> Things that could prove useful:
>
> 6) observation - add listeners to specific events in the repository 
> based on both the type of event and on the location in the repository.

yes, this is *very* useful. But I'm not sure how we could handle this 
ATM so I would leave it out for now.

> 7)  visitable nodes in the tree - do batch processing on nodes in the 
> repository, etc. For example to set specific properties on nodes in a 
> specific branch.

no branching for now, Slide doesn't support it (yet).

> From a flow (or more correctly from a rhino) perspective I've been 
> thinking about some kind of scriptable node to make it possible to 
> script certain tasks against the repository. This could of course be 
> used from the flow-layer as well.
> Anyone else out there that's been experimenting with this idea?
>
> I'm aware of the fact that these functionality requirements are not 
> the first to consider when converging the repository concept in 
> Cocoon, but I still think they can be useful. :)

Definitely, and JCR addresses all of these issues. But don't hold your 
breath to have these implemented soon.

--
Stefano.

Re: [RT] Converging the repository concept in cocoon

Posted by Mats Norén <ma...@alma.nu>.

Stefano Mazzocchi wrote:

> I'm working on Doco and I finished my first phase: I have a repository 
> that I like and does what I need. It's Slide, in case you haven't 
> noticed ;-)
>
> So, now I have a WebDAV/DeltaV/DASL/ACL repository and I want to 
> connect to it.
>
> There has been a lot of work in the area of "Repository API" lately, 
> both inside and outside cocoon.
>
> Cocoon currently hosts four different repositories concepts:
>
>  1) two in the linotype block
>  2) one in the slide block
>  3) one in the repository block (which is a refactoring of the 
> SourceRepository in linotype)
>
> the linotype repository is a big time hack: it does what linotype 
> needed, but it's not reusable outside (concerns overlap in its 
> interface). The SourceRepository is an implementation of the linotype 
> Repository over a source instead that over a file system. While nicer, 
> it inherits all the problems of the original interface. It does 
> versioning but it doesn't do properties or property querying.
>
> the repository in the slide block uses slide directly and, mostly, for 
> authentication purposes... it's based on an older version of slide, 
> doesn't handle versioning, doesn't handle file properties. It's based 
> on actions, generators and transformers. To me, looks old and the need 
> to have the repository on the local machine (and keep it opaque to the 
> outside world) makes it impossible to use in what I need. 

Not entirely true: there is some versionable stuff in there, and it 
uses a CVS version of Slide 2.0, I think. We've been using it for a simple 
CMS solution.
We're using a relational backend (MySQL) instead of the 
XMLFileDescriptorStore; we store both properties and content and they 
are all versioned.
But if the C2 team could come up with something better I would be more 
than happy to switch to it :)

>
>
> the one in the repository block is the cleanest one, but IMO, its 
> design is backwards. I'll explain what I mean in a second.
>
> For now, I think it's a must that, just as we did with forms, we take 
> a look at the various approaches and choose one to follow and ignore 
> the other ones.
>
> I think the repository block is the best effort, but it needs 
> substantial redesign.
>
>                                          - o -
>
> First of all, let me introduce what I mean with a "repository".
>
> A repository is a place where I store my content.
>
> Functionality I need is:
>
>  1) open/save document
>  2) create collection of documents
>  3) attach metadata to documents (externally to them!!)
>  4) query the repository against document metadata
>  5) versioning (autoversioning on saving and version update)

Things that could prove useful:

6) observation - add listeners to specific events in the repository 
based on both the type of event and on the location in the repository.
7)  visitable nodes in the tree - do batch processing on nodes in the 
repository, etc. For example to set specific properties on nodes in a 
specific branch.
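
As a sketch of what item 6 could mean in practice, assuming hypothetical 
names throughout (EventType, RepositoryListener and ObservationManager 
are not part of any existing block): listeners register for one event 
type and one location, and the location match here is a plain path-prefix 
test, which is the simplest thing that could work and not a proposal for 
the final semantics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event model for the sketch.
enum EventType { CREATED, MODIFIED, DELETED }

interface RepositoryListener {
    void onEvent(EventType type, String path);
}

class ObservationManager {
    // One registration = event type + location (path prefix) + callback.
    private static final class Registration {
        final EventType type;
        final String pathPrefix;
        final RepositoryListener listener;

        Registration(EventType type, String pathPrefix, RepositoryListener listener) {
            this.type = type;
            this.pathPrefix = pathPrefix;
            this.listener = listener;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    void addListener(EventType type, String pathPrefix, RepositoryListener listener) {
        registrations.add(new Registration(type, pathPrefix, listener));
    }

    // A repository implementation would call this after each change.
    void fire(EventType type, String path) {
        for (Registration r : registrations) {
            if (r.type == type && path.startsWith(r.pathPrefix)) {
                r.listener.onEvent(type, path);
            }
        }
    }
}
```

Registering with a prefix such as "/docs/" (trailing slash included) 
avoids accidentally matching a sibling like "/docs2/".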

From a flow (or more correctly from a rhino) perspective I've been 
thinking about some kind of scriptable node to make it possible to 
script certain tasks against the repository. This could of course be 
used from the flow-layer as well.
Anyone else out there that's been experimenting with this idea?

I'm aware of the fact that these functionality requirements are not the 
first to consider when converging the repository concept in Cocoon, but 
I still think they can be useful. :)

/Regards Mats

Re: [RT] Converging the repository concept in cocoon

Posted by Stefano Mazzocchi <st...@apache.org>.

On 30 Nov 2003, at 21:41, Ryan Hoegg wrote:

> Stefano Mazzocchi wrote:
>
>> I think a much better approach would be to come up with a
>>
>>  Repository.java
>>
>> interface and a few implementations that I can choose when I install 
>> cocoon. This implementation would also implement Source.java and 
>> provide its functionality thru a URL protocol.
>>
>> This allows:
>>
>>  - clear separation of concerns: cocoon should *NOT* be doing 
>> repository stuff, which is already big and complex enough
>>
>>  - complete IoC: you choose the implementation and the implementation 
>> decides what to do and how to do it. Your contract remains the same 
>> (thru the source-provided URL protocol and thru the component 
>> interface)
>>
>>  - transparent polymorphism: you can have different implementations 
>> of a repository... file system, webdav, CVS, JCR, ... without having 
>> to change any code in your application
>>
>> Thoughts?
>>
>> -- 
>> Stefano.
>
>
> Looks good!  However, I think some people may not need all 5 of your 
> requirements:

Great, don't use them.

> > Functionality I need is:
> >
> >  1) open/save document
> >  2) create collection of documents
>
> I think these are the core requirements of a repository.

oh, c'mon, that's a core requirement of a file system and you already 
have it.

> I also think a simple form of document retrieval (like a Source) is 
> necessary, but that might just be rolled into "open"

And you have that as well, it's called "file:"

> >  3) attach metadata to documents (externally to them!!)
> >  4) query the repository against document metadata
> >  5) versioning (autoversioning on saving and version update)
>
> Some people might like to get versioning and metadata a la carte.

If so, don't use the repository and use the FS. But I don't get why you 
would want to do your own stuff once you have a library that does it 
for you.

> Not all versioned repositories need an external metadata facility or 
> the ability to query it.  I think all versioned repositories do need 
> some metadata (commit logs, permissions, etc).
>
> So what about a basic Repository interface, a specialization of it for 
> metadata, and a specialization of that for versioning?

I don't see the point in doing this. A repository without metadata and 
versioning is a file system and you already have an API to use that in 
java.io.*

--
Stefano.

Re: [RT] Converging the repository concept in cocoon

Posted by Ryan Hoegg <rh...@isisnetworks.net>.

Stefano Mazzocchi wrote:

> I think a much better approach would be to come up with a
>
>  Repository.java
>
> interface and a few implementations that I can choose when I install 
> cocoon. This implementation would also implement Source.java and 
> provide its functionality thru a URL protocol.
>
> This allows:
>
>  - clear separation of concerns: cocoon should *NOT* be doing 
> repository stuff, which is already big and complex enough
>
>  - complete IoC: you choose the implementation and the implementation 
> decides what to do and how to do it. Your contract remains the same 
> (thru the source-provided URL protocol and thru the component interface)
>
>  - transparent polymorphism: you can have different implementations of 
> a repository... file system, webdav, CVS, JCR, ... without having to 
> change any code in your application
>
> Thoughts?
>
> -- 
> Stefano.

Looks good!  However, I think some people may not need all 5 of your 
requirements:

 > Functionality I need is:
 >
 >  1) open/save document
 >  2) create collection of documents

I think these are the core requirements of a repository.  I also think a 
simple form of document retrieval (like a Source) is necessary, but that 
might just be rolled into "open"

 >  3) attach metadata to documents (externally to them!!)
 >  4) query the repository against document metadata
 >  5) versioning (autoversioning on saving and version update)

Some people might like to get versioning and metadata a la carte.   Not 
all versioned repositories need an external metadata facility or the 
ability to query it.  I think all versioned repositories do need some 
metadata (commit logs, permissions, etc).

So what about a basic Repository interface, a specialization of it for 
metadata, and a specialization of that for versioning?
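
Roughly, and only as a sketch with made-up names (BasicRepository, 
MetadataRepository, VersionedRepository, PlainRepository, 
TaggedRepository and Capabilities are all hypothetical), the layering 
could look like this. Each level extends the one before it, and a caller 
that wants an optional feature has to check which level it was given.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Level 1: open/save only.
interface BasicRepository {
    void save(String path, String content);
    String open(String path);
}

// Level 2: adds external metadata and querying.
interface MetadataRepository extends BasicRepository {
    void setProperty(String path, String name, String value);
    List<String> query(String name, String value);
}

// Level 3: adds access to older versions.
interface VersionedRepository extends MetadataRepository {
    String open(String path, int version);
}

// Toy level-1 implementation.
class PlainRepository implements BasicRepository {
    private final Map<String, String> docs = new HashMap<>();

    public void save(String path, String content) { docs.put(path, content); }

    public String open(String path) { return docs.get(path); }
}

// Toy level-2 implementation built on top of the level-1 one.
class TaggedRepository extends PlainRepository implements MetadataRepository {
    private final Map<String, Map<String, String>> props = new HashMap<>();

    public void setProperty(String path, String name, String value) {
        props.computeIfAbsent(path, p -> new HashMap<>()).put(name, value);
    }

    public List<String> query(String name, String value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : props.entrySet()) {
            if (value.equals(e.getValue().get(name))) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}

// What a caller has to do when the feature set is optional.
class Capabilities {
    static String level(BasicRepository repo) {
        if (repo instanceof VersionedRepository) return "versioned";
        if (repo instanceof MetadataRepository) return "metadata";
        return "basic";
    }
}
```

The cost of this design is visible in Capabilities: application code has 
to probe for features instead of relying on a single contract.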

--
Ryan Hoegg
ISIS Networks
http://www.isisnetworks.net