Posted to dev@chemistry.apache.org by Florian Müller <fm...@opentext.com> on 2009/12/15 16:38:23 UTC

CMIS Implementation Experiences

Hi all,

I would like to foster the technical discussion between the Chemistry team and the people behind the OpenCMIS proposal. If you think this is inappropriate on this list, please let me know.

To explain the rationale behind the OpenCMIS design, I would like to share some of the experience we have gained with CMIS client and server implementations.

We also started with Abdera on the server side. It turned out to be more pain than joy. With a pure JAXB design we ran into compatibility issues. A good tradeoff between efficiency, correctness and maintainability seems to be StAX combined with JAXB: OpenCMIS handles all AtomPub-related tags with StAX and all CMIS-related data with JAXB. The JAXB objects are not exposed to the application; they are just interim objects.
The same StAX/JAXB design should work on the server side as well. The effort to implement AtomPub is manageable; I have done this in my CMIS FileShare project.
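
To make the split concrete, here is a minimal sketch of the idea, not OpenCMIS code. StAX walks the Atom envelope and hands each CMIS subtree to a separate binding step. In OpenCMIS that step is JAXB; since JAXB is not bundled with recent JDKs, a hand-written stand-in (bindCmisObject) takes its place here. The class name, the CMIS namespace URI and the simplified element names (object, properties, property, id) are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: StAX for the Atom tags, a separate binder for CMIS data.
class AtomEntrySketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    // Placeholder namespace: the real CMIS namespace URI depends on the spec version.
    static final String CMIS_NS = "urn:example:cmis";

    String title;                                               // taken from atom:title
    final Map<String, String> cmisProperties = new LinkedHashMap<>();

    static AtomEntrySketch parse(String xml) {
        AtomEntrySketch entry = new AtomEntrySketch();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs needed here
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String ns = reader.getNamespaceURI();
                String name = reader.getLocalName();
                if (ATOM_NS.equals(ns) && "title".equals(name)) {
                    entry.title = reader.getElementText();       // AtomPub tag: handled by StAX
                } else if (CMIS_NS.equals(ns) && "object".equals(name)) {
                    bindCmisObject(reader, entry.cmisProperties); // CMIS data: handed off
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed Atom entry", e);
        }
        return entry;
    }

    // Stand-in for the JAXB step: consumes exactly one CMIS object subtree.
    static void bindCmisObject(XMLStreamReader reader, Map<String, String> out)
            throws XMLStreamException {
        int depth = 1; // we are positioned on the start tag of the CMIS object element
        while (depth > 0 && reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if ("property".equals(reader.getLocalName())) {
                    String id = reader.getAttributeValue(null, "id");
                    // getElementText() also consumes the matching end tag
                    out.put(id, reader.getElementText());
                } else {
                    depth++;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

The application only ever sees AtomEntrySketch; whatever object the binding step produces stays an interim object inside the parser.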

Another detail we learned is that implementing both bindings in parallel saves you a lot of refactoring later. The two CMIS bindings are really different. If you align your classes and flows to just one binding, you might have to refactor a lot later to make the other binding work smoothly. This insight is reflected in OpenCMIS in two areas. First, there is a strict decoupling of the binding implementation (Provider layer) from the nicer Java API (Client layer). If somebody showed up with a third CMIS binding, we would only have to touch the Provider layer. The second area is within the Provider layer: we tried to reuse as much code and as many concepts as possible between the two binding implementations. For example, both binding implementations share the generated JAXB classes, the caching infrastructure and several utilities.
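
The decoupling described above can be sketched roughly as follows. All names here (ObjectProvider, the two stub providers, ClientDocument) are hypothetical and are not the OpenCMIS interfaces; the stubs return canned data where a real binding would talk to a repository.

```java
import java.util.HashMap;
import java.util.Map;

// Provider layer: one low-level contract that every binding implements (hypothetical).
interface ObjectProvider {
    Map<String, String> getProperties(String objectId);
}

// Stand-in for an AtomPub binding; a real one would issue HTTP requests and parse Atom entries.
class AtomPubProviderStub implements ObjectProvider {
    @Override
    public Map<String, String> getProperties(String objectId) {
        Map<String, String> properties = new HashMap<>();
        properties.put("cmis:objectId", objectId);
        properties.put("cmis:name", "via-atompub");
        return properties;
    }
}

// Stand-in for a Web Services (SOAP) binding.
class WebServicesProviderStub implements ObjectProvider {
    @Override
    public Map<String, String> getProperties(String objectId) {
        Map<String, String> properties = new HashMap<>();
        properties.put("cmis:objectId", objectId);
        properties.put("cmis:name", "via-soap");
        return properties;
    }
}

// Client layer: the nicer API, written only against the provider contract.
class ClientDocument {
    private final Map<String, String> properties;

    ClientDocument(ObjectProvider provider, String objectId) {
        this.properties = provider.getProperties(objectId);
    }

    String getId() {
        return properties.get("cmis:objectId");
    }

    String getName() {
        return properties.get("cmis:name");
    }
}
```

ClientDocument never learns which binding produced its data, so a third binding would mean one more ObjectProvider implementation and no change in the Client layer.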

We introduced type (and repository info) caching based on our experience with applications using a CMIS library. Applications need type information all over the place, and it is expensive to fetch it over and over again. From a library perspective one can argue that caching should be done a level above the library. From a practical standpoint it would be nice if it is done once and done right. So we decided to put it into OpenCMIS. If an application doesn't want it, it can switch it off. The caching works implicitly: whenever a type definition runs through the library, the data is cached or refreshed.
CMIS provides no mechanism to detect type changes, so there is a slight chance that the type cache holds outdated data. In an enterprise scenario (and that's what OpenCMIS is aiming at) type changes shouldn't happen often. They usually coincide with an update or re-deployment of the application. A paranoid application developer can switch off the cache (and accept the performance penalty), clear the cache regularly (every hour or every five minutes or every 30 seconds...), or create a new session once in a while. Since sessions are bound to logins, there is a regular turnover of sessions, and with them caches, anyway.
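
A rough sketch of such a cache, under stated assumptions: TypeCacheSketch and its methods are made-up names, a type definition is reduced to a plain String, and the fetcher function stands in for the round trip to the repository. It is not the OpenCMIS implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of an implicit, switchable type cache.
class TypeCacheSketch {
    private final Map<String, String> types = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // stands in for the repository call
    private final boolean enabled;

    TypeCacheSketch(Function<String, String> fetcher, boolean enabled) {
        this.fetcher = fetcher;
        this.enabled = enabled;
    }

    // Called whenever a type definition passes through the library: cache or refresh it.
    void observe(String typeId, String definition) {
        if (enabled) {
            types.put(typeId, definition);
        }
    }

    // Returns the cached definition, fetching it only on a miss (or always, if switched off).
    String getTypeDefinition(String typeId) {
        if (!enabled) {
            return fetcher.apply(typeId);
        }
        return types.computeIfAbsent(typeId, fetcher);
    }

    // For the paranoid developer: drop everything, e.g. from a timer.
    void clear() {
        types.clear();
    }
}
```

Tying one such cache to each session gives the behaviour described above: a new login starts with an empty cache, so stale type data cannot outlive the session.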

Another aspect that we think is important is extensions. CMIS defines a lot of extension points, and repositories will make use of them sooner or later. Applications should be able to access and set extension data. Sure, it is against the idea of a standard, but it will happen and the library should be prepared for it. The difficult part here is to make the binding invisible to the application, since some extension points are very binding-specific. Using JAXB in both bindings covers a lot but not everything. OpenCMIS has the infrastructure in place but is not perfect in this regard yet.

I hope that's the beginning of a fruitful conversation,

Florian

Re: CMIS Implementation Experiences

Posted by Florent Guillaume <fg...@nuxeo.com>.

Hi Florian, David,

On Tue, Dec 15, 2009 at 5:24 PM, David Nuescheler
<da...@day.com> wrote:
> On Tue, Dec 15, 2009 at 4:38 PM, Florian Müller <fm...@opentext.com> wrote:
>> In order to explain the rationale behind the OpenCMIS design I would like to talk about
>> some of the experiences that we made with CMIS client and server implementations.
>> We also started with Abdera on the server side. It turned out to be more pain than joy.
>
> i think in the absence of Dominique (who is on vacation) it is fair to
> say that in our initial implementations we also found that there were
> some extension points that we had to use that made us feel like abdera
> was not exactly designed to inject cmis in a surgical operation.
> that may have to do with cmis and its use of atompub, but i think i
> agree with the general sentiment...

Agreed, we totally need extension points for each method in the SPI.

>> With a pure JAXB design we ran into compatibility issues. A good tradeoff between
>> efficiency, correctness and maintainability seems to be StAX with JAXB. OpenCMIS
>> handles all AtomPub related tags with StAX and all CMIS related data with JAXB.
>> The JAXB objects are not exposed to the application. They are just interim objects.
>> The same StAX/JAXB design should work on the server side as well. The
>> effort to implement AtomPub is manageable. I've done this in my CMIS
>> FileShare project.
>
> sounds like a reasonable proposal to me. especially given your experience
> advice would be very welcome.

Yes, that seems like a good way to do it. There's the overhead of
instantiating JAXB objects just to serialize them later to XML, when
you could just generate the XML using StAX, but that's an acceptable
tradeoff. The codebase we have today uses pure StAX for historical
reasons: we had a high-performance need for a customer, and reducing
the number of generated objects was paramount. But I'm pretty open to
refactoring this.
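
As an illustration of the pure-StAX approach (a hypothetical sketch, not the Chemistry code), an Atom entry can be streamed straight to the output with XMLStreamWriter, without building intermediate objects. The class and method names are assumptions.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: write an Atom entry directly, one event at a time.
class StaxEntryWriterSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    static String writeEntry(String id, String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("entry");
            writer.writeDefaultNamespace(ATOM_NS); // declares xmlns on the entry element

            writer.writeStartElement("id");
            writer.writeCharacters(id);            // writeCharacters escapes markup characters
            writer.writeEndElement();

            writer.writeStartElement("title");
            writer.writeCharacters(title);
            writer.writeEndElement();

            writer.writeEndElement();              // closes the entry element
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write Atom entry", e);
        }
        return out.toString();
    }
}
```

Nothing is allocated per element beyond the strings themselves, which is the saving; the cost is that the element order and nesting are spelled out by hand.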

>> Another detail we learned is that implementing both bindings in parallel
>> saves you a lot of refactoring later. Both CMIS bindings are really
>> different. If you align your classes and flows to just one binding you
>> might have to refactor a lot later  to make the other binding work smoothly.
>
> agreed. i think the chemistry focus on the atompub parts of the spec
> was just a way to get started, rather than a long-term plan.

Yes, I want to set up the SOAP client and server bindings soon,
hopefully before the end of the year. I have experience with a basic
SOAP server for the Nuxeo bindings, and the client part shouldn't be
hard to start.

>> We introduced type (and repository info) caching based on our
>> experiences with applications using a CMIS library. Applications need
>> type information all over the place and it is expensive to fetch them over
>> and over again.
>
> absolutely. we ran into the same situation with jcr remoting through
> our spi layer in jackrabbit. luckily, jcr already anticipates such caching
> layer and exposes explicit "refresh()" methods.

We already have caching of the types in APPRepository (see
loadTypes()), and the repository info is read only once as well.

>> From a library perspective one can argue that caching should be done a
>> level above the library.
>> From practical standpoint it would be nice if it is done once and right.
>
> i would even argue that depending on the application you may cache on
> the application in addition of the caching in the pure transport layer.
> i think there is nothing wrong with a cache as long as the application has
> a means to refresh/invalidate the cache... ideally this would be possible
> for parts of the cache, per folder/document or similar...

Yep. Eventually I want to put intelligent caching in the layer that
implements the high level API, but I've held off on this for now
("optimize later").

>> So we decided to put it into OpenCMIS. If an application doesn't want it,
>> it can switch it off. The caching works implicitly.
>
> i would say a refresh much like in a browser could give the application the
> option to flush parts of the cache or even expose that to the user.
> in many cases the user "knows" that something changed and having
> something like a "refresh"-button in the browser can help.
> in my experience it really saves you a lot of first support calls, since
> if the user does not see what he wants to see, he just hits the refresh
> button, but of course that's a concern of the application and not
> of the cmis client.

I'm not too fond of explicit refresh actions, unless it's unavoidable...

>> Whenever a type definitions runs through the library the data is
>> cached or refreshed. CMIS provides no mechanism to detect
>> type changes.

That's a good way to do it. For now Chemistry (for AtomPub) reads all
the types on the first connection and caches them, but this could be
done lazily as you describe.

> i think type changes happen infrequent enough, that it is not
> an issue in the majority of the cases, especially if we
> expose an explicit "refresh" of the cache delegated to
> the app or the user.
>
>> So there is a slight chance that the type cache holds outdated
>> data. In an enterprise scenario (and that's what OpenCMIS is
>> aiming at) type changes shouldn't happen often. They are
>> usually interconnected with an update or re-deployment of the
>> application. A paranoid application developer can switch off the
>> cache (and accept the performance penalty) or clear the cache
>> regularly (every hour or every five minutes or every 30 seconds...)
>> or create a new session once a while.
> ...or let the user of the app decide. especially
> webapp users are used to refresh buttons ;)
>
>> Since sessions are bound to logins there is a regular exchange
>> of sessions and therewith caches, anyway.
> sounds good.

Yep.

>> Another aspect that we think is important are extensions. CMIS
>> defines a lot of extension points and repositories will make use of
>> it sooner or later. Application should be able to access and set
>> extension data. Sure, it is against the idea of a standard but it will
>> happen and the library should be prepared for that. The difficult
>> part here is to make the binding invisible to the application since
>> some extension points are very binding specific. Using JAXB in both
>> bindings covers a lot but not everything. OpenCMIS has the
>> infrastructure in place but is not perfect in this regard, yet.
>
> i think extension points are very desirable particularly in something
> that should be a framework for various implementations / users.
> having said that, superfluous extension points always become
> a maintenance and backwards compatibility issue in the future, when
> we want to refactor things again, and are not sure if we break someones
> extensions... so i think we should choose extension points based
> on real-life scenarios, rather than on wild ideas ;)
> i think we are a group that is broad enough here that we have enough
> real-life use cases to come up with a good set of extension points
> to start with.

I agree with you that extensions are absolutely needed; they're in the
spec for a reason. However, at the same time, they shouldn't make the
APIs too burdensome to use...

Florent

-- 
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87

Re: CMIS Implementation Experiences

Posted by David Nuescheler <da...@day.com>.

Hi Florian,

> I would like to foster the technical discussion between the Chemistry team and the
> people behind the OpenCMIS proposal. If you think this is inappropriate on this list,
> please let me know.

thanks a lot for the input and thanks for starting the technical
conversation. i think this is really appreciated...

...and this is perfectly appropriate for this list!

> In order to explain the rationale behind the OpenCMIS design I would like to talk about
> some of the experiences that we made with CMIS client and server implementations.
> We also started with Abdera on the server side. It turned out to be more pain than joy.

i think in the absence of Dominique (who is on vacation) it is fair to
say that in our initial implementations we also found that there were
some extension points that we had to use that made us feel like abdera
was not exactly designed to inject cmis in a surgical operation.
that may have to do with cmis and its use of atompub, but i think i
agree with the general sentiment...

> With a pure JAXB design we ran into compatibility issues. A good tradeoff between
> efficiency, correctness and maintainability seems to be StAX with JAXB. OpenCMIS
> handles all AtomPub related tags with StAX and all CMIS related data with JAXB.
> The JAXB objects are not exposed to the application. They are just interim objects.
> The same StAX/JAXB design should work on the server side as well. The
> effort to implement AtomPub is manageable. I've done this in my CMIS
> FileShare project.

sounds like a reasonable proposal to me. especially given your experience,
advice would be very welcome.

> Another detail we learned is that implementing both bindings in parallel
> saves you a lot of refactoring later. Both CMIS bindings are really
> different. If you align your classes and flows to just one binding you
> might have to refactor a lot later to make the other binding work smoothly.

agreed. i think the chemistry focus on the atompub parts of the spec
was just a way to get started, rather than a long-term plan.

> We introduced type (and repository info) caching based on our
> experiences with applications using a CMIS library. Applications need
> type information all over the place and it is expensive to fetch them over
> and over again.

absolutely. we ran into the same situation with jcr remoting through
our spi layer in jackrabbit. luckily, jcr already anticipates such a caching
layer and exposes explicit "refresh()" methods.

> From a library perspective one can argue that caching should be done a
> level above the library.
> From practical standpoint it would be nice if it is done once and right.

i would even argue that depending on the application you may cache in
the application in addition to the caching in the pure transport layer.
i think there is nothing wrong with a cache as long as the application has
a means to refresh/invalidate the cache... ideally this would be possible
for parts of the cache, per folder/document or similar...

> So we decided to put it into OpenCMIS. If an application doesn't want it,
> it can switch it off. The caching works implicitly.

i would say a refresh much like in a browser could give the application the
option to flush parts of the cache or even expose that to the user.
in many cases the user "knows" that something changed and having
something like a "refresh"-button in the browser can help.
in my experience it really saves you a lot of first support calls, since
if the user does not see what he wants to see, he just hits the refresh
button, but of course that's a concern of the application and not
of the cmis client.

> Whenever a type definitions runs through the library the data is
> cached or refreshed. CMIS provides no mechanism to detect
> type changes.

i think type changes happen infrequently enough that it is not
an issue in the majority of the cases, especially if we
expose an explicit "refresh" of the cache delegated to
the app or the user.

> So there is a slight chance that the type cache holds outdated
> data. In an enterprise scenario (and that's what OpenCMIS is
> aiming at) type changes shouldn't happen often. They are
> usually interconnected with an update or re-deployment of the
> application. A paranoid application developer can switch off the
> cache (and accept the performance penalty) or clear the cache
> regularly (every hour or every five minutes or every 30 seconds...)
> or create a new session once a while.

...or let the user of the app decide. especially
webapp users are used to refresh buttons ;)

> Since sessions are bound to logins there is a regular exchange
> of sessions and therewith caches, anyway.

sounds good.

> Another aspect that we think is important are extensions. CMIS
> defines a lot of extension points and repositories will make use of
> it sooner or later. Application should be able to access and set
> extension data. Sure, it is against the idea of a standard but it will
> happen and the library should be prepared for that. The difficult
> part here is to make the binding invisible to the application since
> some extension points are very binding specific. Using JAXB in both
> bindings covers a lot but not everything. OpenCMIS has the
> infrastructure in place but is not perfect in this regard, yet.

i think extension points are very desirable, particularly in something
that should be a framework for various implementations / users.
having said that, superfluous extension points always become
a maintenance and backwards compatibility issue in the future, when
we want to refactor things again and are not sure if we break someone's
extensions... so i think we should choose extension points based
on real-life scenarios, rather than on wild ideas ;)

i think we are a group that is broad enough here that we have enough
real-life use cases to come up with a good set of extension points
to start with.

> I hope that's the beginning of a fruitful conversation,

same here, very much so... thanks a lot for starting it.

regards,
david