Posted to dev@jspwiki.apache.org by Janne Jalkanen <ja...@ecyrd.com> on 2009/02/08 21:15:24 UTC

JCR integration

Folks,

I've got a bit of a problem with respect to the JCR integration.  I  
will write a lengthy email explaining my current thoughts on the  
subject partly to get your ideas, and partly to clarify my own  
thoughts on the subject.

Our current model is stateless in the sense that the repository is  
open all the time, and you just access the pages with a String handle  
(pagename).

However, the JCR model is stateful.  That is, you first get a Session  
object, through which you manipulate the contents of the repository,  
and once you're done, you release that object.  This is similar to  
Hibernate, though JCR Sessions are not threadsafe (and assumed to be  
bound to a single Thread even, so they can't even be shared when  
synchronized).
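
For reference, the basic lifecycle against the plain JSR-170 API looks
roughly like this (a sketch only; obtaining the Repository instance is
left out):

Session session = repository.login();
try
{
    Node page = session.getRootNode().getNode("Main");
    page.setProperty("foo", "bar");
    session.save();      // persist this Session's pending changes
}
finally
{
    session.logout();    // release the stateful object
}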

So logically we *could* have ContentManager have a single ThreadLocal  
which keeps the Session object. However, since a Session gathers all  
of the changes, it means that in case of e.g. exceptions, the old data  
would still remain somewhere in the Session, and upon a new save, some  
old, obsolete data would be saved as well.  This is obviously not a  
good thing.

No, we can't simply empty the Session with Session.refresh() before
each access, because we don't know when an access begins - it could
start even with a simple getPage().  And refreshing on every getPage()
would be wrong, since getPage() may be called multiple times during a
single access.
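
For illustration, a hypothetical sketch of the ThreadLocal variant
discussed above (all names made up), and where the problem bites:

private ThreadLocal<Session> m_session = new ThreadLocal<Session>();

Session getJCRSession() throws RepositoryException
{
    Session s = m_session.get();
    if( s == null )
    {
        s = m_repository.login();
        m_session.set( s );
    }
    // If an earlier operation failed halfway, this Session may still
    // carry its transient changes, and the next save() would persist
    // them too.  Session.refresh(false) would discard them, but as
    // noted above there is no safe point at which to call it.
    return s;
}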

So, it seems to me that in order to avoid side effects, we need to  
have some sort of session management - which means acquiring a session  
to the JSPWiki repository, and then also releasing it.  The WikiEngine  
object would be an obvious place for it, since it's what's being  
passed around anyway.

So the proposal would be to have a WikiEngineFactory object, which you  
would call as follows:

WikiEngine engine = WikiEngineFactory.getEngine();

try
{
    WikiPage page = engine.getPage(...);
}
finally
{
    engine.release();
}

The WikiPage object (and so the Node it holds) would not be valid  
after engine.release().

Now, this does not mean that the Factory will create new WikiEngines,  
but it'll probably set up some ThreadLocals so that it'll work  
properly.  If we had separate WikiEngine objects, we would need to  
recreate all of our Manager classes.  This unfortunately means that  
WikiEngine is for all intents and purposes currently a singleton.   
However, this approach would work, since you can't get to the managers  
without getting a WikiEngine first.
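
Roughly, a sketch of what the factory might look like (bindSession()
is an invented name for the ThreadLocal setup):

public class WikiEngineFactory
{
    private static WikiEngine c_engine;   // the de facto singleton

    public static WikiEngine getEngine()
    {
        // Invented method: binds a fresh JCR Session (and any other
        // per-thread state) to the calling thread.
        c_engine.getContentManager().bindSession();
        return c_engine;
    }
}

engine.release() would then clear the same ThreadLocals.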

Anybody see any serious problems with this approach?  Anything that'll  
break (other than every single plugin/Task which holds a reference to  
a WikiPage or WikiContext object, but those can be refactored away)?

/Janne


Re: JCR integration

Posted by Janne Jalkanen <ja...@ecyrd.com>.
> However, storing the Node in the WikiPage is obviously just a speed  
> optimization.  These two statements are equivalent:
>
> m_node.getProperty("foo").getString();
> and
> m_engine.getContentManager().getJCRSession()
>     .getRootNode().getNode(m_jcrPath).getProperty("foo").getString();
> or even
> m_engine.getContentManager().getJCRSession()
>     .getByUUID(m_uuid).getProperty("foo").getString();

I benchmarked these two.  On Priha, to my great surprise, the latter  
method is 25 times faster than the simple getProperty().

However, on Jackrabbit, the former is about 5 times faster.

So this is a fairly important decision which is going to have a huge  
impact on the performance of JSPWiki 3.0.  Unfortunately, these  
results are wildly different, so it's really hard to say what the  
correct solution is.  One option is to find a least-common-denominator:  
use the ThreadLocal method, but still mark WikiPages as  
uncacheable.  This allows us to change strategy without really  
breaking anything in the future.

(Interestingly, getProperty() is about 3.5 times faster on Jackrabbit  
than on Priha; but calling getJCRSession().getItem(path) is on the  
order of 30 times faster on Priha than on Jackrabbit... Clearly  
different optimization priorities (and caching strategies) there.)
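
For concreteness, a simplified sketch of how such a comparison might
be run (not the exact harness used here):

final int ROUNDS = 100000;

long start = System.nanoTime();
for( int i = 0; i < ROUNDS; i++ )
{
    m_node.getProperty("foo").getString();           // cached Node
}
long nodeNanos = System.nanoTime() - start;

start = System.nanoTime();
for( int i = 0; i < ROUNDS; i++ )
{
    session.getRootNode().getNode(m_jcrPath)
           .getProperty("foo").getString();          // path lookup
}
long pathNanos = System.nanoTime() - start;

System.out.println("node=" + nodeNanos + " ns, path=" + pathNanos + " ns");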

/Janne

Re: JCR integration

Posted by Janne Jalkanen <ja...@ecyrd.com>.
> It seems to me that the local instance of WikiPage should indeed have
> a value of "blob." But as far as the JCR is concerned, it's still
> "bar", because save() hadn't been called, right?

Yes, any other Session accessing it would return "bar".

> I do not understand  what you mean. A call to getAttribute() would
> return values from the WikiPage's field variables, no? Why would it
> need to consult a cache?

The code would look like this:

Object getAttribute(String param)
{
    // Check locally cached, uncommitted values first...
    Object o = m_localFields.getAttribute(param);
    if( o == null )
    {
        // ...then fall back to the JCR node.
        o = m_node.getProperty(param).getString();
    }

    return o;
}

In this case m_localFields is a local cache.  This is extra work  
every single time we read a page.  In addition, since we can store  
generic metadata, we do not know in advance which properties will be  
fetched.

>> In addition, we need to build a disk cache, managed by WikiPage for
>> Really Large Binaries, including deletion when WikiPage is garbage
>> collected, etc (unless we want to cache them in memory, which is
>> something we probably don't want to do).  Of course a ready cache
>> library can be used to do that, but again, it feels a bit pointless
>> when JCR does it for us.
>
> You've really lost me here...

Attachments are WikiPage objects with wiki:content property set to the  
binary content, and wiki:contentType set to the MIME type of the  
content.
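
At the JCR level that means roughly the following (a sketch; the node
layout and stream handling here are illustrative only):

Node att = pageNode.addNode("test.png");
att.setProperty("wiki:content", new FileInputStream("test.png"));  // JCR 1.0 binary property
att.setProperty("wiki:contentType", "image/png");
session.save();

// Reading streams straight out of the repository - no separate disk
// cache needed on our side.
InputStream in = att.getProperty("wiki:content").getStream();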

>
>
>>
>>> Basically, the idea is to make read-only access very cheap, but
>>> writes more expensive... and to hide all of the state management
>>> details from callers.
>>
>> I don't think we can; imagine a situation where we have something  
>> like this:
>>
>> WikiPage page = getPage(...)
>>
>> page.setAttribute("foo","bar");
>>
>> doThingy(page);
>>
>> ...
>>
>> public void doThingy(WikiPage page) throws Exception
>> {
>>  page.setAttribute("bar", "baz");
>> }
>>
>> Now, if doThingy() throws an exception, it needs to do its own state
>> management to clean things up before it passes it up to the caller,
>> or else the state of the page is undefined when it goes up.
>
> It seems to me that the state of the page is perfectly clear... the
> save() method hasn't been called, and therefore nothing has been
> committed.

Well, the methods manipulating the page cannot be certain what the  
state of the page is.  They need to manage the exception somehow...

> Yeah, but you've already made that assumption anyway. It seems to me
> that the save() method is already a state-management method. Its
> purpose is to change the state from "unsaved" to "saved," right?

Correct.
>>
>> In addition, having a proper lifecycle means that the JCR Session
>> can be released by the JCR engine when it is not used.  If the
>> ContentManager just holds on to the Session, it'll just keep
>> consuming resources.  Probably not a lot, but it's still something.
>
> Not sure this would be a big deal if we kept references, essentially
> to ThreadLocal copies of Sessions inside ContentManager -- that were
> only used for retrieving Nodes.

As I said, it's not a lot.  It's a deal, but not a big one.

> Here's a counter-proposal for JCRNode:
> 0) Assume that most WikiPages will be created for read-only retrieval
> 1) Remove the field variable that references Node. No Node reference
> means no Session, and no Session means WikiPage becomes (more)
> thread-safe.
> 2) Store attributes, contents, title, author etc as JCRWikiPage fields
> instead of manipulating the Node

This does not work, since it means that we have to fetch all the  
metadata for the page *every single time we access the page*.  You  
see, we don't know what metadata there is for the page.

In the end, we will be writing a copy of Session functionality on top  
of WikiPage.

> 3) When save() is called, start a fresh Session, retrieve the old
> Node, manipulate it; commit; release Session

Unfortunately, this again means that we need to store everything  
inside memory for changed WikiPage fields, which means that for all  
read access, we need to first check internal fields, then go for the  
Session.

The problem is that for large objects, such as attachments, this is a  
major overhead.  So we would need to build some way to stream the  
content into a disk cache, and then open a Session and copy the data  
back-n-forth.

Which is *doable*, but it's essentially duplicating the work that JCR  
already does for us invisibly.

> 4) Maybe store a String or other path-like node name (WikiName?) in
> JCRWikiPage so that it's obvious where it came from in the repository.
> Actually I think this happens already.

Yes.  There is a 1:1 mapping between WikiNames and JCR Paths.
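
Illustratively (the exact path layout below is made up):

String jcrPath = "/pages/" + wikiName;          // e.g. "/pages/Main"
Node node = (Node) session.getItem( jcrPath );  // and back via node.getPath()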

> This is a bit different than how it works now. But if JCRWikiPage
> severs the reference to Node, it also gets rid of the most obvious
> thread-safety issue.

Well, not really.  WikiPage isn't threadsafe now (no synchronization),  
so nothing would really change in that regard.  In addition, if  
you cache a WikiPage at the moment, there is no guarantee that the  
information it contains is correct, so you're not even supposed to  
keep a copy of it.  When the Node is tied to it, that means that  
anytime you access any info on it, you will always get the latest info.

You see, if we store it inside a WikiPage and we have the following  
pattern:

WikiPage a = getPage("TestPage");
a.setAttribute("foo","bar");
a.save();
a.setAttribute("foo","blob");

WikiPage b = getPage("TestPage");
String attr = b.getAttribute("foo");

Is the value of attr "bar" or "blob"?  And what would the developer  
expect?  How about if these two calls are in different functions, and  
WikiPage is not passed around?  Note that in our current architecture,  
the result depends on whether the WikiPage is read from the cache or  
not... So at any rate, this will be an improvement.

My belief is that they should reflect the same state, since otherwise  
you can't make changes in one place visible to any other place without  
calling save().  And save() does create a new version...  With  
WikiPage carrying the state, it would be different for each.

One option would be the following pattern:

ContentManager mgr = m_engine.getContentManager();

try
{
     WikiPage p = mgr.getPage();
     // manipulate
     p.save();
}
finally
{
    mgr.release();
}

So instead of using WikiEngine, we could use ContentManager.  This  
means obviously that all WikiEngine methods relating to content would  
disappear, but on the other hand, that would make WikiEngine very  
lean. But this would be a fairly big change in all of the plugins,  
obviously.

However, storing the Node in the WikiPage is obviously just a speed  
optimization.  These two statements are equivalent:

m_node.getProperty("foo").getString();
and
m_engine.getContentManager().getJCRSession()
    .getRootNode().getNode(m_jcrPath).getProperty("foo").getString();
or even
m_engine.getContentManager().getJCRSession()
    .getByUUID(m_uuid).getProperty("foo").getString();

Obviously the latter does have some extra overhead. It can be somewhat  
optimized, but still...  This does achieve thread safety at the  
expense of some complexity, while keeping the content in the Session  
rather than storing it in the WikiPage.  The problem of course is that this  
does *not* solve the issue of lifecycles, since it's possible to leave  
stale stuff in the WikiPage, which might get saved by accident later  
on - unless the Session is cleaned somehow.

What exactly would the advantage be for making a WikiPage threadsafe,  
since it isn't threadsafe now?

/Janne

Re: JCR integration

Posted by Andrew Jaquith <an...@gmail.com>.
>
> Consider the following case:
>
> WikiPage page = getPage(...);
> page.setAttribute("foo","bar");
> page.save();
>
> page.setAttribute("foo","blob");
>
> String attr = page.getAttribute("foo");
>
> The value of "attr" should obviously be "blob" instead of "bar".

It seems to me that the local instance of WikiPage should indeed have
a value of "blob." But as far as the JCR is concerned, it's still
"bar", because save() hadn't been called, right?

> This means
> that WikiPage will internally need to check the stuff from a local cache
> before it fetches it from JCR.  This is an extra hit on all getAttribute()
> requests.

I do not understand what you mean. A call to getAttribute() would
return values from the WikiPage's field variables, no? Why would it
need to consult a cache?

>
> In addition, we need to build a disk cache, managed by WikiPage for Really
> Large Binaries, including deletion when WikiPage is garbage collected, etc
> (unless we want to cache them in memory, which is something we probably
> don't want to do).  Of course a ready cache library can be used to do that,
> but again, it feels a bit pointless when JCR does it for us.

You've really lost me here...

>
>> Basically, the idea is to make read-only access very cheap, but writes
>> more expensive... and to hide all of the state management details from
>> callers.
>
> I don't think we can; imagine a situation where we have something like this:
>
> WikiPage page = getPage(...)
>
> page.setAttribute("foo","bar");
>
> doThingy(page);
>
> ...
>
> public void doThingy(WikiPage page) throws Exception
> {
>   page.setAttribute("bar", "baz");
> }
>
> Now, if doThingy() throws an exception, it needs to do its own state
> management to clean things up before it passes it up to the caller, or else
> the state of the page is undefined when it goes up.

It seems to me that the state of the page is perfectly clear... the
save() method hasn't been called, and therefore nothing has been
committed.

>
> So we already have to do state management.

Yeah, but you've already made that assumption anyway. It seems to me
that the save() method is already a state-management method. Its
purpose is to change the state from "unsaved" to "saved," right?

>
> In addition, having a proper lifecycle means that the JCR Session can be
> released by the JCR engine when it is not used.  If the ContentManager just
> holds on to the Session, it'll just keep consuming resources.  Probably not
> a lot, but it's still something.

Not sure this would be a big deal if we kept references, essentially
to ThreadLocal copies of Sessions inside ContentManager -- that were
only used for retrieving Nodes.

Here's a counter-proposal for JCRNode:
0) Assume that most WikiPages will be created for read-only retrieval
1) Remove the field variable that references Node. No Node reference
means no Session, and no Session means WikiPage becomes (more)
thread-safe.
2) Store attributes, contents, title, author etc as JCRWikiPage fields
instead of manipulating the Node
3) When save() is called, start a fresh Session, retrieve the old
Node, manipulate it; commit; release Session
4) Maybe store a String or other path-like node name (WikiName?) in
JCRWikiPage so that it's obvious where it came from in the repository.
Actually I think this happens already.
5) If the save() operation fails to retrieve a page (because it was
deleted, perhaps), we throw a checked exception. Otherwise the save()
completes normally.

This is a bit different than how it works now. But if JCRWikiPage
severs the reference to Node, it also gets rid of the most obvious
thread-safety issue.
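
A minimal sketch of points 1-5 (field names and the checked exception
type are hypothetical):

public class JCRWikiPage
{
    private String m_jcrPath;                    // point 4: where it came from
    private Map<String,Object> m_fields =
        new HashMap<String,Object>();            // point 2: plain fields, no Node

    public void setAttribute( String key, Object value )
    {
        m_fields.put( key, value );              // local, uncommitted change
    }

    public void save() throws PageNotFoundException, RepositoryException
    {
        Session session = m_repository.login();  // point 3: fresh Session
        try
        {
            Node node;
            try
            {
                node = (Node) session.getItem( m_jcrPath );
            }
            catch( PathNotFoundException e )
            {
                throw new PageNotFoundException( m_jcrPath );  // point 5
            }
            for( Map.Entry<String,Object> e : m_fields.entrySet() )
            {
                node.setProperty( e.getKey(), e.getValue().toString() );
            }
            session.save();
        }
        finally
        {
            session.logout();                    // release Session
        }
    }
}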

Re: JCR integration

Posted by Janne Jalkanen <ja...@ecyrd.com>.
> For the modify/write case, any of the set() methods store local,
> uncommitted changes in the WikiPage object. The save() method starts a
> new Session and commits. Or maybe any of set() methods start a new
> transaction which is held open until the save().

...but then we need to code in lots of redundant stuff which is  
already managed by JCR.

Consider the following case:

WikiPage page = getPage(...);
page.setAttribute("foo","bar");
page.save();

page.setAttribute("foo","blob");

String attr = page.getAttribute("foo");

The value of "attr" should obviously be "blob" instead of "bar".  This  
means that WikiPage will internally need to check the stuff from a  
local cache before it fetches it from JCR.  This is an extra hit on  
all getAttribute() requests.

In addition, we need to build a disk cache, managed by WikiPage for  
Really Large Binaries, including deletion when WikiPage is garbage  
collected, etc (unless we want to cache them in memory, which is  
something we probably don't want to do).  Of course a ready cache  
library can be used to do that, but again, it feels a bit pointless  
when JCR does it for us.

> Basically, the idea is to make read-only access very cheap, but writes
> more expensive... and to hide all of the state management details from
> callers.

I don't think we can; imagine a situation where we have something like  
this:

WikiPage page = getPage(...)

page.setAttribute("foo","bar");

doThingy(page);

...

public void doThingy(WikiPage page) throws Exception
{
    page.setAttribute("bar", "baz");
}

Now, if doThingy() throws an exception, it needs to do its own state  
management to clean things up before it passes it up to the caller, or  
else the state of the page is undefined when it goes up.

So we already have to do state management.

Originally, I was planning to tie state management into WikiContext's  
lifecycle, but I realized that it's a bit complicated because then  
everything needs a WikiContext either passed around or created, and  
that's a bit of a problem.

However, when Session management is done with WikiEngine, it means  
that the only case where you really have to care about this is when  
you *originally* fetch a WikiEngine.  This is typically done only in  
WikiServletFilter, or in the case of embedders, when you start a  
session (=get a WikiEngine).
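
In WikiServletFilter terms, presumably something like this (a sketch
only):

public void doFilter( ServletRequest req, ServletResponse res, FilterChain chain )
    throws IOException, ServletException
{
    WikiEngine engine = WikiEngineFactory.getEngine();  // acquires per-thread state
    try
    {
        chain.doFilter( req, res );
    }
    finally
    {
        engine.release();  // also releases the JCR Session for this thread
    }
}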

In addition, having a proper lifecycle means that the JCR Session can  
be released by the JCR engine when it is not used.  If the  
ContentManager just holds on to the Session, it'll just keep consuming  
resources.  Probably not a lot, but it's still something.

> My thinking on this has been influenced by Alfresco. Here are some
> interesting notes about what they are doing:
>
> http://wiki.alfresco.com/wiki/Introducing_the_Alfresco_Java_Content_Repository_API
> http://wiki.alfresco.com/wiki/Java_Foundation_API

As you can see from their examples, they also have state management.   
They just create a new Session in the beginning (Repository.login()),  
do their thing, and then finally call Session.logout().

This is a common pattern, used also by other persistence APIs such as  
Hibernate.

/Janne

Re: JCR integration

Posted by Andrew Jaquith <an...@gmail.com>.
Ok, it seems unlikely that we are going to find a "free lunch" here.
Agreed, it probably would be expensive to pre-fetch *all* metadata.
But what if ContentManager kept ThreadLocal references to Sessions
just for "read" purposes? Then, you could pre-fetch the most common
metadata items, like...
- page title/name
- ACL
- version
- creation/modification times
- author

Page contents would not be pre-fetched. Calling getContent() would
grab an existing Session and use it to get the content.
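
Something like this, perhaps (a sketch; the property names and setters
are guesses):

JCRWikiPage getPage( String path ) throws RepositoryException
{
    Session session = m_readSession.get();   // ThreadLocal, read use only
    Node node = (Node) session.getItem( path );

    JCRWikiPage page = new JCRWikiPage( path );
    page.setAuthor( node.getProperty("wiki:author").getString() );
    page.setVersion( (int) node.getProperty("wiki:version").getLong() );
    page.setLastModified( node.getProperty("wiki:lastModified").getDate() );
    return page;                              // no Node reference kept
}

getContent() would then grab a Session again and stream wiki:content
on demand.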

For the modify/write case, any of the set() methods store local,
uncommitted changes in the WikiPage object. The save() method starts a
new Session and commits. Or maybe any of set() methods start a new
transaction which is held open until the save().

Basically, the idea is to make read-only access very cheap, but writes
more expensive... and to hide all of the state management details from
callers.

My thinking on this has been influenced by Alfresco. Here are some
interesting notes about what they are doing:

http://wiki.alfresco.com/wiki/Introducing_the_Alfresco_Java_Content_Repository_API
http://wiki.alfresco.com/wiki/Java_Foundation_API

Andrew


On Mon, Feb 9, 2009 at 10:36 AM, Janne Jalkanen <ja...@iki.fi> wrote:
> public class WikiPage
> {
>   private Node m_node;
>
>   public String getContentAsString()
>   {
>      return m_node.getProperty("wiki:content").getString();
>   }
> }
>
> Getting a new Node reference for each access is simply too expensive,
> as would pre-emptively fetching all metadata into memory.
>
> We can't subclass or implement it, due to the way the JCR API is
> designed.  Node is an interface, which you receive from the JCR
> Session with Session.getItem().
>
> /Janne
>
> On Mon, Feb 09, 2009 at 08:18:28AM -0500, Andrew Jaquith wrote:
>> Janne, is WikiPage == Node? (subclass or implementation). Seems like
>> there's an implied tight coupling between the two...
>>
>> On Feb 9, 2009, at 4:12, Janne Jalkanen <ja...@iki.fi> wrote:
>>
>>>> Re-reading your e-mail, I *think* I understand the comment you made
>>>> about not wanting to keep a ThreadLocal field in ContentManager that
>>>> keeps a reference to the current Session. But why keep the Session?
>>>> Couldn't a getPage() operation open a new Session, get the page, then
>>>> close the Session and return the page? Then there'd be no reason to
>>>> stash the Session anywhere.
>>>
>>> The problem is that the Node keeps a reference to a Session, and the
>>> WikiPage holds a reference to the Node (if it does not, it is
>>> impossible to manipulate the content of the repository through
>>> WikiPage methods, such as get/setAttribute()).
>>>
>>> So, if getPage() closes the session, all attempts to use the WikiPage
>>> for anything will fail spectacularly.
>>>
>>>> Or alternatively, maybe you (we) guarantee that any time we get a
>>>> Session we have to complete work on the session, then call refresh(),
>>>> before returning. That would mean you could keep ThreadLocals in
>>>> ContentManager. But maybe I'm missing your point about the exception
>>>> cases. Could you explain further?
>>>
>>> Guaranteeing calling Session.refresh() means the following pattern:
>>>
>>> m_engine = ... // Get engine somehow
>>> try
>>> {
>>>   WikiPage p = m_engine.getPage(...)
>>>   // Do something
>>> }
>>> finally
>>> {
>>>   // This must be done here, or exceptions will go upwards and
>>>   // one might end up with unpurged data in the Session.
>>>   m_engine.getContentManager().refresh(); // Calls Session.refresh()
>>> }
>>>
>>> So in the end this pattern is *exactly* the same as the pattern above,
>>> with the exception that this does not explicitly state the start of
>>> the Session, and leaves the lifecycle muddy.  We can't call
>>> refresh in our WikiFilter because that does not work for embedders.
>>>
>>> Note that for the most part, the pattern I suggested will only affect
>>> WikiFilter and WikiContextFactory, since those are the ones who get
>>> WikiEngine. Most of our APIs get a WikiContext, and for plugin writers
>>> *nothing* would change (because they did not acquire a WikiEngine,
>>> they just used WikiContext.getEngine()).
>>>
>>> /Janne
>

Re: JCR integration

Posted by Janne Jalkanen <ja...@iki.fi>.
public class WikiPage
{
   private Node m_node;

   public String getContentAsString()
   {
      return m_node.getProperty("wiki:content").getString();
   }
}

Getting a new Node reference for each access is simply too expensive,
as would pre-emptively fetching all metadata into memory.

We can't subclass or implement it, due to the way the JCR API is
designed.  Node is an interface, which you receive from the JCR
Session with Session.getItem().

/Janne

On Mon, Feb 09, 2009 at 08:18:28AM -0500, Andrew Jaquith wrote:
> Janne, is WikiPage == Node? (subclass or implementation). Seems like  
> there's an implied tight coupling between the two...
>
> On Feb 9, 2009, at 4:12, Janne Jalkanen <ja...@iki.fi> wrote:
>
>>> Re-reading your e-mail, I *think* I understand the comment you made
>>> about not wanting to keep a ThreadLocal field in ContentManager that
>>> keeps a reference to the current Session. But why keep the Session?
>>> Couldn't a getPage() operation open a new Session, get the page, then
>>> close the Session and return the page? Then there'd be no reason to
>>> stash the Session anywhere.
>>
>> The problem is that the Node keeps a reference to a Session, and the
>> WikiPage holds a reference to the Node (if it does not, it is
>> impossible to manipulate the content of the repository through
>> WikiPage methods, such as get/setAttribute()).
>>
>> So, if getPage() closes the session, all attempts to use the WikiPage
>> for anything will fail spectacularly.
>>
>>> Or alternatively, maybe you (we) guarantee that any time we get a
>>> Session we have to complete work on the session, then call refresh(),
>>> before returning. That would mean you could keep ThreadLocals in
>>> ContentManager. But maybe I'm missing your point about the exception
>>> cases. Could you explain further?
>>
>> Guaranteeing calling Session.refresh() means the following pattern:
>>
>> m_engine = ... // Get engine somehow
>> try
>> {
>>   WikiPage p = m_engine.getPage(...)
>>   // Do something
>> }
>> finally
>> {
>>   // This must be done here, or exceptions will go upwards and
>>   // one might end up with unpurged data in the Session.
>>   m_engine.getContentManager().refresh(); // Calls Session.refresh()
>> }
>>
>> So in the end this pattern is *exactly* the same as the pattern above,
>> with the exception that this does not explicitly state the start of
>> the Session, and leaves the lifecycle muddy.  We can't call
>> refresh in our WikiFilter because that does not work for embedders.
>>
>> Note that for the most part, the pattern I suggested will only affect
>> WikiFilter and WikiContextFactory, since those are the ones who get
>> WikiEngine. Most of our APIs get a WikiContext, and for plugin writers
>> *nothing* would change (because they did not acquire a WikiEngine,
>> they just used WikiContext.getEngine()).
>>
>> /Janne

Re: JCR integration

Posted by Andrew Jaquith <an...@gmail.com>.
Janne, is WikiPage == Node? (subclass or implementation). Seems like  
there's an implied tight coupling between the two...

On Feb 9, 2009, at 4:12, Janne Jalkanen <ja...@iki.fi> wrote:

>> Re-reading your e-mail, I *think* I understand the comment you made
>> about not wanting to keep a ThreadLocal field in ContentManager that
>> keeps a reference to the current Session. But why keep the Session?
>> Couldn't a getPage() operation open a new Session, get the page, then
>> close the Session and return the page? Then there'd be no reason to
>> stash the Session anywhere.
>
> The problem is that the Node keeps a reference to a Session, and the
> WikiPage holds a reference to the Node (if it does not, it is
> impossible to manipulate the content of the repository through
> WikiPage methods, such as get/setAttribute()).
>
> So, if getPage() closes the session, all attempts to use the WikiPage
> for anything will fail spectacularly.
>
>> Or alternatively, maybe you (we) guarantee that any time we get a
>> Session we have to complete work on the session, then call refresh(),
>> before returning. That would mean you could keep ThreadLocals in
>> ContentManager. But maybe I'm missing your point about the exception
>> cases. Could you explain further?
>
> Guaranteeing calling Session.refresh() means the following pattern:
>
> m_engine = ... // Get engine somehow
> try
> {
>   WikiPage p = m_engine.getPage(...)
>   // Do something
> }
> finally
> {
>   // This must be done here, or exceptions will go upwards and
>   // one might end up with unpurged data in the Session.
>   m_engine.getContentManager().refresh(); // Calls Session.refresh()
> }
>
> So in the end this pattern is *exactly* the same as the pattern above,
> with the exception that this does not explicitly state the start of
> the Session, and leaves the lifecycle muddy.  We can't call
> refresh in our WikiFilter because that does not work for embedders.
>
> Note that for the most part, the pattern I suggested will only affect
> WikiFilter and WikiContextFactory, since those are the ones who get
> WikiEngine. Most of our APIs get a WikiContext, and for plugin writers
> *nothing* would change (because they did not acquire a WikiEngine,
> they just used WikiContext.getEngine()).
>
> /Janne

Re: JCR integration

Posted by Janne Jalkanen <ja...@iki.fi>.
> Re-reading your e-mail, I *think* I understand the comment you made
> about not wanting to keep a ThreadLocal field in ContentManager that
> keeps a reference to the current Session. But why keep the Session?
> Couldn't a getPage() operation open a new Session, get the page, then
> close the Session and return the page? Then there'd be no reason to
> stash the Session anywhere.

The problem is that the Node keeps a reference to a Session, and the
WikiPage holds a reference to the Node (if it does not, it is
impossible to manipulate the content of the repository through
WikiPage methods, such as get/setAttribute()).

So, if getPage() closes the session, all attempts to use the WikiPage
for anything will fail spectacularly.

> Or alternatively, maybe you (we) guarantee that any time we get a
> Session we have to complete work on the session, then call refresh(),
> before returning. That would mean you could keep ThreadLocals in
> ContentManager. But maybe I'm missing your point about the exception
> cases. Could you explain further?

Guaranteeing calling Session.refresh() means the following pattern:

m_engine = ... // Get engine somehow
try
{
   WikiPage p = m_engine.getPage(...)
   // Do something
}
finally
{
   // This must be done here, or exceptions will go upwards and
   // one might end up with unpurged data in the Session.
   m_engine.getContentManager().refresh(); // Calls Session.refresh()
}

So in the end this pattern is *exactly* the same as the pattern above,
with the exception that this does not explicitly state the start of
the Session, and leaves the lifecycle muddy.  We can't call
refresh in our WikiFilter because that does not work for embedders.

Note that for the most part, the pattern I suggested will only affect
WikiFilter and WikiContextFactory, since those are the ones who get
WikiEngine. Most of our APIs get a WikiContext, and for plugin writers
*nothing* would change (because they did not acquire a WikiEngine,
they just used WikiContext.getEngine()).

/Janne

Re: JCR integration

Posted by Andrew Jaquith <an...@gmail.com>.
Not sure I completely understand the issue. I do think that the
proposed remedy might be worse than the disease, though.
WikiEngine-as-singleton (more or less) is pretty easy to understand
conceptually, and the WikiEngine itself is pretty "heavyweight" in the
sense that it contains references to all sorts of other important
factories and facades that manage other things.

So, I think we should probably look to make something else
"lightweight", not the WikiEngine. What that might be, I am not sure,
because I don't fully understand the issue.

Re-reading your e-mail, I *think* I understand the comment you made
about not wanting to keep a ThreadLocal field in ContentManager that
keeps a reference to the current Session. But why keep the Session?
Couldn't a getPage() operation open a new Session, get the page, then
close the Session and return the page? Then there'd be no reason to
stash the Session anywhere.

For the most common use case (reading a wiki page), this might be
enough: the page would be wired up by Stripes anyway via
WikiPageTypeConverter. That would happen (usually) once per request.

Or alternatively, maybe you (we) guarantee that any time we get a
Session we have to complete work on the session, then call refresh(),
before returning. That would mean you could keep ThreadLocals in
ContentManager. But maybe I'm missing your point about the exception
cases. Could you explain further?

Andrew

On Sun, Feb 8, 2009 at 3:15 PM, Janne Jalkanen <ja...@ecyrd.com> wrote:
> Folks,
>
> I've got a bit of a problem with respect to the JCR integration.  I will
> write a lengthy email explaining my current thoughts on the subject partly
> to get your ideas, and partly to clarify my own thoughts on the subject.
>
> Our current model is stateless in the sense that the repository is open all
> the time, and you just access the pages with a String handle (pagename).
>
> However, the JCR model is stateful.  That is, you first get a Session object,
> through which you manipulate the contents of the repository, and once you're
> done, you release that object.  This is similar to Hibernate, though JCR
> Sessions are not threadsafe (and assumed to be bound to a single Thread
> even, so they can't even be shared when synchronized).
>
> So logically we *could* have ContentManager have a single ThreadLocal which
> keeps the Session object. However, since a Session gathers all of the
> changes, it means that in case of e.g. exceptions, the old data would still
> remain somewhere in the Session, and upon a new save, some old, obsolete
> data would be saved as well.  This is obviously not a good thing.
>
> No, we can't simply empty the Session with Session.refresh() before each
> access, because we don't know when an access begins - it could start even
> with a simple getPage().  And refreshing on every getPage() would be wrong,
> since getPage() may be called multiple times during a single access.
>
> So, it seems to me that in order to avoid side effects, we need to have some
> sort of session management - which means acquiring a session to the JSPWiki
> repository, and then also releasing it.  The WikiEngine object would be an
> obvious place for it, since it's what's being passed around anyway.
>
> So the proposal would be to have a WikiEngineFactory object, which you would
> call as follows:
>
> WikiEngine engine = WikiEngineFactory.getEngine();
>
> try
> {
>   WikiPage page = engine.getPage(...);
> }
> finally
> {
>   engine.release();
> }
>
> The WikiPage object (and so the Node it holds) would not be valid after engine.release().
>
> Now, this does not mean that the Factory will create new WikiEngines, but
> it'll probably set up some ThreadLocals so that it'll work properly.  If we
> had separate WikiEngine objects, we would need to recreate all of our
> Manager classes.  This unfortunately means that WikiEngine is for all
> intents and purposes currently a singleton.  However, this approach would
> work, since you can't get to the managers without getting a WikiEngine
> first.
>
> Anybody see any serious problems with this approach?  Anything that'll break
> (other than every single plugin/Task which holds a reference to a WikiPage
> or WikiContext object, but those can be refactored away)?
>
> /Janne
>
>