Posted to dev@jspwiki.apache.org by Andrew Jaquith <an...@gmail.com> on 2010/03/14 16:29:38 UTC

ReferenceManager rewrite: seeking a little discussion

All (and especially Janne) --

In digging into some of the remaining bugs clustered in
PageRenamerTest, I was forced to confront what I'd coded up during the
last re-write of ReferenceManager. Lots of the PageRenamerTests are
still broken. The problem with page-renaming relates, I suspect, to a
checkin Janne did previously that sought to handle case-sensitivity
filesystem issues. To put it simply, the relationship between the wiki
path (as stored in the JCRWikiPage ATTR_TITLE attribute) and the
filesystem wasn't being dealt with gracefully by ReferenceManager. The
bug was too complex to track down, and I didn't have the patience and
time to do it.

I don't mean to blame Janne for this -- not at all. It merely sheds
light on the difficulty of keeping references when the identifiers are
page names. When the page name changes, we have to jump through lots
of hoops to keep the references intact. You can blame ME for that. It
reminded me of the old saying "programmers will create the most
complex things they can debug".

In thinking this through a bit more, I thought it might be better if
we stored references as UUIDs. This means (for example) that renaming
is simple -- all we really need to do is change the page text, rather
than the reference "pointers". So, that's what I've been experimenting
with. It seems to work really, really well, and the code is simpler.
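
To make the renaming point concrete, here is a minimal in-memory sketch
in Java. It is not the ReferenceManager code: the class
UuidReferenceSketch and its methods are hypothetical names made up for
illustration, and plain maps stand in for the JCR repository. References
are stored as UUIDs, so a rename touches only the title entry. Rewriting
the link text inside the referring pages, which is still needed, is left
out of the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

/** Hypothetical in-memory stand-in for UUID-keyed reference tracking; not the JSPWiki API. */
class UuidReferenceSketch {
    // UUID -> current page title; the only entry a rename has to touch.
    private final Map<UUID, String> titles = new HashMap<>();
    // UUID of a page -> UUIDs of the pages that link to it.
    private final Map<UUID, Set<UUID>> referredBy = new HashMap<>();

    UUID addPage(String title) {
        UUID id = UUID.randomUUID();
        titles.put(id, title);
        referredBy.put(id, new HashSet<>());
        return id;
    }

    void addReference(UUID from, UUID to) {
        referredBy.get(to).add(from);
    }

    /** Renaming rewrites one title entry; the stored references are untouched. */
    void rename(UUID page, String newTitle) {
        titles.put(page, newTitle);
    }

    /** Resolves stored UUIDs back to titles, the way a name-returning public method would. */
    Set<String> getReferredBy(UUID page) {
        Set<String> names = new TreeSet<>();
        for (UUID id : referredBy.get(page)) {
            names.add(titles.get(id));
        }
        return names;
    }

    public static void main(String[] args) {
        UuidReferenceSketch refs = new UuidReferenceSketch();
        UUID main = refs.addPage("Main");
        UUID about = refs.addPage("About");
        refs.addReference(main, about);
        refs.rename(main, "FrontPage");
        System.out.println(refs.getReferredBy(about)); // prints [FrontPage]
    }
}
```

Callers that ask by name still get names back; only the stored pointers
are UUIDs.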

The only odd case we have to deal with is when we're referring to a
page that hasn't been created yet. In that case, what I've chosen to
do is create dummy pages in a separate JCR branch (part of the
/jspwiki/wiki:references node tree). Then, when pages are added in
ContentManager, we check FIRST to see if that page is in the
"not-created" tree. If it is, we MOVE it to the pages tree and then
save as normal. Deletions work in reverse: if the page has any inbound
references, we move it back to the "not created" tree to ensure that
references from live pages stay intact; otherwise we zorch the page as
normal.
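
The create and delete flow described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the
actual ContentManager code: NotCreatedTreeSketch and every method on it
are hypothetical, and two maps play the role of the pages tree and the
"not-created" tree. In a JCR repository the "move" would be a node move
rather than a map update, and removing the deleted page's own outbound
references is omitted here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch of the "not created" tree using two maps instead of JCR nodes. */
class NotCreatedTreeSketch {
    private final Map<String, UUID> pages = new HashMap<>();       // live pages: name -> UUID
    private final Map<String, UUID> notCreated = new HashMap<>();  // placeholders: name -> UUID
    private final Map<UUID, Set<UUID>> inbound = new HashMap<>();  // target UUID -> referrer UUIDs

    /** Returns the UUID for a link target, creating a placeholder if the page does not exist. */
    UUID resolveTarget(String name) {
        UUID id = pages.get(name);
        if (id == null) {
            id = notCreated.computeIfAbsent(name, n -> UUID.randomUUID());
        }
        return id;
    }

    void addReference(UUID from, String toName) {
        inbound.computeIfAbsent(resolveTarget(toName), k -> new HashSet<>()).add(from);
    }

    /** Creating a page checks the placeholder tree FIRST and moves the entry if present. */
    UUID createPage(String name) {
        UUID existing = pages.get(name);
        if (existing != null) {
            return existing;
        }
        UUID id = notCreated.remove(name);
        if (id == null) {
            id = UUID.randomUUID();
        }
        pages.put(name, id);
        return id;
    }

    /** Deleting moves a still-referenced page back to the placeholder tree; otherwise drops it. */
    void deletePage(String name) {
        UUID id = pages.remove(name);
        if (id == null) {
            return;
        }
        Set<UUID> referrers = inbound.get(id);
        if (referrers != null && !referrers.isEmpty()) {
            notCreated.put(name, id);
        } else {
            inbound.remove(id);
        }
    }

    boolean isLive(String name) { return pages.containsKey(name); }

    boolean isPlaceholder(String name) { return notCreated.containsKey(name); }

    public static void main(String[] args) {
        NotCreatedTreeSketch wiki = new NotCreatedTreeSketch();
        UUID main = wiki.createPage("Main");
        wiki.addReference(main, "Future");                // "Future" does not exist yet
        UUID placeholder = wiki.resolveTarget("Future");
        UUID created = wiki.createPage("Future");         // moved, so the UUID is preserved
        System.out.println(placeholder.equals(created));  // prints true
        wiki.deletePage("Future");                        // still referenced by Main
        System.out.println(wiki.isPlaceholder("Future")); // prints true
    }
}
```

Because the placeholder keeps its UUID when it is moved in either
direction, references held by live pages never need to be rewritten.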

The "not created" page tree, by the way, is also an example of
something I'm calling a "page foundry" -- a place where future pages
are born but not yet moved into production. I can imagine other
foundries -- for example, a per-user foundry for drafts. Maybe
"nursery" is a better metaphor, but you get the idea.

Thoughts? The code isn't quite ready, but it is progressing nicely. We
might as well fix it before the 3.0 release, right?

Andrew

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Andrew Jaquith <an...@gmail.com>.
You got it right except for the last point. If anything, it makes the
code slightly simpler in most respects -- it cuts down the amount of
code by about 10% in ReferenceManager, notably in the renaming code.

As for the broader issue you are raising, see the reply I just sent to Harry.

Andrew

On Sun, Mar 14, 2010 at 1:35 PM, Florian Holeczek <fl...@holeczek.de> wrote:
> Hi Andrew,
>
> thanks for the clarification! Indeed, I have misunderstood you. I must
> confess that I don't know the code well enough to judge which impacts
> your particular change may have. That said, I change my -1 to a 0.
>
> But if I understand you correctly this time, the whole thing now sounds
> to me as follows:
> * your change is no solution to the problem I meant (a pity! ;-) )
> * I think the problems you're addressing with it have their root cause
> in the problem I mean
> * So this would basically be a hack which makes the code more
> complicated in order to fix some unit tests? ;-)
>
> Regards
>  Florian

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Florian Holeczek <fl...@holeczek.de>.
Hi Andrew,

thanks for the clarification! Indeed, I have misunderstood you. I must
confess that I don't know the code well enough to judge which impacts
your particular change may have. That said, I change my -1 to a 0.

But if I understand you correctly this time, the whole thing now sounds
to me as follows:
* your change is no solution to the problem I meant (a pity! ;-) )
* I think the problems you're addressing with it have their root cause
in the problem I mean
* So this would basically be a hack which makes the code more
complicated in order to fix some unit tests? ;-)

Regards
 Florian

On 14.03.2010 17:50, Andrew Jaquith wrote:
> Hi Florian --
> 
> Thanks for your message. Why the -1? Trying to understand the nature
> of your objection.
> 
> I should have clarified something in my previous message. The UUIDs
> are only used by ReferenceManager -- they are not exposed in the UI at
> all, and the public methods in ReferenceManager (e.g., getReferredBy)
> still return WikiPaths like they always have. Users won't see anything
> different, and they can still enter wiki names as they always have.
> 
> Behind the scenes, the UUIDs are stored in the back-end JCR pages as
> wiki:referenceTo and wiki:referredBy properties. Whenever the page
> is saved, we extract out the page names that are referenced (as we
> normally do), normalize the page name, figure out the UUIDs, and then
> stash the UUIDs.
> 
> So, in case it wasn't clear -- this UUID stuff is all under the covers.
> 
> Does that make you reconsider your -1? ;)
> 
> Andrew

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Andrew Jaquith <an...@gmail.com>.
Hi Florian --

Thanks for your message. Why the -1? Trying to understand the nature
of your objection.

I should have clarified something in my previous message. The UUIDs
are only used by ReferenceManager -- they are not exposed in the UI at
all, and the public methods in ReferenceManager (e.g., getReferredBy)
still return WikiPaths like they always have. Users won't see anything
different, and they can still enter wiki names as they always have.

Behind the scenes, the UUIDs are stored in the back-end JCR pages as
wiki:referenceTo and wiki:referredBy properties. Whenever the page
is saved, we extract out the page names that are referenced (as we
normally do), normalize the page name, figure out the UUIDs, and then
stash the UUIDs.
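
As a rough illustration of that save-time sequence (extract, normalize,
resolve, store), here is a small Java sketch. Everything in it is an
assumption made for the example: the class SaveReferencesSketch, the
simplified link pattern that only recognizes [PageName] and
[text|PageName], and the normalization rule (strip whitespace and
lower-case). The real parser and normalization in JSPWiki may well
differ.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the save-time steps; not the JSPWiki implementation. */
class SaveReferencesSketch {
    // Simplified link pattern: [PageName] or [link text|PageName]. Real markup has more cases.
    private static final Pattern LINK = Pattern.compile("\\[(?:[^\\]|]*\\|)?([^\\]|]+)\\]");

    private final Map<String, UUID> uuidByName = new HashMap<>();   // normalized name -> UUID
    private final Map<UUID, Set<UUID>> refersTo = new HashMap<>();  // page UUID -> target UUIDs

    /** Assumed normalization: remove whitespace and lower-case. The real rule may differ. */
    static String normalize(String name) {
        return name.trim().replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    /** Looks up the UUID for a name, assigning a fresh one the first time the name is seen. */
    UUID uuidFor(String name) {
        return uuidByName.computeIfAbsent(normalize(name), n -> UUID.randomUUID());
    }

    /** On save: extract link targets, normalize them, resolve UUIDs, store the UUID set. */
    Set<UUID> save(String pageName, String text) {
        Set<UUID> targets = new LinkedHashSet<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            targets.add(uuidFor(m.group(1)));
        }
        refersTo.put(uuidFor(pageName), targets);
        return targets;
    }

    public static void main(String[] args) {
        SaveReferencesSketch wiki = new SaveReferencesSketch();
        Set<UUID> targets = wiki.save("Main",
                "See [About] and [the about page|about ] plus [Todo List].");
        // "About" and "about " normalize to the same name, so only two targets remain.
        System.out.println(targets.size()); // prints 2
    }
}
```

The user-visible names never change; the UUIDs exist only in the stored
reference sets.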

So, in case it wasn't clear -- this UUID stuff is all under the covers.

Does that make you reconsider your -1? ;)

Andrew

On Sun, Mar 14, 2010 at 12:40 PM, Florian Holeczek <fl...@holeczek.de> wrote:
> Hi Andrew,
>
> we've known this issue for quite a long time already, haven't we?
> I've been trying to put the focus on it several times (search the list
> for normalization, page name, wiki name). Happy to see that this is
> finally recognized as a real problem before 3.0 release :-)
>
> My opinion in short: Introducing and using UUIDs is certainly the
> standard and well-proven way to solve such a problem in information
> science. However, we're building a wiki server and the wiki way is using
> wiki names as identifiers. My opinion was and is that we need to
> normalize wiki names, which means to think about a mathematical function
> (probably surjective) to map a given wiki name to one defined page.
> Unfortunately, this may mean that we're breaking links by removing the
> fuzzy logic we've been using up to now.
>
> So basically, a -1 from me for the UUID solution.
>
> Regards
>  Florian

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Florian Holeczek <fl...@holeczek.de>.
Hi Andrew,

we've known this issue for quite a long time already, haven't we?
I've been trying to put the focus on it several times (search the list
for normalization, page name, wiki name). Happy to see that this is
finally recognized as a real problem before 3.0 release :-)

My opinion in short: Introducing and using UUIDs is certainly the
standard and well-proven way to solve such a problem in information
science. However, we're building a wiki server and the wiki way is using
wiki names as identifiers. My opinion was and is that we need to
normalize wiki names, which means to think about a mathematical function
(probably surjective) to map a given wiki name to one defined page.
Unfortunately, this may mean that we're breaking links by removing the
fuzzy logic we've been using up to now.

So basically, a -1 from me for the UUID solution.

Regards
 Florian


On 14.03.2010 16:29, Andrew Jaquith wrote:
> All (and especially Janne) --
> 
> In digging into some of the remaining bugs clustered in
> PageRenamerTest, I was forced to confront what I'd coded up during the
> last re-write of ReferenceManager. Lots of the PageRenamerTests are
> still broken. The problem with page-renaming relates, I suspect, to a
> checkin Janne did previously that sought to handle case-sensitivity
> filesystem issues. To put it simply, the relationship between the wiki
> path (as stored in the JCRWikiPage ATTR_TITLE attribute) and the
> filesystem wasn't being dealt with gracefully by ReferenceManager. The
> bug was too complex to track down, and I didn't have the patience and
> time to do it.
> 
> I don't mean to blame Janne for this -- not at all. It merely sheds
> light on the difficulty of keeping references when the identifiers are
> page names. When the page name changes, we have to jump through lots
> of hoops to keep the references intact. You can blame ME for that. It
> reminded me of the old saying "programmers will create the most
> complex things they can debug".
> 
> In thinking this through a bit more, I thought it might be better if
> we stored references as UUIDs. This means (for example) that renaming
> is simple -- all we really need to do is change the page text, rather
> than the reference "pointers". So, that's what I've been experimenting
> with. It seems to work really, really well, and the code is simpler.
> 
> The only odd case we have to deal with is when we're referring to a
> page that hasn't been created yet. In that case, what I've chosen to
> do is create dummy pages in a separate JCR branch (part of the
> /jspwiki/wiki:references node tree). Then, when pages are added in
> ContentManager, we check FIRST to see if that page is in the
> "not-created" tree. If it is, we MOVE it to the pages tree and then
> save as normal. Deletions work in reverse: if the page has any inbound
> references, we move it back to the "not created" tree to ensure that
> references from live pages stay intact; otherwise we zorch the page as
> normal.
> 
> The "not created" page tree, by the way, is also an example of
> something I'm calling a "page foundry" -- a place where future pages
> are born but not yet moved into production. I can imagine other
> foundries -- for example, a per-user foundry for drafts. Maybe
> "nursery" is a better metaphor, but you get the idea.
> 
> Thoughts? The code isn't quite ready, but it is progressing nicely. We
> might as well fix it before the 3.0 release, right?
> 
> Andrew

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Harry Metske <ha...@gmail.com>.
Very nice!

thanks,
Harry

2010/3/19 Andrew Jaquith <an...@gmail.com>

> New RefManager code is done, and all tests pass. It works pretty
> nicely, too. I'll check it in tomorrow morning.
>
> Even better, I've been able to kill another dozen or more bugs, so
> we've got just 2 remaining. Getting very close to a clean build...
>
> Andrew

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Andrew Jaquith <an...@gmail.com>.
New RefManager code is done, and all tests pass. It works pretty
nicely, too. I'll check it in tomorrow morning.

Even better, I've been able to kill another dozen or more bugs, so
we've got just 2 remaining. Getting very close to a clean build...

Andrew

On Mon, Mar 15, 2010 at 9:00 AM, Andrew Jaquith
<an...@gmail.com> wrote:
>> ReferenceManager...  Also, since RefMgr worked in 2.x line even in renames,
>> I always figured that there was a small bug introduced, which means that
>> debugging that one would've been easier than a full rewrite.
>
> Yeah, though I re-wrote it for 3.x by necessity, so it's going to have
> bugs 2.x didn't have. :)
>
>> (It should be noted that you will need to track the non-existent references
>> using page titles. You can't do them based on UUIDs, since the UUIDs are
>> assigned by JCR.  You could create a dummy page, but that means littering
>
> Yes, that's what I assumed.
>
>> JSPWiki, so you need to build in certain sanity checks.  In the end, there
>> will be lots more code, though arguably it is going to be easier to debug.)
>
> Noted. :)
>
>>
>> /Janne
>>
>> On Mar 14, 2010, at 22:45 , Andrew Jaquith wrote:
>>
>>>> I'm basically +1 on using UUIDs internally, *BUT* there are two big
>>>> problems: performance. *If* ReferenceManager stores only UUIDs, every single
>>>> page view causes a session.getNodeByUUID().getProperty("wiki:title"), which
>>>> is two reads to the database *per reference*.  This will kill the idea of
>>>> including the page references on the page itself, which I think is really
>>>> neat.
>>>
>>> Yes, that is right. I am running into this issue right now. :)
>>>
>>> As for references "on the page itself," did you mean the page markup?
>>> I haven't changed that... that still stays as WikiNames. What I meant
>>> was using UUIDs in the wiki:refersTo and wiki:referredBy attributes.
>>>
>>>> So to solve this a UUID => wiki:title cache needs to be put somewhere.
>>>
>>> Should be pretty straightforward, no? The only time cache entries
>>> would need to be refreshed would be on page-rename operations, which
>>> won't happen too often.
>>>
>>>> Now, the second big problem is non-existing references.  Since an UUID
>>>> can only exist when the page in question exists, you can't use UUIDs to
>>>> track references to non-existing pages (which the ReferenceManager also
>>>> tracks).
>>>
>>> You can, if you use dummy pages to represent the non-existent pages; I
>>> mentioned this in my first message on this subject (earlier today).
>>> These go in wiki:references/wiki:notCreated. When new pages are added,
>>> we look in the "uncreated" tree first, and if there, we move the node
>>> to the pages tree (and then save all the other attributes on top of
>>> the node as we normally would).
>>>
>>>> So you need to track both in possibly separate systems, including all the
>>>> page creations and removals.  This is the reason why I've never bothered to
>>>> rewrite ReferenceManager into using UUIDs, since doing this tracking has
>>>> been a bit too much work, especially considering that it works now and has
>>>> no performance problems in 3.0.
>>>
>>> You are right about having to create a separate mechanism for tracking
>>> uncreated nodes, and that is what I am experimenting with. I'm
>>> reasonably far along in terms of proof-of-concept. But, I would
>>> slightly disagree that the current "use page names as pointers"
>>> strategy for ReferenceManager works. It mostly does, but breaks down
>>> in rename situations. I found it nearly impossible to debug -- which
>>> is saying something, considering I did the last RefMgr rewrite.
>>>
>>> This is all about where you want to do the extra work: (1) during page
>>> renames (as we do now), or (2) during page creation and deletion (as
>>> the UUID strategy would require). But the UUID strategy has some nice
>>> benefits, such as making ReferenceManager's code simpler -- it was
>>> insanely hard to debug. Now it's just difficult.
>>>
>>> So, what do you think? I still think this is worth pursuing... and I'm
>>> nearly finished with a POC.
>>>
>>> Andrew
>>>
>>>>
>>>> On 14 Mar 2010, at 20:33, Kalle Kivimaa wrote:
>>>>
>>>>> Andrew Jaquith <an...@gmail.com> writes:
>>>>>>
>>>>>> In thinking this through a bit more, I thought it might be better if
>>>>>> we stored references as UUIDs. This means (for example) that renaming
>>>>>> is simple -- all we really need to just change the page text, rather
>>>>>> than the reference "pointers". So, that's what I've been experimenting
>>>>>> with. It seems to work really, really well, and the code is simpler.
>>>>>
>>>>> We basically did this in the company I work for, where we use JSPWiki as
>>>>> a base for our UI. We ran into problems with the reference manager
>>>>> taking up a *very* long time at wiki startup, so we switched over to
>>>>> using a SQL database for page storage and storing the links between
>>>>> pages in the database, thus eliminating the "recreate the link
>>>>> information at every startup" problem.
>>>>>
>>>>> The problems we've had with this approach have been mostly on the "how
>>>>> to keep the link information correct", so that's probably something you
>>>>> want to keep in mind.
>>>>>
>>>>> --
>>>>> * Sufficiently advanced magic is indistinguishable from technology (T.P)
>>>>>  *
>>>>> *           PGP public key available @ http://www.iki.fi/killer
>>>>>   *
>>>>
>>>>
>>
>>
>

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Andrew Jaquith <an...@gmail.com>.
> ReferenceManager...  Also, since RefMgr worked in 2.x line even in renames,
> I always figured that there was a small bug introduced, which means that
> debugging that one would've been easier than a full rewrite.

Yeah, though I rewrote it for 3.x by necessity, so it's going to have
bugs 2.x didn't have. :)

> (It should be noted that you will need to track the non-existent references
> using page titles. You can't do them based on UUIDs, since the UUIDs are
> assigned by JCR.  You could create a dummy page, but that means littering

Yes, that's what I assumed.

> JSPWiki, so you need to build in certain sanity checks.  In the end, there
> will be lots more code, though arguably it is going to be easier to debug.)

Noted. :)

>
> /Janne

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
"On the page itself" means "as a Property of the page Node."

If you're willing to do the hard work, go ahead. The reason I didn't  
do it was that I always figured we had bigger problems than rewriting  
ReferenceManager...  Also, since RefMgr worked in 2.x line even in  
renames, I always figured that there was a small bug introduced, which  
means that debugging that one would've been easier than a full rewrite.

(It should be noted that you will need to track the non-existent  
references using page titles. You can't do them based on UUIDs, since  
the UUIDs are assigned by JCR.  You could create a dummy page, but  
that means littering the repository with pages that do not exist,  
which in turn means lots of tracking and eventual garbage collection,  
which in turn can become quite heavy.  You can't assume that you  
always catch all renames, deletes and new page ops since someone could  
be manipulating the page content bypassing JSPWiki, so you need to  
build in certain sanity checks.  In the end, there will be lots more  
code, though arguably it is going to be easier to debug.)
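
As a rough illustration of the garbage-collection pass mentioned above, here is a minimal in-memory sketch. It does not use the JCR API; the class name PlaceholderSweeper, the sweep method, and the shape of the refersTo map are all assumptions made for this example, not JSPWiki code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sweep over placeholder ("dummy") pages; not JSPWiki code. */
final class PlaceholderSweeper {

    /**
     * Removes placeholders that no page refers to any more.
     *
     * @param placeholders UUIDs of placeholder pages (modified in place)
     * @param refersTo     source page UUID -> UUIDs of the pages it links to
     * @return the number of placeholders removed
     */
    static int sweep(Set<String> placeholders, Map<String, Set<String>> refersTo) {
        // Collect every UUID that is the target of at least one reference.
        Set<String> referenced = new HashSet<>();
        for (Set<String> targets : refersTo.values()) {
            referenced.addAll(targets);
        }
        int before = placeholders.size();
        // Keep only the placeholders that still have an inbound reference.
        placeholders.retainAll(referenced);
        return before - placeholders.size();
    }
}
```

The sweep walks every stored reference, so its cost grows with the total number of links, which is one way such a pass could "become quite heavy" on a large repository.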

/Janne



Re: ReferenceManager rewrite: seeking a little discussion

Posted by Andrew Jaquith <an...@gmail.com>.
> I'm basically +1 on using UUIDs internally, *BUT* there are two big problems. The first is performance. *If* ReferenceManager stores only UUIDs, every single page view causes a session.getNodeByUUID().getProperty("wiki:title"), which is two reads to the database *per reference*.  This will kill the idea of including the page references on the page itself, which I think is really neat.

Yes, that is right. I am running into this issue right now. :)

As for references "on the page itself," did you mean the page markup?
I haven't changed that... that still stays as WikiNames. What I meant
was using UUIDs in the wiki:refersTo and wiki:referredBy attributes.

> So to solve this a UUID => wiki:title cache needs to be put somewhere.

Should be pretty straightforward, no? The only time cache entries
would need to be refreshed would be on page-rename operations, which
won't happen too often.
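
A minimal sketch of such a cache, assuming titles only change through a rename hook. TitleCache, its method names, and the loader function (which stands in for the repository lookup) are hypothetical names for this example and are not part of JSPWiki or JCR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical UUID -> wiki:title cache; the loader stands in for the repository read. */
final class TitleCache {
    private final Map<String, String> titles = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    TitleCache(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached title, calling the loader only on a cache miss. */
    String titleFor(String uuid) {
        return titles.computeIfAbsent(uuid, loader);
    }

    /** To be called by the rename operation so a stale title is never served. */
    void onRename(String uuid, String newTitle) {
        titles.put(uuid, newTitle);
    }

    /** To be called when a page is deleted. */
    void onDelete(String uuid) {
        titles.remove(uuid);
    }
}
```

With something like this in place, the repository would be read once per referenced page rather than on every page view, and only renames and deletions would touch existing entries.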

> Now, the second big problem is non-existing references.  Since a UUID can only exist when the page in question exists, you can't use UUIDs to track references to non-existing pages (which the ReferenceManager also tracks).

You can, if you use dummy pages to represent the non-existent pages; I
mentioned this in my first message on this subject (earlier today).
These go in wiki:references/wiki:notCreated. When new pages are added,
we look in the "uncreated" tree first, and if there, we move the node
to the pages tree (and then save all the other attributes on top of
the node as we normally would).
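
The look-in-"uncreated"-first step can be sketched with two in-memory maps standing in for the two node trees. This is not the JCR API; PlaceholderStore, uuidForLink and createPage are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** In-memory stand-in for the pages tree and the "not created" tree; not JCR. */
final class PlaceholderStore {
    final Map<String, String> pages = new HashMap<>();      // title -> UUID of live pages
    final Map<String, String> notCreated = new HashMap<>(); // title -> UUID of dummy pages

    /** Returns the UUID to store in a reference, creating a dummy page if needed. */
    String uuidForLink(String title) {
        String live = pages.get(title);
        if (live != null) {
            return live;
        }
        return notCreated.computeIfAbsent(title, t -> UUID.randomUUID().toString());
    }

    /** Creates a page, moving the dummy page over first so its UUID is preserved. */
    String createPage(String title) {
        String existing = pages.get(title);
        if (existing != null) {
            return existing;
        }
        String uuid = notCreated.remove(title); // the "move" out of the uncreated tree
        if (uuid == null) {
            uuid = UUID.randomUUID().toString(); // nobody referred to this page yet
        }
        pages.put(title, uuid);
        return uuid;
    }
}
```

Because the UUID survives the move, references stored before the page existed keep pointing at the right node after it is created.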

> So you need to track both in possibly separate systems, including all the page creations and removals.  This is the reason why I've never bothered to rewrite ReferenceManager into using UUIDs, since doing this tracking has been a bit too much work, especially considering that it works now and has no performance problems in 3.0.

You are right about having to create a separate mechanism for tracking
uncreated nodes, and that is what I am experimenting with. I'm
reasonably far along in terms of proof-of-concept. But, I would
slightly disagree that the current "use page names as pointers"
strategy for ReferenceManager works. It mostly does, but breaks down
in rename situations. I found it nearly impossible to debug -- which
is saying something, considering I did the last RefMgr rewrite.

This is all about where you want to do the extra work: (1) during page
renames (as we do now), or (2) during page creation and deletion (as
the UUID strategy would require). But the UUID strategy has some nice
benefits, such as making ReferenceManager's code simpler -- it was
insanely hard to debug. Now it's just difficult.

So, what do you think? I still think this is worth pursuing... and I'm
nearly finished with a POC.

Andrew

>
> On 14 Mar 2010, at 20:33, Kalle Kivimaa wrote:
>
>> Andrew Jaquith <an...@gmail.com> writes:
>>> In thinking this through a bit more, I thought it might be better if
>>> we stored references as UUIDs. This means (for example) that renaming
>>> is simple -- all we really need to just change the page text, rather
>>> than the reference "pointers". So, that's what I've been experimenting
>>> with. It seems to work really, really well, and the code is simpler.
>>
>> We basically did this in the company I work for, where we use JSPWiki as
>> a base for our UI. We ran into problems with the reference manager
>> taking up a *very* long time at wiki startup, so we switched over to
>> using a SQL database for page storage and storing the links between
>> pages in the database, thus eliminating the "recreate the link
>> information at every startup" problem.
>>
>> The problems we've had with this approach have been mostly on the "how
>> to keep the link information correct", so that's probably something you
>> want to keep in mind.
>>
>> --
>> * Sufficiently advanced magic is indistinguishable from technology (T.P)  *
>> *           PGP public key available @ http://www.iki.fi/killer           *
>
>

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Janne Jalkanen <ja...@ecyrd.com>.
JSPWiki 3.0 already stores the references internally in the page repository, so that's solved.

I'm basically +1 on using UUIDs internally, *BUT* there are two big problems. The first is performance. *If* ReferenceManager stores only UUIDs, every single page view causes a session.getNodeByUUID().getProperty("wiki:title"), which is two reads to the database *per reference*.  This will kill the idea of including the page references on the page itself, which I think is really neat.

Putting the page references only on PageInfo does not work either, since in a public wiki the PageInfo will get spidered anyway.

So to solve this a UUID => wiki:title cache needs to be put somewhere.

Now, the second big problem is non-existing references.  Since a UUID can only exist when the page in question exists, you can't use UUIDs to track references to non-existing pages (which the ReferenceManager also tracks).  So you need to track both in possibly separate systems, including all the page creations and removals.  This is the reason why I've never bothered to rewrite ReferenceManager to use UUIDs, since doing this tracking has been a bit too much work, especially considering that it works now and has no performance problems in 3.0.

/Janne

On 14 Mar 2010, at 20:33, Kalle Kivimaa wrote:

> Andrew Jaquith <an...@gmail.com> writes:
>> In thinking this through a bit more, I thought it might be better if
>> we stored references as UUIDs. This means (for example) that renaming
>> is simple -- all we really need to just change the page text, rather
>> than the reference "pointers". So, that's what I've been experimenting
>> with. It seems to work really, really well, and the code is simpler.
> 
> We basically did this in the company I work for, where we use JSPWiki as
> a base for our UI. We ran into problems with the reference manager
> taking up a *very* long time at wiki startup, so we switched over to
> using a SQL database for page storage and storing the links between
> pages in the database, thus eliminating the "recreate the link
> information at every startup" problem.
> 
> The problems we've had with this approach have been mostly on the "how
> to keep the link information correct", so that's probably something you
> want to keep in mind.
> 
> -- 
> * Sufficiently advanced magic is indistinguishable from technology (T.P)  *
> *           PGP public key available @ http://www.iki.fi/killer           *


Re: ReferenceManager rewrite: seeking a little discussion

Posted by Kalle Kivimaa <ka...@iki.fi>.
Andrew Jaquith <an...@gmail.com> writes:
> In thinking this through a bit more, I thought it might be better if
> we stored references as UUIDs. This means (for example) that renaming
> is simple -- all we really need to just change the page text, rather
> than the reference "pointers". So, that's what I've been experimenting
> with. It seems to work really, really well, and the code is simpler.

We basically did this in the company I work for, where we use JSPWiki as
a base for our UI. We ran into problems with the reference manager
taking up a *very* long time at wiki startup, so we switched over to
using a SQL database for page storage and storing the links between
pages in the database, thus eliminating the "recreate the link
information at every startup" problem.

The problems we've had with this approach have been mostly on the "how
to keep the link information correct", so that's probably something you
want to keep in mind.

-- 
* Sufficiently advanced magic is indistinguishable from technology (T.P)  *
*           PGP public key available @ http://www.iki.fi/killer           *

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Andrew Jaquith <an...@gmail.com>.
Harry, thanks. You captured the spirit of what I'm trying to do much
more simply than I did in my previous e-mail. :)

I did NOT suggest the more radical step of replacing the actual wiki
page markup with UUIDs "under the covers" (and then de-referencing
them for viewing and editing purposes). That's a discussion for
another time. That WOULD address Florian's larger issue, combined with
a consolidation of where and how we canonicalize and resolve page
names. That, too, is a discussion for another time. :)

Andrew

On Sun, Mar 14, 2010 at 12:56 PM, Harry Metske <ha...@gmail.com> wrote:
> Although I cannot foresee all the consequences of this switch, I
> generally like the idea of using internal numbers/names/labels (UUIDs in our
> case) instead of external (visible) names for referencing purposes.
> The issue is very similar to SQL tables, where it is much more convenient
> to use identity columns as unique keys instead of "name columns", especially
> when using foreign keys.
>
> I think this issue does not directly relate to external page names (as
> Florian discussed).
>
> Assuming that UUIDs are used only internally and do not
> show up anywhere in the public API:
> +1 from me, at least something to "commit and try"
>
>
> regards,
> Harry
>
> 2010/3/14 Andrew Jaquith <an...@gmail.com>
>
>> All (and especially Janne) --
>>
>> In digging into some of the remaining bugs clustered in
>> PageRenamerTest, I was forced to confront what I'd coded up during the
>> last re-write of ReferenceManager. Lots of the PageRenamerTests are
>> still broken. The problem with page-renaming relates, I suspect, to a
>> checkin Janne did previously that sought to handle case-sensitivity
>> filesystem issues. To put it simply, the relationship between the wiki
>> path (as stored in the JCRWikiPage ATTR_TITLE attribute) and the
>> filesystem wasn't being dealt with gracefully by ReferenceManager. The
>> bug was too complex to track down, and I didn't have the patience and
>> time to do it.
>>
>> I don't mean to blame Janne for this -- not at all. It merely sheds
>> light on the difficulty of keeping references when the identifiers are
>> page names. When the page name changes, we have to jump through lots
>> of hoops to keep the references intact. You can blame ME for that. It
>> reminded me of the old saying "programmers will create the most
>> complex things they can debug".
>>
>> In thinking this through a bit more, I thought it might be better if
>> we stored references as UUIDs. This means (for example) that renaming
>> is simple -- all we really need to just change the page text, rather
>> than the reference "pointers". So, that's what I've been experimenting
>> with. It seems to work really, really well, and the code is simpler.
>>
>> The only odd case we have to deal with is when we're referring to a
>> page that hasn't been created yet. In that case, what I've chosen to
>> do is create dummy pages in a separate JCR branch (part of the
>> /jspwiki/wiki:references node tree). Then, when pages are added in
>> ContentManager, we check FIRST to see if that page is in the
>> "not-created" tree. If it is, we MOVE it to the pages tree and then
>> save as normal. Deletions work in reverse: if the page has any inbound
>> references, we move it back to the "not created" tree to ensure that
>> references from live pages stay intact; otherwise we zorch the page as
>> normal.
>>
>> The "not created" page tree, by the way, is also an example of
>> something I'm calling a "page foundry" -- a place where future pages
>> are born but not yet moved into production. I can imagine other
>> foundries -- for example, a per-user foundry for drafts. Maybe
>> "nursery" is a better metaphor, but you get the idea.
>>
>> Thoughts? The code isn't quite ready, but it is progressing nicely. We
>> might as well fix it before the 3.0 release, right?
>>
>> Andrew
>>
>

Re: ReferenceManager rewrite: seeking a little discussion

Posted by Harry Metske <ha...@gmail.com>.
Although I cannot foresee all the consequences of this switch, I
generally like the idea of using internal numbers/names/labels (UUIDs in our
case) instead of external (visible) names for referencing purposes.
The issue is very similar to SQL tables, where it is much more convenient
to use identity columns as unique keys instead of "name columns", especially
when using foreign keys.
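
To make the identity-column analogy concrete, here is a small in-memory sketch in which links are stored by internal id and the visible title is looked up only for display. LinkGraph and its methods are hypothetical names for this example; this is not how ReferenceManager is implemented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical link store keyed by internal ids, like foreign keys on an identity column. */
final class LinkGraph {
    private final Map<String, String> titleById = new HashMap<>();     // id -> visible title
    private final Map<String, Set<String>> refersTo = new HashMap<>(); // source id -> target ids

    void addPage(String id, String title) {
        titleById.put(id, title);
    }

    void link(String fromId, String toId) {
        refersTo.computeIfAbsent(fromId, k -> new HashSet<>()).add(toId);
    }

    /** A rename changes one entry; no stored link has to be rewritten. */
    void rename(String id, String newTitle) {
        titleById.put(id, newTitle);
    }

    /** Resolves the outgoing links of a page to their current titles. */
    Set<String> linkedTitles(String fromId) {
        Set<String> titles = new TreeSet<>();
        for (String id : refersTo.getOrDefault(fromId, Collections.emptySet())) {
            String title = titleById.get(id);
            if (title != null) { // skip ids that have no known page
                titles.add(title);
            }
        }
        return titles;
    }
}
```

If the links were keyed by page name instead, the rename would have to find and rewrite every stored link that mentions the old name.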

I think this issue does not directly relate to external page names (as
Florian discussed).

Assuming that UUIDs are used only internally and do not
show up anywhere in the public API:
+1 from me, at least something to "commit and try"


regards,
Harry

2010/3/14 Andrew Jaquith <an...@gmail.com>

> All (and especially Janne) --
>
> In digging into some of the remaining bugs clustered in
> PageRenamerTest, I was forced to confront what I'd coded up during the
> last re-write of ReferenceManager. Lots of the PageRenamerTests are
> still broken. The problem with page-renaming relates, I suspect, to a
> checkin Janne did previously that sought to handle case-sensitivity
> filesystem issues. To put it simply, the relationship between the wiki
> path (as stored in the JCRWikiPage ATTR_TITLE attribute) and the
> filesystem wasn't being dealt with gracefully by ReferenceManager. The
> bug was too complex to track down, and I didn't have the patience and
> time to do it.
>
> I don't mean to blame Janne for this -- not at all. It merely sheds
> light on the difficulty of keeping references when the identifiers are
> page names. When the page name changes, we have to jump through lots
> of hoops to keep the references intact. You can blame ME for that. It
> reminded me of the old saying "programmers will create the most
> complex things they can debug".
>
> In thinking this through a bit more, I thought it might be better if
> we stored references as UUIDs. This means (for example) that renaming
> is simple -- all we really need to just change the page text, rather
> than the reference "pointers". So, that's what I've been experimenting
> with. It seems to work really, really well, and the code is simpler.
>
> The only odd case we have to deal with is when we're referring to a
> page that hasn't been created yet. In that case, what I've chosen to
> do is create dummy pages in a separate JCR branch (part of the
> /jspwiki/wiki:references node tree). Then, when pages are added in
> ContentManager, we check FIRST to see if that page is in the
> "not-created" tree. If it is, we MOVE it to the pages tree and then
> save as normal. Deletions work in reverse: if the page has any inbound
> references, we move it back to the "not created" tree to ensure that
> references from live pages stay intact; otherwise we zorch the page as
> normal.
>
> The "not created" page tree, by the way, is also an example of
> something I'm calling a "page foundry" -- a place where future pages
> are born but not yet moved into production. I can imagine other
> foundries -- for example, a per-user foundry for drafts. Maybe
> "nursery" is a better metaphor, but you get the idea.
>
> Thoughts? The code isn't quite ready, but it is progressing nicely. We
> might as well fix it before the 3.0 release, right?
>
> Andrew
>