Posted to dev@cocoon.apache.org by Bertrand Delacretaz <bd...@apache.org> on 2003/10/11 11:36:08 UTC

[RT] Moving towards a new documentation system (was: [RT] Updating the website)

On Saturday, 11 Oct 2003, at 04:21 Europe/Zurich, David Crossley wrote:

> Tony Collen wrote:
>> ...We might need to get away from the "developer" vs "user" notion, 
>> because depending on how much about
>> Cocoon you already know, you might have to hack out a new generator 
>> (which would seem to imply
>> information in the developer section) while you are really a user.
>
> +1 ... we have talked about that many times. Almost every
> user is a developer. Anyway, "Trails" are a better navigation method.

+1

I'm starting to think (and I think this resonates with what Tony was 
saying) that the physical structure of the docs should be flat, 
wiki-style, having all docs "files" (real files or generated) in a 
single directory, or a very few directories like "reference", "documents" 
and maybe "technotes".

We can then build all kinds of navigational structures, trails, 
multiple tables of contents, beginners/advanced, whatever (again 
picking up on the wiki idea of a flat page structure with many navigation 
paths), but the path to a given document stays valid forever unless 
documents are removed.

Of course we forfeit compatibility with our existing docs URLs, but I 
think this is needed anyway to move forward.

This might also make our remodeling easier:

-move all existing docs to a small number of directories like above, 
"big bag of docs"
-rename docs as needed to give them permanent names
-create a very simple publishing system for now (Forrest probably?), 
until the new docs system moves forward
-start building the navigations, trails, tables of contents 
incrementally
-if the docs format changes for the new doc management system, 
navigation definitions stay valid

I think we need to find a way to get started with this docs remodeling 
without having to wait too long on our improved doc management system - 
if an incremental path like the above works, it might help us get started.

Thoughts?

-Bertrand


P.S. We need to find a name for this new doc management system - I'm 
low on ideas now, but maybe CDMS? Cocoon Documentation Management 
System?


Re: [RT] Moving towards a new documentation system

Posted by Tony Collen <co...@umn.edu>.
Bertrand Delacretaz wrote:

<snip/>

> I'm starting to think (and I think this resonates with what Tony was 
> saying) that the physical structure of the docs should be flat, 
> wiki-style, having all docs "files" (real files or generated) in a 
> single directory, or a very few directories like "reference", "documents" 
> and maybe "technotes".
> 
> We can then build all kinds of navigational structures, trails, multiple 
> tables of contents, beginners/advanced, whatever (again picking up on 
> wiki idea of a flat page structure with many navigation paths), but the 
> path to a given document stays valid forever unless documents are removed.

Yes, I agree.  The docs should all be in one place -- at least in the 
repository.  It will make things easier for us to find, and we won't 
necessarily have the problem of trying to categorize new docs, or move 
existing docs into a more appropriate section.  More on this later...

> Of course we forfeit compatibility with our existing docs URLs, but I 
> think this is needed anyway to move forward.
> 
> This might also make our remodeling easier:
> 
> -move all existing docs to a small number of directories like above, 
> "big bag of docs"

Agreed.

> -rename docs as needed to give them permanent names

Also agreed.  I like the idea of giving the docs a permanent name.  This 
should be investigated thoroughly, though.  Remember that we can create 
whatever URL space we want, and have it totally decoupled from the 
physical layout of the docs in the repository.

> -create a very simple publishing system for now (Forrest probably?), 
> until the new docs system moves forward

> -start building the navigations, trails, tables of contents incrementally

Agreed.  The way I think of it is: if we have our so-called "big bag of 
docs", we can then create virtual categories, which are collections of 
docs in a specific order.  This will allow for any type of "virtual" 
docs layout -- be it trails, straight-up reference, TOC, index, etc.

The problem then would be tying these virtual categories into the 
sitemap automatically somehow.

This way we could create trails, or references, for just about anything. 
(Does this duplicate any Forrest functionality?)

Here's how we could glue together a trail for the Flow:

<category name="teachMeFlowscriptTrail" title="Everything you need to 
know to use the Flow">
     <doc id="123-flow-intro.xml"/>
     <doc id="456-flow-basic.xml"/>
     <doc id="789-flow-advanced.xml"/>
     <doc id="012-flow-woody.xml"/>
     <doc id="345-flow-fom-ref.xml"/>
     <doc id="678-flow-jxt-ref.xml"/>
</category>

Likewise, we could glue together a reference for all of the generators:

<category name="generatorsReference" title="Generators Reference Page">
     <doc id="343-generator-intro.xml"/>
     <doc id="982-file-generator.xml"/>
     <doc id="356-html-generator.xml"/>
     <doc id="234-wsproxy-generator.xml"/>
     <!-- and so on, in order (alphabetical) -->
</category>

Obviously we'd want to abstract the actual filenames, preferably to 
keep the numbers out of the actual URL a doc is accessed through.  Then 
again, if we have a good naming convention, we might not need numbers 
on the files at all.
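
Just to sketch it (the URL pattern and stylesheet name here are made 
up), the sitemap could map a clean URL space onto the numbered files:

<map:match pattern="docs/flow-intro">
     <map:generate src="docs/123-flow-intro.xml"/>
     <map:transform src="stylesheets/doc2html.xsl"/>
     <map:serialize type="html"/>
</map:match>

That way the published URL space lives entirely in the sitemap, and we 
could renumber or rename the physical files without breaking a single 
link.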

This could even let us keep the "old" URLs for the fubared docs layout: 
all we'd have to do is create categories that reflect the current 
fubared layout ;)

Implementation is left as an exercise for the reader....

Regards,

Tony


Re: [RT] Moving towards a new documentation system

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Bertrand Delacretaz wrote:
> 
> On Saturday, 11 Oct 2003, at 14:25 Europe/Zurich, Nicola Ken Barozzi wrote:
> 
>> Bertrand Delacretaz wrote:
>>
>>> ...We can then build all kinds of navigational structures, trails, 
>>> multiple tables of contents, beginners/advanced, whatever (again 
>>> picking up on wiki idea of a flat page structure with many navigation 
>>> paths), but the path to a given document stays valid forever unless 
>>> documents are removed.
>>
>> Forrest's site.xml is ready to adapt to the needs.
> 
> ok. I'd prefer multiple "navigation definition" files though, one for 
> each "navigation concern" (tracks, beginner/advanced, 
> functionality-based, etc).
> Is this possible with Forrest, or what do you suggest?

site.xml is a file that keeps all the site navigational structure in one 
place. The upside is also that you can link to pages using a 
site:nameofnode syntax, so that changing pages does not break links as 
long as you update site.xml.

In there we also have a resources section, with entries that are not 
part of the navigational structure but are easy to use for site: page 
linking.

My opinion is that it would be swell to add a "navigations" section that 
contains navigational concerns, each differently named. This was also 
proposed some time back, BTW; it just needs someone to do it.
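
A sketch of what I mean (the <navigations> section and its child 
elements are invented here, since none of this exists yet):

<site label="Cocoon">
  <docs label="Documentation">
    <flow-intro label="Introducing Flow" href="flow/intro.html"/>
    <flow-advanced label="Advanced Flow" href="flow/advanced.html"/>
  </docs>
  <navigations>
    <beginner label="Beginner trail">
      <step href="site:flow-intro"/>
      <step href="site:flow-advanced"/>
    </beginner>
  </navigations>
</site>

The normal nodes keep defining the URI contract, and each navigation 
concern just reorders site: references.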

>> ..Eh, why "Forrest /probably/ "?
> 
Only because I haven't been following it lately and don't know many 
details about where it is and where it is going.

It's there just to be what you want the documentation to be :-)

> Nothing against Forrest!

Pfew! ;-)

>>> ...
>>> -if the docs format changes for the new doc management system, 
>>> navigation definitions stay valid.
>>
>> There was a discussion on Forrest about making all links be done to 
>> files without extensions, and now we use site.xml to reference these 
>> links.
>>
>> The only thing that is still lacking is making the output remain 
>> "static" over time.
> 
> Not sure if I understand this, can you explain?

I mean that in the docs I can refer to a page via the site: system. If I 
keep the site.xml links up to date, all my links in all my pages are correct.

But this is on *source*. Forrest translates the links to the correct 
location on disk, and that location can change over time. Hence doc 
writers have no issues, but published links break.

What we need is to cater for these changes and add a "history" system 
that tracks changes to the site.xml nodes, which are our defined URI 
contract.
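
As a sketch (element names invented), such a history file could record, 
for each site.xml node, where the page used to be published:

<history>
  <node id="flow-intro">
    <was href="flow/intro.html" until="2003-10-01"/>
    <is href="docs/flow-intro.html"/>
  </node>
</history>

From the <was> entries the publishing system could then generate 
redirects automatically, so old bookmarks keep working.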

BTW, Forrest has a simple revision system, take a look in Forrest.

>> ...What I wanted to do is to have Forrest generate an index of pages 
>> and the users add this to CVS. With this index we have all the doc 
>> history, and Forrest can generate redirects if urls change. I also 
> want to generate redirects for filenames without URLs and add a 
> unique id to every page in the index, so that Forrest can add barcodes 
>> to the pages.
> 
> Sounds good.

It's a bit lonely out there. It would be much better if Cocoon 
developers came over to Forrest to discuss these things.

>> ...Please don't forget Forrest.
> 
> Certainly not!

:-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------



Re: [RT] Improved navigation of learning objects

Posted by Alan Gutierrez <al...@izzy.net>.
* Stefano Mazzocchi <st...@apache.org> [2003-10-12 16:04]:

> On Sunday, Oct 12, 2003, at 16:13 Europe/Rome, Alan Gutierrez wrote:
> 
> >The trouble with Wiki and docs is that new users, such as myself,
> >are going to look for a documentation outline. A good TOC and index
> >makes all the difference in the world when searching documentation.
> 
> eheh, right on.
> 
> >Has anyone discussed how to impose an outline on a Wiki?
> 
> yes. there are some proposals on the table, ranging from simple to 
> futuristic.

Did you just bang this out? Thanks for responding to my question so
thoroughly.

>                                 - o -
> 
> the simple one is a manually created single-dereferencing linkmap.
> 
> Imagine that you have a repository with the following learning objects:
> 
>  /1
>  /2
>  /3
>  /4
> 
> which are edited and created individually. Then you have a linkmap that 
> basically says
> 
>  Trail "A"
>    /1
>   Section "Whatever"
>     /3
>     /4
>   Section "Something else"
>     /2
> 
>  Trail "B"
>    /4
>    /1
> 
> Trails are like "books", and they might share LOs. Trails might be 
> published as a single PDF file for easier offline review.
> 
> Trails can be used as "tabs" in the forrest view, while the rest is the 
> navbar on the side.
> 
> the LO identifier (http://cocoon.apache.org/LO/4) can be translated to 
> a real locator (http://cocoon.apache.org/cocoon/2.1/A/introduction) and 
> all the links rewritten accordingly.
> 
> This link translation is a mechanical lookup, based on the linkmap 
> information.
> 
> [note: this is getting closer to what topic maps are about! see 
> topicmaps.org, even if, IMO, it wouldn't make sense to use topic maps 
> for this because they are much more complex]

An outline is a classic tool for organizing a document. This is the
aspect of content management that I find interesting: creating a
mechanism for organization by an information architect. Searching is
well and good, but I've learned more from the "Cocoon Developer's
Handbook" in two hours than I've learned from the Cocoon Wiki in a
week.

Make the tedious task of link map creation simpler and you will
have an exciting solution for myriad business problems.

Creating the link map would be easier if the learning object were a
prescribed object, rather than a found object, the way things tend
to be in the semantic web. Gathering up all the micro-topics of a Wiki
has got to be akin to herding cats. You might connect a two-sentence
learning object to the topic map, only to have it change meaning as
people contribute to the object.

To my mind it is best to have a go-between.  A documentation Wiki
really ought to be documentation tasks that are parceled out to the
community, so there is a ready context for a contributor's submission:
a topic map, or outline, that specifies topics.  Sections are given
over to a technical writer, who acts as a maintainer. Wiki
contributions are attached to a topic in the outline, and the maintainer
pulls relevant discussion up into the topic, editing it for
consistent style.

This is how good open source documentation is created anyway.


>                                     - o -
> 
> The approach above works but it requires two operations:
> 
>  1) creation of the LO
>  2) connection of the LO in the linkmap

Or those operations could be inverted and iterative.

>                                     - o -

> In the more futuristic approach, each learning object specifies two 
> things:
> 
>  1) what it expects the reader to know
>  2) what it expects to give to the reader
> 
> Suppose that you have a learning object library of
> 
>                   /1 -(teaches)-> [a]
>                   /2 -(teaches)-> [a]
>  [a] <-(expects)- /3 -(teaches)-> [b]
>  [a] <-(expects)- /4 -(teaches)-> [c]

This to me is a parallel course to an outline. In many of the
cookbook-style software texts I read, there is a "see also" section in
each recipe. The above would not replace an outline, but its
absence would show a neglect of the medium of hypertext.

>                               - o -
> 
> If each editor comes up with his own identifiers, we might have two 
> issues:
> 
>  1) identifiers are so precise that there is no match in between 
> documents written by different people
> 
>  2) identifiers are so broad in spectrum that concepts overlap and 
> dependencies blur
> 
> So we must find a way to come up with a system that helps us in the 
> discovery, creation and correctness maintenance of a taxonomy, but 
> without removing or overlapping human judgment.

Discovery, creation and correctness maintenance. I'm repeating this
because they sound like the overarching goals of content management.
I like this because it implies that a content management system aids
decision making, rather than making decision making obsolete.

>                               - o -
> 
> Here is a scenario of use (let's focus on text so, LO = page)
> 
>  a) you edit a page.
> 
>  b) you highlight the words in the page that you believe are important 
> to identify the knowledge distilled in the page that is transmitted to 
> the reader.
> 
>  c) you also highlight the words in the page that you believe identify 
> the cognitive context that is required by the user to understand this 
> page
> 
>  [this can be done very easily in the WYSIWYG editor, using different 
> colors]
> 
>  d) when you are done, you submit the page into the system.
> 
>  e) the page gets processed against the existing repository.
> 
>  f) the system suggests the topics that might be closest to your 
> page, and you select the ones you think fit best, or you 
> introduce a new one in case nothing fits.
> 
> point e) seems critical in the overall smartness of the system, but I 
> don't think it is, since semantic estimation is done by humans at point 
> f); point e) just has to be smart enough to remove all the pages that 
> have nothing at all to do with the current learning object.
> 
> Possible implementations are:
> 
>  - document vector space euclidean distance (used by most search 
> engines, including Google and Lucene)
>  - latent semantic distance (this one is patented, but the patent will 
> expire in a few years; used for spam filtering by Mail.app in 
> MacOS X, and by the Microsoft Office help system and the assistant).

Democracy is another useful algorithm. (Although I'm giddy thinking
about the use of Google/Lucene not only to extract but to inject
content into a Web.) Moderation is one approach.

Prior to a) there is the contributor's interaction with the system
that prompted the submission. The submission is not entirely
without context, so perhaps it is enough to inject the submission at
the point where the author got on board, just as I am injecting my
content into this thread here, and let the community move the
content around. That is, provide community tools for moderation and
classification. Heck, invite the community to highlight what they
feel are keywords as they read the document, and feed that to Lucene.

>                                - o -

> The above model paints a double-dereference hypertext, sort of 
> "polymorphic" hypertext where links are made to "abstract concepts" 
> that are dereferenced against resources implicitly.

Help. Double-dereferenced? Polymorphic? Maybe I don't understand the
meaning of the word dereference in this context.

> In fact, navigation between learning objects can be:
> 
>  1) hand written
>  2) inferred from the contextual dependencies
>  3) inferred from the usage patterns

For 3), keep in mind that one can trace a visitor's path through the
documentation. These proximities can be fed to a search engine,
perhaps with feedback from the visitor: a link that says "I found
it!" and/or a "Did you find this useful?" menu. (This is all being
done somewhere, I am sure.)

> I believe that a system that is able to implement all of the above 
> would prove to be a new kind of hypertext, much closer to the original 
> Xanadu vision that Ted Nelson outlined in the '60s when he coined the 
> term hypertext.

Lofty!

> I don't think we have to jump all the way up here, there are smaller 
> steps that we can do to improve what we have, but this is where I want 
> to go.

I am very interested in creating the tools to assist in the creation
of the outline. I'd like to look at the problem from the perspective
of a librarian (which I am not) or an editor. 

I am interested in managing the content of a single project, rather
than the content of an enormous organization, or the
entire web. I want a document management system that gears itself
towards the publication of a document, with TOC, index, and chapters.
Collective word processing?

-- 
Alan Gutierrez - alan@agtrz.com - C:504.301.8807 - O:504.948.9237

Re: [RT] Improved navigation of learning objects

Posted by Stefano Mazzocchi <st...@apache.org>.
On Monday, Oct 13, 2003, at 07:48 Europe/Rome, Andreas Hochsteger wrote:

> Hi Stefano!
>
> Stefano Mazzocchi wrote:
> > The approach above works but it requires two operations:
> >
> >  1) creation of the LO
> >  2) connection of the LO in the linkmap
>
> Can you explain to me why you are always talking about learning 
> objects in this context?
>
> Isn't a learning object a bit too specific to be used as a general  
> term?
>
> Here's a list of documents which might be published on the Cocoon  
> website with a classification of whether LO is suitable or not:
> * HOWTOs (LO fits)
> * FAQs (LO fits)
> * Tutorials (LO fits)
> * Guides (LO fits)
> * References (LO fits?)
>   - Cocoon Component Reference
>   - Cocoon URI Reference
>   - Cocoon XML Schema Reference
> * News (LO doesn't fit)
> * Status information (LO doesn't fit)
>   - Changes, Todo, Planning notes, ...
> * Release notes (LO doesn't fit)
> * Event reports (LO doesn't fit)
> * Links (LO doesn't fit)
> * ... many more where LO would or wouldn't fit
>
> As far as I see it a document would still be the more generic term and  
> a LO a certain subclass of documents (HOWTOs, FAQs, Tutorials, Guides,  
> ...), where the user can really learn something.
>
> What do you think?

It's a matter of terminology, and communicating with an intermediate  
natural language doesn't help, but I think that "learning" is the act  
of increasing (or modifying!) the state of your cognitive abilities  
(aka knowledge and the ability to interoperate with your environment,  
physical and mental).

definitions of "learning object" are:

http://labs.google.com/glossary?q=learning+object&btnG=Google+Glossary+Search

I personally believe that e-learning (as much as web services) is just  
a marketing term to refer to something that we've been doing since the  
web was started.

anyway, it's a philosophical detail: I think everything that we have in  
our documentation is a learning object, but if you don't, it's not a  
problem and we can get back to calling them "pages".

It would feel strange, though, to have the "video" of Sylvain's  
presentation about Woody called a "page".

We could call them "resources" (just like the URI RFC does), but then  
we would have to distinguish between resources created for human  
consumption and resources created for machine consumption.

My definition of a learning object is: a particular kind of web  
resource created for human consumption that can be referenced directly  
or indirectly thru the cognitive contexts that it exposes.

I don't think anybody would agree with me on that. It's just my  
personal view of the matter (some people have a more restricted  
definition of what a learning object is) and I'm fine with any other  
terminology, as long as the functionality of indirect referencing is  
taken into account.

--
Stefano.


Re: [RT] Improved navigation of learning objects

Posted by Andreas Hochsteger <e9...@student.tuwien.ac.at>.
Hi Stefano!

Stefano Mazzocchi wrote:
 > The approach above works but it requires two operations:
 >
 >  1) creation of the LO
 >  2) connection of the LO in the linkmap

Can you explain to me why you are always talking about learning objects 
in this context?

Isn't a learning object a bit too specific to be used as a general term?

Here's a list of documents which might be published on the Cocoon 
website, with a classification of whether LO is suitable or not:
* HOWTOs (LO fits)
* FAQs (LO fits)
* Tutorials (LO fits)
* Guides (LO fits)
* References (LO fits?)
   - Cocoon Component Reference
   - Cocoon URI Reference
   - Cocoon XML Schema Reference
* News (LO doesn't fit)
* Status information (LO doesn't fit)
   - Changes, Todo, Planning notes, ...
* Release notes (LO doesn't fit)
* Event reports (LO doesn't fit)
* Links (LO doesn't fit)
* ... many more where LO would or wouldn't fit

As far as I see it, a document would still be the more generic term, and a 
LO a certain subclass of documents (HOWTOs, FAQs, Tutorials, Guides, 
...), where the user can really learn something.

What do you think?

> Stefano.

Bye,
	Andreas


Re: [RT] Improved navigation of learning objects

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:

>
> On Sunday, Oct 12, 2003, at 16:13 Europe/Rome, Alan Gutierrez wrote:
>
>> The trouble with Wiki and docs is that new users, such as myself,
>> are going to look for a documentation outline. A good TOC and index
>> makes all the difference in the world when searching documentation.
>
>
> eheh, right on.
>
>> Has anyone discussed how to impose an outline on a Wiki?
>
>
> yes. there are some proposals on the table, ranging from simple to 
> futuristic.
>
>                                 - o -
>
> the simple one is a manually created single-dereferencing linkmap.
>
> Imagine that you have a repository with the following learning objects:
>
>  /1
>  /2
>  /3
>  /4
>
> which are edited and created individually. Then you have a linkmap 
> that basically says
>
>  Trail "A"
>    /1
>   Section "Whatever"
>     /3
>     /4
>   Section "Something else"
>     /2
>
>  Trail "B"
>    /4
>    /1
>
> Trails are like "books", and they might share LOs. Trails might be 
> published as a single PDF file for easier offline review.
>
> Trails can be used as "tabs" in the forrest view, while the rest is 
> the navbar on the side.
>
> the LO identifier (http://cocoon.apache.org/LO/4) can be translated to 
> a real locator (http://cocoon.apache.org/cocoon/2.1/A/introduction) 
> and all the links rewritten accordingly.


I like being able to have meaningful URLs as this is something that I 
can remember (yeah, my bookmarks file is in my head). On the other hand, 
this also means that a single LO will have several URLs depending on the 
considered trail.

What would be good is to have, in the page displaying a LO, the list of 
the trails where this LO appears (i.e. its other URLs). This would 
allow the reader to explore other trails related to a given subject, and 
hence help to achieve the cognitive dissonance required to engrave the 
LO in the reader's mind.
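
As a sketch (markup invented), the pipeline rendering a LO could emit 
something like:

<trails-for lo="4">
  <trail name="A" href="/cocoon/2.1/A/introduction"/>
  <trail name="B" href="/cocoon/2.1/B/introduction"/>
</trails-for>

which a stylesheet could turn into a "this page also appears in..." box.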

<snip what="super great RT"/>

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com



Re: [RT] Improved navigation of learning objects

Posted by Stefano Mazzocchi <st...@apache.org>.
On Monday, Oct 13, 2003, at 15:12 Europe/Rome, Bertrand Delacretaz 
wrote:

> On Monday, 13 Oct 2003, at 14:57 Europe/Zurich, Stefano Mazzocchi wrote:
>
>>> ...Some form of topic map would be useful to build "what's related" 
>>> info though, which helps navigation and discovery a lot.
>>
>> Yes, but such a topic map would have to be human edited. This is 
>> what scares me. Tools for ontology creation (like Protege, 
>> http://protege.stanford.edu/ for example) are available but are 
>> *incredibly* complex to use.
>
> I was thinking of something simple, I think even just "broader term 
> (BT)" / "narrower term (NT)" relationships (like in a thesaurus) would 
> help find a lot of related stuff, conceptually something like:
>
>   cocoon
>    NT: sitemap
>   sitemap
>    NT: matcher
>    NT: sitemap configuration
>   matcher
>    NT: matcher pattern
>
> This would be an XML file from which a Generator could build a 
> bidirectional tree of narrower/broader terms. Clearly not a full-blown 
> topic map, but lightweight and easy to improve incrementally, we could 
> start with a limited set of terms and already have something useful 
> once learning objects are connected (by numerical IDs if we want this 
> to be solid) to relevant terms.
>
> But this is something that needs to be tested on a prototype to see 
> how useful/easy it is.

sounds good to me.

--
Stefano.


Re: [RT] Improved navigation of learning objects

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Monday, 13 Oct 2003, at 14:57 Europe/Zurich, Stefano Mazzocchi wrote:

>> ...Some form of topic map would be useful to build "what's related" 
>> info though, which helps navigation and discovery a lot.
>
> Yes, but such a topic map would have to be human edited. This is what 
> scares me. Tools for ontology creation (like Protege, 
> http://protege.stanford.edu/ for example) are available but are 
> *incredibly* complex to use.

I was thinking of something simple, I think even just "broader term 
(BT)" / "narrower term (NT)" relationships (like in a thesaurus) would 
help find a lot of related stuff, conceptually something like:

   cocoon
    NT: sitemap
   sitemap
    NT: matcher
    NT: sitemap configuration
   matcher
    NT: matcher pattern

This would be an XML file from which a Generator could build a 
bidirectional tree of narrower/broader terms. Clearly not a full-blown 
topic map, but lightweight and easy to improve incrementally; we could 
start with a limited set of terms and already have something useful 
once learning objects are connected (by numerical IDs if we want this 
to be solid) to relevant terms.
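
A sketch of what that file could look like (element names invented):

<thesaurus>
  <term id="cocoon">
    <nt ref="sitemap"/>
  </term>
  <term id="sitemap">
    <nt ref="matcher"/>
    <nt ref="sitemap-configuration"/>
  </term>
  <term id="matcher">
    <nt ref="matcher-pattern"/>
  </term>
</thesaurus>

Only the NT links would need to be written by hand; the Generator could 
infer the BT direction by inverting them.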

But this is something that needs to be tested on a prototype to see how 
useful/easy it is.

-Bertrand


Re: [RT] Improved navigation of learning objects

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Stefano Mazzocchi wrote:
...
> "automatic harvesting" scares the crap out of me, Conal.

We will probably be moving the Forrest DTD to XHTML2 as one of the next 
things to do. As you know, there are <meta> tags that are a nice way of 
adding additional info to a page.

http://academ.hvcc.edu/~kantopet/xhtml/index.php?page=xhtml+meta+content
"
Types of meta-information include:

     * The contents and topics of the document.
     * The relation of this document to other documents.
     * The type of document. In other words, is it a text document or
        an image, etc.
     * Information to assist users in navigating the content.
     * Information to assist programs in accessing or processing
       the content.
"

It's also possible to make a page declare which pages are next and 
previous, and some browsers like Mozilla or Firebird are able to 
navigate that way too.
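
For instance (page names made up), a page in a trail could carry in its 
<head>:

<link rel="prev" href="flow-basic.html" title="Flow basics"/>
<link rel="next" href="flow-advanced.html" title="Advanced Flow"/>

rel="prev" and rel="next" are standard HTML link types, and Mozilla's 
Site Navigation Bar picks them up directly.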

 > I agree that there must be some kind of automatism going on, but the
 > topic creation is a human task and programs would do a terrible job at
 > doing this.

It's still humans editing them, but the information can be scattered in 
the documents themselves.

> but anyway, we decided to do a first step with handwritten linkmaps. we 
> can move incrementally from there on.

What I see is that metadata in the docs cannot and should not totally 
replace a centralized, well-written site.xml, but it can nicely complement it.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------



Re: [RT] Improved navigation of learning objects

Posted by Stefano Mazzocchi <st...@apache.org>.
On Tuesday, Oct 14, 2003, at 20:42 Europe/Rome, Conal Tuohy wrote:

> Stefano wrote:
>
>> "automatic harvesting" scares the crap out of me, Conal.
>
> This is conceptually no different to harvesting JavaDoc tags from Java
> source.

good point.

>> I agree that there must be some kind of automatism going on, but the
>> topic creation is a human task and programs would do a
>> terrible job at
>> doing this.
>
> The example I gave assumed precisely that a human editor had written a
> namespace topic; the harvester was simply linking a document (which
> mentioned that namespace) to that existing topic. So this is automatic
> creation of associations or links, rather than topics.

yes, but this is simply spreading the issue of topic creation all over 
the place; you are not making it any easier (IMHO)

> But topics can also be safely created automatically in some cases: 
> where
> good structured metadata exists we can confidently base topics on it. 
> e.g.
> topics can usefully be automatically harvested from Java classes that
> implement particular interfaces (generators, transformers, etc).

True, but again, I don't see the point. I'm sure that if we make the 
editing interface to our doc system easy enough, people will find it 
much easier to just make a list of components and update it as we go 
(especially since there are not so many).

>> but anyway, we decided to do a first step with handwritten
>> linkmaps. we
>> can move incrementally from there on.
>
> Yes that's true.
>
> What I particularly like about TM is that they invert the usual 
> relationship
> of resources to metadata - in a TM the topics are central and the 
> resources
> are attached to them. So the key activity is to identify the high-level
> topics (the ontology) and then build a harvester to link your 
> resources to
> the topics in the ontology. This linking can be done by recognising 
> patterns
> in the resources (e.g. a reference to a namespace), or, better, by
> recognising explicit metadata (e.g. JavaDoc).

Very true.

--
Stefano.


RE: [RT] Improved navigation of learning objects

Posted by Conal Tuohy <co...@paradise.net.nz>.
Stefano wrote:

> "automatic harvesting" scares the crap out of me, Conal.

This is conceptually no different to harvesting JavaDoc tags from Java
source.

> I agree that there must be some kind of automatism going on, but the
> topic creation is a human task and programs would do a
> terrible job at
> doing this.

The example I gave assumed precisely that a human editor had written a
namespace topic; the harvester was simply linking a document (which
mentioned that namespace) to that existing topic. So this is automatic
creation of associations or links, rather than topics.

But topics can also be safely created automatically in some cases: where
good structured metadata exists we can confidently base topics on it. e.g.
topics can usefully be automatically harvested from Java classes that
implement particular interfaces (generators, transformers, etc).

> but anyway, we decided to do a first step with handwritten
> linkmaps. we
> can move incrementally from there on.

Yes that's true.

What I particularly like about TM is that they invert the usual relationship
of resources to metadata - in a TM the topics are central and the resources
are attached to them. So the key activity is to identify the high-level
topics (the ontology) and then build a harvester to link your resources to
the topics in the ontology. This linking can be done by recognising patterns
in the resources (e.g. a reference to a namespace), or, better, by
recognising explicit metadata (e.g. JavaDoc).

Cheers

Con


Re: [RT] Improved navigation of learning objects

Posted by Stefano Mazzocchi <st...@apache.org>.
On Monday, Oct 13, 2003, at 22:57 Europe/Rome, Conal Tuohy wrote:

> Bertrand Delacretaz wrote:
>
>>> Some form of topic map would be useful to build "what's
>> related" info
>>> though, which helps navigation and discovery a lot.
>
> Stefano Mazzocchi wrote:
>
>> Yes, but such a topic map would have to be human edited.
>> This is what
>> scares me. Tools for ontology creation (like Protege,
>> http://protege.stanford.edu/ for example) are available but are
>> *incredibly* complex to use.
>
> Certainly some, but not all of this topic map needs to be 
> human-edited. I
> think a basic ontology could be written by hand in a text editor and 
> still
> be large enough to be usable. NB there are other syntaxes for authoring 
> topic
> maps which are simpler than XTM, too.
>
> To be useful, a hand-written ontology would only need to cover some of 
> the
> core concepts such as "namespace", "component", "block", "howto",
> "generator", "transformer". The bulk of the topics and relationships 
> are
> implicit in the docs, and could be automatically harvested into XTM 
> with a
> bunch of XML stylesheets, and linked to the underlying ontology.
>
> For instance, references to java classes, XML namespaces, etc could be
> automatically harvested from xdocs:
>
> <xsl:template
> 	match="source[contains(.,'http://apache.org/cocoon/request/2.0')]"
> 	mode="harvest-namespace-topics">
> 	<!--
> 	This document contains a reference to the request namespace,
> 	so it can be harvested as an occurrence of the namespace's topic
> 	-->
> 	<xtm:topic>
> 		<!-- topic for the request namespace -->
> 		<xtm:subjectIdentity>
> 			<xtm:subjectIndicatorRef
> 				xlink:href="http://apache.org/cocoon/request/2.0"/>
> 		</xtm:subjectIdentity>
> 		<!-- this occurrence -->
> 		<xtm:occurrence>
> 			<xtm:resourceRef xlink:href="{$current-page-url}"/>
> 		</xtm:occurrence>
> 	</xtm:topic>
> </xsl:template>
>
> <xsl:template match="source[contains(.,'org.apache.cocoon.')]">
> 	<!-- reference to a cocoon class: -->
> 	<!-- harvest the class-name and link this resource to the topic 
> reifying
> that class -->
> 	etc.
> </xsl:template>
>
>
> A topic map layer could be harvested not just from the docs, but also 
> from
> the Wiki, the javadoc, the cvs, etc, etc, and the resulting topics 
> merged to
> reveal the relationships which are currently produced by hand. Then 
> you can
> define a website FROM the topic map. Also, of course, you can use 
> other TM
> tools such as the tm-visualiser "tmnav" which displays the topic map 
> using
> the TouchGraph component.

"automatic harvesting" scares the crap out of me, Conal.

I agree that there must be some kind of automatism going on, but the 
topic creation is a human task and programs would do a terrible job at 
doing this.

but anyway, we decided to do a first step with handwritten linkmaps. we 
can move incrementally from there on.

--
Stefano.


RE: [RT] Improved navigation of learning objects

Posted by Conal Tuohy <co...@paradise.net.nz>.
Bertrand Delacretaz wrote:

> > Some form of topic map would be useful to build "what's
> related" info
> > though, which helps navigation and discovery a lot.

Stefano Mazzocchi wrote:

> Yes, but such a topic map would have to be human edited.
> This is what
> scares me. Tools for ontology creation (like Protege,
> http://protege.stanford.edu/ for example) are available but are
> *incredibly* complex to use.

Certainly some, but not all of this topic map needs to be human-edited. I
think a basic ontology could be written by hand in a text editor and still
be large enough to be usable. NB there are other syntaxes for authoring topic
maps which are simpler than XTM, too.

To be useful, a hand-written ontology would only need to cover some of the
core concepts such as "namespace", "component", "block", "howto",
"generator", "transformer". The bulk of the topics and relationships are
implicit in the docs, and could be automatically harvested into XTM with a
bunch of XML stylesheets, and linked to the underlying ontology.
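
For example, a hand-written topic for the "generator" concept could be as
small as this (XTM 1.0 syntax; the ids are invented):

<xtm:topic id="generator">
	<xtm:instanceOf>
		<xtm:topicRef xlink:href="#component"/>
	</xtm:instanceOf>
	<xtm:baseName>
		<xtm:baseNameString>Generator</xtm:baseNameString>
	</xtm:baseName>
</xtm:topic>

A few dozen of these would already give the harvested occurrences
something solid to hang off.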

For instance, references to java classes, XML namespaces, etc could be
automatically harvested from xdocs:

<xsl:template
	match="source[contains(.,'http://apache.org/cocoon/request/2.0')]"
	mode="harvest-namespace-topics">
	<!--
	This document contains a reference to the request namespace,
	so it can be harvested as an occurrence of the namespace's topic
	-->
	<xtm:topic>
		<!-- topic for the request namespace -->
		<xtm:subjectIdentity>
			<xtm:subjectIndicatorRef
				xlink:href="http://apache.org/cocoon/request/2.0"/>
		</xtm:subjectIdentity>
		<!-- this occurrence -->
		<xtm:occurrence>
			<xtm:resourceRef xlink:href="{$current-page-url}"/>
		</xtm:occurrence>
	</xtm:topic>
</xsl:template>

<xsl:template match="source[contains(.,'org.apache.cocoon.')]">
	<!-- reference to a cocoon class: -->
	<!-- harvest the class-name and link this resource to the topic reifying
that class -->
	etc.
</xsl:template>


A topic map layer could be harvested not just from the docs, but also from
the Wiki, the javadoc, the cvs, etc, etc, and the resulting topics merged to
reveal the relationships which are currently produced by hand. Then you can
define a website FROM the topic map. Also, of course, you can use other TM
tools such as the tm-visualiser "tmnav" which displays the topic map using
the TouchGraph component.

Cheers

Con





Re: [RT] Improved navigation of learning objects

Posted by Stefano Mazzocchi <st...@apache.org>.
On Monday, Oct 13, 2003, at 09:36 Europe/Rome, Bertrand Delacretaz 
wrote:

> On Sunday, 12 Oct 2003, at 17:55 Europe/Zurich, Stefano Mazzocchi wrote:
>> On Sunday, Oct 12, 2003, at 16:13 Europe/Rome, Alan Gutierrez wrote:
>>> ....Has anyone discussed how to impose an outline on a Wiki?
>>
>> yes. there are some proposals on the table, ranging from simple to 
>> futuristic....
>
> I think we should concentrate on the simple one for now, maybe by 
> creating an experimental block to work on a prototype?
> We could copy the existing docs there, assign unique IDs to them and 
> start playing with trails.
> Having permanent IDs leaves room for the futuristic stuff to emerge.

exactly, create a foundation solid enough for us to build upon, but 
without forcing the futuristic stuff to slow us down.

>> ...[note: this is getting closer to what topic maps are about! see 
>> topicmaps.org, even if, IMO, it wouldn't make sense to use topic maps 
>> for this because they are much more complex]
>
> Some form of topic map would be useful to build "what's related" info 
> though, which helps navigation and discovery a lot.

Yes, but such a topic map would have to be human edited. This is what 
scares me. Tools for ontology creation (like Protege, 
http://protege.stanford.edu/ for example) are available but are 
*incredibly* complex to use.

> <snip very-interesting-futuristic-stuff>
> Thanks for sharing this!

my pleasure.

I'm going to do a lot of research on this in the future and I would like 
to use the cocoon community as a "beta test" of the community dynamics 
on top of this. it might take a while, but I'm a patient guy ;-)

--
Stefano.


Re: [RT] Improved navigation of learning objects

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Sunday, 12 Oct 2003, at 17:55 Europe/Zurich, Stefano Mazzocchi wrote:
> On Sunday, Oct 12, 2003, at 16:13 Europe/Rome, Alan Gutierrez wrote:
>> ....Has anyone discussed how to impose an outline on a Wiki?
>
> yes. there are some proposals on the table, ranging from simple to 
> futuristic....

I think we should concentrate on the simple one for now, maybe by 
creating an experimental block to work on a prototype?
We could copy the existing docs there, assign unique IDs to them and 
start playing with trails.
Having permanent IDs leaves room for the futuristic stuff to emerge.

> ...[note: this is getting closer to what topic maps are about! see 
> topicmaps.org, even if, IMO, it wouldn't make sense to use topic maps 
> for this because they are much more complex]

Some form of topic map would be useful to build "what's related" info 
though, which helps navigation and discovery a lot.

<snip very-interesting-futuristic-stuff>
Thanks for sharing this!

-Bertrand


fyi: agora

Posted by Robert Koberg <ro...@koberg.com>.
Not just for social engineering anymore :)

http://www.agora.de/eng/index.html




[RT] Improved navigation of learning objects

Posted by Stefano Mazzocchi <st...@apache.org>.
On Sunday, Oct 12, 2003, at 16:13 Europe/Rome, Alan Gutierrez wrote:

> The trouble with Wiki and docs is that new users, such as myself,
> are going to look for a documentation outline. A good TOC and index
> makes all the difference in the world when searching documentation.

eheh, right on.

> Has anyone discussed how to impose an outline on a Wiki?

yes. there are some proposals on the table, ranging from simple to 
futuristic.

                                 - o -

the simple one is a manually created single-dereferencing linkmap.

Imagine that you have a repository with the following learning objects:

  /1
  /2
  /3
  /4

which are edited and created individually. Then you have a linkmap that 
basically says

  Trail "A"
    /1
   Section "Whatever"
     /3
     /4
   Section "Somethign else"
     /2

  Trail "B"
    /4
    /1

Trails are like "books", and they might share LOs. Trails might be 
published as a single PDF file for easier offline review.

Trails can be used as "tabs" in the forrest view, while the rest is the 
navbar on the side.

the LO identifier (http://cocoon.apache.org/LO/4) can be translated to 
a real locator (http://cocoon.apache.org/cocoon/2.1/A/introduction) and 
all the links rewritten accordingly.

This link translation is a mechanical lookup, based on the linkmap 
information.
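
As a sketch (element names invented), the linkmap itself could be a 
simple XML file:

<linkmap>
  <trail id="A">
    <lo ref="/1"/>
    <section title="Whatever">
      <lo ref="/3"/>
      <lo ref="/4"/>
    </section>
    <section title="Something else">
      <lo ref="/2"/>
    </section>
  </trail>
  <trail id="B">
    <lo ref="/4"/>
    <lo ref="/1"/>
  </trail>
</linkmap>

resolving http://cocoon.apache.org/LO/4 then just means finding its <lo> 
nodes and computing the trail locators from their positions.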

[note: this is getting closer to what topic maps are about! see 
topicmaps.org, even if, IMO, it wouldn't make sense to use topic maps 
for this because they are much more complex]

                                     - o -

The approach above works but it requires two operations:

  1) creation of the LO
  2) connection of the LO in the linkmap

It is entirely possible to have a page that lists all the LOs, but it 
would be pretty useless. It is also possible to find a LO by 
searching... this is normally how wikis are accessed, but it would be 
better to find a way to have LOs "self connecting".

                                     - o -

In the more futuristic approach, each learning object specifies two 
things:

  1) what it expects the reader to know
  2) what it expects to give to the reader

the above reflects a mechanical model of the theory of cognition that 
indicates that information is transformed into knowledge by projection 
onto the existing cognitive context.

Suppose that you have a learning object library of

                   /1 -(teaches)-> [a]
                   /2 -(teaches)-> [a]
  [a] <-(expects)- /3 -(teaches)-> [b]
  [a] <-(expects)- /4 -(teaches)-> [c]

now you add a new learning object indicating

  [c] <-(expects)- /5 -(teaches)-> [b]

and you can infer that

  [a] <-(expects)- /4+/5 -(teaches)-> [b]

therefore

  /3 and /4+/5

are cognitively related

                               - o -

The above is nice and good, but we hit the big wall: who defines the 
taxonomy, and how?

The taxonomy is the collection of all those identifiers (like [a]) that 
identify abstract concepts.

If each editor comes up with his own identifiers, we might have two 
issues:

  1) identifiers are so precise that there is no match in between 
documents written by different people

  2) identifiers are so broad in spectrum that concepts overlap and 
dependencies blur

This is, IMHO, the biggest problem that the semantic web has to face. 
The RDF/RDFSchema/OWL stack is very nice, a piece of art once you get 
it (and I still don't, but the fog is starting to clear up a little), 
but it's all based on taxonomical or ontological contracts... which 
are, IMO, the most painful and expensive things to create and maintain.

So we must find a way to come up with a system that helps us in the 
discovery, creation and correctness maintenance of a taxonomy, but 
without removing or overlapping human judgment.

                               - o -

Here is a scenario of use (let's focus on text so, LO = page)

  a) you edit a page.

  b) you highlight the words in the page that you believe are important 
to identify the knowledge distilled in the page that is transmitted to 
the reader.

  c) you also highlight the words in the page that you believe identify 
the cognitive context that is required by the user to understand this 
page

  [this can be done very easily in the WYSIWYG editor, using different 
colors]

  d) when you are done, you submit the page into the system.

  e) the page gets processed against the existing repository.

  f) the system suggests the topics that might be closest to your 
page, and you select the ones you think fit best, or you introduce a 
new one in case nothing fits.

point e) seems critical in the overall smartness of the system, but I 
don't think it is, since semantic estimation is done by humans at point 
f); point e) just has to be smart enough to remove all the pages that 
have nothing at all to do with the current learning object.

Possible implementations are:

  - document vector space euclidean distance (used by most search 
engines, including Google and Lucene)
  - latent semantic distance (this one is patented, but the patent will 
expire in a few years; used for spam filtering by Mail.app in 
MacOS X, and by the Microsoft Office help system and the assistant).
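
(for the record, the vector space idea is roughly: represent each page 
as a vector of term weights w_{t,d}, e.g. tf-idf, and measure

  dist(d_1, d_2) = \sqrt{ \sum_t (w_{t,d_1} - w_{t,d_2})^2 }

so point e) just has to keep the pages whose vectors are close to the 
new page's vector)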

                                - o -

The above model paints a double-dereference hypertext, a sort of 
"polymorphic" hypertext where links are made to "abstract concepts" 
that are dereferenced against resources implicitly.

This allows the structure of the system and the navigation between LOs 
to grow independently, using information from the various contextual 
dependencies and from analysis of how the learning objects are used by 
the users.

In fact, navigation between learning objects can be:

  1) hand written
  2) inferred from the contextual dependencies
  3) inferred from the usage patterns

I believe that a system that is able to implement all of the above 
would prove to be a new kind of hypertext, much closer to the original 
Xanadu vision that Ted Nelson outlined in the '60s when he coined the 
term hypertext.

I don't think we have to jump all the way up here, there are smaller 
steps that we can do to improve what we have, but this is where I want 
to go.

--
Stefano.


RE: Forrest and the NDS (Re: [RT] Moving towards a new documentation system)

Posted by Reinhard Poetz <re...@apache.org>.
From: Alan Gutierrez

> * Reinhard Poetz <re...@apache.org> [2003-10-12 14:06]:
> 
> > Stefano envisions a system where it is very easy to edit
> content using
> > a WYSIWYG editor (no structured text, no strange XML) for
> *everybody*
> > by simply clicking on an 'edit' button - without having
> installed any
> > unknown software - and make the changes right at the moment when you 
> > spot e.g. a spelling mistake. A Cocoon committer has to confirm the 
> > changes or additions and then the new stuff is published 
> > (mini-workflow).
> > 
> > After a non-committer has provided e.g. 3 substantial
> patches to our
> > documentation or new documents he can get the status of a doc 
> > committer and can publish directly without the confirmation of a 
> > Cocoon CVS committer.
> > 
> > With this simple system we can use the power of Wiki (everybody 
> > becomes a doc author) and combine this with our requirement that we
> > *must* have a backend repository where we store our content
> in order
> > to be able to re-publish our website at any time.
> 
> The trouble with Wiki and docs is that new users, such as myself,

Thank you! Feedback from new users is very much appreciated!

> are going to look for a documentation outline. A good
> TOC and index makes all the difference in the world when
> searching documentation. Has anyone discussed how to impose 
> an outline on a Wiki?

In order to navigate through the many documents (Stefano calls them
'learning objects') in the flat structure which a Wiki-like system
provides, there will be special documents which are tracks through the
forest of documents (e.g. "How to set up Cocoon", "Write your first
sitemap component", ...).

Reinhard


Re: Forrest and the NDS (Re: [RT] Moving towards a new documentation system)

Posted by Alan Gutierrez <al...@agtrz.com>.
* Reinhard Poetz <re...@apache.org> [2003-10-12 14:06]:

> Stefano envisions a system where it is very easy to edit content using a
> WYSIWYG editor (no structured text, no strange XML) for *everybody* by
> simply clicking on an 'edit' button - without having installed any
> unknown software - and make the changes right at the moment when you spot
> e.g. a spelling mistake. A Cocoon committer has to confirm the changes
> or additions and then the new stuff is published (mini-workflow).
> 
> After a non-committer has provided e.g. 3 substantial patches to our
> documentation or new documents he can get the status of a doc committer
> and can publish directly without the confirmation of a Cocoon CVS
> committer.
> 
> With this simple system we can use the power of Wiki (everybody becomes
> a doc author) and combine this with our requirement that we *must* have
> a backend repository where we store our content in order to be able to
> re-publish our website at any time.

The trouble with Wiki and docs is that new users, such as myself,
are going to look for a documentation outline. A good TOC and index
makes all the difference in the world when searching documentation.
Has anyone discussed how to impose an outline on a Wiki?

-- 
Alan Gutierrez - alan@agtrz.com - C:504.301.8807 - O:504.948.9237

Re: Forrest and the NDS (Re: [RT] Moving towards a new documentation system)

Posted by Alan Gutierrez <al...@agtrz.com>.
* Jeff Turner <je...@apache.org> [2003-10-12 12:27]:

> On Sun, Oct 12, 2003 at 12:40:39PM +0200, Stefano Mazzocchi wrote:
> ...
> > In case something is not clear, I'll be very happy to explain it.
> 
> My main question is: to users, how is this system functionally different
> from a Wiki?  Eg, a good one that has 'related pages' inferred from the
> page's content:
> 
> http://wiki.opensymphony.com/space/WebWork+2+Components
> 
> And a naive question: wouldn't Subversion be a better backend than this
> JSR170 thing?  Then developers could store xdocs in the same repository
> as code, and could check in updates with standard svn tools (including
> IDEs).

It looks like JSR 170 is an API specification, so one could create an
implementation for Subversion. It is important that an implementation
for Subversion is actually available, however. Also, if this is to be
a Cocoon application, wouldn't it be _nice to have_ a content management
system that creates XML deltas of XML data?

-- 
Alan Gutierrez - alan@agtrz.com - C:504.301.8807 - O:504.948.9237

Re: Forrest and the NDS

Posted by Stefano Mazzocchi <st...@apache.org>.
On Sunday, Oct 12, 2003, at 14:31 Europe/Rome, Jeff Turner wrote:

> On Sun, Oct 12, 2003 at 12:40:39PM +0200, Stefano Mazzocchi wrote:
> ...
>> In case something is not clear, I'll be very happy to explain it.
>
> My main question is: to users, how is this system functionally 
> different
> from a Wiki?

for editing, you will get a three-pane editor:

  - wysiwyg
  - xhtml source
  - wiki source

and you can switch between one and another, depending on what you like 
the most (or what type of content you have to cut/paste from outside 
the editor)

> Eg, a good one that has 'related pages' inferred from the
> page's content:
>
> http://wiki.opensymphony.com/space/WebWork+2+Components

Yes, SnipSnap. I had some private email conversations with the guys at 
Fraunhofer; at some point they wanted to use Cocoon for it (IIRC) but 
they decided it was too complex for their needs.

The ideas are very similar, in fact.

What I dislike is the weblog-ish appearance... from a navigational 
perspective, it's too hyperlinked for my own taste; a documentation 
system should help me go thru it, it should guide me, not give me tons 
of potential choices and make me choose all the time.

both wikis and weblogs tend to lack a navigation system. it's basically 
what allowed them to grow exponentially. but we need to find a way to 
distill some structure and guide our readers... without limiting 
the expressive power of an unstructured collection of notes.

> And a naive question: wouldn't Subversion be a better backend than this
> JSR170 thing?  Then developers could store xdocs in the same repository
> as code, and could check in updates with standard svn tools (including
> IDEs).

the goal is to have the backend so friendly that you'd never use the 
repository directly anyway.

but as I said, a webdav repository is the best choice for now (and 
subversion is a good choice for that); we can move to JSR 170 should 
the need emerge (I think it will, but I can't predict when)

--
Stefano.


RE: Forrest and the NDS (Re: [RT] Moving towards a new documentation system)

Posted by Reinhard Poetz <re...@apache.org>.
From: Jeff Turner

> On Sun, Oct 12, 2003 at 12:40:39PM +0200, Stefano Mazzocchi wrote: ...
> > In case something is not clear, I'll be very happy to explain it.
> 
> My main question is: to users, how is this system 
> functionally different from a Wiki?  Eg, a good one that has 
> 'related pages' inferred from the page's content:
> 
> http://wiki.opensymphony.com/space/WebWork+2+Components
> 
> And a naive question: wouldn't Subversion be a better backend 
> than this JSR170 thing?  

JSR 170 is the draft to come up with *one* API for all repositories. This
makes sense from a technical point of view, but it will also be the first
time that we all have a common understanding of what a repository is (where
does it start? where does it end?).

Back to your question: IMHO Subversion would be a good choice to be the
backend but behind a JSR170-API layer. (see
http://www.apache.org/dist/cocoon/events/gt2003/presentations/17-cocoon-
webdav.pdf)

> Then developers could store xdocs in 
> the same repository as code, and could check in updates with 
> standard svn tools (including IDEs).

The Wiki lowered the barrier to becoming an editor and contributing to a
project, and people love to use it. Also, many committers (including myself)
prefer writing a new text on the Wiki to editing XML for our
'official' docu. 

Stefano envisions a system where it is very easy to edit content using a
WYSIWYG editor (no structured text, no strange XML) for *everybody* by
simply clicking on an 'edit' button - without having to install any
unknown software - and to make changes right at the moment when you spot
e.g. a spelling mistake. A Cocoon committer has to confirm the changes
or additions, and then the new stuff is published (mini-workflow).

After a non-committer has provided e.g. 3 substantial patches to our
documentation or new documents he can get the status of a doc committer
and can publish directly without the confirmation of a Cocoon CVS
committer.

With this simple system we can use the power of the Wiki (everybody becomes
a doc author) and combine it with our requirement that we *must* have
a backend repository where we store our content, in order to be able to
re-publish our website at any time.
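
Just to make the idea concrete, such a mini-workflow definition could
look something like this (a completely made-up XML format, nothing like
it exists yet - only a sketch of the states and transitions described
above):

<workflow>
  <state name="draft"/>
  <state name="pending"/>
  <state name="published"/>
  <!-- anybody may submit a change for review -->
  <transition from="draft" to="pending" event="submit" role="anyone"/>
  <!-- a committer confirms and the change gets published -->
  <transition from="pending" to="published" event="approve" role="committer"/>
  <!-- committers and doc committers may publish directly -->
  <transition from="draft" to="published" event="publish" role="committer doc-committer"/>
</workflow>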

Reinhard


Re: Forrest and the NDS (Re: [RT] Moving towards a new documentation system)

Posted by Jeff Turner <je...@apache.org>.
On Sun, Oct 12, 2003 at 12:40:39PM +0200, Stefano Mazzocchi wrote:
...
> In case something is not clear, I'll be very happy to explain it.

My main question is: to users, how is this system functionally different
from a Wiki?  Eg, a good one that has 'related pages' inferred from the
page's content:

http://wiki.opensymphony.com/space/WebWork+2+Components

And a naive question: wouldn't Subversion be a better backend than this
JSR170 thing?  Then developers could store xdocs in the same repository
as code, and could check in updates with standard svn tools (including
IDEs).


Thanks,

--Jeff

> --
> Stefano.
> 

Re: Forrest and the NDS (Re: [RT] Moving towards a new documentation system)

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Sunday, 12 Oct 2003, at 12:40 Europe/Zurich, Stefano Mazzocchi wrote:

>
> On Sunday, Oct 12, 2003, at 04:23 Europe/Rome, Jeff Turner wrote:
>> ...
>> :)  Let's remember that there's hard reuse and soft reuse.  Hard reuse
>> means physically integrating with Forrest/Lenya.  Soft reuse means
>> reusing ideas, people, and code where appropriate.  For a new project 
>> in
>> RT phase, worrying about hard reuse with Forrest and Lenya would IMHO
>> just slow things down.
>
> I agree: try to come up with a system that runs first and *after* try 
> to merge the results and possible changes back in the original 
> systems. This reduces the potential community friction when changes 
> require some little paradigm shifts.

How about creating an "nds" (or "learningtrove") block and working more or 
less as I outlined in the original message:

-copy all existing docs to a single directory, "big bag of docs", in 
this new block
-rename docs as needed to give them permanent names (=numeric IDs)
-create a very simple publishing system for now (see the sitemap sketch 
after this list)
-start building the navigations, trails, tables of contents 
incrementally
-if the docs format changes for the new doc management system, 
navigation definitions stay valid
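
A first cut of the "very simple publishing system" could be little more 
than a sitemap fragment along these lines (all paths and stylesheet 
names are invented, this is only meant to show how small it can be):

<map:match pattern="docs/*.html">
  <!-- look the document up by its permanent name in the big bag of docs -->
  <map:generate src="content/docs/{1}.xml"/>
  <!-- one simple stylesheet for everything, for now -->
  <map:transform src="stylesheets/doc2html.xsl"/>
  <map:serialize type="html"/>
</map:match>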

I like the idea of temporarily forgetting about Forrest and Lenya: a 
simple prototype with minimal constraints would help this move forward, 
and we can later backport the good ideas where they belong.

-Bertrand

Re: Forrest and the NDS (Re: [RT] Moving towards a new documentation system)

Posted by Stefano Mazzocchi <st...@apache.org>.
On Sunday, Oct 12, 2003, at 04:23 Europe/Rome, Jeff Turner wrote:

> On Sat, Oct 11, 2003 at 03:17:48PM +0200, Stefano Mazzocchi wrote:
>>
>> On Saturday, Oct 11, 2003, at 14:25 Europe/Rome, Nicola Ken Barozzi
>> wrote:
>>
>>> Please don't forget Forrest.
>>
>> we are not.
>
> :)  Let's remember that there's hard reuse and soft reuse.  Hard reuse
> means physically integrating with Forrest/Lenya.  Soft reuse means
> reusing ideas, people, and code where appropriate.  For a new project 
> in
> RT phase, worrying about hard reuse with Forrest and Lenya would IMHO
> just slow things down.

I agree: try to come up with a system that runs first, and *afterwards* try 
to merge the results and possible changes back into the original systems. 
This reduces the potential community friction when changes require some 
small paradigm shifts.

> Forrest started with a RT, a mailing list, and a bunch of Cocooners
> willing to contribute.  I wasn't at the GetTogether.  I've never seen a
> RT articulating the vision for this project.  There's just tantalising
> snippets.  Could we perhaps dream up a name ('linotype' seems obvious),
> start a mailing list, and have some RTs on what this is all about? :)

That's what we are trying to do with this thread, Jeff: get a name, a 
collection of random thoughts, and get going. I don't think we need a 
different list for now.

The things that were discussed at the hackathon were simply a 
higher-bandwidth version of this conversation. And Bertrand started it 
over in order for everybody to know what was talked about and to have 
the ability to jump in and help.

In case something is not clear, I'll be very happy to explain it.

>> I thought about it and I totally resonate with Bertrand: we need to
>> outline an incremental transition to our own CMS reusing as much dog
>> food as possible (which is also good for a community perspective).
>>
>> Here is my proposal:
>>
>>  1) the system gets divided into three parts
>>
>>      - frontend -> mapped to http://cocoon.apache.org -> static
>>      - repository -> a WebDAV server
>>      - backend -> mapped to http://edit.cocoon.apache.org -> dynamic
>
> The mozilla.org site does something similar, in that the 'edit this 
> page'
> link goes to:
>
> http://doctor.mozilla.org/?file=mozilla-org//html/index.html
>
> I've long thought this would be a great Forrest enhancement.

exactly. Zope does similar things as well.

It's been 3 years that I've wanted to implement this in cocoon 
(frontend/repository/backend architecture)... the incubation of lenya 
was instrumental for this. also the creation of linotype 
(experimenting with serious editing, repository separation and 
numerical URIs).

the wiki forced me to realize it's now time to move on, and the hackathon 
gave me a way to express myself clearly enough that others were 
tickled by it.

>> The future idea is to use indirect linking where lookup is a sort of
>> "what's related" understood out of the analysis of user patterns, but
>> this is far ahead in the future.
>>
>> For now, I think direct linking would be enough for our needs... we
>> just need a good "lookup and discovery" of learning objects integrated
>> in the backend.
>>
>>                                   - o -
>>
>> As the implementation
>>
>>  1) forrest will be used to generate the site from the contents of the
>> repository
>>
>>  2) the repository can be either plain vanilla subversion or a webdav
>> server implemented by cocoon on top of another repository (either
>> subversion or catacomb or JSR170). even CVS, but we might want to stay
>> away from it.
>>
>>  3) lenya will be used as the backend.
>>
>> Missing things:
>>
>>  1) is forrest already capable of doing what we ask?
>
> Possibly.  Forrest's sitemap is built in layers:
>
> cocoon://**.xml         # Source pipelines generating doc-v12
> cocoon://body-*.html
> cocoon://menu-*.html
> cocoon://tabs-*.html    # Intermediate formats
> cocoon://**.{html,pdf}  # Destination formats
>
> So just switch in a different **.xml subsitemap, and Forrest will build
> the site from whatever backend you choose.

Sounds perfect.

> Forrest's indirect linking system is (IMHO:) pretty cool and Wiki-like,
> in that I can write <a href="site:foo"> from anywhere, and it links to
> the 'foo' page wherever (or whatever) it is.  The source files have a 
> URI
> space all of their own, independent of the final http: URI space.

Very good as well.

> The linking implementation is very flexible, built in input modules.  
> <a
> href="site:index"> causes the 'site' InputModule to be fetched and 
> passed
> the 'index' key.  A SimpleMappingModule converts this to
> '/site//index/@href', which is fed to an XMLModule, which interprets it
> as an XPath into the navigation file, site.xml.  As the XPath prefix 
> and
> suffix are configured in cocoon.xconf, any XML format for the 'linkmap'
> (aka site.xml) can be used.  Lots of gory details at
> http://xml.apache.org/forrest/linking.html

As you can see from my question, I (and possibly many others here) lost 
contact with forrest a while ago: you people simply did the job well 
enough for us to crawl back into our corner as simple users ;-)

but as you suggest, we might need "soft reuse" at this point, so you'll 
hear from us when things don't work (or you are more than welcome to 
join, of course).

>>  2) what's the best repository? where do we install it?
>>
>>  3) is lenya enough for what we need? (Michi says so)
>>
>>  4) how much work is the integration of linotype with lenya? (I'll
>> investigate this soon)
>>
>>  5) how do we get the wiki into the repository? (I plan to write a
>> wiki-syntax editing pane for linotype, would this be enough?)
>
> Could use the Chaperon Wiki grammar to convert to XML, but..
> grammar-based validation results in really undecipherable error 
> messages.
> Might be best to first use a regular Wiki engine as a 'lexer', to get 
> the
> Wiki syntax 'well-formed', and then use a grammar to go from 
> well-formed
> Wiki -> XML, and then use an XML schema to ensure validity.  3-phase
> validation of Wiki syntax.  Could warrant a project all on its own..

yeah

>>  6) how do we get the rest of the data into the repository?
>>
>>  7) how do we make it simple to edit linkmaps? what searching and
>> proximity tools can we provide for this?
>
> Lenya is probably the best hunting-ground for this.

yep, forrest as the frontend engine, lenya as the backend engine, 
cocoon empowering both, documents all safely stored in one repository 
(no more CVS branching for docs!!), easy editing, easy workflow, 
ability to assemble trails of documentation, no need to bug 
infrastructure@ for a dynamic site, mirror friendly...

... gosh, seems like heaven ;-)

--
Stefano.


Forrest and the NDS (Re: [RT] Moving towards a new documentation system)

Posted by Jeff Turner <je...@apache.org>.
On Sat, Oct 11, 2003 at 03:17:48PM +0200, Stefano Mazzocchi wrote:
> 
> On Saturday, Oct 11, 2003, at 14:25 Europe/Rome, Nicola Ken Barozzi 
> wrote:
> 
> >Please don't forget Forrest.
> 
> we are not.

:)  Let's remember that there's hard reuse and soft reuse.  Hard reuse
means physically integrating with Forrest/Lenya.  Soft reuse means
reusing ideas, people, and code where appropriate.  For a new project in
RT phase, worrying about hard reuse with Forrest and Lenya would IMHO
just slow things down.

Forrest started with an RT, a mailing list, and a bunch of Cocooners
willing to contribute.  I wasn't at the GetTogether.  I've never seen an
RT articulating the vision for this project.  There are just tantalising
snippets.  Could we perhaps dream up a name ('linotype' seems obvious),
start a mailing list, and have some RTs on what this is all about? :)

> I thought about it and I totally resonate with Bertrand: we need to 
> outline an incremental transition to our own CMS reusing as much dog 
> food as possible (which is also good for a community perspective).
> 
> Here is my proposal:
> 
>  1) the system gets divided into three parts
> 
>      - frontend -> mapped to http://cocoon.apache.org -> static
>      - repository -> a WebDAV server
>      - backend -> mapped to http://edit.cocoon.apache.org -> dynamic

The mozilla.org site does something similar, in that the 'edit this page'
link goes to:

http://doctor.mozilla.org/?file=mozilla-org//html/index.html

I've long thought this would be a great Forrest enhancement.

[snip cool stuff]
> The future idea is to use indirect linking where lookup is a sort of 
> "what's related" understood out of the analysis of user patterns, but 
> this is far ahead in the future.
> 
> For now, I think direct linking would be enough for our needs... we 
> just need a good "lookup and discovery" of learning objects integrated 
> in the backend.
> 
>                                   - o -
> 
> As the implementation
> 
>  1) forrest will be used to generate the site from the contents of the 
> repository
> 
>  2) the repository can be either plain vanilla subversion or a webdav 
> server implemented by cocoon on top of another repository (either 
> subversion or catacomb or JSR170). even CVS, but we might want to stay 
> away from it.
> 
>  3) lenya will be used as the backend.
> 
> Missing things:
> 
>  1) is forrest already capable of doing what we ask?

Possibly.  Forrest's sitemap is built in layers:

cocoon://**.xml         # Source pipelines generating doc-v12
cocoon://body-*.html
cocoon://menu-*.html
cocoon://tabs-*.html    # Intermediate formats
cocoon://**.{html,pdf}  # Destination formats

So just switch in a different **.xml subsitemap, and Forrest will build
the site from whatever backend you choose.
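
For instance (hypothetical repository URL, just to illustrate), a
swapped-in subsitemap that pulls the sources from a WebDAV repository
over HTTP could be as small as:

<map:match pattern="**.xml">
  <!-- fetch the source from the repository instead of the local files -->
  <map:generate src="http://repository.cocoon.apache.org/docs/{1}.xml"/>
  <map:serialize type="xml"/>
</map:match>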


Forrest's indirect linking system is (IMHO:) pretty cool and Wiki-like,
in that I can write <a href="site:foo"> from anywhere, and it links to
the 'foo' page wherever (or whatever) it is.  The source files have a URI
space all of their own, independent of the final http: URI space.

The linking implementation is very flexible, built in input modules.  <a
href="site:index"> causes the 'site' InputModule to be fetched and passed
the 'index' key.  A SimpleMappingModule converts this to
'/site//index/@href', which is fed to an XMLModule, which interprets it
as an XPath into the navigation file, site.xml.  As the XPath prefix and
suffix are configured in cocoon.xconf, any XML format for the 'linkmap'
(aka site.xml) can be used.  Lots of gory details at
http://xml.apache.org/forrest/linking.html
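
To make that concrete, a minimal site.xml entry and its use would look
roughly like this (names invented for the example):

<!-- site.xml (linkmap): the 'index' node carries the real location -->
<site>
  <community label="Community">
    <index label="Community index" href="community/index.html"/>
  </community>
</site>

<!-- in any source document -->
<a href="site:index">the community index</a>

so '/site//index/@href' resolves to 'community/index.html' no matter
where the source document lives.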

>  2) what's the best repository? where do we install it?
> 
>  3) is lenya enough for what we need? (Michi says so)
> 
>  4) how much work is the integration of linotype with lenya? (I'll 
> investigate this soon)
> 
>  5) how do we get the wiki into the repository? (I plan to write a 
> wiki-syntax editing pane for linotype, would this be enough?)

Could use the Chaperon Wiki grammar to convert to XML, but..
grammar-based validation results in really undecipherable error messages.
Might be best to first use a regular Wiki engine as a 'lexer', to get the
Wiki syntax 'well-formed', and then use a grammar to go from well-formed
Wiki -> XML, and then use an XML schema to ensure validity.  3-phase
validation of Wiki syntax.  Could warrant a project all on its own..
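
As a pipeline sketch (all component names here are invented, it's just
the shape of the three phases):

<map:match pattern="wiki/**.xml">
  <map:generate type="text" src="wiki/{1}.txt"/>
  <!-- phase 1: a wiki engine acting as 'lexer', emitting well-formed wiki -->
  <map:transform type="wiki-lexer"/>
  <!-- phase 2: grammar-based conversion from well-formed wiki to XML -->
  <map:transform type="wiki2xml"/>
  <!-- phase 3: schema validation of the resulting XML -->
  <map:transform type="schema-validator" src="schemas/document-v12.rng"/>
  <map:serialize type="xml"/>
</map:match>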

>  6) how do we get the rest of the data into the repository?
> 
>  7) how do we make it simple to edit linkmaps? what searching and 
> proximity tools can we provide for this?

Lenya is probably the best hunting-ground for this.


--Jeff

> Enough of a braindump for now.
> 
> Fire at will.
> 
> 
> --
> Stefano.
> 

Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Saturday, Oct 11, 2003, at 19:54 Europe/Rome, Nicola Ken Barozzi 
wrote:

> Antonio Gallardo wrote:
> ...
>> Based on lastest posts, I will like to see a fusion Lenya+Forrest. Is 
>> this
>> posible or they are divorced.
>
> Synergy. What I want Forrest and Lenya to seek is syergy.

Agreed 100%. Merging two projects *from the top* is a really bad idea 
in terms of community dynamics.

Besides, the focus of forrest and lenya, while similar, is not 
completely overlapping.

I'm proposing a system where they can be used together; this creates 
synergy and communication, things that improve communities.

if, at the end, the two projects end up merging, well, great, but it has 
to come from them, from down below, from environmental needs. but, 
personally, I would like to keep them separate: one community focuses 
on static generation (and tries to come up with smart ways to make the 
most out of it without requiring dynamism) and the other focuses on 
backend things (editing, workflow, versioning, etc...).

well, that's my personal vision of course and, to be honest, not my 
itch to scratch.

we need a backend for our documentation system, one that doesn't force 
us to lose all the good things we have today.

this is what I want (and would like you as well) to focus on. the rest 
will happen darwinistically, as usual.

--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Antonio Gallardo wrote:
...
> Based on lastest posts, I will like to see a fusion Lenya+Forrest. Is this
> posible or they are divorced.

Synergy. What I want Forrest and Lenya to seek is synergy.

I can tell you, as a Forrest committer, that we *love* to do things like 
making nice DTDs and related outputs. Lenya, on the other hand, is very 
strong on the editing side, and that is where their strength lies.

So from a community perspective, IMO it does not make sense at all to 
merge the communities.

What makes sense, and what I want to pursue as far as I can, is to work 
together towards making users able to use both technologies with the 
same site.

So I'm happy that Stefano came out with this proposal, as it fits 
perfectly with my itches(*)! :-)

(*) have fun *using* Cocoon, add some nice extra components here and 
there, and make other guys do strange things like making editors ;-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------



Re: [RT] Moving towards a new documentation system

Posted by Antonio Gallardo <ag...@agsoftware.dnsalias.com>.
Stefano Mazzocchi dijo:
>
> On Saturday, Oct 11, 2003, at 14:25 Europe/Rome, Nicola Ken Barozzi
> wrote:
>
>> Please don't forget Forrest.
>
> we are not.
>
> I thought about it and I totally resonate with Bertrand: we need to
> outline an incremental transition to our own CMS reusing as much dog
> food as possible (which is also good for a community perspective).
>
> Here is my proposal:
>
>   1) the system gets divided into three parts
>
>       - frontend -> mapped to http://cocoon.apache.org -> static
>       - repository -> a WebDAV server
>       - backend -> mapped to http://edit.cocoon.apache.org -> dynamic
>
>    2) each page is rendered with a wiki-like "edit this page" link, that
> points to the exact same URI, on the backend virtual host.
>
>    3) everybody is able to edit or to enter new pages, but they need to
> be approuved to get published.
>
>    4) a committer can publish directly (thru a login)
>
>    5) each page is considered an atomic learning object (LO) and is
> identified by a numerical URI
>
>    6) the process of creating a learning object (editing) and the
> process of linking some together, are kept independent.

I agree with this point. But this single point is complex... what if
there are many users posting new versions of the same document? How will we
handle this? Partial publications, or like Bugzilla (overwriting the
committed change - if you wish)?

> The above is really important, IMO, because separates concerns between:
>
>     - writers -> those who know about something and want to write about
> it
>     - editors -> those who know what users want to know and assembles
> knowledge for them
>
> The lack of separation between these two types of people is, IMO, what
> is missing from our documentation infrastructure. Note that wikis
> concentrate on the first part and leave the second part up to the
> collective task of content refactoring. I think this is weak and it's
> the worst part of the wiki pattern: we need something better.
>
> The future idea is to use indirect linking where lookup is a sort of
> "what's related" understood out of the analysis of user patterns, but
> this is far ahead in the future.
>
> For now, I think direct linking would be enough for our needs... we
> just need a good "lookup and discovery" of learning objects integrated
> in the backend.
>
>                                    - o -
>
> As the implementation
>
>   1) forrest will be used to generate the site from the contents of the
> repository
>
>   2) the repository can be either plain vanilla subversion or a webdav
> server implemented by cocoon on top of another repository (either
> subversion or catacomb or JSR170). even CVS, but we might want to stay
> away from it.
>
>   3) lenya will be used as the backend.
>
> Missing things:
>
>   1) is forrest already capable of doing what we ask?
>
>   2) what's the best repository? where do we install it?
>
>   3) is lenya enough for what we need? (Michi says so)
>
>   4) how much work is the integration of linotype with lenya? (I'll
> investigate this soon)
>
>   5) how do we get the wiki into the repository? (I plan to write a
> wiki-syntax editing pane for linotype, would this be enough?)
>
>   6) how do we get the rest of the data into the repository?
>
>   7) how do we make it simple to edit linkmaps? what searching and
> proximity tools can we provide for this?
>
> Enough of a braindump for now.
>
> Fire at will.

Based on the latest posts, I would like to see a fusion of Lenya+Forrest. Is
this possible, or are they divorced?

Best Regards,

Antonio Gallardo



Re: [RT] Moving towards a new documentation system

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Saturday, 11 Oct 2003, at 14:25 Europe/Zurich, Nicola Ken Barozzi wrote:
> Bertrand Delacretaz wrote:
>> ...We can then build all kinds of navigational structures, trails, 
>> multiple tables of contents, beginners/advanced, whatever (again 
>> picking up on wiki idea of a flat page structure with many navigation 
>> paths), but the path to a given document stays valid forever unless 
>> documents are removed.
>
> Forrest's site.xml is ready to adapt to the needs.

ok. I'd prefer multiple "navigation definition" files though, one for 
each "navigation concern" (tracks, beginner/advanced, 
functionality-based, etc).
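
For instance, a "tracks" navigation file could look something like this 
(a completely made-up format, just to show the idea of one concern kept 
in its own file):

<!-- tracks.xml: one navigation concern among several -->
<tracks>
  <track id="getting-started" label="Getting started">
    <step ref="installing" label="Installing Cocoon"/>
    <step ref="first-sitemap" label="Writing your first sitemap"/>
  </track>
</tracks>
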
Is this possible with Forrest, or what do you suggest?

> ..Eh, why "Forrest /probably/ "?

Only because I haven't been following it lately and don't know many 
details about where it is and where it is going.
Nothing against Forrest!

>> ...
>> -if the docs format changes for the new doc management system, 
>> navigation definitions stay valid.
>
> There was a discussion on Forrest about making all links be done to 
> files without extensions, and now we use site.xml to reference these 
> links.
>
> The only thing that is still lacking is making the output remain 
> "static" over time.

Not sure if I understand this, can you explain?

> ...What I wanted to do is to have Forrest generate an index of pages 
> and the users add this to CVS. With this index we have all the doc 
> history, and Forrest can generate redirects if urls change. I also 
> want want to generate redirects for filenames without urls and add an 
> unique id to every page in the index, so that Forrest can add barcodes 
> to the pages.

Sounds good.

> ...Please don't forget Forrest.

Certainly not!

-Bertrand

Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Saturday, Oct 11, 2003, at 14:25 Europe/Rome, Nicola Ken Barozzi 
wrote:

> Please don't forget Forrest.

we are not.

I thought about it and I totally resonate with Bertrand: we need to 
outline an incremental transition to our own CMS reusing as much dog 
food as possible (which is also good for a community perspective).

Here is my proposal:

  1) the system gets divided into three parts

      - frontend -> mapped to http://cocoon.apache.org -> static
      - repository -> a WebDAV server
      - backend -> mapped to http://edit.cocoon.apache.org -> dynamic

   2) each page is rendered with a wiki-like "edit this page" link, that 
points to the exact same URI, on the backend virtual host (see the 
sketch after this list).

   3) everybody is able to edit or to enter new pages, but they need to 
be approved to get published.

   4) a committer can publish directly (through a login)

   5) each page is considered an atomic learning object (LO) and is 
identified by a numerical URI

   6) the process of creating a learning object (editing) and the 
process of linking some together, are kept independent.
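
To illustrate point 2, the skin could emit the edit link with a trivial
XSLT fragment like this one (just a sketch, the template and parameter
names are invented):

<!-- adds the wiki-like "edit this page" link; $uri is the page's own URI -->
<xsl:template name="edit-link">
  <xsl:param name="uri"/>
  <a href="http://edit.cocoon.apache.org/{$uri}">edit this page</a>
</xsl:template>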

That last point is really important, IMO, because it separates concerns between:

    - writers -> those who know about something and want to write about 
it
    - editors -> those who know what users want to know and assemble 
knowledge for them

The lack of separation between these two types of people is, IMO, what 
is missing from our documentation infrastructure. Note that wikis 
concentrate on the first part and leave the second part up to the 
collective task of content refactoring. I think this is weak and it's 
the worst part of the wiki pattern: we need something better.

The future idea is to use indirect linking where lookup is a sort of 
"what's related" understood out of the analysis of user patterns, but 
this is far ahead in the future.

For now, I think direct linking would be enough for our needs... we 
just need a good "lookup and discovery" of learning objects integrated 
in the backend.

                                   - o -

As for the implementation:

  1) forrest will be used to generate the site from the contents of the 
repository

  2) the repository can be either plain vanilla subversion or a webdav 
server implemented by cocoon on top of another repository (either 
subversion or catacomb or JSR170). even CVS, but we might want to stay 
away from it.

  3) lenya will be used as the backend.

Missing things:

  1) is forrest already capable of doing what we ask?

  2) what's the best repository? where do we install it?

  3) is lenya enough for what we need? (Michi says so)

  4) how much work is the integration of linotype with lenya? (I'll 
investigate this soon)

  5) how do we get the wiki into the repository? (I plan to write a 
wiki-syntax editing pane for linotype, would this be enough?)

  6) how do we get the rest of the data into the repository?

  7) how do we make it simple to edit linkmaps? what searching and 
proximity tools can we provide for this?

Enough of a braindump for now.

Fire at will.


--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Bertrand Delacretaz wrote:
...
> I'm starting to think (and I think this resonates with what Tony was 
> saying) that the physical structure of the docs should be flat, 
> wiki-style, having all docs "files" (real files or generated) in a 
> single directory, of very few directories like "reference", "documents" 
> and maybe "technotes".

Agreed. Forrest also has a wiki format and can use semantic html. 
Using HTML also has the benefit of making the docs easily viewable 
straight from CVS.

> We can then build all kinds of navigational structures, trails, multiple 
> tables of contents, beginners/advanced, whatever (again picking up on 
> wiki idea of a flat page structure with many navigation paths), but the 
> path to a given document stays valid forever unless documents are removed.

Forrest's site.xml is ready to adapt to the needs.

> Of course we forfeit compatibility with our existing docs URLs, but I 
> think this is needed anyway to move forward.
> 
> This might also make our remodeling easier:
> 
> -move all existing docs to a small number of directories like above, 
> "big bag of docs"
> -rename docs as needed to give them permanent names
> -create a very simple publishing system for now (Forrest probably?), 
> until the new docs system moves forward

Eh, why "Forrest /probably/ "?

> -start building the navigations, trails, tables of contents incrementally
> -if the docs format changes for the new doc management system, 
> navigation definitions stay valid.

There was a discussion on Forrest about making all links be done to 
files without extensions, and now we use site.xml to reference these links.

The only thing that is still lacking is making the output remain 
"static" over time.

What I wanted to do is to have Forrest generate an index of pages and 
have the users add this to CVS. With this index we have all the doc history, 
and Forrest can generate redirects if urls change. I also want to 
generate redirects for filenames without urls and add a unique id to 
every page in the index, so that Forrest can add barcodes to the pages.
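
The generated index might look something like this (an invented format,
just to show the intent):

<!-- pages.xml: generated index of pages, checked into CVS -->
<pages>
  <page id="1042" uri="concepts/pipelines.html"/>
  <page id="1043" uri="howto/howto-multi.html"/>
  <!-- when a uri changes, the old one is kept so a redirect can be generated -->
  <page id="1044" uri="installing/index.html" was="installing.html"/>
</pages>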

> I think we need to find a way to get started with this docs remodeling 
> without having to wait  too long on our improved doc management system - 
> if an incremental path like above works it might help us get started.

Please don't forget Forrest.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------



Re: [RT] Moving towards a new documentation system (was: [RT] Updating the website)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Stefano Mazzocchi wrote:

...
> NOTE: this is *NOT* something that will replace either forrest or lenya. 
> In fact, the idea of this system is to show off *all* the cocoon-related 
> technologies we have in one big showcase for our own use. So, both 
> forrest and lenya should be happy to participate into this because it 
> might give them even more exposure and ideas for new features or simply 
> more itches to scratch that would revamp the various communities.

Forrest has basically decided to focus on the presentational part of 
providing content, so the proposed system is a perfect fit for Forrest.

Lenya on the other hand is much more oriented towards editing. I had 
thought that we could somehow make Forrest use Lenya, but never got to 
it because something seemed wrong.

Having a clear separation of view generation (Forrest) and editing 
(Lenya) seems like a perfect way to reuse the systems, getting the best 
out of them.

I would propose that we seek to make Forrest able to generate the 
site from what you think is necessary. OTOMH that means adding 
navigation concerns to site.xml and an id mechanism for files.

Then we can move the docs to the new layout and have Forrest publish 
this test site regularly.

Then comes the Lenya integration, which has to be able to edit this thing.

Guys, let's move the discussions about what is needed over to 
forrest-dev, because we already had a lot of discussions about similar 
stuff.

Oh, and take a look at site.xml, as it may be used for what is needed.

Maybe I lost some threads here, maybe it's just that I wasn't at the GT, 
but I have the feeling that I don't know what is needed from Forrest for this.

If someone would post on Forrest-dev a list of things that Forrest 
should be able to do, it would be great to get us started.

:-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------



RE: [RT] Moving towards a new documentation system

Posted by Robert Koberg <ro...@koberg.com>.
Morning,

> -----Original Message-----
> From: news [mailto:news@sea.gmane.org] On Behalf Of Nicola Ken Barozzi
> Sent: Saturday, October 11, 2003 2:35 PM
> To: dev@cocoon.apache.org
<snip/>
> > I will try to give you something to chew on. Here is an example page
> (our
> > homepage):
> >
> > <site css="default.css" id="wwwuser" index_page="p1395893912" onnav="0"
> > xsl="default">
> >   <label>Site</label>
> >   <title>liveSTORYBOARD Content Management System: </title>
> >   <page css="inherit" generate="1" id="p1395893912" name="Welcome.html"
> > onnav="1" pgstatus="publish" print_friendly="0" xsl="homepage">
> >     <label>Home</label>
> >     <title>Simple, powerful and secure hosted Web Content
> Management</title>
> >     <regions>
> >       <region name="wide_col">
> >         <item ref="a1095201465"></item>
> >         <item ref="c404932357"></item>
> >       </region>
> >       <region name="narrow1_col">
> >         <item ref="c1109515213"></item>
> >         <item ref="c108879656"></item>
> >       </region>
> >     </regions>
> >   </page>
> > </
> >
> > Note the /site/page/regions/region elements. The region/@name reference
> a
> > page layout identifier like div/@id. The //region/item/@ref references
> an
> > XML document in a content repository.
> >
> > I currently call these in transformation using the document function.
> 
> Hmmm, wait a sec, you are basically using site.xml to drive the site
> generation... the current site.xml would eventually do the same till the
> page tag, but without the things you put in that tag.
> 
> What I would do instead, is to make a welcome.xml page and write inside
> it the xincludes I need... 

OK, I see.

> but wait, you say that the name is the page
> layout identifier... what's a page layout identifier?

The region/@name simply tells the transformation where to stick the content.
For example, in the template below you see that I put those items that are
in the 'narrow1_col' region in an html
div[@id='rightColumn']/div[@class='floater'] and so on. You will also notice
that some boolean tests are done, based on folder and page attributes, to
display certain UI features. 

<xsl:template match="/">
  <html>
    <xsl:call-template name="head"/>
    <body>
      <xsl:call-template name="banner"/>
      <div id="pageBody">
        <div id="rightColumn">
          <xsl:call-template name="nav"/>
          <div class="floater">
            <xsl:apply-templates
              select="$lsb_folder_nodeset/lsb:regions/lsb:region[@name='narrow1_col']/*"
              mode="load_columns"/>
            <xsl:apply-templates
              select="$lsb_page_nodeset/lsb:regions/lsb:region[@name='narrow1_col']/*"
              mode="load_columns"/>
          </div>
        </div>
        <div id="wideColumn">
          <xsl:if test="$lsb_folder_snailtrail">
            <xsl:call-template name="snailtrail"/>
          </xsl:if>
          <xsl:if test="$lsb_folder_pager">
            <xsl:call-template name="pager"/>
          </xsl:if>
          <xsl:apply-templates select="$lsb_page_nodeset/lsb:title"/>
          <xsl:if test="$lsb_page_print_friendly">
            <xsl:call-template name="pf_link"/>
          </xsl:if>
          <xsl:apply-templates
            select="$lsb_folder_nodeset/lsb:regions/lsb:region[@name='wide_col']/*"
            mode="load_columns"/>
          <xsl:apply-templates
            select="$lsb_page_nodeset/lsb:regions/lsb:region[@name='wide_col']/*"
            mode="load_columns"/>
          <xsl:if test="$lsb_folder_pager">
            <xsl:call-template name="pager"/>
          </xsl:if>
        </div>
      </div>
      <xsl:call-template name="footer"/>
    </body>
  </html>
</xsl:template>

> 
> What does the above thing in practice generate as a result?


Well, it (site.xml) does not generate anything. It is used as the main
Source in a transformation. Additional information, like the focus page ID
and its parent ID, is sent to the transformation to tell it what the
focus is. Global variables are set up from this
information (e.g. $lsb_page_nodeset). 

By using the entire site.xml as the Source I can resolve all links
accurately and provide for other UI things for our CMS like a dropdown to
select a page to link to. 

There are two ways to handle it, as I see it:

1) [what I currently do] pass the focus IDs into the transformation as
parameters and use the site.xml as the main Source and use the document
function along with URIResolvers to 'find' the content associated with the
region/item/@ref

2) run the site.xml through an XMLReader, strip out unnecessary stuff,
replace the item/@ref with the actual content piece, use XMLFilters to add
the stuff I am currently passing in with parameters and perform the
transformation. 


> 
> > Recently, I have been playing around with a XincludeFilter that does
> this
> > prior to transformation (I assume this is the route you would take :).
> It is
> > just that it is proving to be slower and more memory intensive for some
> > reason (I am probably doing wrong...).
> 
> IIRC cinclude is more appropriate for server-side includes, and is also
> cacheable (don't remember if they made xinclude cacheable yet).

I still have some learning/investigating to do...

> 
> > Use the site.xml in a ContentHandler or a class that traverses the
> site.xml
> > as a JDOM Element (or whatever) to generate the static pages. Our site
> is
> > approaching 500 pages (most with print friendly versions) and it takes
> 6-8
> > seconds to generate everything.
> 
> Wait a sec, you generate the whole 500-page site with Cocoon (or just
> xslts?) in 8 seconds? I mean all the pages?

Not currently with cocoon. I am investigating using it for the second way I
described above (but, after just reading Bruno's reply, I don't know...).
Currently, I keep the site.xml as a JDOM Element and just run through the
tree, create folders for the folder elements and generate/transform pages
(regular and print friendly) for the page elements.

In addition, it is relatively simple to transform the site.xml to an Ant
build file along with a catalog for content/xsl:include resolution so a site
can be generated offline -- which makes for a good exit strategy for our
clients.

And yes, all pages in usually less than 8 seconds (slower using Ant). In
addition, assets (images and other binaries) are synched. Most of our sites
(we are an ASP CMS) are small (under 100 pages) and they access their
projects relatively infrequently (I think of it like a health club business
model :) so this approach has not been problematic. But, for some reason, we
are getting more clients with monster sized sites and I want to do away with
the JDOM and use a SAX based approach. I have done this on my own, but it
was slower and more memory intensive. Hopefully using cocoon best practices
will give me more of what I am looking for. I don't know...

> 
> Gosh, it's not fair, I have to sleep now and you got my brain thinking! ;-P

Cool :)

Best,
-Rob

> 
> --
> Nicola Ken Barozzi                   nicolaken@apache.org
>              - verba volant, scripta manent -
>     (discussions get forgotten, just code remains)
> ---------------------------------------------------------------------


Re: [RT] Moving towards a new documentation system

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Robert Koberg wrote:
...
<about forrest's site.xml:>

>>The question is: why validate?
> 
> 
> More than a few reasons:
> 
> - ensure unique identifiers with the xs:ID datatype

good point

> - ensure valid id references with xs:IDREF for things like:
> <site id="cocoon.apache.org" index_page=" p1234">
>   <page id="p1234"/>
>   <page id="p1235"/>
>   <folder id="f1234" index_page="p1236">
>     <page id="p1236"/>
>     <page id="p1237"/>
>   </
> </

hmmm, interesting...

> - use schematron or simply an XSLT (with document()) to ensure all internal
> links are valid (<link idref="p1234">a link</link>)
>
> - use simple types to do things like:
> 
>   <xs:simpleType name="file.name.length">
>     <xs:restriction base="xs:NMTOKEN">
>       <xs:maxLength value="255"/>
>     </xs:restriction>
>   </xs:simpleType>

not really needed ATM I think...

> - transform the schema to user interface forms, form elements and validating
> javascript. For example, something like the following could have it's
> pgstatus attribute transformed to a select dropdown and the generate attr to
> a radio button yes/no pair:
> 
>   <xs:complexType name="FileNode">
>     <xs:complexContent>
>       <xs:extension base="lsb:SiteNode">
>         <xs:attribute name="generate" type="xs:boolean" use="required"/>
>         <xs:attributeGroup ref="lsb:name.attr"/>
>         <xs:attribute name="pgstatus" use="required">
>           <xs:simpleType>
>             <xs:restriction base="xs:NMTOKEN">
>               <xs:enumeration value="hold"/>
>               <xs:enumeration value="publish"/>
>               <xs:enumeration value="archive"/>
>               <xs:enumeration value="obsolete"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:attribute>
>       </xs:extension>
>     </xs:complexContent>
>   </xs:complexType>

Interesting, although it's more for future applications.

> Etc, etc...

Ok, it seems that there are very interesting points.

I would add mine: I found site.xml very awkward to edit lately, and had 
a hard time defining the tab attributes in site.xml. I reckon that 
the use of elements as names is partly to blame.

In any case, as I told you, I really will add navigation.xml as a 
site.xml alternative, so that the above can be done with it.

Then we'll see which of the two gets more use.

...
>>Yes, this happens, but with a twist: the links in site.xml get
>>translated to the ones that are in the real content.
>>
>>Hence I'm writing now a sourcemap, that makes it possible to decouple
>>Forrest from the sources, and build a virtual source space.
>>
>>Oh well, there is a lot of work to do, and I've just begun (*)!
>>
>>(*) getting off projects I went to the core, and finally I'm free to
>>think and code again! :-)
> 
> I will try to give you something to chew on. Here is an example page (our
> homepage):
> 
> <site css="default.css" id="wwwuser" index_page="p1395893912" onnav="0"
> xsl="default">
>   <label>Site</label>
>   <title>liveSTORYBOARD Content Management System: </title>
>   <page css="inherit" generate="1" id="p1395893912" name="Welcome.html"
> onnav="1" pgstatus="publish" print_friendly="0" xsl="homepage">
>     <label>Home</label>
>     <title>Simple, powerful and secure hosted Web Content Management</title>
>     <regions>
>       <region name="wide_col">
>         <item ref="a1095201465"></item>
>         <item ref="c404932357"></item>
>       </region>
>       <region name="narrow1_col">
>         <item ref="c1109515213"></item>
>         <item ref="c108879656"></item>
>       </region>
>     </regions>
>   </page>
> </
> 
> Note the /site/page/regions/region elements. The region/@name reference a
> page layout identifier like div/@id. The //region/item/@ref references an
> XML document in a content repository.
> 
> I currently call these in transformation using the document function.

Hmmm, wait a sec, you are basically using site.xml to drive the site 
generation... the current site.xml would eventually do the same till the 
page tag, but without the things you put in that tag.

What I would do instead, is to make a welcome.xml page and write inside 
it the xincludes I need... but wait, you say that the name is the page 
layout identifier... what's a page layout identifier?

What does the above thing in practice generate as a result?

> Recently, I have been playing around with a XincludeFilter that does this
> prior to transformation (I assume this is the route you would take :). It is
> just that it is proving to be slower and more memory intensive for some
> reason (I am probably doing wrong...).

IIRC cinclude is more appropriate for server-side includes, and is also 
cacheable (don't remember if they made xinclude cacheable yet).

> Use the site.xml in a ContentHandler or a class that traverses the site.xml
> as a JDOM Element (or whatever) to generate the static pages. Our site is
> approaching 500 pages (most with print friendly versions) and it takes 6-8
> seconds to generate everything.

Wait a sec, you generate the whole 500-page site with Cocoon (or just 
xslts?) in 8 seconds? I mean all the pages?

Gosh, it's not fair, I have to sleep now and you got my brain thinking! ;-P

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------



RE: [RT] Moving towards a new documentation system

Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2003-10-11 at 20:53, Robert Koberg wrote:
<snip/>
> I currently call these in transformation using the document function.
> Recently, I have been playing around with a XincludeFilter that does this
> prior to transformation (I assume this is the route you would take :). It is
> just that it is proving to be slower and more memory intensive for some
> reason (I am probably doing wrong...).

Assuming you're using XPaths on the included documents, that's quite
normal: the XIncludeTransformer has to build a (heavy) DOM-tree of the
document and pass it to a generic XPathProcessor implementation, by
default based on Xalan, which will need to compile the XPath expression
and create a DOM-to-DTM layer around the document. Xalan on the other
hand (or Saxon for that matter) directly builds its own optimized 'DTM'.
XSLT also allows you to put the document loaded with the document
function in a variable so that multiple XPaths can be evaluated upon it
without reloading the document each time.
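
For example (file and element names invented), that last point looks like:

<!-- load the referenced document once... -->
<xsl:variable name="item" select="document('content/c404932357.xml')"/>
<!-- ...then evaluate as many XPaths against it as needed -->
<xsl:value-of select="$item/content/title"/>
<xsl:apply-templates select="$item/content/body"/>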

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


RE: [RT] Moving towards a new documentation system

Posted by Robert Koberg <ro...@koberg.com>.
Hi,

> -----Original Message-----
> From: news [mailto:news@sea.gmane.org] On Behalf Of Nicola Ken Barozzi
> Sent: Saturday, October 11, 2003 11:01 AM
> To: dev@cocoon.apache.org
> 
> 
> Robert Koberg wrote:
> 
> > First, forrest's site.xml should change the element names to something
> > generic, like:
> >
> > <site label="My Site">
> >   <page id="p34568656" label="Nico Page" url="newnicepage.html"/>
> > </site>
> >
> > So the site.xml can be validated. In its current state a custom schema
> would
> > be required for each site.xml instance -- just doesn't make sense. The
> > element names are currently being used as identifiers. Why not simply
> make
> > them valid IDs?
> 
> The question is: why validate?

More than a few reasons:

- ensure unique identifiers with the xs:ID datatype

- ensure valid id references with xs:IDREF for things like:
<site id="cocoon.apache.org" index_page=" p1234">
  <page id="p1234"/>
  <page id="p1235"/>
  <folder id="f1234" index_page="p1236">
    <page id="p1236"/>
    <page id="p1237"/>
  </
</

- use schematron or simply an XSLT (with document()) to ensure all internal
links are valid (<link idref="p1234">a link</link>)

- use simple types to do things like:

  <xs:simpleType name="file.name.length">
    <xs:restriction base="xs:NMTOKEN">
      <xs:maxLength value="255"/>
    </xs:restriction>
  </xs:simpleType>

- transform the schema to user interface forms, form elements and validating
javascript. For example, something like the following could have its
pgstatus attribute transformed to a select dropdown and the generate attr to
a radio button yes/no pair:

  <xs:complexType name="FileNode">
    <xs:complexContent>
      <xs:extension base="lsb:SiteNode">
        <xs:attribute name="generate" type="xs:boolean" use="required"/>
        <xs:attributeGroup ref="lsb:name.attr"/>
        <xs:attribute name="pgstatus" use="required">
          <xs:simpleType>
            <xs:restriction base="xs:NMTOKEN">
              <xs:enumeration value="hold"/>
              <xs:enumeration value="publish"/>
              <xs:enumeration value="archive"/>
              <xs:enumeration value="obsolete"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

Etc, etc...

> 
> The schema would simply say that there are nodes and leaves. Hey, that's
> xml!
> 
> Ok, I'm partly serious and partly joking, but what I want to do is to
> make a version of site.xml that can be validated. And - hold to your
> seats - I will take Anakia's navigation.xml for this purpose. So I can
> make us render their sites *and* get validation, all in one go :-)
> 
> > Also, much more site/folder/page metadata can be applied to nodes to
> trigger
> > certain things in a transformation.
> 
> Yes, in fact what we would like to do is say if Forrest should treat
> that doc as a resource or as content to render.
> 
> > Next, why wouldn't you recommend using the site.xml as the site
> structure?
> > The site.xml should be a *virtual* representation of the site. This way
> > (with a validatable site.xml) it is easy to build a tool (in javascript)
> > that can manipulate it.
> 
> It's already the navigational structure. The left-havd navigation is
> *all* done from site.xml.
> 
> > The static site gets generated from the site.xml using the site.xml as a
> > main Source xml for a transformation. This way all nav and content links
> can
> > *always* be valid base on the virtual representation.
> 
> Yes, this happens, but with a twist: the links in site.xml get
> translated to the ones that are in the real content.
> 
> Hence I'm writing now a sourcemap, that makes it possible to decouple
> Forrest from the sources, and build a virtual source space.
> 
> Oh well, there is a lot of work to do, and I've just begun (*)!
> 
> (*) getting off projects I went to the core, and finally I'm free to
> think and code again! :-)


I will try to give you something to chew on. Here is an example page (our
homepage):

<site css="default.css" id="wwwuser" index_page="p1395893912" onnav="0"
xsl="default">
  <label>Site</label>
  <title>liveSTORYBOARD Content Management System: </title>
  <page css="inherit" generate="1" id="p1395893912" name="Welcome.html"
onnav="1" pgstatus="publish" print_friendly="0" xsl="homepage">
    <label>Home</label>
    <title>Simple, powerful and secure hosted Web Content Management</title>
    <regions>
      <region name="wide_col">
        <item ref="a1095201465"></item>
        <item ref="c404932357"></item>
      </region>
      <region name="narrow1_col">
        <item ref="c1109515213"></item>
        <item ref="c108879656"></item>
      </region>
    </regions>
  </page>
</site>

Note the /site/page/regions/region elements. The region/@name references a
page layout identifier like div/@id. The //region/item/@ref references an
XML document in a content repository.

I currently call these in transformation using the document function.
Recently, I have been playing around with an XIncludeFilter that does this
prior to transformation (I assume this is the route you would take :). It is
just that it is proving to be slower and more memory intensive for some
reason (I am probably doing something wrong...).

Use the site.xml in a ContentHandler or a class that traverses the site.xml
as a JDOM Element (or whatever) to generate the static pages. Our site is
approaching 500 pages (most with print friendly versions) and it takes 6-8
seconds to generate everything.

Best,
-Rob

> 
> --
> Nicola Ken Barozzi                   nicolaken@apache.org
>              - verba volant, scripta manent -
>     (discussions get forgotten, just code remains)
> ---------------------------------------------------------------------


Re: [RT] Moving towards a new documentation system

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Robert Koberg wrote:

<snipped description of forrest site.xml>
...
> Wasn't this all a conversation from a couple of years ago? 

Yup :-)

> It is good to see opinions change...

Yup, things change. As long as we remain open to change and accept 
things that people need, things get better.

> First, forrest's site.xml should change the element names to something
> generic, like:
> 
> <site label="My Site">
>   <page id="p34568656" label="Nico Page" url="newnicepage.html"/>
> </site>
> 
> So the site.xml can be validated. In its current state a custom schema would
> be required for each site.xml instance -- just doesn't make sense. The
> element names are currently being used as identifiers. Why not simply make
> them valid IDs?

The question is: why validate?

The schema would simply say that there are nodes and leaves. Hey, that's 
xml!

Ok, I'm partly serious and partly joking, but what I want to do is to 
make a version of site.xml that can be validated. And - hold on to your 
seats - I will take Anakia's navigation.xml for this purpose. So I can 
make us render their sites *and* get validation, all in one go :-)

> Also, much more site/folder/page metadata can be applied to nodes to trigger
> certain things in a transformation.

Yes, in fact what we would like to do is say if Forrest should treat 
that doc as a resource or as content to render.

> Next, why wouldn't you recommend using the site.xml as the site structure?
> The site.xml should be a *virtual* representation of the site. This way
> (with a validatable site.xml) it is easy to build a tool (in javascript)
> that can manipulate it. 

It's already the navigational structure. The left-hand navigation is 
*all* done from site.xml.

> The static site gets generated from the site.xml using the site.xml as a
> main Source xml for a transformation. This way all nav and content links can
> *always* be valid base on the virtual representation.

Yes, this happens, but with a twist: the links in site.xml get 
translated to the ones that are in the real content.

Hence I'm now writing a sourcemap, which makes it possible to decouple 
Forrest from the sources, and build a virtual source space.

Oh well, there is a lot of work to do, and I've just begun (*)!

(*) getting off projects I went to the core, and finally I'm free to 
think and code again! :-)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------



Re: [RT] Moving towards a new documentation system

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Saturday, 11 Oct 2003, at 16:54 Europe/Zurich, Nicola Ken Barozzi wrote:

> Bertrand Delacretaz wrote:
>
> ...
>>> ...Messy. what would something like this behave?
>>>
>>>  22003-this-is-first-doc.xml
>>>  22003-this-is-second-doc.xml
>>> ...
>> that's what I meant by the system having to ensure the uniqueness of 
>> IDs. It is certainly problematic.
>
> Look at Forrest, we have been having super-easy revision for a while 
> now.
>
> howto-multi.xml
> revision-howto-multi-2003-09-14.xml
> revision-howto-multi-2003-09-15.xml
>
> Here is how it shows:
> http://xml.apache.org/forrest/community/howto/multi/howto-multi.html
> So you always get the latest version and can also see the revisions....

Very nice!
I think Stefano's example was more about the risk of having different 
documents with the same ID, though.

-Bertrand


RE: [RT] Moving towards a new documentation system

Posted by Robert Koberg <ro...@koberg.com>.
Hi,

> -----Original Message-----
> From: news [mailto:news@sea.gmane.org] On Behalf Of Nicola Ken Barozzi
> Sent: Saturday, October 11, 2003 7:54 AM
> To: dev@cocoon.apache.org
> 
> Bertrand Delacretaz wrote:
> 
> ...
> >> ...Messy. what would something like this behave?
> >>
> >>  22003-this-is-first-doc.xml
> >>  22003-this-is-second-doc.xml
> >> ...
> >
> > that's what I meant by the system having to ensure the uniqueness of
> > IDs. It is certainly problematic.
> 
> Look at Forrest, we have been having super-easy revision for a while now.
> 
> howto-multi.xml
> revision-howto-multi-2003-09-14.xml
> revision-howto-multi-2003-09-15.xml
> 
> Here is how it shows:
> http://xml.apache.org/forrest/community/howto/multi/howto-multi.html
> 
> So you always get the latest version and can also see the revisions.
> 
> > I agree that a pure ID for naming pieces  of content might be better,
> > provided lookup is super-easy and doesn't get in the way of editing,
> > keeping track of changes etc., and the ID's stay readable and
> > "communicable".
> 
> I really think that using ids /instead/ of filenames is not a good idea.
> URIs are about where to find a certain information, not necessarily with
> a specific date version.
> 
> That's why the Forrest revisions have a defined date (or number) in the
> name, so that that stays the same.
> 
> What I would propose, and that I would like to implement, is an indexing
> system that scans all source files and associates a number with that file.
> 
> This means that a file can have a barcode attached to it, and if we keep
> a repository of site barcodes, we can have a fully resolvable barcoded
> page.
> 
> Then, when pages are added or changed, the system would index the files
> again, and add other new pages with incremented numbers.
> 
> Note that there is another trick in this: if I also index site.xml, I
> can get to know the *history* of the site: ids, and can automatically do
> redirects.
> 
> For example, I start with this site.xml.
> 
>   <site label="My Site">
>     <mynicepage label="Nico Page" url="nicepage.html"/>
>   </site>
> 
> I can refer to that in my docs as:
> 
>    <link href="site:mynicepage">
> 
> (note that site nodes can be hierarchical)
> 
> Then one day I change the node to be:
> 
>   <site label="My Site">
>     <mynicepage label="Nico Page" url="newnicepage.html"/>
>   </site>
> 
> The system would understand that the node leads to another page, and
> would generate redirects from the previous link to the new one.
> 
> Of course, we can do this *if* we don't create different pages at the
> same old locations, unless we generate URIs following site.xml instead
> of the file structure (I do not recommend ATM).

Wasn't this all a conversation from a couple of years ago? It is good to see
opinions change...

First, forrest's site.xml should change the element names to something
generic, like:

<site label="My Site">
  <page id="p34568656" label="Nico Page" url="newnicepage.html"/>
</site>

So the site.xml can be validated. In its current state a custom schema would
be required for each site.xml instance -- which just doesn't make sense. The
element names are currently being used as identifiers. Why not simply make
them valid IDs?
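
A minimal sketch of such a schema (hypothetical; only the <site>/<page>
shape and the attribute names come from the example above):

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="site">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="page" maxOccurs="unbounded">
            <xs:complexType>
              <!-- xs:ID enforces document-wide uniqueness and, as noted
                   later in this thread, forbids a leading digit -->
              <xs:attribute name="id" type="xs:ID" use="required"/>
              <xs:attribute name="label" type="xs:string" use="required"/>
              <xs:attribute name="url" type="xs:anyURI" use="required"/>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
        <xs:attribute name="label" type="xs:string"/>
      </xs:complexType>
    </xs:element>
  </xs:schema>

With fixed element names, one such schema validates every site.xml
instance, which per-site element names make impossible.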

Also, much more site/folder/page metadata can be applied to nodes to trigger
certain things in a transformation.

Next, why wouldn't you recommend using the site.xml as the site structure?
The site.xml should be a *virtual* representation of the site. This way
(with a validatable site.xml) it is easy to build a tool (in javascript)
that can manipulate it. 

The static site gets generated from the site.xml using the site.xml as a
main Source xml for a transformation. This way all nav and content links can
*always* be valid based on the virtual representation.

Best,
-Rob


> 
> --
> Nicola Ken Barozzi                   nicolaken@apache.org
>              - verba volant, scripta manent -
>     (discussions get forgotten, just code remains)
> ---------------------------------------------------------------------


Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Saturday, Oct 11, 2003, at 16:54 Europe/Rome, Nicola Ken Barozzi 
wrote:

> Bertrand Delacretaz wrote:
>
> ...
>>> ...Messy. how would something like this behave?
>>>
>>>  22003-this-is-first-doc.xml
>>>  22003-this-is-second-doc.xml
>>> ...
>> that's what I meant by the system having to ensure the uniqueness of 
>> IDs. It is certainly problematic.
>
> Look at Forrest, we have been having super-easy revision for a while 
> now.
>
> howto-multi.xml
> revision-howto-multi-2003-09-14.xml
> revision-howto-multi-2003-09-15.xml
>
> Here is how it shows:
> http://xml.apache.org/forrest/community/howto/multi/howto-multi.html
>
> So you always get the latest version and can also see the revisions.

cool. it would be a piece of cake to have different *views* of the 
learning object's revisions so that forrest can rely on something like 
the above.

>> I agree that a pure ID for naming pieces  of content might be better, 
>> provided lookup is super-easy and doesn't get in the way of editing, 
>> keeping track of changes etc., and the ID's stay readable and 
>> "communicable".
>
> I really think that using ids /instead/ of filenames is not a good 
> idea.
> URIs are about where to find certain information, not necessarily 
> with a specific dated version.

No, URIs are identifiers, URLs are locators. A URI *can* be used as a 
URL, but the result is undefined and has to be decided case by case (as 
with namespaces, for example).

A learning object (think of it as the abstraction of what a page is) is 
a container of information and needs to be identified uniquely, 
allowing the content to evolve without requiring a change in 
identification.

Versioning is an orthogonal identification axis and can be composed to 
provide a two-dimensional identifier, so

  http://cocoon.apache.org/LO/3948

identifies the learning object 3948, but doesn't specify which 
version.

  http://cocoon.apache.org/LO/3948/343

specifies LO #3948 with revision 343. Note how if the URI is used as a 
locator, the resulting LO is immutable.

don't look at the syntax of the IDs, an alternative syntax could well be

  http://cocoon.apache.org/LO/003.948/3.43

and the meaning is exactly the same.

The behavior of using the URI as a locator is dynamic. For example, at 
one point in time

  http://cocoon.apache.org/LO/3948

and

  http://cocoon.apache.org/LO/3948/394

might locate the same learning object, because revision "394" is the 
last one.

How this maps to the file system is completely irrelevant because the 
repository represents a virtual one.
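
As a purely illustrative sketch of one such virtual mapping -- a Cocoon
sitemap fragment where the repository paths and the stylesheet are
invented, and only the /LO/id and /LO/id/revision URI shapes come from
the discussion above:

  <map:pipeline>
    <!-- LO/3948/343: one specific, immutable revision -->
    <map:match pattern="LO/*/*">
      <map:generate src="repository/lo/{1}/rev-{2}.xml"/>
      <map:transform src="stylesheets/lo2html.xsl"/>
      <map:serialize type="html"/>
    </map:match>
    <!-- LO/3948: no revision given, serve the latest one -->
    <map:match pattern="LO/*">
      <map:generate src="repository/lo/{1}/latest.xml"/>
      <map:transform src="stylesheets/lo2html.xsl"/>
      <map:serialize type="html"/>
    </map:match>
  </map:pipeline>

The URI space stays stable while the src attributes can be rewired to
whatever storage actually sits behind it.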

> That's why the Forrest revisions have a defined date (or number) in 
> the name, so that that stays the same.

see above, same thing, just more abstract, totally decoupled from the 
actual implementation of the storage.

> What I would propose, and that I would like to implement, is an 
> indexing system that scans all source files and associates a number 
> with that file.

that's the job that our repository would do for us transparently.

> This means that a file can have a barcode attached to it, and if we 
> keep a repository of site barcodes, we can have a fully resolvable 
> barcoded page.

welcome to the world of content repositories ;-)

> Then, when pages are added or changed, the system would index the 
> files again, and add other new pages with incremented numbers.

JSR 170 will allow us to do this and *much* more.

> Note that there is another trick in this: if I also index site.xml, I 
> can get to know the *history* of the site: ids, and can automatically 
> do redirects.
>
> For example, I start with this site.xml.
>
>  <site label="My Site">
>    <mynicepage label="Nico Page" url="nicepage.html"/>
>  </site>
>
> I can refer to that in my docs as:
>
>   <link href="site:mynicepage">
>
> (note that site nodes can be hierarchical)
>
> Then one day I change the node to be:
>
>  <site label="My Site">
>    <mynicepage label="Nico Page" url="newnicepage.html"/>
>  </site>
>
> The system would understand that the node leads to another page, and 
> would generate redirects from the previous link to the new one.
>
> Of course, we can do this *if* we don't create different pages at the 
> same old locations, unless we generate URIs following site.xml instead 
> of the file structure (I do not reccomend ATM).

hmmm, I have to think more about this...

--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Bertrand Delacretaz wrote:

...
>> ...Messy. how would something like this behave?
>>
>>  22003-this-is-first-doc.xml
>>  22003-this-is-second-doc.xml
>> ...
> 
> that's what I meant by the system having to ensure the uniqueness of 
> IDs. It is certainly problematic.

Look at Forrest, we have been having super-easy revision for a while now.

howto-multi.xml
revision-howto-multi-2003-09-14.xml
revision-howto-multi-2003-09-15.xml

Here is how it shows:
http://xml.apache.org/forrest/community/howto/multi/howto-multi.html

So you always get the latest version and can also see the revisions.

> I agree that a pure ID for naming pieces  of content might be better, 
> provided lookup is super-easy and doesn't get in the way of editing, 
> keeping track of changes etc., and the ID's stay readable and 
> "communicable".

I really think that using ids /instead/ of filenames is not a good idea.
URIs are about where to find certain information, not necessarily with 
a specific dated version.

That's why the Forrest revisions have a defined date (or number) in the 
name, so that that stays the same.

What I would propose, and that I would like to implement, is an indexing 
system that scans all source files and associates a number with that file.

This means that a file can have a barcode attached to it, and if we keep 
a repository of site barcodes, we can have a fully resolvable barcoded page.

Then, when pages are added or changed, the system would index the files 
again, and add other new pages with incremented numbers.

Note that there is another trick in this: if I also index site.xml, I 
can get to know the *history* of the site: ids, and can automatically do 
redirects.

For example, I start with this site.xml.

  <site label="My Site">
    <mynicepage label="Nico Page" url="nicepage.html"/>
  </site>

I can refer to that in my docs as:

   <link href="site:mynicepage">

(note that site nodes can be hierarchical)

Then one day I change the node to be:

  <site label="My Site">
    <mynicepage label="Nico Page" url="newnicepage.html"/>
  </site>

The system would understand that the node leads to another page, and 
would generate redirects from the previous link to the new one.

Of course, we can do this *if* we don't create different pages at the 
same old locations, unless we generate URIs following site.xml instead 
of the file structure (I do not recommend ATM).
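
A rough sketch of how those redirects could be derived, assuming (as in
the example) that the element name is the stable identifier and only the
url attribute moves; the file names and the output format are invented,
and only the flat case is handled:

  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <!-- the previous revision of site.xml; hypothetical file name -->
    <xsl:param name="old-site" select="'site-old.xml'"/>
    <xsl:variable name="old" select="document($old-site)"/>

    <!-- compare each node of the new site.xml against the old one -->
    <xsl:template match="/site">
      <redirects>
        <xsl:for-each select="*[@url]">
          <xsl:variable name="n" select="name()"/>
          <xsl:variable name="was" select="$old/site/*[name() = $n]/@url"/>
          <xsl:if test="$was and $was != @url">
            <redirect from="{$was}" to="{@url}"/>
          </xsl:if>
        </xsl:for-each>
      </redirects>
    </xsl:template>
  </xsl:stylesheet>

Run against the changed site.xml above, this emits
<redirect from="nicepage.html" to="newnicepage.html"/>, which a sitemap
or a .htaccess generator could turn into actual HTTP redirects.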

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------



Re: [RT] Moving towards a new documentation system (was: [RT] Updating the website)

Posted by Bertrand Delacretaz <bd...@apache.org>.
Le Samedi, 11 oct 2003, à 15:33 Europe/Zurich, Stefano Mazzocchi a 
écrit :
> ...I had two names "Hyperbook" and "Papyrus", but both are probably 
> infringing some trademark. I'll try to come up with something that is 
> "print" or "publishing" or "learning" related.... if you have a 
> suggestion, it's a good time to speak up...

How about "LearningTrove" ?

-Bertrand

Re: [RT] Moving towards a new documentation system

Posted by Bertrand Delacretaz <bd...@apache.org>.
Le Samedi, 11 oct 2003, à 18:30 Europe/Zurich, Barzilai Spinak a écrit :

> Just for the sake of those of us who know less than you, Cocoon 
> gods....

There are no Cocoon gods here - everyone is welcome to express their 
opinion in a respectful way.

> What is the advantage of using an all-numeric filename instead of 
> alphanumeric?...

It's not about filenames, it's about units of information ("learning 
objects") stored in a repository (kind of a "document database").

Document titles may change, IDs will not.

-Bertrand

Re: [RT] Moving towards a new documentation system

Posted by Barzilai Spinak <ba...@internet.com.uy>.
Just for the sake of those of us who know less than you, Cocoon gods.
What is the advantage of using an all-numeric filename instead of 
alphanumeric?
If I'm not mistaken by what I sleepily read this morning, you are 
proposing a flat
"big bag" of documents. So, if there's no "hierarchy", an alphanumeric 
file name is
just as good as a numeric one, plus it can be easily found within your 
files.
The Bugzilla analogy is not quite apt. I don't have the bug database on 
my hard drive.
(All this assuming you are discussing the new structure for the Cocoon 
documents that
are distributed with the Cocoon package. Is that true?)
In case an "all numbers" scheme is the way to go, a splitting system like 
Bertrand proposes
gets my +1 for the same reasons he gave.

I have a better idea, why don't we use 8.3  filenames!?!?!  (joke, of 
course :-)

BarZ
Once-again-talking-about-what-I-don't-understand







Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Sunday, Oct 12, 2003, at 11:20 Europe/Rome, Joerg Heinicke wrote:

> On 11.10.2003 18:10, Antonio Gallardo wrote:
>
>> Yep. You are "the master" here.
>
> Hello Antonio,
>
> the above is wrong, and I guess Stefano does not want it to be seen 
> that way either. This was also the reason for his withdrawal as PMC head. 
> Stefano might be something like a technical leader with his mostly 
> exciting visions, but he is a community member like you and I. You can 
> criticize his postings just like anybody else's.

+1000

--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Joerg Heinicke <jh...@virbus.de>.
On 11.10.2003 18:10, Antonio Gallardo wrote:

> Yep. You are "the master" here.

Hello Antonio,

the above is wrong, and I guess Stefano does not want it to be seen 
that way either. This was also the reason for his withdrawal as PMC head. 
Stefano might be something like a technical leader with his mostly 
exciting visions, but he is a community member like you and I. You can 
criticize his postings just like anybody else's.

Please don't see this as a personal attack, but your comment does not 
reflect the community's view of itself.

Joerg


Re: [RT] Moving towards a new documentation system

Posted by Antonio Gallardo <ag...@agsoftware.dnsalias.com>.
Stefano Mazzocchi dijo:
>
> On Saturday, Oct 11, 2003, at 17:33 Europe/Rome, Antonio Gallardo wrote:
>
>> Stefano Mazzocchi dijo:
>>> a repository is just like a database: editing a database by direct
>>> SQL injection is silly. Today it doesn't look silly because
>>> repositories are *much* less functional than a database, but when you
>>> have a *serious* repository (for example, one that can extract
>>> properties  from
>>> an image and provide an RDF representation for it), editing it *by
>>> hand* would be silly.
>>
>> Is it a good idea to use an XML database to store the content? I
>> guess we
>> are talking about an application outside the CVS, right? If this is
>> correct,
>> a good candidate could be Xindice: http://xml.apache.org/xindice/ :)
>
> no, too weak as a contract.
>
> [believe me I searched for the perfect repository for years. JSR 170 is
> what we need, nothing more, nothing less]

Yep. You are "the master" here. This was just a suggestion. I am not a pro
in CMS at all. But I am trying to learn. I read all the mails and
sometimes fire a comment here and there :-D

> note: the repository implementation might use xindice internally, like
> a store, but there is a loooong way to go there.

That was what I thought: the database as the "back office". :)

> for now, the best option, IMO, is to stick to simple webdav.

Best Regards,

Antonio Gallardo



Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Saturday, Oct 11, 2003, at 17:33 Europe/Rome, Antonio Gallardo wrote:

> Stefano Mazzocchi dijo:
>> a repository is just like a database: editing a database by direct SQL
>> injection is silly. Today it doesn't look silly because repositories
>> are *much* less functional than a database, but when you have a
>> *serious* repository (for example, one that can extract properties 
>> from
>> an image and provide an RDF representation for it), editing it *by
>> hand* would be silly.
>
> Is it a good idea to use an XML database to store the content? I 
> guess we
> are talking about an application outside the CVS, right? If this is 
> correct,
> a good candidate could be Xindice: http://xml.apache.org/xindice/ :)

no, too weak as a contract.

[believe me I searched for the perfect repository for years. JSR 170 is 
what we need, nothing more, nothing less]

note: the repository implementation might use xindice internally, like 
a store, but there is a loooong way to go there.

for now, the best option, IMO, is to stick to simple webdav.

--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Antonio Gallardo <ag...@agsoftware.dnsalias.com>.
Stefano Mazzocchi dijo:
> a repository is just like a database: editing a database by direct SQL
> injection is silly. Today it doesn't look silly because repositories
> are *much* less functional than a database, but when you have a
> *serious* repository (for example, one that can extract properties from
> an image and provide an RDF representation for it), editing it *by
> hand* would be silly.

Is it a good idea to use an XML database to store the content? I guess we
are talking about an application outside the CVS, right? If this is correct,
a good candidate could be Xindice: http://xml.apache.org/xindice/ :)


>> I agree that a pure ID for naming pieces  of content might be better,
>> provided lookup is super-easy and doesn't get in the way of editing,
>> keeping track of changes etc., and the ID's stay readable and
>> "communicable".
>
> +1 to these goals.
>
> --
> Stefano.




Re: [RT] Moving towards a new documentation system

Posted by Tony Collen <co...@umn.edu>.
Replies inline. I may have missed something important, so please pardon 
me if I repeat anything obvious that was already commented on :)

Joerg Heinicke wrote:

> On 19.10.2003 21:07, Stefano Mazzocchi wrote:
> 
>>>>> Remain the "everlasting semantically meaningful names". Of course 
>>>>> the URL should match the content. But if the content changes, so 
>>>>> that it no longer matches, what's the sense of having still this 
>>>>> page? Even if you use IDs and change the content later, the user 
>>>>> linked from outside to this page gets other content than he wants 
>>>>> to have. So there must be available an "outdating mechanism". Let's 
>>>>> say there is a page 
>>>>> http://cocoon.apache.org/documentation/tutorial/xmlforms.html, 
>>>>> which is linked often from outside, because it was the one and only 
>>>>> form handling in Cocoon. Now in Cocoon 2.2 or 3.0 XML Forms are 
>>>>> removed completely, but we don't want to give the user a simple 404 
>>>>> page. We have to point out that he can "use Woody or Cocoon forms 
>>>>> which is by far better than XML Forms" with a link to the correct 
>>>>> page 
>>>>> http://cocoon.apache.org/documentation/tutorial/cocoonforms.html. 
>>>>> You have to do exactly the same for the number URLs. So I think 
>>>>> there is no problem with "everlasting semantically meaningful 
>>>>> names".

Hmm, well, then we need to look and plan ahead a little:

http://cocoon.apache.org/documentation/X.X/tutorial/whatever.html

This way we version the whole docs tree by release
-- sort of how we already have separate sections of the site for 1.x, 
2.0, 2.1, etc.

.. And, perhaps something like 
http://cocoon.apache.org/documentation/foo/bar.html would take you to 
the appropriate document for the most current version of Cocoon.  My 
only concern is how this could possibly interact with the versioning 
scheme that is being planned for the individual documents.. perhaps it 
will work perfectly fine.
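
One hypothetical way the unversioned form could behave, sketched as a
Cocoon sitemap fragment ("2.1" standing in for whatever the current
release is; the paths and stylesheet names are invented):

  <!-- explicit versions are matched first -->
  <map:match pattern="documentation/2.1/**.html">
    <map:generate src="content/2.1/{1}.xml"/>
    <map:transform src="stylesheets/doc2html.xsl"/>
    <map:serialize type="html"/>
  </map:match>
  <!-- anything unversioned falls through to the current release -->
  <map:match pattern="documentation/**">
    <map:redirect-to uri="documentation/2.1/{1}"/>
  </map:match>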

Joerg Wrote:

>>> I read today "Cool URIs don't change" [1] in preparation for answering 
>>> your mail. And there are not many options left after reading it: 
>>> The URLs of a learning object need an ID and a modification date, not 
>>> more, not less.

Ughh... the thought of having a large string of numbers (even if it is a 
date) in order to know what a document is, leaves a bad feeling in the 
pit of my stomach.

Then again, Stefano wrote:

>> hmmm, well, it depends.
>>
>> The URI (we are talking about URIs here, not URLs, careful)

Which puts the hole in my gut at ease, at least for now :)

<snip what="discussion about versioning"/>

>> This is a little bit more tricky. If you choose to use a timestamp for 
>> the version ID, then you have to make sure that you have enough 
>> granularity to take into account the minimum potential time between 
>> two different changes, or, again, you get a collision.
>>
>> This is why I generally dislike the use of dates in URIs, version 
>> numbers are *abstract*, so
>>
>> http://cocoon.apache.org/lo/39484/342
>>
>> indicates revision 342 of learning object 39484 and this does not 
>> change over time.

This is a good idea; it's easy enough to increment the number. Getting 
back to the idea of using the date, this convention is used in setting 
the "Serial" of a DNS zone, which lets secondary servers know when the 
zone has been updated:

YYYYMMDDNN

YYYY = year
MM = month
DD = day

And NN is a two-digit counter from 00 to 99 (I think it will accept 
more). The idea is that when you make a change to a zone file, you update 
the date and possibly increment NN if the date is already "up to date". 
But I think the revision idea above is simple enough.


Stefano wrote:

>> That means that you get a collision if you have more than one version 
>> of a document for a given date, and this is very likely to happen.

(and then Joerg wrote):

> Ah, ok. Our documentation changes so rarely that I didn't think about 
> this ;-) Yes, a version is also ok. Though a timestamp need not end with 
> the day, there are also milliseconds. But of course collision is still 
> not impossible. OTOH how much might documents change in one day? In 
> general a commit shortly after another one almost always fixes only typos 
> or similar. Maybe the last version of a day might be sufficient?

If a document changes more than 100 times in a day (at least in my 
proposed idea), we might want to think about using the wiki more before a 
commit is made :)

> Maybe the date-to-version mapping is something that can be handled when 
> mapping URLs to URIs. So, as said above, the last version of a day is 
> *the* version of that day. Then we have versioned URIs and dated URLs.

I would still like to voice my distaste for having a numerical system 
to access a document, at least for the "current" version. For accessing 
a historical version, I think it's fine, but like I said above, having 
to remember numbers for a URL really scares me.

> Could it be that you already came to the same conclusion in other parts 
> of this thread? I apologize for that. Sometimes it takes a bit longer ...

(Ditto) :)

> 
> Joerg
> 


Regards,

Tony


Re: [RT] Moving towards a new documentation system

Posted by Joerg Heinicke <jh...@virbus.de>.
On 19.10.2003 23:26, Stefano Mazzocchi wrote:
> 
>>> The URI (we are talking about URIs here, not URLs, careful)
>>
>>
>> Hmm, but if you want to make the linking from outside also 
>> consistent, why invent another scheme?
> 
> 
> True. Consistency is one thing, but remember that it is entirely possible  
> that several URLs can point to the same object, even identified by a  
> different URI!
> 
> The URI can be used as a URL, but the "meaning" of this use is not so  
> explicit as it seems at first sight. (read below for more)

...

>> I would only like to have date metadata before clicking on a link.
> 
> 
> sorry, I'm not sure I follow you here.

How old is the page to be accessed? Information about Cocoon older than 
3 years might not be of any interest to me. That's similar to the news 
samples I gave at the beginning.

>>> But also note that we are still talking about URIs not URLs. Using  
>>> the LO URI as a URL might not be the only way to access the object.
>>
>>
>> Maybe the date-to-version mapping is something that can be handled  
>> when mapping URLs to URIs. So, as said above, the last version of a day  
>> is *the* version of that day. Then we have versioned URIs and dated  
>> URLs.
>>
>> Could it be that you already came to the same conclusion in other  
>> parts of this thread? I apologize for that. Sometimes it takes a bit  
>> longer ...
> 
> 
> I don't remember if I made it explicit already, but I'm glad you came  
> to the same conclusion. Yes, the "date -> version" can be part of the  
> URL->URI translation procedure.
> 
> So, asking for
> 
>  http://host/path/id/date
> 
> could yield the last revision for that particular date (if any), or  
> could give a list of revisions that were done on that date.
> 
> but at this point, we'd have to differentiate between version and date...  
> so, another option, following subversion's approach is to use something  
> like
> 
>  http://host/path/id!date=20031015
> 
> or
> 
>  http://host/path/id!version=343
> 
> or even
> 
>  http://host/path/id!branch='cocoon-2.1'
> 
> or, even wilder, following Kimbro Staken's (of Xindice fame) approach  
> at Syncato (http://www.syncato.org/)
> 
>   http://host/path/!branch='cocoon-2.1'?//author[contains(@name,'Stefano')]
> 
> that would yield a list of objects in the "cocoon-2.1" branch that  
> include at least one element named <author> that contains the string  
> 'Stefano' in their name attribute.
> 
> This example shows pretty evidently how URL->URI translation is not such  
> an automatic and easy thing to describe and to design.
> 
> Also shows that using URIs as URLs is not transparent either and  
> requires some implicit contract.
> 
> -- 
> Stefano.

After these examples the difference between URI and URL is /clear as 
daylight/.

Thanks,

Joerg


RE: [RT] Moving towards a new documentation system

Posted by Robert Koberg <ro...@koberg.com>.
Hi Stefano,

> >
> > For example, unique IDs cannot start with a number, but all examples
> > of IDs
> > have been strictly numerical. Also the date below:
> >
> > http://host/path/id!date=20031015
> >
> > does not conform to ISO 8601.
> 
> hmmm, http://www.cl.cam.ac.uk/~mgk25/iso-time.html, says
> 
> <quote>
> Apart from the recommended primary standard notation YYYY-MM-DD, ISO
> 8601 also specifies a number of alternative formats for use in
> applications with special requirements. All of these alternatives can
> easily and automatically be distinguished from each other:
> 
> The hyphens can be omitted if compactness of the representation is more
> important than human readability, for example as in
> 
> 19950204
> </quote>

I did not read that deeply... just followed the spec
(http://www.w3.org/TR/2000/CR-xmlschema-2-20001024/#date). 

So, I just tried to validate:

<abc date="20000129"/>

With:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="abc">
    <xs:complexType>
      <xs:attribute name="date" type="xs:date"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

The instance did not validate. So perhaps the validators are not following
the ISO standard strictly?


> 
> > Am I just being too nit-picky or is this something that simply has no
> > value
> > here?
> 
> but I agree: if we can, reusing datatypes from other standards is a
> good thing 

Cool

> (might well give us code to reuse for parsing already
> implemented in other projects!)

Yea! :)

-Rob


> 
> --
> Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Sunday, Oct 19, 2003, at 23:41 Europe/Rome, Robert Koberg wrote:

> Hi,
>
> This is probably a minor implementation detail, but when I see the
> examples of datatypes being given I was wondering whether you were taking 
> into
> consideration XML Schema datatypes (and therefore RNG datatypes, 
> though I
> don't understand the desire of clearly object-oriented-type people 
> to use
> RNG). Some of the examples I have seen do not conform to the structures
> allowed.
>
> For example, unique IDs cannot start with a number, but all examples 
> of IDs
> have been strictly numerical. Also the date below:
>
> http://host/path/id!date=20031015
>
> does not conform to ISO 8601.

hmmm, http://www.cl.cam.ac.uk/~mgk25/iso-time.html, says

<quote>
Apart from the recommended primary standard notation YYYY-MM-DD, ISO 
8601 also specifies a number of alternative formats for use in 
applications with special requirements. All of these alternatives can 
easily and automatically be distinguished from each other:

The hyphens can be omitted if compactness of the representation is more 
important than human readability, for example as in

19950204
</quote>

> Am I just being too nit-picky or is this something that simply has no 
> value
> here?

but I agree: if we can, reusing datatypes from other standards is a 
good thing (might well give us code to reuse for parsing already 
implemented in other projects!)

--
Stefano.


RE: [RT] Moving towards a new documentation system

Posted by Robert Koberg <ro...@koberg.com>.
Hi,

This is probably a minor implementation detail, but when I see the
examples of datatypes being given I was wondering whether you were taking into
consideration XML Schema datatypes (and therefore RNG datatypes, though I
don't understand the desire of clearly object-oriented-type people to use
RNG). Some of the examples I have seen do not conform to the structures
allowed.

For example, unique IDs cannot start with a number, but all examples of IDs
have been strictly numerical. Also the date below:

http://host/path/id!date=20031015

does not conform to ISO 8601.

Am I just being too nit-picky or is this something that simply has no value
here?

Best,
-Rob


Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Sunday, Oct 19, 2003, at 22:32 Europe/Rome, Joerg Heinicke wrote:

> On 19.10.2003 21:07, Stefano Mazzocchi wrote:
>
>>>>> Remain the "everlasting semantically meaningful names". Of course  
>>>>> the URL should match the content. But if the content changes, so  
>>>>> that it no longer matches, what's the sense of having still this  
>>>>> page? Even if you use IDs and change the content later, the user  
>>>>> linked from outside to this page gets other content than he wants  
>>>>> to have. So there must be available an "outdating mechanism".  
>>>>> Let's say there is a page  
>>>>> http://cocoon.apache.org/documentation/tutorial/xmlforms.html,  
>>>>> which is linked often from outside, because it was the one and  
>>>>> only form handling in Cocoon. Now in Cocoon 2.2 or 3.0 XML Forms  
>>>>> are removed completely, but we don't want to give the user a  
>>>>> simple 404 page. We have to point out that he can "use Woody or  
>>>>> Cocoon forms which is by far better than XML Forms" with a link to  
>>>>> the correct page  
>>>>> http://cocoon.apache.org/documentation/tutorial/cocoonforms.html.  
>>>>> You have to do exactly the same for the number URLs. So I think  
>>>>> there is no problem with "everlasting semantically meaningful  
>>>>> names".
>>>> you have a point there, that's for sure.
>>>> I'll think about this some more.... do you have any suggestion?
>>>
>>>
>>> I read today "Cool URIs don't change" [1] in preparation for  
>>> answering your mail. And there are not many options left after  
>>> reading it: The URLs of a learning object need an ID and a  
>>> modification date, not more, not less.
>> hmmm, well, it depends.
>> The URI (we are talking about URIs here, not URLs, careful)
>
> Hmm, but if you want to make the linking from outside also consistent,  
> why invent another scheme?

True. Consistency is one thing, but remember that it is entirely possible  
that several URLs can point to the same object, even identified by a  
different URI!

The URI can be used as a URL, but the "meaning" of this use is not so  
explicit as it seems at first sight. (read below for more)

>> of a LO will need an ID that identifies the object, then might  
>> (optionally) have another ID that identifies the version.
>> So, I would do
>>  http://host/path/ID
>> to identify the object in general
>
> This can also be seen as the "latest version" of an LO, can't it?

Exactly, but it could also get you a list of possible versions from  
which you can choose.

As you see, the URI->URL mapping is not so obvious: even when the  
string translation is one to one, the meaning might not be.

In Subversion, for example, the URI used as a URL references the latest  
version of the file and to get a specific version you have to do

  http://host/repo/file!version=3

[or equivalent, don't remember the exact syntax]

I still don't know what the best URI->URL translation strategy is, but  
for subversion it makes sense so that I can have a subversion  
repository act as a regular web server with very little effort

[I'm thinking about using subversion as the repository for our learning  
objects!]

>> and
>>  http://host/path/ID/version
>> to indicate the object version
>>> The IDs would allow changing the content without a later mismatch  
>>> between URI and content.
>> Exactly.
>>> The date assures that a later access to a linked page has the  
>>> content it should have, it can not have been changed in the meantime.
>> This is a little bit more tricky. If you choose to use a timestamp  
>> for the version ID, then you have to make sure that you have enough  
>> granularity to take into account the minimum potential time  
>> between two different changes, or, again, you get a  
>> collision.
>> This is why I generally dislike the use of dates in URIs, version  
>> numbers are *abstract*, so
>> http://cocoon.apache.org/lo/39484/342
>> indicates revision 342 of learning object 39484 and this does not  
>> change over time.
>>> The user would always access a page where the content is appropriate  
>>> to a certain date, similar to "cvs co -D 20030303 lo/1234567890.xml".
>> That means that you get a collision if you have more than one version  
>> of a document for a given date, and this is very likely to happen.
>
> Ah, ok. Our documentation changes so rarely that I didn't think  
> about this ;-)

LOL

> Yes, a version is also ok. Though a timestamp need not end with the  
> day, there are also milliseconds.
> But of course collision is still not impossible. OTOH how much might  
> documents change in one day?

that's the problem: how do you know? With an incremental version ID you  
don't have collision issues anyway.

> In general a commit shortly after another one almost always fixes only  
> typos or similar. Maybe the last version of a day might be sufficient?

> I would only like to have date metadata before clicking on a link.

sorry, I'm not sure I follow you here.

>> But also note that we are still talking about URIs not URLs. Using  
>> the LO URI as a URL might not be the only way to access the object.
>
> Maybe the date-to-version mapping is something that can be handled  
> when mapping URLs to URIs. So, as said above, the last version of a day  
> is *the* version of that day. Then we have versioned URIs and dated  
> URLs.
>
> Could it be that you already came to the same conclusion in other  
> parts of this thread? I apologize for that. Sometimes it takes a bit  
> longer ...

I don't remember if I made it explicit already, but I'm glad you came  
to the same conclusion. Yes, the "date -> version" can be part of the  
URL->URI translation procedure.

So, asking for

  http://host/path/id/date

could yield the last revision for that particular date (if any), or  
could give a list of revisions that were done on that date.

but at this point, we'd have to differentiate between version and date...  
so, another option, following subversion's approach is to use something  
like

  http://host/path/id!date=20031015

or

  http://host/path/id!version=343

or even

  http://host/path/id!branch='cocoon-2.1'

or, even wilder, following Kimbro Staken's (of Xindice fame) approach  
at Syncato (http://www.syncato.org/)

   http://host/path/!branch='cocoon-2.1'?//author[contains(@name,'Stefano')]

that would yield a list of objects in the "cocoon-2.1" branch that  
include at least one element named <author> that contains the string  
'Stefano' in their name attribute.

This example shows pretty evidently how URL->URI translation is not such  
an automatic and easy thing to describe and to design.

Also shows that using URIs as URLs is not transparent either and  
requires some implicit contract.
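
For what it's worth, the simpler forms above map cleanly onto ordinary
wildcard matching; a hypothetical Cocoon sitemap fragment (the source
paths are invented, and the harder query-based case is ignored):

  <!-- lo/3948!version=343: one specific, immutable revision -->
  <map:match pattern="lo/*!version=*">
    <map:generate src="repository/lo/{1}/rev-{2}.xml"/>
    <map:serialize type="xml"/>
  </map:match>
  <!-- lo/3948: the latest revision -->
  <map:match pattern="lo/*">
    <map:generate src="repository/lo/{1}/latest.xml"/>
    <map:serialize type="xml"/>
  </map:match>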

--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Joerg Heinicke <jh...@virbus.de>.
On 19.10.2003 21:07, Stefano Mazzocchi wrote:

>>>> Remain the "everlasting semantically meaningful names". Of course 
>>>> the URL should match the content. But if the content changes, so 
>>>> that it no longer matches, what's the sense of having still this 
>>>> page? Even if you use IDs and change the content later, the user 
>>>> linked from outside to this page gets other content than he wants to 
>>>> have. So there must be available an "outdating mechanism". Let's say 
>>>> there is a page 
>>>> http://cocoon.apache.org/documentation/tutorial/xmlforms.html, which 
>>>> is linked often from outside, because it was the one and only form 
>>>> handling in Cocoon. Now in Cocoon 2.2 or 3.0 XML Forms are removed 
>>>> completely, but we don't want to give the user a simple 404 page. We 
>>>> have to point out that he can "use Woody or Cocoon forms which is by 
>>>> far better than XML Forms" with a link to the correct page 
>>>> http://cocoon.apache.org/documentation/tutorial/cocoonforms.html. 
>>>> You have to do exactly the same for the number URLs. So I think 
>>>> there is no problem with "everlasting semantically meaningful 
>>>> names".
> 
> 
>>> you have a point there, that's for sure.
>>> I'll think about this some more.... do you have any suggestion?
>>
>>
>> I read today "Cool URIs don't change" [1] in preparation for answering 
>> your mail. And there are not many options left after reading it: 
>> The URLs of a learning object need an ID and a modification date, not 
>> more, not less.
> 
> 
> hmmm, well, it depends.
> 
> The URI (we are talking about URIs here, not URLs, careful)

Hmm, but if you want to make the linking from outside also consistent, 
why invent another scheme?

> of a LO will 
> need an ID that identifies the object, then might (optionally) have 
> another ID that identifies the version.
> 
> So, I would do
> 
>  http://host/path/ID
> 
> to identify the object in general

This can also be seen as the "latest version" of an LO, can't it?

> and
> 
>  http://host/path/ID/version
> 
> to indicate the object version
> 
>> The IDs would allow changing the content without a later mismatch 
>> between URI and content.
> 
> 
> Exactly.
> 
>> The date assures that a later access to a linked page has the content 
>> it should have, it can not have been changed in the meantime.
> 
> 
> This is a little bit more tricky. If you choose to use a timestamp for 
> the version ID, then you have to make sure that you have enough 
> granularity to take into account the minimum potential time between 
> two different changes, or, again, you get a collision.
> 
> This is why I generally dislike the use of dates in URIs, version 
> numbers are *abstract*, so
> 
> http://cocoon.apache.org/lo/39484/342
> 
> indicates revision 342 of learning object 39484 and this does not change 
> over time.
> 
>> The user would always access a page where the content is appropriate 
>> to a certain date, similar to "cvs co -D 20030303 lo/1234567890.xml".
> 
> 
> That means that you get a collision if you have more than one version of 
> a document for a given date, and this is very likely to happen.

Ah, ok. Our documentation changes so rarely that I didn't think about 
this ;-) Yes, a version is also ok. Though a timestamp need not end with 
the day, there are also milliseconds. But of course collision is still 
not impossible. OTOH how much might documents change in one day? In 
general a commit shortly after another one almost always fixes only typos 
or similar. Maybe the last version of a day might be sufficient?

I would only like to have date metadata before clicking on a link.

> But also note that we are still talking about URIs not URLs. Using the 
> LO URI as a URL might not be the only way to access the object.

Maybe the date-to-version mapping is something that can be handled when 
mapping URLs to URIs. So, as said above, the last version of a day is 
*the* version of that day. Then we have versioned URIs and dated URLs.

Could it be that you already came to the same conclusion in other parts 
of this thread? I apologize for that. Sometimes it takes a bit longer ...

Joerg


Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Sunday, Oct 19, 2003, at 19:54 Europe/Rome, Joerg Heinicke wrote:

>>> Remain the "everlasting semantically meaningful names". Of course 
>>> the URL should match the content. But if the content changes, so 
>>> that it no longer matches, what's the sense of having still this 
>>> page? Even if you use IDs and change the content later, the user 
>>> linked from outside to this page gets other content than he wants to 
>>> have. So there must be available an "outdating mechanism". Let's say 
>>> there is a page 
>>> http://cocoon.apache.org/documentation/tutorial/xmlforms.html, which 
>>> is linked often from outside, because it was the one and only form 
>>> handling in Cocoon. Now in Cocoon 2.2 or 3.0 XML Forms are removed 
>>> completely, but we don't want to give the user a simple 404 page. We 
>>> have to point out that he can "use Woody or Cocoon forms which is by 
>>> far better than XML Forms" with a link to the correct page 
>>> http://cocoon.apache.org/documentation/tutorial/cocoonforms.html. 
>>> You have to do exactly the same for the number URLs. So I think 
>>> there is no problem with "everlasting semantically meaningful names".

>> you have a point there, that's for sure.
>> I'll think about this some more.... do you have any suggestion?
>
> I read today "Cool URIs don't change" [1] in preparation for answering 
> your mail. And there are not many options left after reading it: 
> The URLs of a learning object need an ID and a modification date, not 
> more, not less.

hmmm, well, it depends.

The URI (we are talking about URIs here, not URLs, careful) of a LO 
will need an ID that identifies the object, then might (optionally) 
have another ID that identifies the version.

So, I would do

  http://host/path/ID

to identify the object in general and

  http://host/path/ID/version

to indicate the object version

> The IDs would allow changing the content without a later mismatch 
> between URI and content.

Exactly.

> The date assures that a later access to a linked page has the content 
> it should have, it can not have been changed in the meantime.

This is a little bit more tricky. If you choose to use a timestamp for 
the version ID, then you have to make sure that you have enough 
granularity to take into account the minimum potential time between 
two different changes, or, again, you get a collision.

This is why I generally dislike the use of dates in URIs, version 
numbers are *abstract*, so

http://cocoon.apache.org/lo/39484/342

indicates revision 342 of learning object 39484 and this does not 
change over time.

> The user would always access a page where the content is appropriate 
> to a certain date, similar to "cvs co -D 20030303 lo/1234567890.xml".

That means that you get a collision if you have more than one version 
of a document for a given date, and this is very likely to happen.

> We only need a repository handling this. With plain html lying around 
> in a directory it's not doable I guess.

It could be virtualized (linotype already does versioning as mentioned 
above, it's just turned off by default as it wasn't stable and I didn't 
have time to clean it), but yes, a repository would be a much better 
choice.

> [2] also argues for 78-character URLs to avoid breaks in mails. The 
> above should be perfect for it, while at the moment our URLs are often 
> longer than 78 characters.

Good point (didn't think of this, actually)

> One point is not handled: the autocompletion functionality of the 
> browsers.

Yep.

>  Mozilla mentioned above was not the best example, because it provides 
> additionally at least the title in the URL bar when using 
> autocompletion, but IE does not.

Well, Mozilla does the right thing and I think IE will follow. I 
wouldn't want to make a compromise because of weak browser 
implementations.

But also note that we are still talking about URIs not URLs. Using the 
LO URI as a URL might not be the only way to access the object.

> I don't have a solution for it, maybe it's not that important or at 
> least one compromise must be done.

I agree.

--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Joerg Heinicke <jh...@virbus.de>.
Late, but ...

On 12.10.2003 18:21, Stefano Mazzocchi wrote:

...

> but you are talking about news and articles, which are immutable things.
> 
> our learning objects are not immutable. for example, I could change 
> their title as I go.

Maybe the title, but not the content in general/completely - as you
already agreed below.

> Also, adding a date to a learning object is bad: 
> should I change the date every time I update it? if not, why would I care 
> about the date it was first created?

Read on below.

>> The reason for the content in the URL is the fast access to a page 
>> (you already accessed some time ago) without navigating from start page.
>> You start to type in the URL bar and Mozilla completes it as far as 
>> possible. But if you only see numbers you don't know which one it was. 
>> (It's the same with tab completion on the Linux 
>> command line.) It's especially re-accessing a page that gets 
>> complicated by using numbers.
> 
> 
> This is a very good point.

...

>> The possible collision of names should be handled by the repository.
>>
>> Remain the "everlasting semantically meaningful names". Of course the 
>> URL should match the content. But if the content changes, so that it 
>> no longer matches, what's the sense of having still this page? Even if 
>> you use IDs and change the content later, the user linked from outside 
>> to this page gets other content than he wants to have. So there must 
>> be available an "outdating mechanism". Let's say there is a page 
>> http://cocoon.apache.org/documentation/tutorial/xmlforms.html, which 
>> is linked often from outside, because it was the one and only form 
>> handling in Cocoon. Now in Cocoon 2.2 or 3.0 XML Forms are removed 
>> completely, but we don't want to give the user a simple 404 page. We 
>> have to point out that he can "use Woody or Cocoon forms which is by 
>> far better than XML Forms" with a link to the correct page 
>> http://cocoon.apache.org/documentation/tutorial/cocoonforms.html. You 
>> have to do exactly the same for the number URLs. So I think there is 
>> no problem with "everlasting semantically meaningful names".
> 
> 
> you have a point there, that's for sure.
> 
> I'll think about this some more.... do you have any suggestion?

I read today "Cool URIs don't change" [1] in preparation for answering 
your mail. And there are not many options left after reading it: The 
URLs of a learning object need an ID and a modification date, not more, 
not less. The IDs would allow changing the content without a later 
mismatch between URI and content. The date assures that a later access 
to a linked page has the content it should have, it can not have been 
changed in the meantime. The user would always access a page where the 
content is appropriate to a certain date, similar to "cvs co -D 20030303 
lo/1234567890.xml". We only need a repository handling this. With plain 
html lying around in a directory it's not doable I guess.

[2] also argues for 78-character URLs to avoid breaks in mails. The 
above should be perfect for it, while at the moment our URLs are often 
longer than 78 characters.

One point is not handled: the autocompletion functionality of the 
browsers. Mozilla mentioned above was not the best example, because it 
provides additionally at least the title in the URL bar when using 
autocompletion, but IE does not. I don't have a solution for it, maybe 
it's not that important or at least one compromise must be done.

Joerg


[1] http://www.w3.org/Provider/Style/URI.html
[2] http://www.useit.com/alertbox/990321.html


Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Sunday, Oct 12, 2003, at 12:05 Europe/Rome, Joerg Heinicke wrote:

>> I think it's just easier to use a number.
>
> I don't like the idea of having simply numbers in the URL. Let's have 
> a look at the following URLs:
>
> http://www.sueddeutsche.de/deutschland/artikel/420/19401/
>
> Sueddeutsche online has a unique identifier (19401) in its URL that 
> is simply counted up. Additionally you have a classification 
> (deutschland, artikel) and a number I don't know the purpose of. But this 
> additional stuff is useless (maybe it's useful for faster repository 
> access). For a news page IMO it's important to have a date in the URL. 
> They link to other articles without any date information. You have to 
> click on them to know the date (or you can calculate them back from 
> the ID), that's not good usability.
>
> It's similar for
>
> http://www.spiegel.de/sport/formel1/0,1518,269451,00.html

eheh, gotta love Vignette ;-) [part of those numbers identify the 
cluster machine! *that*'s hacky!]

> From the URL you cannot even guess what date it is from. But they 
> provide at least the date information when linking to another article.
>
> http://www.heise.de/newsticker/data/psz-11.10.03-002/
>
> is better in handling date information, but they don't provide 
> classification.
>
> I like much better the handling at
>
> http://www.xml.com/pub/a/2003/09/17/stax.html
>
> You get the date and what's the article about.

but you are talking about news and articles, which are immutable things.

our learning objects are not immutable. for example, I could change 
their title as I go. Also, adding a date to a learning object is bad: 
should I change the date every time I update it? if not, why would I 
care about the date it was first created?

> Now we are not a news page, so the above is maybe a bit too far from our 
> needs, but it expresses my thoughts about a URL: It should give as 
> much info as possible.

Believe me, I have thought about this for years now. URI schemes vary 
depending on your needs.

Years ago, I thought that URIs had to be semantic. I was wrong. I was 
mistaking URIs for URLs.

Moreover, semantic URLs tend to be more fragile because they are 
associated with the content they contain.

I have no problem with a URL such as the xml.com one. It's clean and 
persistent, but it can't be applied to the same things.

> What we need is the content of a page, not the date of its creation 
> (but maybe this helps too, something like "latest update").

latest update would continuously change the URL.

> The reason for the content in the URL is the fast access to a page 
> (you already accessed some time ago) without navigating from start 
> page.
> You start to type in the URL bar and Mozilla completes it as far as 
> possible. But if you only see numbers you don't know which one it was. 
> (It's the same with tab completion on the Linux 
> command line.) It's especially re-accessing a page that gets 
> complicated by using numbers.

This is a very good point.

> And if you have customized and learning trails you will possibly 
> "never" find a way back to the page you want to access - similar to 
> customized menus in Windows or MS Office, when you are searching for 
> an entry which was there yesterday, but is no longer :-)

eheh :-)

> The possible collision of names should be handled by the repository.
>
> Remain the "everlasting semantically meaningful names". Of course the 
> URL should match the content. But if the content changes, so that it 
> no longer matches, what's the sense of having still this page? Even if 
> you use IDs and change the content later, the user linked from outside 
> to this page gets other content than he wants to have. So there must 
> be available an "outdating mechanism". Let's say there is a page 
> http://cocoon.apache.org/documentation/tutorial/xmlforms.html, which 
> is linked often from outside, because it was the one and only form 
> handling in Cocoon. Now in Cocoon 2.2 or 3.0 XML Forms are removed 
> completely, but we don't want to give the user a simple 404 page. We 
> have to point out that he can "use Woody or Cocoon forms which is by 
> far better than XML Forms" with a link to the correct page 
> http://cocoon.apache.org/documentation/tutorial/cocoonforms.html. You 
> have to do exactly the same for the number URLs. So I think there is 
> no problem with "everlasting semantically meaningful names".

you have a point there, that's for sure.

I'll think about this some more.... do you have any suggestion?

--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Joerg Heinicke <jh...@virbus.de>.
On 11.10.2003 17:19, Stefano Mazzocchi wrote:

>>>> ...How about naming files like
>>>>
>>>>   3948494-some-descriptive-name-for-humans-here.xml
>>>
>>> It's like suggesting to have a BugID "39484-my-file-can't-be-found" 
>>> as the primary key of the bug table in mysql, just because people 
>>> might want to edit bugs by hand inside the database!!
>>>
>>> When you have bug emails, you get the bug ID which is unique and 
>>> semanticless....
>>
>> You usually get the bug ID *and* the title, which lets you decide 
>> whether you're interested in it or not without having to look up the ID.
> 
> yeah, but my point is that we should make it hard for people to edit 
> stuff in the repository directly. Accessing a WebDAV repository directly 
> should be considered a side access for administration purposes, not a 
> direct interface (unless there is a webdav-app in between, but that's 
> another story)
> 
> just like you use the bugzilla frontend to edit the bugs, you don't do 
> it by hand by editing tables because you don't know how many things 
> could be triggered in the database by changing one table.
> 
> a repository is just like a database: editing a database by direct SQL 
> injection is silly. Today it doesn't look silly because repositories are 
> *much* less functional than a database, but when you have a *serious* 
> repository (for example, one that can extract properties from an image 
> and provide an RDF representation for it), editing it *by hand* would be 
> silly.
> 
> In this context, having a file with a numerical name, is just like 
> having a node in a JSR 170 repository with a unique UUID... which is the 
> basis for having the node linkable.
> 
> but from a higher point of view, a LO is actually identified by a number 
> (or the timestamp of creation, anything unique and that can last 
> forever)... if you add a semantically meaningful name, this means that 
> you have to rely on that name to still be semantically meaningful in the 
> future... and different enough to allow to have thousands of documents 
> without incurring into name collisions.
> 
> I think it's just easier to use a number.

I don't like the idea of having simply numbers in the URL. Let's have a 
look at the following URLs:

http://www.sueddeutsche.de/deutschland/artikel/420/19401/

Sueddeutsche online has a unique identifier (19401) in its URL that is 
simply counted up. Additionally you have a classification (deutschland, 
artikel) and a number I don't know the purpose of. But this additional stuff 
is useless (maybe it's useful for faster repository access). For a news 
page IMO it's important to have a date in the URL. They link to other 
articles without any date information. You have to click on them to know 
the date (or you can calculate it back from the ID), that's not good 
usability.

It's similar for

http://www.spiegel.de/sport/formel1/0,1518,269451,00.html

From the URL you cannot even guess what date it is from. But they 
provide at least the date information when linking to another article.

http://www.heise.de/newsticker/data/psz-11.10.03-002/

is better in handling date information, but they don't provide 
classification.

I like much better the handling at

http://www.xml.com/pub/a/2003/09/17/stax.html

You get the date and what's the article about.

Now we are not a news page, so the above is maybe a bit too far from our 
needs, but it expresses my thoughts about a URL: It should give as much 
info as possible.

What we need in the URL is the content of a page, not the date of its 
creation (though maybe that helps too, something like "latest update"). 
The reason for having the content in the URL is fast access to a page 
(one you already accessed some time ago) without navigating from the 
start page. You start to type in the URL bar and Mozilla completes it 
as far as possible. But if you only see numbers you don't know which 
one it was. (It's the same with the tab-completion function on the 
Linux command line.) It's especially re-accessing a page that gets 
complicated by using numbers. And if you have customized and learning 
trails you will possibly "never" find a way back to the page you want 
to access - similar to customized menus in Windows or MS Office, when 
you are searching for an entry which was there yesterday, but is no 
longer :-)

The possible collision of names should be handled by the repository.

That leaves the "everlasting semantically meaningful names". Of course 
the URL should match the content. But if the content changes so that it 
no longer matches, what's the sense of still having this page? Even if 
you use IDs and change the content later, the user who linked to this 
page from outside gets different content than he wants. So there must 
be an "outdating mechanism" available. Let's say there is a page 
http://cocoon.apache.org/documentation/tutorial/xmlforms.html, which is 
linked often from outside, because it was the one and only form handling 
in Cocoon. Now in Cocoon 2.2 or 3.0 XML Forms are removed completely, 
but we don't want to give the user a simple 404 page. We have to point 
out that he can "use Woody or Cocoon Forms, which is by far better than 
XML Forms" with a link to the correct page 
http://cocoon.apache.org/documentation/tutorial/cocoonforms.html. You 
have to do exactly the same for the number URLs. So I think there is no 
problem with "everlasting semantically meaningful names".
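
Such an outdating mechanism could even live in the sitemap. A minimal 
sketch (Cocoon 2.x syntax; the two tutorial paths are just the example 
above, and whether we redirect or serve a pointer page is still open):

  <map:match pattern="documentation/tutorial/xmlforms.html">
    <!-- XML Forms is gone; point people at the Cocoon Forms tutorial -->
    <map:redirect-to uri="documentation/tutorial/cocoonforms.html"/>
  </map:match>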

Thoughts?

Joerg


Re: [RT] Moving towards a new documentation system

Posted by Bertrand Delacretaz <bd...@apache.org>.
Le Samedi, 11 oct 2003, à 17:19 Europe/Zurich, Stefano Mazzocchi a 
écrit :

> ...my point is that we should make it hard for people to edit stuff in 
> the repository directly...

Ok, I see the idea.

There's one additional concern though, which was underlying my 
suggestions: it should be as easy to fix the docs as it is to fix the 
code, meaning that it must be really easy for a committer to change one 
line in a given document/LO, without necessarily needing to start a 
heavy client, wait for ages on "build docs" or something.

This has played a big part in the success of the wiki I think: if one 
sees a mistake there, it is quicker to fix it than to send mail to the 
list asking for a fix.

Doesn't have to do with IDs directly, but should be taken into account 
in the overall design of the system.

>> ...  2003.332.221
>>
>> is much easier to read (and to spell out) than
>>
>>   2003332221
>
> do you seriously think we'll get to 2 billion learning objects? that's 
> wishful thinking ;-)

You were the one to start with this: 
http://cocoon.apache.org/cocoon/LO/3948494, which is still a large 
number, isn't it ;-)

Anyway, the exact form of IDs need not be decided now, as you say:

> ...I have no problems with whatever ID schema we use, even something 
> similar to ISBN or UUID or the three-numbers-dot format of IP 
> addresses, anything is good as long as it doesn't overlap concerns 
> about identification and titling.

ok.

> ...please people, *STOP* thinking of those things as files and of a 
> repository (CVS, WebDAV, whatever) as a thin layer on top of a file 
> system... these are just implementation details.

ok, why not move to repository-based docs if we have the resources 
(=volunteers) to do it.

-Bertrand


Re: [RT] Moving towards a new documentation system

Posted by Stefano Mazzocchi <st...@apache.org>.
On Saturday, Oct 11, 2003, at 16:20 Europe/Rome, Bertrand Delacretaz 
wrote:

> Le Samedi, 11 oct 2003, à 15:33 Europe/Zurich, Stefano Mazzocchi a 
> écrit :
>
>>
>> On Saturday, Oct 11, 2003, at 14:58 Europe/Rome, Bertrand Delacretaz 
>> wrote:
>>> ...How about naming files like
>>>
>>>   3948494-some-descriptive-name-for-humans-here.xml
>>
>> It's like suggesting to have a BugID "39484-my-file-can't-be-found" 
>> as the primary key of the bug table in mysql, just because people 
>> might want to edit bugs by hand inside the database!!
>>
>> When you have bug emails, you get the bug ID which is unique and 
>> semanticless....
>
> You usually get the bug ID *and* the title, which lets you decide 
> whether you're interested in it or not without having to look up the 
> ID.

yeah, but my point is that we should make it hard for people to edit 
stuff in the repository directly. Accessing a WebDAV repository 
directly should be considered a side access for administration 
purposes, not a direct interface (unless there is a webdav-app in 
between, but that's another story)

just like you use the bugzilla frontend to edit the bugs, you don't do 
it by hand by editing tables because you don't know how many things 
could be triggered in the database by changing one table.

a repository is just like a database: editing a database by direct SQL 
injection is silly. Today it doesn't look silly because repositories 
are *much* less functional than a database, but when you have a 
*serious* repository (for example, one that can extract properties from 
an image and provide an RDF representation for it), editing it *by 
hand* would be silly.

In this context, having a file with a numerical name is just like 
having a node in a JSR 170 repository with a unique UUID... which is 
the basis for having the node linkable.

but from a higher point of view, a LO is actually identified by a 
number (or the timestamp of creation, anything unique that can last 
forever)... if you add a semantically meaningful name, this means that 
you have to rely on that name to still be semantically meaningful in 
the future... and different enough to allow for thousands of 
documents without running into name collisions.

I think it's just easier to use a number.

>> ...Think about TCP/IP: instead of placing a human identifier at the 
>> IP level, they used a lookup mechanism. This is exactly the paradigm 
>> that we should follow, IMO.
>
> Agreed, provided the usability of this lookup is good enough to:
> -Easily find out what learning object a CVS (or other "change event") 
> message is about

not harder than finding out what bug report bugid 23494 is about. it 
would be enough to access the URI of the LO as a URL, thus clicking on 
the URI would yield a browsable view of the LO. I can't think of 
anything simpler than this, not even if it had a semantic name.
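
For example, a minimal sitemap sketch (source layout and stylesheet 
name invented) is all it would take to make the LO URI browsable:

  <map:match pattern="LO/*">
    <!-- {1} is the numeric LO id taken from the URI -->
    <map:generate src="repository/{1}.xml"/>
    <map:transform src="stylesheets/lo2html.xsl"/>
    <map:serialize type="html"/>
  </map:match>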

> -Easily select a learning object for editing, review, etc, without 
> needing complex tools

a "search by LOID" should be enough.

> Also, the comparison with TCP/IP brings another idea: instead of using 
> big numbers for IDs, they could be split like IP addresses to make them 
> more readable.
>
> Some people have a hard time reading long chains of numbers; I am one 
> of them, and for me
>
>   2003.332.221
>
> is much easier to read (and to spell out) than
>
>   2003332221

do you seriously think we'll get to 2 billion learning objects? that's 
wishful thinking ;-)

> Where I have a hard time figuring out the number of 3's and 2's 
> (you'll see when you are my age ;-)

I have no problems with whatever ID schema we use, even something 
similar to ISBN or UUID or the three-numbers-dot format of IP 
addresses, anything is good as long as it doesn't overlap concerns 
about identification and titling.

> I'm putting 2003 in front because starting with the year in which the 
> ID was assigned gives some useful context. Mixing concerns, I know, 
> but it also makes for an easy way of splitting LOs into subdirectories 
> for storage, to avoid having millions of files in a single directory.

nah, this is not a problem for future repositories like JSR170, don't 
worry.

please people, *STOP* thinking of those things as files and of a 
repository (CVS, WebDAV, whatever) as a thin layer on top of a file 
system... these are just implementation details.

>> ...Messy. How would something like this behave?
>>
>>  22003-this-is-first-doc.xml
>>  22003-this-is-second-doc.xml
>> ...
>
> that's what I meant by the system having to ensure the uniqueness of 
> IDs. It is certainly problematic.

yep

> I agree that a pure ID for naming pieces of content might be better, 
> provided lookup is super-easy and doesn't get in the way of editing, 
> keeping track of changes etc., and the IDs stay readable and 
> "communicable".

+1 to these goals.

--
Stefano.


Re: [RT] Moving towards a new documentation system

Posted by Bertrand Delacretaz <bd...@apache.org>.
Le Samedi, 11 oct 2003, à 15:33 Europe/Zurich, Stefano Mazzocchi a 
écrit :

>
> On Saturday, Oct 11, 2003, at 14:58 Europe/Rome, Bertrand Delacretaz 
> wrote:
>> ...How about naming files like
>>
>>   3948494-some-descriptive-name-for-humans-here.xml
>
> It's like suggesting to have a BugID "39484-my-file-can't-be-found" as 
> the primary key of the bug table in mysql, just because people might 
> want to edit bugs by hand inside the database!!
>
> When you have bug emails, you get the bug ID which is unique and 
> semanticless....

You usually get the bug ID *and* the title, which lets you decide 
whether you're interested in it or not without having to look up the ID.

> ...Think about TCP/IP: instead of placing a human identifier at the IP 
> level, they used a lookup mechanism. This is exactly the paradigm that 
> we should follow, IMO.

Agreed, provided the usability of this lookup is good enough to:
-Easily find out what learning object a CVS (or other "change event") 
message is about
-Easily select a learning object for editing, review, etc, without 
needing complex tools

Also, the comparison with TCP/IP brings another idea: instead of using 
big numbers for IDs, they could be split like IP addresses to make them 
more readable.

Some people have a hard time reading long chains of numbers; I am one 
of them, and for me

   2003.332.221

is much easier to read (and to spell out) than

   2003332221

Where I have a hard time figuring out the number of 3's and 2's (you'll 
see when you are my age ;-)

I'm putting 2003 in front because starting with the year in which the 
ID was assigned gives some useful context. Mixing concerns, I know, but 
it also makes for an easy way of splitting LOs into subdirectories for 
storage, to avoid having millions of files in a single directory.
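
To make both ideas concrete, a small Java sketch (class and method 
names invented): the four-digit year prefix is kept together when 
rendering the ID, and doubles as the storage bucket:

  public final class LoIdFormat {

      // 2003332221 -> 2003.332.221 (year prefix, then groups of three)
      public static String dotted(String rawId) {
          StringBuilder out = new StringBuilder(rawId.substring(0, 4));
          for (int i = 4; i < rawId.length(); i += 3) {
              out.append('.').append(rawId, i, Math.min(i + 3, rawId.length()));
          }
          return out.toString();
      }

      // 2003332221 -> 2003/2003332221.xml, so no single directory
      // ever has to hold millions of files.
      public static String storagePath(String rawId) {
          return rawId.substring(0, 4) + "/" + rawId + ".xml";
      }
  }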

> ...Messy. How would something like this behave?
>
>  22003-this-is-first-doc.xml
>  22003-this-is-second-doc.xml
> ...

that's what I meant by the system having to ensure the uniqueness of 
IDs. It is certainly problematic.

I agree that a pure ID for naming pieces of content might be better, 
provided lookup is super-easy and doesn't get in the way of editing, 
keeping track of changes etc., and the IDs stay readable and 
"communicable".

-Bertrand

Re: [RT] Moving towards a new documentation system (was: [RT] Updating the website)

Posted by Stefano Mazzocchi <st...@apache.org>.
On Saturday, Oct 11, 2003, at 14:58 Europe/Rome, Bertrand Delacretaz 
wrote:

> Le Samedi, 11 oct 2003, à 14:11 Europe/Zurich, Stefano Mazzocchi a 
> écrit :
>
>> ...I think the documents should have a *numerical* identifier that 
>> equates them with a URI.
>>
>>  http://cocoon.apache.org/cocoon/LO/3948494
>
> I like the "unique ID" idea, OTOH not having descriptive names makes 
> it hard for people to locate the appropriate file to edit in a 
> directory, decode CVS change messages, etc.
>
> How about naming files like
>
>   3948494-some-descriptive-name-for-humans-here.xml

It's like suggesting to have a BugID "39484-my-file-can't-be-found" as 
the primary key of the bug table in mysql, just because people might 
want to edit bugs by hand inside the database!!

When you have bug emails, you get the bug ID which is unique and 
semanticless. If the bug changes course over time (it might even change 
title in the more advanced tracking tools!), the "issue" is still the 
same and can change without breaking anything, because the contract is 
nameless.

Think about TCP/IP: instead of placing a human identifier at the IP 
level, they used a lookup mechanism. This is exactly the paradigm that 
we should follow, IMO.

> Where what follows the first dash is ignored by the publishing system 
> (which will need to check the uniqueness of IDs then)?

Messy. How would something like this behave?

  22003-this-is-first-doc.xml
  22003-this-is-second-doc.xml

>> ..where "LO" stands for "learning object".
>
> hehe ;-)
>
>>> ...-create a very simple publishing system for now (Forrest 
>>> probably?), until the new docs system moves forward
>>
>> a first step could be the introduction of files that contain 
>> navigation structure. They could also be XSLT-processed to generate 
>> .htaccess files with mod_rewrite instructions so that we don't expose 
>> those URIs directly but wrap them with nicer-looking addresses.
>>
>> [it's the static equivalent of a lookup]...
>
> Sounds good.

or we might use the site:-based address translation capabilities of 
Forrest, even if I'm not exactly sure how this works (haven't looked 
at Forrest in a long time).

>> P.S. We need to find a name for this new doc management system - I'm 
>> low on ideas now but maybe CDMS? Cocoon Documentation Management 
>> System?
>>
>> -1 for acronyms.
>
> Actually I don't like them either ;-)
> But we need to name this thing at some point.

True.

I had two names "Hyperbook" and "Papyrus", but both are probably 
infringing some trademark. I'll try to come up with something that is 
"print" or "publishing" or "learning" related.... if you have a 
suggestion, it's a good time to speak up.

NOTE: this is *NOT* something that will replace either Forrest or 
Lenya. In fact, the idea of this system is to show off *all* the 
Cocoon-related technologies we have in one big showcase for our own 
use. So, both Forrest and Lenya should be happy to participate in this 
because it might give them even more exposure and ideas for new 
features, or simply more itches to scratch that would revamp the 
various communities.

--
Stefano.


Re: [RT] Moving towards a new documentation system (was: [RT] Updating the website)

Posted by Bertrand Delacretaz <bd...@apache.org>.
Le Samedi, 11 oct 2003, à 14:11 Europe/Zurich, Stefano Mazzocchi a 
écrit :

> ...I think the documents should have a *numerical* identifier that 
> equates them with a URI.
>
>  http://cocoon.apache.org/cocoon/LO/3948494

I like the "unique ID" idea, OTOH not having descriptive names makes it 
hard for people to locate the appropriate file to edit in a directory, 
decode CVS change messages, etc.

How about naming files like

   3948494-some-descriptive-name-for-humans-here.xml

Where what follows the first dash is ignored by the publishing system 
(which will need to check the uniqueness of IDs then)?
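
Checking that uniqueness is only a few lines of code; a rough Java 
sketch (names invented) of what the publishing system could do while 
scanning the directory:

  import java.util.HashMap;
  import java.util.Map;

  public final class IdUniquenessCheck {

      private final Map<String, String> seen = new HashMap<String, String>();

      // Keys each file on the digits before the first dash and refuses
      // a second file that reuses an already-seen id.
      public void register(String fileName) {
          int dash = fileName.indexOf('-');
          if (dash < 1) {
              throw new IllegalArgumentException("no id prefix: " + fileName);
          }
          String id = fileName.substring(0, dash);
          String other = seen.put(id, fileName);
          if (other != null) {
              throw new IllegalStateException("id " + id + " used by both "
                      + other + " and " + fileName);
          }
      }
  }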

> ..where "LO" stands for "learning object".

hehe ;-)

>> ...-create a very simple publishing system for now (Forrest 
>> probably?), until the new docs system moves forward
>
> a first step could be the introduction of files that contain 
> navigation structure. They could also be XSLT-processed to generate 
> .htaccess files with mod_rewrite instructions so that we don't expose 
> those URIs directly but wrap them with nicer-looking addresses.
>
> [it's the static equivalent of a lookup]...

Sounds good.

>> P.S. We need to find a name for this new doc management system - I'm 
>> low on ideas now but maybe CDMS? Cocoon Documentation Management 
>> System?
>
> -1 for acronyms.

Actually I don't like them either ;-)
But we need to name this thing at some point.

-Bertrand

Re: [RT] Moving towards a new documentation system (was: [RT] Updating the website)

Posted by Stefano Mazzocchi <st...@apache.org>.
On Saturday, Oct 11, 2003, at 11:36 Europe/Rome, Bertrand Delacretaz 
wrote:

> Le Samedi, 11 oct 2003, à 04:21 Europe/Zurich, David Crossley a écrit :
>
>> Tony Collen wrote:
>>> ...We might need to get away from the "developer" vs "user" notion, 
>>> because depending on how much about
>>> Cocoon you already know, you might have to hack out a new generator 
>>> (which would seem to imply
>>> information in the developer section) while you are really a user.
>>
>> +1 ... we have talked about that many times. Almost every
>> user is a developer. Anyway "Trails" are better navigation method.
>
> +1
>
> I'm starting to think (and I think this resonates with what Tony was 
> saying) that the physical structure of the docs should be flat, 
> wiki-style, having all docs "files" (real files or generated) in a 
> single directory, of very few directories like "reference", 
> "documents" and maybe "technotes".

eheh, seems like the memes are starting to percolate. good.

I think the documents should have a *numerical* identifier that equates 
them with a URI.

  http://cocoon.apache.org/cocoon/LO/3948494

where "LO" stands for "learning object".

> We can then build all kinds of navigational structures, trails, 
> multiple tables of contents, beginners/advanced, whatever (again 
> picking up on wiki idea of a flat page structure with many navigation 
> paths), but the path to a given document stays valid forever unless 
> documents are removed.

Exactly. Right on.

With a numerical identifier, we can even keep the page if we change 
its title (this will ease refactoring and reduce the chance of future 
migration... btw, this is the approach used by DOI and DSpace... note 
that this is exactly the same concept I use for linotype, where news 
items are identified by a unique incremental number)
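
To make the contract concrete, a tiny Java sketch (names invented): 
the number is the permanent key, the title is just a mutable property 
that gets looked up and is never part of the key:

  import java.util.HashMap;
  import java.util.Map;
  import java.util.concurrent.atomic.AtomicLong;

  public final class LoRegistry {

      private final AtomicLong nextId = new AtomicLong(0);
      private final Map<Long, String> titles = new HashMap<Long, String>();

      public long create(String title) {
          long id = nextId.incrementAndGet();
          titles.put(id, title);
          return id;
      }

      // Retitling changes nothing about identity: every existing
      // link to /LO/<id> keeps working.
      public void retitle(long id, String newTitle) {
          titles.put(id, newTitle);
      }

      public String titleOf(long id) {
          return titles.get(id);
      }
  }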

> Of course we forfeit compatibility with our existing docs URLs, but I 
> think this is needed anyway to move forward.

Big +1, we have to cut the rope at some point.

> This might also make our remodeling easier:
>
> -move all existing docs to a small number of directories like above, 
> "big bag of docs"

+1

> -rename docs as needed to give them permanent names

yep

> -create a very simple publishing system for now (Forrest probably?), 
> until the new docs system moves forward

a first step could be the introduction of files that contain navigation 
structure. They could also be XSLT-processed to generate .htaccess 
files with mod_rewrite instructions so that we don't expose those URIs 
directly but wrap them with nicer-looking addresses.

[it's the static equivalent of a lookup]
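
For instance, a navigation file entry (the format here is invented) 
like

  <page lo="3948494" href="forms/introduction.html"/>

could be turned by a stylesheet into the matching rewrite line:

  RewriteRule ^forms/introduction\.html$ /cocoon/LO/3948494 [PT]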

> -start building the navigations, trails, tables of contents 
> incrementally

yes, the links can be refactored without breaking the documents.

> -if the docs format changes for the new doc management system, 
> navigation definitions stay valid

true, and in the future we can still use those to seed a more dynamic 
approach.

> I think we need to find a way to get started with this docs remodeling 
> without having to wait too long on our improved doc management system - 
> if an incremental path like above works it might help us get started.
>
> Thoughts?

big +1

> -Bertrand
>
>
> P.S. We need to find a name for this new doc management system - I'm 
> low on ideas now but maybe CDMS? Cocoon Documentation Management 
> System?

-1 for acronyms.

--
Stefano.