Posted to dev@cocoon.apache.org by Stephan Michels <st...@apache.org> on 2002/06/21 14:44:05 UTC
RE: [PROPOSAL] Sources, the next generation RE: Speedup *DirectoryGenerator (fwd)

To bring the discussion back to the cocoon-dev list ;-)


---------- Forwarded message ----------
Date: Fri, 21 Jun 2002 14:34:16 +0200
From: Marc Portier <mp...@outerthought.org>
To: Stephan Michels <st...@vern.chem.tu-berlin.de>
Cc: Steven Noels <st...@outerthought.org>
Subject: RE: [PROPOSAL] Sources,
     the next generation RE: Speedup *DirectoryGenerator

> > >
> > > > I like the idea you propose a lot, but I guess we'll need to build it
> > > > to actually get it there...
> > >
> > > My next step is to create a WebDAV SourceFactory, which will implement
> > > the proposed interfaces. I will check things in over the next few days.
> > >
> > > So I will try to make two examples: WebDAV and direct access to Slide.
> > >
> >
> > great, I still have my 'libre' focus for Forrest, but that makes the
> > testcase for the genericity of the API only better, right?
>
> Jupp, I think it is also important that every source generates the same
> description, to be independent.
>
> > > What do you think about RDF for the description of the sources?
> >
> > don't know rdf enough (but my colleague, Steven Noels, does), just throw
> > this way what you think it should look like, and what you try to achieve
> >
> > > > > > I don't. I want a similar syntax to PROPPATCH
> > > > > >
> > > > > > http://asg.web.cmu.edu/rfc/rfc2518.html#sec-8.1.1
> > > > > >
> > > > >
> > > > > > you mean, in the output you would foresee a <prop> element that can
> > > > > > have any elements from other namespaces as its content model?
> > > > > >
> > > > > > I was in fact referring to how you will programmatically decide what
> > > > > > to do with the propertyValue?
> > > > > > There is one getSourceProperties() in the InformalSource now, and we
> > > > > > didn't discuss the detail of the SourceProperty object... Question
> > > > > > being: how will the traverser know how to insert the propertyValue
> > > > > > into the generated output XML?
> > > > > - as an attribute to the element OR
> > > > > - as a nested element in the <prop> section of this element?
> > > > >
> > > > > maybe this clarifies my latter remark as well:
> > > > >
> > > > > > > > In the latter case I'ld also try to avoid returning the property
> > > > > > > > value as a String, but rather hope for a mechanism to let the
> > > > > > > > SourceImpl inject the SAX Events directly to the output, or at
> > > > > > > > least use an org.w3c.dom.DocumentFragment return type instead.
> > > > > > >
> > > > >
> > > > > > given the fact there is some SAX orientation my current feeling is
> > > > > > there could be need for asking first the Properties that have values
> > > > > > to be inserted in the atts-list, separate from asking the ones that
> > > > > > need to end up as nested elements.
> > >
> > > SourceProperty:
> > >
> > >   String getName()
> > >   String getNamespace()
> > >   String getValueAsString()
> > >   Element getValueAsElement()
> > >
> >
> > That's a bit DOM oriented, isn't it? I would like for some
> >   void toSAX(ContentHandler ...)  // but I guess your DOMStreamer example
> > pretty much does the same, so not really needed
>
> The thing is that it must store the DocumentFragment; in a future design
> SourceProperty can implement XMLizable.
>

I guess it's a matter of taste which one you take up first, for xpath
results the DocumentFragment will probably prove to be easier, so let's
start that way
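
A minimal sketch of what such a SourceProperty could look like, assuming the four methods listed above plus the toSAX() idea. Only those method names come from this thread; the DomSourceProperty class is hypothetical, getValueAsString() here returns plain text content (an assumption), and a plain JAXP identity transform is used in place of Cocoon's DOMStreamer.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

public class SourcePropertySketch {

    /** The four methods from the proposal, plus the suggested toSAX(). */
    interface SourceProperty {
        String getName();
        String getNamespace();
        String getValueAsString();
        Element getValueAsElement();
        void toSAX(ContentHandler handler) throws Exception;
    }

    /** Hypothetical implementation that keeps the value as a DOM element. */
    static class DomSourceProperty implements SourceProperty {
        private final Element element;

        DomSourceProperty(Element element) { this.element = element; }

        public String getName() { return element.getLocalName(); }
        public String getNamespace() { return element.getNamespaceURI(); }
        // Assumption: plain text content, not serialized markup.
        public String getValueAsString() { return element.getTextContent(); }
        public Element getValueAsElement() { return element; }

        public void toSAX(ContentHandler handler) throws Exception {
            // JAXP identity transform DOM -> SAX, used here in place of DOMStreamer.
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(element), new SAXResult(handler));
        }
    }

    /** Builds the keywords property from this thread as a small DOM tree. */
    static SourceProperty keywords() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element keywords = doc.createElementNS("http://mycompany", "keywords");
            for (String word : new String[] {"test", "bla"}) {
                Element keyword = doc.createElementNS("http://mycompany", "keyword");
                keyword.setTextContent(word);
                keywords.appendChild(keyword);
            }
            return new DomSourceProperty(keywords);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Streams the property to a handler that records the element names it sees. */
    static List<String> streamedNames(SourceProperty property) {
        final List<String> names = new ArrayList<>();
        try {
            property.toSAX(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    names.add(qName);
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        SourceProperty keywords = keywords();
        System.out.println(keywords.getName() + " in " + keywords.getNamespace());
        System.out.println("streamed elements: " + streamedNames(keywords));
    }
}
```

Storing the value as DOM and offering toSAX() on top keeps both camps happy: xpath results can be kept as they come, and the generator still only talks SAX.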

> > In every case the return could then be made Node, or DocumentFragment
> > instead, no?
>
> We can do it like the SessionManager, it works with DocumentFragments.
>
> > And it still is kind of leaving the decision question open:
> > Suppose I'm a very generic SourceBrowsingGenerator, I get a BrowsableSource
> > that happens to be InformalSource as well (Informal in fact sounds more
> > like 'not official', maybe another name would not put up this confusion,
> > suggestion: InspectableSource?)
>
> Yes, InspectableSource sounds a bit better.
>
> > I read all its properties, and now what? On what basis do I decide to
> > promote certain propertyValues into attributeValues or nested elements?
> >
> > possible alternatives:
> > [1] this generator received some kind of template (or schema?) of the
> > output it is expected to generate for each source (then it knows where to
> > introduce atts or elms)
> > [2] the property decides by having only one method getValueAsNode, the
> > dom.Node will be either a dom.Attribute, dom.Element or
> > dom.DocumentFragment
> > [3] the source implementation decides, based on what it knows about the
> > actual file, collection, resource,... it's pointing at, in which case we
> > should have some way (as stated earlier) to have 2 separate methods: (1)
> > for asking the attribute-value-string-like properties, on which you call
> > getValueAsString(), and (2) for asking the to-be-nested-elements, on which
> > you call toSAX()
> >
> > can't figure out why yet, but I do have a tendency to think 3 is the most
> > natural and flexible
>
> So if we use RDF, every Property is an element in the description, like
> this
>
> <rdf:RDF>
>  <rdf:Description about="dav://mydir/myfile.xml">
>   <title xmlns="mynamspace">bla title</title>
>   <getcontentlength xmlns="DAV:">43687</getcontentlength>
>   <width xmlns="http://dfhgf">36237</width>
>   <keywords xmlns="mynamspace"><keyword>test</keyword><keyword>test</keyword></keywords>
>  </rdf:Description>
> </rdf:RDF>
>
> So the properties are title, getcontentlength, width and keywords.
>

I have to check the RDF spec to get up to par with your knowledge about it.
I'm copying in my colleague Steven (XML, XSL and related master) to maybe
comment on that.... in every case, bringing this back into the newsgroup
will possibly gather more feedback as well? (If you like, I could help out
on the writing part, combining our current ideas on the thing)

looks like inside the rdf:Description one can put in any elements you'ld
like, that would be the kind of flexibility we need in any case

I guess choosing RDF as output format definitely has its advantages (not
the least the fact that we don't need to invent any own formats :-))


> If you want a description from a collection
>
> <rdf:RDF>
>  <rdf:Description about="dav://mydir">
>   <children xmlns="http://xml.apache.org/cocoon/source">
>    <rdf:Seq>
>     <rdf:li>
>      <rdf:Description about="dav://mydir/myfile.xml">
>       <title xmlns="mynamspace">bla title</title>
>       <getcontentlength xmlns="DAV:">43687</getcontentlength>
>       <width xmlns="http://dfhgf">36237</width>
>       <keywords xmlns="mynamspace"><keyword>test</keyword><keyword>test</keyword></keywords>
>      </rdf:Description>
>     </rdf:li>
>    </rdf:Seq>
>   </children>
>  </rdf:Description>
> </rdf:RDF>
>

a bit more verbose than the pure <collection> <item> idea, but still
workable I guess..
the <children> thing kinda points out that rdf itself has no notion of
nesting resources to be described (I'll try to verify by reading the w3c
recommendation)

this fact kind of worries me about sensible rdf related stuff (tools,
stylesheets,...) or whatever that could actually be reused?

also, if it's the case we might as well turn around the idea and use:
<collection> and <item> still but allow for
<rdf:Description> rather as nested elements inside them than vice versa
(like now with the children)...

I have the feeling this would allow for easier extraction (through xsl) of
a true rdf file that just lists the resources and their descriptions, but
I'ld like to wait on Steven to comment on this ?


> dav://mydir also has a property 'children', which contains an enumeration
> of descriptions. The RDF format is a bit more verbose, but has the advantage
> of being a W3C standard.
>
> To your question, I think every property can have nested elements.
>
yep.

> > >
> > > For example, if you want to store keywords for a document:
> > >
> > > SourceProperty keywords = source.getSourceProperty("http://mycompany",
> > > "keywords");
> > >
> > > System.out.println("keywords="+keywords.getValueAsString());
> > >
> > > DOMStreamer.stream(keywords.getValueAsElement(), mycontenthandler);
> > >
> > > So getValueAsString() returns something like that
> > >
> > > keywords=<keyword>test</keyword><keyword>bla</keyword>
> > >
> > this is not something we would package as an atts-value, is it?
> >
> > > and getValueAsElement()
> > >
> > > <keywords xmlns="http://mycompany"><keyword>test</keyword><keyword>bla</keyword></keywords>
> > >
> >
> > mmmm, guess I'm talking about something slightly different.
> > and since System.out.println isn't the way we talk to our successor in the
> > pipe I guess String-representations of what comes around as a to be
> > inserted set of elements is not really the way to go?
>
> Yes, the string representation is not the correct way.
>

mmm, maybe we're throwing away everything now?
my remark was: string representation ALONE is not correct
but you have me started now on why ever bother having attributes?

maybe Steven can come up with some pragmatic reason?


> > maybe this explains my point better:
> > as far as the source-browsing-generator is concerned it will only be able
> > to generate output that holds two kinds of elements: <collection> that
> > 'can' hold others, or <item> that can not hold other items: (mapping to
> > directories/files or collections/resources or whatever)
> >
> > so un-decorated that would be something like
> >
> > <collection>
> >   <item />
> >   <collection>
> >     <item />
> >   </collection>
> >   <collection />
> > </collection>
> >
> > in fact, this generic component could of course also be aware of the URL
> > it used to create the source, so the src="" attribute could also get into
> > what we call 'undecorated':
> >
> > <collection src="libre:/documentation/xdocs">
> >   <item src="libre:/documentation/xdocs/test.xml" />
> >   <collection src="file://somepath/images">
> >     <item src="file://somepath/images/test.jpg" />
> >   </collection>
> >   <collection src="dav://srv.dmn/somecollectionpath" />
> > </collection>
> >
> > so what about this decorating?
> > - for certain media files special properties like duration, height, width,
> > ...
> >     like to have these as attributes to the <item />
> > - for specific applications the last modification date, or other file
> > system attributes
> >     probably also as attribute to <item />
> > - for DAV like stuff the <props>
> >     as you mentioned: like a <props> nested elm
> > - for XML files: the result of some configurable xpath expression
> >     since the xpath-expr could return even a nodeset, there is no way of
> > fitting this into an attributeValue, right...
> >
> > based on how these sources were pointed at (url-protocol part) and what
> > these sources actually have as a mime type these properties would be
> > available... how these properties should be inserted in the output is I
> > guess not something the generic generator can decide:
> >
> > I'ld like to get a decorated output version that looks like:
> >
> > <collection src="libre:/documentation/xdocs" pfx:name="Documentation Root"
> >             xmlns:pfx="...">
> >   <item src="libre:/documentation/xdocs/test.xml"
> >         pfx:title="The XPATH Result on /document/header/title/text() into this document"
> >         xmlns:pfx="...">
> >     <props xmlns:p="different-for-nested-stuff">
> >       <p:whatever some="this time there was an xpath that produced a nodeset"
> >                   xmlns:doc="stolen-from-the-doc-it-read">
> >         <doc:node >
> >           <!-- just about any nested stuff -->
> >         </doc:node>
> >         <doc:node />
> >       </p:whatever>
> >       <p:other />
> >     </props>
> >   </item>
> >   <collection src="file://somepath/images">
> >     <item src="file://somepath/images/test.jpg" pfx:height="120"
> > pfx:width="137" xmlns:pfx="..."/>
> >   </collection>
> >   <collection src="dav://srv.dmn/somecollectionpath" >
> >     <props .../>
> >   </collection>
> > </collection>
>
> but what if your xpath expression also returns nested elements? Isn't it
> better to have all properties as elements?
>

good question...
inside libre we know now what the property is going to be used for
- so if it's an attribute-value (== a String) that is needed, we simply go
for the getValue of the first node in the possible NodeSet that was returned
(giving sometimes odd results, I admit, but that forces people to add some
/text() on those xpath expressions whose returned results will be used as
att-values....
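
To make the "first node" behaviour concrete, here is a small sketch using the standard javax.xml.xpath API (not libre's own code, which is not shown in this thread). The sample document and the class name are made up for illustration. Asking for a String yields the string-value of the first matching node, while asking for a NODESET shows how many nodes were actually there.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathAttributeValue {

    // Made-up document: two titles, the first one with nested markup.
    static final String DOC =
        "<document><header>"
        + "<title>First <em>title</em></title>"
        + "<title>Second</title>"
        + "</header></document>";

    /** Evaluates the expression as a String: string-value of the first node. */
    static String asString(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(DOC)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Evaluates the expression as a node-set and counts the matches. */
    static int nodeCount(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(DOC)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Two titles match, but only the first contributes to the String result.
        System.out.println(nodeCount("/document/header/title"));
        System.out.println("[" + asString("/document/header/title") + "]");
        // Adding /text() picks the first text node only, nested markup is skipped.
        System.out.println("[" + asString("/document/header/title/text()") + "]");
    }
}
```

The second title silently disappears in both String cases, which is exactly the kind of odd result meant above; the nested-element approach would keep it.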

so on the one hand: we can always kind of urge people into thinking about
correctly writing their xpath (like explained above)
on the other hand: if there is no real argument pro attributes for these
kind of property stuff, then yes, we might as well drop it and go for the
always going to work nested element approach

Steven, any arguments?


> >
> > looking at it from the generator again this output would mean that it got
> > these events:
> > >>1<<
> > - generator asked to start at src="libre:/documentation/xdocs"
> > - it's browsable so we continue, by first downcasting
> > - isCollection returns true (we should put out a <collection> elm, but
> > need the atts first)
> > - so we ask its AttrValProperties, it holds one, name==name,
> > value=="Documentation Root", ns==...
> > - now we should allow for a <props> section on the collection as well,
> > asking for the ElmProps we get a null return, so we don't introduce the
> > <props>
> > - next we browse down by checking (hasChildren ?) or just get them
> > getChildren(), with each of the returned Sources we do recursively the
> > same:
>
> Yes, works like the DirectoryGenerator and my
> RepositorySourceDescriptionGenerator.
>
> > etc etc
> >
> >
> > So what do you think?
> > (It is getting tedious on the namespace declarations he)
>
>
> I think there is also a problem for your attr properties with namespaces.
> Properties should have a namespace to know where they come from.
> If you want to use attributes, you must use prefixes, like this
>
> <item xmlns:myprefix="mynamespace" myprefix:myattribute="bla"/>

yep, that was my remark, it _is_ getting tedious and messy
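
For reference, the undecorated part of the traversal quoted above (the steps under >>1<<) could be sketched roughly like this. The BrowsableSource interface here is a hypothetical stand-in with only the methods named in this thread (isCollection, getChildren) plus an assumed getSystemId(); MemorySource exists only for the example, and the output is built as a String instead of SAX events to keep it short. Properties and namespaces are left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;

public class SourceTreeSketch {

    /** Hypothetical stand-in for the proposed BrowsableSource. */
    interface BrowsableSource {
        String getSystemId();
        boolean isCollection();
        List<BrowsableSource> getChildren();
    }

    /** In-memory source, used only for this example. */
    static class MemorySource implements BrowsableSource {
        private final String systemId;
        private final boolean collection;
        private final List<BrowsableSource> children = new ArrayList<>();

        MemorySource(String systemId, boolean collection) {
            this.systemId = systemId;
            this.collection = collection;
        }

        MemorySource add(BrowsableSource child) {
            children.add(child);
            return this;
        }

        public String getSystemId() { return systemId; }
        public boolean isCollection() { return collection; }
        public List<BrowsableSource> getChildren() { return children; }
    }

    /** Emits <collection> or <item> for the source, then recurses into children. */
    static void toXml(BrowsableSource source, StringBuilder out) {
        String name = source.isCollection() ? "collection" : "item";
        // Note: no attribute escaping here, the ids in the example do not need it.
        out.append('<').append(name)
           .append(" src=\"").append(source.getSystemId()).append('"');
        if (!source.isCollection() || source.getChildren().isEmpty()) {
            out.append("/>");
            return;
        }
        out.append('>');
        for (BrowsableSource child : source.getChildren()) {
            toXml(child, out);
        }
        out.append("</").append(name).append('>');
    }

    /** Rebuilds the undecorated example tree from earlier in this thread. */
    static String example() {
        MemorySource root = new MemorySource("libre:/documentation/xdocs", true)
            .add(new MemorySource("libre:/documentation/xdocs/test.xml", false))
            .add(new MemorySource("file://somepath/images", true)
                .add(new MemorySource("file://somepath/images/test.jpg", false)))
            .add(new MemorySource("dav://srv.dmn/somecollectionpath", true));
        StringBuilder out = new StringBuilder();
        toXml(root, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```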

>
> So where does the prefix come from? If you don't declare the prefix, it
> must be generated automatically.
> If you use elements, you don't have to use prefixes
>

mmm, don't think I can agree, but again Steven can clarify
to my understanding:
- not using prefixes on elements puts them in the _default_ namespace
which means that there could be a xmlns="..." somewhere up in the generated
output that could declare one namespace for all nested elements we're going
to encounter deeper down, avoiding the tedious xmlns on a lot of elements
- not using prefixes on attributes puts them in _no_ namespace
which I would guess is only equal to the default namespace if that one is
not bound to any prefix (so there is no xmlns="" anywhere)

so I guess the remark 'you don't have to use prefixes' applies equally for
elements and attributes
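
This is easy to check with a namespace-aware DOM parser. The snippet below is only a sketch: the class name, the 'my' prefix and the attribute names are made up. As far as I know the default namespace is never applied to unprefixed attributes, so a property attribute needs a prefix to carry a namespace, while a property element does not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeNamespaceCheck {

    // Default namespace and the 'my' prefix are bound to the same URI.
    static final String XML =
        "<item xmlns='http://mycompany' xmlns:my='http://mycompany'"
        + " plain='a' my:qualified='b'><myproperty>bla</myproperty></item>";

    /** Parses the sample with namespace awareness switched on. */
    static Element parseItem() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Element item = parseItem();
        // Unprefixed element: picks up the default namespace.
        System.out.println("item:         " + item.getNamespaceURI());
        // Unprefixed attribute: no namespace, despite the xmlns on the same tag.
        System.out.println("plain:        " + item.getAttributeNode("plain").getNamespaceURI());
        // Prefixed attribute: in the namespace bound to the prefix.
        System.out.println("my:qualified: " + item.getAttributeNode("my:qualified").getNamespaceURI());
    }
}
```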

> <item>
>  <myproperty xmlns="mynamespace">bla</myproperty>
> </item>
>

yep, but I guess by just adding prefixing to the tree related elm-names
<tree:item myproperty="bla" xmlns="mynamespace" />
relieves us from that issue, no?

however still, I see gained flexibility and simplicity in requiring props to
be all nested elms... If we don't find the counter-example (I don't
currently) for requiring prop values to be sometimes inserted as att-values,
then we should just dump them.

(should be strong enough case, cause a simple xslt should always be able to
place the elm-context text into an att-value anyway)
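
A rough illustration of that last point, run through the standard JAXP transformer. The stylesheet and the class name are made up for this example; it copies each child element of <item> into an attribute named after the element's local name, which means the property namespace is dropped along the way (keeping it would need a generated prefix again).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PropsToAttributes {

    // Turns every child element of <item> into an attribute of the same local name.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='item'>"
        + "<item>"
        + "<xsl:for-each select='*'>"
        + "<xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
        + "</xsl:for-each>"
        + "</item>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML and returns the serialized result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The nested-element form from above goes in, the attribute form comes out.
        System.out.println(transform(
            "<item><myproperty xmlns='mynamespace'>bla</myproperty></item>"));
    }
}
```

Going the other way (attribute to element) is just as simple, so picking nested elements as the only form in the generator output does not lock anybody in.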

> Stephan Michels.
>

-marc=

