Posted to dev@cocoon.apache.org by Per Kreipke <pe...@onclave.com> on 2002/06/14 17:56:55 UTC

Speedup *DirectoryGenerator (e.g. ImageDirectoryGenerator et al)...

After looking at ImageDirectoryGenerator, which runs about 3x slower than
DirectoryGenerator on image files, I think the following changes will speed
it up (and, similarly, MP3DirectoryGenerator).

For fun, compare DirectoryGenerator against ImageDirectoryGenerator on the
same directory (with a moderate number of images):

<map:match pattern="files/*">
	<map:generate type="DirectoryGenerator" >
		<map:parameter name="depth" value="3" />
	</map:generate>
	<map:serialize type="xml" />
</map:match>

<map:match pattern="files2/*">
	<map:generate type="ImageDirectoryGenerator" >
		<map:parameter name="depth" value="3" />
	</map:generate>
	<map:serialize type="xml" />
</map:match>

Wouldn't it be nice if the second time you requested the image info, it was
as fast as the DirectoryGenerator?



Suggestions:

- having getSize() call getFileType() and then getJpegSize() or
getGifSize() introduces nice modularity but sacrifices speed. getFileType()
and get*Size() each open the file anew (that's two opens in total) with:

  new BufferedInputStream(new FileInputStream(file));

Instead, instantiate the BufferedInputStream in getSize() and pass it to the
other functions. Or move the work from getFileType() and get*Size() back
into getSize().


- more importantly, caching the information from getSize() plus
'lastModified' in an internal hash table with the file's URL as key would
remove the need to do the expensive work each time. If the file hasn't
changed, then its size (or MP3 info) hasn't either.
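
The caching suggestion above could be sketched roughly as follows. This is
only an illustration in plain, modern Java, not Cocoon code: the class name
ImageInfoCache, the Entry holder and the reader function (standing in for the
expensive getSize() work) are all assumptions, and the absolute file path is
used as the key in place of the file's URL.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-file cache: re-read image info only when lastModified changes. */
class ImageInfoCache {
    /** Cached value: the timestamp it was computed for, plus the dimensions. */
    static final class Entry {
        final long lastModified;
        final int[] size; // {width, height}

        Entry(long lastModified, int[] size) {
            this.lastModified = lastModified;
            this.size = size;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<File, int[]> reader; // stands in for the expensive getSize() work

    ImageInfoCache(Function<File, int[]> reader) {
        this.reader = reader;
    }

    int[] getSize(File file) {
        String key = file.getAbsolutePath(); // assumption: path instead of URL as key
        long stamp = file.lastModified();
        Entry cached = entries.get(key);
        if (cached != null && cached.lastModified == stamp) {
            return cached.size; // unchanged file: skip opening and parsing it
        }
        int[] size = reader.apply(file);
        entries.put(key, new Entry(stamp, size));
        return size;
    }
}
```

Whether one such table can be shared between requests depends on how Cocoon
manages generator instances, which is exactly the open question raised below.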


Unfortunately, I don't know Cocoon well enough to understand if Generators
are global instances (so that all requests will share the hash table) or
whether they exist per pipeline, per sitemap, etc. My point: I'm not sure how
to implement the cached info correctly.



I would love to do this work and send in the patch myself, and I'll attempt
to do so when I have the latest C2 source installed here. Unless someone
desperate does it first :-)

Per


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Re: Speedup *DirectoryGenerator (e.g. ImageDirectoryGenerator et al)...

Posted by Steven Noels <st...@outerthought.org>.
Per Kreipke wrote:

 > After looking at ImageDirectoryGenerator, which runs about 3x slower
 > than DirectoryGenerator on image files, I think the following changes
 > will speed it up (and, similarly, MP3DirectoryGenerator).

Per,

we have just committed in Forrest CVS a pre-pre-version of a general
treewalking facility 'Libre' that can be packaged as a Cocoon generator
amongst others. It seems to me that you're trying to do more or less
what we have been doing (and which is 'ready' for refactoring anyhow ;-)

You can check this out at:

http://www.krysalis.org/forrest/libre-intro.html
http://cvs.apache.org/viewcvs.cgi/xml-forrest/src/scratchpad/src/java/

HTH,

</Steven>




RE: Speedup *DirectoryGenerator (e.g. ImageDirectoryGenerator et al)...

Posted by Vadim Gritsenko <va...@verizon.net>.
> From: Per Kreipke [mailto:per@onclave.com]
> 
> > *DirectoryGenerators should be refactored so we have the only
> > DirectoryGenerator with pluggable 'processors' of different file types.
> > This way, you will be able to generate listings of different files of
> > type in one directory.
> 
> That's a great idea but more grandiose. It certainly would be neat if you
> could (use POI to) extract metadata from MS Office files, etc. I imagine
> there are actually code libraries out there for all kinds of 'file
> introspection' or generating metadata from files.
> 
> > > - having getSize() call getFileType() and then getJpegSize() or
> > > getGifSize(), introduces nice modularity but sacrifices speed. Each
> > > function in that sequence calls (that's two calls total):
> > >
> > >   new BufferedInputStream(new FileInputStream(file));
> > >
> > > Instead, instantiate the BufferedInputStream in getSize() and pass it
> > > to the other functions. Or move the work from getFileType() and
> > > get*Size() back in to getSize().
> >
> > Instantiate one instance of RandomAccessFile and pass it to 'processor'.
> 
> Ok. This is re: the pluggable framework you mentioned above or does this
> apply to the current code too?

MP3 needs it: the TAG is in the tail... You don't want to read the *whole*
file, right? :)
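
To illustrate why a RandomAccessFile helps here, the following is a minimal
sketch, not code from Cocoon's MP3DirectoryGenerator. It assumes an ID3v1 tag,
which occupies the last 128 bytes of the file and starts with the marker
"TAG" (ID3v2 tags sit at the start of the file and are not handled). The
class name Id3v1Reader and its method names are made up for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Sketch: read an ID3v1 tag by seeking to the last 128 bytes of the file. */
class Id3v1Reader {
    static final int TAG_LENGTH = 128;

    /** Returns {title, artist, album}, or null when no ID3v1 tag is present. */
    static String[] read(RandomAccessFile raf) throws IOException {
        long length = raf.length();
        if (length < TAG_LENGTH) {
            return null; // too short to carry a tag
        }
        byte[] tag = new byte[TAG_LENGTH];
        raf.seek(length - TAG_LENGTH); // jump to the tail, the audio data is never read
        raf.readFully(tag);
        if (tag[0] != 'T' || tag[1] != 'A' || tag[2] != 'G') {
            return null;
        }
        // Layout after the marker: title (30 bytes), artist (30), album (30), ...
        return new String[] { field(tag, 3), field(tag, 33), field(tag, 63) };
    }

    /** Decodes one 30-byte field, dropping NUL or space padding. */
    private static String field(byte[] tag, int offset) {
        String raw = new String(tag, offset, 30, StandardCharsets.ISO_8859_1);
        int nul = raw.indexOf('\0');
        return (nul >= 0 ? raw.substring(0, nul) : raw).trim();
    }
}
```

The caller opens the file once and hands the same RandomAccessFile to whichever
'processor' needs it, so no second stream is created.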


> > > - more importantly, caching the information from getSize() plus
> > > 'lastModified' in an internal hash table with the file's URL as key
> > > would remove the need to do the expensive work each time. If the file
> > > hasn't changed, then its size (or MP3 info) hasn't either.
> >
> > Cache key should be directory name plus settings, such as depth and
> > masks.
> >
> > Cache validity should be TimestampCacheValidity (FileTimeStampValidity
> > in Cocoon 2.1) of all files selected by given depth/masks in this
> > directory.
> 
> I think you missed my point, those suggestions apply to caching the entire
> result, no?

Yes, that's to cache the whole response.


> I'm not trying to cache the entire result for reasons listed in the thread:
> "Cachability (was RE: XInclude Transformer vs CInlude Transformer".
> I'm just trying to cache each file's metadata individually.
> 
> E.g.:
> 
> key (lastModified, width, height)
> 
> d:\files\per\foo.jpeg: (123456789, 100, 50)
> d:\files\per\bar.gif: (987654321, 200, 100)
> 
> Since the lastModified date is already computed by DirectoryGenerator, it
> knows whether or not to dive into the file to re-get the metadata. This is a
> precursor to your plug in architecture too: there's no reason to re-get the
> info if the file hasn't been modified.

I see. Then, use Store. See XSLProcessor for an example of a component which
uses Store for its purposes.


> > > Unfortunately, I don't know Cocoon well enough to understand if
> > > Generators are global instances (so that all requests will share the
> > > hash table) or whether it exists per pipeline, per sitemap, etc. My
> > > point: I'm not sure how to implement the cached info correctly.
> >
> > Implement generateKey and generateValidity methods.
> 
> Right, but that's only for caching the entire results.

Yup.

Vadim


> Per




RE: [PROPOSAL] Sources, the next generation RE: Speedup *DirectoryGenerator

Posted by Marc Portier <mp...@outerthought.org>.
Stephan,

<snip part="introduction"/>

>
> One thing I want to prevent is getting yet another 'DirectoryGenerator' or,
> in your case, a TreNavigationGenerator.
>
> So my idea was to have one generator and several implementations of
> SourceFactories. That also makes it easier to get access to
> different components.
>

+1
same idea here, I just guessed we do need at least _one_ more to traverse
the 'BrowsableSource' that in its getChildSources() calls upon the
SourceFactories to produce more to traverse? (and as said, hopefully with an
easy transformer incarnation)

>
> > > -----Original Message-----
> > > From: Stephan Michels [mailto:stephan@vern.chem.tu-berlin.de]
> > >
> > > On Fri, 14 Jun 2002, Per Kreipke wrote:
> > >
> > > > > *DirectoryGenerators should be refactored so we have the only
> > > > > DirectoryGenerator with pluggable 'processors' of different
> > > > > file types.
> > > > > This way, you will be able to generate listings of different
> > > > > files of type in one directory.
> > > >
> > > > That's a great idea but more grandiose. It certainly would be
> > > > neat if you could (use POI to) extract metadata from MS Office
> > > > files, etc. I imagine there are actually code libraries out there
> > > > for all kinds of 'file introspection' or generating metadata from
> > > > files.
> > >
> > >
> > > At the moment I'm to evaluating way to get meta informations from
> > > repositories, like slide or over WebDAV. Also I want to be grant
> >
> > this thread adds up the number of sensible examples nicely
> > next to our focus on (1) filesystem with special xmlbased config file
> > in it (libre) there was just one other example:  (2) a swing-like
> > treeModel that could be retrieved from a central place (and thus would
> > be available to pure swing clients as well)
> >
> > > permissions and locks to sources. My initial stage was creating
> > > 'SourceDescriptor', which is now in current CVS. But more I think about
> > > it as more I came to the conclusion that I should follow the SoC.
> > > The next idea is create some interfaces for 'Source' similar to
> > > 'WriteableSource'
> > >
> >
> > Haven't thought about it like this, but sure sounds great
> > One of the troubles I was neglecting to solve up to now was how to think
> > about hybrid trees of sources that would have child sources that would
> > come from a different implementation....
> >
> > If I understand this correctly (young but growing knowledge on cocoon and
> > avalon internals) there would be some kind of knowledge inside Avalon to
> > translate any source-URL into delivering an actual Source implementation?
>
> Yes, that is the way how it works.
>

cool, any particular part of the src distribution I can have a look into to
grasp this?
(or were you referring to 'this is how it _will_ work'?)

> > So there could then be a file:///..../dir, a dav://srv/path/collection ,
> > and I would need to think about some
> > libre:///filelike-path/dir?cfg=libre.xml
> > (saying: look at this dir in the libre fashion by reading sort, filter and
> > attributing info from the libre.xml file you'll find there) don't see it
> > working for the images, mp3 case yet (list of file-extension to mimetype
> > to SourceImpl mappings??)
>
> There should be some kind of SourcePropertyHelper, to get standard
> information from different sources. Perhaps something like this:
>
> <source-properties>
>  <source-property extension="jpg"
> class="org.apache.cocoon.components.source.helper.ImageProperties"/>
>  <source-property extension="mp3"
> class="org.apache.cocoon.components.source.helper.MP3Properties"/>
> </source-properties>
>

yep
I see these as the helper classes to read the available properties
the ones that actually get into the output format should be more end-user
configurable I guess.
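
The extension-to-helper mapping from the quoted <source-properties> fragment
could look something like this in code. It is a sketch only: the interface
SourcePropertyHelper, the class SourcePropertyRegistry and their methods are
hypothetical names chosen for the example, not existing Cocoon classes, and
properties are simplified to a Map of strings.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper contract: reads the available properties of one kind of file. */
interface SourcePropertyHelper {
    Map<String, String> getProperties(String location);
}

/** Hypothetical registry mirroring the extension/class pairs of the configuration. */
class SourcePropertyRegistry {
    private final Map<String, SourcePropertyHelper> helpers = new HashMap<>();

    void register(String extension, SourcePropertyHelper helper) {
        helpers.put(extension.toLowerCase(Locale.ROOT), helper);
    }

    /** Returns the helper for the location's extension, or null if none is registered. */
    SourcePropertyHelper lookup(String location) {
        int dot = location.lastIndexOf('.');
        int slash = Math.max(location.lastIndexOf('/'), location.lastIndexOf('\\'));
        if (dot <= slash || dot == location.length() - 1) {
            return null; // no extension in the last path segment
        }
        return helpers.get(location.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

Which of the returned properties end up in the generated output would still
be a separate, end-user configurable decision, as noted above.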

> > In which case each of them could say that one of their kids would in fact
> > exist outside their own implementation focus? (libre.xml could introduce
> > that with the <entry location="dav://..."> )
> >
> > (Currently I only thought about being able to switch the implementation at
> > the root level,in which case all descendants keep on living in the same
> > implementation space)
> >
> > In every case it would be nice if sourceURL of kids could be returned in a
> > relative manner?
> >
> > > So I have the following proposal:
> > >
> > > BrowsableSource:
> > >
> > >   /** if the source a directory */
> > >   boolean isCollection();
> > >
> > >   /** Return the children of the collection */
> > >   Enumeration getChildSources();
> > >
> > made a type-aware collection instead of the Enumeration
> > (which is one of my (bad?) habbits, it allows me to add some
> >  browseEnumeration method that is taking an enumerationVisitor
> >  interface implementation class with some acceptItem(theItem)
> >  method... this kind of relieves the clients of some of the
> >  casting and the boring hasNext() while loop, at the cost of
> >  writing an anonymous inner class.)
>
> Yes, of cause, ther can be some kind of SourceList, SourcePropertyList.
>
> > also still in doubth on adding a hasChildSources() next to
> > the isCollection(), the subtle difference being:
> > - isCollection(): can you have kids?
> > - hasChildSources(): do you have any currently?
> > would be a way to get rid of empty <collection /> elms in the
> > generated output.
>
> Yes, a method hasChildSources is, I think, better.
>

just to be clear: my suggestion was adding it, not replacing it
your comment could be read either way
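
To make the "adding, not replacing" point concrete, here is a small sketch
with both predicates side by side. It is an assumption-laden illustration: the
proposal uses Enumeration where this uses a List, getName() is added only so
there is something to print, MemorySource and TreeLister are invented for the
example, and the output is built as a plain string without XML escaping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of the proposed interface with both predicates. */
interface BrowsableSource {
    String getName();                        // assumption: added for the example
    boolean isCollection();                  // can this source have children at all?
    boolean hasChildSources();               // does it have any right now?
    List<BrowsableSource> getChildSources(); // the proposal uses Enumeration here
}

/** In-memory stand-in used only to exercise the traversal. */
class MemorySource implements BrowsableSource {
    private final String name;
    private final boolean collection;
    private final List<BrowsableSource> children = new ArrayList<>();

    MemorySource(String name, boolean collection) {
        this.name = name;
        this.collection = collection;
    }

    MemorySource add(BrowsableSource child) {
        children.add(child);
        return this;
    }

    public String getName() { return name; }
    public boolean isCollection() { return collection; }
    public boolean hasChildSources() { return !children.isEmpty(); }
    public List<BrowsableSource> getChildSources() {
        return Collections.unmodifiableList(children);
    }
}

class TreeLister {
    /** Appends XML-like text, omitting collections that currently have no children. */
    static void list(BrowsableSource source, StringBuilder out) {
        if (!source.isCollection()) {
            out.append("<item name=\"").append(source.getName()).append("\"/>");
            return;
        }
        if (!source.hasChildSources()) {
            return; // this is what removes the empty <collection/> elements
        }
        out.append("<collection name=\"").append(source.getName()).append("\">");
        for (BrowsableSource child : source.getChildSources()) {
            list(child, out);
        }
        out.append("</collection>");
    }
}
```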

> > > InformalSource:
> > >
> > >   /** To get a meta information from a source */
> > >   SourceProperty getSourceProperty(String namespace, String name);
> > >
> > >   /** To set a meta information */
> > >   void setSourceProperty(SourceProperty property);
> > >
> > >   /** Get alll informations */
> > >   Enumeration getSourceProperties()
> > >
> > you mean getSourcePropertyNames() with the last one?
> > or do you expect a returned set of namespace-name-value
> > holding objects?
>
> The last one ;-)
>

yep

> > (is SourceProperty in alreay existing class maybe?)
> >
> > > RestrictableSource:
> > >
> > >   /** Get a permission for a owner */
> > >   SourcePermission getSourcePermission(String owner);
> > >
> > >   /** Get a permission for the local owner */
> > >   SourcePermission getSourcePermission();
> > >
> > >   void setSourcePermission(SourcePermission permission);
> > >
> > >   Enumeration getSourcePermissions();
> > >
> > > LockableSource:
> > >
> > >   /** Get a lock for a owner */
> > > >   SourceLock getSourceLock(String owner);
> > >
> > >   /** Get a lock for the local owner */
> > >   SourceLock getSourceLock();
> > >
> > >   void setSourceLock(SourceLock lock);
> > >
> > >   Enumeration getSourceLocks();
> > >
> >
> > the great thing about SoC is that I don't need to know what this is even
> > about :-)
> > (mapping dav stuff I persume)
> >
> > > The interface InformalSource could be used to get properties
> > > from a source, such like image width and height
> > >
> > > file://test.gif
> > > SourceProperty: namespace http://xml.apache.org/cocoon/source/image
> > >                 name width
> > >                 value 480
> > >
> > > The values should also contain XML fragment like
> > > SourceProperty: namespace http://www.test.org/mymetas
> > >                 name title
> > >                 value bla from <a>dgfdh</a>
> >
> > mmm, didn't do this either, in this case you're not thinking about setting
> > the property name-value as an attribute on the <item
> > ns-prefix:name="value"/>  elm? but rather as the content-model for the
> > generated output elm:
> > <item>
> >  <ns-prefix:name>
> > 	<!-- whatever -->
> >  </ns-prefix:name>
> > </item>
> >
> > how will you make the distinction between AttributeProperties and
> > NestedElementProperties?
>
> I don't. I want a similar syntax to PROPPATCH
>
> http://asg.web.cmu.edu/rfc/rfc2518.html#sec-8.1.1
>

you mean, in the output you would foresee a <prop> element that can have any
elements from other namespaces as its content model?

I was in fact referring to how you will programmatically decide what to do
with the propertyValue?
There is one getSourceProperties() in the InformalSource now, and we didn't
discuss the detail of the SourceProperty object... Question being: how will
the traverser know how to insert the propertyValue into the generated output
XML?
- as an attribute to the element OR
- as a nested element in the <prop> section of this element?

maybe this clarifies my latter remark as well:

> > In the latter case I'd also try to avoid returning the property value as
> > a String, but rather hope for a mechanism to let the SourceImpl inject the
> > SAX Events directly to the output, or at least use an
> > org.w3c.dom.DocumentFragment return type instead.
> >

given that there is some SAX orientation, my current feeling is that there
could be a need to first ask for the Properties whose values are to be
inserted in the atts-list, separately from asking for the ones that need to
end up as nested elements.
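
The reason for asking separately shows up as soon as this is written against
the SAX API: all attributes have to be handed over in the startElement() call,
before any nested element can be emitted. A rough sketch follows; the
SourceProperty holder with its 'nested' flag and the ItemEmitter class are
hypothetical, and namespaces are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical property holder; 'nested' decides where the value is emitted. */
class SourceProperty {
    final String name;
    final String value;
    final boolean nested;

    SourceProperty(String name, String value, boolean nested) {
        this.name = name;
        this.value = value;
        this.nested = nested;
    }
}

class ItemEmitter {
    /** Emits one <item>: attribute properties first, nested ones inside <prop>. */
    static void emit(ContentHandler handler, List<SourceProperty> properties)
            throws SAXException {
        AttributesImpl attributes = new AttributesImpl();
        List<SourceProperty> nested = new ArrayList<>();
        for (SourceProperty p : properties) {
            if (p.nested) {
                nested.add(p); // kept back until <item> has been opened
            } else {
                attributes.addAttribute("", p.name, p.name, "CDATA", p.value);
            }
        }
        // SAX needs the complete attribute list at this point.
        handler.startElement("", "item", "item", attributes);
        if (!nested.isEmpty()) {
            AttributesImpl none = new AttributesImpl();
            handler.startElement("", "prop", "prop", none);
            for (SourceProperty p : nested) {
                handler.startElement("", p.name, p.name, none);
                handler.characters(p.value.toCharArray(), 0, p.value.length());
                handler.endElement("", p.name, p.name);
            }
            handler.endElement("", "prop", "prop");
        }
        handler.endElement("", "item", "item");
    }
}
```

A nested value that is itself an XML fragment would need the source to emit
its own SAX events instead of a characters() call; the sketch only covers
plain text values.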

> > >
> > > The next thing is that cocoon is able to browse through repositories
> > > At the moment DirectoryGenerator is limited to the file:// protocol, I
> > > think.
> > >
> > > I would also come to the point cachable. Source can IHMO implement
> > > recycable, to there is no need to retrieve all meta informations
> > > for every request.
> > >
> >
> > here, here, have been struggling on this one...
> > limited avalon understanding prevents me from seeing full solution though?
> >
> > > I had also took a look Ugo Cei's implementation of CocoBlog. He used RSS
> > > to create a description for every entry in xindice. I doesn't understand
> > > the difference between RSS and RDF. So I used for my first stage RDF.
> > > So my proposal is to write a 'SourceDescriptionGenerator'. It should
> > > work like DirectoyGenerator, catch all informations from a source, and
> > > generate a 'Resource Description'.
> > >
> > > One think, I doen't know to implement is associate 'SourceCredentials'
> > > to the source, such like username and password.
> > >
> > > Perhaps a ExtendedSourceFactory a possibility:
> > > ExtendedSourceFactory:
> > >
> > >    Source getSource(SourceCredential credential, String location, Map
> > >                     parameters)
> > >
> > > So, what do you think, is this the right way?
> > >
> >
> > sorry, can't help you here yet...
> > hope I did in other parts.
> >
> > one more remark, while refactoring the DirectoryGenerator to
> > some TreeOfSourcesGenerator: design should not be thight to
> > a generatorImpl alone:
> > the use case for the transformer version is (avoiding the aggregation)
> > in a lot of cases the output of this thing will be used as some
> > right hand navigation of some webpage, but it can end up generating
> > small sub-trees just about anywhere I guess. In some cases people will
> > think about this more as a concern of the content-editor that would like
> > to write: <navigation-tree src="...." depth=".." /> to be picked up by
> > some TreeOfSourcesTransformer as well.
> > To achieve this I would separate the SAXgeneration stuff in some
> > TreeOfSourceReader to be used by both the traverser, and the generator.
> >
> > > Stephan Michels.
> > >
> > >
> > -marc=

regards,
-marc=




RE: [PROPOSAL] Sources, the next generation RE: Speedup *DirectoryGenerator

Posted by Stephan Michels <st...@vern.chem.tu-berlin.de>.

On Sat, 15 Jun 2002, Marc Portier wrote:

> Hi Stephan,
>
> As Steven pointed out
> (http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=102412384403844&w=2) we've
> just started on something similar (so there is some momentum)
>
> in fact what you're talking about here maps to our package 'yer' which
> offers now some generic interfaces for defining (implementing) and
> traversing very generic hierarchical repositories... next to 'yer' the
> 'libre' thing is what allows us to add filtering, ordering and external
> attributing metadata to the classical filesystem (so the actual libre part
> in the code is probably only interesting inside the forrest context where we
> are trying to find a more flexible alternative to the book.xml for
> documentation stuff, the design around it could fit in with this nicely.)
>
> in every case I hope some of my recent experience on building this prototype
> can be of help, I took te liberty to add some comments inline...

One thing I want to prevent is getting yet another 'DirectoryGenerator' or,
in your case, a TreNavigationGenerator.

So my idea was to have one generator and several implementations of
SourceFactories. That also makes it easier to get access to
different components.


> > -----Original Message-----
> > From: Stephan Michels [mailto:stephan@vern.chem.tu-berlin.de]
> >
> > On Fri, 14 Jun 2002, Per Kreipke wrote:
> >
> > > > *DirectoryGenerators should be refactored so we have the only
> > > > DirectoryGenerator with pluggable 'processors' of different
> > file types.
> > > > This way, you will be able to generate listings of different files of
> > > > type in one directory.
> > >
> > > That's a great idea but more grandiose. It certainly would be
> > neat if you
> > > could (use POI to) extract metadata from MS Office files, etc. I imagine
> > > there are actually code libraries out there for all kinds of 'file
> > > introspection' or generating metadata from files.
> >
> >
> > At the moment I'm to evaluating way to get meta informations from
> > repositories, like slide or over WebDAV. Also I want to be grant
>
> this thread adds up the number of sensible examples nicely
> next to our focus on (1) filesystem with special xmlbased config file in it
> (libre) there was just one other example:  (2) a swing-like treeModel that
> could be retrieved from a central place (and thus would be available to pure
> swing clients as well)
>
> > permissions and locks to sources. My initial stage was creating
> > 'SourceDescriptor', which is now in current CVS. But more I think about
> > it as more I came to the conclusion that I should follow the SoC.
> > The next idea is create some interfaces for 'Source' similar to
> > 'WriteableSource'
> >
>
> Haven't thought about it like this, but sure sounds great
> One of the troubles I was neglecting to solve up to now was how to think
> about hybrid trees of sources that would have child sources that would come
> from a different implementation....
>
> If I understand this correctly (young but growing knowledge on cocoon and
> avalon internals) there would be some kind of knowledge inside Avalon to
> translate any source-URL into delivering an actual Source implementation?

Yes, that is how it works.

> So there could then be a file:///..../dir, a dav://srv/path/collection , and
> I would nead to think about some libre:///filelike-path/dir?cfg=libre.xml
> (saying: look at this dir in the libre fasion by reading sort, filter and
> attributing info from the libre.xml file you'll find there) don't see it
> working for the images, mp3 case yet (list of file-extension to mimetype to
> SourceImpl mappings??)

There should be some kind of SourcePropertyHelper, to get standard
information from different sources. Perhaps something like this:

<source-properties>
 <source-property extension="jpg"
class="org.apache.cocoon.components.source.helper.ImageProperties"/>
 <source-property extension="mp3"
class="org.apache.cocoon.components.source.helper.MP3Properties"/>
</source-properties>

> In which case each of them could say that one of their kids would in fact
> exist outside their own implementation focus? (libre.xml could introduce
> that with the <entry location="dav://..."> )
>
> (Currently I only thought about being able to switch the implementation at
> the root level,in which case all descendants keep on living in the same
> implementation space)
>
> In every case it would be nice if sourceURL of kids could be returned in a
> relative manner?
>
> > So I have the following proposal:
> >
> > BrowsableSource:
> >
> >   /** if the source a directory */
> >   boolean isCollection();
> >
> >   /** Return the children of the collection */
> >   Enumeration getChildSources();
> >
> made a type-aware collection instead of the Enumeration
> (which is one of my (bad?) habbits, it allows me to add some
>  browseEnumeration method that is taking an enumerationVisitor
>  interface implementation class with some acceptItem(theItem)
>  method... this kind of relieves the clients of some of the
>  casting and the boring hasNext() while loop, at the cost of
>  writing an anonymous inner class.)

Yes, of course, there can be some kind of SourceList, SourcePropertyList.

> also still in doubth on adding a hasChildSources() next to
> the isCollection(), the subtle difference being:
> - isCollection(): can you have kids?
> - hasChildSources(): do you have any currently?
> would be a way to get rid of empty <collection /> elms in the
> generated output.

Yes, a method hasChildSources is, I think, better.

> > InformalSource:
> >
> >   /** To get a meta information from a source */
> >   SourceProperty getSourceProperty(String namespace, String name);
> >
> >   /** To set a meta information */
> >   void setSourceProperty(SourceProperty property);
> >
> >   /** Get all informations */
> >   Enumeration getSourceProperties();
> >
> you mean getSourcePropertyNames() with the last one?
> or do you expect a returned set of namespace-name-value
> holding objects?

The last one ;-)

> (is SourceProperty in alreay existing class maybe?)
>
> > RestrictableSource:
> >
> >   /** Get a permission for a owner */
> >   SourcePermission getSourcePermission(String owner);
> >
> >   /** Get a permission for the local owner */
> >   SourcePermission getSourcePermission();
> >
> >   void setSourcePermission(SourcePermission permission);
> >
> >   Enumeration getSourcePermissions();
> >
> > LockableSource:
> >
> >   /** Get a lock for a owner */
> >   SourceLock getSourceLock(String owner);
> >
> >   /** Get a lock for the local owner */
> >   SourceLock getSourceLock();
> >
> >   void setSourceLock(SourceLock lock);
> >
> >   Enumeration getSourceLocks();
> >
>
> the great thing about SoC is that I don't need to know what this is even
> about :-)
> (mapping dav stuff I persume)
>
> > The interface InformalSource could be used to get properties
> > from a source, such like image width and height
> >
> > file://test.gif
> > SourceProperty: namespace http://xml.apache.org/cocoon/source/image
> >                 name width
> >                 value 480
> >
> > The values should also contain XML fragment like
> > SourceProperty: namespace http://www.test.org/mymetas
> >                 name title
> >                 value bla from <a>dgfdh</a>
>
> mmm, didn't do this either, in this case you're not thinking about setting
> the property name-value as an attribute on the <item
> ns-prefix:name="value"/>  elm? but rather as a the content-model for the
> generated output elm:
> <item>
>  <ns-prefix:name>
> 	<!-- whatever -->
>  </ns-prefix:name>
> </item>
>
> how will you make the destincition between AttributeProperties and
> NestedElementProperties?

I don't. I want a similar syntax to PROPPATCH

http://asg.web.cmu.edu/rfc/rfc2518.html#sec-8.1.1

> In the latter case I'ld also try to avoid returning the property value as a
> String, but rather hope for a mechanism to let the SourceImpl inject the SAX
> Events directly to the output, or return at least use a
> org.w3c.dom.DocumentFragment return type instead.
>
> >
> > The next thing is that cocoon is able to browse through repositories
> > At the moment DirectoryGenerator is limited to the file:// protocol, I
> > think.
> >
> > I would also come to the point cachable. Source can IHMO implement
> > recycable, to there is no need to retrieve all meta informations
> > for every request.
> >
>
> here, here, have been struggling on this one...
> limited avalon understanding prevents me from seeing full solution though?
>
> > I had also took a look Ugo Cei's implementation of CocoBlog. He used RSS
> > to create a description for every entry in xindice. I doesn't understand
> > the difference between RSS and RDF. So I used for my first stage RDF.
> > So my proposal is to write a 'SourceDescriptionGenerator'. It should
> > work like DirectoyGenerator, catch all informations from a source, and
> > generate a 'Resource Description'.
> >
> > One think, I doen't know to implement is associate 'SourceCredentials'
> > to the source, such like username and password.
> >
> > Perhaps a ExtendedSourceFactory a possibility:
> > ExtendedSourceFactory:
> >
> >    Source getSource(SourceCredential credential, String location, Map
> >                     parameters)
> >
> > So, what do you think, is this the right way?
> >
>
> sorry, can't help you here yet...
> hope I did in other parts.
>
> one more remark, while refactoring the DirectoryGenerator to
> some TreeOfSourcesGenerator: design should not be thight to
> a generatorImpl alone:
> the use case for the transformer version is (avoiding the aggregation)
> in a lot of cases the output of this thing will be used as some
> right hand navigation of some webpage, but it can end up generating
> small sub-trees just about anywhere I guess. In some cases people will
> think about this more as a concern of the content-editor that would like
> to write: <navigation-tree src="...." depth=".." /> to be picked up by
> some TreeOfSourcesTransformer as well.
> To achieve this I would separate the SAXgeneration stuff in some
> TreeOfSourceReader to be used by both the traverser, and the generator.
>
> > Stephan Michels.
> >
> >
> -marc=
>
>




RE: [PROPOSAL] Sources, the next generation RE: Speedup *DirectoryGenerator

Posted by Marc Portier <mp...@outerthought.org>.
Hi Stephan,

As Steven pointed out
(http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=102412384403844&w=2) we've
just started on something similar (so there is some momentum)

in fact what you're talking about here maps to our package 'yer' which
offers now some generic interfaces for defining (implementing) and
traversing very generic hierarchical repositories... next to 'yer' the
'libre' thing is what allows us to add filtering, ordering and external
attributing metadata to the classical filesystem (so the actual libre part
in the code is probably only interesting inside the forrest context where we
are trying to find a more flexible alternative to the book.xml for
documentation stuff, the design around it could fit in with this nicely.)

in any case I hope some of my recent experience on building this prototype
can be of help, I took the liberty to add some comments inline...


> -----Original Message-----
> From: Stephan Michels [mailto:stephan@vern.chem.tu-berlin.de]
>
> On Fri, 14 Jun 2002, Per Kreipke wrote:
>
> > > *DirectoryGenerators should be refactored so we have the only
> > > DirectoryGenerator with pluggable 'processors' of different
> file types.
> > > This way, you will be able to generate listings of different files of
> > > type in one directory.
> >
> > That's a great idea but more grandiose. It certainly would be
> neat if you
> > could (use POI to) extract metadata from MS Office files, etc. I imagine
> > there are actually code libraries out there for all kinds of 'file
> > introspection' or generating metadata from files.
>
>
> At the moment I'm to evaluating way to get meta informations from
> repositories, like slide or over WebDAV. Also I want to be grant

this thread adds up the number of sensible examples nicely
next to our focus on (1) filesystem with special xmlbased config file in it
(libre) there was just one other example:  (2) a swing-like treeModel that
could be retrieved from a central place (and thus would be available to pure
swing clients as well)


> permissions and locks to sources. My initial stage was creating
> 'SourceDescriptor', which is now in current CVS. But more I think about
> it as more I came to the conclusion that I should follow the SoC.
> The next idea is create some interfaces for 'Source' similar to
> 'WriteableSource'
>

I hadn't thought about it like this, but it sure sounds great.
One of the problems I had neglected to solve up to now was how to think
about hybrid trees of sources whose child sources come from a different
implementation...

If I understand this correctly (young but growing knowledge of Cocoon and
Avalon internals), there would be some mechanism inside Avalon that
resolves any source URL to an actual Source implementation?

So there could then be a file:///..../dir, a dav://srv/path/collection, and
I would need to think about some libre:///filelike-path/dir?cfg=libre.xml
(saying: look at this dir in the libre fashion by reading the sort, filter
and attribute info from the libre.xml file you'll find there). I don't see
it working for the images/mp3 case yet (a list of file-extension to mimetype
to SourceImpl mappings??).

In which case each of them could say that one of their kids in fact
exists outside their own implementation focus? (libre.xml could introduce
that with <entry location="dav://...">.)

(Currently I have only thought about being able to switch the implementation
at the root level, in which case all descendants keep living in the same
implementation space.)

In any case, it would be nice if the source URLs of kids could be returned
in relative form.

> So I have the following proposal:
>
> BrowsableSource:
>
>   /** True if the source is a directory */
>   boolean isCollection();
>
>   /** Return the children of the collection */
>   Enumeration getChildSources();
>
I made a type-aware collection instead of the Enumeration
(which is one of my (bad?) habits; it allows me to add a
 browseEnumeration method that takes an enumerationVisitor
 interface implementation with an acceptItem(theItem)
 method... this relieves the clients of some of the
 casting and the boring hasNext() while loop, at the cost of
 writing an anonymous inner class).

I am also still in doubt about adding a hasChildSources() next to
isCollection(), the subtle difference being:
- isCollection(): can you have kids?
- hasChildSources(): do you have any currently?
That would be a way to get rid of empty <collection /> elements in the
generated output.
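
A minimal sketch of the two ideas above (a visitor-style callback instead of
a raw Enumeration, and hasChildSources() used to drop empty collections).
Everything here is an assumption for illustration: Node is an in-memory
stand-in for the proposed BrowsableSource, the class and method names are
hypothetical, and names are not XML-escaped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these types exist in Cocoon. */
final class SourceTreeSketch {

    /** In-memory stand-in for the proposed BrowsableSource. */
    static final class Node {
        final String name;
        final boolean collection;
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean collection) {
            this.name = name;
            this.collection = collection;
        }

        /** Can this source have kids? */
        boolean isCollection() { return collection; }

        /** Does it have any kids right now? */
        boolean hasChildSources() { return !children.isEmpty(); }

        Node add(Node child) { children.add(child); return this; }
    }

    /** Callback in the spirit of the acceptItem(theItem) idea. */
    interface SourceVisitor {
        void acceptItem(Node item);
    }

    /** Hides the iteration loop and the casting from the caller. */
    static void browse(Node parent, SourceVisitor visitor) {
        for (Node child : parent.children) {
            visitor.acceptItem(child);
        }
    }

    /** Renders a listing, skipping collections that currently have no kids. */
    static String render(Node node) {
        StringBuilder out = new StringBuilder();
        append(node, out);
        return out.toString();
    }

    private static void append(Node node, StringBuilder out) {
        if (!node.isCollection()) {
            out.append("<item name=\"").append(node.name).append("\"/>");
            return;
        }
        if (!node.hasChildSources()) {
            return; // no empty <collection/> in the output
        }
        out.append("<collection name=\"").append(node.name).append("\">");
        browse(node, child -> append(child, out));
        out.append("</collection>");
    }

    public static void main(String[] args) {
        Node root = new Node("docs", true)
                .add(new Node("a.gif", false))
                .add(new Node("empty", true));
        System.out.println(render(root));
    }
}
```

With a modern Java lambda the cost of the anonymous inner class mentioned
above mostly disappears.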

> InformalSource:
>
>   /** To get a meta information from a source */
>   SourceProperty getSourceProperty(String namespace, String name);
>
>   /** To set a meta information */
>   void setSourceProperty(SourceProperty property);
>
>   /** Get all information */
>   Enumeration getSourceProperties();
>
Do you mean getSourcePropertyNames() with the last one?
Or do you expect a returned set of namespace-name-value
holding objects?

(Is SourceProperty an already existing class, maybe?)

> RestrictableSource:
>
>   /** Get a permission for an owner */
>   SourcePermission getSourcePermission(String owner);
>
>   /** Get a permission for the local owner */
>   SourcePermission getSourcePermission();
>
>   void setSourcePermission(SourcePermission permission);
>
>   Enumeration getSourcePermissions();
>
> LockableSource:
>
>   /** Get a lock for an owner */
>   SourceLock getSourceLock(String owner);
>
>   /** Get a lock for the local owner */
>   SourceLock getSourceLock();
>
>   void setSourceLock(SourceLock lock);
>
>   Enumeration getSourceLocks();
>

The great thing about SoC is that I don't even need to know what this is
about :-)
(mapping DAV stuff, I presume)

> The interface InformalSource could be used to get properties
> from a source, such as image width and height:
>
> file://test.gif
> SourceProperty: namespace http://xml.apache.org/cocoon/source/image
>                 name width
>                 value 480
>
> The values should also be able to contain XML fragments, like
> SourceProperty: namespace http://www.test.org/mymetas
>                 name title
>                 value bla from <a>dgfdh</a>

Hmm, I didn't do this either. So in this case you're not thinking about
setting the property name-value as an attribute on the
<item ns-prefix:name="value"/> element, but rather as the content model of
the generated output element:
<item>
 <ns-prefix:name>
	<!-- whatever -->
 </ns-prefix:name>
</item>

How will you make the distinction between AttributeProperties and
NestedElementProperties?

In the latter case I'd also try to avoid returning the property value as a
String, and rather hope for a mechanism that lets the SourceImpl inject the
SAX events directly into the output, or at least use an
org.w3c.dom.DocumentFragment return type instead.
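
A rough sketch of the "inject SAX events directly" idea, using only the
standard org.xml.sax API. The StreamableProperty interface, its toSAX()
method and the TextCollector handler are hypothetical names invented for
this illustration, and namespace handling is left out.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration only; not an existing Cocoon interface. */
final class SaxPropertySketch {

    /** A property that streams its value instead of returning a String. */
    interface StreamableProperty {
        void toSAX(ContentHandler handler) throws SAXException;
    }

    /** The value "bla from <a>dgfdh</a>" expressed as SAX events. */
    static final StreamableProperty TITLE = handler -> {
        text(handler, "bla from ");
        handler.startElement("", "a", "a", new AttributesImpl());
        text(handler, "dgfdh");
        handler.endElement("", "a", "a");
    };

    private static void text(ContentHandler handler, String s) throws SAXException {
        handler.characters(s.toCharArray(), 0, s.length());
    }

    /** Emits the property as a nested element rather than as an attribute. */
    static void emit(String qName, StreamableProperty property, ContentHandler handler)
            throws SAXException {
        handler.startElement("", qName, qName, new AttributesImpl());
        property.toSAX(handler);
        handler.endElement("", qName, qName);
    }

    /** Tiny handler that turns the events back into text, for demonstration. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    public static void main(String[] args) throws SAXException {
        TextCollector collector = new TextCollector();
        emit("title", TITLE, collector);
        System.out.println(collector.out);
    }
}
```

Because the property writes straight to the ContentHandler, the markup in
the value never has to be parsed back out of a String.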

>
> The next thing is that Cocoon should be able to browse through
> repositories. At the moment DirectoryGenerator is limited to the file://
> protocol, I think.
>
> I would also like to come to the point of cacheability. Source can IMHO
> implement Recyclable, so there is no need to retrieve all meta
> information for every request.
>

Hear, hear. I have been struggling with this one...
My limited Avalon understanding prevents me from seeing the full solution,
though.

> I also took a look at Ugo Cei's implementation of CocoBlog. He used RSS
> to create a description for every entry in Xindice. I don't understand
> the difference between RSS and RDF, so I used RDF for my first stage.
> So my proposal is to write a 'SourceDescriptionGenerator'. It should
> work like DirectoryGenerator, collect all information from a source, and
> generate a 'Resource Description'.
>
> One thing I don't know how to implement is associating 'SourceCredentials'
> with the source, such as username and password.
>
> Perhaps an ExtendedSourceFactory is a possibility:
> ExtendedSourceFactory:
>
>    Source getSource(SourceCredential credential, String location, Map
>                     parameters);
>
> So, what do you think, is this the right way?
>

Sorry, I can't help you here yet...
I hope I did in other parts.

One more remark on refactoring the DirectoryGenerator into
some TreeOfSourcesGenerator: the design should not be tied to
a generator implementation alone.
The use case for a transformer version (avoiding the aggregation) is this:
in a lot of cases the output of this thing will be used as some
right-hand navigation of some web page, but it can end up generating
small sub-trees just about anywhere, I guess. In some cases people will
think about this more as a concern of the content editor, who would like
to write <navigation-tree src="...." depth=".." /> to be picked up by
some TreeOfSourcesTransformer.
To achieve this I would separate the SAX generation stuff into some
TreeOfSourceReader to be used by both the traverser and the generator.

> Stephan Michels.
>
>
-marc=


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


[PROPOSAL] Sources, the next generation RE: Speedup *DirectoryGenerator

Posted by Stephan Michels <st...@vern.chem.tu-berlin.de>.

On Fri, 14 Jun 2002, Per Kreipke wrote:

> > *DirectoryGenerators should be refactored so we have only one
> > DirectoryGenerator with pluggable 'processors' for different file types.
> > This way, you will be able to generate listings of files of different
> > types in one directory.
>
> That's a great idea but more grandiose. It certainly would be neat if you
> could (use POI to) extract metadata from MS Office files, etc. I imagine
> there are actually code libraries out there for all kinds of 'file
> introspection' or generating metadata from files.


At the moment I'm evaluating ways to get meta information from
repositories, like Slide or over WebDAV. I also want to be able to grant
permissions and locks to sources. My initial stage was creating
'SourceDescriptor', which is now in current CVS. But the more I think about
it, the more I come to the conclusion that I should follow SoC.
The next idea is to create some interfaces for 'Source' similar to
'WriteableSource'.

So I have the following proposal:

BrowsableSource:

  /** True if the source is a directory */
  boolean isCollection();

  /** Return the children of the collection */
  Enumeration getChildSources();

InformalSource:

  /** To get a meta information from a source */
  SourceProperty getSourceProperty(String namespace, String name);

  /** To set a meta information */
  void setSourceProperty(SourceProperty property);

  /** Get all information */
  Enumeration getSourceProperties();

RestrictableSource:

  /** Get a permission for an owner */
  SourcePermission getSourcePermission(String owner);

  /** Get a permission for the local owner */
  SourcePermission getSourcePermission();

  void setSourcePermission(SourcePermission permission);

  Enumeration getSourcePermissions();

LockableSource:

  /** Get a lock for an owner */
  SourceLock getSourceLock(String owner);

  /** Get a lock for the local owner */
  SourceLock getSourceLock();

  void setSourceLock(SourceLock lock);

  Enumeration getSourceLocks();

The interface InformalSource could be used to get properties
from a source, such as image width and height:

file://test.gif
SourceProperty: namespace http://xml.apache.org/cocoon/source/image
                name width
                value 480

The values should also be able to contain XML fragments, like
SourceProperty: namespace http://www.test.org/mymetas
                name title
                value bla from <a>dgfdh</a>
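
As a rough illustration of the InformalSource idea, here is a minimal
in-memory sketch. It is only an assumption about how such an interface
might behave: SourceProperty is modelled as a plain namespace/name/value
holder, the value is kept as a String, and a List replaces the Enumeration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory take on the proposed InformalSource. */
final class InformalSourceSketch {

    /** Plain holder for one namespace-qualified property. */
    static final class SourceProperty {
        final String namespace;
        final String name;
        final String value;

        SourceProperty(String namespace, String name, String value) {
            this.namespace = namespace;
            this.name = name;
            this.value = value;
        }
    }

    // Insertion-ordered so listings come out in a stable order.
    private final Map<String, SourceProperty> properties = new LinkedHashMap<>();

    /** Namespace and name together identify a property. */
    private static String key(String namespace, String name) {
        return "{" + namespace + "}" + name;
    }

    /** Returns the property, or null if the source does not have it. */
    SourceProperty getSourceProperty(String namespace, String name) {
        return properties.get(key(namespace, name));
    }

    /** Adds the property, replacing any previous value for the same key. */
    void setSourceProperty(SourceProperty property) {
        properties.put(key(property.namespace, property.name), property);
    }

    /** Returns a copy of all properties. */
    List<SourceProperty> getSourceProperties() {
        return new ArrayList<>(properties.values());
    }

    public static void main(String[] args) {
        InformalSourceSketch source = new InformalSourceSketch();
        source.setSourceProperty(new SourceProperty(
                "http://xml.apache.org/cocoon/source/image", "width", "480"));
        SourceProperty width = source.getSourceProperty(
                "http://xml.apache.org/cocoon/source/image", "width");
        System.out.println(width.name + " = " + width.value);
    }
}
```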

The next thing is that Cocoon should be able to browse through
repositories. At the moment DirectoryGenerator is limited to the file://
protocol, I think.

I would also like to come to the point of cacheability. Source can IMHO
implement Recyclable, so there is no need to retrieve all meta
information for every request.

I also took a look at Ugo Cei's implementation of CocoBlog. He used RSS
to create a description for every entry in Xindice. I don't understand
the difference between RSS and RDF, so I used RDF for my first stage.
So my proposal is to write a 'SourceDescriptionGenerator'. It should
work like DirectoryGenerator, collect all information from a source, and
generate a 'Resource Description'.

One thing I don't know how to implement is associating 'SourceCredentials'
with the source, such as username and password.

Perhaps an ExtendedSourceFactory is a possibility:
ExtendedSourceFactory:

   Source getSource(SourceCredential credential, String location, Map
                    parameters);

So, what do you think, is this the right way?

Stephan Michels.




RE: Speedup *DirectoryGenerator (e.g. ImageDirectoryGenerator et al)...

Posted by Per Kreipke <pe...@onclave.com>.
> *DirectoryGenerators should be refactored so we have only one
> DirectoryGenerator with pluggable 'processors' for different file types.
> This way, you will be able to generate listings of files of different
> types in one directory.

That's a great idea but more grandiose. It certainly would be neat if you
could (use POI to) extract metadata from MS Office files, etc. I imagine
there are actually code libraries out there for all kinds of 'file
introspection' or generating metadata from files.

> > - having getSize() call getFileType() and then getJpegSize() or
> > getGifSize(), introduces nice modularity but sacrifices speed. Each function
> > in that sequence calls (that's two calls total):
> >
> >   new BufferedInputStream(new FileInputStream(file));
> >
> > Instead, instantiate the BufferedInputStream in getSize() and pass it to the
> > other functions. Or move the work from getFileType() and get*Size() back in
> > to getSize().
>
> Instantiate one instance of RandomAccessFile and pass it to 'processor'.

OK. Is this about the pluggable framework you mentioned above, or does it
apply to the current code too?
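
For the current code, the "open the stream once" suggestion could look
roughly like the sketch below. It is not the actual ImageDirectoryGenerator
source: the method names mirror the ones discussed above, but the bodies
are assumptions, the type check only looks at magic bytes, and JPEG parsing
is left out to keep the example short. In the generator the raw stream
would be a FileInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: one buffered stream shared by all helper methods. */
final class SingleStreamSize {

    /** Peeks at the magic bytes, then rewinds so the next reader sees them too. */
    static String getFileType(BufferedInputStream in) throws IOException {
        in.mark(3);
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        in.reset();
        if (b0 == 'G' && b1 == 'I' && b2 == 'F') {
            return "gif";
        }
        if (b0 == 0xFF && b1 == 0xD8) {
            return "jpeg";
        }
        return "unknown";
    }

    /** Reads the little-endian width and height that follow the 6-byte GIF signature. */
    static int[] getGifSize(InputStream in) throws IOException {
        byte[] header = new byte[10];
        int filled = 0;
        while (filled < header.length) {
            int n = in.read(header, filled, header.length - filled);
            if (n < 0) {
                throw new IOException("truncated GIF header");
            }
            filled += n;
        }
        int width = (header[6] & 0xFF) | ((header[7] & 0xFF) << 8);
        int height = (header[8] & 0xFF) | ((header[9] & 0xFF) << 8);
        return new int[] { width, height };
    }

    /** Wraps the raw stream once and hands the same stream to each helper. */
    static int[] getSize(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        String type = getFileType(in);
        if ("gif".equals(type)) {
            return getGifSize(in);
        }
        return null; // JPEG and other formats are not parsed in this sketch
    }

    public static void main(String[] args) throws IOException {
        byte[] gif = { 'G', 'I', 'F', '8', '9', 'a', (byte) 0xE0, 0x01, 0x64, 0x00 };
        int[] size = getSize(new ByteArrayInputStream(gif));
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

The mark()/reset() pair is what lets the type check and the size reader
share one stream without reopening the file.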

> > - more importantly, caching the information from getSize() plus
> > 'lastModified' in an internal hash table with the file's URL as key would
> > remove the need to do the expensive work each time. If the file hasn't
> > changed, then its size (or MP3 info) hasn't either.
>
> Cache key should be directory name plus settings, such as depth and
> masks.
>
> Cache validity should be TimestampCacheValidity (FileTimeStampValidity
> in Cocoon 2.1) of all files selected by given depth/masks in this
> directory.

I think you missed my point; those suggestions apply to caching the entire
result, no?

I'm not trying to cache the entire result, for the reasons listed in the
thread "Cachability (was RE: XInclude Transformer vs CInlude Transformer".
I'm just trying to cache each file's metadata individually.

E.g.:

key (lastModified, width, height)

d:\files\per\foo.jpeg: (123456789, 100, 50)
d:\files\per\bar.gif: (987654321, 200, 100)

Since the lastModified date is already computed by DirectoryGenerator, it
knows whether or not to dive into the file to re-read the metadata. This is
a precursor to your plug-in architecture too: there's no reason to re-read
the info if the file hasn't been modified.
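
A minimal sketch of such a per-file cache, written in present-day Java. The
class name, the SizeReader callback and the method signatures are
assumptions made for this example, not existing Cocoon code. Whether one
table is shared by all requests depends on how Cocoon manages generator
instances, which the sketch does not address.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-file metadata cache; the names are not Cocoon API. */
final class ImageInfoCache {

    /** Does the expensive work of opening the file and parsing its header. */
    interface SizeReader {
        int[] read(String path);
    }

    /** One cached row: (lastModified, width, height). */
    private static final class Entry {
        final long lastModified;
        final int width;
        final int height;

        Entry(long lastModified, int width, int height) {
            this.lastModified = lastModified;
            this.width = width;
            this.height = height;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    /** Returns {width, height}, calling the reader only when the timestamp differs. */
    int[] getSize(String path, long lastModified, SizeReader reader) {
        Entry cached = entries.get(path);
        if (cached != null && cached.lastModified == lastModified) {
            return new int[] { cached.width, cached.height };
        }
        int[] size = reader.read(path);
        entries.put(path, new Entry(lastModified, size[0], size[1]));
        return size;
    }

    public static void main(String[] args) {
        ImageInfoCache cache = new ImageInfoCache();
        SizeReader slow = path -> {
            System.out.println("reading " + path);
            return new int[] { 100, 50 };
        };
        String foo = "d:\\files\\per\\foo.jpeg";
        cache.getSize(foo, 123456789L, slow); // first request reads the file
        cache.getSize(foo, 123456789L, slow); // same timestamp: served from the cache
        cache.getSize(foo, 123456790L, slow); // timestamp changed: reads again
    }
}
```

The caller passes in the lastModified value it has already computed for the
listing, so a cache hit costs one map lookup and no file I/O.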

> > Unfortunately, I don't know Cocoon well enough to understand if Generators
> > are global instances (so that all requests will share the hash table) or
> > whether it exists per pipeline, per sitemap, etc. My point: I'm not sure how
> > to implement the cached info correctly.
>
> Implement generateKey and generateValidity methods.

Right, but that's only for caching the entire results.

Per




RE: Speedup *DirectoryGenerator (e.g. ImageDirectoryGenerator et al)...

Posted by Vadim Gritsenko <va...@verizon.net>.
> From: Per Kreipke [mailto:per@onclave.com]
...
> Wouldn't it be nice if the second time you requested the image info, it was
> as fast as the DirectoryGenerator?
> 
> 
> Suggestions:

*DirectoryGenerators should be refactored so we have only one
DirectoryGenerator with pluggable 'processors' for different file types.
This way, you will be able to generate listings of files of different
types in one directory.


> - having getSize() call getFileType() and then getJpegSize() or
> getGifSize(), introduces nice modularity but sacrifices speed. Each function
> in that sequence calls (that's two calls total):
>
>   new BufferedInputStream(new FileInputStream(file));
>
> Instead, instantiate the BufferedInputStream in getSize() and pass it to the
> other functions. Or move the work from getFileType() and get*Size() back in
> to getSize().

Instantiate one instance of RandomAccessFile and pass it to 'processor'.


> - more importantly, caching the information from getSize() plus
> 'lastModified' in an internal hash table with the file's URL as key would
> remove the need to do the expensive work each time. If the file hasn't
> changed, then its size (or MP3 info) hasn't either.

Cache key should be directory name plus settings, such as depth and
masks.

Cache validity should be TimestampCacheValidity (FileTimeStampValidity
in Cocoon 2.1) of all files selected by given depth/masks in this
directory.
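
A sketch of what such a key and validity check could look like, in plain
Java. It deliberately avoids the Cocoon cache classes named above; the
parameter names (include and exclude standing in for the masks) and the
snapshot-of-timestamps approach are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical whole-listing cache key and validity check; not Cocoon API. */
final class ListingCacheSketch {

    /** Key: the directory plus every setting that changes the generated output. */
    static String key(String directory, int depth, String include, String exclude) {
        return directory
                + "?depth=" + depth
                + "&include=" + (include == null ? "" : include)
                + "&exclude=" + (exclude == null ? "" : exclude);
    }

    /**
     * Validity: the cached listing is still good only if the selected files
     * and their timestamps are exactly the same as when it was generated.
     */
    static boolean stillValid(Map<String, Long> cachedSnapshot,
                              Map<String, Long> currentSnapshot) {
        return cachedSnapshot.equals(currentSnapshot);
    }

    public static void main(String[] args) {
        Map<String, Long> before = new TreeMap<>();
        before.put("foo.jpeg", 123456789L);
        before.put("bar.gif", 987654321L);

        Map<String, Long> after = new TreeMap<>(before);
        System.out.println(key("files", 3, "*.gif", null));
        System.out.println(stillValid(before, after)); // nothing changed

        after.put("foo.jpeg", 123456790L);             // one file was touched
        System.out.println(stillValid(before, after));
    }
}
```

Note that one touched file invalidates the whole listing here, which is the
granularity difference between this approach and a per-file metadata cache.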


> Unfortunately, I don't know Cocoon well enough to understand if Generators
> are global instances (so that all requests will share the hash table) or
> whether it exists per pipeline, per sitemap, etc. My point: I'm not sure how
> to implement the cached info correctly.

Implement generateKey and generateValidity methods.

Vadim


> I would love to do this work and send in the patch myself, and I'll attempt
> to do so when I have the latest C2 source installed here. Unless someone
> desperate does it first :-)
> 
> Per

