Posted to dev@cocoon.apache.org by Gianugo Rabellino <gi...@apache.org> on 2003/08/29 19:46:30 UTC
WebDAV proxy available (was: [RT] WebDAV proxy / DASL search in Cocoon)
Bertrand Delacretaz wrote:
> If I'm right, this would allow Cocoon to proxy most WebDAV operations to
> a non-DASL WebDAV backend, and process the few operations that relate to
> properties (PROPFIND, PROPPATCH, SEARCH) directly, storing properties in
> a database.
This was an itch I just had to scratch. :-) I have now committed to the
proxy block a new ProxyGenerator that takes whatever method and forwards
it to an origin server specified as a sitemap parameter. To do so, I had
to put together a new httpclient method (I'm wondering if the httpclient
team might be interested in it once polished...) that clones an incoming
HttpServletRequest and makes a call to an origin server. So far I've
tried it with cadaver and skunkdav; surprisingly enough, it seems to work
with all the WebDAV methods.
We have a starting point then, but beware:
1. header handling is *hacky*. I had to filter by hand some headers in
order to make the application behave (don't ask me why, but
Content-Length in the Cocoon proxied response was off by one byte);
2. no HTTP/1.1 keepalive. I see no way of handling it in Cocoon, since
every request has to go through the pipeline, and the response
OutputStream cannot be reused. This might be a major performance hit.
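The header filtering in point 1 can be illustrated with a small Python sketch. It is not what GenericProxyGenerator does, and it does not explain the off-by-one Content-Length, whose cause is unknown here. It only shows one common proxy technique: drop the hop-by-hop headers listed in RFC 2616 section 13.5.1 and recompute Content-Length from the body actually relayed. The function name and the (name, value) list representation are assumptions made for the example.

```python
# Hop-by-hop headers per RFC 2616 section 13.5.1; a proxy must not forward them.
# The RFC list spells one entry "Trailers", while the real header is "Trailer",
# so both spellings are included.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
}


def filter_response_headers(headers, body):
    """Return the headers that are safe to relay (hypothetical helper).

    headers: list of (name, value) pairs from the origin server.
    body:    the bytes that will actually be sent to the client.
    """
    # Any header named inside the Connection header is hop-by-hop as well.
    extra = set()
    for name, value in headers:
        if name.lower() == "connection":
            extra.update(token.strip().lower() for token in value.split(","))

    # Content-Length is dropped too, then rebuilt from the relayed body.
    dropped = HOP_BY_HOP | extra | {"content-length"}
    kept = [(n, v) for n, v in headers if n.lower() not in dropped]
    kept.append(("Content-Length", str(len(body))))
    return kept
```

Recomputing the length from the relayed bytes avoids trusting the origin server's value once the body has passed through a pipeline that may re-serialize it.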
Anyway, it's there for you to play with. :-) To access a DAV repository
running at localhost on port 81 under the /dav context, all you have to
do is
<map:generator label="content"
               logger="sitemap.generator.proxy"
               name="generic-proxy"
               src="org.apache.cocoon.generation.GenericProxyGenerator"/>
[...]
<map:match pattern="dav/">
  <map:generate type="generic-proxy">
    <map:parameter name="url" value="http://localhost:81/dav/"/>
    <map:parameter name="path" value="/dav/"/>
  </map:generate>
  <map:serialize type="xml"/>
</map:match>
<map:match pattern="dav/**">
  <map:generate type="generic-proxy">
    <map:parameter name="url" value="http://localhost:81/dav/"/>
    <map:parameter name="path" value="/dav/{1}"/>
  </map:generate>
  <map:serialize type="xml"/>
</map:match>
Anyone willing to tackle DASL searches now? :-)
Ciao,
--
Gianugo Rabellino
Pro-netics s.r.l. - http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
(Now blogging at: http://blogs.cocoondev.org/gianugo/)
Re: WebDAV proxy available
Posted by Andreas Hochsteger <e9...@student.tuwien.ac.at>.
Guido Casper wrote:
> Why hold the meta data redundantly at all? The meta data is already
> there (on the WebDAV server, even if not DASL enabled). You just need
> to find a way to index afterwards.
>
> A SQL DB would be nice to store and index a certain set of predefined
> properties but it falls short if you have any arbitrary user-defined
> properties (as is the case with WebDAV).
>
> When I first heard about the idea for using Cocoon as a DASL indexing
> engine I immediately thought about using Lucene for this.
>
> You would need every collection to respond to a GET request with an XML
> representation like the one the TraversableGenerator generates. The links view
> applies an additional XSLT stylesheet:
>
> <map:view from-position="TraversableGenerator" name="links">
> <map:transform src="2htmlLinks.xsl"/>
> <map:serialize type="links"/>
> </map:view>
>
> So the whole WebDAV repository can be crawled by Lucene.
> And Lucene's content-view-query is configured as
> "cocoon-view=properties".
>
> with the "properties" view being
>
> <map:view from-position="content" name="properties">
> <map:transform type="source-props"/><!--reading props-->
> <map:serialize type="xml"/>
> </map:view>
>
> You would need the LuceneXMLIndexer to store ALL fields (instead of
> just a few configured ones), as a DASL query could request all
> properties. You could even index content and properties side by side,
> as DASL may query the content itself.
>
> The challenging part is to create another SearchGenerator that parses
> the DASL query, although it shouldn't be too hard to at least cover a
> basic DASL subset.
>
> Once all this is working, you can easily DASL-enable any
> InspectableSource and it fits more nicely with Cocoon's architecture
> IMHO.
>
> Does this make sense?
Sounds reasonable to me.
I especially like being able to query any InspectableSource
via DASL.
> Guido
Bye,
Andreas
Re: WebDAV proxy available
Posted by Gianugo Rabellino <gi...@apache.org>.
Guido Casper wrote:
>>Bertrand Delacretaz wrote:
>>
>>>> <match type="request-method" pattern="PUT">
>>>> <act type="syncmetadatadb">
>>>> <generate type="webdavproxy">
>>>> <parameter name="url" value="http://whatever/dav/{../1}"/>
>>>> </generate>
>>>> <serialize/>
>>>> </act>
>>>> </match>
>
>
> I don't understand. PUT doesn't carry any meta data (not explicitly).
> Do you mean PROPPATCH?
No, as Bertrand said you need to sync every possible property (even live
ones), so a PUT generates properties on the origin server.
>>You're right, we need a safer approach.
>
>
> Why hold the meta data redundantly at all? The meta data is already
> there (on the WebDAV server, even if not DASL enabled). You just need
> to find a way to index afterwards.
I'm with Bertrand here: afterwards might just be too late. Think about
documents being deleted or whose metadata are modified (a live document
being put on hold as an example). I'm now trying to understand if there
is a way to implement rollback at a webdav level (a decent SQL database
would have it for free, so this is not my major concern). I'm wondering
if an undo operation can be applied for every WebDAV method, but this
would require quite a messy "transaction/redo log" maintenance on Cocoon,
and I'm not sure I want to dive into that. :-)
> A SQL DB would be nice to store and index a certain set of predefined
> properties but it falls short if you have any arbitrary user-defined
> properties (as is the case with WebDAV).
The problem goes beyond the simple name-value properties Bertrand
suggested: metadata in WebDAV are actually XML elements, so you need a way
to query them using XML (though actually DAV:basicsearch has no such
capabilities AFAIK: it seems to assume untyped (String, String)
name-value properties). So, if basicsearch is enough, an SQL DB would be
just fine (as Catacomb shows).
> You would need the LuceneXMLIndexer to store ALL fields (instead of
> just a few configured ones), as a DASL query could request all
> properties. You could even index content and properties side by side,
> as DASL may query the content itself.
But how does Lucene behave with incremental indexing?
> The challenging part is to create another SearchGenerator that parses
> the DASL query, although it shouldn't be too hard to at least cover a
> basic DASL subset.
The basicsearch grammar implementation is the least problem I can
imagine. It's XML, so it should be pretty easy to translate it into
another domain.
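To make "translate it into another domain" concrete, here is a minimal Python sketch that turns the DAV:where clause of a basicsearch request into parameterized SQL. Everything on the SQL side is an assumption for illustration: a hypothetical table props(href, prop_name, prop_value) and the function names. Only DAV:eq, DAV:and and DAV:or are handled; DAV:select, DAV:from and the rest of the grammar are ignored.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"
CONNECTIVES = {DAV + "and": " AND ", DAV + "or": " OR "}


def where_to_sql(node, params):
    """Translate one operator element of a DAV:where clause (sketch only)."""
    if node.tag in CONNECTIVES:
        parts = [where_to_sql(child, params) for child in node]
        return "(" + CONNECTIVES[node.tag].join(parts) + ")"
    if node.tag == DAV + "eq":
        # <D:eq><D:prop><D:someprop/></D:prop><D:literal>x</D:literal></D:eq>
        prop = node.find(DAV + "prop")[0]
        literal = node.find(DAV + "literal").text
        # Property names are stored namespace-qualified, e.g. "{DAV:}displayname".
        params.extend([prop.tag, literal])
        return ("href IN (SELECT href FROM props"
                " WHERE prop_name = ? AND prop_value = ?)")
    raise ValueError("unsupported operator: " + node.tag)


def basicsearch_to_sql(xml_text):
    """Return (sql, params) for the where clause of a DAV:basicsearch request."""
    root = ET.fromstring(xml_text)
    where = root.find(DAV + "basicsearch/" + DAV + "where")
    params = []
    condition = where_to_sql(where[0], params)
    return "SELECT DISTINCT href FROM props WHERE " + condition, params
```

Values are passed as bound parameters rather than spliced into the SQL string, so literals from the client cannot alter the query.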
> Once all this is working, you can easily DASL-enable any
> InspectableSource and it fits more nicely with Cocoon's architecture
> IMHO.
Oh well, that would be pure Nirvana. But it would fit in a much bigger
picture: WebDAV enabled Sources with not only DASL but all the WebDAV
stuff in it. Yet, I'm afraid it's a long way to go.
Ciao,
--
Gianugo Rabellino
Re: WebDAV proxy available
Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Monday, 1 Sep 2003, at 21:06 Europe/Zurich, Guido Casper wrote:
> ...I don't understand. PUT doesn't carry any meta data (not
> explicitly).
> Do you mean PROPPATCH?
No, but if you want to index the dead properties as well as live ones,
you need to intercept PUT and either read the dead props from the
backend to index them or recreate them locally.
> ...Why hold the meta data redundantly at all? The meta data is already
> there (on the WebDAV server, even if not DASL enabled). You just need
> to find a way to index afterwards.
"Afterwards" will not be suitable for all purposes; in most cases you
want the index to be updated right away, if possible in the same
transaction used to store the content (but this might be hard to do).
> ...A SQL DB would be nice to store and index a certain set of
> predefined
> properties but it falls short if you have any arbitrary user-defined
> properties (as is the case with WebDAV)....
Storing properties in a table with PROP_NAME and PROP_VALUE columns, for
example, allows for arbitrary user-defined properties (unless you want
to store widely varying data types, but I don't know if WebDAV has a
concept of data types for properties).
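As a rough sketch of that table layout, the Python below uses an in-memory SQLite database. The HREF column, the table name and the helper functions are assumptions added for the example; values are stored as untyped text, which sidesteps the data-type question rather than answering it.

```python
import sqlite3


def open_store():
    """Create the hypothetical property table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE props ("
        " href TEXT NOT NULL,"
        " prop_name TEXT NOT NULL,"
        " prop_value TEXT,"
        " PRIMARY KEY (href, prop_name))"
    )
    return db


def set_prop(db, href, name, value):
    """Insert or overwrite one property of one resource."""
    db.execute(
        "INSERT OR REPLACE INTO props VALUES (?, ?, ?)", (href, name, value)
    )


def find_by_prop(db, name, value):
    """Return the hrefs whose property `name` equals `value`, sorted."""
    rows = db.execute(
        "SELECT href FROM props WHERE prop_name = ? AND prop_value = ?"
        " ORDER BY href",
        (name, value),
    )
    return [href for (href,) in rows]
```

Because each property is a row, a client can attach any user-defined property without a schema change; the namespace-qualified name (for example "{urn:example}status") is just another PROP_NAME value.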
> ...When I first heard about the idea for using Cocoon as a DASL
> indexing
> engine I immediately thought about using Lucene for this....
Sounds interesting, but will Lucene allow efficient index updates on
every PUT/PROPPATCH and similar operations, or do you have to defer
indexing for later?
> ...The challenging part is to create another SearchGenerator that
> parses
> the DASL query, although it shouldn't be too hard to at least cover a
> basic DASL subset....
Yes, and even a custom DASL search language would be useful already,
even if the basicsearch stuff is not implemented at first.
> ...Does this make sense?
It sure does - I'm just worried about how well Lucene compares to a
database in terms of storing large quantities of data.
-Bertrand
Re: WebDAV proxy available
Posted by Guido Casper <gc...@s-und-n.de>.
Gianugo Rabellino <gi...@apache.org> wrote:
> Bertrand Delacretaz wrote:
>>
>>> <match type="request-method" pattern="PUT">
>>> <act type="syncmetadatadb">
>>> <generate type="webdavproxy">
>>> <parameter name="url" value="http://whatever/dav/{../1}"/>
>>> </generate>
>>> <serialize/>
>>> </act>
>>> </match>
I don't understand. PUT doesn't carry any meta data (not explicitly).
Do you mean PROPPATCH?
>>
>>
>> I think the webdav backend is much more likely to fail than
>> syncmetadb in such cases (due to insufficient authorizations etc),
>> which would create inconsistencies between the backend and meta db
>> if the sync is done with actions.
>
> You're right, we need a safer approach.
Why hold the meta data redundantly at all? The meta data is already
there (on the WebDAV server, even if not DASL enabled). You just need
to find a way to index afterwards.
A SQL DB would be nice to store and index a certain set of predefined
properties but it falls short if you have any arbitrary user-defined
properties (as is the case with WebDAV).
When I first heard about the idea for using Cocoon as a DASL indexing
engine I immediately thought about using Lucene for this.
You would need every collection to respond to a GET request with an XML
representation like the one the TraversableGenerator generates. The links view
applies an additional XSLT stylesheet:
<map:view from-position="TraversableGenerator" name="links">
  <map:transform src="2htmlLinks.xsl"/>
  <map:serialize type="links"/>
</map:view>
So the whole WebDAV repository can be crawled by Lucene.
And Lucene's content-view-query is configured as
"cocoon-view=properties".
with the "properties" view being
<map:view from-position="content" name="properties">
  <map:transform type="source-props"/><!--reading props-->
  <map:serialize type="xml"/>
</map:view>
You would need the LuceneXMLIndexer to store ALL fields (instead of
just a few configured ones), as a DASL query could request all
properties. You could even index content and properties side by side,
as DASL may query the content itself.
The challenging part is to create another SearchGenerator that parses
the DASL query, although it shouldn't be too hard to at least cover a
basic DASL subset.
Once all this is working, you can easily DASL-enable any
InspectableSource and it fits more nicely with Cocoon's architecture
IMHO.
Does this make sense?
Guido
Re: WebDAV proxy available
Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Monday, 1 Sep 2003, at 17:54 Europe/Zurich, Gianugo Rabellino wrote:
> ...This however raises a new issue: the whole operation should be
> atomic and (somehow) transaction-aware. If for any reason the Dasl
> component cannot update the database, an error should be thrown and
> the operation should fail. Interesting problem, and I don't actually
> see a way to solve it OOTB....
Right, I don't think it is possible to run a transaction between the
webdav backend and an internal properties store.
The best we could do is to detect the problem (couldn't update local
store) and report it, but it might be hard to recover.
-Bertrand
Re: WebDAV proxy available
Posted by Gianugo Rabellino <gi...@apache.org>.
Bertrand Delacretaz wrote:
>
>> <match type="request-method" pattern="PUT">
>> <act type="syncmetadatadb">
>> <generate type="webdavproxy">
>> <parameter name="url" value="http://whatever/dav/{../1}"/>
>> </generate>
>> <serialize/>
>> </act>
>> </match>
>
>
> I think the webdav backend is much more likely to fail than syncmetadb
> in such cases (due to insufficient authorizations etc), which would
> create inconsistencies between the backend and meta db if the sync is
> done with actions.
You're right, we need a safer approach.
>
> It might be better to sync the meta db based on the results of the
> backend, in which case I'd go for a Transformer to post-process the PUT
> result from the backend, feeding sql code to a downstream SQLTransformer:
>
> PUT operation:
> WebdavProxy -> DaslTransformer -> SQLTransformer
It doesn't really work. Every operation carried out by the WebDAV proxy
is actually "eating" the request, so what you get in case of a PUT
operation is just a status code. What you need to do, then, is to issue
a PROPFIND to the backend server and get your metadata.
This however raises a new issue: the whole operation should be atomic
and (somehow) transaction-aware. If for any reason the Dasl component
cannot update the database, an error should be thrown and the operation
should fail. Interesting problem, and I don't actually see a way to
solve it OOTB.
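The sequence described here (forward the PUT, ask the backend for the resulting properties with a PROPFIND, then update the local store and fail loudly if that update breaks) can be sketched in Python as follows. The three callables are injected placeholders, not Cocoon or httpclient APIs, and the sketch only detects and reports a failed sync; it does not roll back the PUT on the backend, so it is not a real transaction.

```python
class SyncError(Exception):
    """The backend accepted the request but the local property store is stale."""


def put_and_sync(forward_put, propfind, store_props, href, body):
    """Forward a PUT, then mirror the resulting properties locally.

    forward_put(href, body) -> int   HTTP status from the origin server
    propfind(href)          -> dict  properties reported by the origin server
    store_props(href, dict)          writes to the local metadata store
    All three are hypothetical hooks supplied by the caller.
    """
    status = forward_put(href, body)
    if not 200 <= status < 300:
        # The backend refused the request, so there is nothing to sync.
        return status

    try:
        # The PUT response is only a status code; fetch the properties
        # (live and dead) from the backend before indexing them.
        store_props(href, propfind(href))
    except Exception as exc:
        # The backend already holds the new content; all we can do is report.
        raise SyncError(
            "stored %s on the backend but could not index it" % href
        ) from exc
    return status
```

Syncing only after a successful backend response keeps a refused PUT from creating stale entries; the opposite failure, a stored resource with no index entry, still needs out-of-band repair.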
Ciao,
--
Gianugo Rabellino
Re: WebDAV proxy available
Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Monday, 1 Sep 2003, at 10:57 Europe/Zurich, Gianugo Rabellino wrote:
> ...The hooks IMO should be in the pipeline actually....
Sounds good.
> <match type="request-method" pattern="PUT">
> <act type="syncmetadatadb">
> <generate type="webdavproxy">
> <parameter name="url" value="http://whatever/dav/{../1}"/>
> </generate>
> <serialize/>
> </act>
> </match>
I think the webdav backend is much more likely to fail than syncmetadb
in such cases (due to insufficient authorizations etc), which would
create inconsistencies between the backend and meta db if the sync is
done with actions.
It might be better to sync the meta db based on the results of the
backend, in which case I'd go for a Transformer to post-process the PUT
result from the backend, feeding sql code to a downstream
SQLTransformer:
PUT operation:
WebdavProxy -> DaslTransformer -> SQLTransformer
I haven't studied all details yet, but hopefully I'm not too far from
reality ;-)
-Bertrand
Re: WebDAV proxy available
Posted by Gianugo Rabellino <gi...@apache.org>.
Bertrand Delacretaz wrote:
>> Anyone willing to tackle DASL searches now? :-)
>
>
> I will have a look.
>
> IIUC the next step would be to implement a ProxyWithHooks that would
> allow requests to be pre- or post-processed, to be able to manipulate
> requests to the webdav backend and their results, right?
The hooks IMO should be in the pipeline actually. What we actually need
is a ProxyHttpTransformer or even just some actions (waiting for the
upcoming object model dav adapter), so that:
<match pattern="dav/**">
  <match type="request-method" pattern="SEARCH">
    <generate type="request"/>
    <transform src="dasl2sql.xsl"/>
    <transform type="sql"/>
    <transform src="sql2propfind.xsl"/>
    <serialize/>
  </match>
  <match type="request-method" pattern="PUT">
    <act type="syncmetadatadb">
      <generate type="webdavproxy">
        <parameter name="url" value="http://whatever/dav/{../1}"/>
      </generate>
      <serialize/>
    </act>
  </match>
  <match type="request-method" pattern="DELETE">
    <act type="syncmetadatadb">
      <generate type="webdavproxy">
        <parameter name="url" value="http://whatever/dav/{../1}"/>
      </generate>
      <serialize/>
    </act>
  </match>
  <match type="request-method" pattern="PROPFIND">
    <generate type="webdavproxy">
      <parameter name="url" value="http://whatever/dav/{../1}"/>
    </generate>
    <serialize/>
  </match>
  [...repeat...]
</match>
The only implementation issue left is using a transformer, an action or
flow to process data that have to go both to the origin server and to a
custom processing component. Sounds reasonable?
Ciao,
--
Gianugo Rabellino
Re: WebDAV proxy available
Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Friday, 29 Aug 2003, at 19:46 Europe/Zurich, Gianugo Rabellino wrote:
> I have now committed to the proxy block a new ProxyGenerator that
> takes whatever method and forwards it to an origin server specified as
> a sitemap parameter....
Cool!
> Anyone willing to tackle DASL searches now? :-)
I will have a look.
IIUC the next step would be to implement a ProxyWithHooks that would
allow requests to be pre- or post-processed, to be able to manipulate
requests to the webdav backend and their results, right?
I'm not sure where to do this processing though - as webdav requests
and responses are XML it would be nice to be able to process them in
Cocoon pipelines; OTOH, a single Generator would be more compact. Need
to dive in to get a clearer picture.
-Bertrand