Posted to dev@cocoon.apache.org by Gianugo Rabellino <gi...@apache.org> on 2003/08/29 19:46:30 UTC

WebDAV proxy available (was: [RT] WebDAV proxy / DASL search in Cocoon)

Bertrand Delacretaz wrote:

> If I'm right, this would allow Cocoon to proxy most WebDAV operations to 
> a non-DASL WebDAV backend, and process the few operations that relate to 
> properties (PROPFIND, PROPPATCH, SEARCH) directly, storing properties in 
> a database.

This was an itch I just had to scratch. :-) I have now committed to the 
proxy block a new ProxyGenerator that takes whatever method and forwards 
it to an origin server specified as a sitemap parameter. To do so, I had 
to put together a new httpclient method (I'm wondering if the httpclient 
team might be interested in it once polished...) that clones an incoming 
HttpServletRequest and makes a call to an origin server. So far I've 
tried it with cadaver and skunkdav and, surprisingly enough, it seems to 
work with all the WebDAV methods.

We have a starting point then, but beware:

1. header handling is *hacky*. I had to filter some headers by hand in 
order to make the application behave (don't ask me why, but the 
Content-Length in the Cocoon proxied response was off by one byte);

2. no HTTP/1.1 keepalive. I see no way of handling it in Cocoon, since 
every request has to go through the pipeline, and the response 
OutputStream cannot be reused. This might be a major performance hit.
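To give an idea of what the header filtering in point 1 amounts to, here 
is a minimal, self-contained sketch (class and method names are 
hypothetical, this is not the actual GenericProxyGenerator code): a proxy 
must drop the hop-by-hop headers before copying the rest to the outgoing 
request.

```java
import java.util.*;

public class HeaderFilter {
    // Hop-by-hop headers (RFC 2616, section 13.5.1) that a proxy must not
    // forward, plus Content-Length, which the outgoing request has to
    // recompute for itself anyway.
    private static final Set<String> SKIP = new HashSet<>(Arrays.asList(
            "connection", "keep-alive", "proxy-authenticate",
            "proxy-authorization", "te", "trailers",
            "transfer-encoding", "upgrade", "content-length"));

    /** Returns only the incoming headers that are safe to forward. */
    public static Map<String, String> forwardable(Map<String, String> in) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            // header names are case-insensitive, so compare lower-cased
            if (!SKIP.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

A real proxy would iterate the servlet request's getHeaderNames() instead 
of a Map, but the filtering logic is the same.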

Anyway, it's there for you to play with. :-) To access a DAV repository 
running at localhost on port 81 under the /dav context, all you have to 
do is

     <map:generator label="content"
          logger="sitemap.generator.proxy"
          name="generic-proxy"
          src="org.apache.cocoon.generation.GenericProxyGenerator"/>


[...]

     <map:match pattern="dav/">
         <map:generate type="generic-proxy">
           <map:parameter name="url" value="http://localhost:81/dav/"/>
           <map:parameter name="path" value="/dav/"/>
         </map:generate>
         <map:serialize type="xml"/>
     </map:match>

     <map:match pattern="dav/**">
         <map:generate type="generic-proxy">
           <map:parameter name="url" value="http://localhost:81/dav/"/>
           <map:parameter name="path" value="/dav/{1}"/>
         </map:generate>
         <map:serialize type="xml"/>
     </map:match>

Anyone willing to tackle DASL searches now? :-)

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)


Re: WebDAV proxy available

Posted by Andreas Hochsteger <e9...@student.tuwien.ac.at>.

Guido Casper wrote:

> Why hold the meta data redundantly at all? The meta data is already
> there (on the WebDAV server, even if not DASL enabled). You just need
> to find a way to index afterwards.
> 
> A SQL DB would be nice to store and index a certain set of predefined
> properties but it falls short if you have any arbitrary user-defined
> properties (as is the case with WebDAV).
> 
> When I first heard about the idea for using Cocoon as a DASL indexing
> engine I immediately thought about using Lucene for this.
> 
> You would need any collection to respond to a GET request with an XML
> representation as the TraversableGenerator generates. The links view
> applies an additional XSLT stylesheet:
> 
>     <map:view from-position="TraversableGenerator" name="links">
>       <map:transform src="2htmlLinks.xsl"/>
>       <map:serialize type="links"/>
>     </map:view>
> 
> So the whole WebDAV repository can be crawled by Lucene.
> And Lucene's content-view-query is configured as
> "cocoon-view=properties".
> 
> with the "properties" view being
> 
>     <map:view from-position="content" name="properties">
>       <map:transform type="source-props"/><!--reading props-->
>       <map:serialize type="xml"/>
>     </map:view>
> 
> You would need the LuceneXMLIndexer to store ALL fields (instead of
> just a few configured ones), as a DASL query could request all
> properties. You could even index content and properties side by side,
> as DASL may query the content itself.
> 
> The challenging part is to create another SearchGenerator that parses
> the DASL query, although it shouldn't be too hard to at least cover a
> basic DASL subset.
> 
> Once all this is working you can easily DASL-enable any
> InspectableSource and it fits more nicely with Cocoon's architecture
> IMHO.
> 
> Does this make sense?

Sounds reasonable to me.
I specifically like being able to query any InspectableSource 
via DASL.

> Guido

Bye,
	Andreas



Re: WebDAV proxy available

Posted by Gianugo Rabellino <gi...@apache.org>.
Guido Casper wrote:

>>Bertrand Delacretaz wrote:
>>
>>>>  <match type="request-method" pattern="PUT">
>>>>    <act type="syncmetadatadb">
>>>>     <generate type="webdavproxy">
>>>>        <parameter name="url" value="http://whatever/dav/{../1}"/>
>>>>     </generate>
>>>>     <serialize/>
>>>>    </act>
>>>>  </match>
> 
> 
> I don't understand. PUT doesn't carry any meta data (not explicitly).
> Do you mean PROPPATCH?

No, as Bertrand said you need to sync every possible property (even live 
ones), so a PUT generates properties on the origin server.

>>You're right, we need a safer approach.
> 
> 
> Why hold the meta data redundantly at all? The meta data is already
> there (on the WebDAV server, even if not DASL enabled). You just need
> to find a way to index afterwards.

I'm with Bertrand here: afterwards might just be too late. Think about 
documents being deleted or whose metadata are modified (a live document 
being put on hold as an example). I'm now trying to understand if there 
is a way to implement rollback at a webdav level (a decent SQL database 
would have it for free, so this is not my major concern). I'm wondering 
if an undo operation can be applied for every WebDAV method, but this 
would require quite messy transaction/redo-log maintenance on the Cocoon 
side, and I'm not sure I want to dive into that. :-)

> A SQL DB would be nice to store and index a certain set of predefined
> properties but it falls short if you have any arbitrary user-defined
> properties (as is the case with WebDAV).

The problem isn't really simple name-value properties, as Bertrand 
suggested. Metadata in WebDAV are actually elements, so you need a way 
to query them using XML (though DAV:basicsearch has no such 
capabilities AFAIK: it seems to assume untyped (String, String) 
name-value properties). So, if basicsearch is enough, an SQL DB would be 
just fine (as Catacomb shows).

> You would need the LuceneXMLIndexer to store ALL fields (instead of
> just a few configured ones), as a DASL query could request all
> properties. You could even index content and properties side by side,
> as DASL may query the content itself.

But how does Lucene behave with incremental indexing?

> The challenging part is to create another SearchGenerator that parses
> the DASL query, although it shouldn't be too hard to at least cover a
> basic DASL subset.

The basicsearch grammar implementation is the least problem I can 
imagine. It's XML, so it should be pretty easy to translate it into 
another domain.
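To show how small such a translation could start out, here is a hedged 
sketch (class name, table layout and the supported subset are all my 
invention, not anything from Catacomb or the DASL draft) that maps a 
single DAV:eq condition onto a SQL predicate over a flat 
(PROP_NAME, PROP_VALUE) table:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BasicSearchToSql {
    private static final String DAV = "DAV:";

    /** Translates a single DAV:eq condition from a basicsearch-like query
     *  into a SQL predicate over a (PROP_NAME, PROP_VALUE) table. */
    public static String whereClause(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Element eq = (Element) doc.getElementsByTagNameNS(DAV, "eq").item(0);
        // the single child element of D:prop names the property being tested
        Element prop = (Element) eq.getElementsByTagNameNS(DAV, "prop").item(0);
        String name = ((Element) prop.getElementsByTagName("*").item(0))
                .getLocalName();
        String value = eq.getElementsByTagNameNS(DAV, "literal").item(0)
                .getTextContent();
        // escape single quotes for the SQL string literal
        return "WHERE PROP_NAME = '" + name + "' AND PROP_VALUE = '"
                + value.replace("'", "''") + "'";
    }
}
```

A real implementation would of course handle D:and, D:or, comparisons 
other than eq, and would use bind variables instead of string 
concatenation, but the XML-to-XML (or XML-to-SQL) nature of the problem 
is visible even at this size.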

> Once all this is working you can easily DASL-enable any
> InspectableSource and it fits more nicely with Cocoon's architecture
> IMHO.

Oh well, that would be pure Nirvana. But it would fit into a much bigger 
picture: WebDAV-enabled Sources with not only DASL but all the WebDAV 
stuff in them. Yet, I'm afraid there's a long way to go.

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)


Re: WebDAV proxy available

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Monday, 1 Sep 2003, at 21:06 Europe/Zurich, Guido Casper wrote:
> ...I don't understand. PUT doesn't carry any meta data (not 
> explicitly).
> Do you mean PROPPATCH?

No, but if you want to index the dead properties as well as live ones, 
you need to intercept PUT and either read the dead props from the 
backend to index them or recreate them locally.

> ...Why hold the meta data redundantly at all? The meta data is already
> there (on the WebDAV server, even if not DASL enabled). You just need
> to find a way to index afterwards.

"Afterwards" will not be suitable for all purposes, in most cases you 
want the index to be updated right away, if possible in the same 
transaction used to store the content (but this might be hard to do).

> ...A SQL DB would be nice to store and index a certain set of 
> predefined
> properties but it falls short if you have any arbitrary user-defined
> properties (as is the case with WebDAV)....

Storing properties in a table with PROP_NAME and PROP_VALUE columns, for 
example, allows for arbitrary user-defined properties (unless you want 
to store widely varying data types, but I don't know if WebDAV has a 
concept of data types for properties).
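To make the flat-table idea concrete, here is a toy in-memory model of it 
(class and method names are hypothetical; a real implementation would use 
JDBC against an actual table). The point is that arbitrary user-defined 
properties need no schema changes at all:

```java
import java.util.*;

/** Toy model of a flat (RESOURCE, PROP_NAME, PROP_VALUE) table:
 *  one logical row per (resource, property) pair, all values as strings. */
public class PropertyTable {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    /** Stores or overwrites one property of one resource. */
    public void set(String resource, String name, String value) {
        rows.computeIfAbsent(resource, r -> new HashMap<>()).put(name, value);
    }

    /** Reads one property back, or null if absent. */
    public String get(String resource, String name) {
        Map<String, String> props = rows.get(resource);
        return props == null ? null : props.get(name);
    }

    /** Finds all resources whose property NAME equals VALUE --
     *  the shape of query a DASL basicsearch eq condition needs. */
    public List<String> find(String name, String value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            if (value.equals(e.getValue().get(name))) hits.add(e.getKey());
        }
        return hits;
    }
}
```

The limitation Gianugo mentions shows up here too: everything is an 
untyped string pair, so structured (element-valued) WebDAV properties 
would have to be serialized into the value column.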

> ...When I first heard about the idea for using Cocoon as a DASL 
> indexing
> engine I immediately thought about using Lucene for this....

Sounds interesting, but will Lucene allow efficient index updates on 
every PUT/PROPPATCH and similar operations, or do you have to defer 
indexing for later?

> ...The challenging part is to create another SearchGenerator that 
> parses
> the DASL query, although it shouldn't be too hard to at least cover a
> basic DASL subset....

Yes, a custom DASL search language would already be useful, even if the 
basicsearch stuff is not implemented at first.

> ...Does this make sense?

It sure does - I'm just worried about how well Lucene compares to a 
database in terms of storing large quantities of data.

-Bertrand

Re: WebDAV proxy available

Posted by Guido Casper <gc...@s-und-n.de>.
Gianugo Rabellino <gi...@apache.org> wrote:
> Bertrand Delacretaz wrote:
>>
>>>   <match type="request-method" pattern="PUT">
>>>     <act type="syncmetadatadb">
>>>      <generate type="webdavproxy">
>>>         <parameter name="url" value="http://whatever/dav/{../1}"/>
>>>      </generate>
>>>      <serialize/>
>>>     </act>
>>>   </match>

I don't understand. PUT doesn't carry any meta data (not explicitly).
Do you mean PROPPATCH?

>>
>>
>> I think the webdav backend is much more likely to fail than
>> syncmetadb in such cases (due to insufficient authorizations etc),
>> which would create inconsistencies between the backend and meta db
>> if the sync is done with actions.
>
> You're right, we need a safer approach.

Why hold the meta data redundantly at all? The meta data is already
there (on the WebDAV server, even if not DASL enabled). You just need
to find a way to index afterwards.

A SQL DB would be nice to store and index a certain set of predefined
properties but it falls short if you have any arbitrary user-defined
properties (as is the case with WebDAV).

When I first heard about the idea for using Cocoon as a DASL indexing
engine I immediately thought about using Lucene for this.

You would need any collection to respond to a GET request with an XML
representation as the TraversableGenerator generates. The links view
applies an additional XSLT stylesheet:

    <map:view from-position="TraversableGenerator" name="links">
      <map:transform src="2htmlLinks.xsl"/>
      <map:serialize type="links"/>
    </map:view>

So the whole WebDAV repository can be crawled by Lucene.
And Lucene's content-view-query is configured as
"cocoon-view=properties".

with the "properties" view being

    <map:view from-position="content" name="properties">
      <map:transform type="source-props"/><!--reading props-->
      <map:serialize type="xml"/>
    </map:view>

You would need the LuceneXMLIndexer to store ALL fields (instead of
just a few configured ones), as a DASL query could request all
properties. You could even index content and properties side by side,
as DASL may query the content itself.

The challenging part is to create another SearchGenerator that parses
the DASL query, although it shouldn't be too hard to at least cover a
basic DASL subset.

Once all this is working you can easily DASL-enable any
InspectableSource and it fits more nicely with Cocoon's architecture
IMHO.

Does this make sense?

Guido


Re: WebDAV proxy available

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Monday, 1 Sep 2003, at 17:54 Europe/Zurich, Gianugo Rabellino wrote:
> ...This however raises a new issue: the whole operation should be 
> atomic and (somehow) transaction-aware. If for any reason the Dasl 
> component cannot update the database, an error should be thrown and 
> the operation should fail. Interesting problem, and I don't actually 
> see a way to solve it OOTB....

Right, I don't think it is possible to run a transaction between the 
webdav backend and an internal properties store.

The best we could do is to detect the problem (couldn't update local 
store) and report it, but it might be hard to recover.

-Bertrand

Re: WebDAV proxy available

Posted by Gianugo Rabellino <gi...@apache.org>.
Bertrand Delacretaz wrote:
> 
>>   <match type="request-method" pattern="PUT">
>>     <act type="syncmetadatadb">
>>      <generate type="webdavproxy">
>>         <parameter name="url" value="http://whatever/dav/{../1}"/>
>>      </generate>
>>      <serialize/>
>>     </act>
>>   </match>
> 
> 
> I think the webdav backend is much more likely to fail than syncmetadb 
> in such cases (due to insufficient authorizations etc), which would 
> create inconsistencies between the backend and meta db if the sync is 
> done with actions.

You're right, we need a safer approach.

> 
> It might be better to sync the meta db based on the results of the 
> backend, in which case I'd go for a Transformer to post-process the PUT 
> result from the backend, feeding sql code to a downstream SQLTransformer:
> 
> PUT operation:
> WebdavProxy -> DaslTransformer -> SQLTransformer

It doesn't really work. Every operation carried out by the WebDAV proxy 
is actually "eating" the request, so what you get in case of a PUT 
operation is just a status code. What you need to do, then, is to issue 
a PROPFIND to the backend server and get your metadata.

This however raises a new issue: the whole operation should be atomic 
and (somehow) transaction-aware. If for any reason the Dasl component 
cannot update the database, an error should be thrown and the operation 
should fail. Interesting problem, and I don't actually see a way to 
solve it OOTB.
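For reference, the follow-up PROPFIND would look roughly like this sketch 
(class and method names are hypothetical; only the request body and 
headers are shown, the HTTP call itself is left out):

```java
/** Sketch of the follow-up request the proxy would make after a
 *  successful PUT: a Depth 0 PROPFIND asking the origin server for all
 *  properties of the resource just written. */
public class PropfindRequest {
    /** RFC 2518 allprop body -- asks for every live and dead property. */
    public static String allpropBody() {
        return "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
             + "<D:propfind xmlns:D=\"DAV:\">\n"
             + "  <D:allprop/>\n"
             + "</D:propfind>\n";
    }

    /** Headers for a single-resource PROPFIND. */
    public static java.util.Map<String, String> headers() {
        java.util.Map<String, String> h = new java.util.LinkedHashMap<>();
        h.put("Depth", "0"); // just this resource, not the whole collection
        h.put("Content-Type", "text/xml; charset=\"utf-8\"");
        return h;
    }
}
```

The 207 Multi-Status response to this request is the metadata that would 
then be fed to the indexing component, which is exactly where the 
atomicity question above bites.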

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)


Re: WebDAV proxy available

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Monday, 1 Sep 2003, at 10:57 Europe/Zurich, Gianugo Rabellino wrote:

> ...The hooks IMO should be in the pipeline actually....

sounds good.

>   <match type="request-method" pattern="PUT">
>     <act type="syncmetadatadb">
>      <generate type="webdavproxy">
>         <parameter name="url" value="http://whatever/dav/{../1}"/>
>      </generate>
>      <serialize/>
>     </act>
>   </match>

I think the webdav backend is much more likely to fail than syncmetadb 
in such cases (due to insufficient authorizations etc), which would 
create inconsistencies between the backend and meta db if the sync is 
done with actions.

It might be better to sync the meta db based on the results of the 
backend, in which case I'd go for a Transformer to post-process the PUT 
result from the backend, feeding sql code to a downstream 
SQLTransformer:

PUT operation:
WebdavProxy -> DaslTransformer -> SQLTransformer

I haven't studied all details yet, but hopefully I'm not too far from 
reality ;-)

-Bertrand


Re: WebDAV proxy available

Posted by Gianugo Rabellino <gi...@apache.org>.
Bertrand Delacretaz wrote:

>> Anyone willing to tackle DASL searches now? :-)
> 
> 
> I will have a look.
> 
> IIUC the next step would be to implement a ProxyWithHooks that would 
> allow requests to be pre- or post-processed, to be able to manipulate 
> requests to the webdav backend and their results, right?

The hooks should actually be in the pipeline, IMO. What we need is a 
ProxyHttpTransformer or even just some actions (waiting for the 
upcoming object model dav adapter), so that:

<match pattern="dav/**">
   <match type="request-method" pattern="SEARCH">
      <generate type="request"/>
      <transform src="dasl2sql.xsl"/>
      <transform type="sql"/>
      <transform src="sql2propfind.xsl"/>
      <serialize/>
    </match>
   <match type="request-method" pattern="PUT">
     <act type="syncmetadatadb">
      <generate type="webdavproxy">
         <parameter name="url" value="http://whatever/dav/{../1}"/>
      </generate>
      <serialize/>
     </act>
   </match>
   <match type="request-method" pattern="DELETE">
     <act type="syncmetadatadb">
      <generate type="webdavproxy">
         <parameter name="url" value="http://whatever/dav/{../1}"/>
      </generate>
      <serialize/>
     </act>
   </match>
   <match type="request-method" pattern="PROPFIND">
      <generate type="webdavproxy">
         <parameter name="url" value="http://whatever/dav/{../1}"/>
      </generate>
      <serialize/>
   </match>
   [...repeat...]
</match>

The only implementation issue left is whether to use a transformer, an 
action or flow to process data that has to go both to the origin server 
and to a custom processing component. Does that sound reasonable?

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)


Re: WebDAV proxy available

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Friday, 29 Aug 2003, at 19:46 Europe/Zurich, Gianugo Rabellino 
wrote:

> I have now committed to the proxy block a new ProxyGenerator that 
> takes whatever method and forwards it to an origin server specified as 
> a sitemap parameter....

Cool!

> Anyone willing to tackle DASL searches now? :-)

I will have a look.

IIUC the next step would be to implement a ProxyWithHooks that would 
allow requests to be pre- or post-processed, to be able to manipulate 
requests to the webdav backend and their results, right?

I'm not sure where to do this processing though - as webdav requests 
and responses are XML it would be nice to be able to process them in 
Cocoon pipelines; OTOH a single Generator would be more compact. Need 
to dive in to get a clearer picture.

-Bertrand