Posted to dev@httpd.apache.org by Chris Knight <Ch...@nasa.gov> on 2003/05/30 02:15:13 UTC

automatic type identification (by extension) in virtual DAV repositories

So, Catacomb and other mod_dav backends that handle GET requests would 
like to have the Content-Type of resources be automatically identified 
based on the path information of that resource in the same manner as 
file resources.

How would I best approach this problem? It appears that the mod_dav.c 
hooks are defined as MIDDLE, as are the mod_mime.c hooks. Should I force 
a specific ordering of execution? The only other problem is that find_ct in 
mod_mime.c wants to see the finfo structure, which doesn't apply to 
virtual resources.

I can't call find_ct directly, nor can I get at the mime_dir_config 
information to map the extension back to a Content-Type. Ideas?

It seems to me that the mime_dir_config information should be made 
public, or at least some methods for converting extension to Content-Type 
(and vice versa?) should be.
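
A minimal sketch of what such an extension-to-type helper could look like, assuming a small hard-coded table in place of mod_mime's configured mappings. The names guess_content_type, ext_types and ext_equal are hypothetical and not part of httpd, and only the last extension of the final path segment is considered, which is a simplification:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping table; a real server would load this from its
 * MIME configuration rather than hard-code it. */
struct ext_type {
    const char *ext;
    const char *type;
};

static const struct ext_type ext_types[] = {
    { "html", "text/html" },
    { "txt",  "text/plain" },
    { "png",  "image/png" },
    { "xml",  "text/xml" },
};

/* Case-insensitive comparison of two extension strings. */
static int ext_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return the Content-Type for the last extension in `path`, or NULL when
 * the path has no extension or the extension is not in the table. */
const char *guess_content_type(const char *path)
{
    const char *base, *dot;
    size_t i;

    if (path == NULL)
        return NULL;

    /* Look only at the final path segment, so "/a.b/readme" has no extension. */
    base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;

    dot = strrchr(base, '.');
    if (dot == NULL || dot == base || dot[1] == '\0')
        return NULL;

    for (i = 0; i < sizeof(ext_types) / sizeof(ext_types[0]); i++) {
        if (ext_equal(dot + 1, ext_types[i].ext))
            return ext_types[i].type;
    }
    return NULL;
}
```

Returning NULL for an unknown extension leaves the caller free to decide whether to send a default type or no Content-Type at all.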


Re: automatic type identification (by extension) in virtual DAV repositories

Posted by Chris Knight <Ch...@nasa.gov>.
André Malo wrote:

>* Chris Knight wrote:
>
>>So, Catacomb and other mod_dav backends that handle GET requests would
>>like to have the Content-Type of resources be automatically identified
>>based on the path information of that resource in the same manner as
>>file resources.
>>
>>How would I best approach this problem? It appears that the mod_dav.c
>>hooks are defined as MIDDLE, as are the mod_mime.c hooks. Should I force
>>a specific ordering of execution? The only other problem is that find_ct in
>>mod_mime.c wants to see the finfo structure, which doesn't apply to
>>virtual resources.
>
>ModMimeUsePathInfo was created for this purpose IIRC.
>
>HTH, nd
>
But it does not function in this circumstance. I believe 
ModMimeUsePathInfo turns on calls to find_ct, but find_ct relies on 
filename, which only goes down to the <Location> (the root of the 
Catacomb server). I could hack the filename if I can get to it before 
find_ct is called (as detailed in my second paragraph), but that's 
certainly a poor solution.


Re: automatic type identification (by extension) in virtual DAV repositories

Posted by Chris Knight <Ch...@nasa.gov>.
Justin Erenkrantz wrote:

> --On Friday, May 30, 2003 2:34 AM +0200 André Malo <nd...@perlig.de> wrote:
>
>> ModMimeUsePathInfo was created for this purpose IIRC.
>
>
> Indeed.  The only caveat is that ModMimeUsePathInfo shouldn't 
> necessarily be enabled on resources that you edit.  The problem comes 
> into play when filters are applied and the transmitted representation 
> differs from the on-disk (in-storage) resource.  (This is actually a 
> major collision with REST that WebDAV never quite solved cleanly - it 
> just assumes that the resource == representation.)

Note that Content-Type identification would only come into play when the 
Content-Type was not explicitly identified by the original PUT (as 
happens with Win32's Web Folders client, the Cadaver client, and 
surely many others). Would it be better, instead, to not send a 
Content-Type header at all? (It appears that find_ct defaults to 
text/plain if it's unable to determine the type by extension; should it 
instead do nothing?)

> The best example is with mod_include'd files.  With ModMimeUsePathInfo 
> On, there is no way to disable the output filters when communicating 
> with a DAV client.  Therefore, the solution is to mount the repository 
> twice - once as read-only with ModMimeUsePathInfo On and another one 
> with it Off so that the DAV clients can modify without having the 
> filters executed and get the 'raw' representation.
>
> Hope that makes sense.  If not, I can try to clarify.  -- justin

Already addressed by the WebDAV spec, see:

http://asg.web.cmu.edu/rfc/rfc2518.html#sec-5.4

And, of course, this issue is peripheral to Content-Type identification.


Re: fixed -- mod_mime.c#find_ct() redux

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Thursday, June 5, 2003 10:21 AM -0700 Chris Knight 
<Ch...@nasa.gov> wrote:

> I think because content_type was already defined, it skipped calling
> ap_run_type_checker?

I'm fairly sure it is run all the time, but I'd have to look to be sure.

> This is a major issue with non-fs backends...One option would be to re-do
> the filter to use a temporary file to feed to PHP (a gross hack, I know.)
> Thanks for the details/history...

And, that's an option that was suggested and discarded.  =)

> Good news is I now have a much firmer grasp of the internal workings of
> Apache...
>
> Is there an active project to write/improve the documentation of these
> internal workings?

Just submit patches to dev@httpd.  There's some stuff in 
docs/manual/developer/ that might be a good starting point.  -- justin

fixed -- mod_mime.c#find_ct() redux

Posted by Chris Knight <Ch...@nasa.gov>.
Justin Erenkrantz wrote:

> It looks like Catacomb (0.8.0 is what I just downloaded) is doing the 
> same thing in its dav_repos_set_headers as mod_dav_svn.  That hook 
> runs after the fixups hooks, so it just trounces on the content-type 
> that mod_mime tried to set with ModMimeUsePathInfo.

Ah, good call. I've fixed Catacomb to not do this anymore unless the 
RDBMS has Content-Type defined.
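
The rule described here can be sketched roughly as follows. This is not Catacomb's actual code; choose_content_type is a hypothetical helper, and the assumption is that the backend has the stored type (possibly missing or empty) and whatever type was already set earlier in the request:

```c
#include <stddef.h>

/* Pick the Content-Type a backend should leave on the response.
 *
 * stored_type:  type recorded with the resource in the backend store,
 *               or NULL / "" when none was recorded.
 * current_type: type already set earlier in request processing
 *               (for example by mod_mime), or NULL.
 *
 * The stored type wins only when it actually exists; otherwise the
 * earlier value is left alone instead of being overwritten. */
const char *choose_content_type(const char *stored_type,
                                const char *current_type)
{
    if (stored_type != NULL && stored_type[0] != '\0')
        return stored_type;
    return current_type;
}
```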

>> In doing some debugging, with mod_dav_fs (with DEBUG_GET_HANDLER defined in
>> repos.c) I've found that find_ct is not called to identify the type. I'm
>
> Well, here, find_ct is called in the fixups stage.  So, I'm not sure 
> what's going on for you.  Some more specifics could be helpful.

I think because content_type was already defined, it skipped calling 
ap_run_type_checker?

>> assuming the inability to run PHP scripts from a DAV server is the same
>> problem. Bug or feature?
>
> Note that since PHP 4.2.3, which (I believe) introduced the PHP 
> handler for httpd-2.0 and removed the PHP filter, it's not possible to 
> do this chaining. Implementing PHP as a filter got to be a nightmare, 
> and we eventually gave up and did it as a handler instead.  One of the 
> drawbacks is that PHP can't work off of virtual repositories or 
> anything that has its own handler now.
>
> (PHP requires file-backed input into its parser which kills it for 
> Subversion and httpd-2.0.  I believe there may be some work to allow 
> PHP to take in push-based streams like Apache httpd output filters can 
> deliver - when that is ready, we can reexamine PHP as filter again and 
> determine what needs to happen on httpd's side to make it all happy - 
> there is definitely some work that has to be done to httpd as well.)

This is a major issue with non-fs backends...One option would be to 
re-do the filter to use a temporary file to feed to PHP (a gross hack, I 
know.) Thanks for the details/history...

>> Also, I would highly recommend that testing of Apache include testing the
>> GET handler in mod_dav_fs. There are other backends (Catacomb, ?) that use
>> mod_dav as their front-end and who handle GETs themselves and we're running
>> into these issues.
>
>
> httpd-test's perl-framework already has some tests for WebDAV that 
> do exactly this.  Feel free to help us expand our tests over on 
> test-dev@httpd.apache.org.  -- justin

Strangely, I am not seeing the problem with mod_dav_fs anymore...Perhaps 
I was hallucinating...I guess it "works for me". :^o

Good news is I now have a much firmer grasp of the internal workings of 
Apache...

Is there an active project to write/improve the documentation of these 
internal workings?


Re: mod_mime.c#find_ct() redux (Re: automatic type identification (by extension) in virtual DAV repositories)

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, June 4, 2003 11:33 AM -0700 Chris Knight 
<Ch...@nasa.gov> wrote:

> First, apologies for being dense with regard to ModMimeUsePathInfo; the
> issue regarding Content-Type auto-identification has nothing to do with this
> directive.

Um, that is exactly what ModMimeUsePathInfo does.  But, let me clarify a few 
things: this is only on the Content-Type sent in the response headers - not 
for the request (i.e. on a PUT without a Content-Type) - and this isn't 
mod_mime_magic which looks at the content to determine the content-type.

mod_dav_fs is a special case because it goes down to the file level rather 
than being at the virtual location layer (r->filename is correct on its own). 
So, it doesn't need ModMimeUsePathInfo.
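
As a rough illustration of the difference: for a virtual backend, r->filename stops at the mount point and the rest of the URL ends up in r->path_info, and ModMimeUsePathInfo is documented as combining the two before mod_mime looks at extensions. The standalone sketch below mimics that combination with plain C strings; mime_resource_name is a hypothetical name and this is not the mod_mime source:

```c
#include <stdlib.h>
#include <string.h>

/* Build the name whose extensions would be examined.
 *
 * filename:      the mapped filename (for a virtual repository this may
 *                only reach the mount point), may be NULL.
 * path_info:     the remaining URL path, may be NULL.
 * use_path_info: non-zero to append path_info, zero to ignore it.
 *
 * Returns a newly allocated string the caller must free(), or NULL if
 * allocation fails. */
char *mime_resource_name(const char *filename, const char *path_info,
                         int use_path_info)
{
    size_t flen = (filename != NULL) ? strlen(filename) : 0;
    size_t plen = (use_path_info && path_info != NULL) ? strlen(path_info) : 0;
    char *name = malloc(flen + plen + 1);

    if (name == NULL)
        return NULL;

    if (flen > 0)
        memcpy(name, filename, flen);
    if (plen > 0)
        memcpy(name + flen, path_info, plen);
    name[flen + plen] = '\0';
    return name;
}
```

With the flag off, a request under a virtual mount yields only the mount point, which carries no extension to map; with it on, the full path including the extension is available.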

ModMimeUsePathInfo doesn't work with mod_dav_svn right now, but that's because 
mod_dav_svn unconditionally sets the MIME content-type internally (and uses 
the wrong API to boot).  I should probably fix that, but that's 
neither here nor there for this list.

It looks like Catacomb (0.8.0 is what I just downloaded) is doing the same 
thing in its dav_repos_set_headers as mod_dav_svn.  That hook runs after the 
fixups hooks, so it just trounces on the content-type that mod_mime tried to 
set with ModMimeUsePathInfo.

> In doing some debugging, with mod_dav_fs (with DEBUG_GET_HANDLER defined in
> repos.c) I've found that find_ct is not called to identify the type. I'm

Well, here, find_ct is called in the fixups stage.  So, I'm not sure what's 
going on for you.  Some more specifics could be helpful.

> assuming the inability to run PHP scripts from a DAV server is the same
> problem. Bug or feature?

Note that since PHP 4.2.3, which (I believe) introduced the PHP handler for 
httpd-2.0 and removed the PHP filter, it's not possible to do this chaining. 
Implementing PHP as a filter got to be a nightmare, and we eventually gave up 
and did it as a handler instead.  One of the drawbacks is that PHP can't work 
off of virtual repositories or anything that has its own handler now.

(PHP requires file-backed input into its parser which kills it for Subversion 
and httpd-2.0.  I believe there may be some work to allow PHP to take in 
push-based streams like Apache httpd output filters can deliver - when that is 
ready, we can reexamine PHP as filter again and determine what needs to happen 
on httpd's side to make it all happy - there is definitely some work that has 
to be done to httpd as well.)

> Also, I would highly recommend that testing of Apache include testing the
> GET handler in mod_dav_fs. There are other backends (Catacomb, ?) that use
> mod_dav as their front-end and who handle GETs themselves and we're running
> into these issues.

httpd-test's perl-framework already has some tests for WebDAV that do 
exactly this.  Feel free to help us expand our tests over on 
test-dev@httpd.apache.org.  -- justin

mod_mime.c#find_ct() redux (Re: automatic type identification (by extension) in virtual DAV repositories)

Posted by Chris Knight <Ch...@nasa.gov>.
First, apologies for being dense with regard to ModMimeUsePathInfo; the 
issue regarding Content-Type auto-identification has nothing to do with 
this directive.

In doing some debugging, with mod_dav_fs (with DEBUG_GET_HANDLER defined 
in repos.c) I've found that find_ct is not called to identify the type. 
I'm assuming the inability to run PHP scripts from a DAV server is the 
same problem. Bug or feature?

Is there any way to at least access the extension_mappings hash?

Also, I would highly recommend that testing of Apache include testing 
the GET handler in mod_dav_fs. There are other backends (Catacomb, ?) 
that use mod_dav as their front-end and who handle GETs themselves and 
we're running into these issues.


Re: automatic type identification (by extension) in virtual DAV repositories

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Friday, May 30, 2003 2:34 AM +0200 André Malo <nd...@perlig.de> wrote:

> * Chris Knight wrote:
>
>> So, Catacomb and other mod_dav backends that handle GET requests would
>> like to have the Content-Type of resources be automatically identified
>> based on the path information of that resource in the same manner as
>> file resources.
...
>
> ModMimeUsePathInfo was created for this purpose IIRC.

Indeed.  The only caveat is that ModMimeUsePathInfo shouldn't necessarily be 
enabled on resources that you edit.  The problem comes into play when filters 
are applied and the transmitted representation differs from the on-disk 
(in-storage) resource.  (This is actually a major collision with REST that 
WebDAV never quite solved cleanly - it just assumes that the resource == 
representation.)

The best example is with mod_include'd files.  With ModMimeUsePathInfo On, 
there is no way to disable the output filters when communicating with a DAV 
client.  Therefore, the solution is to mount the repository twice - once as 
read-only with ModMimeUsePathInfo On and another one with it Off so that the 
DAV clients can modify without having the filters executed and get the 'raw' 
representation.

Hope that makes sense.  If not, I can try to clarify.  -- justin

Re: automatic type identification (by extension) in virtual DAV repositories

Posted by André Malo <nd...@perlig.de>.
* Chris Knight wrote:

> So, Catacomb and other mod_dav backends that handle GET requests would
> like to have the Content-Type of resources be automatically identified
> based on the path information of that resource in the same manner as
> file resources.
> 
> How would I best approach this problem? It appears that the mod_dav.c
> hooks are defined as MIDDLE, as are the mod_mime.c hooks. Should I force
> a specific ordering of execution? The only other problem is that find_ct in
> mod_mime.c wants to see the finfo structure, which doesn't apply to
> virtual resources.

ModMimeUsePathInfo was created for this purpose IIRC.

HTH, nd
-- 
If only engineers with a degree did the programming, we would
probably have less bad software.
We would, however, also have less good software.
                                   -- Felix von Leitner in dasr