Posted to dev@httpd.apache.org by Steve Frank <la...@gmail.com> on 2005/12/28 18:18:06 UTC

mod_dav Code

Hello. I am new to the list because I needed to make some adjustments to the
mod_dav code and I'm hoping someone can confirm what I have done makes
sense.

Some Info:
We have anywhere from 250k to 1 million PUTs a night. Of those, about 50
usually end up with a 204 status even though the uploaded file does not
actually exist once the upload is finished. This is obviously a huge problem.

The Setup:
We have 4 web servers and 6 file servers. This problem happens on every web
and file server. The file servers are mounted over NFS with the sync
option. The web servers are running 2.0.55 on FC3. The file servers are
mostly FC3, but two are RH9. Keep-Alive is turned on in Apache. The client
software that uploads the files retries each upload up to 3 times, stopping
as soon as it gets a status >= 200 and < 300.
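
As a rough sketch of the client retry behaviour described above (not the
actual client code, which is not shown here): the callback type
put_attempt_fn, the scripted_attempt stub and the struct scripted context
are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: performs one PUT attempt, returns the HTTP status. */
typedef int (*put_attempt_fn)(void *ctx);

/* Retry until a 2xx status is seen or max_attempts is used up.
 * Returns the last status received (0 if no attempt was made). */
int upload_with_retries(put_attempt_fn attempt, void *ctx, int max_attempts)
{
    int status = 0;
    int i;

    assert(attempt != NULL);
    for (i = 0; i < max_attempts; i++) {
        status = attempt(ctx);
        if (status >= 200 && status < 300) {
            break;              /* 201 and 204 both count as success */
        }
    }
    return status;
}

/* Hypothetical stub that replays a fixed list of statuses, one per call. */
struct scripted {
    const int *statuses;
    int next;
};

int scripted_attempt(void *ctx)
{
    struct scripted *s = (struct scripted *)ctx;
    return s->statuses[s->next++];
}
```

With a scripted sequence of 500 followed by 204, the loop stops after the
second attempt and treats the upload as done, which is exactly the case
that loses files for us.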

More Info:
I was able to narrow down the problem. It seems to only happen with some
requests that first return a 500 (the "Could not get next bucket brigade"
error). The client gets the 500 and then starts the transfer again. Apache
then responds with a 204, and the client thinks the upload worked, even
though it really didn't. Every uploaded filename is unique, so a successful
PUT should always return a 201. Seeing a 204 instead is odd and must mean
that the first request (the 500) has not finished when the second request
(the 204) starts.

An Example For One File:
The Apache logs show that both requests (the 500 and the 204) had the same
request time, such as "2005-12-27 23:01:43", even though there's no way they
could have happened at the same time, since the 500 request took 423s and
the 204 took 133s. Also, mod_logio reported an input of 1677285 bytes for
the 500, yet 1996163 bytes for the 204.

My Solution:
At the end of the dav_method_put function, I added code that actually checks
the existence of the file that was uploaded, and also checks the size if it
does exist.

So right before the return, I added:

    struct stat statinfo;

    if (stat(r->filename, &statinfo) != 0) {
        /* The file is gone: report the PUT as failed instead of 201/204. */
        err = dav_new_error(r->pool, HTTP_NOT_FOUND, 0,
                            apr_psprintf(r->pool,
                                         "File Not Found After PUT: %s",
                                         r->filename));

        return dav_handle_err(r, err, NULL);
    } else {
        /* THIS SECTION HAS NEVER BEEN NEEDED */
        /* Casts to apr_off_t keep the arguments in step with APR_OFF_T_FMT. */
        if ((apr_off_t)statinfo.st_size != (apr_off_t)total_written) {
            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
                          "Invalid PUT: %s (WRITTEN: %" APR_OFF_T_FMT
                          ", SIZE: %" APR_OFF_T_FMT ")",
                          r->filename, (apr_off_t)total_written,
                          (apr_off_t)statinfo.st_size);
        } else {
            ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                          "Successful PUT: %s (WRITTEN: %" APR_OFF_T_FMT ")",
                          r->filename, (apr_off_t)total_written);
        }
    }
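
For anyone who wants to try the same check outside Apache, here is a minimal
standalone sketch using plain POSIX stat() instead of the request_rec. The
names verify_put, write_and_verify and enum put_check are made up for this
example and are not part of mod_dav.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

enum put_check { PUT_OK = 0, PUT_MISSING = 1, PUT_SIZE_MISMATCH = 2 };

/* Check that path exists and has exactly expected_size bytes. */
enum put_check verify_put(const char *path, off_t expected_size)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        return PUT_MISSING;             /* would map to the 404 above */
    }
    if (st.st_size != expected_size) {
        return PUT_SIZE_MISMATCH;       /* would map to "Invalid PUT" */
    }
    return PUT_OK;
}

/* Usage sketch: write a payload, then verify it the way the PUT check does. */
enum put_check write_and_verify(const char *path, const char *data, size_t len)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL) {
        return PUT_MISSING;
    }
    written = fwrite(data, 1, len, fp);
    fclose(fp);
    return verify_put(path, (off_t)written);
}
```

Note that this only tells you the state of the file at the moment of the
stat() call. It cannot see a delete that happens afterwards.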


Unfortunately, this has only decreased the number of lost files, but not
eliminated it.

The only thing I can think of is that the process handling the 500 somehow
remains alive until the end of the 204 request and then deletes the file.
Also, the fact that a 204 is being returned means the second request was
writing over the first request's version of the file, so the 500 request
had not finished when the 204 request started.
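
To make that suspected ordering concrete, here is a small single-process
sketch of it using plain POSIX calls. It is only a model of my guess, not
of what mod_dav actually does: the function names are invented, there is no
NFS or second process involved, and the step where request A removes the
file on failure is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* 201 if the target is new, 204 if it already exists. */
static int put_status_for(const char *path)
{
    struct stat st;
    return (stat(path, &st) == 0) ? 204 : 201;
}

/* Returns the status request B would report, or -1 on an I/O error.
 * *exists_after is set to 1 if the file survives, 0 if it is gone. */
int simulate_overlapping_puts(const char *path, int *exists_after)
{
    struct stat st;
    int fd_a, fd_b, status_b;

    assert(path != NULL && exists_after != NULL);
    unlink(path);                       /* start with no file present */

    /* Request A begins: the target is new, so the file gets created. */
    fd_a = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd_a < 0) {
        return -1;
    }
    if (write(fd_a, "partial", 7) != 7) {
        close(fd_a);
        return -1;
    }

    /* Request B (the client's retry) begins while A is still open. */
    status_b = put_status_for(path);    /* file exists, so 204 */
    fd_b = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd_b < 0) {
        close(fd_a);
        return -1;
    }
    if (write(fd_b, "complete upload", 15) != 15) {
        close(fd_a);
        close(fd_b);
        return -1;
    }
    close(fd_b);                        /* B reports success to the client */

    /* ASSUMPTION: A's late failure path removes the file it created. */
    close(fd_a);
    unlink(path);

    *exists_after = (stat(path, &st) == 0);
    return status_b;
}
```

In this model the retry reports 204 and the file is still gone at the end,
which matches what we see in production.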

Thanks for your help!

-Steve