Posted to users@httpd.apache.org by Steve Frank <la...@gmail.com> on 2005/12/30 01:07:07 UTC

[users@httpd] mod_dav Problem

Hello. I am new to the list because of the problem I am seeing on our
servers, which I explain below.

Some Info On The Problem:
We handle over 250,000 PUTs a night. Of those, about 50 usually end up with
a 204 status even though the uploaded file does not exist once the upload is
finished. This is obviously a huge problem. I was able to narrow it down: it
seems to happen only with requests that first return a 500 (the "Could not
get next bucket brigade" error). The client gets the 500 and then starts the
transfer again. Apache then responds with a 204, and the client thinks the
upload worked, even though it really didn't. Every uploaded filename is
unique, so a PUT should always return a 201. A 204 means the target already
existed when the request started, so the first request (the 500) must not
have finished when the second request (the 204) started. Some log entries
will hopefully explain this better...

+---------------------+--------+----------------------------------------+-------------+--------+-----------+---------+----------+
| requesttime         | method | requesturl                             | querystring | status | timetaken | ioinput | iooutput |
+---------------------+--------+----------------------------------------+-------------+--------+-----------+---------+----------+
| 2005-12-29 03:59:55 | PUT    | /webdav/username1/folder/file1.txt     |             |    204 |    428980 |    1200 |     1277 |
| 2005-12-29 03:58:57 | PUT    | /webdav/username1/folder/file1.txt     |             |    500 | 306521783 |     495 |     1311 |
| 2005-12-29 06:05:49 | PUT    | /webdav/username2/folder/file1.txt     |             |    204 |   2497618 |  142558 |     1277 |
| 2005-12-29 06:02:55 | PUT    | /webdav/username2/folder/file1.txt     |             |    500 | 303576082 |   96329 |     1311 |
+---------------------+--------+----------------------------------------+-------------+--------+-----------+---------+----------+

The above entries show the problem for two different users. The first
request by each user eventually times out and returns the 500. Our Timeout
is set to 300s, and timetaken (in microseconds) shows each of those requests
being killed after just over 300s. But based on requesttime, the second
request for each user starts, and finishes, before the first request has
ended. So the second request completes successfully with a 204 while the
first one is still running for some reason. When the first one finally times
out, it deletes the file. Unfortunately, the client now thinks the file is
there because the second request was successful.
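
To make the suspected ordering concrete, here is a minimal single-process
sketch in plain POSIX C. It is not mod_dav code. It only replays the sequence
described above, assuming that a request which finds the target already
present answers 204 and that the timed-out request removes the file when it
gives up. The names status_for, write_all and simulate_overlapping_puts are
made up for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* 204 if the target already exists when the request starts, else 201. */
static int status_for(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? 204 : 201;
}

/* Create or truncate path and write data; returns 0 on success. */
static int write_all(const char *path, const char *data)
{
    size_t len = strlen(data);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);
    int rc = close(fd);
    return (n == (ssize_t)len && rc == 0) ? 0 : -1;
}

/*
 * Replays the suspected interleaving (a hypothetical model, not mod_dav):
 *   1. request A starts, creates the file, then stalls
 *   2. request B (the client's retry) finds the file and overwrites it
 *   3. request A times out and rolls back by removing the file
 * Returns 1 if the file exists afterwards, 0 if it is gone, -1 on I/O error.
 */
int simulate_overlapping_puts(const char *path, int *status_a, int *status_b)
{
    struct stat st;

    assert(path != NULL && status_a != NULL && status_b != NULL);

    unlink(path);                      /* start from a clean slate */

    *status_a = status_for(path);      /* A: target is new, would be 201 */
    if (write_all(path, "partial") != 0)
        return -1;

    *status_b = status_for(path);      /* B: target exists, so 204 */
    if (write_all(path, "complete body") != 0)
        return -1;

    *status_a = 500;                   /* A: times out ... */
    unlink(path);                      /* ... and removes "its" file */

    return stat(path, &st) == 0 ? 1 : 0;
}
```

In this model the retry is told 204, yet the file is gone at the end, which
matches what the logs suggest.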

The Setup:
We have a few web servers and a few more file servers. The problem happens
on every web and file server. The file servers are mounted over NFS with
the sync option. The web servers run Apache 2.0.55 on FC3. The file
servers are mostly FC3, but two are RH9. The client software that uploads
the files will retry the upload up to 3 times, stopping as soon as it gets a
status >= 200 and < 300. I've been experimenting with different setups on
one web server.
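
For reference, the client's retry rule amounts to something like the sketch
below. This is not the real client code. The put_fn hook, upload_with_retry,
and the scripted_server stand-in are hypothetical names, and treating
"3 times" as a cap on total attempts is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport hook: performs one PUT, returns the HTTP status. */
typedef int (*put_fn)(void *ctx);

/* A status counts as success when it is in the 2xx range. */
static int is_success(int status)
{
    return status >= 200 && status < 300;
}

/*
 * Try the upload up to max_attempts times and stop at the first 2xx.
 * Returns the last status seen; *attempts_used receives the attempt count.
 */
int upload_with_retry(put_fn put, void *ctx, int max_attempts,
                      int *attempts_used)
{
    int status = 0;

    assert(put != NULL && max_attempts > 0 && attempts_used != NULL);

    *attempts_used = 0;
    while (*attempts_used < max_attempts) {
        status = put(ctx);
        (*attempts_used)++;
        if (is_success(status))
            break;
    }
    return status;
}

/* Demo-only stand-in for the network: replays a fixed list of statuses. */
struct scripted_server {
    const int *statuses;
    size_t count;
    size_t next;
};

int scripted_put(void *ctx)
{
    struct scripted_server *s = ctx;
    return s->next < s->count ? s->statuses[s->next++] : 500;
}
```

With a scripted server that answers 500 and then 204, the loop stops after
the second attempt and reports success, because a 204 is inside the 2xx
range. The client never checks whether the file actually exists.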

More Info:
With the experimental webserver, I have tried apache 2.0.46 (it is pre
bitbucket). I have tried turning KeepAlive off. I have changed the Timeout.
None of that has done anything.

At the end of the dav_method_put function, I added code that actually checks
the existence of the file that was uploaded, and also checks the size if it
does exist.

So right before the return, I added:

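    /* Needs <sys/stat.h>. total_written is the byte count tracked in the
     * PUT loop; it is cast to apr_off_t so the format matches. */
    struct stat statinfo;

    if (stat(r->filename, &statinfo) != 0) {
        /* The target is gone even though the PUT appeared to succeed. */
        err = dav_new_error(r->pool, HTTP_NOT_FOUND, 0,
                            apr_psprintf(r->pool,
                                         "File Not Found After PUT: %s",
                                         r->filename));

        return dav_handle_err(r, err, NULL);

    } else {
        /* THIS SECTION HAS NEVER BEEN NEEDED */
        if ((apr_off_t)statinfo.st_size != (apr_off_t)total_written) {
            /* Pass the format string directly instead of a pre-built
             * message, so a '%' in the filename is never interpreted. */
            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
                          "Invalid PUT: %s (WRITTEN: %" APR_OFF_T_FMT
                          ", SIZE: %" APR_OFF_T_FMT ")",
                          r->filename, (apr_off_t)total_written,
                          (apr_off_t)statinfo.st_size);
        } else {
            ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                          "Successful PUT: %s (WRITTEN: %" APR_OFF_T_FMT ")",
                          r->filename, (apr_off_t)total_written);
        }
    }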


Unfortunately, this has only decreased the number of lost files, but not
eliminated it. I would also recommend that something like this is added to
the actual mod_dav code.

The only thing I can think of is that the process handling the request that
ends in the 500 somehow stays alive until after the 204 request has finished,
and then deletes the file. Also, the fact that a 204 is returned means the
second request was overwriting the first request's version of the file, so
the first request had not finished when the second one started.

Thoughts???