Posted to users@httpd.apache.org by Jesus Cea <jc...@jcea.es> on 2020/02/21 00:43:15 UTC

[users@httpd] DAV doesn't flush data & rename stable before confirming to the client

As far as I understand the code, the current DAV implementation just asks
the OS to rename the ".dav" file to the real destination file.

Problem is... power failure/disk failure.

We can lose data if we have a power failure or disk failure at the wrong
time. I am experiencing this in production.

I see both cases: files missing because the ".dav" file was not
actually renamed on disk before the power failure occurred, and files
with the wrong content (in particular, truncated or partially written
files). Leaking ".dav" files is bad, but replacing a good file with a
truncated one is evil, even when we are talking about power failures,
sudden USB unplugging, etc.

I could suggest two (three) approaches:

First approach:

1. Before the rename (from ".dav" to the real filename), do a buffer+OS
flush (file descriptor flush + OS fsync). That is, be sure the file is
stable on disk before the client gets the ACK.

If this operation is considered too costly (affecting benchmarks or
whatever), please provide a configuration option and let the admin
decide whether she prefers speed or protection against data
loss/corruption.

This approach would still leak ".dav" files, and the client could get an
ACK for a file just uploaded that will be missing after a power loss,
but at least if the file is there, it is there. Not corrupted.

That is, if the file is there, it is correct. You don't get partial
files or good files replaced with bad files.
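To make the first approach concrete, here is a minimal sketch in Python
(not the actual mod_dav C code). The function name durable_replace and
its parameters are hypothetical, chosen only for illustration. It shows
the ordering I mean: write, flush the user-space buffer, fsync, and only
then rename.

```python
import os

def durable_replace(tmp_path, final_path, data):
    """Write data to tmp_path, force it to disk, then rename it over final_path.

    Hypothetical helper for illustration; names are not taken from mod_dav.
    """
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # drain the user-space buffer into the OS
        os.fsync(f.fileno())  # ask the OS to push the file data to stable storage
    # Only after the data is on disk do we rename over the real file.
    os.replace(tmp_path, final_path)
```

With this ordering, a crash before the rename leaves the old file
untouched (plus a stale ".dav" file), instead of a truncated new file
under the real name.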

Second approach:

This approach would require a durable database. The current lock
database could be reused, but I am not sure about the durability (or the
entire ACID guarantees) that Apache currently provides for it.

1. When a file is uploaded, write in the database (durable!) the ".dav"
path and the final path destination file.
2. When the upload is done, flush+sync the file, as explained in the
previous approach. ACK the client.
3. Schedule a database update to remove the record. Do not do it now, do
it in a few minutes. The idea here is to be sure that when this record
is deleted, nothing can happen to the data just uploaded. Also, you
can group database deletes for better performance.

4. If the Apache HTTP server restarts, scan the database. For each
registered ".dav" file, try to delete it. The file might not be there,
and that would be OK.

5. Delete processed records in the database.

This approach would delete stale ".dav" files left behind if the server
crashes or loses power while files are being uploaded.
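A rough sketch of steps 1, 4 and 5, again in Python and only as an
illustration. SQLite is used here purely as a stand-in for "a durable
database"; it is an assumption, not what Apache uses for its lock
database. The table name uploads and the function names are
hypothetical.

```python
import os
import sqlite3

def open_journal(db_path):
    """Open the (stand-in) durable journal and create its table if needed."""
    db = sqlite3.connect(db_path)
    db.execute("PRAGMA synchronous = FULL")  # ask SQLite to sync on commit
    db.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " tmp_path TEXT PRIMARY KEY,"
        " final_path TEXT NOT NULL)"
    )
    db.commit()
    return db

def record_upload(db, tmp_path, final_path):
    """Step 1: record the ".dav" path and the destination before uploading."""
    with db:  # commits on success
        db.execute(
            "INSERT OR REPLACE INTO uploads VALUES (?, ?)",
            (tmp_path, final_path),
        )

def recover(db):
    """Steps 4 and 5: on restart, delete registered ".dav" files, then the records."""
    removed = []
    for (tmp_path,) in db.execute("SELECT tmp_path FROM uploads").fetchall():
        try:
            os.remove(tmp_path)
            removed.append(tmp_path)
        except FileNotFoundError:
            pass  # the file might not be there, and that is OK
    with db:
        db.execute("DELETE FROM uploads")
    return removed
```

The delayed, batched delete of step 3 is left out of the sketch; it
would be a periodic DELETE of records older than a few minutes.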

Third approach:

This one would be bulletproof:

Replace in the previous approach:

2. When the upload is done, flush+sync the file, as explained in the
previous approach. Update the database and mark that record as "done".
ACK the client.

...

4. If the Apache HTTP server restarts, scan the database. For each
registered ".dav" file, try to delete it if not marked as "done". If
marked as "done", try to rename the ".dav" file to the real filename.
The file might not be there, and that would be OK.
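The recovery logic of this third approach could look roughly like the
following Python sketch. As before, SQLite is only a stand-in for a
durable database (an assumption on my side), and the table layout, the
"done" column and the function names are hypothetical.

```python
import os
import sqlite3

def create_journal(db):
    """Create the (hypothetical) journal table with a "done" flag."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " tmp_path TEXT PRIMARY KEY,"
        " final_path TEXT NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()

def mark_done(db, tmp_path):
    """Step 2: after flush+sync of the file, mark the record as done."""
    with db:
        db.execute("UPDATE uploads SET done = 1 WHERE tmp_path = ?", (tmp_path,))

def recover(db):
    """Step 4: finish renames for "done" records, delete the other ".dav" files."""
    rows = db.execute("SELECT tmp_path, final_path, done FROM uploads").fetchall()
    for tmp_path, final_path, done in rows:
        try:
            if done:
                os.replace(tmp_path, final_path)  # complete the interrupted rename
            else:
                os.remove(tmp_path)               # incomplete upload, discard it
        except FileNotFoundError:
            pass  # the file might not be there, and that is OK
    with db:
        db.execute("DELETE FROM uploads")
```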

In a POSIX filesystem, I am not sure if a "rename"+"fsync" could
guarantee stable storage. If that is the case, we would not need the
database to be sure that the file is not going to vanish at power
failure after the client was ACK'ed.
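For reference, the pattern I have in mind is sketched below in Python:
rename, then fsync the directory that contains the destination. The
Linux fsync(2) man page says an explicit fsync on the directory is
needed for the directory entry itself to reach disk, but whether this
is enough on every filesystem and hardware is exactly what I am unsure
about. The function name is hypothetical, and the sketch assumes a
POSIX system where a directory can be opened read-only and fsync'ed.

```python
import os

def rename_and_sync_dir(tmp_path, final_path):
    """Rename tmp_path to final_path, then fsync the containing directory.

    Hypothetical helper; assumes tmp_path was already flushed and fsync'ed.
    """
    os.replace(tmp_path, final_path)
    dir_path = os.path.dirname(os.path.abspath(final_path))
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # ask the OS to persist the directory entry change
    finally:
        os.close(dir_fd)
```

The ACK to the client would only be sent after this function returns.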

Some of these approaches are costly. If you think the cost is
unreasonable, please provide a configuration toggle and let the admin
choose.

Thanks.

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea@jcea.es - https://www.jcea.es/    _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea@jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz