Posted to derby-dev@db.apache.org by Joseph Grace <oc...@serv.net> on 2004/09/07 07:30:14 UTC
More on open("rws"/"rwd"), O_{D,}SYNC, metadata, and OSX JVM
Dear derby-dev:
Given the derby issue on OSX of preallocate+"rws" file open()'s
failing, I did some more research (google) on the issue of "rws"
metadata. I don't think it's quite as mysterious as it may have at
first seemed (though there are still open questions of O_SYNC
interpretation on OSX and other advanced operating systems). In any
case, take the following with a grain of salt and chime in if you can
with additional corroboration or information.
I gather that metadata typically refers to the directory information
associated with a file, as opposed to the file's contents. So, if a
database file gets updated, then its "modified time" (in the directory
entry for that file), for example, should be updated as well. If that's
true, then:
"rwd" updates just the contents ensuring full data retrieval, but
glosses over all but essential metadata (i.e., new block allocation
metadata is handled, but directory timestamp updates are skipped).
"rws" updates not only the contents, but also non-essential
metadata (directory timestamps (e.g., "modified time", "access time")
et al.).
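To make the two modes concrete, here is a minimal sketch of opening a file each way from Java. The class and method names (SyncModeDemo, writeAndReadBack) are made up for illustration and are not Derby code. It only shows how the mode strings are passed to RandomAccessFile; it cannot by itself show what the OS does underneath.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Derby.
class SyncModeDemo {
    // Opens `file` in the given mode ("rwd" or "rws"), writes `payload` at
    // offset 0, and reads it back through the same handle.
    static byte[] writeAndReadBack(File file, String mode, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            raf.write(payload);   // in "rwd"/"rws" the JVM requests a synchronous write
            raf.seek(0);
            byte[] back = new byte[payload.length];
            raf.readFully(back);
            return back;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "log record".getBytes(StandardCharsets.UTF_8);
        for (String mode : new String[] {"rwd", "rws"}) {
            File f = File.createTempFile("syncmode-" + mode + "-", ".dat");
            try {
                byte[] back = writeAndReadBack(f, mode, payload);
                System.out.println(mode + ": read back " + back.length + " bytes");
            } finally {
                f.delete();
            }
        }
    }
}
```

Whether "rwd" and "rws" end up as O_DSYNC and O_SYNC (or anything else) is up to the JVM on each platform, which is exactly the open question here.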
I'm not sure it's that simple (since Java most likely relies on the
underlying OS support for O_SYNC, O_DSYNC or their analogues) but at
least it narrows the focus a bit. I get this impression from a variety
of sites, but this URL is perhaps the clearest I found:
http://publib16.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/fileio.htm#wq222
under "Synchronous I/O" where it mentions:
-=-
• Specified by the O_DSYNC open flag. When a file is opened using the
O_DSYNC open mode, the write () system call will not return until the
file data and all file system meta-data required to retrieve the file
data are both written to their permanent storage locations.
• Specified by the O_SYNC open flag. In addition to items specified
by O_DSYNC, O_SYNC specifies that the write () system call will not
return until all file attributes relative to the I/O are written to
their permanent storage locations, even if the attributes are not
required to retrieve the file data.
-=-
IOW, I believe O_DSYNC should protect data integrity even if it
(purposely for performance reasons) avoids updating all associated
metadata. O_SYNC is good too, but (at least according to above) you
pay a performance penalty. So, I think O_DSYNC may be a worthwhile
substitute for O_SYNC as long as the incidental metadata is not
all-important.
Bottom line: a production d/b with performance goals likely uses
O_DSYNC (since O_SYNC is overkill if you just need to protect the
data).
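For comparison, the same data-only vs. data-plus-metadata choice shows up in java.nio as FileChannel.force(boolean). The RandomAccessFile javadoc describes "rws" and "rwd" as working much like force(true) and force(false) respectively, except applied to every I/O operation. A minimal sketch, with a made-up class and method name (ForceDemo, writeThenForce):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical demo class, not part of Derby.
class ForceDemo {
    // Writes `payload` at offset 0 through a plain "rw" handle, then forces
    // it to storage once. syncMetadata=false is the "rwd"-like choice
    // (content only); syncMetadata=true is the "rws"-like choice (content
    // and metadata). Returns the file size seen by the channel afterwards.
    static long writeThenForce(File file, byte[] payload, boolean syncMetadata) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            FileChannel ch = raf.getChannel();
            ByteBuffer buf = ByteBuffer.wrap(payload);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(syncMetadata);
            return ch.size();
        }
    }
}
```

This is an explicit sync after the write rather than a synchronous open mode, so it is an analogue for thinking about the trade-off, not a drop-in replacement for "rwd".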
-=-
I also looked a bit into OSX for O_DSYNC in search of "rws"/"rwd"
insights. I downloaded the sources for Darwin (OSX's BSD
underpinnings). It appears that OSX only has an O_SYNC flag. The
Darwin code says that O_DSYNC is not supported yet. So, in theory,
O_DSYNC should degenerate to O_SYNC.
Unfortunately, even though I was able to look at the Darwin sources, I
do not have the Apple Java sources to see how the flags are treated in
OSX's JVM1.4.2. (There is no JVM1.5 (Java Tiger) (pre)release yet, so
I can't test against a newer version of Java (yet).)
Having said all that, the question still remains: why does O_SYNC
behave differently from O_DSYNC in the OSX JVM (especially since only
O_SYNC exists in Darwin)? I don't know. The two knee-jerk hypotheses I
have are:
1. jvm:O_SYNC is using darwin:O_SYNC, but jvm:O_DSYNC is
darwin:no_sync (that would be bad). So, if you need O_DSYNC, you had
better use O_SYNC (which fails mysteriously when the file is preallocated).
_or_
2. jvm:O_DSYNC uses darwin:O_SYNC (as it should), and jvm:O_SYNC uses
darwin:O_SYNC and also synchronizes OSX metadata files like .DS_Store
and resource forks (and ends up taking exception under ambiguous
conditions in "rws" mode).
I don't know enough about typical O_SYNC, .DS_Store, or resource forks
to know the answer to this mystery.
Bottom line: I know neither whether jvm:O_DSYNC protects data on
OSX/Java1.4.2 (as it should), nor why jvm:O_SYNC is any different from
jvm:O_DSYNC on OSX/Java1.4.2 (especially when jvm:O_DSYNC should
degenerate to darwin:O_SYNC since Darwin only has O_SYNC).
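A small probe along the following lines might help narrow down where the two modes diverge. It is only a sketch under assumptions: preallocation here is just setLength() under "rw", which may not match what Derby actually does, and the names (PreallocProbe, probe) are made up. It reports whether a write succeeds after reopening a preallocated file in a given mode; it says nothing about whether the data really reached the disk.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical probe, not part of Derby.
class PreallocProbe {
    // Preallocates `file` to `preallocBytes` using "rw" + setLength (an
    // assumed stand-in for Derby's preallocation), reopens it in `mode`
    // ("rwd" or "rws"), and writes one 4 KB page at offset 0.
    // Returns null on success, or the IOException that was thrown.
    static IOException probe(File file, long preallocBytes, String mode) {
        try {
            try (RandomAccessFile pre = new RandomAccessFile(file, "rw")) {
                pre.setLength(preallocBytes);
            }
            try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
                raf.write(new byte[4096]);
            }
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String mode : new String[] {"rwd", "rws"}) {
            File f = File.createTempFile("prealloc-" + mode + "-", ".dat");
            try {
                IOException failure = probe(f, 1L << 20, mode);
                System.out.println(mode + ": " + (failure == null ? "ok" : "failed: " + failure));
            } finally {
                f.delete();
            }
        }
    }
}
```

If "rws" fails here on OSX/Java1.4.2 while "rwd" passes, that would at least confirm the failure is reproducible outside Derby.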
Anyway, that's probably more than enough for one post. If anyone has
insight into any of these questions (e.g., anyone from the OSX java
team! ;-), please share.
Cheers,
= Joe =