Posted to derby-dev@db.apache.org by Joseph Grace <oc...@serv.net> on 2004/09/07 07:30:14 UTC

More on open("rws"/"rwd"), O_{D,}SYNC, metadata, and OSX JVM

Dear derby-dev:

Given the Derby issue on OSX where opening a preallocated file in "rws"
mode fails, I did some more research (Google) on "rws" metadata.  I
don't think it's quite as mysterious as it may have seemed at first
(though there are still open questions about how O_SYNC is interpreted
on OSX and other advanced operating systems).  In any case, take the
following with a grain of salt and chime in if you can with additional
corroboration or information.

I gather that metadata typically refers to the directory information
kept about a file (as opposed to the file's contents).  So, if a
database file gets updated, then its "modified time", for example,
should also be updated (in the directory entry for that file).  If
that's true, then:

     "rwd" updates just the contents, ensuring full data retrieval, but
skips all but the essential metadata (i.e., new block allocation
metadata is written, but directory timestamp updates are skipped).

     "rws" updates not only the contents but also the non-essential
metadata (directory timestamps such as "modified time" and "access
time", et al.).
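
For reference, here is a minimal Java sketch of the two modes as seen
from the API side.  The mode strings "rwd" and "rws" are the ones
java.io.RandomAccessFile accepts; the class name, method name and
scratch file are hypothetical and are not taken from Derby.  It only
shows how the modes are requested, not what the OS does underneath.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SyncModes {
    // Writes payload at offset 0 using the given RandomAccessFile mode
    // ("rw", "rwd" or "rws") and returns the resulting file length.
    static long writeWithMode(File file, String mode, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            raf.seek(0);
            raf.write(payload);   // synchronous in "rwd" and "rws" modes
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch file, removed when the JVM exits.
        File f = File.createTempFile("syncmodes", ".dat");
        f.deleteOnExit();
        byte[] payload = new byte[4096];
        System.out.println("rwd length: " + writeWithMode(f, "rwd", payload));
        System.out.println("rws length: " + writeWithMode(f, "rws", payload));
    }
}
```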

I'm not sure it's that simple (since Java most likely relies on the
underlying OS support for O_SYNC, O_DSYNC or their analogues), but at
least it narrows the focus a bit.  I get this impression from a variety
of sites, but this URL is perhaps the clearest I found:

      
http://publib16.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/fileio.htm#wq222

under "Synchronous I/O" where it mentions:

-=-
	• 	Specified by the O_DSYNC open flag. When a file is opened using the  
O_DSYNC open mode, the write () system call will not return until the  
file data and all file system meta-data required to retrieve the file  
data are both written to their permanent storage locations.
	• 	Specified by the O_SYNC open flag. In addition to items specified  
by O_DSYNC, O_SYNC specifies that the write () system call will not  
return until all file attributes relative to the I/O are written to  
their permanent storage locations, even if the attributes are not  
required to retrieve the file data.
-=-

IOW, I believe O_DSYNC should protect data integrity even though it
deliberately (for performance reasons) skips updating some of the
associated metadata.  O_SYNC is good too, but (at least according to
the above) you pay a performance penalty.  So, I think O_DSYNC may be a
worthwhile substitute for O_SYNC as long as the incidental metadata is
not critical.

Bottom line:  a production database with performance goals likely uses
O_DSYNC (since O_SYNC is overkill if you just need to protect the
data).
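
To get a feel for the penalty on a given machine, something like the
following Java sketch could be used.  It is only a rough timing loop
under assumptions of my own choosing (200 writes of 4096 bytes, a
hypothetical scratch file, hypothetical class and method names); the
numbers will depend entirely on the OS, filesystem and disk, so no
expected output is shown.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SyncCost {
    // Truncates the file, then appends `count` blocks of `size` bytes in
    // the given mode. Returns the elapsed wall-clock time in nanoseconds.
    static long timeWrites(File file, String mode, int count, int size) throws IOException {
        // Reset with a plain "rw" open so every run starts from an empty file.
        try (RandomAccessFile reset = new RandomAccessFile(file, "rw")) {
            reset.setLength(0);
        }
        byte[] block = new byte[size];
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                raf.write(block);   // each write is synchronous in "rwd"/"rws"
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("synccost", ".dat");
        f.deleteOnExit();
        for (String mode : new String[] {"rw", "rwd", "rws"}) {
            long ns = timeWrites(f, mode, 200, 4096);
            System.out.println(mode + ": " + (ns / 1000) + " us for 200 writes");
        }
    }
}
```

Comparing the "rw" figure against the other two gives the cost of
synchronous writes in general; any gap between "rwd" and "rws" is the
part attributable to the extra metadata.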

-=-

I also looked a bit into OSX for O_DSYNC in search of "rws"/"rwd"  
insights.  I downloaded the sources for Darwin (OSX's BSD  
underpinnings).  It appears that OSX only has an O_SYNC flag.  The  
Darwin code says that O_DSYNC is not supported yet.  So, in theory,  
O_DSYNC should degenerate to O_SYNC.

Unfortunately, even though I was able to look at the Darwin sources, I  
do not have the Apple Java sources to see how the flags are treated in  
OSX's JVM1.4.2.  (There is no JVM1.5 (Java Tiger) (pre)release yet, so  
I can't test against a newer version of Java (yet).)

Having said all that, the question still remains:  why does O_SYNC
behave differently from O_DSYNC in the OSX JVM (especially since only
O_SYNC exists in Darwin)?  I don't know.  The two knee-jerk hypotheses
I have are:

1.  jvm:O_SYNC is using darwin:O_SYNC, but jvm:O_DSYNC is
darwin:no_sync (that would be bad).  So, if you need O_DSYNC, you had
better use O_SYNC (which fails mysteriously when the file is
preallocated).

_or_

2.  jvm:O_DSYNC uses darwin:O_SYNC (as it should), and jvm:O_SYNC uses
darwin:O_SYNC and also synchronizes OSX metadata files like .DS_Store
and resource forks (and ends up throwing an exception under ambiguous
conditions in "rws" mode).

I don't know enough about typical O_SYNC, .DS_Store, or resource forks  
to know the answer to this mystery.

Bottom line:  I know neither whether jvm:O_DSYNC protects data on  
OSX/Java1.4.2 (as it should), nor why jvm:O_SYNC is any different than  
jvm:O_DSYNC on OSX/Java1.4.2 (especially when jvm:O_DSYNC should  
degenerate to darwin:O_SYNC since Darwin only has O_SYNC).
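
A small probe outside Derby might at least show whether the failure can
be reproduced with plain java.io calls.  The sketch below is only that,
a sketch: it assumes "preallocate" means extending the file with
setLength() before reopening it in a sync mode, which may not match
what Derby actually does, and the class name, method name, and sizes
are hypothetical.  It cannot tell the two hypotheses apart by itself;
it only reports whether each mode succeeds or throws.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class PreallocProbe {
    // Preallocates `size` bytes with a plain "rw" open, then reopens the
    // file in `mode` and writes one 512-byte block at offset 0.
    // Returns "ok" on success or "failed: <exception>" otherwise.
    static String probe(File file, String mode, long size) {
        try {
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
                raf.setLength(size);          // assumed form of preallocation
            }
            try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
                raf.write(new byte[512]);     // stays inside the preallocated region
            }
            return "ok";
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String mode : new String[] {"rwd", "rws"}) {
            File f = File.createTempFile("prealloc", ".dat");
            f.deleteOnExit();
            System.out.println(mode + ": " + probe(f, mode, 8192));
        }
    }
}
```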

Anyway, that's probably more than enough for one post.  If anyone has
insight into any of these questions (e.g., anyone from the OSX Java
team! ;-), please share.

Cheers,

= Joe =