Posted to dev@lucene.apache.org by Dawid Weiss <da...@gmail.com> on 2018/07/19 09:47:12 UTC
Directory contracts: read access to a file still open for writing?
While looking at the code I came across the following in the Directory class:
* A Directory is a flat list of files. Files may be written once, when they
* are created. Once a file is created it may only be opened for read, or
* deleted. Random access is permitted both when reading and writing.
What does "Random access is permitted both when reading and writing"
mean? Specifically, IndexOutput doesn't allow seeks, and if "once a
file is created it may only be opened for read" means "ONLY after a
file is created it may be opened for read", then we should allow
directory implementations in which opening a file for reading while
an IndexOutput is still open for writing to it results in an
IOException...
We currently make an exception from the above for "segments*" files,
as shown in MockDirectoryWrapper:
    // cannot open a file for input if it's still open for
    // output, except for segments.gen and segments_N
    if (!allowReadingFilesStillOpenForWrite &&
        openFilesForWrite.contains(name) && !name.startsWith("segments")) {
and BaseDirectoryTestCase:
    try {
      IndexInput input = dir.openInput(file, newIOContext(random()));
      input.close();
    } catch (FileNotFoundException | NoSuchFileException e) {
      // ignore
    } catch (IOException e) {
      if (e.getMessage() != null &&
          e.getMessage().contains("still open for writing")) {
        // ignore
      } else {
        throw new RuntimeException(e);
      }
    }
(For the record, Solr's MockDirectoryFactory lifts this restriction
entirely and allows opening files that are still being written to.)
I understand SegmentInfos.finishCommit does an atomic rename (and dir
metadata flush) from a temporary (pending) segments file to the final
segments_X so there should be no possibility of reading or ever
accessing a partially written (or still open for writing) segments*
file.
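
To illustrate the write-then-rename pattern described above, here is a minimal sketch using only java.nio, not Lucene's own code. The class name PendingCommitSketch, the method publish, and the "pending_" prefix are assumptions made for the example. The directory metadata flush is left out because syncing a directory handle is platform dependent.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not part of Lucene: publish a file via a pending name. */
final class PendingCommitSketch {

  /**
   * Writes contents to "pending_" + finalName, forces them to storage, closes
   * the file, and only then renames it atomically to finalName. A reader that
   * looks up finalName therefore never sees a file still open for writing.
   */
  static Path publish(Path dir, String finalName, byte[] contents) throws IOException {
    Path pending = dir.resolve("pending_" + finalName); // assumed naming scheme
    Path target = dir.resolve(finalName);

    // CREATE_NEW fails if the pending file already exists (write-once).
    try (FileChannel channel = FileChannel.open(
        pending, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
      ByteBuffer buffer = ByteBuffer.wrap(contents);
      while (buffer.hasRemaining()) {
        channel.write(buffer);
      }
      channel.force(true); // flush file data and metadata before the rename
    }

    // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the file system
    // cannot perform the rename atomically.
    return Files.move(pending, target, StandardCopyOption.ATOMIC_MOVE);
  }
}
```

With this shape, the final name only ever refers to a fully written and closed file, which is why no "segments*" exception should be needed in a read-while-open-for-write check.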
Am I missing something? Are the above assumptions and exceptions a
historical heritage that can be cleaned up and the contract of the
Directory class clarified?
Dawid
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Directory contracts: read access to a file still open for writing?
Posted by Robert Muir <rc...@gmail.com>.
On Thu, Jul 19, 2018 at 5:47 AM, Dawid Weiss <da...@gmail.com> wrote:
> Am I missing something? Are the above assumptions and exceptions a
> historical heritage that can be cleaned up and the contract of the
> Directory class clarified?
I don't think so. As witnessed by segments.gen, there is some cruft in
the tests. It would be great to tighten up the tests here; it might
find a bug!
Re: Directory contracts: read access to a file still open for writing?
Posted by Robert Muir <rc...@gmail.com>.
On Thu, Jul 19, 2018 at 9:31 AM, Dawid Weiss <da...@gmail.com> wrote:
>
> I think it'd be good to clean this up and enforce a write-once and
> no-read-before-write-closed policy, as it opens doors to other
> improvements and cleanups (RAMDirectory...).
>
> I filed LUCENE-8415 to track this.
>
Big +1, thank you.
Re: Directory contracts: read access to a file still open for writing?
Posted by Dawid Weiss <da...@gmail.com>.
> I don't think this is true? It should not happen for segments_N
> because those files are only produced by atomic rename (we write a
> "pending" file first).
That's exactly right from my understanding of the code. Those
references to "segments.gen" are only used for trying to read old
indexes... but I don't think they could be read anyway (codecs).
I think it'd be good to clean this up and enforce a write-once and
no-read-before-write-closed policy, as it opens doors to other
improvements and cleanups (RAMDirectory...).
I filed LUCENE-8415 to track this.
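
For illustration only, the policy proposed above (write-once, no read before the writer is closed) could look roughly like the sketch below. It is a toy in-memory store, not Lucene's Directory API: the class StrictDirectorySketch, its Output type, and the method shapes are assumptions made for the example. Only the "still open for writing" wording of the exception message is borrowed from the test code quoted earlier in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory directory enforcing a strict contract. */
final class StrictDirectorySketch {
  private final Map<String, byte[]> closedFiles = new HashMap<>();
  private final Set<String> openForWrite = new HashSet<>();

  /** Append-only output: there is deliberately no seek method. */
  final class Output implements AutoCloseable {
    private final String name;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed;

    private Output(String name) {
      this.name = name;
    }

    void writeByte(byte b) {
      if (closed) {
        throw new IllegalStateException("output already closed: " + name);
      }
      buffer.write(b);
    }

    /** Closing is the only point at which the file becomes readable. */
    @Override
    public void close() {
      synchronized (StrictDirectorySketch.this) {
        if (closed) {
          return;
        }
        closed = true;
        openForWrite.remove(name);
        closedFiles.put(name, buffer.toByteArray());
      }
    }
  }

  /** Write-once: a name can be created exactly one time. */
  synchronized Output createOutput(String name) throws IOException {
    if (closedFiles.containsKey(name) || openForWrite.contains(name)) {
      throw new FileAlreadyExistsException(name);
    }
    openForWrite.add(name);
    return new Output(name);
  }

  /** No exception for any file name prefix, "segments" included. */
  synchronized byte[] openInput(String name) throws IOException {
    if (openForWrite.contains(name)) {
      throw new IOException("file \"" + name + "\" is still open for writing");
    }
    byte[] data = closedFiles.get(name);
    if (data == null) {
      throw new FileNotFoundException(name);
    }
    return data.clone();
  }
}
```

In this sketch, calling openInput on a name whose Output has not been closed fails with an IOException, and a second createOutput for the same name fails with FileAlreadyExistsException.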
Dawid
Re: Directory contracts: read access to a file still open for writing?
Posted by Robert Muir <rc...@gmail.com>.
On Thu, Jul 19, 2018 at 9:12 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Hi Dawid,
>
> Those docs are stale -- we removed random access writing a long time ago.
> Please fix :)
>
> Opening a file for read that is still open for writing is less well defined
> -- it certainly happens for segments_N (we stopped writing segments.gen a
> while ago), but really should not happen for any other index files, I think?
>
I don't think this is true? It should not happen for segments_N
because those files are only produced by atomic rename (we write a
"pending" file first).
Re: Directory contracts: read access to a file still open for writing?
Posted by Michael McCandless <lu...@mikemccandless.com>.
Hi Dawid,
Those docs are stale -- we removed random access writing a long time ago.
Please fix :)
Opening a file for read that is still open for writing is less well defined
-- it certainly happens for segments_N (we stopped writing segments.gen a
while ago), but really should not happen for any other index files, I think?
Mike McCandless
http://blog.mikemccandless.com