Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2017/11/10 17:00:03 UTC

[jira] [Comment Edited] (LUCENE-8048) Filesystems do not guarantee order of directories updates

    [ https://issues.apache.org/jira/browse/LUCENE-8048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247771#comment-16247771 ] 

Uwe Schindler edited comment on LUCENE-8048 at 11/10/17 4:59 PM:
-----------------------------------------------------------------

The guarantees about fsync are weak anyway (as Robert said). On top of that, fsyncing directory metadata is a hack: the Java API offers no official way to do it, so you need a workaround with a file handle. It works in our testing, though ([~mikemccand] had/has a test computer with a remote power switch to stress all this for weeks). The directory sync is at least documented in the Linux man pages; for other operating systems it is not defined (POSIX has no standard for it).

In short:
- On Linux, fsync on the directory really works, but we only know this for the usual file systems (ext4 and xfs, I think).
- In addition, because atomic rename is a very common way to commit things in the Unix world, the kernel already does the right thing: if it sees an atomic rename, it automatically fsyncs the directory under certain conditions (read the source code). Robert, is this right? It's been a long time since I last looked at that code!
- On MacOSX/Solaris the same applies as on Linux, although there the kernel does not have this automatism. We also don't know whether fsyncing the directory is really done for all file systems: the man page does not say anything and POSIX does not define it.
- On Windows, fsync on a directory does not work at all (it is a no-op in Lucene: we have a try-catch around it with an assertion on Windows in the exception block). But Windows file systems guarantee that the directory is in a consistent state after the atomic rename (it's documented). Happens-before also works.
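
For illustration, the file-handle hack mentioned above could look roughly like the sketch below. It is not a copy of Lucene's code; the class name DirSyncSketch and the boolean return value are assumptions made for this example. It relies only on the standard FileChannel.open and FileChannel.force calls: the directory is opened read-only and force(true) is called on the handle, and a failure for a directory is treated as "not supported here" instead of an error.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class DirSyncSketch {

  /**
   * Tries to fsync a file or directory.
   * Returns true if force() succeeded, false if syncing a directory
   * is not possible on this platform (for example on Windows).
   */
  static boolean fsync(Path path, boolean isDir) throws IOException {
    // A directory can only be opened read-only; a regular file is opened
    // for writing so that force() applies to its pending writes.
    StandardOpenOption mode = isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE;
    try (FileChannel channel = FileChannel.open(path, mode)) {
      channel.force(true); // true = also flush metadata
      return true;
    } catch (IOException e) {
      if (isDir) {
        // Best effort: the platform may not allow opening or syncing a directory.
        return false;
      }
      throw e; // a failed sync of a regular file is a real error
    }
  }
}
```

Whether force() on a directory handle actually reaches the disk still depends on the operating system and file system, as described in the list above.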


> Filesystems do not guarantee order of directories updates
> ---------------------------------------------------------
>
>                 Key: LUCENE-8048
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8048
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Nikolay Martynov
>
> Currently, when an index is written to disk, the following sequence of events takes place:
> * write segment file
> * sync segment file
> * write segment file
> * sync segment file
> ...
> * write list of segments
> * sync list of segments
> * rename list of segments
> * sync index directory
> This sequence leaves a window in which the system can crash after 'rename list of segments' but before 'sync index directory'. Depending on the exact filesystem implementation, this may leave the 'list of segments' visible in the directory while some of the segments are not.
> The solution is to sync the index directory after all segments have been written. [This commit|https://github.com/mar-kolya/lucene-solr/commit/58e05dd1f633ab9b02d9e6374c7fab59689ae71c] shows the idea implemented. I'm fairly certain that I didn't find all the places where this may happen.
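
The sequence from the issue description, with the extra directory sync the reporter proposes, can be sketched roughly as follows. This is a minimal illustration and not Lucene's actual commit code: the class CommitSketch, its methods, and the file names pending_segments_1 and segments_1 are assumptions chosen for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch of the commit sequence, for illustration only.
final class CommitSketch {

  // Writes a file and fsyncs its contents before closing it.
  static void writeAndSync(Path file, String content) throws IOException {
    try (FileChannel ch = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
      ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
      while (buf.hasRemaining()) {
        ch.write(buf);
      }
      ch.force(true);
    }
  }

  // Best-effort directory sync; silently does nothing where unsupported.
  static void syncDirectory(Path dir) {
    try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
      ch.force(true);
    } catch (IOException e) {
      // e.g. Windows: directories cannot be opened or synced this way
    }
  }

  static void commit(Path indexDir, List<String> segmentNames) throws IOException {
    // write segment file, sync segment file (repeated per segment)
    for (String name : segmentNames) {
      writeAndSync(indexDir.resolve(name), "data for " + name);
    }
    // Proposed extra step: make the segment directory entries durable
    // before the list of segments can become visible.
    syncDirectory(indexDir);

    // write list of segments, sync list of segments
    Path pending = indexDir.resolve("pending_segments_1");
    writeAndSync(pending, String.join("\n", segmentNames));

    // rename list of segments, sync index directory
    Files.move(pending, indexDir.resolve("segments_1"), StandardCopyOption.ATOMIC_MOVE);
    syncDirectory(indexDir);
  }
}
```

Without the first syncDirectory call, the sketch corresponds to the current sequence listed in the description.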


