Posted to issues@ignite.apache.org by "Ivan Veselovsky (JIRA)" <ji...@apache.org> on 2016/04/21 21:36:25 UTC

[jira] [Comment Edited] (IGNITE-2938) IgfsBackupsDualAsyncSelfTest.testAppendParentMissing and IgfsBackupsDualAsyncSelfTest.testAppendParentMissingPartially fail sometimes on master

    [ https://issues.apache.org/jira/browse/IGNITE-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252520#comment-15252520 ] 

Ivan Veselovsky edited comment on IGNITE-2938 at 4/21/16 7:36 PM:
------------------------------------------------------------------

Problem description:
The IgfsDataManager#dataBlock() method has the following structure (simplified code):
{code}
  Future<byte[]> f = dataCache.getAsync(blockKey);

  // The chained closure receives the completed get-future as f0:
  f = f.chain(f0 -> {
       byte[] result = f0.get();

       // Cache miss: read the block from the secondary file system.
       if (result == null)
          result = secondaryReader.read(...);

       // Asynchronously put the block back into the data cache.
       putSafe(blockKey, result);

       return result;
  });

  return f;
{code}

Here {{putSafe}} is a kind of asynchronous put to the data cache, executed in a special thread pool.
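For illustration, a minimal sketch of what such a {{putSafe}} might look like; the pool field {{putExec}} is a hypothetical name, not the actual Ignite identifier:
{code}
// Hypothetical sketch; putExec stands for the special thread pool.
private void putSafe(final IgfsBlockKey blockKey, final byte[] block) {
    putExec.submit(new Runnable() {
        @Override public void run() {
            // A plain put -- exactly the call that can overwrite a newer
            // block written by a concurrent append.
            dataCache.put(blockKey, block);
        }
    });
}
{code}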

This code is invoked from two places:
1) IgfsMetaManager#appendDual() -- here it reads the last incomplete block of the file in order to append data to it.
2) IgfsInputStreamImpl -- here we simply read all the blocks of the file.

The original problem observed in the tests is that the dataCache.put() operation, invoked asynchronously from putSafe(), can overwrite a new block written during an append operation.

The following statement seems to be true: only the last, incomplete block of a file can cause such a conflict. All other blocks can be written only once from the IGFS side, because the IGFS API only allows appending to a file; no random-access writes are possible. Also, the IGFS secondary file system model does not assume that a file can be modified "externally" on the secondary file system; if that happens, the file must be completely re-read, and the new file will get new IDs in all caches. So all blocks except the last are constant. But we should bear in mind that they can disappear from the data cache as a result of eviction, in which case they will be re-read from the secondary file system.

So, the following change is proposed to fix the problem:
0) Introduce a new field IgfsEntryInfo#version (exact name TBD) that uniquely identifies the file data state.
1) In the append case we may put the last incomplete block into the data cache *only* before the write lock (meta#invokeLock()) on the file has been acquired.
Namely, the pseudo-code of the putSafe task should look like this (here {{initialVersion}} is the file version captured when the block was originally read):
{code}
  try (Transaction tx = metaCache.startTx()) {
      IgfsFileInfo info = metaCache.get(fileId);

      long currentVersion = info.version();
      IgniteUuid lockId = info.lockId();

      if (currentVersion == initialVersion && lockId == null) {
          // "Put if absent" because there is no use case where we should overwrite anything here.
          dataCache.putIfAbsent(blockKey, block);
      }

      tx.commit();
  }
{code}
The {{currentVersion == initialVersion}} condition is needed because if the file has been appended several times since we read the block, the block we have is obsolete and we should drop it.
The {{lockId == null}} condition is needed because after the lock is acquired, a new block may be written at any time. Since we do "putIfAbsent", it may seem we could safely put even after the lock is acquired. But this is not the case: after the append operation has put a new block, that block may be evicted immediately, so the block is absent (null) again. We would then end up in the same wrong situation, where an old block is written over the new one.
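To make the race concrete, here is a hypothetical interleaving (thread names are illustrative) showing why "putIfAbsent" alone is not enough once the lock is held:
{code}
reader thread : getAsync(blockKey) -> null; reads old block B_old from the secondary fs
append thread : acquires the lock, appends, puts new block B_new into the data cache
eviction      : B_new is evicted from the data cache (the key is absent again)
reader thread : putIfAbsent(blockKey, B_old) succeeds -> a stale block is cached
{code}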

In the IgfsInputStreamImpl read implementation we should apply exactly the same async put logic for the last incomplete block.
(The last block index can always be easily calculated from IgfsEntryInfo: it is fileLength / blockSize if fileLength % blockSize != 0.)
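As a sketch (the helper name is hypothetical, not an actual Ignite method), that index could be computed like this:
{code}
/** Sketch: index of the last incomplete block, or -1 if the file ends exactly on a block boundary. */
static long lastIncompleteBlockIdx(long fileLen, int blockSize) {
    return fileLen % blockSize != 0 ? fileLen / blockSize : -1;
}
{code}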

When we read other blocks (all except the last incomplete one) in IgfsInputStreamImpl, we should *only* check that the file still exists; otherwise we would just cache garbage.
We don't need any version or lockId checks there, because these blocks cannot be modified (if they are modified externally, we should re-create the IGFS file completely).
So this code should look like:
{code}
  try (Transaction tx = metaCache.startTx()) {
      IgfsFileInfo info = metaCache.get(fileId);

      if (info != null) {
          IgniteUuid lockId = info.lockId();

          // Note that we don't check whether lockId is null, because these blocks must be constant:
          if (!DELETE_LOCK_ID.equals(lockId)) {
              // We cannot assert the result here, because a concurrent InputStream
              // might have performed the same put already.
              dataCache.putIfAbsent(blockKey, block);
          }
      }

      tx.commit();
  }
{code}

This logic



> IgfsBackupsDualAsyncSelfTest.testAppendParentMissing and IgfsBackupsDualAsyncSelfTest.testAppendParentMissingPartially fail sometimes on master
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-2938
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2938
>             Project: Ignite
>          Issue Type: Test
>          Components: IGFS
>    Affects Versions: 1.5.0.final
>            Reporter: Ivan Veselovsky
>            Assignee: Ivan Veselovsky
>             Fix For: 1.6
>
>
> Tests 
>   IgfsBackupsDualAsyncSelfTest.testAppendParentMissing	
>   IgfsBackupsDualAsyncSelfTest.testAppendParentMissingPartially	
> fail from time to time on master -- this needs investigation.
> It looks like this started to happen after the fix for https://issues.apache.org/jira/browse/IGNITE-1631 .
> The failure happens with probability ~1/50:
> {code}
> [18:14:50,772][INFO ][main][root] >>> Starting test: IgfsBackupsDualAsyncSelfTest#testAppendParentMissingPartially <<<
> [18:14:50,792][ERROR][main][root] Test failed.
> java.io.IOException: Inconsistent file's data block (incorrectly written?) [path=/dir/subdir/file, blockIdx=0, blockSize=128, expectedBlockSize=256, fileBlockSize=524288, fileLen=256]
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.block(IgfsInputStreamImpl.java:485)
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.blockFragmentizerSafe(IgfsInputStreamImpl.java:399)
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.readFromStore(IgfsInputStreamImpl.java:373)
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.readFully(IgfsInputStreamImpl.java:222)
> 	at org.apache.ignite.internal.processors.igfs.IgfsInputStreamImpl.readFully(IgfsInputStreamImpl.java:216)
> 	at org.apache.ignite.internal.processors.igfs.IgfsAbstractSelfTest.checkFileContent(IgfsAbstractSelfTest.java:2999)
> 	at org.apache.ignite.internal.processors.igfs.IgfsAbstractSelfTest.checkFile(IgfsAbstractSelfTest.java:2969)
> 	at org.apache.ignite.internal.processors.igfs.IgfsDualAbstractSelfTest.testAppendParentMissingPartially(IgfsDualAbstractSelfTest.java:1364)
> 	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at junit.framework.TestCase.runTest(TestCase.java:176)
> 	at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1759)
> 	at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:118)
> 	at org.apache.ignite.testframework.junits.GridAbstractTest$4.run(GridAbstractTest.java:1697)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}


