Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2007/09/29 19:05:53 UTC

[jira] Issue Comment Edited: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531257 ] 

rangadi edited comment on HADOOP-1942 at 9/29/07 10:04 AM:
----------------------------------------------------------------

Dhruba, this works around the locking issue with a 1 millisec sleep, which is probably ok, though better avoided IMO.

I think there is another issue w.r.t. how we use lastModificationTime and lastSyncTime. Assume each sync takes 1 millisec (might be more) and there is a steady load of more than 1000 edits per sec (quite common). Then lastSyncTime is _always_ equal to or behind lastModTime. That implies every IPC thread will run a sync (plus the newly added sleep time). 

This essentially brings us back to the same situation: the number of edits possible per sec is not much larger than the number of syncs possible per sec. I might be mistaken here, but the benchmark stats can show this.
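
To make the timing argument concrete, here is my guess at the shape of the timestamp-based check (illustrative names, not the actual patch) and why every handler ends up syncing under steady load:

{code}
// My guess at the shape of the timestamp-based check; lastModificationTime,
// lastSyncTime and syncEditsToDisk() are illustrative names, not the patch.
void logSyncIfNeeded() throws IOException {
    long modTime, syncTime;
    synchronized (this) {
        modTime = lastModificationTime;
        syncTime = lastSyncTime;
    }
    // With >1000 edits/sec and ~1 ms per sync, lastModificationTime is
    // always ahead of lastSyncTime, so every IPC handler takes this branch
    // and ends up paying for a sync (plus the newly added sleep).
    if (modTime > syncTime) {
        syncEditsToDisk();   // flush + fsync the edits stream (illustrative)
        synchronized (this) {
            lastSyncTime = System.currentTimeMillis();
        }
    }
}
{code}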

I basically like the idea of using two buffers to increase sync efficiency. I think it will be a big improvement on NNBench. The locking looks complicated because we have 3 read/write locks. I think it can be done with one simple synchronized lock, unaffected by the 'lastModTime' issue above:

{code}
synchronized void logEdit(...) {
    writeEdit(currentBuffer);
    processErrorStreams(); // etc.
}

void logSync() throws InterruptedException {
    long myGen = 0;
    long localSyncGen = 0;

    synchronized (this) {
        myGen = currentGen;

        // Wait while a sync that does not yet cover our generation is running.
        while (myGen > syncGen && isSyncRunning) {
            wait(100);
        }

        // Another thread already synced our edits; nothing to do.
        if (myGen <= syncGen) {
            return;
        }

        // Now this thread is expected to run the sync.
        localSyncGen = currentGen;
        isSyncRunning = true;
        swapBuffers();
        currentGen++;
    }

    // Sync the old (swapped-out) buffer here, outside the lock.
    // The sync can be skipped if there is no data in the old buffer.

    synchronized (this) {
        isSyncRunning = false;
        processErrorStreams(); // etc.
        syncGen = localSyncGen;
        this.notifyAll();
    }
}
{code}
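
For completeness, here is roughly the state the sketch above assumes; the field names and the buffer type are illustrative, not from the patch:

{code}
// Roughly the state the sketch above relies on; names and the use of
// java.io.ByteArrayOutputStream as the buffer type are assumptions.
private long currentGen = 0;       // generation of edits going into currentBuffer
private long syncGen = -1;         // highest generation already synced; starts behind
                                   // currentGen so the very first batch gets synced
private boolean isSyncRunning = false;

private ByteArrayOutputStream currentBuffer = new ByteArrayOutputStream();
private ByteArrayOutputStream syncBuffer = new ByteArrayOutputStream();

// Called with the monitor held: new edits keep going into currentBuffer
// while logSync() flushes the swapped-out syncBuffer to disk.
private void swapBuffers() {
    ByteArrayOutputStream tmp = currentBuffer;
    currentBuffer = syncBuffer;
    syncBuffer = tmp;
}
{code}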

Regarding processErrorStreams(): this is an error condition and usually never happens. It could be something like this:

{code}
synchronized void processErrorStreams() throws InterruptedException {
    // Wait for any in-progress sync to finish before touching the streams.
    while (isSyncRunning) {
        wait();
    }
    // Remove the error streams.
}
{code}
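
Usage would look something like this (caller and method names are made up): the edit is logged under the namesystem lock, but the sync happens after releasing it, which is what lets concurrent handlers share one sync:

{code}
// Made-up caller to show the intended usage: log the edit under the
// namesystem lock, then sync after releasing it, so concurrent handlers
// batch onto a single disk sync per buffer swap.
void createFile(String src) throws IOException, InterruptedException {
    synchronized (namesystem) {
        // update the in-memory namespace, then append the edit to currentBuffer
        editLog.logEdit(/* op, src, ... */);
    }
    // Outside the namesystem lock: either runs the sync for our generation
    // or waits until another handler's sync covers it.
    editLog.logSync();
}
{code}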


> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch
>
>
> For some typical workloads, the throughput of the namenode is bottlenecked by the rate at which transactions can be logged to the edits log. In the current code, a batching scheme means that not every transaction has to incur a sync of the edits log to disk. However, the existing batching scheme can be improved.
> One option is to keep two buffers associated with the edits file. Threads write to the primary buffer while holding the FSNamesystem lock. Then the thread releases the FSNamesystem lock, acquires a new lock called the syncLock, swaps buffers, and flushes the old buffer to the persistent store. Since the buffers are swapped, new transactions continue to get logged into the new buffer. (Of course, the new transactions cannot complete before this new buffer is synced.)
> This approach does a better job of batching syncs to disk, thus improving performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.