Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/09/25 08:04:50 UTC

[jira] Created: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Increase the concurrency of transaction logging to edits log
------------------------------------------------------------

                 Key: HADOOP-1942
                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur
             Fix For: 0.15.0


For some typical workloads, the throughput of the namenode is bottlenecked by the rate at which transactions are logged into the edits log. In the current code, a batching scheme means that not every transaction has to incur a sync of the edits log to disk. However, the existing batching scheme can be improved.

One option is to keep two buffers associated with the edits file. Threads write to the primary buffer while holding the FSNamesystem lock. Then the thread releases the FSNamesystem lock, acquires a new lock called the syncLock, swaps buffers, and flushes the old buffer to persistent storage. Since the buffers are swapped, new transactions continue to be logged into the new buffer. (Of course, the new transactions cannot complete before this new buffer is synced.)

This approach does a better job of batching syncs to disk, thus improving performance.
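A minimal Java sketch of the two-buffer idea, assuming an in-memory byte stream stands in for the edits file and its sync. The names (DoubleBufferedLog, logEdit, sync) are hypothetical, the object's own monitor plays the role of the FSNamesystem lock, and the sketch only shows the swap-then-flush step, not how a caller waits until its own edits are durable.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the two-buffer scheme; not the FSEditLog code.
class DoubleBufferedLog {
  private ByteArrayOutputStream current = new ByteArrayOutputStream(); // new edits land here
  private ByteArrayOutputStream ready = new ByteArrayOutputStream();   // buffer being flushed
  private final ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stand-in for the edits file
  private final Object syncLock = new Object(); // only one flusher at a time
  private int syncCount = 0;

  // 'this' stands in for the FSNamesystem lock: held only long enough to append.
  synchronized void logEdit(String op) {
    byte[] rec = (op + "\n").getBytes(StandardCharsets.UTF_8);
    current.write(rec, 0, rec.length);
  }

  // Swap buffers under the short lock, then flush the old buffer while
  // other threads keep appending to the new one.
  void sync() {
    synchronized (syncLock) {
      synchronized (this) {
        ByteArrayOutputStream tmp = current;
        current = ready;
        ready = tmp;
      }
      if (ready.size() == 0) {
        return; // nothing to flush
      }
      byte[] data = ready.toByteArray();
      disk.write(data, 0, data.length); // stand-in for write + fsync
      ready.reset();
      syncCount++;
    }
  }

  int syncCount() {
    synchronized (syncLock) {
      return syncCount;
    }
  }

  String persisted() {
    synchronized (syncLock) {
      return new String(disk.toByteArray(), StandardCharsets.UTF_8);
    }
  }
}
```

Three edits logged before one sync() cost a single flush, and a later sync() with an empty buffer is skipped.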



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533876 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

Yes, the following line shows nearly 18 ms per sync (179477 ms over 9823 syncs). 10 ms does not surprise me much; I wonder what was happening before. Would sync time vary much with the amount of data synced?

{noformat}
2007-10-09 19:45:37,778 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 240414 
Total time for transactions(ms): 1242 Number of syncs: 9823 SyncTimes(ms): 179477 
{noformat}
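The per-sync figures follow directly from that line: 179477 ms / 9823 syncs is about 18.3 ms per sync, and 240414 transactions / 9823 syncs is about 24.5 transactions per sync. A small hypothetical Java helper that extracts those numbers, assuming the field layout of this one log line (the line is wrapped in the archive, so the whitespace between fields is matched loosely):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives per-sync figures from a namenode statistics line.
class EditLogStats {
  private static final Pattern LINE = Pattern.compile(
      "Number of transactions: (\\d+)\\s+Total time for transactions\\(ms\\): (\\d+)"
      + "\\s+Number of syncs: (\\d+)\\s+SyncTimes\\(ms\\): (\\d+)");

  final long transactions, txTimeMs, syncs, syncTimeMs;

  EditLogStats(String logLine) {
    Matcher m = LINE.matcher(logLine);
    if (!m.find()) {
      throw new IllegalArgumentException("no statistics in: " + logLine);
    }
    transactions = Long.parseLong(m.group(1));
    txTimeMs = Long.parseLong(m.group(2));
    syncs = Long.parseLong(m.group(3));
    syncTimeMs = Long.parseLong(m.group(4));
  }

  // Average wall-clock time of one sync.
  double avgSyncMs() { return (double) syncTimeMs / syncs; }

  // Average number of transactions covered by one sync (the batch size).
  double txPerSync() { return (double) transactions / syncs; }
}
```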


> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch, transactionLogSync9.patch

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync.patch

Added statistics counters that are printed to the namenode log.
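For illustration only, a hypothetical set of counters that would produce a line in the same shape as the statistics output quoted elsewhere in this thread. The class and method names are assumptions and are not taken from transactionLogSync.patch.

```java
// Hypothetical counters; callers pass in elapsed milliseconds they measured.
class TxStats {
  private long numTransactions;
  private long totalTimeTransactionsMs;
  private long numSyncs;
  private long totalTimeSyncMs;

  synchronized void recordTransaction(long elapsedMs) {
    numTransactions++;
    totalTimeTransactionsMs += elapsedMs;
  }

  synchronized void recordSync(long elapsedMs) {
    numSyncs++;
    totalTimeSyncMs += elapsedMs;
  }

  // Formats the counters in the same order as the namenode log line.
  synchronized String summary() {
    return "Number of transactions: " + numTransactions
        + " Total time for transactions(ms): " + totalTimeTransactionsMs
        + " Number of syncs: " + numSyncs
        + " SyncTimes(ms): " + totalTimeSyncMs;
  }
}
```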

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531257 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------


Dhruba, this works around the locking issue with a 1 ms sleep, which is probably OK, though IMO it is better avoided.

I think there is another issue with how we use lastModificationTime and lastSyncTime. Assume each sync takes 1 ms (it might take more) and there is a steady load of more than 1000 edits per second (quite common). Then lastSyncTime is _always_ equal to or behind lastModTime, which implies that every IPC thread will run a sync (plus the newly added sleep time).

This essentially brings us back to the same situation: the number of edits possible per second is not much larger than the number of syncs possible per second. I might be mistaken here, but the benchmark stats can show this.

I basically like the idea of using two buffers to increase sync efficiency, and I think it will give a big improvement on NNBench. I think the locking looks complicated because we have 3 read/write locks. It can be done with one simple synchronized lock, without being affected by the 'lastModTime' issue above:

{code}
synchronized void logEdit(...) throws InterruptedException {
  writeEdit(currentBuffer);
  processErrorStreams(); // etc.
}

void logSync() throws InterruptedException {
  long myGen = 0;
  long localSyncGen = 0;

  synchronized (this) {
    myGen = currentGen;

    // Wait while another thread is syncing and our generation is not yet on disk.
    while (myGen > syncGen && isSyncRunning) {
      wait(100);
    }

    // Another thread already synced our edits.
    if (myGen <= syncGen) {
      return;
    }

    // Now this thread is expected to run the sync.
    localSyncGen = currentGen;
    isSyncRunning = true;
    swapBuffers();
    currentGen++;
  }

  // Sync the old buffer outside the lock.
  // The sync could also be skipped if there is no data in the old buffer.

  synchronized (this) {
    isSyncRunning = false;
    processErrorStreams(); // etc.
    syncGen = localSyncGen;
    this.notifyAll(); // notify on the monitor that is actually held
  }
}
{code}

Regarding processErrorStreams(): this is an error condition and should almost never happen. It could be something like this:

{code}
synchronized void processErrorStreams() throws InterruptedException {
  while (isSyncRunning) {
    wait();
  }
  // Remove the error streams.
}
{code}
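A runnable Java version of the generation-counter scheme above, under stated assumptions: buffers are in-memory lists, a list stands in for the edits file, error-stream handling is left out, and the class name GenSyncLog is hypothetical. As in the proposal, only one thread flushes at a time, and callers whose generation is already on disk return without syncing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical runnable sketch of a generation-counter logSync.
class GenSyncLog {
  private List<String> current = new ArrayList<>(); // buffer receiving new edits
  private List<String> ready = new ArrayList<>();   // buffer being flushed
  private final List<String> disk = new ArrayList<>(); // stand-in for the edits file
  private long currentGen = 1;  // generation of the buffer receiving edits
  private long syncGen = 0;     // highest generation known to be on disk
  private boolean isSyncRunning = false;
  private int syncs = 0;        // flushes that wrote at least one edit
  private int flushedTotal = 0; // total edits flushed

  synchronized void logEdit(String op) {
    current.add(op);
  }

  void logSync() throws InterruptedException {
    long localSyncGen;
    synchronized (this) {
      long myGen = currentGen;
      // Another thread is flushing an older generation: wait for it.
      while (myGen > syncGen && isSyncRunning) {
        wait(100);
      }
      if (myGen <= syncGen) {
        return; // another thread already flushed our generation
      }
      // This thread runs the sync: swap buffers while holding the lock.
      localSyncGen = currentGen;
      isSyncRunning = true;
      List<String> tmp = current;
      current = ready;
      ready = tmp;
      currentGen++;
    }

    // Flush outside the lock. isSyncRunning guarantees a single flusher,
    // and logEdit() keeps appending to the other buffer meanwhile.
    int flushed = ready.size();
    disk.addAll(ready);
    ready.clear();

    synchronized (this) {
      if (flushed > 0) {
        syncs++;
      }
      flushedTotal += flushed;
      syncGen = localSyncGen;
      isSyncRunning = false;
      notifyAll(); // wake threads waiting for this generation
    }
  }

  synchronized int syncCount() { return syncs; }

  synchronized int flushedEdits() { return flushedTotal; }
}
```

With several threads each calling logEdit() and then logSync(), every edit ends up flushed, and the number of syncs is typically well below the number of edits because one flush covers everything logged since the previous swap.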

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Priority: Blocker  (was: Major)

This performance improvement might be critical to support a 1400-node webmap cluster, hence marking it as a blocker.

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync9.patch

Fixed the findbugs warning in FSEditLog.

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533830 ] 

dhruba borthakur commented on HADOOP-1942:
------------------------------------------

My analysis of the logs shows that each sync takes about 10 ms on average. Also, while a sync is running, an average of 25-30 threads are waiting for that sync (or the next sync) to complete.

Given the above, I think the next experiment we can run is the same benchmark with the number of threads on the namenode set to 100 or so.
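A back-of-the-envelope model of these numbers, assuming each sync flushes the edits of every thread waiting on it and that sync time dominates: the edit rate is then bounded by waiting threads times syncs per second. The class and method names are hypothetical.

```java
// Upper bound on logged edits per second under the batching assumption above.
class SyncThroughput {
  static double maxEditsPerSec(double waitingThreads, double syncMs) {
    return waitingThreads * (1000.0 / syncMs); // edits per sync * syncs per second
  }

  public static void main(String[] args) {
    System.out.println(maxEditsPerSec(1, 1));    // one edit per 1 ms sync
    System.out.println(maxEditsPerSec(25, 10));  // low end of the observed waiters
    System.out.println(maxEditsPerSec(30, 10));  // high end of the observed waiters
    System.out.println(maxEditsPerSec(100, 10)); // only if 100 threads wait on every sync
  }
}
```

With 10 ms syncs, 25-30 waiters per sync bounds the rate at roughly 2500-3000 edits/s; one edit per 1 ms sync gives 1000 edits/s, the every-thread-syncs case raised earlier in the thread. The 100-thread figure of 10000 edits/s holds only if batches actually grow that large.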

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Mukund Madhugiri (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533538 ] 

Mukund Madhugiri commented on HADOOP-1942:
------------------------------------------

I ran NNBench on a 500-node cluster; here is the comparison data (TPS = transactions per second):

trunk:
CreateWrite: 924 TPS
OpenRead: 58892 TPS
Delete: 766 TPS

trunk + patch:
CreateWrite: 2520 TPS
OpenRead: 54831 TPS
Delete: 1993 TPS
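The relative change per operation follows directly from these numbers; a short Java snippet (names are illustrative) that prints the patched/trunk ratio for each NNBench operation:

```java
import java.util.Locale;

// Patched/trunk TPS ratios for the NNBench figures above.
class NNBenchCompare {
  static double ratio(double before, double after) {
    return after / before;
  }

  public static void main(String[] args) {
    String[] ops = {"CreateWrite", "OpenRead", "Delete"};
    double[] trunk = {924, 58892, 766};
    double[] patched = {2520, 54831, 1993};
    for (int i = 0; i < ops.length; i++) {
      System.out.printf(Locale.ROOT, "%-12s %.2fx%n", ops[i], ratio(trunk[i], patched[i]));
    }
  }
}
```

This gives about 2.73x for CreateWrite, 2.60x for Delete, and 0.93x (roughly a 7% drop) for OpenRead.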

[jira] Issue Comment Edited: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531257 ] 

rangadi edited comment on HADOOP-1942 at 9/29/07 10:04 AM:
----------------------------------------------------------------

Dhruba, this works around the locking issue with a 1 ms sleep, which is probably OK, though IMO it is better avoided.

I think there is another issue with how we use lastModificationTime and lastSyncTime. Assume each sync takes 1 ms (it might take more) and there is a steady load of more than 1000 edits per second (quite common). Then lastSyncTime is _always_ equal to or behind lastModTime, which implies that every IPC thread will run a sync (plus the newly added sleep time).

This essentially brings us back to the same situation: the number of edits possible per second is not much larger than the number of syncs possible per second. I might be mistaken here, but the benchmark stats can show this.

I basically like the idea of using two buffers to increase sync efficiency, and I think it will give a big improvement on NNBench. I think the locking looks complicated because we have 3 read/write locks. It can be done with one simple synchronized lock, without being affected by the 'lastModTime' issue above:

{code}
synchronized void logEdit(...) throws InterruptedException {
  writeEdit(currentBuffer);
  processErrorStreams(); // etc.
}

void logSync() throws InterruptedException {
  long myGen = 0;
  long localSyncGen = 0;

  synchronized (this) {
    myGen = currentGen;

    // Wait while another thread is syncing and our generation is not yet on disk.
    while (myGen > syncGen && isSyncRunning) {
      wait(100);
    }

    // Another thread already synced our edits.
    if (myGen <= syncGen) {
      return;
    }

    // Now this thread is expected to run the sync.
    localSyncGen = currentGen;
    isSyncRunning = true;
    swapBuffers();
    currentGen++;
  }

  // Sync the old buffer outside the lock.
  // The sync could also be skipped if there is no data in the old buffer.

  synchronized (this) {
    isSyncRunning = false;
    processErrorStreams(); // etc.
    syncGen = localSyncGen;
    this.notifyAll();
  }
}
{code}

Regarding processErrorStreams(): this is an error condition and should almost never happen. It could be something like this:

{code}
synchronized void processErrorStreams() throws InterruptedException {
  while (isSyncRunning) {
    wait();
  }
  // Remove the error streams.
}
{code}

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync2.patch

Thanks Raghu. I changed the locking slightly so that a thread does not block on the syncLock while holding the flushLock; doing so could negate most of the concurrency optimizations we are targeting. Can you please comment on this version?

I also added a unit test. This unit test is still a work in progress.

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. 

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync5.patch

Merged patch with latest trunk. I also removed the code that resets the ThreadLocal txid in logSync(). If we need to implement logSyncTillNow() in the future, we can implement that part then. I also think that measuring the time taken to write the transaction into memory (logEdit) might be helpful, especially if we later decide to compare a transaction log resident in RAM against one in NVRAM. The two calls to retrieve the system time should not add much overhead.

About removing the "synchronized (editStream)": this one makes sense, but let me ponder over it for a day.
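A small Java sketch of the logEdit timing idea, assuming an `ArrayList` stands in for the in-memory edit buffer and `System.nanoTime()` is used for the two clock reads. `TimedEditBuffer` and its methods are hypothetical names, not the patch's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this only illustrates bracketing the
// in-memory write with two clock reads and accumulating the result.
class TimedEditBuffer {
  private final List<String> buffer = new ArrayList<>(); // stands in for the in-memory edit buffer
  private long numTransactions = 0;
  private long totalLogEditNanos = 0;

  synchronized void logEdit(String op) {
    long start = System.nanoTime();                 // first clock read
    buffer.add(op);                                 // the work being measured
    totalLogEditNanos += System.nanoTime() - start; // second clock read
    numTransactions++;
  }

  synchronized long numTransactions() { return numTransactions; }

  synchronized double averageLogEditNanos() {
    return numTransactions == 0 ? 0.0 : (double) totalLogEditNanos / numTransactions;
  }

  public static void main(String[] args) {
    TimedEditBuffer log = new TimedEditBuffer();
    for (int i = 0; i < 1000; i++) {
      log.logEdit("mkdir /dir" + i);
    }
    System.out.printf("%d edits, avg %.1f ns in logEdit%n",
        log.numTransactions(), log.averageLogEditNanos());
  }
}
```

Keeping the sum and count, rather than every sample, keeps the bookkeeping to two additions per transaction, which fits the expectation that the measurement adds little overhead.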

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync4.patch

Incorporated review comments from Raghu.

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533248 ] 

Hadoop QA commented on HADOOP-1942:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12367299/transactionLogSync9.patch
against trunk revision r582867.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/906/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/906/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/906/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/906/console

This message is automatically generated.

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532312 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

Dhruba, you need to update the patch. It does not apply to trunk (because of HADOOP-1978).

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Status: Patch Available  (was: Open)

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531172 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

It looks like this patch has the same locking issue as before. Still, it would be nice to gather stats with this implementation and compare them with new stats once we fix it.

[jira] Issue Comment Edited: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532538 ] 

rangadi edited comment on HADOOP-1942 at 10/4/07 3:23 PM:
---------------------------------------------------------------

Just one more:

- close() needs to call logSyncTillNow() (within synchronized, instead of estream.flushAndSync()); otherwise it can lose data in the current buffer. Currently close is called by rollEditsLog() and PurgeEditsLog().

There could be more such minor/subtle data loss issues in the future. Apart from checksums for edit/image files, we could probably keep a couple of counters to catch at least some of these issues in the future. I will also think about them.
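A Java sketch of a close() that drains the current buffer instead of flushing a fixed one, assuming `StringBuilder`s stand in for the buffers and the edits file. `ClosableEditLog`, `drain`, and `fileContents` are hypothetical names; `logSyncTillNow` only borrows the name used in the comment above. Taking syncLock before the log's own monitor keeps the same lock order as an ordinary sync.

```java
// Hypothetical sketch; names are illustrative, not the patch's.
class ClosableEditLog {
  private StringBuilder bufCurrent = new StringBuilder(); // receives new edits
  private StringBuilder bufReady = new StringBuilder();   // being flushed
  private final StringBuilder file = new StringBuilder(); // stands in for the edits file
  private final Object syncLock = new Object();
  private boolean closed = false;

  synchronized void logEdit(String op) {
    if (closed) {
      throw new IllegalStateException("edit log is closed");
    }
    bufCurrent.append(op).append('\n');
  }

  // Flushes every edit logged so far, whichever thread logged it.
  void logSyncTillNow() { drain(false); }

  // Drains whatever buffer is current before closing, so no edit is lost.
  void close() { drain(true); }

  private void drain(boolean markClosed) {
    synchronized (syncLock) {   // same lock order as a normal sync:
      synchronized (this) {     // syncLock first, then the log itself
        if (markClosed) {
          closed = true;        // no edit can slip in after the final swap
        }
        StringBuilder tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp;
      }
      file.append(bufReady);    // stands in for write + flush + fsync
      bufReady.setLength(0);
    }
  }

  String fileContents() {
    synchronized (syncLock) {
      return file.toString();
    }
  }

  public static void main(String[] args) {
    ClosableEditLog log = new ClosableEditLog();
    log.logEdit("create /f");
    log.logSyncTillNow();
    log.logEdit("delete /f"); // only in memory at this point
    log.close();              // must still reach the file
    System.out.print(log.fileContents());
  }
}
```

Setting `closed` inside the same critical section as the final swap is the design choice that matters: an edit logged after close() either lands in the drained buffer or is rejected, never silently left behind.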

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531657 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

> I am thinking of extending your idea to remember the counter in logEdit(). 

I was thinking of the same. But logEdit() is called somewhere down the stack. Are you going to set a thread local?
But even without it, we get most of the benefit, since the time between logEdit() and entering logSync() is pretty small compared to the time to sync. But the transaction id is good as well.
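One way to answer the thread-local question, sketched in Java: logEdit() stores the transaction id it assigned in a `ThreadLocal`, so logSync(), called later from higher up the same thread's stack, knows which edit it must wait for without the id being passed around. `ThreadLocalTxidLog` and its members are hypothetical, and the "sync" is deliberately reduced to bumping a counter under the lock rather than swapping buffers and doing I/O.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch; only the ThreadLocal bookkeeping is the point here.
class ThreadLocalTxidLog {
  private long txid = 0;        // last transaction id handed out
  private long syncedTxid = 0;  // highest transaction id known to be durable
  private int physicalSyncs = 0;
  private final ThreadLocal<Long> myTransactionId = ThreadLocal.withInitial(() -> 0L);

  synchronized void logEdit(String op) {
    txid++;                     // op would be appended to the buffer here
    myTransactionId.set(txid);  // remembered for this thread only
  }

  void logSync() {
    long mine = myTransactionId.get();
    synchronized (this) {
      if (mine <= syncedTxid) {
        return;                 // an earlier sync already covered this thread's edit
      }
      // Simplification: a real log would swap buffers and do the I/O
      // outside this lock; here the sync is modeled as a counter bump.
      syncedTxid = txid;
      physicalSyncs++;
    }
  }

  synchronized int physicalSyncs() { return physicalSyncs; }

  public static void main(String[] args) throws InterruptedException {
    ThreadLocalTxidLog log = new ThreadLocalTxidLog();
    CountDownLatch bothLogged = new CountDownLatch(2);
    Runnable worker = () -> {
      log.logEdit("op by " + Thread.currentThread().getName());
      bothLogged.countDown();
      try {
        bothLogged.await();     // both edits are buffered before anyone syncs
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      log.logSync();            // the second thread to get here finds nothing to do
    };
    Thread a = new Thread(worker, "a");
    Thread b = new Thread(worker, "b");
    a.start();
    b.start();
    a.join();
    b.join();
    System.out.println("physical syncs: " + log.physicalSyncs());
  }
}
```

Because the id is captured at logEdit() time rather than at logSync() entry, a thread never waits for edits that other threads added after its own, which is the extra precision the transaction id buys over a timestamp taken when entering logSync().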

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532538 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

Just one more:

- close() needs to call logSyncTillNow() (within synchronized, instead of estream.flushAndSync()); otherwise it can lose data in the current buffer. Currently it is used by rollEditsLog() and PurgeEditsLog().

There could be more such minor/subtle data loss issues in the future. Apart from checksums for edit/image files, we could probably keep a couple of counters to catch at least some of these issues in the future. I will also think about them.
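A Java sketch of the counter idea, assuming one counter is bumped as edits enter an in-memory buffer and another as they are written out, with the two compared on close. `EditLogCounters` and its methods are hypothetical names; the comment above does not say which counters were intended.

```java
// Hypothetical consistency counters; names are illustrative only.
class EditLogCounters {
  private long editsLogged = 0;   // edits appended to a buffer by logEdit()
  private long editsFlushed = 0;  // edits written to the file by a sync

  synchronized void onLogEdit() { editsLogged++; }

  synchronized void onFlush(long editsInBuffer) { editsFlushed += editsInBuffer; }

  // Call from close(): a mismatch means buffered edits never reached the file.
  synchronized void checkOnClose() {
    if (editsLogged != editsFlushed) {
      throw new IllegalStateException(
          "edit log closed with " + (editsLogged - editsFlushed) + " unflushed edit(s)");
    }
  }

  public static void main(String[] args) {
    EditLogCounters c = new EditLogCounters();
    c.onLogEdit();
    c.onLogEdit();
    c.onLogEdit();
    c.onFlush(2);                         // e.g. close() flushed the wrong buffer
    try {
      c.checkOnClose();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // edit log closed with 1 unflushed edit(s)
    }
  }
}
```

Such a check does not prevent the loss, but it turns a silent truncation of the edits file into a visible error at the point where it happens.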

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533720 ] 

Hudson commented on HADOOP-1942:
--------------------------------

Integrated in Hadoop-Nightly #267 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/267/])

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532791 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

> The close() call does a flush too.

I am a little confused. close() always did a flush. The problem was that it always flushed {{buf1}}, so it was right only 50% of the time.
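A Java model of why always flushing {{buf1}} is right only half the time, assuming `StringBuilder`s stand in for the two buffers and the edits file. `BufferSwapBug`, `buggyClose`, `fixedClose`, and `lastEditSurvives` are hypothetical names. Each sync swaps the buffers, so after an odd number of syncs the live buffer is buf2 and the buggy close drops its contents.

```java
// Hypothetical model of the bug discussed above; names are illustrative.
class BufferSwapBug {
  private final StringBuilder buf1 = new StringBuilder();
  private final StringBuilder buf2 = new StringBuilder();
  private StringBuilder current = buf1;                   // receives new edits
  private final StringBuilder file = new StringBuilder(); // stands in for the edits file

  void logEdit(String op) { current.append(op).append('\n'); }

  // A sync swaps the buffers and flushes the one that was current.
  void sync() {
    StringBuilder ready = current;
    current = (current == buf1) ? buf2 : buf1;
    file.append(ready);
    ready.setLength(0);
  }

  // Buggy close: always flushes buf1, regardless of which buffer is current.
  void buggyClose() { file.append(buf1); buf1.setLength(0); }

  // Fixed close: flushes whichever buffer is receiving edits.
  void fixedClose() { file.append(current); current.setLength(0); }

  // Runs 'syncs' sync cycles, logs one final edit, closes, and reports
  // whether that final edit reached the file.
  static boolean lastEditSurvives(int syncs, boolean fixed) {
    BufferSwapBug log = new BufferSwapBug();
    for (int i = 0; i < syncs; i++) {
      log.logEdit("op" + i);
      log.sync();
    }
    log.logEdit("last");
    if (fixed) {
      log.fixedClose();
    } else {
      log.buggyClose();
    }
    return log.file.toString().endsWith("last\n");
  }

  public static void main(String[] args) {
    for (int syncs = 0; syncs < 4; syncs++) {
      System.out.println(syncs + " syncs: buggy=" + lastEditSurvives(syncs, false)
          + " fixed=" + lastEditSurvives(syncs, true));
    }
  }
}
```

Running it shows the buggy close keeping the final edit only after 0 and 2 syncs, while the fixed close keeps it every time.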

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync6.patch

Removed unused imports from NameNode.java

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment:     (was: transactionLogSync6.patch)

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Mukund Madhugiri (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537381 ] 

Mukund Madhugiri commented on HADOOP-1942:
------------------------------------------

Please ignore my previous comment. The run did not actually go through with 120 threads; it was run with 40 threads as well.

Here is new data from a 100 node benchmark run:

#1. TPS without this fix, 40 threads
CreateWrite: 1023
OpenRead: 46728
Rename: 1169
Delete: 962

#2. TPS with this fix, 40 threads
CreateWrite: 3533 (3 times better than #1)
OpenRead: 43243
Rename: 9090 (7 times better than #1)
Delete: 7142 (7 times better than #1)

#3. TPS with this fix, 120 threads
CreateWrite: 4138 (4 times better than #1)
OpenRead: 50251
Rename: 10097 (8 times better than #1)
Delete: 7210 (7 times better than #1)
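The speedup factors above can be recomputed directly from the posted TPS figures. This short Java sketch does only that arithmetic (`BenchmarkSpeedup` is a hypothetical name); it shows that the "N times better" figures are rounded down and that OpenRead is roughly flat across the three runs.

```java
import java.util.Locale;

// Recomputes speedups from the TPS numbers in the benchmark comment above.
class BenchmarkSpeedup {
  static double ratio(double withFix, double withoutFix) {
    return withFix / withoutFix;
  }

  public static void main(String[] args) {
    String[] ops = {"CreateWrite", "OpenRead", "Rename", "Delete"};
    double[] base = {1023, 46728, 1169, 962};     // #1: without fix, 40 threads
    double[] fix40 = {3533, 43243, 9090, 7142};   // #2: with fix, 40 threads
    double[] fix120 = {4138, 50251, 10097, 7210}; // #3: with fix, 120 threads
    for (int i = 0; i < ops.length; i++) {
      System.out.println(String.format(Locale.ROOT, "%-12s #2: %.2fx  #3: %.2fx",
          ops[i], ratio(fix40[i], base[i]), ratio(fix120[i], base[i])));
    }
  }
}
```

It prints ratios of 3.45x/4.04x for CreateWrite, 0.93x/1.08x for OpenRead, 7.78x/8.64x for Rename, and 7.42x/7.49x for Delete.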

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533596 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------


With 40 threads, the best case would be 20 times better than before; I was hoping for at least 10 times. If it is actually 40 threads, then the increase in creates/deletes is less than I would expect. I think we should get some jstacks to see where the IPC threads spend most of their time with the current config. What do the stats printed in the log say?

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch, transactionLogSync9.patch

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync8.patch

The close() call does a flush too.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Mukund Madhugiri (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534193 ] 

Mukund Madhugiri commented on HADOOP-1942:
------------------------------------------

Here is more data:

1. TPS without this fix, 40 threads: 
CreateWrite: 924
OpenRead: 58892
Rename: 1017
Delete: 766

2. TPS with this fix, 40 threads:
CreateWrite: 3489
OpenRead: 58978
Rename: 3905
Delete: 2241

3. TPS with this fix, 120 threads:
CreateWrite: 3376
OpenRead: 50343
Rename: 3907
Delete: 2520

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch, transactionLogSync9.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533223 ] 

Hadoop QA commented on HADOOP-1942:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12367167/transactionLogSync8.patch
against trunk revision r582867.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warning.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/904/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/904/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/904/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/904/console

This message is automatically generated.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync3.patch

This patch includes the locking changes that optimize writing and syncing of the edit log. It also adds statistics that track the following:

1. Number of transactions
2. Time to write these transactions to the memory buffer (average & total)
3. Number of syncs
4. Time to do these syncs (average & total)

These statistics are written to the Namenode log once every minute. They are also written to the statistics aggregator daemon if present.
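The Java sketch below shows one way the four statistics listed above might be accumulated. It is a hypothetical illustration, not the code in transactionLogSync3.patch. The class and method names (EditLogStats, recordTransaction, recordSync, summary) are assumptions. The once-a-minute logging would be driven by a timer, which is not shown.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counters for the four edit-log statistics. */
public class EditLogStats {
    private final AtomicLong numTransactions = new AtomicLong();
    private final AtomicLong totalTransactionMillis = new AtomicLong();
    private final AtomicLong numSyncs = new AtomicLong();
    private final AtomicLong totalSyncMillis = new AtomicLong();

    /** Called after an edit has been written to the in-memory buffer. */
    void recordTransaction(long elapsedMillis) {
        numTransactions.incrementAndGet();
        totalTransactionMillis.addAndGet(elapsedMillis);
    }

    /** Called after a buffer has been flushed and synced to disk. */
    void recordSync(long elapsedMillis) {
        numSyncs.incrementAndGet();
        totalSyncMillis.addAndGet(elapsedMillis);
    }

    long transactions() { return numTransactions.get(); }
    long syncs() { return numSyncs.get(); }

    static double average(long total, long count) {
        return count == 0 ? 0.0 : (double) total / count;
    }

    /** One log line; counters are read one at a time, so this is an approximate snapshot. */
    String summary() {
        long tx = numTransactions.get(), txMs = totalTransactionMillis.get();
        long syncs = numSyncs.get(), syncMs = totalSyncMillis.get();
        return String.format(Locale.ROOT,
            "Transactions: %d total %d ms avg %.2f ms; Syncs: %d total %d ms avg %.2f ms",
            tx, txMs, average(txMs, tx), syncs, syncMs, average(syncMs, syncs));
    }

    public static void main(String[] args) {
        EditLogStats stats = new EditLogStats();
        stats.recordTransaction(1);
        stats.recordTransaction(3);
        stats.recordSync(12);
        // prints "Transactions: 2 total 4 ms avg 2.00 ms; Syncs: 1 total 12 ms avg 12.00 ms"
        System.out.println(stats.summary());
    }
}
```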

This patch includes a unit test that creates 100 threads, each of which processes 1000 transactions. For this test case, the current trunk does about 95000 syncs; trunk plus this patch does about 4000 syncs. A huge improvement!

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync6.patch

This patch does the following:
1. Removed the "synchronized estreams". This exposed a bug that caused the transaction log to get corrupted.
2. EditLogOutputStream no longer implements DataOutput, which simplifies the code.
3. Swap DataOutputStreams rather than ByteOutputStreams. This fixed the bug exposed by 1 above.
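Point 3 can be sketched roughly as follows. The sketch assumes each DataOutputStream wraps its own in-memory ByteArrayOutputStream, and that the stream and its buffer are swapped as a pair. The class DoubleBuffer and its methods are hypothetical, and the patch's actual EditLogOutputStream code may differ. Keeping each stream together with its buffer means writers only ever append through the current pair, while the sync thread drains the other one. Whether this is exactly how the corruption arose is an assumption here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical double buffer that swaps (DataOutputStream, buffer) pairs as units. */
public class DoubleBuffer {
    private ByteArrayOutputStream currentBytes = new ByteArrayOutputStream();
    private DataOutputStream current = new DataOutputStream(currentBytes);
    private ByteArrayOutputStream readyBytes = new ByteArrayOutputStream();
    private DataOutputStream ready = new DataOutputStream(readyBytes);

    /** Writers append one record to the current buffer. */
    synchronized void write(byte opcode, String path) throws IOException {
        current.writeByte(opcode);
        current.writeUTF(path);
    }

    /** Swap both pairs so each stream stays with its own buffer. Caller holds the sync lock. */
    synchronized void setReadyToFlush() {
        DataOutputStream tmpStream = ready;
        ByteArrayOutputStream tmpBytes = readyBytes;
        ready = current;
        readyBytes = currentBytes;
        current = tmpStream;
        currentBytes = tmpBytes;
    }

    /** Copy the ready buffer to the target and empty it. Caller holds the sync lock. */
    void flushReady(OutputStream target) throws IOException {
        ready.flush();
        readyBytes.writeTo(target);
        readyBytes.reset();
        target.flush();
    }

    public static void main(String[] args) throws IOException {
        DoubleBuffer buf = new DoubleBuffer();
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        buf.write((byte) 1, "/a");
        buf.write((byte) 2, "/b");
        buf.setReadyToFlush();
        buf.write((byte) 3, "/c");         // goes to the other buffer while we flush
        buf.flushReady(disk);
        System.out.println(disk.size());   // prints 10: two records of 1 + 2 + 2 bytes
    }
}
```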

Thanks to Raghu for these review comments.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532313 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

FSEditLog.java :

- In logSync(): mytxid should be set to min(id.txid, txid). Otherwise, when id.txid is MAX_VALUE, the thread could stay in logSync() longer than necessary (i.e. it will always sync). This can happen when completeFile() returns false, which is quite often.
-- Another option is not to reset id.txid but to provide logSyncTillNow(), which calls logSync() with id.txid set to the current txid, if such a call is required.

- synchronized (editstream) is not required inside logEdit(). Looks like it existed before but can be removed.

- There are two calls to System.currentTimeMillis() inside editLog(). editLog() is an in-memory operation, so I don't think we need to measure it; editLog() is just like any other processing now.
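The first suggestion (clamping mytxid) is sketched below. The sketch assumes Long.MAX_VALUE is the "nothing logged" value of id.txid, and that a caller needs to sync only when its target txid is above the last synced txid. The names SyncTarget, targetTxId and needsSync are hypothetical.

```java
public class SyncTarget {
    /** Hypothetical "no transaction logged" value, e.g. after completeFile() returned false. */
    static final long NO_TRANSACTION = Long.MAX_VALUE;

    /** The txid this caller must wait for, clamped to the last txid handed out. */
    static long targetTxId(long requestedTxId, long currentTxId) {
        return Math.min(requestedTxId, currentTxId);
    }

    /** True if the caller still has to sync; false if an earlier sync already covered it. */
    static boolean needsSync(long requestedTxId, long currentTxId, long syncedTxId) {
        return targetTxId(requestedTxId, currentTxId) > syncedTxId;
    }

    public static void main(String[] args) {
        // Everything up to txid 42 is already on disk.
        // Without the clamp, MAX_VALUE > 42 would force a redundant sync.
        System.out.println(needsSync(NO_TRANSACTION, 42, 42)); // prints false
        // A caller that really logged txid 43 still has to sync.
        System.out.println(needsSync(43, 43, 42));             // prints true
    }
}
```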

I haven't looked at the Stats etc yet.


> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533597 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

> If it is actually 40, then increase in create/deletes is less than I would expect.

Assuming this is still bound by sync.


> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch, transactionLogSync9.patch

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment:     (was: transactionLogSync.patch)

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533177 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

+1 looks good to me. Thanks for the changes.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Status: Open  (was: Patch Available)

Findbugs warning.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch, transactionLogSync9.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531152 ] 

dhruba borthakur commented on HADOOP-1942:
------------------------------------------

I am still working on gathering performance numbers on a large cluster.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531858 ] 

dhruba borthakur commented on HADOOP-1942:
------------------------------------------

I would not store the TransactionId in a ThreadLocal because it forces the same thread that did the transaction to call the sync. I would return the TransactionId from logEdit() all the way up the stack, and then pass it back into logSync(). This will allow us (in future) to move to a model where the namenode can complete RPCs without holding on to a handler thread for the whole lifetime of an RPC call.
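The approach described above can be combined with the double-buffer scheme from the issue description. In that combination, logEdit() returns a transaction id that is later passed to logSync(), possibly by a different thread. The Java sketch below shows roughly how this might look. It is simplified and hypothetical:
- GroupCommitLog is not Hadoop's FSEditLog.
- A ByteArrayOutputStream stands in for the edits file.
- Flushes are coordinated with one monitor plus a syncRunning flag rather than a separate syncLock object.
- Failure handling is deliberately minimal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/** Hypothetical group-commit log: txids are returned and passed explicitly, not kept in a ThreadLocal. */
public class GroupCommitLog {
    private final OutputStream disk;   // stands in for the edits file
    private ByteArrayOutputStream current = new ByteArrayOutputStream();
    private ByteArrayOutputStream ready = new ByteArrayOutputStream();
    private long txid = 0;             // last transaction id handed out
    private long syncedTxId = 0;       // highest txid known to be flushed
    private boolean syncRunning = false;
    private long numSyncs = 0;

    GroupCommitLog(OutputStream disk) { this.disk = disk; }

    /** Buffer one record in memory and return its transaction id. */
    synchronized long logEdit(byte[] record) {
        current.write(record, 0, record.length);
        return ++txid;
    }

    /** Return once every transaction up to myTxId has been flushed. */
    void logSync(long myTxId) throws IOException {
        long syncUpTo;
        synchronized (this) {
            myTxId = Math.min(myTxId, txid);
            while (myTxId > syncedTxId && syncRunning) {
                try {
                    wait();   // another thread's flush may cover our txid
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while waiting for sync", e);
                }
            }
            if (myTxId <= syncedTxId) {
                return;       // already flushed by someone else's batch
            }
            ByteArrayOutputStream tmp = ready;   // swap: new edits go to the empty buffer
            ready = current;
            current = tmp;
            syncUpTo = txid;
            syncRunning = true;
        }
        boolean flushed = false;
        try {
            ready.writeTo(disk);   // slow I/O runs without holding the monitor
            disk.flush();          // a real log would also force to disk, e.g. FileChannel.force
            ready.reset();
            flushed = true;
        } finally {
            synchronized (this) {
                if (flushed) {     // on failure syncedTxId is not advanced; no recovery here
                    syncedTxId = syncUpTo;
                    numSyncs++;
                }
                syncRunning = false;
                notifyAll();
            }
        }
    }

    synchronized long syncs() { return numSyncs; }
    synchronized long syncedTxId() { return syncedTxId; }

    /** Each thread logs editsPerThread 4-byte records and syncs after each one. */
    static GroupCommitLog runWorkload(int threads, int editsPerThread, OutputStream disk)
            throws InterruptedException {
        GroupCommitLog log = new GroupCommitLog(disk);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < editsPerThread; i++) {
                    long id = log.logEdit(ByteBuffer.allocate(4).putInt(i).array());
                    try {
                        log.logSync(id);   // the txid travels with the call, not the thread
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        GroupCommitLog log = runWorkload(100, 1000, new ByteArrayOutputStream());
        System.out.println("transactions=" + log.syncedTxId() + " syncs=" + log.syncs());
    }
}
```

The main method follows the shape of the unit test described earlier in the thread: many threads, each logging and syncing in a loop. The number of syncs it reports depends on thread scheduling, so no particular count should be expected. It should normally be well below the number of transactions.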



> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533593 ] 

dhruba borthakur commented on HADOOP-1942:
------------------------------------------

I believe it is 40 threads.

It might give us even better performance if we repeat this benchmark with more than 40 threads.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch, transactionLogSync9.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531132 ] 

Doug Cutting commented on HADOOP-1942:
--------------------------------------

Do you have any benchmarks that demonstrate the improvement?

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533159 ] 

dhruba borthakur commented on HADOOP-1942:
------------------------------------------

The last patch I uploaded on Friday fixed the problem with flush flushing the wrong buffer. Can you please have another look at it? Thanks.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533554 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

Mukund, how many IPC threads did you use for the test? Default is 10.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch, transactionLogSync9.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531655 ] 

dhruba borthakur commented on HADOOP-1942:
------------------------------------------

I like your idea. In fact, the current code has a bug: the modificationTime used for batching is not stored per transaction. This fix should give us a lot of concurrency.

I am thinking of extending your idea so that logEdit() remembers the counter as something like a transactionId: logEdit() returns the transactionId, which is then passed into logSync(). logSync() waits until that particular transaction has been synced to disk. This allows threads that do multiple transactions to issue only one logSync().
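The transactionId extension proposed above might look roughly like this. It is a hypothetical sketch, not the committed patch: `TxIdEditLog`, `flushCount()` and `persisted()` are illustrative names, an in-memory list stands in for the edits file, and the object's own monitor stands in for the FSNamesystem lock. `logEdit()` returns an increasing id; `logSync(txId)` returns at once if that id is already durable, waits if another thread's flush is in progress, and otherwise swaps out and flushes every pending edit itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of transactionId-based syncing: logEdit() returns an id,
 * and logSync(id) blocks only until that id (or a later one) is durable.
 */
public class TxIdEditLog {
    private long lastTxId = 0;           // id handed out by the most recent logEdit()
    private long syncedTxId = 0;         // highest id known to be persisted
    private boolean syncRunning = false; // true while some thread is flushing
    private List<String> buffer = new ArrayList<>();          // edits not yet flushed
    private final List<String> editsFile = new ArrayList<>(); // stands in for the file on disk
    private int flushCount = 0;

    /** Buffers one edit under the lock and returns its transaction id. */
    public synchronized long logEdit(String record) {
        buffer.add(record);
        return ++lastTxId;
    }

    /** Returns once every edit with id <= txId has been persisted. */
    public void logSync(long txId) {
        List<String> toFlush;
        long flushUpTo;
        synchronized (this) {
            // Another thread is flushing; its batch may already include txId.
            while (syncedTxId < txId && syncRunning) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for sync", e);
                }
            }
            if (syncedTxId >= txId) {
                return; // an earlier flush covered this transaction
            }
            syncRunning = true;
            toFlush = buffer;          // swap buffers: later edits go to a fresh list
            buffer = new ArrayList<>();
            flushUpTo = lastTxId;      // every edit in toFlush has id <= flushUpTo
        }
        // The slow write happens outside the lock, so logEdit() is not blocked by it.
        synchronized (editsFile) {
            editsFile.addAll(toFlush);
        }
        synchronized (this) {
            syncedTxId = flushUpTo;
            syncRunning = false;
            flushCount++;
            notifyAll(); // wake threads waiting for their transactions
        }
    }

    public synchronized int flushCount() {
        return flushCount;
    }

    public List<String> persisted() {
        synchronized (editsFile) {
            return new ArrayList<>(editsFile);
        }
    }
}
```

A thread that performs several transactions can call `logSync()` once with its last id; a later call for an id that an earlier flush already covered returns without touching the file.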

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Status: Patch Available  (was: Open)

Thanks to Raghu for his patient review of this patch.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch, transactionLogSync8.patch

[jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532539 ] 

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

Minor: extra imports in NameNodeMetrics.java.

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch, transactionLogSync5.patch, transactionLogSync6.patch