
Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/10/03 09:42:51 UTC

[jira] Updated: (HADOOP-1942) Increase the concurrency of transaction logging to edits log

     [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1942:
-------------------------------------

    Attachment: transactionLogSync3.patch

This patch includes the locking changes that optimize writing and syncing the edits log. It also adds statistics that track the following:

1. Number of transactions
2. Time to write these transactions to the memory buffer (average & total)
3. Number of syncs
4. Time to do these syncs (average & total)

These statistics are written to the Namenode log once every minute. They are also written to the statistics aggregator daemon if present.
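The patch's own code is not reproduced in this message. A minimal Java sketch of how statistics like these might be kept, assuming lock-free counters: each kind of event keeps a count and a cumulative duration, the average is derived as total / count, and a scheduled task prints a summary at a fixed period. The names EditLogStats, TimedCounter, record and startReporting are hypothetical, and standard output stands in for both the Namenode log and the statistics aggregator daemon.

```java
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of edits-log statistics; not the code in the patch.
public class EditLogStats {
    /** Number of events plus their cumulative duration in nanoseconds. */
    static final class TimedCounter {
        private final AtomicLong count = new AtomicLong();
        private final AtomicLong totalNanos = new AtomicLong();

        void record(long elapsedNanos) {
            // The two updates are not atomic together; a concurrent report may be off by one event.
            count.incrementAndGet();
            totalNanos.addAndGet(elapsedNanos);
        }

        long count() { return count.get(); }
        long totalNanos() { return totalNanos.get(); }

        double averageNanos() {
            long n = count.get();
            return n == 0 ? 0.0 : (double) totalNanos.get() / n;
        }
    }

    final TimedCounter transactions = new TimedCounter(); // writes to the memory buffer
    final TimedCounter syncs = new TimedCounter();        // syncs of the edits log

    String report() {
        return String.format(Locale.ROOT,
                "txns=%d txnTotalNs=%d txnAvgNs=%.1f syncs=%d syncTotalNs=%d syncAvgNs=%.1f",
                transactions.count(), transactions.totalNanos(), transactions.averageNanos(),
                syncs.count(), syncs.totalNanos(), syncs.averageNanos());
    }

    /** Prints a report every periodSeconds (60 for "once every minute"); the caller shuts it down. */
    ScheduledExecutorService startReporting(long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> System.out.println(report()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }

    public static void main(String[] args) {
        EditLogStats stats = new EditLogStats();
        stats.transactions.record(100);
        stats.transactions.record(300);
        stats.syncs.record(5000);
        // prints: txns=2 txnTotalNs=400 txnAvgNs=200.0 syncs=1 syncTotalNs=5000 syncAvgNs=5000.0
        System.out.println(stats.report());
    }
}
```

In a real logger the recorded values would come from System.nanoTime() taken around the buffer write and around the sync. Calling startReporting(60) would emit one summary line per minute.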

This patch includes a unit test that creates 100 threads, each of which processes 1000 transactions. For this test case, the current trunk does about 95000 syncs, while trunk plus this patch does about 4000 syncs. A huge improvement!
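Put in per-sync terms, using only the approximate figures from that test (100 threads, 1000 transactions each), the batching improvement works out roughly as follows:

```latex
\begin{aligned}
N_{\text{txn}} &= 100 \times 1000 = 100{,}000 \\
\text{trunk:}\quad \frac{N_{\text{txn}}}{N_{\text{sync}}} &\approx \frac{100{,}000}{95{,}000} \approx 1.05 \text{ transactions per sync} \\
\text{trunk + patch:}\quad \frac{N_{\text{txn}}}{N_{\text{sync}}} &\approx \frac{100{,}000}{4{,}000} = 25 \text{ transactions per sync} \\
\frac{95{,}000}{4{,}000} &\approx 23.75 \;\Rightarrow\; \text{roughly 24 times fewer syncs}
\end{aligned}
```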

> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch
>
>
> For some typical workloads, the throughput of the namenode is bottlenecked by the rate at which transactions can be logged to the edits log. In the current code, a batching scheme means that not every transaction has to incur a sync of the edits log to disk. However, the existing batching scheme can be improved.
> One option is to keep two buffers associated with the edits file. Threads write to the primary buffer while holding the FSNamesystem lock. Each thread then releases the FSNamesystem lock, acquires a new lock called the syncLock, swaps the buffers, and flushes the old buffer to persistent storage. Because the buffers are swapped, new transactions continue to be logged into the new buffer. (Of course, the new transactions cannot complete until this new buffer is synced.)
> This approach does a better job of batching syncs to disk, thus improving performance.
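The two-buffer scheme from the issue description can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the code in transactionLogSync3.patch: a plain Object stands in for the FSNamesystem lock, records are newline-terminated strings, and DoubleBufferedEditLog, logEdit, logSync and syncCount are hypothetical names. It also assumes a transaction-ID check that the description does not spell out: a thread whose edit was already written by another thread's flush returns without syncing, which is where the batching comes from.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two-buffer scheme; not Hadoop's FSEditLog.
public class DoubleBufferedEditLog implements AutoCloseable {
    private final Object namesystemLock = new Object(); // stand-in for the FSNamesystem lock
    private final Object syncLock = new Object();       // serializes buffer swaps and flushes

    private ByteArrayOutputStream current = new ByteArrayOutputStream(); // writers append here
    private ByteArrayOutputStream ready = new ByteArrayOutputStream();   // buffer being flushed
    private long lastTxId = 0;   // last txid appended; guarded by namesystemLock
    private long syncedTxId = 0; // last txid known to be on disk; guarded by syncLock
    private long syncCount = 0;  // number of fsyncs performed; guarded by syncLock
    private final FileOutputStream out;

    public DoubleBufferedEditLog(Path file) throws IOException {
        this.out = new FileOutputStream(file.toFile(), true); // append mode
    }

    /** Appends one record to the in-memory buffer under the namesystem lock; returns its txid. */
    public long logEdit(String op) {
        byte[] line = (op + "\n").getBytes(StandardCharsets.UTF_8);
        synchronized (namesystemLock) {
            current.write(line, 0, line.length);
            return ++lastTxId;
        }
    }

    /** Called after the namesystem lock is released; returns once txId is on disk. */
    public void logSync(long txId) throws IOException {
        synchronized (syncLock) {
            if (syncedTxId >= txId) {
                return; // another thread's flush already covered this transaction
            }
            long upTo;
            synchronized (namesystemLock) { // held only long enough to swap
                ByteArrayOutputStream filled = current;
                current = ready; // empty buffer: new edits land here during the flush
                ready = filled;
                upTo = lastTxId;
            }
            ready.writeTo(out); // the slow part runs without the namesystem lock
            out.getFD().sync(); // force the bytes to stable storage
            ready.reset();
            syncedTxId = upTo;
            syncCount++;
        }
    }

    public long syncCount() {
        synchronized (syncLock) { return syncCount; }
    }

    @Override
    public void close() throws IOException {
        synchronized (syncLock) { out.close(); } // edits never synced are dropped in this sketch
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("edits", ".log");
        int threads = 8, txnsPerThread = 100;
        try (DoubleBufferedEditLog log = new DoubleBufferedEditLog(file)) {
            List<Thread> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                Thread w = new Thread(() -> {
                    try {
                        for (int i = 0; i < txnsPerThread; i++) {
                            log.logSync(log.logEdit("op " + id + " " + i));
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                workers.add(w);
                w.start();
            }
            for (Thread w : workers) {
                w.join();
            }
            System.out.println("transactions=" + threads * txnsPerThread
                    + " records=" + Files.readAllLines(file).size()
                    + " syncs=" + log.syncCount());
        }
        Files.delete(file);
    }
}
```

The number of syncs main reports depends on thread scheduling and disk speed, so no particular figure is implied. The lock ordering matters: logSync takes syncLock and then namesystemLock, while logEdit takes only namesystemLock, so the two locks are never acquired in the opposite order and cannot deadlock.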

-- 
This message is automatically generated by JIRA.