You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Colin Patrick McCabe (JIRA)" <ji...@apache.org> on 2014/09/02 20:02:21 UTC

[jira] [Commented] (HADOOP-11029) FileSystem#Statistics uses volatile variables that must be updated on write or read calls.

    [ https://issues.apache.org/jira/browse/HADOOP-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118439#comment-14118439 ] 

Colin Patrick McCabe commented on HADOOP-11029:
-----------------------------------------------

The original title here was "LocalFS Statistics performs thread local call per byte written."  This title is misleading in a few ways.  First of all, this is a generic Filesystem.java thing, not a LocalFS thing.  Secondly, it only "performs a thread local call per byte written" if you write a byte at a time.  If you write a byte at a time, you will have other high overheads, such as system call overhead (for local FS).  Finally, I don't think ThreadLocal is the source of that locked instruction you pointed out.

That lock prefix is almost certainly coming from the fact that the fields inside Statistics are volatile, not from {{ThreadLocal#get}}.  The entire point of {{ThreadLocal}} is that it is local to a thread, so no cross-thread synchronization should be required to access it (in a good implementation).  But we do require the volatile variables so that the updated statistics can be visible to other threads calling {{Statistics#get}}

The thread-local statistics code was added to alleviate a performance problem where we were taking a lock, adding to a counter, and then releasing the lock after every write or read operation.  It was a substantial improvement on that naive implementation.  See HDFS-5276 for details.

I'm not sure if Java offers a way to avoid those volatile variables.  As I mentioned, the reason the Statistics fields are volatile is so that other threads can read their values when we're calling {{Statistics#get}}.  Is there a way to do the memory barrier only on read?  If so, I'm not aware of it.  Maybe we could cast to volatile?  Or is there a memory barrier instruction in Unsafe?  That would be the only way to avoid the volatile.

Can you put some numbers behind this?  If this is only an issue when you are writing a byte at a time, I'd be inclined to close this, since, as I mentioned, byte-at-a-time is a well-known anti-pattern with Java streams.

> FileSystem#Statistics uses volatile variables that must be updated on write or read calls.
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11029
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11029
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Gopal V
>         Attachments: local-fs-locking.png
>
>
> This code is there in the hot-path of IFile writer via RawLocalFileSystem.
> !local-fs-locking.png!
> From a preliminary glance, the lock prefix calls are coming from a threadlocal.get() within FileSystem.Statistics 
> {code}
>    /**
>      * Get or create the thread-local data associated with the current thread.
>      */
>     private StatisticsData getThreadData() {
>       StatisticsData data = threadData.get();
>       if (data == null) {
>         data = new StatisticsData(
>             new WeakReference<Thread>(Thread.currentThread()));
>         threadData.set(data);
>         synchronized(this) {
>           if (allData == null) {
>             allData = new LinkedList<StatisticsData>();
>           }
>           allData.add(data);
>         }
>       }
>       return data;
>     }
>     /**
>      * Increment the bytes read in the statistics
>      * @param newBytes the additional bytes read
>      */
>     public void incrementBytesRead(long newBytes) {
>       getThreadData().bytesRead += newBytes;
>     }
> {code}
> This is incredibly inefficient when used from FSDataOutputStream
> {code}
>     public void write(int b) throws IOException {
>       out.write(b);
>       position++;
>       if (statistics != null) {
>         statistics.incrementBytesWritten(1);
>       }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)