You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by tsuna <ts...@gmail.com> on 2010/06/08 19:17:50 UTC
Re: Review Request: HBASE-2468: Improvements to prewarm META cache on
clients.
On Fri, May 28, 2010 at 8:21 PM, Paul Cowan <co...@aconex.com> wrote:
> Changing this class to use ConcurrentHashMap, and completely removing
> synchronization, won't work, as it's written. The lazy-loading in
> regionCachePreFetchEnabled() (get - if null, put) won't work correctly with
> a ConcurrentHashMap without external synchronization.
I'm not sure to understand why, can you explain?
> 1) Replace with a ConcurrentHashMap, and change regionCachePreFetchEnabled()
> to do something like:
> prefetchRegionCacheEnabled.putIfAbsent(tableName, true);
> return prefetchRegionCacheEnabled.get(tableName);
Slide 31 of the aforementioned presentation reads "Calls putIfAbsent
every time it needs to read a value" "Unfortunately, this usage is
very common" so I think this is not a good idea.
I'm still fairly new to Java so I tend to trust what people like
Joshua Bloch write, but I can be convinced otherwise :)
--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
Re: Review Request: HBASE-2468: Improvements to prewarm META cache
on clients.
Posted by Paul Cowan <co...@aconex.com>.
On 09/06/10 15:56, tsuna wrote:
> Oh, I see. But that's because the regionCachePrefetchEnabled method
> was unnecessarily changing the Map. If the "read-only" methods like
> this one weren't modifying any data structure (and they don't need
> to), this problem wouldn't exist.
Yep, exactly. None of the other 3 ways need additional synchronization,
it's just the one specific way it was implemented (hence my 'won't work,
as it's written' originally).
> Also even though ConcurrentHashSet doesn't exist, it can be
> implemented by storing some random, unique Object instance as the
> value. I think that's how HashSet is implemented:
Yeah, it's very easily implemented. In fact, there's a utility method to
do this easily (which I only just learned about a little while ago,
after sending the original email -- there's always something new to learn!):
Collections.newSetFromMap(new ConcurrentHashMap());
which would work correctly in this instance. As I said though, if it's a
write-rarely, read-frequently, low-number-of-entries scenario then
CopyOnWriteArraySet will likely work just as well, if not better.
Cheers,
Paul
Re: Review Request: HBASE-2468: Improvements to prewarm META cache on
clients.
Posted by tsuna <ts...@gmail.com>.
On Tue, Jun 8, 2010 at 5:59 PM, Paul Cowan <co...@aconex.com> wrote:
> No problem. With that change, there would be a potential race condition:
>
> Thread 1 Thread 2
> ---------- -------------
> enter regionCachePrefetchEnabled
> call ConcurrentMap.get() - get null
> enter disableRegionCachePrefetch
> ConcurrentMap.put(false);
> exit disableRegionCachePrefetch
> ConcurrentMap.put(true)
> exit regionCachePrefetchEnabled
>
>
> which means that after one (nominally 'read-only') call to
> regionCachePrefetchEnabled and one call to disableRegionCachePrefetch have
> completed, the value in the cache for the table is true, when it really
> should be false.
Oh, I see. But that's because the regionCachePrefetchEnabled method
was unnecessarily changing the Map. If the "read-only" methods like
this one weren't modifying any data structure (and they don't need
to), this problem wouldn't exist.
Also even though ConcurrentHashSet doesn't exist, it can be
implemented by storing some random, unique Object instance as the
value. I think that's how HashSet is implemented:
public class HashSet<E> [...] {
[...]
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
[...]
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
Re: Review Request: HBASE-2468: Improvements to prewarm META cache
on clients.
Posted by Paul Cowan <co...@aconex.com>.
Hi tsuna,
>> Changing this class to use ConcurrentHashMap, and completely removing
>> synchronization, won't work, as it's written. The lazy-loading in
>> regionCachePreFetchEnabled() (get - if null, put) won't work correctly with
>> a ConcurrentHashMap without external synchronization.
>
> I'm not sure to understand why, can you explain?
No problem. With that change, there would be a potential race condition:
Thread 1 Thread 2
---------- -------------
enter regionCachePrefetchEnabled
call ConcurrentMap.get() - get null
enter disableRegionCachePrefetch
ConcurrentMap.put(false);
exit disableRegionCachePrefetch
ConcurrentMap.put(true)
exit regionCachePrefetchEnabled
which means that after one (nominally 'read-only') call to
regionCachePrefetchEnabled and one call to disableRegionCachePrefetch
have completed, the value in the cache for the table is true, when it
really should be false.
The get-nullcheck-put triad really needs to be atomic, which requires
either external synchronization or use of some of the other features of
ConcurrentHashMap like below.
> Slide 31 of the aforementioned presentation reads "Calls putIfAbsent
> every time it needs to read a value" "Unfortunately, this usage is
> very common" so I think this is not a good idea.
>
> I'm still fairly new to Java so I tend to trust what people like
> Joshua Bloch write, but I can be convinced otherwise :)
That's purely for performance reasons -- you're right, I should have
suggested
get, if (result == null) {putIfAbsent, get}
rather than
putIfAbsent, get
Both are correct but as Josh talks about on that slide (and the
subsequent one) it's needless contention. I would be almost certain that
it's still faster than a synchronized block, but not as much as other
options. Should have mentioned that.
Of the 4 options I mentioned (putIfAbsent(), get() and default return
value if missing, CopyOnWriteArraySet, ConcurrentHashSet), the
putIfAbsent() method is likely the slowest (given the usage pattern
anyway), and (more importantly, in my opinion) the least clear anyway. I
think the ConcurrentSet solution is much cleaner.
Hope that makes thing clear.
Cheers,
Paul