You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by tsuna <ts...@gmail.com> on 2010/06/08 19:17:50 UTC

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

On Fri, May 28, 2010 at 8:21 PM, Paul Cowan <co...@aconex.com> wrote:
> Changing this class to use ConcurrentHashMap, and completely removing
> synchronization, won't work, as it's written. The lazy-loading in
> regionCachePreFetchEnabled() (get - if null, put) won't work correctly with
> a ConcurrentHashMap without external synchronization.

I'm not sure to understand why, can you explain?

> 1) Replace with a ConcurrentHashMap, and change regionCachePreFetchEnabled()
> to do something like:
>       prefetchRegionCacheEnabled.putIfAbsent(tableName, true);
>       return prefetchRegionCacheEnabled.get(tableName);

Slide 31 of the aforementioned presentation reads "Calls putIfAbsent
every time it needs to read a value" "Unfortunately, this usage is
very common" so I think this is not a good idea.

I'm still fairly new to Java so I tend to trust what people like
Joshua Bloch write, but I can be convinced otherwise :)

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Paul Cowan <co...@aconex.com>.
On 09/06/10 15:56, tsuna wrote:
> Oh, I see.  But that's because the regionCachePrefetchEnabled method
> was unnecessarily changing the Map.  If the "read-only" methods like
> this one weren't modifying any data structure (and they don't need
> to), this problem wouldn't exist.

Yep, exactly. None of the other 3 ways need additional synchronization, 
it's just the one specific way it was implemented (hence my 'won't work, 
as it's written' originally).

> Also even though ConcurrentHashSet doesn't exist, it can be
> implemented by storing some random, unique Object instance as the
> value.  I think that's how HashSet is implemented:

Yeah, it's very easily implemented. In fact, there's a utility method to 
do this easily (which I only just learned about a little while ago, 
after sending the original email -- there's always something new to learn!):

   Collections.newSetFromMap(new ConcurrentHashMap());

which would work correctly in this instance. As I said though, if it's a 
write-rarely, read-frequently, low-number-of-entries scenario then 
CopyOnWriteArraySet will likely work just as well, if not better.

Cheers,

Paul


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by tsuna <ts...@gmail.com>.
On Tue, Jun 8, 2010 at 5:59 PM, Paul Cowan <co...@aconex.com> wrote:
> No problem. With that change, there would be a potential race condition:
>
> Thread 1                                Thread 2
> ----------                              -------------
> enter regionCachePrefetchEnabled
> call ConcurrentMap.get() - get null
>                                        enter disableRegionCachePrefetch
>                                        ConcurrentMap.put(false);
>                                        exit disableRegionCachePrefetch
> ConcurrentMap.put(true)
> exit regionCachePrefetchEnabled
>
>
> which means that after one (nominally 'read-only') call to
> regionCachePrefetchEnabled and one call to disableRegionCachePrefetch have
> completed, the value in the cache for the table is true, when it really
> should be false.

Oh, I see.  But that's because the regionCachePrefetchEnabled method
was unnecessarily changing the Map.  If the "read-only" methods like
this one weren't modifying any data structure (and they don't need
to), this problem wouldn't exist.

Also even though ConcurrentHashSet doesn't exist, it can be
implemented by storing some random, unique Object instance as the
value.  I think that's how HashSet is implemented:

public class HashSet<E> [...] {
    [...]
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    [...]
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }



-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Paul Cowan <co...@aconex.com>.
Hi tsuna,

>> Changing this class to use ConcurrentHashMap, and completely removing
>> synchronization, won't work, as it's written. The lazy-loading in
>> regionCachePreFetchEnabled() (get - if null, put) won't work correctly with
>> a ConcurrentHashMap without external synchronization.
>
> I'm not sure to understand why, can you explain?

No problem. With that change, there would be a potential race condition:

Thread 1                                Thread 2
----------                              -------------
enter regionCachePrefetchEnabled
call ConcurrentMap.get() - get null
                                         enter disableRegionCachePrefetch
                                         ConcurrentMap.put(false);
                                         exit disableRegionCachePrefetch
ConcurrentMap.put(true)
exit regionCachePrefetchEnabled


which means that after one (nominally 'read-only') call to 
regionCachePrefetchEnabled and one call to disableRegionCachePrefetch 
have completed, the value in the cache for the table is true, when it 
really should be false.

The get-nullcheck-put triad really needs to be atomic, which requires 
either external synchronization or use of some of the other features of 
ConcurrentHashMap like below.

> Slide 31 of the aforementioned presentation reads "Calls putIfAbsent
> every time it needs to read a value" "Unfortunately, this usage is
> very common" so I think this is not a good idea.
>
> I'm still fairly new to Java so I tend to trust what people like
> Joshua Bloch write, but I can be convinced otherwise :)

That's purely for performance reasons -- you're right, I should have 
suggested
    get, if (result == null) {putIfAbsent, get}
rather than
    putIfAbsent, get

Both are correct but as Josh talks about on that slide (and the 
subsequent one) it's needless contention. I would be almost certain that 
it's still faster than a synchronized block, but not as much as other 
options. Should have mentioned that.

Of the 4 options I mentioned (putIfAbsent(), get() and default return 
value if missing, CopyOnWriteArraySet, ConcurrentHashSet), the 
putIfAbsent() method is likely the slowest (given the usage pattern 
anyway), and (more importantly, in my opinion) the least clear anyway. I 
think the ConcurrentSet solution is much cleaner.

Hope that makes thing clear.

Cheers,

Paul