You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Mingjie Lai <mj...@gmail.com> on 2010/05/28 19:31:53 UTC

Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 


This addresses bug HBASE-2468.
    http://issues.apache.org/jira/browse/HBASE-2468


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
-------

Unit tests passed locally for me. 


Thanks,

Mingjie


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Paul Cowan <co...@aconex.com>.
On 09/06/10 15:56, tsuna wrote:
> Oh, I see.  But that's because the regionCachePrefetchEnabled method
> was unnecessarily changing the Map.  If the "read-only" methods like
> this one weren't modifying any data structure (and they don't need
> to), this problem wouldn't exist.

Yep, exactly. None of the other 3 ways need additional synchronization, 
it's just the one specific way it was implemented (hence my 'won't work, 
as it's written' originally).

> Also even though ConcurrentHashSet doesn't exist, it can be
> implemented by storing some random, unique Object instance as the
> value.  I think that's how HashSet is implemented:

Yeah, it's very easily implemented. In fact, there's a utility method to 
do this easily (which I only just learned about a little while ago, 
after sending the original email -- there's always something new to learn!):

   Collections.newSetFromMap(new ConcurrentHashMap());

which would work correctly in this instance. As I said though, if it's a 
write-rarely, read-frequently, low-number-of-entries scenario then 
CopyOnWriteArraySet will likely work just as well, if not better.

Cheers,

Paul


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by tsuna <ts...@gmail.com>.
On Tue, Jun 8, 2010 at 5:59 PM, Paul Cowan <co...@aconex.com> wrote:
> No problem. With that change, there would be a potential race condition:
>
> Thread 1                                Thread 2
> ----------                              -------------
> enter regionCachePrefetchEnabled
> call ConcurrentMap.get() - get null
>                                        enter disableRegionCachePrefetch
>                                        ConcurrentMap.put(false);
>                                        exit disableRegionCachePrefetch
> ConcurrentMap.put(true)
> exit regionCachePrefetchEnabled
>
>
> which means that after one (nominally 'read-only') call to
> regionCachePrefetchEnabled and one call to disableRegionCachePrefetch have
> completed, the value in the cache for the table is true, when it really
> should be false.

Oh, I see.  But that's because the regionCachePrefetchEnabled method
was unnecessarily changing the Map.  If the "read-only" methods like
this one weren't modifying any data structure (and they don't need
to), this problem wouldn't exist.

Also even though ConcurrentHashSet doesn't exist, it can be
implemented by storing some random, unique Object instance as the
value.  I think that's how HashSet is implemented:

public class HashSet<E> [...] {
    [...]
    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    [...]
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }



-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Paul Cowan <co...@aconex.com>.
Hi tsuna,

>> Changing this class to use ConcurrentHashMap, and completely removing
>> synchronization, won't work, as it's written. The lazy-loading in
>> regionCachePreFetchEnabled() (get - if null, put) won't work correctly with
>> a ConcurrentHashMap without external synchronization.
>
> I'm not sure to understand why, can you explain?

No problem. With that change, there would be a potential race condition:

Thread 1                                Thread 2
----------                              -------------
enter regionCachePrefetchEnabled
call ConcurrentMap.get() - get null
                                         enter disableRegionCachePrefetch
                                         ConcurrentMap.put(false);
                                         exit disableRegionCachePrefetch
ConcurrentMap.put(true)
exit regionCachePrefetchEnabled


which means that after one (nominally 'read-only') call to 
regionCachePrefetchEnabled and one call to disableRegionCachePrefetch 
have completed, the value in the cache for the table is true, when it 
really should be false.

The get-nullcheck-put triad really needs to be atomic, which requires 
either external synchronization or use of some of the other features of 
ConcurrentHashMap like below.

> Slide 31 of the aforementioned presentation reads "Calls putIfAbsent
> every time it needs to read a value" "Unfortunately, this usage is
> very common" so I think this is not a good idea.
>
> I'm still fairly new to Java so I tend to trust what people like
> Joshua Bloch write, but I can be convinced otherwise :)

That's purely for performance reasons -- you're right, I should have 
suggested
    get, if (result == null) {putIfAbsent, get}
rather than
    putIfAbsent, get

Both are correct but as Josh talks about on that slide (and the 
subsequent one) it's needless contention. I would be almost certain that 
it's still faster than a synchronized block, but not as much as other 
options. Should have mentioned that.

Of the 4 options I mentioned (putIfAbsent(), get() and default return 
value if missing, CopyOnWriteArraySet, ConcurrentHashSet), the 
putIfAbsent() method is likely the slowest (given the usage pattern 
anyway), and (more importantly, in my opinion) the least clear anyway. I 
think the ConcurrentSet solution is much cleaner.

Hope that makes thing clear.

Cheers,

Paul

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by tsuna <ts...@gmail.com>.
On Fri, May 28, 2010 at 8:21 PM, Paul Cowan <co...@aconex.com> wrote:
> Changing this class to use ConcurrentHashMap, and completely removing
> synchronization, won't work, as it's written. The lazy-loading in
> regionCachePreFetchEnabled() (get - if null, put) won't work correctly with
> a ConcurrentHashMap without external synchronization.

I'm not sure to understand why, can you explain?

> 1) Replace with a ConcurrentHashMap, and change regionCachePreFetchEnabled()
> to do something like:
>       prefetchRegionCacheEnabled.putIfAbsent(tableName, true);
>       return prefetchRegionCacheEnabled.get(tableName);

Slide 31 of the aforementioned presentation reads "Calls putIfAbsent
every time it needs to read a value" "Unfortunately, this usage is
very common" so I think this is not a good idea.

I'm still fairly new to Java so I tend to trust what people like
Joshua Bloch write, but I can be convinced otherwise :)

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.
Paul.

 > [Commenting on-list, not sure if I should just sign up + put these
 > comments direct on reviews?]

Yes. You're supposed to sign in and write comment directly to 
http://review.hbase.org

 > So assuming that
 > disabling/enabling prefetch for a region is rare (both in terms of
 > frequency, and in terms of a small number of regions being disabled),
 > the best performance will probably come from something like
 > private final Set<byte[]> prefetchDisabledRegions
 > = new CopyOnWriteArraySet<byte[]>()

Excellent suggestion. It's the exact use case. I will follow your 
suggestion and use CopyOnWriteArraySet instead of HashMap.

Thanks,
Mingjie

On 05/28/2010 08:21 PM, Paul Cowan wrote:
> [Commenting on-list, not sure if I should just sign up + put these
> comments direct on reviews?]
>
> On 29/05/2010 7:59 AM, Ryan Rawson wrote:
>  > Belay that order, I need to inspect this code review... unfortunately
> a concurrent collection is not the answer to all multi threading woes.
>
> Changing this class to use ConcurrentHashMap, and completely removing
> synchronization, won't work, as it's written. The lazy-loading in
> regionCachePreFetchEnabled() (get - if null, put) won't work correctly
> with a ConcurrentHashMap without external synchronization.
>
> As it is, the class is thread-safe in the way it accesses the HashMap.
> To remove the synchronization (if desired), you could:
>
> 1) Replace with a ConcurrentHashMap, and change
> regionCachePreFetchEnabled() to do something like:
> prefetchRegionCacheEnabled.putIfAbsent(tableName, true);
> return prefetchRegionCacheEnabled.get(tableName);
> Alternatively, remove the 'true' -- that's the default, right? So there
> isn't actually any reason to put true values in the map if there's no
> value otherwise. Then just use:
> Boolean result = this.prefetchRegionCacheEnabled.get(tableName);
> return result == null ? Boolean.TRUE : result;
> (and it's simpler, right?)
>
>
> 2) Again, given that 'true' is the default value, there's probably
> actually no reason to use a Map at all. All you want to know is which
> ones have specifically had fetching disabled, right? So assuming that
> disabling/enabling prefetch for a region is rare (both in terms of
> frequency, and in terms of a small number of regions being disabled),
> the best performance will probably come from something like
> private final Set<byte[]> prefetchDisabledRegions
> = new CopyOnWriteArraySet<byte[]>()
> then disable is just:
> prefetchDisabledRegions.add(name);
> enable is:
> prefetchDisabledRegions.remove(name);
> and isEnabled is just
> prefetchDisabledRegions.contains(name);
> which is probably going to be quicker if disabling is done only for a
> few regions (so linear walking of the underlying array is quicker than
> hashing + probing) -- and remember, scans of CopyOnWriteArraySet are
> completely lock-free. Also, it's cleaner IMHO...
>
>
> 3) Use the solution from #2 with a good ConcurrentHashSet (there isn't
> one in the JDK, unfortunately, but maybe you have one available), which
> is more useful if large numbers of regions will be disabled or if
> regions will be toggled on/off frequently.
>
> Cheers,
>
> Paul
>
> PS: Minor suggestion: this class is inconsistent with variable/method
> names as to whether 'prefetch' is one camel-cased word (prefetch) or two
> (preFetch). Just thought I'd mention it!
>

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Paul Cowan <co...@aconex.com>.
[Commenting on-list, not sure if I should just sign up + put these 
comments direct on reviews?]

On 29/05/2010 7:59 AM, Ryan Rawson wrote:
 > Belay that order, I need to inspect this code review... unfortunately 
a concurrent collection is not the answer to all multi threading woes.

Changing this class to use ConcurrentHashMap, and completely removing 
synchronization, won't work, as it's written. The lazy-loading in 
regionCachePreFetchEnabled() (get - if null, put) won't work correctly 
with a ConcurrentHashMap without external synchronization.

As it is, the class is thread-safe in the way it accesses the HashMap. 
To remove the synchronization (if desired), you could:

1) Replace with a ConcurrentHashMap, and change 
regionCachePreFetchEnabled() to do something like:
        prefetchRegionCacheEnabled.putIfAbsent(tableName, true);
        return prefetchRegionCacheEnabled.get(tableName);
Alternatively, remove the 'true' -- that's the default, right? So there 
isn't actually any reason to put true values in the map if there's no 
value otherwise. Then just use:
        Boolean result = this.prefetchRegionCacheEnabled.get(tableName);
        return result == null ? Boolean.TRUE : result;
(and it's simpler, right?)


2) Again, given that 'true' is the default value, there's probably 
actually no reason to use a Map at all. All you want to know is which 
ones have specifically had fetching disabled, right? So assuming that 
disabling/enabling prefetch for a region is rare (both in terms of 
frequency, and in terms of a small number of regions being disabled), 
the best performance will probably come from something like
       private final Set<byte[]> prefetchDisabledRegions
         = new CopyOnWriteArraySet<byte[]>()
then disable is just:
       prefetchDisabledRegions.add(name);
enable is:
       prefetchDisabledRegions.remove(name);
and isEnabled is just
       prefetchDisabledRegions.contains(name);
which is probably going to be quicker if disabling is done only for a 
few regions (so linear walking of the underlying array is quicker than 
hashing + probing) -- and remember, scans of CopyOnWriteArraySet are 
completely lock-free. Also, it's cleaner IMHO...


3) Use the solution from #2 with a good ConcurrentHashSet (there isn't 
one in the JDK, unfortunately, but maybe you have one available), which 
is more useful if large numbers of regions will be disabled or if 
regions will be toggled on/off frequently.

Cheers,

Paul

PS: Minor suggestion: this class is inconsistent with variable/method 
names as to whether 'prefetch' is one camel-cased word (prefetch) or two 
(preFetch). Just thought I'd mention it!

Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Ryan Rawson <ry...@gmail.com>.

> On 2010-05-28 14:13:14, Benoit Sigoure wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 306
> > <http://review.hbase.org/r/98/diff/3/?file=741#file741line306>
> >
> >     Please use a concurrent collection and remove the synchronized blocks.  For guidance, see around slide 30 of this presentation:
> >     http://www.cs.umd.edu/class/fall2009/cmsc132H/slides/still-effective.pdf

Belay that order, I need to inspect this code review... unfortunately a concurrent collection is not the answer to all multi threading woes.


> On 2010-05-28 14:13:14, Benoit Sigoure wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1594
> > <http://review.hbase.org/r/98/diff/3/?file=741#file741line1594>
> >
> >     Use `Boolean.TRUE' instead of `new Boolean(true)'.

Please use the autoboxing features of java6 (our target platform). Ie: just 'true' and 'false' for this and all others.


- Ryan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review93
-----------------------------------------------------------


On 2010-05-28 13:20:33, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-05-28 13:20:33)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Benoit Sigoure <ts...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review93
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/client/HConnection.java
<http://review.hbase.org/r/98/#comment531>

    Properly document this argument.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment534>

    Please use a concurrent collection and remove the synchronized blocks.  For guidance, see around slide 30 of this presentation:
    http://www.cs.umd.edu/class/fall2009/cmsc132H/slides/still-effective.pdf



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment532>

    Coding style: put the catch on the previous line.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment533>

    We don't typically call methods using `this.methodname(args)' – remove the `this.'



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment535>

    Add a space before the `:'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment536>

    Use `Boolean.TRUE' instead of `new Boolean(true)'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment537>

    Use `Boolean.FALSE' instead of `new Boolean(false)'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment538>

    Use `Boolean.TRUE' instead of `new Boolean(true)'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment539>

    Remove the outer parentheses.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment540>

    Add a space before the `:'.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment541>

    If this fits on the previous line while staying under 80 columns, please wrap it around.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment542>

    Remove the space after `cacheLocation'



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment546>

    Use {@link #readFields readFields} instead of <code>readFields</code>



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment544>

    You can remove this <p>



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment543>

    Use <pre> not <code>



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment545>

    Add a space before the `:'.



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment547>

    Use {@link #getRegionsInfo getRegionsInfo}



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment548>

    Use {@link ...} here and below.



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment550>

    Wrap this around with the previous line.



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment549>

    I believe you can remove this block.


- Benoit


On 2010-05-28 13:20:33, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-05-28 13:20:33)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.

> On 2010-05-28 15:05:08, Ryan Rawson wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1604
> > <http://review.hbase.org/r/98/diff/3/?file=741#file741line1604>
> >
> >     What does this method actually do? If its a predicate start it with 'is'.  Also precaching should be off by default...
> >      Is it?

Will add ``is'' prefix to this method. 

Pre-caching (in case of a cache miss) is set to ``on'' by default right now, according to stack:

``I'm for default behavior being grabbing more than just the one row – ten or something. Regards full-table scan as default, I think it should be an option (Its a nice option to have). ''

https://issues.apache.org/jira/browse/HBASE-2468?focusedCommentId=12870321&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12870321


- Mingjie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review95
-----------------------------------------------------------


On 2010-05-28 13:20:33, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-05-28 13:20:33)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Ryan Rawson <ry...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review95
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment553>

    What does this method actually do? If its a predicate start it with 'is'.  Also precaching should be off by default...
     Is it?


- Ryan


On 2010-05-28 13:20:33, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-05-28 13:20:33)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.

> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1075
> > <http://review.hbase.org/r/98/diff/4/?file=872#file872line1075>
> >
> >     I feel like it would be cleaner to have the following methods be non-static, so they don't need any arguments. that would reduce the number of functions by factor of two

That was what I wanted to do in the very beginning. I don't like so many functions either.  

But there is one existing method: 
  public static boolean isTableEnabled(String tableName) ...
  public static boolean isTableEnabled(Configuration conf, String tableName) ...

It's one of the reason that I just want to keep the original coding style to be consistent. 

In addition, I agree we can make isRegionCachePrefetchEnabled() etc. to be non-static. However, enableRegionCachePrefetch(), and disableRegionCachePrefetch() must to be static since we want to enable/disable a table without instantiate HTable. In another word, we cannot really dis/enable cache prefetch for each HTable instance who have the same table name. While we can only enable/disable based on table name. It is pretty similar as table enable/disable. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1608
> > <http://review.hbase.org/r/98/diff/4/?file=871#file871line1608>
> >
> >     kill this function

I don't like it either. I will kill all ``is...Disabled()'' methods. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java, line 3639
> > <http://review.hbase.org/r/98/diff/4/?file=874#file874line3639>
> >
> >     I really like these unit tests, but I think you should use a row key for the Get that isn't also a region start key, as it may expose different behavior. eg I think you will end up with 11 cached regions instead of 10

As mentioned above, I reimplemented MetaScanner so it will start scanning Meta from the region row where the given table row resides, instead of scanning from the next region row in Meta. HTable.getRowOrBefore() is called in MetaScanner to achieve it, (however I'm not very sure whether it is the most efficient way to do it or not). 

So for this unit test, no matter the passed row is a region start key or not, we will always get 10 here. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, lines 776-781
> > <http://review.hbase.org/r/98/diff/4/?file=871#file871line776>
> >
> >     this block should go after the cache check below, right?

I reimplemented MetaScanner so it will start scanning Meta from the region where the given user table row resides, instead of scanning from the next region row in Meta.

In this case, prefetchRegionCache() also fetches the queried table+row to the cache. Here I put it before the cache check block, so it can load the result from cache directly. Otherwise it may do an extra meta scan if cache prefetch is enabled. 

However if multiple threads accessing this block concurrently, this way will cause cache-prefetch executed twice. But I think this case is pretty rare, so the penalty should be relatively smaller. 

What do you think? 


- Mingjie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review107
-----------------------------------------------------------


On 2010-05-31 19:49:20, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-05-31 19:49:20)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Todd Lipcon <to...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review107
-----------------------------------------------------------



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment635>

    typo: locationS



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment633>

    I think any of getCachedRegionCount, countCachedRegions, or getNumCachedRegions would be a clearer name.



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment634>

    style-wise, why "final" here?



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment636>

    remove these @param javadocs - they just take up space if the param names are self-explanatory (which these are)
    
    same thing above



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment637>

    this needs a Comparator argument, otherwise it does object identity rather than bytewise comparison of the table names



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment640>

    javadoc out of date - it prefetches the region for the given row, and prefetchLimit regions ahead



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment638>

    should return false to stop scanning at this point, right?



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment639>

    // cache this meta entry
    
    (it's not caching all)



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment641>

    this block should go after the cache check below, right?



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment647>

    this function needs to synchronized on cachedRegionLocations which is an unsynchronized HashMap



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment648>

    return location != null;



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment654>

    kill this function



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment649>

    you should prefix the file with writeInt(allRegions.size()) so you don't have to use EOFException catching below



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment650>

    yuck, see above



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment651>

    I feel like it would be cleaner to have the following methods be non-static, so they don't need any arguments. that would reduce the number of functions by factor of two



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment653>

    incorrect javadoc, also a few places below



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment652>

    why do we need a separate function for enabled and disabled? aren't they always inverse of each other?



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
<http://review.hbase.org/r/98/#comment642>

    should make clear this is the row name in the user table, not the meta table.
    
    should also be clarified that it will start scanning with the region *after* row, because it doesn't use getClosestRowBefore



src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
<http://review.hbase.org/r/98/#comment643>

    useless @throws



src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
<http://review.hbase.org/r/98/#comment645>

    you should use util.getTestDir here



src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
<http://review.hbase.org/r/98/#comment644>

    import java.io.File



src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
<http://review.hbase.org/r/98/#comment646>

    I really like these unit tests, but I think you should use a row key for the Get that isn't also a region start key, as it may expose different behavior. eg I think you will end up with 11 cached regions instead of 10


- Todd


On 2010-05-31 19:49:20, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-05-31 19:49:20)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Todd Lipcon <to...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review165
-----------------------------------------------------------


Looking good! Just a few notes.


src/main/java/org/apache/hadoop/hbase/client/HConnection.java
<http://review.hbase.org/r/98/#comment813>

    I thought we were collapsing these two calls into setRegionCachePrefetchEnabled(tableName, enabled)?



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment816>

    I don't entirely understand why we key these hashes by integer, but it seems like you're following the status quo, so doesn't need to be addressed in this patch.



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment822>

    I still don't quite understand the logic about why these should be static. Previously you pointed to the enable/disable calls, but those are more like admin calls, not calls that affect client behavior. Anyone else have an opinion?



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
<http://review.hbase.org/r/98/#comment823>

    I think this should be Math.max(rowLimit, configuration.getInt(...)) - if we only want to scan 5 rows, we don't want the scanner to prefetch 100 for us.


- Todd


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.

> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 235
> > <http://review.hbase.org/r/98/diff/5/?file=941#file941line235>
> >
> >     +1 on collapsing with a boolean.  setRegionCachePrefetch(table, enable)?  Seems self descriptive and with a couple lines of javadoc should be clear.

yes sir. will modify to use set...() and get...(). 


> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1079
> > <http://review.hbase.org/r/98/diff/5/?file=943#file943line1079>
> >
> >     I guess these are static because of how HTables all share a single HCM per conf.  The setting of prefetching is set at the HCM level not the HTable level, however clients are usually not exposed to HCM and only deal with HTable.
> >     
> >     We should probably make it clear in the javadoc for these methods that they apply to all HTable instances, though that may be clear from being static.
> >     
> >     Maybe since these are more advanced calls, they shouldn't be in HTable?  If we provide proper documentation, it should be easy enough for a user to grab the HCM and apply the config at that level?

> I guess these are static because of how HTables all share a single HCM per conf...
Yes. 

> Maybe since these are more advanced calls, they shouldn't be in HTable?
Two alternatives:
1) HCM: as jgray said, ``however clients are usually not exposed to HCM and only deal with HTable.''
2) HBaseAdmin: it is a more reasonable design choice since these operation are at HCM level. 
3) or, make it a configuration. It would be one global configuration applied to all tables, and cannot be changed dynamically. 

I like 2) better, but not really sure whether we want to expose it there or not. 

What do you think? 


- Mingjie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
-----------------------------------------------------------


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Todd Lipcon <to...@cloudera.com>.

> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 109
> > <http://review.hbase.org/r/98/diff/5/?file=944#file944line109>
> >
> >     Hmm, wouldn't that be Math.min then?  If rowLimit = 5 and scanner.caching = 100, we want to only do 5?

woops, yes :)


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
-----------------------------------------------------------


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Jonathan Gray <jg...@apache.org>.

> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1079
> > <http://review.hbase.org/r/98/diff/5/?file=943#file943line1079>
> >
> >     I guess these are static because of how HTables all share a single HCM per conf.  The setting of prefetching is set at the HCM level not the HTable level, however clients are usually not exposed to HCM and only deal with HTable.
> >     
> >     We should probably make it clear in the javadoc for these methods that they apply to all HTable instances, though that may be clear from being static.
> >     
> >     Maybe since these are more advanced calls, they shouldn't be in HTable?  If we provide proper documentation, it should be easy enough for a user to grab the HCM and apply the config at that level?
> 
> Mingjie Lai wrote:
>     > I guess these are static because of how HTables all share a single HCM per conf...
>     Yes. 
>     
>     > Maybe since these are more advanced calls, they shouldn't be in HTable?
>     Two alternatives:
>     1) HCM: as jgray said, ``however clients are usually not exposed to HCM and only deal with HTable.''
>     2) HBaseAdmin: it is a more reasonable design choice since these operation are at HCM level. 
>     3) or, make it a configuration. It would be one global configuration applied to all tables, and cannot be changed dynamically. 
>     
>     I like 2) better, but not really sure whether we want to expose it there or not. 
>     
>     What do you think?

Adding it to HBaseAdmin could make sense.  This one is a bit of an odd one because it's a client-side configuration parameter done at the per-client-jvm level.  Typically we have per-query or per-htable-instance configs.  HBaseAdmin is generally made up of remote administration commands not local client config.

If we provide sufficient javadoc (including as a class comment on HTable) it doesn't matter so much where we put it.  Since it's distinct from what's currently in HTable and HBaseAdmin, maybe it does make sense as a static in HCM?


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
-----------------------------------------------------------


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Todd Lipcon <to...@cloudera.com>.

> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1079
> > <http://review.hbase.org/r/98/diff/5/?file=943#file943line1079>
> >
> >     I guess these are static because of how HTables all share a single HCM per conf.  The setting of prefetching is set at the HCM level not the HTable level, however clients are usually not exposed to HCM and only deal with HTable.
> >     
> >     We should probably make it clear in the javadoc for these methods that they apply to all HTable instances, though that may be clear from being static.
> >     
> >     Maybe since these are more advanced calls, they shouldn't be in HTable?  If we provide proper documentation, it should be easy enough for a user to grab the HCM and apply the config at that level?
> 
> Mingjie Lai wrote:
>     > I guess these are static because of how HTables all share a single HCM per conf...
>     Yes. 
>     
>     > Maybe since these are more advanced calls, they shouldn't be in HTable?
>     Two alternatives:
>     1) HCM: as jgray said, ``however clients are usually not exposed to HCM and only deal with HTable.''
>     2) HBaseAdmin: it is a more reasonable design choice since these operation are at HCM level. 
>     3) or, make it a configuration. It would be one global configuration applied to all tables, and cannot be changed dynamically. 
>     
>     I like 2) better, but not really sure whether we want to expose it there or not. 
>     
>     What do you think?
> 
> Jonathan Gray wrote:
>     Adding it to HBaseAdmin could make sense.  This one is a bit of an odd one because it's a client-side configuration parameter done at the per-client-jvm level.  Typically we have per-query or per-htable-instance configs.  HBaseAdmin is generally made up of remote administration commands not local client config.
>     
>     If we provide sufficient javadoc (including as a class comment on HTable) it doesn't matter so much where we put it.  Since it's distinct from what's currently in HTable and HBaseAdmin, maybe it does make sense as a static in HCM?

I think keeping HConnectionManager an internal interface is a good idea, so kind of -0 there. -1 on HBaseAdmin, since we should keep that for administrative functions that really change something on the cluster. So I'd prefer HTable, but wouldn't cry over HCM.


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
-----------------------------------------------------------


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Jonathan Gray <jg...@apache.org>.

> On 2010-06-10 08:53:41, Jonathan Gray wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1079
> > <http://review.hbase.org/r/98/diff/5/?file=943#file943line1079>
> >
> >     I guess these are static because of how HTables all share a single HCM per conf.  The setting of prefetching is set at the HCM level not the HTable level, however clients are usually not exposed to HCM and only deal with HTable.
> >     
> >     We should probably make it clear in the javadoc for these methods that they apply to all HTable instances, though that may be clear from being static.
> >     
> >     Maybe since these are more advanced calls, they shouldn't be in HTable?  If we provide proper documentation, it should be easy enough for a user to grab the HCM and apply the config at that level?
> 
> Mingjie Lai wrote:
>     > I guess these are static because of how HTables all share a single HCM per conf...
>     Yes. 
>     
>     > Maybe since these are more advanced calls, they shouldn't be in HTable?
>     Two alternatives:
>     1) HCM: as jgray said, ``however clients are usually not exposed to HCM and only deal with HTable.''
>     2) HBaseAdmin: it is a more reasonable design choice since these operation are at HCM level. 
>     3) or, make it a configuration. It would be one global configuration applied to all tables, and cannot be changed dynamically. 
>     
>     I like 2) better, but not really sure whether we want to expose it there or not. 
>     
>     What do you think?
> 
> Jonathan Gray wrote:
>     Adding it to HBaseAdmin could make sense.  This one is a bit of an odd one because it's a client-side configuration parameter done at the per-client-jvm level.  Typically we have per-query or per-htable-instance configs.  HBaseAdmin is generally made up of remote administration commands not local client config.
>     
>     If we provide sufficient javadoc (including as a class comment on HTable) it doesn't matter so much where we put it.  Since it's distinct from what's currently in HTable and HBaseAdmin, maybe it does make sense as a static in HCM?
> 
> Todd Lipcon wrote:
>     I think keeping HConnectionManager an internal interface is a good idea, so kind of -0 there. -1 on HBaseAdmin, since we should keep that for administrative functions that really change something on the cluster. So I'd prefer HTable, but wouldn't cry over HCM.

Since we can knock it down to just two methods, get/set, let's just put it in HTable.

But it will be static, right, so people understand it's not per-instance.  Let's also make sure there is javadoc that also explains that it is for all HTable instances for the tables you configure.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
-----------------------------------------------------------


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Jonathan Gray <jg...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review176
-----------------------------------------------------------


This looks great!  I think we should think more about where to expose this, and also think about using more of a get/set API to reduce the method calls and make it look more like the other client APIs.


src/main/java/org/apache/hadoop/hbase/client/HConnection.java
<http://review.hbase.org/r/98/#comment841>

    +1 on collapsing with a boolean.  setRegionCachePrefetch(table, enable)?  Seems self descriptive and with a couple lines of javadoc should be clear.



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment840>

    I guess these are static because of how HTables all share a single HCM per conf.  The setting of prefetching is set at the HCM level not the HTable level, however clients are usually not exposed to HCM and only deal with HTable.
    
    We should probably make it clear in the javadoc for these methods that they apply to all HTable instances, though that may be clear from being static.
    
    Maybe since these are more advanced calls, they shouldn't be in HTable?  If we provide proper documentation, it should be easy enough for a user to grab the HCM and apply the config at that level?



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment842>

    Should we also make these methods (if we even leave it in HTable) more like setRegionCachePrefetch / getRegionCachePrefetch?  HTable is the core client class so the less noise we add the better.  We have other methods in client APIs of the get/set form like Scan.setCacheBlocks and Put.setWriteToWAL



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
<http://review.hbase.org/r/98/#comment843>

    Hmm, wouldn't that be Math.min then?  If rowLimit = 5 and scanner.caching = 100, we want to only do 5?


- Jonathan


On 2010-06-09 15:50:59, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-09 15:50:59)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by st...@duboce.net.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review210
-----------------------------------------------------------

Ship it!


+1

I did a quick skim.  All looks good.  This is a weird one in that to prewarm we are going to do a getClosestOrBefore (which we'll be doing anyway) and then we'll open scanner, and scan next 10 rows... then close scanner; i.e. extra rpcs inline w/ first lookup against tale.  This latter will actually slow down first lookup some but we can make it faster by making open scanner return results so we don't have to then go call next (already an issue to do this) and also, making it so we scan forward always by changing keys in .META. such that region names are keyed by endkey rather than startkey... also another issue to do this.

- stack


On 2010-06-13 00:49:50, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-13 00:49:50)
> 
> 
> Review request for hbase, Todd Lipcon and stack.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 03cbf8d 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 8e47f2d 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f11ada7 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
-----------------------------------------------------------

(Updated 2010-06-13 00:49:50.977413)


Review request for hbase, Todd Lipcon and stack.


Changes
-------

1) resolved conflicts from the latest code commits
2) added get/setRegionCachePrefetch() in HTable, removed isRegionCachePrefetchEnabled(), enableRegionCachePrefetch(), etc. 
3) in MetaScanner, added Math.min(rowLimit, configuration.getInt(...))


Summary
-------

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 


This addresses bug HBASE-2468.
    http://issues.apache.org/jira/browse/HBASE-2468


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 03cbf8d 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 8e47f2d 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java f11ada7 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
-------

Unit tests passed locally for me. 


Thanks,

Mingjie


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
-----------------------------------------------------------

(Updated 2010-06-09 15:50:59.084657)


Review request for hbase, Todd Lipcon and stack.


Changes
-------

@St^ack: please see my comments for your feedback regarding getRowOrBefore() issue. 

Invite Todd as a reviewer. 


Summary
-------

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 


This addresses bug HBASE-2468.
    http://issues.apache.org/jira/browse/HBASE-2468


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
-------

Unit tests passed locally for me. 


Thanks,

Mingjie


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.

> On 2010-06-07 14:23:42, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 96
> > <http://review.hbase.org/r/98/diff/5/?file=944#file944line96>
> >
> >     getRowOrBefore is an expensive call.  Are we sure we are not calling this too often?

I agree it is an expensive call. 

However I don't think it would bring any performance penalty for existing and potential use cases:
Use case 1 -- existing MetaScanner users: since this method is newly added, existing users won't be affected; 
Use case 2 -- hbase clients when locating a region :  
1) if prefetch is on, it calls this MetaScanner with [table + row combination], which calls getRowOrBefore() to get current region info, then number of following regions from meta. After that, the client can get the region info directly from cache. 
2) if prefetch is disabled (current behavior), it eventually calls similar method getClosestRowBefore() to get desired region. 

So no matter prefetch is on or not, getRowOrBefore(or getClosestRowBefore) eventually is called. The only difference is whether to scan following regions from meta or not. 

For future MetaScanner users which scan from one region with desired use table row, it has to take the effort since it is the expected behavior. 


- Mingjie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review144
-----------------------------------------------------------


On 2010-06-02 01:13:42, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-02 01:13:42)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by st...@duboce.net.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review144
-----------------------------------------------------------

Ship it!


I think this good to go.  Seem my comments below.  See what you think.  My one concern is the number of calls to getRowOrBefore... hopefully this patch cuts down overall on our need to use this function.  I'd like to hear your opinion on that.


src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment744>

    This is code duplicated from elsewhere.  Can I help make it so we don't have to do this duplication?  Or, for now, since this your fist patch, we can put it off IF you file a JIRA to fix the duplication (smile).



src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/98/#comment745>

    So we start scanning at 'row'?  Is this the 'row' the user asked for? No, it needs to be the row in the .META. table, right?  We need to find the row in .META. that contains the asked for row first?  NM, I see below how the row here is made.. .this looks right.



src/main/java/org/apache/hadoop/hbase/client/HTable.java
<http://review.hbase.org/r/98/#comment746>

    This is a nice little facility.



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
<http://review.hbase.org/r/98/#comment747>

    OK.  This looks right.



src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
<http://review.hbase.org/r/98/#comment748>

    getRowOrBefore is an expensive call.  Are we sure we are not calling this too often?


- stack


On 2010-06-02 01:13:42, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-06-02 01:13:42)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
-----------------------------------------------------------

(Updated 2010-06-02 01:13:42.272750)


Review request for hbase.


Changes
-------

1. Fix some issues pointed out by Todd.
2. Re-do MetaScanner so it will start scanning Meta from the region where the given user table row resides, instead of scanning from the next region row in Meta.


Summary
-------

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 


This addresses bug HBASE-2468.
    http://issues.apache.org/jira/browse/HBASE-2468


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
-------

Unit tests passed locally for me. 


Thanks,

Mingjie


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
-----------------------------------------------------------

(Updated 2010-05-31 19:49:20.012863)


Review request for hbase.


Changes
-------

Improved to address comments from tsuna, Paul Crown, and jdcryans. 


Summary
-------

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 


This addresses bug HBASE-2468.
    http://issues.apache.org/jira/browse/HBASE-2468


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
-------

Unit tests passed locally for me. 


Thanks,

Mingjie


Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.

Posted by Mingjie Lai <mj...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/
-----------------------------------------------------------

(Updated 2010-05-28 13:20:33.843332)


Review request for hbase.


Changes
-------

Improved a little, to address the comments from Benoit Sigoure, at http://review.hbase.org/r/78/#review91.


Summary
-------

HBASE-2468: Improvements to prewarm META cache on clients.

Changes:
1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 


This addresses bug HBASE-2468.
    http://issues.apache.org/jira/browse/HBASE-2468


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
  src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 

Diff: http://review.hbase.org/r/98/diff


Testing
-------

Unit tests passed locally for me. 


Thanks,

Mingjie