You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Jonathan Gray <jg...@apache.org> on 2010/09/07 01:37:36 UTC
Re: Review Request: Eliminate duplicates, stale versions. Have determined behaviour and storefile ordering

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/780/#review1092
-----------------------------------------------------------


This is great Pranav.  It looks like all normally flushed files would order before any HFileOuputFormat file?  I'm fine with that for now but any ideas for what we could do in the long-term?  Would HFOF query HBase to get the current seqid of the region?

Do you think we need any more testing?  I think at a minimum we should add a nice one to TestFromClientSide.  This is where "expected behavior" is usually defined.


trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<http://review.cloudera.org/r/780/#comment3597>

    Use a comment like:
    /** Keeps track of the ... */
    
    That way it's javadoc and will show up properly in IDEs and such.



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<http://review.cloudera.org/r/780/#comment3598>

    add timestamp to javadoc w/ a description



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<http://review.cloudera.org/r/780/#comment3600>

    Should update this comment since we don't always include to this point anymore



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<http://review.cloudera.org/r/780/#comment3601>

    good



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<http://review.cloudera.org/r/780/#comment3602>

    was this a bug or changed behavior?



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<http://review.cloudera.org/r/780/#comment3599>

    I would say drop all of these helper methods.  I think it's actually more descriptive to use the code rather than single-line helpers, especially given a descriptive variable name.
    
    Unless you feel strongly that this is better.



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
<http://review.cloudera.org/r/780/#comment3603>

    put this after javadoc right above the method signature



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<http://review.cloudera.org/r/780/#comment3604>

    Should this be rewritten now?  Including the comment above.
    
    Looks like always return colChecker now, just if seek_next_row we set stickNextRow?



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
<http://review.cloudera.org/r/780/#comment3605>

    Really sucks we have these two classes with duplicated code.  Don't want you to fix this, fine as it is for now.



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
<http://review.cloudera.org/r/780/#comment3606>

    Should we resetTS() here?  Or resetTS() happens at the start of finding a new col?



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
<http://review.cloudera.org/r/780/#comment3607>

    Maybe these methods are good, will make it much easier to have the two ColumnTrackers use a base class?



trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestColumnSeeking.java
<http://review.cloudera.org/r/780/#comment3608>

    apache license


- Jonathan


On 2010-09-06 15:44:17, Pranav Khaitan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/780/
> -----------------------------------------------------------
> 
> (Updated 2010-09-06 15:44:17)
> 
> 
> Review request for hbase, stack, Jonathan Gray, Ryan Rawson, Karthik Ranganathan, and Kannan Muthukkaruppan.
> 
> 
> Summary
> -------
> 
> Goodbye duplicates, hello consistent ordering.
> 
> A way to order multiple cells having same keys. Every StoreFile will have a timestamp associated with it and when any two cells have same keys, they will be sorted based on timestamps of the StoreFiles they come from. Therefore, for duplicate versions of a cell, the version which came in latest will be returned. We are ensuring that only one version will be returned.
> 
> There are two components in this patch. The first one is associated with KeyValueHeap and it ensures that duplicate cells are ordered correctly. The second one is in ColumnTracker and it ensures that only one version is allowed to be returned from StoreScanner.
> 
> For all existing files and/or any files which do not have the 'TIMESTAMP' field in meta, their timestamp will be set to zero which means they will be assumed to be the 'oldest' file.
> 
> Also changed the return codes in ColumnTracker to be consistent with definition of ScanQueryMatcher.ReturnCode.
> 
> Design discussed with jgray.
> 
> Further suggestions/questions are welcome!
> 
> 
> This addresses bugs HBASE-1485, HBASE-2406 and HBASE-2649.
>     http://issues.apache.org/jira/browse/HBASE-1485
>     http://issues.apache.org/jira/browse/HBASE-2406
>     http://issues.apache.org/jira/browse/HBASE-2649
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 993156 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 993156 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 993156 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestColumnSeeking.java PRE-CREATION 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 993156 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 993156 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 993156 
> 
> Diff: http://review.cloudera.org/r/780/diff
> 
> 
> Testing
> -------
> 
> All existing tests are passing and running through this code path. 
> 
> Added two tests which use several combination of columns, rows, values and ensure that only the most recent one is returned. This test will be useful for testing all kinds of column reseeking, row reseeking, etc.
> 
> 
> Thanks,
> 
> Pranav
> 
>