Posted to issues@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2010/06/26 08:53:50 UTC

[jira] Created: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Add ability to extract a specified list of versions of a column in a single roundtrip
-------------------------------------------------------------------------------------

                 Key: HBASE-2793
                 URL: https://issues.apache.org/jira/browse/HBASE-2793
             Project: HBase
          Issue Type: New Feature
            Reporter: Kannan Muthukkaruppan


In one of the use cases we were looking at, each row contains a single column, but with several versions (e.g., each version representing an event in a log), and we want to be able to extract a specific set of versions from the row in a single round-trip.

Currently, on a Get, one can retrieve a specific version of a column using setTimeStamp(ts), or a range of versions using setTimeRange(min, max), but not a specified set of versions. It would be useful to add this ability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882880#action_12882880 ] 

ryan rawson commented on HBASE-2793:
------------------------------------

#2 won't be so bad... filters are pretty deep and will be just as efficient as
hacking ScanQueryMatcher, I think.

In reply to the comment posted on Jun 26, 2010 11:45 AM by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>:
https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882878#action_12882878


> Add ability to extract a specified list of versions of a column in a single roundtrip
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2793
>                 URL: https://issues.apache.org/jira/browse/HBASE-2793
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>
> In one of the use cases we were looking at, each row contains a single column, but with several versions (e.g., each version representing an event in a log), and we want to be able to extract a specific set of versions from the row in a single round-trip.
> Currently, on a Get, one can retrieve a specific version of a column using setTimeStamp(ts), or a range of versions using setTimeRange(min, max), but not a specified set of versions. It would be useful to add this ability.



[jira] Updated: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-2793:
-----------------------------------------

    Attachment: 2793_patch_v3.txt

Removed unused imports (per Ryan's comments).

Ryan: Can you approve/commit the change?



[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882854#action_12882854 ] 

stack commented on HBASE-2793:
------------------------------

What are you thinking, Kannan? Passing a filter?



[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882878#action_12882878 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------

I still need to look at the code some more. Thinking aloud, some options seem to be:

(Background: we are planning to add HBASE-2265, so it would be nice if the fix for this issue also took advantage of that optimization and avoided a full row scan.)

#1) A Filter object with the list of versions you are interested in. But with this approach, it seems you'll end up doing a full scan of the row and checking every version against the filter. There wouldn't be a way to exit early.

#2) A variant of #1: additionally compute the min/max version from the passed-in set of versions, use the existing setTimeRange() code to trim down the set of cells we look at, and apply the filter to those cells. Still not a great approach if the versions passed are spread out too much.

#3) Do N point lookups (or one-column scans), one version at a time (all in the same server round trip, of course). I think it is still important to preserve row-level consistency, i.e., we should do a consistent read of all the versions within a row. The work Ryan has done should probably make that easy, but I don't know it well yet.

#4) Implement a batch Get[] API. The app would pass a List of Get objects, all for the same row, and use setTimeStamp() to set the version explicitly in each Get object. The trouble is that the general batch Get[] API doesn't have to support a consistent read across all Gets in a batch, whereas for this case a consistent read would be the desired semantics.

I think #3 might be best overall. If there are 10000 versions of a cell and you are interested in versions 1 and 10000, then point lookups will be as good as it gets and should fetch only the minimal blocks needed. If the versions happen to be in the same block, even better: the block should be warm in the LRU cache. The case where this approach might be less CPU-efficient is when the versions are densely packed together, where a range scan (#2) might have worked better. But in that case, the app should probably be using the setTimeRange() API instead.
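To make the trade-off between options #1, #2 and #3 concrete, here is a toy in-memory model (not HBase code) of one column holding versions 1..n, stored newest-first. The class and method names are hypothetical, and "seeks" and "versions examined" are only rough proxies for cost, not measurements of a real region server.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy cost model for the options above; the map stands in for one column's sorted versions.
public class VersionLookupCost {

    // Work done by one strategy: seeks issued and versions examined.
    static final class Cost {
        final int seeks;
        final int examined;
        Cost(int seeks, int examined) { this.seeks = seeks; this.examined = examined; }
    }

    // One column with versions 1..n, kept newest-first as HBase returns them.
    static NavigableMap<Long, String> column(long n) {
        NavigableMap<Long, String> versions = new TreeMap<Long, String>(Collections.<Long>reverseOrder());
        for (long ts = 1; ts <= n; ts++) {
            versions.put(ts, "event-" + ts);
        }
        return versions;
    }

    // Option #1: one seek to the start of the column, then test every version.
    static Cost fullScan(NavigableMap<Long, String> col, Set<Long> wanted, List<String> out) {
        int examined = 0;
        for (Map.Entry<Long, String> e : col.entrySet()) {
            examined++;
            if (wanted.contains(e.getKey())) out.add(e.getValue());
        }
        return new Cost(1, examined);
    }

    // Option #2: trim to [min, max] first (roughly setTimeRange(min, max + 1),
    // since the range's upper bound is exclusive), then apply the same membership test.
    static Cost rangeTrimmedScan(NavigableMap<Long, String> col, Set<Long> wanted, List<String> out) {
        if (wanted.isEmpty()) return new Cost(0, 0);
        long min = Collections.min(wanted);
        long max = Collections.max(wanted);
        int examined = 0;
        // The map is descending, so the sub-map runs from max down to min (both inclusive).
        for (Map.Entry<Long, String> e : col.subMap(max, true, min, true).entrySet()) {
            examined++;
            if (wanted.contains(e.getKey())) out.add(e.getValue());
        }
        return new Cost(1, examined);
    }

    // Option #3: one point lookup (one seek) per requested version.
    static Cost pointLookups(NavigableMap<Long, String> col, Set<Long> wanted, List<String> out) {
        int examined = 0;
        for (Long ts : wanted) {
            examined++;
            String v = col.get(ts);
            if (v != null) out.add(v);
        }
        return new Cost(wanted.size(), examined);
    }
}
```

With versions {1, 10000} out of 10000, #1 and #2 both examine all 10000 versions while #3 does two lookups; with a dense set such as {5000, 5001, 5002}, #2 needs a single seek and examines three versions, which is the dense case where a range scan wins.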






[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885763#action_12885763 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------

I am going to revise the diff after discussion with Jonathan.

Currently, on the server side, I was special-casing by checking whether the filter is an instanceof TimestampsFilter and then narrowing down the time range based on it. But the special casing will not quite work if, for example, the filter is a compound filter.
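A minimal sketch of why the instanceof special case breaks down, assuming simplified stand-in types (the Filter interface, TimestampsFilter, OtherFilter and CompoundFilter below are hypothetical, not the HBase classes): once the TimestampsFilter is wrapped in a compound filter, the bare check no longer sees it.

```java
import java.util.Arrays;
import java.util.List;

public class SpecialCaseDemo {
    // Hypothetical stand-ins for the filter types in the discussion.
    interface Filter {}

    static final class TimestampsFilter implements Filter {
        final List<Long> timestamps;
        TimestampsFilter(List<Long> timestamps) { this.timestamps = timestamps; }
    }

    static final class OtherFilter implements Filter {}

    // A compound filter: mustPassAll = true means AND of children, false means OR.
    static final class CompoundFilter implements Filter {
        final boolean mustPassAll;
        final List<Filter> children;
        CompoundFilter(boolean mustPassAll, Filter... children) {
            this.mustPassAll = mustPassAll;
            this.children = Arrays.asList(children);
        }
    }

    // The special case: only a bare TimestampsFilter is recognized.
    static boolean canNarrowByInstanceof(Filter f) {
        return f instanceof TimestampsFilter;
    }

    // Recursive variant: narrowing by a nested TimestampsFilter is safe under AND,
    // but under OR another child may accept versions outside its range, so give up.
    static boolean canNarrowRecursively(Filter f) {
        if (f instanceof TimestampsFilter) {
            return true;
        }
        if (f instanceof CompoundFilter) {
            CompoundFilter c = (CompoundFilter) f;
            if (!c.mustPassAll) {
                return false;
            }
            for (Filter child : c.children) {
                if (canNarrowRecursively(child)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Even the recursive version has to know every compound type and how it combines its children, which is one argument for letting each filter answer for itself instead of special-casing types in ScanQueryMatcher.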

-----

Another question: both ScanQueryMatcher and QueryMatcher have filter logic, but they seem to do different things in the fall-through/default case.

In ScanQueryMatcher:
{code}
    if (filterResponse == ReturnCode.INCLUDE)
      return MatchCode.INCLUDE;

    if (filterResponse == ReturnCode.SKIP)
      return MatchCode.SKIP;

    // else if (filterResponse == ReturnCode.NEXT_ROW)

    stickyNextRow = true;
    return MatchCode.SEEK_NEXT_ROW;
{code}
Here the default behavior is MatchCode.SEEK_NEXT_ROW.

In QueryMatcher:
{code}
    if (mc == MatchCode.INCLUDE && this.filter != null) {
      switch(this.filter.filterKeyValue(kv)) {
        case INCLUDE: return MatchCode.INCLUDE;
        case SKIP: return MatchCode.SKIP;
        default: return MatchCode.DONE;
      }
    }
{code}
Here the default case is MatchCode.DONE.

I am not clear on the reason for this difference. Any ideas?

For the timestamps filter, which now (with my change) returns a new code, Filter.ReturnCode.NEXT_COL, when the scan falls below the minimum requested timestamp, I guess I need to update both of these places. Currently, my patch only does this in ScanQueryMatcher.
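A sketch of how the fall-through in the ScanQueryMatcher snippet above could be extended with an explicit branch for the new code. The enums are local stand-ins that reuse names from the discussion, and mapping NEXT_COL to a SEEK_NEXT_COL match code is an assumption for illustration, not a quote of the patch. Without such a branch, NEXT_COL would hit the default path and become SEEK_NEXT_ROW, skipping the remaining columns of the row rather than only the remaining versions of the current column.

```java
public class FilterResponseMapping {
    // Local copies of the names used in the discussion; not the HBase enums.
    enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL, SEEK_NEXT_ROW }

    private boolean stickyNextRow = false;

    MatchCode map(ReturnCode filterResponse) {
        switch (filterResponse) {
            case INCLUDE:
                return MatchCode.INCLUDE;
            case SKIP:
                return MatchCode.SKIP;
            case NEXT_COL:
                // Nothing older in this column can match: move on to the next column.
                return MatchCode.SEEK_NEXT_COL;
            case NEXT_ROW:
            default:
                // The existing fall-through: remember the decision and skip the rest of the row.
                stickyNextRow = true;
                return MatchCode.SEEK_NEXT_ROW;
        }
    }

    boolean isStickyNextRow() {
        return stickyNextRow;
    }
}
```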






[jira] Resolved: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson resolved HBASE-2793.
--------------------------------

     Hadoop Flags: [Reviewed]
    Fix Version/s: 0.21.0
       Resolution: Fixed

committed to trunk



[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886036#action_12886036 ] 

HBase Review Board commented on HBASE-2793:
-------------------------------------------

Message from: "Kannan Muthukkaruppan" <ka...@facebook.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/268/
-----------------------------------------------------------

(Updated 2010-07-07 12:01:55.533222)


Review request for hbase.


Changes
-------

*  Removed the specialized check for TimestampsFilter in ScanQueryMatcher that narrowed down the time range. I will do that optimization once Pranav's changes for HBASE-2265 are in, by allowing Filters to participate in the StoreFileScanner.shouldSeek() check: each filter should be able to take part in shouldSeek() to avoid seeking into unnecessary files.

* Enhanced the test by adding some version deletes to the mix.
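The idea of letting each filter take part in the shouldSeek() decision can be sketched as follows. SeekAwareFilter, mightMatch(), TimestampsVote, AllOf and AnyOf are hypothetical names chosen for illustration, not an existing HBase API. The only rule the sketch relies on is that answering "might match" is always safe, while "cannot match" lets the scanner skip a store file whose timestamp span contains none of the requested versions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class ShouldSeekSketch {
    // Hypothetical hook: could this filter accept anything in a store file whose
    // timestamps span [fileMinTs, fileMaxTs]? Returning true is always safe.
    interface SeekAwareFilter {
        boolean mightMatch(long fileMinTs, long fileMaxTs);
    }

    // A timestamps-list vote: is any requested timestamp inside the file's span?
    static final class TimestampsVote implements SeekAwareFilter {
        private final TreeSet<Long> timestamps;
        TimestampsVote(List<Long> timestamps) { this.timestamps = new TreeSet<Long>(timestamps); }

        @Override
        public boolean mightMatch(long fileMinTs, long fileMaxTs) {
            Long candidate = timestamps.ceiling(fileMinTs); // smallest requested ts >= fileMinTs
            return candidate != null && candidate <= fileMaxTs;
        }
    }

    // AND: if any child rules the file out, the whole conjunction does.
    static final class AllOf implements SeekAwareFilter {
        private final List<SeekAwareFilter> children;
        AllOf(SeekAwareFilter... children) { this.children = Arrays.asList(children); }

        @Override
        public boolean mightMatch(long fileMinTs, long fileMaxTs) {
            for (SeekAwareFilter f : children) {
                if (!f.mightMatch(fileMinTs, fileMaxTs)) return false;
            }
            return true;
        }
    }

    // OR: the file is needed if any child might match.
    static final class AnyOf implements SeekAwareFilter {
        private final List<SeekAwareFilter> children;
        AnyOf(SeekAwareFilter... children) { this.children = Arrays.asList(children); }

        @Override
        public boolean mightMatch(long fileMinTs, long fileMaxTs) {
            for (SeekAwareFilter f : children) {
                if (f.mightMatch(fileMinTs, fileMaxTs)) return true;
            }
            return false;
        }
    }

    // The filter's contribution to a shouldSeek()-style check; no filter means "seek".
    static boolean shouldSeek(long fileMinTs, long fileMaxTs, SeekAwareFilter filter) {
        return filter == null || filter.mightMatch(fileMinTs, fileMaxTs);
    }
}
```

Because compound filters simply combine their children's answers, no type-specific special casing is needed in the scanner.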


Summary
-------

Discussion up in https://issues.apache.org/jira/browse/HBASE-2793.

Using the Filter approach as discussed.


This addresses bug HBASE-2793.
    http://issues.apache.org/jira/browse/HBASE-2793


Diffs (updated)
-----

  trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java 960691 
  trunk/src/main/java/org/apache/hadoop/hbase/filter/TimestampsFilter.java PRE-CREATION 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 960691 
  trunk/src/test/java/org/apache/hadoop/hbase/client/TestTimestampsFilter.java PRE-CREATION 

Diff: http://review.hbase.org/r/268/diff


Testing
-------

The new unit test for TimestampsFilter passes. Running unit tests right now.


Thanks,

Kannan






[jira] Updated: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-2793:
-----------------------------------------

    Attachment: 2793_patch_v1.txt



[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886750#action_12886750 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------

all unit tests pass.



[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882900#action_12882900 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------

Ryan: OK, will explore a Filter-based approach.



[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886258#action_12886258 ] 

HBase Review Board commented on HBASE-2793:
-------------------------------------------

Message from: "Kannan Muthukkaruppan" <ka...@facebook.com>


bq.  On 2010-07-07 15:03:56, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/filter/TimestampsFilter.java, line 26
bq.  > <http://review.hbase.org/r/268/diff/2/?file=2118#file2118line26>
bq.  >
bq.  >     by default if somehow the minTimeStamp gets kept as 0 with an empty timestamps list, then it will filter _everything out_.  Do you want to set this to 0 and instead include everything by default?  Might make it easier to debug perhaps?
bq.  >

Ryan: If TimestampsFilter is used but the client passes an empty list of timestamps, the semantics I am aiming for is indeed to filter everything out. That's why I am initializing it to MAX_VALUE. TimestampsFilter is like an IN-list predicate in SQL:

...
WHERE timestamp_col IN (t1, t2, ..., tn)

If the IN list is empty, no rows should be returned.
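A simplified model of the IN-list semantics described above, using a local ReturnCode enum instead of the HBase filter types (the class and method names are illustrative). minTimeStamp starts at Long.MAX_VALUE, so with an empty list every version is below the minimum and the column is abandoned immediately with NEXT_COL; with a non-empty list, versions are visited newest-first and NEXT_COL ends the column once the scan drops below the smallest requested timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TimestampsFilterSketch {
    enum ReturnCode { INCLUDE, SKIP, NEXT_COL }

    private final TreeSet<Long> timestamps;
    // MAX_VALUE until a timestamp is supplied, so an empty IN-list matches nothing.
    private long minTimeStamp = Long.MAX_VALUE;

    public TimestampsFilterSketch(List<Long> timestamps) {
        this.timestamps = new TreeSet<Long>(timestamps);
        if (!this.timestamps.isEmpty()) {
            this.minTimeStamp = this.timestamps.first();
        }
    }

    public ReturnCode filterKeyValue(long ts) {
        if (timestamps.contains(ts)) {
            return ReturnCode.INCLUDE;
        }
        if (ts < minTimeStamp) {
            // Versions arrive newest-first, so everything left in this column is older still.
            return ReturnCode.NEXT_COL;
        }
        return ReturnCode.SKIP;
    }

    // Walk one column's versions newest-first, treating NEXT_COL as an early exit.
    public List<Long> select(long[] versionsNewestFirst) {
        List<Long> out = new ArrayList<Long>();
        for (long ts : versionsNewestFirst) {
            ReturnCode rc = filterKeyValue(ts);
            if (rc == ReturnCode.INCLUDE) {
                out.add(ts);
            } else if (rc == ReturnCode.NEXT_COL) {
                break;
            }
        }
        return out;
    }
}
```

For example, a filter on {3, 7} applied to versions 9 down to 1 returns [7, 3] and stops at version 2 without looking at version 1.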


bq.  On 2010-07-07 15:03:56, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 25
bq.  > <http://review.hbase.org/r/268/diff/2/?file=2119#file2119line25>
bq.  >
bq.  >     I don't think this import is necessary...

Yes, my previous version of the patch needed it. I'll remove it.


bq.  On 2010-07-07 15:03:56, Ryan Rawson wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 27
bq.  > <http://review.hbase.org/r/268/diff/2/?file=2119#file2119line27>
bq.  >
bq.  >     nor this one

ditto. Will remove.


- Kannan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/268/#review317
-----------------------------------------------------------







[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886108#action_12886108 ] 

HBase Review Board commented on HBASE-2793:
-------------------------------------------

Message from: "Ryan Rawson" <ry...@gmail.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/268/#review317
-----------------------------------------------------------



trunk/src/main/java/org/apache/hadoop/hbase/filter/TimestampsFilter.java
<http://review.hbase.org/r/268/#comment1382>

    by default if somehow the minTimeStamp gets kept as 0 with an empty timestamps list, then it will filter _everything out_.  Do you want to set this to 0 and instead include everything by default?  Might make it easier to debug perhaps?
    



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<http://review.hbase.org/r/268/#comment1383>

    I don't think this import is necessary...



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<http://review.hbase.org/r/268/#comment1384>

    nor this one


- Ryan







[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885805#action_12885805 ] 

Jonathan Gray commented on HBASE-2793:
--------------------------------------

bq. I am not clear on the why the above difference. Any ideas?

QueryMatcher was used for Gets; ScanQueryMatcher is for Scans. For a Get, moving to the next row means the Get is done. QueryMatcher should actually be removed, as we no longer use it directly (or at least should no longer). There is some discussion of that in HBASE-2803.

You only need to worry about ScanQueryMatcher.
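Jonathan's explanation of the differing defaults fits in a few lines: a Get covers exactly one row, so a scan-level decision to seek to the next row can only mean the Get is finished. The enum and method below are illustrative stand-ins, not HBase code.

```java
public class GetVersusScan {
    // Illustrative subset of match codes, named after the ones in the snippets above.
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_ROW, DONE }

    // A Get reads exactly one row, so "seek to the next row" can only mean "stop".
    static MatchCode forGet(MatchCode scanDecision) {
        return scanDecision == MatchCode.SEEK_NEXT_ROW ? MatchCode.DONE : scanDecision;
    }
}
```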



[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882881#action_12882881 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------

With #3, in terms of API, what I had in mind was to add a setTimeStamps() method to Get that takes a List of timestamps and stores the list in a new private field of the Get object.

On a Get object, the client may call setTimeStamp(), setTimeRange() or setTimeStamps(); these correspond to the following SQL predicates, respectively:

 WHERE time = ts

 WHERE time >= ts1 AND time < ts2

 WHERE time IN (ts1, ts2, ..., tsn)

If a client calls more than one of these APIs on the same Get object, we could simply apply a "latest call wins" rule (which is already the behavior for the two existing calls).
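A minimal sketch of the proposed semantics, using a hypothetical `TimeConstrainedGet` class rather than HBase's real `Get`: each setter replaces the single active time constraint, which yields the latest-wins rule, and `setTimeStamps()` copies the list into a private field as proposed above. The three setters map onto the three SQL predicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.LongPredicate;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the proposed Get time constraints; not HBase's Get.
 * Each setter replaces the previous constraint, so the latest call wins.
 */
final class TimeConstrainedGet {
    // Default: no time constraint, every version matches.
    private LongPredicate timeConstraint = ts -> true;

    /** WHERE time = ts */
    TimeConstrainedGet setTimeStamp(long ts) {
        timeConstraint = t -> t == ts;
        return this;
    }

    /** WHERE time >= minStamp AND time < maxStamp */
    TimeConstrainedGet setTimeRange(long minStamp, long maxStamp) {
        timeConstraint = t -> t >= minStamp && t < maxStamp;
        return this;
    }

    /** WHERE time IN (ts1, ..., tsn); the list is copied into a private set. */
    TimeConstrainedGet setTimeStamps(List<Long> timestamps) {
        Set<Long> wanted = new HashSet<>(timestamps);
        timeConstraint = t -> wanted.contains(t);
        return this;
    }

    /** Returns the versions (timestamps) that satisfy the active constraint. */
    List<Long> select(List<Long> versions) {
        return versions.stream()
                .filter(v -> timeConstraint.test(v))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(10L, 20L, 30L, 40L);
        System.out.println(new TimeConstrainedGet().setTimeStamps(Arrays.asList(20L, 40L)).select(versions)); // [20, 40]
        System.out.println(new TimeConstrainedGet().setTimeRange(20L, 40L).setTimeStamp(10L).select(versions)); // [10]
    }
}
```

Note that the patch reviewed later in this thread implements the feature as a filter rather than as a new Get method.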

> Add ability to extract a specified list of versions of a column in a single roundtrip
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2793
>                 URL: https://issues.apache.org/jira/browse/HBASE-2793
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>


[jira] Assigned: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan reassigned HBASE-2793:
--------------------------------------------

    Assignee: Kannan Muthukkaruppan

> Add ability to extract a specified list of versions of a column in a single roundtrip
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2793
>                 URL: https://issues.apache.org/jira/browse/HBASE-2793
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>


[jira] Updated: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-2793:
-----------------------------------------

    Attachment: 2793_patch_v2.txt

* Removed the specialized check for TimestampsFilter in ScanQueryMatcher that narrowed down the time range. (I will do that optimization once Pranav's changes for HBASE-2265 are in, by allowing Filters to participate in the StoreFileScanner.shouldSeek() check. Each filter should be able to participate in shouldSeek() to avoid seeking into unnecessary files.)

* Enhanced the test by adding some version deletes to the mix.
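The deferred shouldSeek() optimization can be sketched as follows, under stated assumptions: each store file is assumed to record the inclusive minimum and maximum timestamps it contains, and `TimestampsSeekCheck.shouldSeek()` is a hypothetical hook, not the actual `StoreFileScanner.shouldSeek()` signature. A file is worth seeking into only if some requested timestamp falls inside its range, which a sorted set answers with a single `ceiling()` lookup.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Hypothetical sketch: a timestamps filter vetoing seeks into store files
 * that cannot hold any requested version. Names are illustrative only.
 */
final class TimestampsSeekCheck {
    private final NavigableSet<Long> wanted;

    TimestampsSeekCheck(Long... timestamps) {
        this.wanted = new TreeSet<>(Arrays.asList(timestamps));
    }

    /** True if some requested timestamp lies in [fileMinTs, fileMaxTs]. */
    boolean shouldSeek(long fileMinTs, long fileMaxTs) {
        // Smallest requested timestamp that is >= the file's minimum, or null.
        Long candidate = wanted.ceiling(fileMinTs);
        return candidate != null && candidate <= fileMaxTs;
    }

    public static void main(String[] args) {
        TimestampsSeekCheck check = new TimestampsSeekCheck(5L, 50L);
        System.out.println(check.shouldSeek(10L, 40L)); // false: nothing requested in [10, 40]
        System.out.println(check.shouldSeek(40L, 60L)); // true: 50 is in [40, 60]
    }
}
```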

> Add ability to extract a specified list of versions of a column in a single roundtrip
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2793
>                 URL: https://issues.apache.org/jira/browse/HBASE-2793
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>         Attachments: 2793_patch_v1.txt, 2793_patch_v2.txt
>
>


[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885586#action_12885586 ] 

HBase Review Board commented on HBASE-2793:
-------------------------------------------

Message from: "Kannan Muthukkaruppan" <ka...@facebook.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/268/
-----------------------------------------------------------

(Updated 2010-07-06 09:19:52.326413)


Review request for hbase.


Summary
-------

Discussion up in https://issues.apache.org/jira/browse/HBASE-2793.

Using the Filter approach as discussed.


This addresses bug HBASE-2793.
    http://issues.apache.org/jira/browse/HBASE-2793
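For readers unfamiliar with the Filter approach, the following self-contained Java sketch shows the core per-cell decision: keep a version only if its timestamp is in the requested set, so all wanted versions come back in one pass. `ReturnCode` and `Cell` are simplified stand-ins for HBase's filter return codes and `KeyValue`; this is an illustration, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/**
 * Self-contained sketch of a timestamps filter. ReturnCode and Cell are
 * simplified stand-ins, not HBase's Filter.ReturnCode or KeyValue.
 */
final class TimestampsFilterSketch {
    enum ReturnCode { INCLUDE, SKIP }

    /** Minimal stand-in for one version of a single column. */
    static final class Cell {
        final long timestamp;
        final String value;
        Cell(long timestamp, String value) { this.timestamp = timestamp; this.value = value; }
    }

    private final TreeSet<Long> timestamps;

    TimestampsFilterSketch(List<Long> timestamps) {
        this.timestamps = new TreeSet<>(timestamps);
    }

    /** Per-cell decision: include only versions with a requested timestamp. */
    ReturnCode filterCell(Cell cell) {
        return timestamps.contains(cell.timestamp) ? ReturnCode.INCLUDE : ReturnCode.SKIP;
    }

    /** Applies the filter to all versions of a column in a single pass. */
    List<String> apply(List<Cell> versions) {
        List<String> out = new ArrayList<>();
        for (Cell c : versions) {
            if (filterCell(c) == ReturnCode.INCLUDE) {
                out.add(c.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Versions listed newest first, the order HBase stores them in.
        List<Cell> versions = Arrays.asList(
                new Cell(40L, "e4"), new Cell(30L, "e3"), new Cell(20L, "e2"), new Cell(10L, "e1"));
        TimestampsFilterSketch f = new TimestampsFilterSketch(Arrays.asList(10L, 30L));
        System.out.println(f.apply(versions)); // [e3, e1]
    }
}
```

In released HBase versions the corresponding class is `org.apache.hadoop.hbase.filter.TimestampsFilter`, constructed from a `List<Long>` and attached to a request with `Get.setFilter()`. Since a Get returns only the newest version by default, the caller may also need to raise the maximum number of versions (e.g. with `setMaxVersions()`); check the javadoc for the version in use.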


Diffs
-----

  trunk/src/main/java/org/apache/hadoop/hbase/filter/Filter.java 960691 
  trunk/src/main/java/org/apache/hadoop/hbase/filter/TimestampsFilter.java PRE-CREATION 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 960691 
  trunk/src/test/java/org/apache/hadoop/hbase/client/TestTimestampsFilter.java PRE-CREATION 

Diff: http://review.hbase.org/r/268/diff


Testing
-------

The new unit test for TimestampsFilter passes. Running unit tests right now.


Thanks,

Kannan




> Add ability to extract a specified list of versions of a column in a single roundtrip
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2793
>                 URL: https://issues.apache.org/jira/browse/HBASE-2793
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>


[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885346#action_12885346 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------

Diff available @ http://review.hbase.org/r/268/

> Add ability to extract a specified list of versions of a column in a single roundtrip
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2793
>                 URL: https://issues.apache.org/jira/browse/HBASE-2793
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>


[jira] Commented: (HBASE-2793) Add ability to extract a specified list of versions of a column in a single roundtrip

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885988#action_12885988 ] 

Kannan Muthukkaruppan commented on HBASE-2793:
----------------------------------------------

That makes sense. Thanks Jonathan.

> Add ability to extract a specified list of versions of a column in a single roundtrip
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2793
>                 URL: https://issues.apache.org/jira/browse/HBASE-2793
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>         Attachments: 2793_patch_v1.txt
>
>