Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/06/27 20:16:26 UTC

[jira] Issue Comment Edited: (HADOOP-1531) Add RowFilter to HRegion.HScanner

    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508609 ] 

Jim Kellerman edited comment on HADOOP-1531 at 6/27/07 11:16 AM:
-----------------------------------------------------------------

In general, I have no objections to this change.

However, I do have a couple of comments:

- In HClient, the ClientScanner constructor that does not take a filter is no longer needed: the constructor is only called from obtainScanner, and obtainScanner(columns,startRow) simply calls obtainScanner(columns,startRow,filter) with null for the filter.

- In HClient.ClientScanner, shouldn't the call to server.openScanner be conditional, so that it calls either the HRegionServerInterface.openScanner overload that takes a filter or the one that does not (since you can't pass a null over an RPC)? For example:

  try {
    if (this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, this.filter);
    }

- Finally, I would like to see a test case that uses a filter. The existing tests will ensure that there are no regressions.
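
To make the first two points concrete, here is a minimal, self-contained sketch. It does not use the real HBase classes: ScannerOpenSketch, Filter, RegionServer and RecordingServer are hypothetical stand-ins that only mimic the shape of the two openScanner overloads, and the rule that a null filter must not reach the RPC is modelled by an exception. It shows a single ClientScanner constructor that accepts a possibly-null filter and picks the matching overload.

```java
import java.util.ArrayList;
import java.util.List;

public class ScannerOpenSketch {
    /** Hypothetical stand-in for the RowFilterInterface added by the patch. */
    interface Filter {}

    /** Hypothetical stand-in for the two openScanner overloads on the region server. */
    interface RegionServer {
        long openScanner(String regionName, String[] columns, String startRow);
        long openScanner(String regionName, String[] columns, String startRow, Filter filter);
    }

    /** Records which overload was invoked so the dispatch can be observed. */
    static class RecordingServer implements RegionServer {
        final List<String> calls = new ArrayList<>();

        public long openScanner(String regionName, String[] columns, String startRow) {
            calls.add("unfiltered");
            return 1L;
        }

        public long openScanner(String regionName, String[] columns, String startRow, Filter filter) {
            // Assumption for this sketch: a null argument cannot be sent over the RPC.
            if (filter == null) {
                throw new IllegalArgumentException("null filter cannot be sent over the RPC");
            }
            calls.add("filtered");
            return 2L;
        }
    }

    /** One constructor is enough: a null filter means "no filter". */
    static class ClientScanner {
        final long scannerId;

        ClientScanner(RegionServer server, String[] columns, String startRow, Filter filter) {
            if (filter == null) {
                this.scannerId = server.openScanner("region", columns, startRow);
            } else {
                this.scannerId = server.openScanner("region", columns, startRow, filter);
            }
        }
    }

    /** The two-argument form only delegates, passing null for the filter. */
    static ClientScanner obtainScanner(RegionServer server, String[] columns, String startRow) {
        return obtainScanner(server, columns, startRow, null);
    }

    static ClientScanner obtainScanner(RegionServer server, String[] columns, String startRow,
            Filter filter) {
        return new ClientScanner(server, columns, startRow, filter);
    }

    public static void main(String[] args) {
        RecordingServer server = new RecordingServer();
        obtainScanner(server, new String[] {"col:"}, "");
        obtainScanner(server, new String[] {"col:"}, "", new Filter() {});
        System.out.println(server.calls); // prints [unfiltered, filtered]
    }
}
```

Because the null check lives in the one remaining constructor, neither obtainScanner variant needs to know which overload is used on the wire.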





> Add RowFilter to HRegion.HScanner
> ---------------------------------
>
>                 Key: HADOOP-1531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1531
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>            Assignee: James Kennedy
>         Attachments: RowFilter.patch
>
>
> I've implemented a RowFilterInterface and a RowFilter implementation.  This is passed to the HRegion.HScanner via HClient.openScanner() though it is an entirely optional parameter.
> HScanner applies the filter in the next() call by iterating until it encounters a row that is not filtered by the RowFilter.  The filter applies criteria based on row keys and/or column data values.
> Null values are a little tricky, since the resultSet in that loop may represent nulls as absent columns or as DELETED_BYTES.  Nevertheless, null cases are taken care of by the filter, and you can, for example, retrieve all rows where column X = null.
> The initial RowFilter implementation is limited in several ways:
> * Equality test only with literal values. No !=, <, >, etc. No col1 == col2. This is a straight-up byte[] comparison.
> * Multiple column criteria are treated as an implicit conjunction, no disjunction possible.
> * The row key criterion can only be a regular expression.
> * The row key criterion is independent of the column criteria. No "if rowkey.matches(A) and col1==B", although the interface is designed to allow for that.
> But it should be easy to write an improved RowFilterInterface implementation to take care of most of the above without having to change code elsewhere.
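
The behaviour described in the issue can be sketched as follows. This is not the code from RowFilter.patch: SimpleRowFilter, FilteredScanner, the in-memory TreeMap table and the sample rows are all hypothetical. The sketch assumes a null is represented only as an absent column (the DELETED_BYTES case is not modelled) and that the row key pattern and the column criteria are separate checks that must both pass.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class RowFilterSketch {
    /** Hypothetical filter: a regex on the row key plus byte[] equality on columns. */
    static class SimpleRowFilter {
        private final Pattern rowKeyPattern; // null means "any row key"
        private final Map<String, byte[]> criteria = new LinkedHashMap<>();

        SimpleRowFilter(String rowKeyRegex) {
            this.rowKeyPattern = rowKeyRegex == null ? null : Pattern.compile(rowKeyRegex);
        }

        /** Adds "column == value"; a null value means the column must be absent. */
        SimpleRowFilter columnEquals(String column, byte[] value) {
            criteria.put(column, value);
            return this;
        }

        /** Returns true when the row should be skipped. */
        boolean filter(String rowKey, Map<String, byte[]> columns) {
            if (rowKeyPattern != null && !rowKeyPattern.matcher(rowKey).matches()) {
                return true;
            }
            // Implicit conjunction: every column criterion must hold.
            for (Map.Entry<String, byte[]> c : criteria.entrySet()) {
                // Arrays.equals treats two nulls as equal, so "absent == null" matches.
                if (!Arrays.equals(c.getValue(), columns.get(c.getKey()))) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Hypothetical scanner whose next() skips rows until one is not filtered. */
    static class FilteredScanner {
        private final Iterator<Map.Entry<String, Map<String, byte[]>>> rows;
        private final SimpleRowFilter filter; // optional, may be null

        FilteredScanner(TreeMap<String, Map<String, byte[]>> table, SimpleRowFilter filter) {
            this.rows = table.entrySet().iterator();
            this.filter = filter;
        }

        /** Returns the next accepted row key, or null when the table is exhausted. */
        String next() {
            while (rows.hasNext()) {
                Map.Entry<String, Map<String, byte[]>> row = rows.next();
                if (filter == null || !filter.filter(row.getKey(), row.getValue())) {
                    return row.getKey();
                }
            }
            return null;
        }
    }

    static List<String> scanAll(TreeMap<String, Map<String, byte[]>> table, SimpleRowFilter filter) {
        FilteredScanner scanner = new FilteredScanner(table, filter);
        List<String> keys = new ArrayList<>();
        for (String key = scanner.next(); key != null; key = scanner.next()) {
            keys.add(key);
        }
        return keys;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static Map<String, byte[]> row(String column, String value) {
        Map<String, byte[]> r = new HashMap<>();
        r.put(column, bytes(value));
        return r;
    }

    /** Made-up sample data; "user3" has no info:city column at all. */
    static TreeMap<String, Map<String, byte[]>> demoTable() {
        TreeMap<String, Map<String, byte[]>> table = new TreeMap<>();
        table.put("item1", row("info:city", "Oslo"));
        table.put("user1", row("info:city", "Oslo"));
        table.put("user2", row("info:city", "Rome"));
        table.put("user3", new HashMap<>());
        return table;
    }

    public static void main(String[] args) {
        SimpleRowFilter oslo = new SimpleRowFilter("user.*").columnEquals("info:city", bytes("Oslo"));
        System.out.println(scanAll(demoTable(), oslo));   // prints [user1]
        SimpleRowFilter noCity = new SimpleRowFilter(null).columnEquals("info:city", null);
        System.out.println(scanAll(demoTable(), noCity)); // prints [user3]
    }
}
```

Passing null as the filter to FilteredScanner returns every row, which mirrors the filter being an optional parameter.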
