Posted to common-dev@hadoop.apache.org by "James Kennedy (JIRA)" <ji...@apache.org> on 2007/06/26 22:20:25 UTC

[jira] Created: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Add RowFilter to HRegion.HScanner
---------------------------------

                 Key: HADOOP-1531
                 URL: https://issues.apache.org/jira/browse/HADOOP-1531
             Project: Hadoop
          Issue Type: Improvement
          Components: contrib/hbase
    Affects Versions: 0.14.0
            Reporter: James Kennedy
            Assignee: James Kennedy


I've implemented a RowFilterInterface and a RowFilter implementation.  The filter is passed to HRegion.HScanner via HClient.openScanner(), though it is an entirely optional parameter.

HScanner applies the filter in the next() call by iterating until it encounters a row that is not filtered out by the RowFilter.  The filter applies criteria based on row keys and/or column data values.

Null values are a little tricky, since the resultSet in that loop may represent nulls either as absent columns or as DELETED_BYTES.  Nevertheless, the null cases are handled by the filter, and you can, for example, retrieve all rows where column X = null.
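
Roughly, the loop in next() has the following shape (a sketch only; the raw fetch method and the filter call signature here are illustrative, not the exact patch code):

  /**
   * Sketch only, not the patch itself: keep pulling rows until one is NOT
   * rejected by the filter, or until the underlying scanner runs out of rows.
   * fetchNextRow() and filter.filter(row, resultSet) are hypothetical names.
   */
  private boolean nextAcceptedRow(HStoreKey key, TreeMap<Text, byte[]> resultSet)
      throws IOException {
    boolean moreRows;
    while ((moreRows = fetchNextRow(key, resultSet))) {
      if (this.filter == null || !this.filter.filter(key.getRow(), resultSet)) {
        break;              // row accepted; hand it back to the caller
      }
      resultSet.clear();    // row rejected; keep scanning
    }
    return moreRows;
  }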

The initial RowFilter implementation is limited in several ways:
* Equality tests only, against literal values. No !=, <, >, etc. No col1 == col2. This is a straight-up byte[] comparison.
* Multiple column criteria are treated as an implicit conjunction; no disjunction is possible.
* Row key criteria are regular expressions only.
* Row key criteria are independent of column criteria. No "if rowkey.matches(A) and col1 == B", although the interface is designed to allow for that.

But it should be easy to write an improved RowFilterInterface implementation that takes care of most of the above without having to change code elsewhere.
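
For example, client-side usage is intended to look roughly like this (names are approximate: the filter's setter is hypothetical, and HClient's scanner method is referred to as both openScanner and obtainScanner on this issue):

  // Illustrative usage sketch; setColumnCriterion() is a hypothetical name for
  // whatever setter the filter ends up exposing for column criteria.
  RowFilter filter = new RowFilter("org\\.apache\\..*");                     // row key regexp
  filter.setColumnCriterion(new Text("info:status"), "active".getBytes());   // column X == value

  HScannerInterface scanner = client.obtainScanner(   // client: an HClient with the table open
      new Text[] { new Text("info:") },               // columns to return
      new Text(""),                                   // start row
      filter);                                        // the optional filter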



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1531:
--------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)



[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508648 ] 

James Kennedy commented on HADOOP-1531:
---------------------------------------

In relation to HADOOP-1439, I'll add a comment there.

I think I needed that no-arg RowFilter() constructor to make RPC serialization happy: the framework has to instantiate the filter before it calls readFields().

The regexp is passed to the constructor since it's a single parameter, whereas there can be many column criteria. I suppose I could add a constructor that takes a list of column criteria.
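
That is the usual Hadoop Writable pattern: the RPC layer creates the object with the no-arg constructor and then populates it via readFields(), so the filter has to be constructible in an empty state.  A minimal sketch of that shape (the regexp field and class name here are illustrative, not the real RowFilter):

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.io.Writable;

  public class RowFilterSketch implements Writable {
    private String rowKeyRegExp;

    /** Required by the Writable machinery: it instantiates first, then calls readFields(). */
    public RowFilterSketch() {
    }

    public RowFilterSketch(String rowKeyRegExp) {
      this.rowKeyRegExp = rowKeyRegExp;
    }

    public void write(DataOutput out) throws IOException {
      out.writeUTF(rowKeyRegExp == null ? "" : rowKeyRegExp);
    }

    public void readFields(DataInput in) throws IOException {
      String s = in.readUTF();
      this.rowKeyRegExp = s.length() == 0 ? null : s;
    }
  }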

A filter subpackage is a good idea... in fact it wouldn't hurt to refactor what's there into some other sub-packages.

Is the filter == null code necessary? I thought I saw RPC code that was handling nulls. Indeed, in my testing I seem to be able to pass null filters through RPC just fine. I'll double-check that though.

And I'll fix the other stuff you guys mentioned.





[jira] Updated: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1531:
----------------------------------

    Attachment: code-style-formatter

Output of Window->Preferences->Java->CodeStyle->Formatter->Export.

Eclipse has so many options (some buried quite well). I was unaware of this one, but here it is.



[jira] Issue Comment Edited: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508609 ] 

Jim Kellerman edited comment on HADOOP-1531 at 6/27/07 11:16 AM:
-----------------------------------------------------------------

In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a filter or the one which does not (since you can't pass a null over RPC)? For example:

  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }

- finally, I would like to see a test case that uses a filter.  The existing tests will ensure that there are no regressions.



 was:
In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a scanner or the one which does not? (since you can't pass a null over an rpc) For example:

  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }

- finally, I would like to see a test case that uses a filter The existing tests will ensure that there are no regressions.




[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509954 ] 

James Kennedy commented on HADOOP-1531:
---------------------------------------

Yeah, I'd implement a DomainRowFilter that not only stops filtering when it leaves the domain but also uses a more efficient, more specific way of filtering than a regexp, e.g. String.endsWith().
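
Something like the following, purely as a sketch: DomainRowFilter is hypothetical, and the filter(Text)/filterAllRemaining() signatures are assumed from the methods discussed on this issue rather than written against the real RowFilterInterface.

  import org.apache.hadoop.io.Text;

  /**
   * Hypothetical sketch: accept rows whose key ends with a domain suffix, and once the
   * scan has moved past the (assumed contiguous) block of domain rows, tell the scanner
   * it can stop via filterAllRemaining().  true from filter() means "filter the row out".
   */
  public class DomainRowFilter {
    private final String domainSuffix;
    private boolean seenDomain = false;
    private boolean pastDomain = false;

    public DomainRowFilter(String domainSuffix) {
      this.domainSuffix = domainSuffix;
    }

    public boolean filter(Text rowKey) {
      boolean inDomain = rowKey.toString().endsWith(domainSuffix);
      if (inDomain) {
        seenDomain = true;
      } else if (seenDomain) {
        pastDomain = true;   // we have left the block of rows for this domain
      }
      return !inDomain;
    }

    public boolean filterAllRemaining() {
      return pastDomain;     // nothing after the domain block can match
    }
  }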

You're right, filter() should be inverted; that's a bug.

Michael, that's what I thought... so you've been manually adjusting the line wrapping yourself? I guess that's what I'll do then too.

Jim, thanks again.



[jira] Issue Comment Edited: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508609 ] 

Jim Kellerman edited comment on HADOOP-1531 at 6/27/07 11:19 AM:
-----------------------------------------------------------------

In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a filter or the one which does not (since you can't pass a null over RPC)? For example:

  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }

- finally, I would like to see a test case that uses a filter.  The existing tests will ensure that there are no regressions.



 was:
In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a scanner or the one which does not? (since you can't pass a null over an rpc) For example:

  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }

- finally, I would like to see a test case that uses a filter The existing tests will ensure that there are no regressions.




[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510462 ] 

stack commented on HADOOP-1531:
-------------------------------

Committed with the message below.  Thanks for the contribution, James.

HADOOP-1531 Add RowFilter to HRegion.HScanner.
Adds a row/column filter interface and two implementations: A pager and a
row/column-value regex filter.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionInterface.java
    (openScanner): Add override that specifies a row filter.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HClient.java
    (obtainScanner): Add override that specifies a row filter.
    (ColumnScanner): Add filter parameter to constructor.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegion.java
    (getScanner): Add override with filter parameter.
    (next): Add handling of filtering.
A src/contrib/hbase/src/java/org/apache/hadoop/hbase/filter/InvalidRowFilterException.java
A src/contrib/hbase/src/java/org/apache/hadoop/hbase/filter/RegExpRowFilter.java
A src/contrib/hbase/src/java/org/apache/hadoop/hbase/filter/RowFilterSet.java
A src/contrib/hbase/src/java/org/apache/hadoop/hbase/filter/PageRowFilter.java
A src/contrib/hbase/src/java/org/apache/hadoop/hbase/filter/RowFilterInterface.java
    Row-filter interface, exception and implementations.
A src/contrib/hbase/src/test/org/apache/hadoop/hbase/filter/TestRegExpRowFilter.java
A src/contrib/hbase/src/test/org/apache/hadoop/hbase/filter/TestPageRowFilter.java
    Simple pager and regex filter tests.
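
The pager's core idea is simply to let the first N rows through and then cut the scan off; roughly like this sketch (field and method names are assumed, this is not the committed PageRowFilter source):

  import org.apache.hadoop.io.Text;

  /** Sketch of the paging idea only; true from filter() means "filter this row out". */
  public class PageRowFilterSketch {
    private final long pageSize;
    private long rowsAccepted = 0;

    public PageRowFilterSketch(long pageSize) {
      this.pageSize = pageSize;
    }

    public boolean filter(Text rowKey) {
      if (rowsAccepted >= pageSize) {
        return true;          // page is full; reject everything from here on
      }
      rowsAccepted++;
      return false;           // still within the page; keep the row
    }

    public boolean filterAllRemaining() {
      return rowsAccepted >= pageSize;
    }
  }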



[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509893 ] 

James Kennedy commented on HADOOP-1531:
---------------------------------------

Thanks for the update.

filterAllRemaining() always returns false in the RegExpRowFilter case because there is nothing about filtering a single row that tells the filter it should no longer process future rows.  After the first time the filter method returns true (say row 50 is filtered), rows 57-78 may still be good matches and should be included in the results.  This filter does not assume that the valid rows form a single consecutive chunk.

As for the filter(final Text rowKey, final Text colKey, final byte[] data) method, the logic is already AND logic. If rowKey is non-null and does not match the regexp, the method returns true right away.  Otherwise it returns true if the column tests fail.  The javadoc is confusing on this, though, by saying "and/or", and I should probably change it.  What I meant was that it's an AND if you include a non-null rowKey.  But a non-null row key is optional, and if it is not included, then only the column tests apply.
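
In other words, the intended semantics are roughly the following (a sketch of the logic described above, not the actual RegExpRowFilter source; rowKeyPattern and columnCriteria stand in for the filter's real fields):

  // Sketch of the described semantics; true means "filter the row out".
  // Assumed fields: java.util.regex.Pattern rowKeyPattern (may be null) and
  // Map<Text, byte[]> columnCriteria holding the configured column equality tests.
  public boolean filter(Text rowKey, Text colKey, byte[] data) {
    if (rowKey != null && rowKeyPattern != null
        && !rowKeyPattern.matcher(rowKey.toString()).matches()) {
      return true;                          // row key supplied and it fails the regexp
    }
    byte[] expected = columnCriteria.get(colKey);
    if (expected == null) {
      return false;                         // no criterion configured for this column
    }
    return !java.util.Arrays.equals(expected, data);  // reject unless the value matches
  }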

Any word on my question above about exporting your Eclipse formatter settings?







[jira] Updated: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HADOOP-1531:
----------------------------------

    Status: Patch Available  (was: Open)

Looks good, thanks Michael.



[jira] Issue Comment Edited: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508609 ] 

Jim Kellerman edited comment on HADOOP-1531 at 6/27/07 11:23 AM:
-----------------------------------------------------------------

In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a filter or the one which does not (since you can't pass a null over RPC)? For example:

  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }

- finally, I would like to see a test case that uses a filter.  The existing tests will ensure that there are no regressions.



 was:
In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a scanner or the one which does not? (since you can't pass a null over an rpc) For example:

<pre>
try {
  if(this.filter == null) {
    this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
        this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
  } else {
    this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
        this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
}
</pre>
- finally, I would like to see a test case that uses a filter The existing tests will ensure that there are no regressions.




[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508601 ] 

stack commented on HADOOP-1531:
-------------------------------

Nice addition James.

Do you think HADOOP-1439 should be done as a filter?

In RowFilterInterface.java and elsewhere, you open the class javadoc comment with a '<p>'.  Superfluous?

Why have this constructor:

+  /** Default constructor, filters nothing. */
+  public RowFilter() {
+    // nada
+  }

Is it needed?

Why pass row regexes on construction but use a setter for adding column filters rather than pass both to the constructor?

Should these additions go into a filter subpackage? (It's getting a little crowded in the hbase home directory.)

Fix the auto-formatting in HRegion.next (tests are split over lines, which makes it harder to follow).

As you state, this looks like a feature that would benefit from basic unit tests.

Patch applied cleanly for me to r551038.





[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509946 ] 

James Kennedy commented on HADOOP-1531:
---------------------------------------

Hmmm... in my Eclipse 3.2.2 I can go to Window->Preferences->Java->CodeStyle->Formatter->Export.  But thanks for your Eclipse settings, I think I can extract your formatter from there...




[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510657 ] 

Hudson commented on HADOOP-1531:
--------------------------------

Integrated in Hadoop-Nightly #146 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/146/])



[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510216 ] 

Hadoop QA commented on HADOOP-1531:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12361077/RowFilter-v4.patch applied and successfully tested against trunk revision r553080.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/361/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/361/console



[jira] Updated: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1531:
----------------------------------

    Attachment: eclipse.preferences

All my Eclipse preferences. This certainly contains the formatting settings, but I did not see a way to export just the formatter on its own.

NOTE: This is for Eclipse 3.2, not Eclipse Europa (3.3).




[jira] Issue Comment Edited: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508609 ] 

Jim Kellerman edited comment on HADOOP-1531 at 6/27/07 11:21 AM:
-----------------------------------------------------------------

In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a filter or the one which does not (since you can't pass a null over RPC)? For example:

<pre>
try {
  if(this.filter == null) {
    this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
        this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
  } else {
    this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
        this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
}
</pre>
- finally, I would like to see a test case that uses a filter.  The existing tests will ensure that there are no regressions.



 was:
In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a scanner or the one which does not? (since you can't pass a null over an rpc) For example:

  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }

- finally, I would like to see a test case that uses a filter The existing tests will ensure that there are no regressions.




[jira] Updated: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HADOOP-1531:
----------------------------------

    Attachment: RowFilter.patch

This first patch is missing tests but I have tested it less formally.  It's enough for you to see and give feedback.



[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508609 ] 

Jim Kellerman commented on HADOOP-1531:
---------------------------------------

In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a filter or the one which does not (since you can't pass a null over RPC)? For example:

  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }

- finally, I would like to see a test case that uses a filter.  The existing tests will ensure that there are no regressions.
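
A filter test can be quite small; for example, something along these lines (JUnit 3 style; the single-argument filter(Text) call and the regexp-taking constructor are assumptions based on the discussion, not the final API):

  import junit.framework.TestCase;
  import org.apache.hadoop.io.Text;

  /** Sketch of a minimal row-filter test; constructor and method names are assumed. */
  public class TestRowFilterSketch extends TestCase {
    public void testRowKeyRegexp() {
      RowFilter filter = new RowFilter("abc.*");   // assumed regexp-taking constructor
      // filter() returning true means the row is filtered out.
      assertFalse("matching key should be kept", filter.filter(new Text("abcdef")));
      assertTrue("non-matching key should be filtered", filter.filter(new Text("xyz")));
    }
  }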




[jira] Issue Comment Edited: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508609 ] 

Jim Kellerman edited comment on HADOOP-1531 at 6/27/07 11:27 AM:
-----------------------------------------------------------------

In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed, since the constructor is only called from obtainScanner, and obtainScanner(columns, startRow) just calls obtainScanner(columns, startRow, filter), specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner that takes a filter or the one that does not? (since you can't pass a null over RPC) For example:

{code}
  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName, 
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }
{code}

- finally, I would like to see a test case that uses a filter.  The existing tests will ensure that there are no regressions.



 was:
In general, I have no objections to this change.

However, I do have a couple of comments:

- in HClient, the constructor for ClientScanner that does not take a filter is no longer needed since the constructor is only called from obtainScanner and obtainScanner(columns,startRow) just calls obtainScanner(columns,startRow,filter) specifying null for the filter.

- in HClient.ClientScanner, shouldn't the call to server.openScanner be conditionalized so that it either calls the HRegionServerInterface.openScanner which takes a scanner or the one which does not? (since you can't pass a null over an rpc) For example:


  try {
    if(this.filter == null) {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW);
    } else {
      this.scannerId = this.server.openScanner(info.regionInfo.regionName,
          this.columns, currentRegion == 0 ? this.startRow : EMPTY_START_ROW, filter);
    }

- finally, I would like to see a test case that uses a filter The existing tests will ensure that there are no regressions.


> Add RowFilter to HRegion.HScanner
> ---------------------------------
>
>                 Key: HADOOP-1531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1531
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>            Assignee: James Kennedy
>         Attachments: RowFilter.patch
>
>
> I've implemented a RowFilterInterface and a RowFilter implementation.  This is passed to the HRegion.HScanner via HClient.openScanner() though it is an entirely optional parameter.
> HScanner applies the filter in the next() call by iterating until it encounters a row that is not filtered by the RowFilter.  The filter applies criteria based on row keys and/or column data values.
> Null values are little tricky since the resultSet in that loop may represent nulls as absent columns or as DELETED_BYTES.  Nevertheless null cases are taken care of by the filter and you can for example retrieve all rows where column X = null.
> The initial RowFilter implementation is limited in several ways:
> * Equality test only with literal values. No !=, <, >, etc. No col1 == col2. This is a straight-up byte[] comparison.
> * Multiple column criteria are treated as an implicit conjunction, no disjunction possible.
> * row key criteria is a regular expression only
> * row key criteria is independent of column criteria. No "if rowkey.matches(A)  and col1==B"  although the interface is created to allow for that.
> But it should be easy to write an improved RowFilterInterface implementation to take care of most of the above without having to change code elsewhere.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509948 ] 

stack commented on HADOOP-1531:
-------------------------------

Regarding filterAllRemaining above: should I add another version of the regex filter if I just want a scanner to return all row keys that match apache.com (and that gives up scanning once it leaves row keys containing that domain)?  Thanks for setting me right on final(Text, Text, byte[]).  In RegExpRowFilter.filter(Text), it returns true if the regex matches.  Shouldn't that be inverted?
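
To make the inversion being asked about concrete, here is a tiny placeholder sketch of that contract, where filter(Text) returning true means the row is dropped; the class name and field are invented for the example and are not the patch's RegExpRowFilter.

{code}
import java.util.regex.Pattern;
import org.apache.hadoop.io.Text;

// Placeholder class, not the patch code.
class InvertedRegExpFilterSketch {
  private final Pattern rowKeyPattern;

  InvertedRegExpFilterSketch(String rowKeyRegExp) {
    this.rowKeyPattern = Pattern.compile(rowKeyRegExp);
  }

  /** True when the row key does NOT match the regex, i.e. the row should be skipped. */
  boolean filter(Text rowKey) {
    return !rowKeyPattern.matcher(rowKey.toString()).matches();
  }
}
{code}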

Regarding an export of my eclipse format: I would not inflict my settings on others.  I lose interest in configuring eclipse after about the tenth dialog box, so at a minimum they are incomplete.  IIRC, others have posted 'hadoop' eclipse formatters to the mailing list.

On wrapping after the operator: it does not appear as an option in the eclipse formatter tab (I'm looking at eclipse 3.3); it always wants to wrap before the operator.  Odd, because if you use eclipse to break a long string, it leaves the '+' as the last character on the wrapped line (which is what I want).  I believe I first read of hanging operators as an indicator of line continuation in 'Java Elements of Style'.  It made sense to me.
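
A small illustrative example of the two wrapping styles being compared (the strings and names here are made up for the example):

{code}
// Illustrative only: the same concatenation wrapped two ways.
class WrapStyleExample {
  String describe(String scannerId, String regionName) {
    // Operator left hanging at the end of the broken line signals continuation:
    String hangingOperator = "scanner " + scannerId + " opened on region " +
        regionName;
    // Eclipse's formatter prefers to wrap before the operator instead:
    String leadingOperator = "scanner " + scannerId + " opened on region "
        + regionName;
    return hangingOperator + "\n" + leadingOperator;
  }
}
{code}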

> Add RowFilter to HRegion.HScanner
> ---------------------------------
>
>                 Key: HADOOP-1531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1531
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>            Assignee: James Kennedy
>         Attachments: eclipse.preferences, RowFilter-v2.patch, RowFilter-v3.patch, RowFilter.patch
>
>
> I've implemented a RowFilterInterface and a RowFilter implementation.  This is passed to the HRegion.HScanner via HClient.openScanner() though it is an entirely optional parameter.
> HScanner applies the filter in the next() call by iterating until it encounters a row that is not filtered by the RowFilter.  The filter applies criteria based on row keys and/or column data values.
> Null values are little tricky since the resultSet in that loop may represent nulls as absent columns or as DELETED_BYTES.  Nevertheless null cases are taken care of by the filter and you can for example retrieve all rows where column X = null.
> The initial RowFilter implementation is limited in several ways:
> * Equality test only with literal values. No !=, <, >, etc. No col1 == col2. This is a straight-up byte[] comparison.
> * Multiple column criteria are treated as an implicit conjunction, no disjunction possible.
> * row key criteria is a regular expression only
> * row key criteria is independent of column criteria. No "if rowkey.matches(A)  and col1==B"  although the interface is created to allow for that.
> But it should be easy to write an improved RowFilterInterface implementation to take care of most of the above without having to change code elsewhere.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1531:
--------------------------

    Attachment: RowFilter-v3.patch

Hey James:

I added a couple of tests and made some minor edits (renamed the exception InvalidRowFilter to InvalidRowFilterException, made line lengths < 80, etc.).

On RegExpRowFilter, I understand it is just one possible implementation, but I was wondering whether it is intentional that filterAllRemaining always returns false.  Should it return true as soon as the filter method starts to return true?  And for the filter method that takes a row and a column, the implementation tests that the row matches the regex OR the column value matches.  I was thinking the default would be that the regex matches the row AND the column value matches.  What do you think?
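
A small hypothetical sketch of the two behaviours being compared; the class, field names, and constructor are placeholders for the example, not the RegExpRowFilter source.

{code}
import java.util.Arrays;
import java.util.Map;
import java.util.regex.Pattern;
import org.apache.hadoop.io.Text;

// Placeholder class, not the patch code.
class CombinedCriteriaSketch {
  private final Pattern rowKeyPattern;             // row-key regex criterion
  private final Map<Text, byte[]> expectedValues;  // column-value criteria

  CombinedCriteriaSketch(Pattern rowKeyPattern, Map<Text, byte[]> expectedValues) {
    this.rowKeyPattern = rowKeyPattern;
    this.expectedValues = expectedValues;
  }

  /** True when the row should be filtered out, using the proposed AND semantics. */
  boolean filter(Text rowKey, Text colKey, byte[] value) {
    boolean rowKeyOk = rowKeyPattern.matcher(rowKey.toString()).matches();
    boolean valueOk = Arrays.equals(expectedValues.get(colKey), value);
    // OR semantics (keep the row when either test passes) would instead be:
    //   boolean keep = rowKeyOk || valueOk;
    boolean keep = rowKeyOk && valueOk;            // AND: both tests must pass
    return !keep;
  }
}
{code}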

> Add RowFilter to HRegion.HScanner
> ---------------------------------
>
>                 Key: HADOOP-1531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1531
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>            Assignee: James Kennedy
>         Attachments: RowFilter-v2.patch, RowFilter-v3.patch, RowFilter.patch
>
>
> I've implemented a RowFilterInterface and a RowFilter implementation.  This is passed to the HRegion.HScanner via HClient.openScanner() though it is an entirely optional parameter.
> HScanner applies the filter in the next() call by iterating until it encounters a row that is not filtered by the RowFilter.  The filter applies criteria based on row keys and/or column data values.
> Null values are little tricky since the resultSet in that loop may represent nulls as absent columns or as DELETED_BYTES.  Nevertheless null cases are taken care of by the filter and you can for example retrieve all rows where column X = null.
> The initial RowFilter implementation is limited in several ways:
> * Equality test only with literal values. No !=, <, >, etc. No col1 == col2. This is a straight-up byte[] comparison.
> * Multiple column criteria are treated as an implicit conjunction, no disjunction possible.
> * row key criteria is a regular expression only
> * row key criteria is independent of column criteria. No "if rowkey.matches(A)  and col1==B"  although the interface is created to allow for that.
> But it should be easy to write an improved RowFilterInterface implementation to take care of most of the above without having to change code elsewhere.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1531:
--------------------------

    Attachment: RowFilter-v4.patch

James: This patch includes the inversion of the regex match inside RegExpRowFilter.filter.  It also adds a little test of row + column values for this class.  If you think it is good to go, move the issue to 'patch submitted' (I built it locally and all seems good).

> Add RowFilter to HRegion.HScanner
> ---------------------------------
>
>                 Key: HADOOP-1531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1531
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>            Assignee: James Kennedy
>         Attachments: code-style-formatter, eclipse.preferences, RowFilter-v2.patch, RowFilter-v3.patch, RowFilter-v4.patch, RowFilter.patch
>
>
> I've implemented a RowFilterInterface and a RowFilter implementation.  This is passed to the HRegion.HScanner via HClient.openScanner() though it is an entirely optional parameter.
> HScanner applies the filter in the next() call by iterating until it encounters a row that is not filtered by the RowFilter.  The filter applies criteria based on row keys and/or column data values.
> Null values are little tricky since the resultSet in that loop may represent nulls as absent columns or as DELETED_BYTES.  Nevertheless null cases are taken care of by the filter and you can for example retrieve all rows where column X = null.
> The initial RowFilter implementation is limited in several ways:
> * Equality test only with literal values. No !=, <, >, etc. No col1 == col2. This is a straight-up byte[] comparison.
> * Multiple column criteria are treated as an implicit conjunction, no disjunction possible.
> * row key criteria is a regular expression only
> * row key criteria is independent of column criteria. No "if rowkey.matches(A)  and col1==B"  although the interface is created to allow for that.
> But it should be easy to write an improved RowFilterInterface implementation to take care of most of the above without having to change code elsewhere.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1531) Add RowFilter to HRegion.HScanner

Posted by "James Kennedy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Kennedy updated HADOOP-1531:
----------------------------------

    Attachment: RowFilter-v2.patch

OK, I've fixed the above issues and implemented the filter-stop-scan mechanism.

I also renamed RowFilter to RegExpRowFilter, created a new PageRowFilter for limiting results to a maximum size, and created a RowFilterSet, which is a RowFilter that contains other RowFilters and represents a hierarchy of filters to be processed disjunctively or conjunctively.
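
A hedged sketch of how that composition might be used from a client follows; the constructor signatures, the operator constant, and the obtainScanner call shape are assumptions made for the example, not necessarily the exact API in the attached patch.

{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.Text;

// All hbase-side signatures below are assumed for the example.
class FilterCompositionSketch {
  HScannerInterface openFilteredScanner(HClient client, Text[] columns, Text startRow)
      throws IOException {
    Set<RowFilterInterface> filters = new HashSet<RowFilterInterface>();
    filters.add(new RegExpRowFilter("row_1.*")); // row-key regex criterion (assumed ctor)
    filters.add(new PageRowFilter(100));         // cap on returned rows (assumed ctor)
    // Conjunctive combination: a row must pass every child filter to be returned.
    RowFilterInterface combined =
        new RowFilterSet(RowFilterSet.Operator.MUST_PASS_ALL, filters);
    return client.obtainScanner(columns, startRow, combined);
  }
}
{code}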

I've tested with my own tests but still need to write one in the HBase project.

I had a hard time getting my eclipse formatter to wrap lines on boolean operators as you suggested, so I did it manually in this case.  Is it possible for you to post an export of your formatter settings, or have you guys already posted one?  That way I can be sure I'm formatting exactly the way you are.

> Add RowFilter to HRegion.HScanner
> ---------------------------------
>
>                 Key: HADOOP-1531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1531
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.14.0
>            Reporter: James Kennedy
>            Assignee: James Kennedy
>         Attachments: RowFilter-v2.patch, RowFilter.patch
>
>
> I've implemented a RowFilterInterface and a RowFilter implementation.  This is passed to the HRegion.HScanner via HClient.openScanner() though it is an entirely optional parameter.
> HScanner applies the filter in the next() call by iterating until it encounters a row that is not filtered by the RowFilter.  The filter applies criteria based on row keys and/or column data values.
> Null values are little tricky since the resultSet in that loop may represent nulls as absent columns or as DELETED_BYTES.  Nevertheless null cases are taken care of by the filter and you can for example retrieve all rows where column X = null.
> The initial RowFilter implementation is limited in several ways:
> * Equality test only with literal values. No !=, <, >, etc. No col1 == col2. This is a straight-up byte[] comparison.
> * Multiple column criteria are treated as an implicit conjunction, no disjunction possible.
> * row key criteria is a regular expression only
> * row key criteria is independent of column criteria. No "if rowkey.matches(A)  and col1==B"  although the interface is created to allow for that.
> But it should be easy to write an improved RowFilterInterface implementation to take care of most of the above without having to change code elsewhere.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.