Posted to issues@hbase.apache.org by "Jesse Yates (JIRA)" <ji...@apache.org> on 2013/07/05 22:39:49 UTC

[jira] [Comment Edited] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter

    [ https://issues.apache.org/jira/browse/HBASE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701107#comment-13701107 ] 

Jesse Yates edited comment on HBASE-8809 at 7/5/13 8:39 PM:
------------------------------------------------------------

As a slight follow-up to this, it feels like raw scans should also ignore the column version/timestamp filtering. In particular, I'm talking about this section in ScanQueryMatcher:
{code}
    MatchCode colChecker = columns.checkColumn(bytes, offset, qualLength,
        timestamp, type, kv.getMemstoreTS() > maxReadPointToTrackVersions);
    /*
     * According to current implementation, colChecker can only be
     * SEEK_NEXT_COL, SEEK_NEXT_ROW, SKIP or INCLUDE. Therefore, always return
     * the MatchCode. If it is SEEK_NEXT_ROW, also set stickyNextRow.
     */
    ...
{code}

Here, the ScanWildcardColumnTracker does not ignore timestamps even in the simple case: with four puts to the same row (and column) at increasing timestamps, the first (oldest) one is skipped by default, since the default is to keep 3 versions. That happens even though the cell is still "present" in the store, regardless of whether the scan is raw.
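
To make the version-counting point concrete, here is a minimal, hypothetical sketch. It is not HBase code: `VersionTrackingSketch`, `visibleVersions` and the `ignoreVersionLimit` flag are invented names. It models a single column whose cells are visited newest-first, assumes the family keeps 3 versions as stated above, and contrasts the current behavior (the oldest of four puts is dropped) with the proposed raw-scan behavior (the version limit is not applied).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of per-column version counting in a scan.
 * This is NOT HBase's ScanWildcardColumnTracker; it only mirrors the idea that
 * cells of one column are visited newest-first and, by default, only the first
 * maxVersions of them are returned.
 */
class VersionTrackingSketch {

    /**
     * Returns the timestamps a scan would return for a single column.
     *
     * @param newestFirst        cell timestamps for one column, newest first
     * @param maxVersions        versions kept by the column family (3 in the example)
     * @param ignoreVersionLimit true models the proposed raw-scan behavior
     */
    static List<Long> visibleVersions(List<Long> newestFirst, int maxVersions,
                                      boolean ignoreVersionLimit) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            seen++;
            // Current behavior: anything past maxVersions is dropped, even though
            // the cell may still physically exist in the store.
            if (ignoreVersionLimit || seen <= maxVersions) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Four puts to the same row and column with increasing timestamps 1..4,
        // listed newest first as a scanner would visit them.
        List<Long> puts = List.of(4L, 3L, 2L, 1L);
        System.out.println(visibleVersions(puts, 3, false)); // [4, 3, 2]
        System.out.println(visibleVersions(puts, 3, true));  // [4, 3, 2, 1]
    }
}
```

In the real scanner the dropped cell would surface as a SKIP or seek from the column tracker rather than as a filtered list; the sketch only shows which versions end up visible.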

Thoughts?
                
> Include deletes in the scan (setRaw) method does not respect the time range or the filter
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-8809
>                 URL: https://issues.apache.org/jira/browse/HBASE-8809
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>            Reporter: Vasu Mariyala
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.95.2, 0.94.10
>
>         Attachments: 8809-0.94.txt, 8809-trunk.txt, DeleteMarkers.doc
>
>
> If a row has been deleted at timestamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at timestamp 'T'. This is because of the following code in ScanQueryMatcher.java:
> {code}
>       if (retainDeletesInOutput
>           || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
>           || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
>         // always include or it is not time yet to check whether it is OK
>         // to purge deletes or not
>         return MatchCode.INCLUDE;
>       }
> {code}
> The expectation is that a scan (even with setRaw set to true) should respect the specified filters and time range.
> Please let me know if you think this behavior can be changed so that I can provide a patch for it.
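
The gap described in the issue can be illustrated with a minimal, hypothetical model. It is not HBase code: `DeleteMarkerTimeRangeSketch`, `checkDeleteMarker` and the `checkTimeRangeFirst` flag are invented for illustration. It assumes, as the description implies, that a raw scan takes the `retainDeletesInOutput` branch, and it uses HBase's `TimeRange` convention of an inclusive minimum and exclusive maximum. With the marker at T = 100 and a scan range of [0, 99), the quoted ordering returns the marker, while checking the time range first skips it.

```java
/**
 * Hypothetical sketch of the delete-marker check discussed in this issue.
 * Names are invented for illustration; this is NOT HBase code. Time ranges use
 * HBase's TimeRange convention: minStamp inclusive, maxStamp exclusive.
 */
class DeleteMarkerTimeRangeSketch {

    enum MatchCode { INCLUDE, SKIP }

    static boolean withinTimeRange(long ts, long minStamp, long maxStamp) {
        return ts >= minStamp && ts < maxStamp;
    }

    /**
     * @param retainDeletesInOutput assumed true for a raw scan, as the issue implies
     * @param checkTimeRangeFirst   false mirrors the quoted code; true models the
     *                              behavior the reporter expects
     */
    static MatchCode checkDeleteMarker(long markerTs, long minStamp, long maxStamp,
                                       boolean retainDeletesInOutput,
                                       boolean checkTimeRangeFirst) {
        if (checkTimeRangeFirst && !withinTimeRange(markerTs, minStamp, maxStamp)) {
            return MatchCode.SKIP; // marker lies outside the scan's time range
        }
        if (retainDeletesInOutput) {
            // Mirrors the quoted branch, which never looks at the time range.
            return MatchCode.INCLUDE;
        }
        // Simplification: a marker that is not retained is not returned to the client.
        return MatchCode.SKIP;
    }

    public static void main(String[] args) {
        long t = 100L;               // row deleted at timestamp T
        long min = 0L, max = t - 1;  // scan time range [0, T-1)
        System.out.println(checkDeleteMarker(t, min, max, true, false)); // INCLUDE
        System.out.println(checkDeleteMarker(t, min, max, true, true));  // SKIP
    }
}
```

Checking the range before the retain branch is only one possible ordering; the point of the sketch is that the quoted condition short-circuits before any time-range check can apply.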
