Posted to user@hbase.apache.org by Nicholas Harezga <n....@apexxs.com> on 2016/03/23 19:36:04 UTC

ValueFilter returning old versions

I have a table whose row keys are file names, with a single column family and the file creation time as the column qualifier. Each column value is a serialized JSON representation of an object. My program goes through the records, performs an operation on each file, and modifies the JSON object to indicate that the file has been processed. On each run I only want to fetch up to a specified number of records that have not yet been processed. Previously I fetched all of the records and filtered them on the client side. I am now trying to move the filtering to the server side to reduce network traffic and streamline the process.
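For reference, the earlier client-side approach amounts to something like the following minimal sketch. It uses plain Java collections instead of HBase classes, and the names `ClientSideFilterSketch`, `selectUnprocessed`, and `NEW_MARKER` are hypothetical. It also assumes one JSON value per row key, which is a simplification of the real table layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: stands in for "fetch everything, then filter on the client".
class ClientSideFilterSketch {
    // Substring that marks an unprocessed record in the serialized JSON.
    static final String NEW_MARKER = "\"jobState\":\"new\"";

    // Returns up to maxRecords row keys whose JSON value contains the "new" marker,
    // in the iteration order of the supplied map.
    static List<String> selectUnprocessed(Map<String, String> latestValueByRowKey, int maxRecords) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> entry : latestValueByRowKey.entrySet()) {
            if (selected.size() >= maxRecords) {
                break; // stop once the requested number of records is reached
            }
            if (entry.getValue().contains(NEW_MARKER)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: row key -> latest serialized JSON value.
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("a.txt", "{\"jobState\":\"processed\"}");
        rows.put("b.txt", "{\"jobState\":\"new\"}");
        rows.put("c.txt", "{\"jobState\":\"new\"}");
        rows.put("d.txt", "{\"jobState\":\"new\"}");
        System.out.println(selectUnprocessed(rows, 2)); // prints [b.txt, c.txt]
    }
}
```

The cost of this approach is that every record crosses the network before the substring test runs, which is what motivates moving the test to the server.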

I am using a ValueFilter with a SubstringComparator to get the rows that meet my conditions:

    Scan scan = new Scan();
    String filterString = "\"jobState\":\"new\"";

    scan.setFilter(new ValueFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator(filterString)));

When records are added they have a jobState of "new"; once they have been processed, the jobState is set to "processed" and the record in HBase is updated. If I scan from the HBase shell or scan the full table from Java, I get the most recent version (maximum versions for this table is set to 1). When I scan using the filter, I still get the original version of the row, and if I change the filter to match "processed" I get the updated version.

As a result, I process the same files several times. This repeats until HBase performs a flush or compaction, which I verified by flushing manually from the HBase shell.
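To make the symptom concrete, here is a minimal sketch in plain Java with no HBase classes. It rests on an assumption that has not been confirmed against the HBase source: that the older version is still physically stored until a flush or compaction, and that the value filter is tested against each stored version before the "newest version only" rule is applied. `FilterOrderSketch`, `Version`, and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: compares two possible orderings of "filter" and "pick newest version".
class FilterOrderSketch {
    // Hypothetical stand-in for one stored version of a cell; not an HBase class.
    static final class Version {
        final long timestamp;
        final String value;

        Version(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Returns the version with the highest timestamp, or null for an empty list.
    static Version newest(List<Version> versions) {
        Version best = null;
        for (Version v : versions) {
            if (best == null || v.timestamp > best.timestamp) {
                best = v;
            }
        }
        return best;
    }

    // Ordering A: resolve the newest version first, then apply the substring test.
    static String versionThenFilter(List<Version> versions, String needle) {
        Version v = newest(versions);
        return (v != null && v.value.contains(needle)) ? v.value : null;
    }

    // Ordering B (assumed): test every stored version, then return the newest survivor.
    static String filterThenVersion(List<Version> versions, String needle) {
        List<Version> matching = new ArrayList<>();
        for (Version v : versions) {
            if (v.value.contains(needle)) {
                matching.add(v);
            }
        }
        Version v = newest(matching);
        return v == null ? null : v.value;
    }

    public static void main(String[] args) {
        String needle = "\"jobState\":\"new\"";
        // Both versions are assumed to be stored until a flush or compaction.
        List<Version> beforeFlush = Arrays.asList(
                new Version(1L, "{\"jobState\":\"new\"}"),
                new Version(2L, "{\"jobState\":\"processed\"}"));
        System.out.println(versionThenFilter(beforeFlush, needle)); // prints null
        System.out.println(filterThenVersion(beforeFlush, needle)); // prints {"jobState":"new"}
    }
}
```

Under ordering B the stale "new" value comes back for as long as the old version is stored, and the discrepancy disappears once only the newest version remains. That matches what I observe, but it does not prove this is the cause.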

I am currently using hbase-shaded-client v1.1.2 for my Java API, and I have HBase v1.0.0-cdh5.4.8 running on my cluster under Cloudera Manager v5.4.8. I believe I found a similar issue posted in December 2013 (http://mail-archives.apache.org/mod_mbox/hbase-user/201312.mbox/%3CCADoiZqpxq64L75v3T3RGsks-82kRYMFMNYnYs-+2u0-f2a0PoA@mail.gmail.com%3E), but there didn't appear to be any resolution other than creating a custom filter.

Is there a newer version of HBase that doesn't have this issue? Is there a better way for me to do the filtering that I need to do?

If there is any further information I can provide, please let me know. Any recommendations or help would be greatly appreciated.