Posted to user@hbase.apache.org by Scott Brunza <sc...@sonalysts.com> on 2012/01/18 14:20:55 UTC

Re: getting row info without data

I think this fits what I'm trying to do, but have been having a heck of a
time with it.

I'm using a SingleColumnValueFilter to get rows whose cf:qual has a given
value, but would like just the key, no values.  If I add that filter to a
FilterList along with a KeyOnlyFilter.new, I get nothing if the KeyOnlyFilter
is first and everything, as if there were no KeyOnlyFilter, when it's last.

I'm writing it in a ruby script.


filterList = FilterList.new
filterList.addFilter(KeyOnlyFilter.new)
filterList.addFilter(SingleColumnValueFilter.new(Bytes.toBytes(cf),
   Bytes.toBytes(qualifier), CompareFilter::CompareOp.valueOf('EQUAL'),
   SubstringComparator.new(string)))
scan.setCaching(2)
scan.setFilter(filterList)
result_scanner = table.getScanner(scan)

result_scanner.each do |res|
  puts(res)
end

result_scanner.close


Scott


Re: getting row info without data

Posted by Scott Brunza <sc...@sonalysts.com>.
I've done some more testing with a simple table on a pseudo-distributed system (my laptop).  Below is the test script with the outputs of the various tests.  Where I'm really getting confused is this: when I query for info:lname = Washington and want the info:fname column returned, why are all fnames getting returned, not just the one for Washington (row 3…)?
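
One possible explanation, offered as an assumption rather than something confirmed in this thread: SingleColumnValueFilter can only test cells that the scan actually reads. If only info:fname is added with addColumn, no row presents an info:lname cell to the filter, and the filter's "filter if missing" switch (setFilterIfMissing on the real class) defaults to false, so rows lacking the tested column pass. The pure-Ruby sketch below models that idea with plain hashes. ROWS, scan_rows, and single_column_value_pass? are hypothetical names for illustration, not HBase API.

```ruby
# Toy model only: each row is a hash of "family:qualifier" => value.
ROWS = {
  '1' => { 'info:fname' => 'John',   'info:lname' => 'Smith' },
  '2' => { 'info:fname' => 'Jane',   'info:lname' => 'Doe' },
  '3' => { 'info:fname' => 'George', 'info:lname' => 'Washington' },
  '4' => { 'info:fname' => 'Ben',    'info:lname' => 'Franklin' }
}.freeze

# Row-level decision of a single-column value filter with a substring test.
# Assumption: a row whose tested column is absent passes unless
# filter_if_missing is true.
def single_column_value_pass?(cells, column, substring, filter_if_missing)
  return !filter_if_missing unless cells.key?(column)
  cells[column].include?(substring)
end

# Models a scan: restrict each row to the selected columns first, then apply
# the filter to whatever cells are left. Returns the matching row keys.
def scan_rows(rows, columns:, test_column:, substring:, filter_if_missing: false)
  rows.each_with_object([]) do |(key, cells), matched|
    visible = cells.select { |column, _value| columns.include?(column) }
    next if visible.empty?
    matched << key if single_column_value_pass?(visible, test_column, substring, filter_if_missing)
  end
end

# Only info:fname selected: the filter never sees info:lname, so every row passes.
puts scan_rows(ROWS, columns: ['info:fname'],
               test_column: 'info:lname', substring: 'Washington').inspect
# → ["1", "2", "3", "4"]

# Both columns selected: the filter can test info:lname, so only row 3 passes.
puts scan_rows(ROWS, columns: ['info:fname', 'info:lname'],
               test_column: 'info:lname', substring: 'Washington').inspect
# → ["3"]

# Only info:fname selected, but missing columns are rejected: nothing passes.
puts scan_rows(ROWS, columns: ['info:fname'], test_column: 'info:lname',
               substring: 'Washington', filter_if_missing: true).inspect
# → []
```

If the model holds, the scan would need to include info:lname as well as info:fname for the filter to have anything to compare.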

The more complex table I was working with earlier, both on my laptop and a cluster, contains information about network traffic.  I was attempting to only return the keys for all hosts from a given country.  I added a column to restrict the amount of stuff I was getting back, since I was only expecting to see the row key we defined during puts.  Maybe I'm actually getting back exactly what I'm supposed to.  Not sure.  I could still be thinking too "yesSQL".

I also noticed that I was able to see the actual values in the results returned from the more complex table, while here I'm getting vlen=X; when running the shell, I get back the actual name value, e.g., George.

Scott



include Java

import org.apache.hadoop.hbase.HBaseConfiguration

import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.ResultScanner
import org.apache.hadoop.hbase.client.Scan

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.FilterList
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter
import org.apache.hadoop.hbase.filter.KeyOnlyFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator

import org.apache.hadoop.hbase.util.Bytes

conf  = HBaseConfiguration.new
admin = HBaseAdmin.new(conf)
table = HTable.new('sample_names')
scan  = Scan.new
result_scanner = ResultScanner.new

filterList = FilterList.new
filterList.addFilter(SingleColumnValueFilter.new(
  Bytes.toBytes('info'), Bytes.toBytes('lname'),
  CompareFilter::CompareOp.valueOf('EQUAL'),
  SubstringComparator.new('Washington')))
filterList.addFilter(KeyOnlyFilter.new)

#scan.addColumn(Bytes.toBytes('info'), Bytes.toBytes('lname'))
scan.addColumn(Bytes.toBytes('info'), Bytes.toBytes('fname'))
scan.setFilter(filterList)

result_scanner = table.getScanner(scan)

result_scanner.each do |res|
  puts(res)
end

result_scanner.close


# scan of table from shell:
# ROW                   COLUMN+CELL                                               
# 1                    column=info:fname, timestamp=1326979816243, value=John
# 1                    column=info:lname, timestamp=1326979823380, value=Smith
# 2                    column=info:fname, timestamp=1326979829610, value=Jane
# 2                    column=info:lname, timestamp=1326979834954, value=Doe
# 3                    column=info:fname, timestamp=1326979841429, value=George
# 3                    column=info:lname, timestamp=1326979849407, value=Washington
# 4                    column=info:fname, timestamp=1326979856746, value=Ben
# 4                    column=info:lname, timestamp=1326979862339, value=Franklin
#
# with empty filter list:
# keyvalues={1/info:fname/1326979816243/Put/vlen=4, 1/info:lname/1326979823380/Put/vlen=5}
# keyvalues={2/info:fname/1326979829610/Put/vlen=4, 2/info:lname/1326979834954/Put/vlen=3}
# keyvalues={3/info:fname/1326979841429/Put/vlen=6, 3/info:lname/1326979849407/Put/vlen=10}
# keyvalues={4/info:fname/1326979856746/Put/vlen=3, 4/info:lname/1326979862339/Put/vlen=8}
#
# with only KeyOnlyFilter in list:
# keyvalues={1/info:fname/1326979816243/Put/vlen=0, 1/info:lname/1326979823380/Put/vlen=0}
# keyvalues={2/info:fname/1326979829610/Put/vlen=0, 2/info:lname/1326979834954/Put/vlen=0}
# keyvalues={3/info:fname/1326979841429/Put/vlen=0, 3/info:lname/1326979849407/Put/vlen=0}
# keyvalues={4/info:fname/1326979856746/Put/vlen=0, 4/info:lname/1326979862339/Put/vlen=0}
#
# with info:lname column and empty filter list:
# keyvalues={1/info:lname/1326979823380/Put/vlen=5}
# keyvalues={2/info:lname/1326979834954/Put/vlen=3}
# keyvalues={3/info:lname/1326979849407/Put/vlen=10}
# keyvalues={4/info:lname/1326979862339/Put/vlen=8}
#
# with info:lname column and KeyOnlyFilter:
# keyvalues={1/info:lname/1326979823380/Put/vlen=0}
# keyvalues={2/info:lname/1326979834954/Put/vlen=0}
# keyvalues={3/info:lname/1326979849407/Put/vlen=0}
# keyvalues={4/info:lname/1326979862339/Put/vlen=0}
#
# as above, but adding column after setting filter:
# keyvalues={1/info:lname/1326979823380/Put/vlen=0}
# keyvalues={2/info:lname/1326979834954/Put/vlen=0}
# keyvalues={3/info:lname/1326979849407/Put/vlen=0}
# keyvalues={4/info:lname/1326979862339/Put/vlen=0}
# 
# with only SingleColumnValueFilter in list:
# keyvalues={3/info:fname/1326979841429/Put/vlen=6, 3/info:lname/1326979849407/Put/vlen=10}
#
# with KeyOnlyFilter then SingleColumnValueFilter in list:
# <returns nothing>
#
# with SingleColumnValueFilter then KeyOnlyFilter in list:
# keyvalues={3/info:fname/1326979841429/Put/vlen=0, 3/info:lname/1326979849407/Put/vlen=0}
#
# with SingleColumnValueFilter then KeyOnlyFilter in list, set filter, add info:fname column:
# keyvalues={1/info:fname/1326979816243/Put/vlen=0}
# keyvalues={2/info:fname/1326979829610/Put/vlen=0}
# keyvalues={3/info:fname/1326979841429/Put/vlen=0}
# keyvalues={4/info:fname/1326979856746/Put/vlen=0}
#
# with SingleColumnValueFilter then KeyOnlyFilter in list, add info:fname column, set filter:
# keyvalues={1/info:fname/1326979816243/Put/vlen=0}
# keyvalues={2/info:fname/1326979829610/Put/vlen=0}
# keyvalues={3/info:fname/1326979841429/Put/vlen=0}
# keyvalues={4/info:fname/1326979856746/Put/vlen=0}




Re: getting row info without data

Posted by Stack <st...@duboce.net>.
On Wed, Jan 18, 2012 at 5:20 AM, Scott Brunza <sc...@sonalysts.com> wrote:
> I'm using a SingleColumnValueFilter to get rows who's cf:qual have a given
> value, but would just like the key, no values.

I take it you made SingleColumnValueFilter do the right thing first?

> If I add that filter to a
> filterList with a KeyOnlyFilter.new, I get nothing if KeyOnlyFilter is
> first and everything, as if there wasn't a KeyOnlyFilter, when it's last.
>

This sounds like the KeyOnlyFilter only works if it's second?

> I'm writing it in a ruby script.
>
>
> filterList = FilterList.new
> filterList.addFilter(KeyOnlyFilter.new)
> filterList.addFilter(SingleColumnValueFilter.new(Bytes.toBytes(cf),
>   Bytes.toBytes(qualifier), CompareFilter::CompareOp.valueOf('EQUAL'),
>   SubstringComparator.new(string)))

Can you make another dumber filter work with KeyOnlyFilter in a FilterList?

Is this on a cluster or in a standalone instance?  Adding a bit of
logging to each of the filters might help.  It's likely an issue in
filtering, but a bit of logging might uncover the unexpected.
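
As a rough illustration of the logging idea, here is a pure-Ruby sketch that wraps each step of a toy filter chain so every call records what went in and what came out. It is only a model: real HBase filters run on the region servers, so logging there would mean adding log statements to the filter classes themselves. with_logging, strip_values, and lname_matches are hypothetical names, and the two toy filters are assumptions about behavior, not the real implementations.

```ruby
# Wraps any callable "filter" so each call appends its input and output to log.
def with_logging(name, filter, log)
  lambda do |cells|
    result = filter.call(cells)
    log << "#{name}: in=#{cells.inspect} out=#{result.inspect}"
    result
  end
end

# Toy filters over [column, value] pairs (assumed behavior, for illustration).
strip_values = ->(cells) { cells.map { |column, _value| [column, ''] } }
lname_matches = lambda do |cells|
  hit = cells.any? { |column, value| column == 'info:lname' && value.include?('Washington') }
  hit ? cells : []
end

log = []
row3 = [['info:fname', 'George'], ['info:lname', 'Washington']]
chain = [
  with_logging('key_only', strip_values, log),
  with_logging('value_match', lname_matches, log)
]

# Run the row through the chain in order, then show what each step saw.
final = chain.reduce(row3) { |cells, filter| filter.call(cells) }
puts log
puts "final: #{final.inspect}"
```

Reading the log line by line shows which step was handed empty values and which step dropped the row.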

Good on you Scott,
St.Ack