Posted to user@hbase.apache.org by Akbar Gadhiya <ak...@gmail.com> on 2012/04/20 12:19:01 UTC

Compare range of numbers on column family

Hello,

I need help scanning data by the value of a column.

With the sample data and scan commands below, the first scan returns nothing
and the second returns the row containing 6000:

PK.john.20120422 column=alternateKey:ms, timestamp=1334912415796, value=6000

My use case is to scan records that fall between a start and an end timestamp.
(The timestamp is stored in the column alternateKey:ms.)
We cannot use the timestamp provided by HBase because it indicates the time
when the record was inserted into HBase, but we need a timestamp related to
our business needs.

We are trying to do a numeric comparison as opposed to a lexical one. Is there
any way I can perform this scan operation?

My data and scan commands look like this:

create 'demo', 'user', 'alternateKey', 'content'

put 'demo', 'PK.innar.20120418', 'user', 'Innar'
put 'demo', 'PK.innar.20120418', 'alternateKey:city', 'Tallinn'
put 'demo', 'PK.innar.20120418', 'alternateKey:phone', '0001'
put 'demo', 'PK.innar.20120418', 'alternateKey:ms', '1000'
put 'demo', 'PK.innar.20120418', 'content', 'Innar_GPB'

put 'demo', 'PK.akbar.20120418', 'user', 'Akbar'
put 'demo', 'PK.akbar.20120418', 'alternateKey:city', 'Ahmedabad'
put 'demo', 'PK.akbar.20120418', 'alternateKey:phone', '0002'
put 'demo', 'PK.akbar.20120418', 'alternateKey:ms', '2000'
put 'demo', 'PK.akbar.20120418', 'content', 'Akbar_GPB'

put 'demo', 'PK.ell.20120419', 'user', 'Ell'
put 'demo', 'PK.ell.20120419', 'alternateKey:city', 'Bangkok'
put 'demo', 'PK.ell.20120419', 'alternateKey:phone', '0003'
put 'demo', 'PK.ell.20120419', 'alternateKey:ms', '3000'
put 'demo', 'PK.ell.20120419', 'content', 'Ell_GPB'

put 'demo', 'PK.jane.20120420', 'user', 'Jane'
put 'demo', 'PK.jane.20120420', 'alternateKey:city', 'Jersey City'
put 'demo', 'PK.jane.20120420', 'alternateKey:phone', '0004'
put 'demo', 'PK.jane.20120420', 'alternateKey:ms', '4000'
put 'demo', 'PK.jane.20120420', 'content', 'Jane_GPB'

put 'demo', 'PK.michael.20120421', 'user', 'Michael'
put 'demo', 'PK.michael.20120421', 'alternateKey:city', 'Berlin'
put 'demo', 'PK.michael.20120421', 'alternateKey:phone', '0005'
put 'demo', 'PK.michael.20120421', 'alternateKey:ms', '5000'
put 'demo', 'PK.michael.20120421', 'content', 'Michael_GPB'

put 'demo', 'PK.john.20120422', 'user', 'John'
put 'demo', 'PK.john.20120422', 'alternateKey:city', 'London'
put 'demo', 'PK.john.20120422', 'alternateKey:phone', '0006'
put 'demo', 'PK.john.20120422', 'alternateKey:ms', '6000'
put 'demo', 'PK.john.20120422', 'content', 'John_GPB'

import org.apache.hadoop.hbase.filter.FilterList
import org.apache.hadoop.hbase.filter.FilterList::Operator
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.BinaryComparator
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.filter.ColumnRangeFilter

scan 'demo', {COLUMNS => ['alternateKey:ms'], FILTER =>
FilterList.new(FilterList::Operator::MUST_PASS_ALL,
java.util.Arrays.asList(SingleColumnValueFilter.new(Bytes.toBytes('alternateKey'),
Bytes.toBytes('ms'), CompareFilter::CompareOp.valueOf('GREATER'),
BinaryComparator.new(Bytes.toBytes('5000'))),
SingleColumnValueFilter.new(Bytes.toBytes('alternateKey'),
Bytes.toBytes('ms'), CompareFilter::CompareOp.valueOf('LESS'),
BinaryComparator.new(Bytes.toBytes('10000')))))}

scan 'demo', {COLUMNS => ['alternateKey:ms'], FILTER =>
FilterList.new(FilterList::Operator::MUST_PASS_ALL,
java.util.Arrays.asList(SingleColumnValueFilter.new(Bytes.toBytes('alternateKey'),
Bytes.toBytes('ms'), CompareFilter::CompareOp.valueOf('GREATER'),
BinaryComparator.new(Bytes.toBytes('5000'))),
SingleColumnValueFilter.new(Bytes.toBytes('alternateKey'),
Bytes.toBytes('ms'), CompareFilter::CompareOp.valueOf('LESS'),
BinaryComparator.new(Bytes.toBytes('9000')))))}
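
The different results of the two scans can probably be explained by byte-wise
ordering of the string values. The following is a minimal sketch in plain Ruby
(no HBase required), assuming BinaryComparator orders values by comparing their
raw bytes from left to right. The helper name bytewise_compare is hypothetical
and only stands in for that behaviour.

```ruby
# Byte-wise (lexicographic) comparison of two strings.
# Assumption: this mirrors how BinaryComparator orders raw cell values.
# Returns -1, 0, or 1.
def bytewise_compare(a, b)
  a.bytes <=> b.bytes
end

stored = '6000' # value of alternateKey:ms for PK.john.20120422

# First scan: value > '5000' AND value < '10000'
first_scan = bytewise_compare(stored, '5000') > 0 &&
             bytewise_compare(stored, '10000') < 0

# Second scan: value > '5000' AND value < '9000'
second_scan = bytewise_compare(stored, '5000') > 0 &&
              bytewise_compare(stored, '9000') < 0

puts first_scan   # false: '6' sorts after '1', so '6000' is not < '10000'
puts second_scan  # true:  '6' sorts before '9', so '6000' is < '9000'
```

As strings, '10000' sorts before '5000' because the first byte '1' is smaller
than '5', so no value can be both greater than '5000' and less than '10000'.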


Thanks.

Re: Compare range of numbers on column family

Posted by anil gupta <an...@gmail.com>.
Hi Akbar,

To do a numerical comparison, you first need to store the values being
compared as numbers rather than as Strings. To store numerical data you will
need to write a custom mapper if you are using HBase bulk loading.
Once the data is stored as numbers rather than Strings, you can use the
BinaryComparator.
Hope this helps.
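
A minimal sketch of this idea in plain Ruby (no HBase required): each value is
encoded as 8 big-endian bytes, the layout that Bytes.toBytes(long) produces in
Java, and then compared byte by byte. The helper names encode_long,
bytewise_compare and in_range? are hypothetical, and the byte-wise comparison
is an assumption about how BinaryComparator orders values.

```ruby
# Encode an integer as a fixed-width, 8-byte big-endian value
# (the same layout as a Java long written by Bytes.toBytes(long)).
def encode_long(n)
  [n].pack('q>')
end

# Byte-wise comparison, standing in for BinaryComparator (assumption).
def bytewise_compare(a, b)
  a.bytes <=> b.bytes
end

# True when low < value < high under byte-wise ordering of the encoded longs.
def in_range?(value, low, high)
  v = encode_long(value)
  bytewise_compare(v, encode_long(low)) > 0 &&
    bytewise_compare(v, encode_long(high)) < 0
end

values = [1000, 2000, 3000, 4000, 5000, 6000]
matches = values.select { |v| in_range?(v, 5000, 10000) }
puts matches.inspect # [6000]
```

Because every encoded value has the same width, byte order and numeric order
agree for non-negative numbers such as millisecond timestamps. Negative values
would sort after positive ones, since the sign bit is the highest bit of the
first byte.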

-Anil

On Fri, Apr 20, 2012 at 3:57 AM, Bijieshan <bi...@huawei.com> wrote:

> Akbar,
>
> I think you need to customize a comparator yourself. You can't get the
> results you want by using BinaryComparator.
> Hope I get you correctly.
>
> Jieshan.
>



-- 
Thanks & Regards,
Anil Gupta

RE: Compare range of numbers on column family

Posted by Bijieshan <bi...@huawei.com>.
Akbar,

I think you need to customize a comparator yourself. You can't get the results you want by using BinaryComparator.
Hope I get you correctly.
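
A rough sketch of the comparison logic such a comparator might use when the
values stay stored as decimal strings, written in plain Ruby for illustration
only. The name numeric_string_compare is hypothetical. A real custom comparator
would presumably be a Java class made available to the region servers, which
is not shown here.

```ruby
# Hypothetical comparison logic: decode both values as base-10 integers
# and compare the numbers instead of the raw bytes.
# Returns -1, 0, or 1.
def numeric_string_compare(a, b)
  Integer(a, 10) <=> Integer(b, 10)
end

values = %w[1000 2000 3000 4000 5000 6000]

# Same range as the first scan: 5000 < value < 10000
matches = values.select do |v|
  numeric_string_compare(v, '5000') > 0 &&
    numeric_string_compare(v, '10000') < 0
end

puts matches.inspect # ["6000"]
```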

Jieshan. 
