Posted to user@hbase.apache.org by Ajay Govindarajan <ag...@yahoo.com> on 2011/04/28 20:30:19 UTC

HBase querying across region servers

We have a bunch of synchronous requests that will read and write data to HBase. I have written some code that uses the HBase client library: Puts for writes, Gets for reads by row key, and Scans with filters for reads. Currently we have only one region server (since it's a dev environment), so the queries work fine. Eventually we will have multiple region servers in our production environment. From the documentation, it seems that Gets and Puts will work across multiple region servers while Scans don't.

So how do I get scans to work across multiple region servers? Should I avoid scans and replace them with Gets using filters? Is that a big performance overhead?
Or is there a framework to perform scan-like queries across multiple region servers?

Any help will be appreciated.

thanks
-ajay

Re: HBase querying across region servers

Posted by Jean-Daniel Cryans <jd...@apache.org>.
(please don't write back personally unless it's really personal)

That's all fine; rows are always contained in a single region. By
state I meant the case where you write some fancy filter yourself and
keep some state in it, so that the filtering of one row could affect
how other rows are filtered.

Like I said earlier, that's not the case with the ones shipped with
HBase, so, again, yes, this is going to work.
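A minimal sketch of what "state" means here, in plain Java under a simplified model: roughly as in HBase, where the filter travels with the Scan to each region, every region below gets its own fresh filter instance, so nothing one instance remembers carries over to the next region. The RowFilter interface and the ValueEquals and FirstN classes are hypothetical stand-ins written for this illustration, not HBase's Filter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Toy model (hypothetical, not HBase's Filter API): each region is scanned
// with its own new filter instance, so per-filter state never crosses regions.
public class FilterStateDemo {
    interface RowFilter {
        boolean keep(String row, String value);
    }

    // Stateless: the decision depends only on the current row's value.
    static class ValueEquals implements RowFilter {
        private final String wanted;

        ValueEquals(String wanted) {
            this.wanted = wanted;
        }

        @Override
        public boolean keep(String row, String value) {
            return wanted.equals(value);
        }
    }

    // Stateful "fancy" custom filter: keeps only the first `limit` rows it sees.
    static class FirstN implements RowFilter {
        private final int limit;
        private int seen = 0;

        FirstN(int limit) {
            this.limit = limit;
        }

        @Override
        public boolean keep(String row, String value) {
            return seen++ < limit;
        }
    }

    // Each inner list is one region's rows as {rowKey, value}, in key order.
    static List<String> scan(List<List<String[]>> regions, Supplier<RowFilter> newFilter) {
        List<String> out = new ArrayList<>();
        for (List<String[]> region : regions) {
            RowFilter filter = newFilter.get(); // fresh instance: no state carried over
            for (String[] kv : region) {
                if (filter.keep(kv[0], kv[1])) {
                    out.add(kv[0]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {{"a", "x"}, {"b", "y"}, {"c", "x"}, {"d", "x"}, {"e", "y"}};
        List<List<String[]>> oneRegion = Collections.singletonList(Arrays.asList(rows));
        List<List<String[]>> threeRegions = Arrays.asList(
                Arrays.asList(rows[0], rows[1]),
                Arrays.asList(rows[2], rows[3]),
                Collections.singletonList(rows[4]));

        System.out.println(scan(oneRegion, () -> new ValueEquals("x")));    // [a, c, d]
        System.out.println(scan(threeRegions, () -> new ValueEquals("x"))); // [a, c, d]

        System.out.println(scan(oneRegion, () -> new FirstN(2)));    // [a, b]
        System.out.println(scan(threeRegions, () -> new FirstN(2))); // [a, b, c, d, e]
    }
}
```

The value check returns the same rows however the table is split, while the counting filter's answer depends on where the region boundaries fall, which is the kind of custom filter being warned about.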

The documentation is all contained in the javadoc.

J-D

On Thu, Apr 28, 2011 at 3:07 PM, Ajay Govindarajan
<ag...@yahoo.com> wrote:
> SingleColumnValueFilter filter = new SingleColumnValueFilter(
>         Bytes.toBytes(columnFamily), Bytes.toBytes(key),
>         CompareOp.EQUAL, Bytes.toBytes("someValue"));
> // Drop rows that do not contain the column at all
> filter.setFilterIfMissing(true);
> Scan scan = new Scan();
> scan.setFilter(filter);
> ResultScanner scanner = hTable.getScanner(scan);
> try {
>     for (Result r = scanner.next(); r != null; r = scanner.next()) {
>         String rowKey = Bytes.toString(r.getRow());
>         NavigableMap<byte[], byte[]> map =
>                 r.getFamilyMap(Bytes.toBytes(columnFamily));
>         // ... process rowKey and map ...
>     }
> } finally {
>     // Release the scanner's server-side resources
>     scanner.close();
> }
>
>
> Will this code work across regions?
>
> Also, you say "if you happened to have some sort of state in your
> filter". As far as I can see, only the reset() and filterRow() methods seem
> to alter the state. Are there more methods that alter the state? If so, could
> you please point me to the relevant documentation?
>
> thanks very much
> -ajay

Re: HBase querying across region servers

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Can you give an example of what you're trying to do?

BTW, what we mean when we say that filters don't work across region
servers (actually it's more across regions, so it's also a problem on
a single machine) is that if you happened to have some sort of state
in your filter, it wouldn't be carried from one region to another. I
don't think any of the filters HBase ships with have that sort of
issue, so they can all be used to scan a full table if that's what you
fancy.

J-D

Re: HBase querying across region servers

Posted by Ajay Govindarajan <ag...@yahoo.com>.
Sorry, what I meant was Scans using Filters. There are use cases for which we will not know the row keys, so we have to resort to filters such as SingleColumnValueFilter or PrefixFilter.
Since filters don't work across region servers, are there any alternative APIs or workarounds? Or is there a fundamental schema design issue here?
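When the rows of interest share a known row-key prefix, one common alternative to a bare PrefixFilter is to turn the prefix into a start/stop row range, so the scan begins at the prefix and ends right after it instead of evaluating the filter from the first row of the table. A sketch of that conversion, assuming HBase's unsigned lexicographic ordering of byte[] row keys; PrefixRange, stopRowForPrefix and compareUnsigned are hypothetical helpers, and the "user42|" key is an invented example. A value-based filter such as SingleColumnValueFilter cannot be narrowed this way, because the row key says nothing about column values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers (not part of the HBase API) for turning a row-key
// prefix into a [startRow, stopRow) range.
public class PrefixRange {
    // Returns the smallest key that sorts after every key beginning with
    // prefix, or an empty array (meaning "to the end of the table") if the
    // prefix is empty or consists only of 0xFF bytes.
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // 0x7F wraps to 0x80, which is still larger when compared unsigned
                return stop;
            }
            // A trailing 0xFF cannot be incremented: drop it and carry left.
        }
        return new byte[0];
    }

    // Unsigned lexicographic comparison, the order HBase uses for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] start = "user42|".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // user42}

        byte[] row = "user42|2011-04-28".getBytes(StandardCharsets.UTF_8);
        boolean inRange = compareUnsigned(start, row) <= 0 && compareUnsigned(row, stop) < 0;
        System.out.println(inRange); // true
    }
}
```

With the HBase client, the two arrays could then be passed as the start and stop rows of a Scan, e.g. new Scan(start, stop); an empty stop row means scanning to the end of the table.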

thanks
-ajay

Re: HBase querying across region servers

Posted by Bennett Andrews <be...@gmail.com>.
Scans will work across region servers transparently. All you need to do is
specify a start row and end row. Use this when you are reading sequential rows,
as it will be faster.
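What "transparently" means on the client side can be pictured with a toy model: the table is a sorted map of regions keyed by start key, and a scan walks the regions overlapping [startRow, stopRow) in order, concatenating their rows. This is a simplified sketch; ToyScanner and its methods are hypothetical, string keys stand in for HBase's byte[] row keys, and the real client's region location lookups, RPCs and result caching are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (hypothetical, not the HBase client API): the table is split into
// regions keyed by start key, and each region holds its own sorted rows.
public class ToyScanner {
    private final TreeMap<String, TreeMap<String, String>> regions = new TreeMap<>();

    public ToyScanner(String... splitKeys) {
        regions.put("", new TreeMap<>()); // the first region starts at the empty key
        for (String key : splitKeys) {
            regions.put(key, new TreeMap<>());
        }
    }

    // A write lands in the region whose start key is the greatest one <= row.
    public void put(String row, String value) {
        regions.floorEntry(row).getValue().put(row, value);
    }

    // Returns row keys in [startRow, stopRow) in key order; an empty stopRow
    // means "to the end of the table". Region boundaries are invisible to the caller.
    public List<String> scan(String startRow, String stopRow) {
        List<String> out = new ArrayList<>();
        boolean bounded = !stopRow.isEmpty();
        if (bounded && startRow.compareTo(stopRow) >= 0) {
            return out; // empty range
        }
        String firstRegion = regions.floorKey(startRow);
        for (Map.Entry<String, TreeMap<String, String>> region
                : regions.tailMap(firstRegion, true).entrySet()) {
            if (bounded && region.getKey().compareTo(stopRow) >= 0) {
                break; // this region starts at or after stopRow, so we are done
            }
            NavigableMap<String, String> rows = bounded
                    ? region.getValue().subMap(startRow, true, stopRow, false)
                    : region.getValue().tailMap(startRow, true);
            out.addAll(rows.keySet());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyScanner table = new ToyScanner("g", "p"); // three regions
        for (String row : new String[] {"apple", "fig", "grape", "kiwi", "plum", "zucchini"}) {
            table.put(row, "v");
        }
        // Touches all three regions, yet comes back as one ordered list.
        System.out.println(table.scan("b", "q")); // [fig, grape, kiwi, plum]
    }
}
```

Because each region returns its rows in key order and regions are visited in start-key order, concatenating them yields one globally sorted result, so the caller never needs to know how many region servers are involved.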

-bennett