Posted to user@hbase.apache.org by Farrokh Shahriari <mo...@gmail.com> on 2013/02/01 14:52:02 UTC

Parallel scan in HBase

Hi there,
I have two questions about scans in HBase:
1) Does a scan operation with a specific filter run in parallel on different
regionservers?
2) I want to know whether this code runs on the client side when searching
the retrieved results, or not?

    for (Result result : scanner1) {
        for (KeyValue kv : result.raw()) {
            // some code
        }
    }


Farrokh Shahriari

Re: Parallel scan in HBase

Posted by Farrokh Shahriari <mo...@gmail.com>.
Thank you guys,
@Mohammad: Yes, I need to retrieve all the rows and compare each of them to
a specific value.
As I understand it, HBase doesn't support parallel scans by default, but I
can implement them on my own through Coprocessors, knowing the start/end row
key of each region. Am I correct?

Farrokh

On Fri, Feb 1, 2013 at 8:37 PM, James Taylor <jt...@salesforce.com> wrote:

> If you run a SQL query that does aggregation (i.e. uses a built-in
> aggregation function like COUNT or does a GROUP BY), Phoenix will
> orchestrate the running of a set of queries in parallel, segmented along
> your row key (driven by the start/stop key plus region boundaries). We take
> advantage of a nifty feature that Lars added where you can pass in your own
> ExecutorService to an HTable, so you could do something similar.
>
> Regards,
>
>     James

Re: Parallel scan in HBase

Posted by James Taylor <jt...@salesforce.com>.
If you run a SQL query that does aggregation (i.e. uses a built-in 
aggregation function like COUNT or does a GROUP BY), Phoenix will 
orchestrate the running of a set of queries in parallel, segmented along 
your row key (driven by the start/stop key plus region boundaries). We 
take advantage of a nifty feature that Lars added where you can pass in 
your own ExecutorService to an HTable, so you could do something similar.

Regards,

     James
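
A minimal sketch of the segmentation idea described above, assuming an in-memory TreeMap as a stand-in for the table (no HBase or Phoenix classes are used). The class name SegmentedGroupBy and its methods are hypothetical. It cuts a start/stop range at the region boundaries, runs one partial count per segment on a caller-supplied ExecutorService, and merges the partial results on the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical stand-in: a TreeMap plays the table; no HBase classes are used.
final class SegmentedGroupBy {

    // Cuts [start, stop) at each region start key lying strictly inside it.
    // regionStartKeys is assumed to be sorted ascending.
    static List<String[]> segments(String start, String stop, List<String> regionStartKeys) {
        List<String[]> out = new ArrayList<>();
        String from = start;
        for (String boundary : regionStartKeys) {
            if (boundary.compareTo(from) > 0 && boundary.compareTo(stop) < 0) {
                out.add(new String[] {from, boundary});
                from = boundary;
            }
        }
        out.add(new String[] {from, stop});
        return out;
    }

    // Partial "GROUP BY value, COUNT(*)" over one segment of the table.
    static Map<String, Long> partialCounts(Map<String, String> slice) {
        return slice.values().stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
    }

    // Runs one partial aggregation per segment on the caller's pool,
    // then merges the partial counts on the client.
    static Map<String, Long> countByValue(NavigableMap<String, String> table, String start,
            String stop, List<String> regionStartKeys, ExecutorService pool) {
        List<CompletableFuture<Map<String, Long>>> partials = new ArrayList<>();
        for (String[] seg : segments(start, stop, regionStartKeys)) {
            partials.add(CompletableFuture.supplyAsync(
                    () -> partialCounts(table.subMap(seg[0], true, seg[1], false)), pool));
        }
        Map<String, Long> merged = new TreeMap<>();
        for (CompletableFuture<Map<String, Long>> partial : partials) {
            partial.join().forEach((value, n) -> merged.merge(value, n, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("a1", "x"); table.put("b1", "x"); table.put("b2", "y");
        table.put("c1", "x"); table.put("d1", "y"); table.put("e1", "y");
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // Rows b1, b2, c1, d1 fall in [b, e): prints {x=2, y=2}
            System.out.println(countByValue(table, "b", "e", Arrays.asList("", "c", "d"), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because an aggregate does not depend on the order in which the segments finish, the partial results can be merged as they are; a query that must return rows in key order needs extra care.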

On 02/01/2013 08:40 AM, Mohammad Tariq wrote:
> Do you need to scan each and every row within that range? Or do you need
> specific rows based on some filter?
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com


Re: Parallel scan in HBase

Posted by Mohammad Tariq <do...@gmail.com>.
Do you need to scan each and every row within that range? Or do you need
specific rows based on some filter?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Fri, Feb 1, 2013 at 9:16 PM, lars hofhansl <la...@apache.org> wrote:

> The scan contract in HBase is that all rows are returned in order, so all
> regions have to be traversed in order as well.
> It would be nice to add some facility to HBase to perform the scanning in
> parallel.

Re: Parallel scan in HBase

Posted by lars hofhansl <la...@apache.org>.
The scan contract in HBase is that all rows are returned in order, so all regions have to be traversed in order as well.
It would be nice to add some facility to HBase to perform the scanning in parallel.
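
One way to reconcile the ordering contract with parallel work can be sketched as follows, assuming each region is modelled as a sorted in-memory list of row keys (the class OrderedParallelScan is hypothetical and uses no HBase API). All region scans are started at once, but their results are consumed in region order, so the combined output is still sorted by row key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in: each inner list is one region's sorted row keys,
// and the outer list is in region (row key) order.
final class OrderedParallelScan {

    static List<String> scan(List<List<String>> regions, Predicate<String> filter,
            ExecutorService pool) {
        // Start every region scan immediately.
        List<CompletableFuture<List<String>>> pending = new ArrayList<>();
        for (List<String> region : regions) {
            pending.add(CompletableFuture.supplyAsync(() -> matching(region, filter), pool));
        }
        // Consume results in region order, whatever order the tasks finish in.
        List<String> rows = new ArrayList<>();
        for (CompletableFuture<List<String>> result : pending) {
            rows.addAll(result.join());
        }
        return rows;
    }

    // The per-region work: keep only the row keys accepted by the filter.
    static List<String> matching(List<String> region, Predicate<String> filter) {
        return region.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("a1", "a2"), Arrays.asList("b1", "b2"), Arrays.asList("c1"));
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // prints [a1, a2, b2, c1]
            System.out.println(scan(regions, key -> !key.equals("b1"), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The cost of this approach is buffering: results from later regions sit in client memory until every earlier region has been drained.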



________________________________
 From: Farrokh Shahriari <mo...@gmail.com>
To: user@hbase.apache.org 
Sent: Friday, February 1, 2013 5:52 AM
Subject: Parallel scan in HBase
 
Hi there,
I have two questions about scans in HBase:
1) Does a scan operation with a specific filter run in parallel on different
regionservers?
2) I want to know whether this code runs on the client side when searching
the retrieved results, or not?

    for (Result result : scanner1) {
        for (KeyValue kv : result.raw()) {
            // some code
        }
    }


Farrokh Shahriari

Re: Parallel scan in HBase

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
A MapReduce job does almost exactly that.

The map method is called for each row, and you can have multiple map tasks
running at the same time.

That is how RowCounter works: it scans every row to count it, but spreads
the work over all the nodes.

Take a look at it.

JM
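
The shape of that row-counting job can be sketched in plain Java, assuming one input split per region and a shared counter (this is a hypothetical stand-in, not the actual RowCounter source and not the Hadoop API). The map function is called once per row, and the splits are processed independently of each other.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a row-counting job; no Hadoop or HBase classes are used.
final class RowCounterSketch {

    // Called once per row, like a mapper; here it only bumps the shared counter.
    static void map(String rowKey, LongAdder rows) {
        rows.increment();
    }

    // Each split (one per region in this sketch) may be handled by a different thread.
    static long countRows(List<List<String>> splits) {
        LongAdder rows = new LongAdder();
        splits.parallelStream().forEach(split -> split.forEach(rowKey -> map(rowKey, rows)));
        return rows.sum();
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a1", "a2"), Arrays.asList("b1"), Arrays.asList("c1", "c2"));
        System.out.println(countRows(splits)); // prints 5
    }
}
```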

2013/2/1, Alexander Ignatov <ai...@mirantis.com>:
> You could use the Coprocessor framework. To do that, you have to implement
> your own coprocessor module and deploy it to each RegionServer.
>
> Here is an introductory article on how to use coprocessors:
> https://blogs.apache.org/hbase/entry/coprocessor_introduction
>
> --
> Regards,
> Alexander Ignatov

Re: Parallel scan in HBase

Posted by Alexander Ignatov <ai...@mirantis.com>.
You could use the Coprocessor framework. To do that, you have to implement
your own coprocessor module and deploy it to each RegionServer.

Here is an introductory article on how to use coprocessors:
https://blogs.apache.org/hbase/entry/coprocessor_introduction

-- 
Regards,
Alexander Ignatov


On 2/1/2013 6:57 PM, Farrokh Shahriari wrote:
> Thanks for your reply.
> In my case, I need to scan all rows (about 1 million to 5 million rows) in
> a table, and it takes a long time. I want to know whether there is any way
> to do it in parallel.



Re: Parallel scan in HBase

Posted by Farrokh Shahriari <mo...@gmail.com>.
Thanks for your reply.
In my case, I need to scan all rows (about 1 million to 5 million rows) in
a table, and it takes a long time. I want to know whether there is any way
to do it in parallel.

On Fri, Feb 1, 2013 at 5:32 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Farrokh,
>
>     Scans work sequentially, one region after the other. Scans from the
> client side do not go to regionservers in parallel. As for the second
> question, the code will run on the client side.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com

Re: Parallel scan in HBase

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Farrokh,

    Scans work sequentially, one region after the other. Scans from the
client side do not go to regionservers in parallel. As for the second
question, the code will run on the client side.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
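
The sequential behaviour described above can be pictured with a small stand-in, assuming regions are modelled as in-memory lists of row keys (SequentialScanner is a hypothetical class, not the HBase client scanner). The next region is only opened once the current one is exhausted, and the comparison in the loop body runs on the client for every row fetched.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a client-side scanner; no HBase classes are used.
final class SequentialScanner implements Iterator<String> {
    private final Iterator<List<String>> regions;
    private Iterator<String> current = Collections.emptyIterator();

    SequentialScanner(List<List<String>> regionsInKeyOrder) {
        this.regions = regionsInKeyOrder.iterator();
    }

    @Override
    public boolean hasNext() {
        // Move to the next region only when the current one is exhausted.
        while (!current.hasNext() && regions.hasNext()) {
            current = regions.next().iterator();
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("a1", "a2"), Arrays.asList("b1"), Arrays.asList("c1", "c2"));
        int fetched = 0;
        int matched = 0;
        for (Iterator<String> rows = new SequentialScanner(regions); rows.hasNext(); ) {
            String row = rows.next();
            fetched++;                  // every row travels to the client
            if (row.startsWith("c")) {  // the comparison itself runs on the client
                matched++;
            }
        }
        System.out.println(fetched + " rows fetched, " + matched + " matched");
        // prints: 5 rows fetched, 2 matched
    }
}
```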


On Fri, Feb 1, 2013 at 7:22 PM, Farrokh Shahriari <
mohandes.zebeleh.67@gmail.com> wrote:

> Hi there,
> I have two questions about scans in HBase:
> 1) Does a scan operation with a specific filter run in parallel on different
> regionservers?
> 2) I want to know whether this code runs on the client side when searching
> the retrieved results, or not?
>
>     for (Result result : scanner1) {
>         for (KeyValue kv : result.raw()) {
>             // some code
>         }
>     }
>
>
> Farrokh Shahriari
>