Posted to user@hbase.apache.org by ch huang <ju...@gmail.com> on 2013/07/13 10:43:39 UTC

the scan will be executed parallel if not use coprocessor?

As the title says.

Re: the scan will be executed parallel if not use coprocessor?

Posted by Ted Yu <yu...@gmail.com>.

Have you read https://blogs.apache.org/hbase/entry/coprocessor_introduction?

Cheers

On Sat, Jul 13, 2013 at 4:50 AM, ch huang <ju...@gmail.com> wrote:

> Hi Ted, for example, I have a table with 10 regions. If my scan condition
> hits data in 8 of those regions, is there a difference between using the
> original (plain) scan and using a coprocessor? I know a coprocessor can run
> in parallel on each region, but why would the original scan be slower than
> the coprocessor?

Re: the scan will be executed parallel if not use coprocessor?

Posted by Anoop John <an...@gmail.com>.

Yes, it may be worth looking at HBASE-1935.

Whether or not CP Observers (pre/post hooks) are used, scanning is
sequential from the HBase client side. Phoenix has its own client-side
code that makes multiple parallel scan requests to the servers (by splitting
the scan range).
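
To illustrate the idea of splitting a scan range on the client side, here is a
minimal sketch. It does not use the HBase or Phoenix APIs; the in-memory
REGIONS table and the helpers scan_region, split_by_region and parallel_scan
are hypothetical names made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table": (start_key, end_key, sorted rows) per region.
# An end_key of None means the region is unbounded on the right.
REGIONS = [
    ("a", "h", ["apple", "cherry", "grape"]),
    ("h", "p", ["honey", "lemon", "olive"]),
    ("p", None, ["peach", "quince", "yam"]),
]


def scan_region(region, start, stop):
    """Return the rows of one region that fall in [start, stop)."""
    _, _, rows = region
    return [r for r in rows if start <= r < stop]


def split_by_region(start, stop):
    """Clip the requested range [start, stop) against each region boundary."""
    pieces = []
    for region in REGIONS:
        r_start, r_end, _ = region
        lo = max(start, r_start)
        hi = stop if r_end is None else min(stop, r_end)
        if lo < hi:  # the region overlaps the requested range
            pieces.append((region, lo, hi))
    return pieces


def parallel_scan(start, stop, workers=4):
    """Issue one sub-scan per overlapping region, all at the same time."""
    pieces = split_by_region(start, stop)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of `pieces`, so row order is kept.
        chunks = pool.map(lambda p: scan_region(*p), pieces)
        return [row for chunk in chunks for row in chunk]
```

Calling parallel_scan("b", "q") touches all three regions at once and still
returns the rows in key order, because the sub-results are concatenated in
region order.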

We also have Endpoints. When invoked from the client side, they execute in
parallel.
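
As a rough sketch of the Endpoint pattern (not the actual HBase coprocessor
API), the example below runs a small function against every region at the same
time and combines the partial results on the client. The names
count_rows_endpoint and call_endpoint_on_all_regions, and the in-memory
REGIONS data, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical regions with their rows; a real table lives on region servers.
REGIONS = {
    "region-1": ["apple", "cherry", "grape"],
    "region-2": ["honey", "lemon"],
    "region-3": ["peach", "quince", "yam", "zucchini"],
}


def count_rows_endpoint(rows):
    """Stand-in for server-side code: runs next to the data, returns one number."""
    return len(rows)


def call_endpoint_on_all_regions(endpoint, regions, workers=4):
    """Invoke `endpoint` once per region concurrently; collect results by region."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(endpoint, rows) for name, rows in regions.items()}
        return {name: f.result() for name, f in futures.items()}


per_region = call_endpoint_on_all_regions(count_rows_endpoint, REGIONS)
total = sum(per_region.values())  # the client merges the small partial results
```

Only one small value per region travels back to the client, which is one
reason this style of aggregation can beat pulling every row through a plain
scan.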

I just wanted to make that clear.

-Anoop-

On Tue, Jul 16, 2013 at 12:28 AM, lars hofhansl <la...@apache.org> wrote:

> The HBase contract guarantees that rows are returned in row order.
> That limits what can be done in parallel. For example, one could farm
> out the requests to the region servers in parallel, but the client would
> still have to wait for the rows that sort first and deliver those to the
> caller first.
> We could add a new scan option that allows rows to be returned out of
> order; in that case the client could deliver the rows as they are
> retrieved.
> Care must then be taken that the parallel scanner behaves correctly when
> regions have moved. Currently the client scanner knows how far it got in
> the scan and just resets from there; that part would be a bit trickier in
> the parallel case.
>
>
> -- Lars

Re: the scan will be executed parallel if not use coprocessor?

Posted by lars hofhansl <la...@apache.org>.

The HBase contract guarantees that rows are returned in row order.
That limits what can be done in parallel. For example, one could farm out the requests to the region servers in parallel, but the client would still have to wait for the rows that sort first and deliver those to the caller first.
We could add a new scan option that allows rows to be returned out of order; in that case the client could deliver the rows as they are retrieved.
Care must then be taken that the parallel scanner behaves correctly when regions have moved. Currently the client scanner knows how far it got in the scan and just resets from there; that part would be a bit trickier in the parallel case.
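
The difference between ordered and out-of-order delivery can be sketched as
follows. This is a simulation under stated assumptions, not HBase client code:
the regions, their latencies, and the functions fetch, scan_ordered and
scan_unordered are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# (rows, simulated latency in seconds), listed in row-key order.
REGIONS = [
    (["a1", "a2"], 0.20),  # sorts first, but responds last
    (["m1", "m2"], 0.05),
    (["z1"], 0.01),
]


def fetch(region):
    """Pretend to scan one region; sleep to model a slow or fast server."""
    rows, latency = region
    time.sleep(latency)
    return rows


def scan_ordered(regions):
    """Requests go out in parallel, but rows are delivered strictly in row order."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        futures = [pool.submit(fetch, r) for r in regions]
        for f in futures:  # blocks on the region that sorts first
            yield from f.result()


def scan_unordered(regions):
    """Rows are delivered as soon as any region responds; order is not kept."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        futures = [pool.submit(fetch, r) for r in regions]
        for f in as_completed(futures):
            yield from f.result()
```

In the ordered variant nothing is delivered until the slowest-but-first region
answers, even though the other results are already buffered. The unordered
variant hands rows over immediately but gives up the row-order guarantee.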

-- Lars

----- Original Message -----
From: ramkrishna vasudevan <ra...@gmail.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Cc: 
Sent: Sunday, July 14, 2013 9:15 PM
Subject: Re: the scan will be executed parallel if not use coprocessor?

HBase by default does not use a parallel scanning mechanism; scanning is
sequential. There are some JIRAs that try to implement scanning in parallel
across regions. HBASE-1935 is one such idea.
Projects like Phoenix use coprocessors to scan the regions in parallel, and
the results are returned to the client.

Regards
Ram


Re: the scan will be executed parallel if not use coprocessor?

Posted by ramkrishna vasudevan <ra...@gmail.com>.

HBase by default does not use a parallel scanning mechanism; scanning is
sequential. There are some JIRAs that try to implement scanning in parallel
across regions. HBASE-1935 is one such idea.
Projects like Phoenix use coprocessors to scan the regions in parallel, and
the results are returned to the client.
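
A back-of-the-envelope sketch of why a sequential scan over 8 regions tends to
take longer than a parallel one. The per-region times below are invented for
illustration, and the model ignores result merging, network contention and
server load, so treat it as a rough intuition rather than a measurement.

```python
# Hypothetical scan time in milliseconds for each of the 8 regions touched.
region_ms = [120, 95, 140, 110, 130, 100, 125, 115]


def sequential_time(times):
    """Regions are visited one after another, so the times add up."""
    return sum(times)


def parallel_time(times, workers):
    """Assign each region, in list order, to the currently least-loaded worker."""
    slots = [0] * workers
    for t in times:
        i = slots.index(min(slots))  # first worker to become free
        slots[i] += t
    return max(slots)  # the scan finishes when the busiest worker does
```

With one worker per region the parallel time is bounded by the slowest region,
while the sequential time is the sum over all regions.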

Regards
Ram


On Mon, Jul 15, 2013 at 7:20 AM, ch huang <ju...@gmail.com> wrote:

> Phoenix uses coprocessors internally.

Re: the scan will be executed parallel if not use coprocessor?

Posted by ch huang <ju...@gmail.com>.

Phoenix uses coprocessors internally.

On Sun, Jul 14, 2013 at 11:15 PM, Asaf Mesika <as...@gmail.com> wrote:

> To my knowledge, scan is not parallel, hence the speed of queries of
> Impala, Phoenix, and other similar projects.

Re: the scan will be executed parallel if not use coprocessor?

Posted by Asaf Mesika <as...@gmail.com>.

To my knowledge, scan is not parallel, hence the speed of queries of
Impala, Phoenix, and other similar projects.

On Saturday, July 13, 2013, ch huang wrote:

> Hi Ted, for example, I have a table with 10 regions. If my scan condition
> hits data in 8 of those regions, is there a difference between using the
> original (plain) scan and using a coprocessor? I know a coprocessor can run
> in parallel on each region, but why would the original scan be slower than
> the coprocessor?

Re: the scan will be executed parallel if not use coprocessor?

Posted by ch huang <ju...@gmail.com>.

Hi Ted, for example, I have a table with 10 regions. If my scan condition
hits data in 8 of those regions, is there a difference between using the
original (plain) scan and using a coprocessor? I know a coprocessor can run
in parallel on each region, but why would the original scan be slower than
the coprocessor?

On Sat, Jul 13, 2013 at 7:36 PM, Ted Yu <yu...@gmail.com> wrote:

> Can you clarify your question a little bit?
>
> That is, are you expecting a parallel scan within a region boundary or
> across region boundaries?
>
> Cheers

Re: the scan will be executed parallel if not use coprocessor?

Posted by Ted Yu <yu...@gmail.com>.

Can you clarify your question a little bit?

That is, are you expecting a parallel scan within a region boundary or across region boundaries?

Cheers

On Jul 13, 2013, at 1:43 AM, ch huang <ju...@gmail.com> wrote:

> As the title says.