Posted to dev@hbase.apache.org by 冯宏华 <fe...@xiaomi.com> on 2014/03/17 12:24:13 UTC

Reply: setMaxResultSize method in Scan

There is no such method on Scan in 0.94.x.

To cap the result size for a scan in 0.94.x, set the "hbase.client.scanner.max.result.size" configuration property. Its default is Long.MAX_VALUE (no limit).
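
Note that this property bounds how much data a scanner fetches per call to the server; it is not a row-count limit in the SQL sense. The toy model below sketches that behaviour. It uses no HBase classes, the names SizeCappedBatches and split are made up for illustration, and it assumes that a fetch ends as soon as the accumulated row bytes reach the cap:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not HBase code): a byte cap splits a scan into fetches but drops no rows. */
final class SizeCappedBatches {
    private SizeCappedBatches() {}

    /**
     * Groups row sizes (in bytes) into fetches. A fetch is closed once the
     * bytes accumulated in it reach maxResultSize; the row that crosses the
     * cap is still included in that fetch.
     */
    static List<List<Integer>> split(int[] rowSizes, long maxResultSize) {
        if (maxResultSize <= 0) {
            throw new IllegalArgumentException("maxResultSize must be > 0");
        }
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long bytes = 0;
        for (int size : rowSizes) {
            current.add(size);
            bytes += size;
            if (bytes >= maxResultSize) { // cap reached: close this fetch
                batches.add(current);
                current = new ArrayList<>();
                bytes = 0;
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // final, partially filled fetch
        }
        return batches;
    }
}
```

In this model, five 40-byte rows with a 100-byte cap arrive as two fetches of three and two rows, so all five rows are still returned. With the cap at Long.MAX_VALUE they arrive as a single fetch.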
________________________________________
From: Weiping Qu [qu@informatik.uni-kl.de]
Sent: March 17, 2014, 18:50
To: dev@hbase.apache.org
Subject: setMaxResultSize method in Scan

Hello,

I could not find the method setMaxResultSize(long m)
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
in my Scan class (version 0.94.13).
Can anyone help me? Thanks

Weiping

Re: Reply: setMaxResultSize method in Scan

Posted by Weiping Qu <qu...@informatik.uni-kl.de>.
Thank you for the reply.
I will check that.

Cheers
> This method was introduced by HBASE-2214, which is in 0.96+.
>
> Can you upgrade to 0.96 or 0.98?
>
> Cheers

-- 
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331

Re: Reply: setMaxResultSize method in Scan

Posted by Ted Yu <yu...@gmail.com>.
This method was introduced by HBASE-2214, which is in 0.96+.

Can you upgrade to 0.96 or 0.98?

Cheers


On Mon, Mar 17, 2014 at 4:48 AM, Weiping Qu <qu...@informatik.uni-kl.de> wrote:

> Thanks.
>
> I had assumed that setMaxResultSize is equivalent to the SQL LIMIT
> clause, which is specified each time a SQL statement is
> executed.
> With "hbase.client.scanner.max.result.size", however, the limit on the
> number of rows returned can only be applied to all scanner instances at once.
> I am wondering why setMaxResultSize has been removed.

Re: Reply: setMaxResultSize method in Scan

Posted by Weiping Qu <qu...@informatik.uni-kl.de>.
Hi James,

Thank you for reminding me of that.
Cheers,
Weiping
> Hi Weiping,
> Take a look at Apache Phoenix (http://phoenix.incubator.apache.org/). It's
> a SQL layer on top of HBase and has support for LIMIT and a query planner
> and optimizer.
> Thanks,
> James

Re: Reply: setMaxResultSize method in Scan

Posted by James Taylor <jt...@salesforce.com>.
Hi Weiping,
Take a look at Apache Phoenix (http://phoenix.incubator.apache.org/). It's
a SQL layer on top of HBase and has support for LIMIT and a query planner
and optimizer.
Thanks,
James



RE: Reply: setMaxResultSize method in Scan

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Sure.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Ted Yu [yuzhihong@gmail.com]
Sent: Monday, March 17, 2014 1:34 PM
To: dev@hbase.apache.org
Subject: Re: Reply: setMaxResultSize method in Scan

bq. Scan.setRowCaching()

I think you meant Scan.setCaching()



Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

Re: Reply: setMaxResultSize method in Scan

Posted by Ted Yu <yu...@gmail.com>.
bq. Scan.setRowCaching()

I think you meant Scan.setCaching()


On Mon, Mar 17, 2014 at 1:28 PM, Vladimir Rodionov <vr...@carrieriq.com> wrote:

>
> The HBase RegionServer scans in batches: the client requests the next batch
> from the server, and the server reads and merges the data from cache/disk.
> You can control the batch size by setting:
>
> Scan.setRowCaching(number of rows to send in one RPC request)
>
> Technically speaking, this allows you to control LIMIT from the client
> side. Your overhead will never be larger than the limit set by
> setRowCaching.

RE: Reply: setMaxResultSize method in Scan

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
The HBase RegionServer scans in batches: the client requests the next batch from the server,
and the server reads and merges the data from cache/disk. You can control the batch size by setting:

Scan.setRowCaching(number of rows to send in one RPC request)

Technically speaking, this allows you to control LIMIT from the client side. Your overhead will never be larger than the limit set by setRowCaching.
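
To make the overhead claim concrete, here is a small back-of-the-envelope model in plain Java, with no HBase classes; ScanFetchModel and its methods are hypothetical names. It assumes that the client stops reading after `limit` rows, that each RPC returns exactly `caching` rows, and that enough rows match the key range. As a reply in this thread points out, the real method is Scan.setCaching.

```java
/** Toy model of how many rows cross the wire when a client stops after `limit` rows. */
final class ScanFetchModel {
    private ScanFetchModel() {}

    /** Number of fetch RPCs needed to obtain `limit` rows, `caching` rows at a time. */
    static long rpcCalls(long limit, int caching) {
        if (limit < 0 || caching <= 0) {
            throw new IllegalArgumentException("limit must be >= 0 and caching > 0");
        }
        return (limit + caching - 1) / caching; // ceiling division
    }

    /** Rows shipped to the client, assuming every RPC returns a full batch. */
    static long rowsTransferred(long limit, int caching) {
        return rpcCalls(limit, caching) * caching;
    }

    /** Rows shipped but never used by the client. */
    static long overfetch(long limit, int caching) {
        return rowsTransferred(limit, caching) - limit;
    }
}
```

With limit = 250 and caching = 100, the model gives 3 RPCs and 50 surplus rows. Under these assumptions the surplus is always smaller than the caching value.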

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Weiping Qu [qu@informatik.uni-kl.de]
Sent: Monday, March 17, 2014 12:19 PM
To: dev@hbase.apache.org
Subject: Re: Reply: setMaxResultSize method in Scan

I am running a multi-threaded (100 threads) scan test against HBase.
If a request with a given key range matches a large number of
rows in HBase, the request has to wait for the whole scan to complete,
and throughput is very low.
For testing purposes, I'd like to use a LIMIT to reduce the time spent scanning
and transferring results back from HBase, and so increase throughput.
Do you think "hbase.client.scanner.max.result.size" or
setMaxResultSize (in bytes) could make HBase stop the scan at the LIMIT
before it has scanned all matching rows?

Since you mentioned that HBase has no query optimizer, I assume
that the region servers will not stop scanning the rows in this key range
until they have all the results, and will only then cap the results sent to
the client at the max size.
If so, there is not much I can do to compare the throughput with that of
relational databases like MySQL.

Thanks,
Cheers.

Re: Reply: setMaxResultSize method in Scan

Posted by Weiping Qu <qu...@informatik.uni-kl.de>.
I am running a multi-threaded (100 threads) scan test against HBase.
If a request with a given key range matches a large number of
rows in HBase, the request has to wait for the whole scan to complete,
and throughput is very low.
For testing purposes, I'd like to use a LIMIT to reduce the time spent scanning
and transferring results back from HBase, and so increase throughput.
Do you think "hbase.client.scanner.max.result.size" or
setMaxResultSize (in bytes) could make HBase stop the scan at the LIMIT
before it has scanned all matching rows?

Since you mentioned that HBase has no query optimizer, I assume
that the region servers will not stop scanning the rows in this key range
until they have all the results, and will only then cap the results sent to
the client at the max size.
If so, there is not much I can do to compare the throughput with that of
relational databases like MySQL.

Thanks,
Cheers.
> A LIMIT clause in a SQL SELECT statement makes sense because it allows the query optimizer to plan accordingly.
> It does not make sense in HBase, since there is no query planner or optimizer involved in
> scanning an HBase table. You can easily mimic this functionality (the limit, I mean) on the client side.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com


-- 
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331

RE: Reply: setMaxResultSize method in Scan

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
The LIMIT clause in a SQL SELECT statement makes sense because it allows the query optimizer to plan accordingly.
It does not make sense in HBase, since there is no query planner or optimization involved when
scanning an HBase table. You can easily mimic this functionality (the limit, I mean) on the client side.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com
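
A minimal sketch of the client-side limit suggested above, not code from this thread. It assumes the rows arrive through some Iterable, which is what HBase's ResultScanner is; the class name ClientSideLimit and the method takeFirst are hypothetical names chosen for illustration. The sketch uses only the Java standard library, with a plain list standing in for the scanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper: a client-side LIMIT over any scanner-like source. */
final class ClientSideLimit {

    private ClientSideLimit() {}

    /**
     * Returns at most `limit` items from `source` and stops iterating
     * as soon as the limit is reached. With HBase, `source` would be the
     * ResultScanner obtained from the table (assumption: not shown here).
     */
    static <T> List<T> takeFirst(Iterable<T> source, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        List<T> out = new ArrayList<>();
        Iterator<T> it = source.iterator();
        // Check the limit first so no extra item is requested from the source.
        while (out.size() < limit && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // A plain list stands in for the scanner in this sketch.
        List<String> rows = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        System.out.println(takeFirst(rows, 3)); // prints [row1, row2, row3]
    }
}
```

With a real ResultScanner, the scanner should be closed once the limit is reached so that the server-side scanner is released. The client may also prefetch rows according to the scanner caching setting, so keeping the caching value close to the limit is one way to avoid transferring rows that are never read.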


Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

Re: Reply: setMaxResultSize method in Scan

Posted by Weiping Qu <qu...@informatik.uni-kl.de>.
Thanks.

I had assumed that setMaxResultSize was the equivalent of the SQL LIMIT
clause, which is specified each time a SQL statement is executed.
With "hbase.client.scanner.max.result.size", however, the limit on the
number of rows returned can only apply to all scanner instances at once.
I am wondering why setMaxResultSize has been removed.

> No such method for Scan in 0.94.x.
>
> If you want to set the max result size for a scan, you can do so by setting the "hbase.client.scanner.max.result.size" configuration property, which defaults to Long.MAX_VALUE (no limit).
> ________________________________________
> From: Weiping Qu [qu@informatik.uni-kl.de]
> Sent: March 17, 2014 18:50
> To: dev@hbase.apache.org
> Subject: setMaxResultSize method in Scan
>
> Hello,
>
> I could not find the method setMaxResultSize(long m)
> (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
> in my Scan class (version 0.94.13).
> Can anyone help me? Thanks
>
> Weiping


-- 
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331