Posted to user@phoenix.apache.org by Abe Weinograd <ab...@flonet.com> on 2014/03/07 16:19:35 UTC

JDBC result iteration is slow

I am trying to pull around 100k rows through the JDBC driver, and I have
set hbase.client.scanner.caching to 10000 in the JDBC connection options.
Even with just 1,000 rows it is very slow (about 30 seconds to iterate over
the result set).

I assume this is a client-side issue, but I'm not sure what else I can tweak.
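
For reference, here is roughly how I am passing the setting through the
connection properties (a minimal sketch; the ZooKeeper host is a
placeholder, and I am assuming connection properties get forwarded to the
underlying HBase client config):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class ScannerCachingExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Ask the HBase client to buffer 10,000 rows per scanner RPC
        // (the value I am experimenting with).
        props.setProperty("hbase.client.scanner.caching", "10000");

        // "zkhost" is a placeholder for the ZooKeeper quorum; assumes the
        // Phoenix client jar is on the classpath and the driver is registered.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost", props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM MY_TABLE LIMIT 100000")) {
            int rows = 0;
            while (rs.next()) {
                rows++; // real code would read the columns here
            }
            System.out.println("Fetched " + rows + " rows");
        }
    }
}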

Thanks,
Abe

Re: JDBC result iteration is slow

Posted by James Taylor <ja...@apache.org>.
Hi Abe,
I don't think this is a Phoenix issue, as your query doesn't even have a
where clause. In HBase, all of this boils down to a regular Scan with a
PageFilter (as the explain plan shows). We parallelize it, but that just
amounts to running N scans, each over a discrete range of your row key.
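
Conceptually, each of those parallel scans is roughly equivalent to
something like the following against the plain HBase client API (just a
sketch; the table name and key range are placeholders for one chunk of the
parallel scan):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RawScanSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "MY_TABLE");
        try {
            // One chunk of the parallel scan: a discrete range of the row key.
            Scan scan = new Scan(Bytes.toBytes("startKey"), Bytes.toBytes("stopKey"));
            scan.setCaching(10000);                 // rows fetched per RPC
            scan.setFilter(new PageFilter(100000)); // server-side row limit, as in the plan
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    // Phoenix decodes these results into SQL columns as you iterate.
                }
            } finally {
                scanner.close();
            }
        } finally {
            table.close();
        }
    }
}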

I suspect there's a problem with one of your region servers. Check your
server logs for exceptions. You can also check the Phoenix logs to see how
long each parallel scan takes, which will tell you whether some regions are
slower than others.

Thanks,
James


On Mon, Mar 10, 2014 at 9:49 AM, Abe Weinograd <ab...@flonet.com> wrote:

> Hi James,
>
> Thanks.  Here is the info you requested.  Additionally, I assumed it was a
> client-side issue because a COUNT(1) over the whole table takes < 2 sec once
> the rows are in the cache (the first run of COUNT(1) is usually a bit
> longer).  My table has about 3.7M rows in it.  A SELECT * (listing most of
> the columns in the table) is what takes longer, with the CPU spiking on the
> client while the result set is being iterated over.
>
> Thanks for your help.
> Abe
>
> - HBase version: 0.94.15 (CDH 4.6)
> - Phoenix version: 2.2.3 (using tarball from
> - size of cluster: 1 Master 4 RS (each 15GB of RAM, 4 Cores)
> - setting for JVM max heap size: 4GiB
> - create table statement: attached
> - query: attached
> - explain plan:
> CLIENT PARALLEL 48-WAY FULL SCAN OVER MY_TABLE
>     SERVER FILTER BY PageFilter 100000
> CLIENT 100000 ROW LIMIT
>
> - number of rows in table: 3.7 million (just testing with this; it will
> be much larger over time)
>
>
> On Mon, Mar 10, 2014 at 11:44 AM, James Taylor <ja...@apache.org> wrote:
>
>> Hi Abe,
>> There's likely something wrong with your installation, as this is not
>> expected behavior. Please let us know the following:
>> - HBase version
>> - Phoenix version
>> - size of cluster
>> - setting for JVM max heap size
>> - create table statement
>> - query
>> - explain plan
>> - number of rows in table
>> Thanks,
>> James
>>
>>
>> On Monday, March 10, 2014, Abe Weinograd <ab...@flonet.com> wrote:
>>
>>> I spent a little more time with this and am still unable to tune the
>>> client properly.  I am testing using sqlline, Squirrel, and the JDBC
>>> driver directly in code.  I tried setting the hbase scanner caching in the
>>> JDBC connection, in addition to putting it in the hbase-site.xml in the
>>> same dir as the jar for sqlline.  I think my client is the bottleneck,
>>> partly because the CPU spikes and it takes ~30 secs to retrieve 1,000 rows.
>>>
>>> I expect to retrieve a lot more than this in our use cases.  Is this a
>>> tuning issue on my end, or is this expected behavior?
>>>
>>> Thanks,
>>> Abe
>>>
>>>
>>> On Fri, Mar 7, 2014 at 10:19 AM, Abe Weinograd <ab...@flonet.com> wrote:
>>>
>>>> I am trying to pull around 100k rows through the JDBC driver, and I have
>>>> set hbase.client.scanner.caching to 10000 in the JDBC connection options.
>>>> Even with just 1,000 rows it is very slow (about 30 seconds to iterate
>>>> over the result set).
>>>>
>>>> I assume this is a client-side issue, but I'm not sure what else I can
>>>> tweak.
>>>>
>>>> Thanks,
>>>> Abe
>>>>
>>>
>>>
>

Re: JDBC result iteration is slow

Posted by Abe Weinograd <ab...@flonet.com>.
Hi James,

Thanks.  Here is the info you requested.  Additionally, I assumed it was a
client-side issue because a COUNT(1) over the whole table takes < 2 sec once
the rows are in the cache (the first run of COUNT(1) is usually a bit
longer).  My table has about 3.7M rows in it.  A SELECT * (listing most of
the columns in the table) is what takes longer, with the CPU spiking on the
client while the result set is being iterated over.

Thanks for your help.
Abe

- HBase version: 0.94.15 (CDH 4.6)
- Phoenix version: 2.2.3 (using tarball from
- size of cluster: 1 Master 4 RS (each 15GB of RAM, 4 Cores)
- setting for JVM max heap size: 4GiB
- create table statement: attached
- query: attached
- explain plan:
CLIENT PARALLEL 48-WAY FULL SCAN OVER MY_TABLE
    SERVER FILTER BY PageFilter 100000
CLIENT 100000 ROW LIMIT

- number of rows in table: 3.7 million (just testing with this; it will be
much larger over time)
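
In case it helps, this is roughly how I am measuring the iteration I
described above (a minimal sketch; the ZooKeeper host and query are
placeholders for what I actually run):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IterationTiming {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost");
             Statement stmt = conn.createStatement()) {

            // Print the plan Phoenix chose for the query.
            try (ResultSet plan = stmt.executeQuery("EXPLAIN SELECT * FROM MY_TABLE LIMIT 100000")) {
                while (plan.next()) {
                    System.out.println(plan.getString(1));
                }
            }

            // Time the client-side iteration over the result set.
            long start = System.currentTimeMillis();
            int rows = 0;
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM MY_TABLE LIMIT 100000")) {
                while (rs.next()) {
                    rows++;
                }
            }
            System.out.println(rows + " rows in " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}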


On Mon, Mar 10, 2014 at 11:44 AM, James Taylor <ja...@apache.org> wrote:

> Hi Abe,
> There's likely something wrong with your installation, as this is not
> expected behavior. Please let us know the following:
> - HBase version
> - Phoenix version
> - size of cluster
> - setting for JVM max heap size
> - create table statement
> - query
> - explain plan
> - number of rows in table
> Thanks,
> James
>
>
> On Monday, March 10, 2014, Abe Weinograd <ab...@flonet.com> wrote:
>
>> I spent a little more time with this and am still unable to tune the
>> client properly.  I am testing using sqlline, Squirrel, and the JDBC
>> driver directly in code.  I tried setting the hbase scanner caching in the
>> JDBC connection, in addition to putting it in the hbase-site.xml in the
>> same dir as the jar for sqlline.  I think my client is the bottleneck,
>> partly because the CPU spikes and it takes ~30 secs to retrieve 1,000 rows.
>>
>> I expect to retrieve a lot more than this in our use cases.  Is this a
>> tuning issue on my end, or is this expected behavior?
>>
>> Thanks,
>> Abe
>>
>>
>> On Fri, Mar 7, 2014 at 10:19 AM, Abe Weinograd <ab...@flonet.com> wrote:
>>
>>> I am trying to pull around 100k rows through the JDBC driver, and I have
>>> set hbase.client.scanner.caching to 10000 in the JDBC connection options.
>>> Even with just 1,000 rows it is very slow (about 30 seconds to iterate
>>> over the result set).
>>>
>>> I assume this is a client-side issue, but I'm not sure what else I can tweak.
>>>
>>> Thanks,
>>> Abe
>>>
>>
>>

Re: JDBC result iteration is slow

Posted by James Taylor <ja...@apache.org>.
Hi Abe,
There's likely something wrong with your installation, as this is not
expected behavior. Please let us know the following:
- HBase version
- Phoenix version
- size of cluster
- setting for JVM max heap size
- create table statement
- query
- explain plan
- number of rows in table
Thanks,
James


On Monday, March 10, 2014, Abe Weinograd <ab...@flonet.com> wrote:

> I spent a little more time with this and am still unable to tune the
> client properly.  I am testing using sqlline, Squirrel, and the JDBC
> driver directly in code.  I tried setting the hbase scanner caching in the
> JDBC connection, in addition to putting it in the hbase-site.xml in the
> same dir as the jar for sqlline.  I think my client is the bottleneck,
> partly because the CPU spikes and it takes ~30 secs to retrieve 1,000 rows.
>
> I expect to retrieve a lot more than this in our use cases.  Is this a
> tuning issue on my end, or is this expected behavior?
>
> Thanks,
> Abe
>
>
> On Fri, Mar 7, 2014 at 10:19 AM, Abe Weinograd <abe@flonet.com> wrote:
>
>> I am trying to pull around 100k rows through the JDBC driver, and I have
>> set hbase.client.scanner.caching to 10000 in the JDBC connection options.
>> Even with just 1,000 rows it is very slow (about 30 seconds to iterate
>> over the result set).
>>
>> I assume this is a client-side issue, but I'm not sure what else I can tweak.
>>
>> Thanks,
>> Abe
>>
>
>

Re: JDBC result iteration is slow

Posted by Abe Weinograd <ab...@flonet.com>.
I spent a little more time with this and am still unable to tune the client
properly.  I am testing using sqlline, Squirrel, and the JDBC driver
directly in code.  I tried setting the hbase scanner caching in the JDBC
connection, in addition to putting it in the hbase-site.xml in the same dir
as the jar for sqlline.  I think my client is the bottleneck, partly because
the CPU spikes and it takes ~30 secs to retrieve 1,000 rows.
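
For completeness, this is the kind of entry I added to the hbase-site.xml
sitting next to the sqlline jar (just the one property; the value is what I
am experimenting with):

<property>
  <name>hbase.client.scanner.caching</name>
  <value>10000</value>
</property>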

I expect to retrieve a lot more than this in our use cases.  Is this a
tuning issue on my end, or is this expected behavior?

Thanks,
Abe


On Fri, Mar 7, 2014 at 10:19 AM, Abe Weinograd <ab...@flonet.com> wrote:

> I am trying to pull around 100k rows through the JDBC driver, and I have
> set hbase.client.scanner.caching to 10000 in the JDBC connection options.
> Even with just 1,000 rows it is very slow (about 30 seconds to iterate
> over the result set).
>
> I assume this is a client-side issue, but I'm not sure what else I can tweak.
>
> Thanks,
> Abe
>