Posted to user@phoenix.apache.org by Abe Weinograd <ab...@flonet.com> on 2014/10/07 22:34:35 UTC

count on large table

I have a table with 1B rows.  I know this is very specific to my
environment, but just doing a SELECT COUNT(1) on the table never
finishes.

We have a 10 node cluster with the RS heap size at 26 GiB and skewed
towards the block cache.  In the RS logs, I see a lot of these:

2014-10-07 16:27:04,942 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":22770,"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","client":"10.10.0.10:44791","starttimems":1412713602172,"queuetimems":0,"class":"HRegionServer","responsesize":8,"method":"Scan"}

They stop eventually, but the query times out and the query tool
reports: org.apache.phoenix.exception.PhoenixIOException: 187541ms passed
since the last invocation, timeout is currently set to 60000

Any ideas on where I can start in order to figure this out?

Using Phoenix 4.1 on CDH 5.1 (HBase 0.98.1).

Thanks,
Abe

Re: count on large table

Posted by Abe Weinograd <ab...@flonet.com>.
IPv6 seems to be disabled across the cluster already.

Abe


RE: count on large table

Posted by Puneet Kumar Ojha <pu...@pubmatic.com>.
Please check IPv6; if it is enabled, disable it, synchronize ntpd, and restart. It might help.


Re: count on large table

Posted by Abe Weinograd <ab...@flonet.com>.
Hi Lars,

We have 10 Region Servers with two 1 TB drives on each.  The table is not
salted, but we pre-split regions when we bulk load so that we force equal
distribution of our data.  The data is relatively evenly distributed across
our Region Servers, with no one Region Server being the "long tail."

I don't have any metrics from Ganglia.  We are running CDH on EC2, for what
it is worth.  The CPUs spike to 100% and IO jumps pretty equally on the
Region Servers.  Attached is the RS log from one of them, taken while the
only thing running on the entire cluster is a COUNT in Phoenix.

Thanks again for your help,
Abe




Re: count on large table

Posted by lars hofhansl <la...@apache.org>.
Back-of-the-envelope math - assuming disks that can sustain 120 MB/s - suggests you'd need about 17 disks 100% busy in total to pull 120 GB off the disks in 60 s (120 GB / 60 s is roughly 2 GB/s, and 2 GB/s divided by 120 MB/s per disk is roughly 17 disks), i.e. at least 6 servers completely utilizing all of their disks. How many servers do you have? HBase/HDFS will likely not quite max out all disks, so your 10 machines are cutting it close.

Not concerned about the 250 regions - at least not for this.

Are all machines/disks/CPUs equally busy? Is the table salted?
Note that HBase's block cache stores data uncompressed, and hence your dataset likely does not fit into the aggregate block cache. Your query might run slightly better with the /*+ NO_CACHE */ hint.
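For example (a minimal sketch - the table name is just a placeholder):

    SELECT /*+ NO_CACHE */ COUNT(1) FROM MY_BIG_TABLE;

The hint keeps this one-off full scan from churning the block cache with blocks it will not read again.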

Now, from your 187541ms number, things look worse, though. Do you have OpenTSDB or Ganglia recording metrics for that cluster? If so, can you share some graphs of IO/CPU during the query? Any chance to attach a profiler to one of the busy region servers, or at least get us a stack trace?

Thanks. 

-- Lars


Re: count on large table

Posted by Abe Weinograd <ab...@flonet.com>.
Hi Lars,

Thanks for following up.

Table size - 120 GB from a du on HDFS.  We are using Snappy compression on the table.
Column family - we have one column family for all columns and are using the Phoenix default one.
Regions - right now we have a ton of regions (250) because we pre-split to help out bulk loads.  I haven't collapsed them yet, but in a DEV environment that is configured the same way, we have ~50 regions and see the same performance issues.  I am planning on squaring this away and trying again.
Resource utilization - really high CPU usage on the Region Servers, and a spike in IO too.

Based on your questions and what I know, the number of regions needs to be
reduced first, though I am not sure this is going to solve my issue.  The
data nodes in HDFS have three 1 TB disks, so I am not convinced that IO is
the bottleneck here.

Thanks,
Abe


Re: count on large table

Posted by lars hofhansl <la...@apache.org>.
Hi Abe,


this is interesting.


How big are your rows (i.e. how much data is in the table; you can tell with a du in HDFS)? And how many columns do you have? Any column families?

How many regions are in this table? (you can tell that through the HBase HMaster UI page)

When you execute the query, are all HBase region servers busy? Do you see IO, or just high CPU?


Client batching won't help with an aggregate (such as count) where not much data is transferred back to the client.


Thanks.

-- Lars




Re: Re: count on large table

Posted by "sunfl@certusnet.com.cn" <su...@certusnet.com.cn>.
The SQuirreL SQL client just connects to whichever server hosts Phoenix, so if your Phoenix and HBase are deployed in distributed mode, it should work to configure the Phoenix/HBase server side by modifying hbase-site.xml and then reconnecting to Phoenix through SQuirreL.
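For reference, reconnecting from SQuirreL typically just means pointing it at the same Phoenix JDBC URL again; a sketch, with a placeholder ZooKeeper quorum:

    jdbc:phoenix:zk-host1,zk-host2,zk-host3:2181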

Thanks,
Sun




 


Re: count on large table

Posted by Abe Weinograd <ab...@flonet.com>.
Good point.  I have to figure out how to do that in a SQL tool like
SQuirreL or Workbench.

Is there any obvious thing I can do to help tune this?  I know that's a
loaded question.  My client scanner batches are 1000 (I also tried 10000
with no luck).

Thanks,
Abe


Re: count on large table

Posted by "sunfl@certusnet.com.cn" <su...@certusnet.com.cn>.
Hi, Abe
Maybe setting the following property would help...
<property> 
    <name>phoenix.query.timeoutMs</name> 
    <value>3600000</value> 
</property>
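
The 60000 in the reported error also matches HBase's default client scanner timeout, so raising the corresponding HBase settings in the client's hbase-site.xml alongside the Phoenix property may be needed as well - a sketch with illustrative values, not a tested configuration:

<property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>3600000</value>
</property>
<property>
    <name>hbase.rpc.timeout</name>
    <value>3600000</value>
</property>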

Thanks,
Sun





