Posted to user@hbase.apache.org by Kevin <ke...@gmail.com> on 2012/07/19 21:44:20 UTC

HTable.coprocessorExec call times out

Hi,

I'm using endpoint coprocessors to do intense scans in parallel on some
tables. I log the time it takes for each coprocessor to finish its job on
the region. Each coprocessor rarely takes longer than a few seconds, 5
seconds at most (there are only 5 regions on the tables right now). As my
cluster grows with data, the HTable.coprocessorExec call takes longer and
longer, but the coprocessors themselves still finish quickly (under 5
seconds). Eventually all my regionservers die: the coprocessorExec call
times out, ZooKeeper expires the session and closes the connection, and
that makes the regionserver abort.

In terms of code structure, the coprocessorExec call is made inside a
for-loop. The loop iterates over a List of objects, uses each object to
build filters for the endpoint, and calls coprocessorExec once per object
processed.
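
Concretely, the loop looks something like the sketch below (this is not my
actual code; IndexProtocol, MyObject, buildFilter(), and the table name are
placeholders, with IndexProtocol standing in for an interface that extends
CoprocessorProtocol; HBase 0.92 API):

    import java.io.IOException;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.coprocessor.Batch;

    // Sketch only: placeholder names throughout.
    void runEndpoints(Configuration conf, List<MyObject> objects)
            throws Throwable {         // coprocessorExec declares Throwable
        for (final MyObject obj : objects) {
            HTable table = new HTable(conf, "mytable");
            try {
                Map<byte[], List<Result>> results = table.coprocessorExec(
                    IndexProtocol.class,
                    HConstants.EMPTY_START_ROW,  // full row range, so the
                    HConstants.EMPTY_END_ROW,    // endpoint hits every region
                    new Batch.Call<IndexProtocol, List<Result>>() {
                        public List<Result> call(IndexProtocol instance)
                                throws IOException {
                            return instance.scanWithFilter(buildFilter(obj));
                        }
                    });
                // ... merge the per-region results for this object ...
            } finally {
                table.close();
            }
        }
    }

Each iteration fans the call out to every region in parallel and blocks
until all regions have returned.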

What could the bottleneck be? Does calling the coprocessor in a for-loop
like this load the regions down and not give them time to do GC? Is there
a way to ping a table and judge whether it will be ready for the endpoint
call?

Thanks,
-Kevin

Re: HTable.coprocessorExec call times out

Posted by Kevin <ke...@gmail.com>.
I checked the regionservers' .out logs and found that they were throwing
out-of-memory exceptions. I found a couple of fatal memory bugs in my code,
and now everything seems to be fine. I don't know how I forgot to look at
the .out log files.

Re: HTable.coprocessorExec call times out

Posted by Ted Yu <yu...@gmail.com>.
Can you check the following config param to see if its value is high
enough?

    <name>hbase.zookeeper.property.maxClientCnxns</name>
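
For example, in hbase-site.xml (the value below is only an illustration;
size it to the number of concurrent client connections you expect per
node):

    <property>
      <name>hbase.zookeeper.property.maxClientCnxns</name>
      <value>300</value>
    </property>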

Cheers

Re: HTable.coprocessorExec call times out

Posted by Kevin <ke...@gmail.com>.
In ZooKeeper I see that regionserver connections are timing out. I open an
HTable, call coprocessorExec, then close the HTable; all of this happens
inside the for-loop. I'm not sure why the regionservers are timing out. I
no longer think it's a client-side issue; it may be a server-side issue
with the regionservers. I don't know what else to do to debug it further.
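
One thing I may try (a sketch with placeholder names; I have not verified
that it helps): hoisting the HTable out of the loop, so the underlying
connection and its ZooKeeper session are reused across all the
coprocessorExec calls instead of being torn down and re-created on every
iteration.

    // Sketch: "mytable" and MyObject are placeholders.
    HTable table = new HTable(conf, "mytable");
    try {
        for (MyObject obj : objects) {
            // ... coprocessorExec call for obj, as before ...
        }
    } finally {
        table.close();   // close once, after the whole loop
    }

Here is what the ZooKeeper log shows: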

2012-07-19 07:01:26,738 [myid:] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for
client /10.10.23.124:46863 which had sessionid 0x138763251670a3d
2012-07-19 07:01:56,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325]
- Expiring session 0x138763251670a37, timeout of 40000ms exceeded
2012-07-19 07:01:56,000 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@476] - Processed session termination for
sessionid: 0x138763251670a37
2012-07-19 07:02:00,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325]
- Expiring session 0x138763251670a3d, timeout of 40000ms exceeded
2012-07-19 07:02:00,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325]
- Expiring session 0x138763251670a39, timeout of 40000ms exceeded
2012-07-19 07:02:00,001 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325]
- Expiring session 0x138763251670a3f, timeout of 40000ms exceeded
2012-07-19 07:02:00,001 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@476] - Processed session termination for
sessionid: 0x138763251670a3d
2012-07-19 07:02:00,001 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@476] - Processed session termination for
sessionid: 0x138763251670a39
2012-07-19 07:02:00,001 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@476] - Processed session termination for
sessionid: 0x138763251670a3f
2012-07-19 07:02:04,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325]
- Expiring session 0x138763251670a38, timeout of 40000ms exceeded
2012-07-19 07:02:04,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325]
- Expiring session 0x138763251670a36, timeout of 40000ms exceeded
2012-07-19 07:02:04,001 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325]
- Expiring session 0x138763251670a3c, timeout of 40000ms exceeded
2012-07-19 07:02:04,001 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325]
- Expiring session 0x138763251670a3e, timeout of 40000ms exceeded
2012-07-19 07:02:04,001 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@476] - Processed session termination for
sessionid: 0x138763251670a38
2012-07-19 07:02:04,001 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@476] - Processed session termination for
sessionid: 0x138763251670a36
2012-07-19 07:02:04,002 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@476] - Processed session termination for
sessionid: 0x138763251670a3c
2012-07-19 07:02:04,002 [myid:] - INFO  [ProcessThread(sid:0
cport:-1)::PrepRequestProcessor@476] - Processed session termination for
sessionid: 0x138763251670a3e


Re: HTable.coprocessorExec call times out

Posted by Kevin <ke...@gmail.com>.
That's it. That's the end of the regionserver log. In the master's web UI
the regionserver is in the table labeled "Dead Region Servers."

In the master's log there is:
2012-07-19 07:02:04,016 INFO
org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
ephemeral node deleted, processing expiration [slave2,60020,1342694622535]
2012-07-19 07:02:04,025 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current
region=-ROOT-,,0.70236052 is on server=slave2,60020,1342694622535 server
being checked: slave2,60020,1342694622535
2012-07-19 07:02:04,025 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current
region=.META.,,1.1028785192 is on server=slave2,60020,1342694622535 server
being checked: slave2,60020,1342694622535
2012-07-19 07:02:04,027 DEBUG org.apache.hadoop.hbase.master.ServerManager:
Added=slave2,60020,1342694622535 to dead servers, submitted shutdown
handler to be executed, root=true, meta=true
2012-07-19 07:02:04,027 INFO
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting
logs for slave2,60020,1342694622535

The timestamps differ from the ones above, but these lines are from the
same application, just from a different occurrence than the logs I sent
before. The messages would look the same, only with timestamps around
2012-07-19 09:49.

Re: HTable.coprocessorExec call times out

Posted by Ted Yu <yu...@gmail.com>.
Can you paste more of the region server log after 09:49:18,551 (until the
region server died)?

Thanks

Re: HTable.coprocessorExec call times out

Posted by Kevin <ke...@gmail.com>.
The log snippet just before the regionservers die looks like this:

2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: putting new
rowkey
2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: new rowkey put
2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: coproc time:
1227 ms
2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: closing scanner
2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: scanner closed
<after this log statement in the endpoint code is the return statement>

A coprocessorExec call may come 3-20 seconds after the previous one (it
depends on how long the last call took). But I know the endpoints are
finishing their work fast, because throughout the log every "coproc time:"
entry is under 5 seconds.

I am using CDH4b2, which uses HBase 0.92.1.

Re: HTable.coprocessorExec call times out

Posted by Ted Yu <yu...@gmail.com>.
Kevin:
Can you pastebin the log snippet from the region server just before it
died?

How frequent were your coprocessorExec() calls?
What HBase version were you using?

Thanks
