Posted to user@hbase.apache.org by Mayuresh <ma...@gmail.com> on 2011/09/23 11:52:22 UTC

Fwd: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Hi,

I am using the AggregationClient to perform an aggregate calculation on a
10M-row HBase table that is spread across around 27 regions.
Currently all the regions are on a single region server.

Briefly, here is what the code does:
        Leases leases = new Leases(40 * 60 * 1000, 5 * 1000);
        leases.start();
        long startTime = System.currentTimeMillis();
        System.out.println("Start time: " + startTime);
        Configuration conf = new Configuration(true);
        AggregationClient aClient = new AggregationClient(conf);

        Scan scan = new Scan();
        scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
        final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();

        double avg = aClient.avg(TEST_TABLE, ci, scan);
        leases.close();
        long endTime = System.currentTimeMillis();
        System.out.println("End time: " + endTime);
        System.out.println("Avg is: " + avg);
        System.out.println("Time taken is: " + (endTime - startTime));

However, it fails after some time with the following log messages
in the region server log:
2011-09-23 14:21:25,649 DEBUG
org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
region Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
2011-09-23 14:21:25,650 DEBUG
org.apache.hadoop.hbase.io.hfile.HFileReaderV2: On close of file
15761056997240226 evicted 344 block(s)
2011-09-23 14:21:25,650 DEBUG
org.apache.hadoop.hbase.regionserver.Store: closed data
2011-09-23 14:21:25,650 INFO
org.apache.hadoop.hbase.regionserver.HRegion: Closed
Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
2011-09-23 14:21:25,650 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
Closed region Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server Responder, call execCoprocessor([B@6b6ab732,
getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@2b216ab6,
{"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"serv1-1315571539741","stopRow":"serv5-1315575834916","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1}))
from 137.72.240.177:47563: output error
2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 9 on 60020 caught:
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)

And the following in the zookeeper log:
2011-09-23 14:19:50,978 INFO
org.apache.zookeeper.server.NIOServerCnxn: Established session
0x1329577cad70004 with negotiated timeout 180000 for client
/137.72.240.177:45431
2011-09-23 14:21:11,028 WARN
org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException:
Unable to read additional data from client sessionid
0x1329577cad70004, likely client has closed socket
2011-09-23 14:21:11,029 INFO
org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
for client /137.72.240.177:45431 which had sessionid 0x1329577cad70004
2011-09-23 14:24:42,000 INFO
org.apache.zookeeper.server.ZooKeeperServer: Expiring session
0x1329577cad70004, timeout of 180000ms exceeded
2011-09-23 14:24:42,001 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination for sessionid: 0x1329577cad70004

Any ideas on how to resolve this?

Thanks in advance.

Regards,
Mayuresh

Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Posted by Stack <st...@duboce.net>.
responseTooSlow is a new feature in TRUNK.  Slow responses get logged
with detail, and I suppose your long-running CP qualifies.  We should
make it so there can be exceptions to that logging.  The
ClosedChannelException usually means the client has given up waiting;
it timed out.  Try setting "ipc.socket.timeout" to something large.

St.Ack


Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Posted by Himanshu Vashishtha <hv...@cs.ualberta.ca>.
Hey Mayuresh!
Can you try setting the Scan caching value to a larger number (a few
hundred or more)?
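
Something like this on the Scan you pass to the AggregationClient (just a
sketch; 500 is an arbitrary example value):

        Scan scan = new Scan();
        scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
        // Use a larger caching value, per the suggestion above.
        scan.setCaching(500);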

Cheers,
Himanshu



Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Posted by Mayuresh <ma...@gmail.com>.
Thanks everyone for guiding me on this!

Setting hbase.rpc.timeout did the trick!

On Mon, Sep 26, 2011 at 11:22 PM, Mayuresh
<ma...@gmail.com> wrote:
> Thanks Gary! That did the trick. It's now working just fine!
>
> Regards,
> Mayuresh
>
> On Mon, Sep 26, 2011 at 9:50 PM, Mayuresh <ma...@gmail.com> wrote:
>> Thanks Gary! I'll try that out.
>>
>> Regarding the bouncing mails, I got around it by setting the mail
>> format to plain text.
>>
>> Thanks again!
>>
>> On Mon, Sep 26, 2011 at 8:37 PM, Gary Helmling <gh...@gmail.com> wrote:
>>> Mayuresh,
>>>
>>> Apologies for replying off list.  The user mailing list seems to be bouncing
>>> all of my messages as spam.
>>>
>>> See my response below on the call timeout.
>>>
>>> Gary
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Gary Helmling <gh...@gmail.com>
>>> Date: Mon, Sep 26, 2011 at 7:57 AM
>>> Subject: Re: Coprocessor execution - HBaseServer: (responseTooSlow)
>>> java.nio.channels.ClosedChannelException
>>> To: user@hbase.apache.org
>>>
>>>
>>>
>>>> > However I still fail with the same error:
>>>> >
>>>> > 2011-09-26 15:50:28,857 WARN org.apache.hadoop.ipc.HBaseServer:
>>>> > (responseTooSlow):
>>>> > {"processingtimems":118696,"call":"execCoprocessor([B@5b0e6f59,
>>>> >
>>>> > getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
>>>> >
>>>> > {\"timeRange\":[0,9223372036854775807],\"batch\":-1,\"startRow\":\"\",\"stopRow\":\"\",\"totalColumns\":0,\"cacheBlocks\":true,\"families\":{\"data\":[]},\"maxVersions\":1,\"caching\":-1}))","client":"137.72.240.180:54843","starttimems":1317032310155,"queuetimems":9,"class":"HRegionServer","responsesize":0,"method":"execCoprocessor"}
>>>
>>> This shows that the request took almost 2 minutes to process (118696
>>> milliseconds).
>>>
>>> I believe this is the client-side RPC timeout (default 60 seconds) from
>>> HBASE-3154 kicking in.  Try adding this to the hbase-site.xml you are using
>>> on the clients:
>>>
>>> <property>
>>>   <name>hbase.rpc.timeout</name>
>>>   <value>300000</value>
>>> </property>
>>>
>>> This will raise the client-side RPC timeout to 5 minutes.  You can increase it
>>> further if you think it necessary, or even disable it completely by setting it
>>> to 0.  Note that if you disable it and the server-side handler somehow gets
>>> hung up (which would be a problem in itself), the client could keep waiting
>>> indefinitely for the response.
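>>>
>>> Equivalently (just a sketch, not something I have tested here), you can set
>>> it on the client Configuration before constructing the AggregationClient:
>>>
>>>     Configuration conf = HBaseConfiguration.create();
>>>     conf.setLong("hbase.rpc.timeout", 300000); // 5 minutes, same as the XML above
>>>     AggregationClient aClient = new AggregationClient(conf);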
>>>
>>>
>>
>>
>>
>> --
>> -Mayuresh
>>
>
>
>
> --
> -Mayuresh
>



-- 
-Mayuresh

Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Posted by Mayuresh <ma...@gmail.com>.
Any guidance, guys? I am not getting any clue as to how to solve this problem.




-- 
-Mayuresh

Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Posted by Mayuresh <ma...@gmail.com>.
I increased the lease period in hbase-site.xml to 50 minutes:

  <property>
    <name>hbase.regionserver.lease.period</name>
    <!--value>60000</value-->
    <value>3000000</value>
    <description>HRegion server lease period in milliseconds. Default is
    60 seconds. Clients must report in within this period else they are
    considered dead.</description>
  </property>

However, I still fail with the same error:

2011-09-26 15:50:28,857 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{"processingtimems":118696,"call":"execCoprocessor([B@5b0e6f59,
getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
{\"timeRange\":[0,9223372036854775807],\"batch\":-1,\"startRow\":\"\",\"stopRow\":\"\",\"totalColumns\":0,\"cacheBlocks\":true,\"families\":{\"data\":[]},\"maxVersions\":1,\"caching\":-1}))","client":"137.72.240.180:54843","starttimems":1317032310155,"queuetimems":9,"class":"HRegionServer","responsesize":0,"method":"execCoprocessor"}
2011-09-26 15:50:28,872 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server Responder, call execCoprocessor([B@5b0e6f59,
getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
{"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"","stopRow":"","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1}))
from 137.72.240.180:54843: output error
2011-09-26 15:50:28,873 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server handler 4 on 60020 caught:
java.nio.channels.ClosedChannelException
	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
	at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)

What do the responseTooSlow warning and the "output error" mean? Any clues to
help find what's causing this?

Thanks.





-- 
-Mayuresh

Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Posted by Mayuresh <ma...@gmail.com>.
Hi Ted,

Yes, I am aware that this isn't a good setup. I am working on
understanding the coprocessors before I move to a bigger setup.

I am talking about the presentation at
https://hbase.s3.amazonaws.com/hbase/HBase-CP-HUG10.pdf by Andrew
Purtell, slide 5:

<quote>
For long running jobs the client must periodically poll status to keep it
alive; jobs without interest will be cancelled
</quote>


On Mon, Sep 26, 2011 at 2:08 PM, Ted Yu <yu...@gmail.com> wrote:
> Looks like my response to user@ got bounced.
>
> Himanshu is continuing the work on HBASE-3607
>
>>> Currently all the regions are on a single region server.
> I don't think the above is a good setup.
>
>>> It was mentioned to poll the job status periodically to avoid timeouts.
> Can you tell us more about the presentation? Normally you can include the
> author in the cc list of this email chain.
>
> Cheers
>



-- 
-Mayuresh

Fwd: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Posted by Mayuresh <ma...@gmail.com>.
Yes, the computation takes longer than 60 seconds. Will setting the lease
timeout in hbase-site.xml help?

In one of the coprocessor presentations it was mentioned that the client
should poll the job status periodically to avoid timeouts. Any idea how to do that?


Re: Coprocessor execution - HBaseServer: (responseTooSlow) java.nio.channels.ClosedChannelException

Posted by Stack <st...@duboce.net>.
Is it doing a long GC?  Is the aggregation taking longer than the
client's 60-second timeout? (You are setting a lease below, but I don't
think that will have an effect since that is server-side.)

St.Ack

