Posted to user@hadoop.apache.org by Vimal Jain <vk...@gmail.com> on 2013/10/22 09:02:16 UTC

High Full GC count for Region server

Hi,
I am running HBase in pseudo-distributed mode (Hadoop version 1.1.2, HBase
version 0.94.7).
I am getting a few exceptions in both the Hadoop (namenode, datanode) logs and
the HBase (region server) log.
When I searched for these exceptions on Google, I concluded that the problem is
mainly due to a large number of full GCs in the region server process.

I used jstat and found that there were a total of 950 full GCs over a span of 4
days for the region server process. Is this OK?
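
For reference, a command along these lines reports that count (<RS_PID> is a
placeholder for the region server's JVM pid):

    jstat -gcutil <RS_PID> 5000
    # prints a sample every 5 s; the FGC column is the cumulative full-GC count,
    # FGCT the total time spent in full GCs (seconds)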

I am totally confused by the number of exceptions I am getting.
I also get the below exceptions intermittently.


Region server:-

2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{"processingtimems":15312,"call":"next(-6681408251916104762, 1000), rpc
version=1, client version=29, methodsFingerPrint=-1368823753","client":"
192.168.20.31:48270
","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
(operationTooSlow): {"processingtimems":14759,"client":"192.168.20.31:48247
","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}

2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
could only be replicated to 0 nodes, instead of 1
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)

Name node :-
java.io.IOException: File
/hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
could only be replicated to 0 nodes, instead of 1

java.io.IOException: Got blockReceived message from unregistered or dead
node blk_-2949905629769882833_52274

Data node :-
480000 millis timeout while waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010remote=/
192.168.20.30:36188]

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
192.168.20.30:50010,
storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 39309 bytes


-- 
Thanks and Regards,
Vimal Jain

Re: High Full GC count for Region server

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Can you stop HBase and run fsck on Hadoop to see how healthy your HDFS is?
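
Something along these lines (with HBase stopped, run as the HDFS user) will
report corrupt, missing, or under-replicated blocks; the extra flags are optional:

    hadoop fsck / -files -blocks -locations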


2013/10/24 Vimal Jain <vk...@gmail.com>

> Hi Ted/Jean,
> Can you please help here ?
>
>
> On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vk...@gmail.com> wrote:
>
> > Hi Ted,
> > Yes i checked namenode and datanode logs and i found below exceptions in
> > both the logs:-
> >
> > Name node :-
> > java.io.IOException: File
> >
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > could only be replicated to 0 nodes, instead of 1
> >
> > java.io.IOException: Got blockReceived message from unregistered or dead
> > node blk_-2949905629769882833_52274
> >
> > Data node :-
> > 480000 millis timeout while waiting for channel to be ready for write. ch
> > : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> >  remote=/192.168.20.30:36188]
> >
> > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(192.168.20.30:50010,
> > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> infoPort=50075,
> > ipcPort=50020):DataXceiver
> >
> > java.io.EOFException: while trying to read 39309 bytes
> >
> >
> > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> bq. java.io.IOException: File /hbase/event_data/
> >> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> >> could
> >> only be replicated to 0 nodes, instead of 1
> >>
> >> Have you checked Namenode / Datanode logs ?
> >> Looks like hdfs was not stable.
> >>
> >>
> >> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com> wrote:
> >>
> >> > HI Jean,
> >> > Thanks for your reply.
> >> > I have total 8 GB memory and distribution is as follows:-
> >> >
> >> > Region server  - 2 GB
> >> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
> >> > OS - 1 GB
> >> >
> >> > Please let me know if you need more information.
> >> >
> >> >
> >> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org> wrote:
> >> >
> >> > > Hi Vimal,
> >> > >
> >> > > What are your settings? Memory of the host, and memory allocated for
> >> the
> >> > > different HBase services?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > JM
> >> > >
> >> > >
> >> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
> >> > >
> >> > > > Hi,
> >> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop
> version -
> >> > > 1.1.2
> >> > > > , Hbase version - 0.94.7 )
> >> > > > I am getting few exceptions in both hadoop ( namenode , datanode)
> >> logs
> >> > > and
> >> > > > hbase(region server).
> >> > > > When i search for these exceptions on google , i concluded  that
> >> > problem
> >> > > is
> >> > > > mainly due to large number of full GC in region server process.
> >> > > >
> >> > > > I used jstat and found that there are total of 950 full GCs in
> span
> >> of
> >> > 4
> >> > > > days for region server process.Is this ok?
> >> > > >
> >> > > > I am totally confused by number of exceptions i am getting.
> >> > > > Also i get below exceptions intermittently.
> >> > > >
> >> > > >
> >> > > > Region server:-
> >> > > >
> >> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> >> > > > (responseTooSlow):
> >> > > > {"processingtimems":15312,"call":"next(-6681408251916104762,
> 1000),
> >> rpc
> >> > > > version=1, client version=29,
> >> > methodsFingerPrint=-1368823753","client":"
> >> > > > 192.168.20.31:48270
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> >> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> >> > > > (operationTooSlow): {"processingtimems":14759,"client":"
> >> > > > 192.168.20.31:48247
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
> >> > > >
> >> > > > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient:
> >> > > DataStreamer
> >> > > > Exception: org.apache.hadoop.ipc.RemoteException:
> >> java.io.IOException:
> >> > > File
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> >> > > > could only be replicated to 0 nodes, instead of 1
> >> > > >     at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
> >> > > >
> >> > > > Name node :-
> >> > > > java.io.IOException: File
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> >> > > > could only be replicated to 0 nodes, instead of 1
> >> > > >
> >> > > > java.io.IOException: Got blockReceived message from unregistered
> or
> >> > dead
> >> > > > node blk_-2949905629769882833_52274
> >> > > >
> >> > > > Data node :-
> >> > > > 480000 millis timeout while waiting for channel to be ready for
> >> write.
> >> > > ch :
> >> > > > java.nio.channels.SocketChannel[connected local=/
> >> 192.168.20.30:50010
> >> > > > remote=/
> >> > > > 192.168.20.30:36188]
> >> > > >
> >> > > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> > > > DatanodeRegistration(
> >> > > > 192.168.20.30:50010,
> >> > > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> >> > > infoPort=50075,
> >> > > > ipcPort=50020):DataXceiver
> >> > > > java.io.EOFException: while trying to read 39309 bytes
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Thanks and Regards,
> >> > > > Vimal Jain
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and Regards,
> >> > Vimal Jain
> >> >
> >>
> >
> >
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
> >
>
>
>
> --
> Thanks and Regards,
> Vimal Jain
>

Re: High Full GC count for Region server

Posted by Adrien Mogenet <ad...@gmail.com>.
The "responseTooSlow" message is triggered whenever a bunch of operations
is taking more than a configured amount of time. In your case, processing
15827 elements can lead into long response time, so no worry about this.

However, your SocketTimeoutException might be due to long GC pauses. I
guess it might also be due to network failures or RS contention (too many
requests on this RS, no more IPC slot...)
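
For reference, the threshold behind that warning is configurable in
hbase-site.xml; a minimal sketch, assuming the 0.94-era property name applies to
your build:

    <property>
      <name>hbase.ipc.warn.response.time</name>
      <!-- milliseconds; calls slower than this are logged as (responseTooSlow) -->
      <value>10000</value>
    </property>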


On Thu, Oct 31, 2013 at 9:52 AM, Vimal Jain <vk...@gmail.com> wrote:

> Hi,
> Can anyone please reply to the above query ?
>
>
> On Tue, Oct 29, 2013 at 10:48 AM, Vimal Jain <vk...@gmail.com> wrote:
>
> > Hi,
> > Here is my analysis of this problem.Please correct me if i wrong
> somewhere.
> > I have assigned 2 GB to region server process.I think its sufficient
> > enough to handle around 9GB of data.
> > I have not changed much of the parameters , especially memstore size
> which
> > is 128 GB for 0.94.7 by default.
> > Also as per my understanding , each col-family has one memstore
> associated
> > with it.So my memstores are taking 128*3 = 384 MB ( I have 3 column
> > families).
> > So i think i should reduce memstore size to something like 32/64 MB so
> > that data is flushed to disk at higher frequency then current
> > frequency.This will save some memory.
> > Is there any other parameter other then memstore size which affects
> memory
> > utilization.
> >
> > Also I am getting below exceptions in data node log and region server log
> > every day.Is it due to long GC pauses ?
> >
> > Data node logs :-
> >
> > hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 WARN
> > org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> > 192.168.20.30:5001
> > 0, storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> > infoPort=50075, ipcPort=50020):Got exception while serving
> > blk_-560908881317618221_58058
> >  to /192.168.20.30:
> > hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000
> > millis timeout while waiting for channel to be ready for write. ch :
> > java.nio
> > .channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/
> > 192.168.20.30:39413]
> > hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> > 192.168.20.30:500
> >
> > 10, storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> > infoPort=50075, ipcPort=50020):DataXceiver
> > hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000
> > millis timeout while waiting for channel to be ready for write. ch :
> > java.nio
> > .channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/
> > 192.168.20.30:39413]
> >
> >
> > Region server logs :-
> >
> > hbase-hadoop-regionserver-woody.log:2013-10-29 01:01:16,475 WARN
> > org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
> > {"processingtimems":15827,"call
> > ":"multi(org.apache.hadoop.hbase.client.MultiAction@2918e464), rpc
> > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"192.168.20.
> >
> >
> 31:50619","starttimems":1382988660645,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > hbase-hadoop-regionserver-woody.log:2013-10-29 06:01:27,459 WARN
> > org.apache.hadoop.ipc.HBaseServer: (operationTooSlow):
> > {"processingtimems":14745,"cli
> > ent":"192.168.20.31:50908
> >
> ","timeRange":[0,9223372036854775807],"starttimems":1383006672707,"responsesize":55,"class":"HRegionServer","table":"event_da
> >
> >
> ta","cacheBlocks":true,"families":{"oinfo":["clubStatus"]},"row":"1752869","queuetimems":1,"method":"get","totalColumns":1,"maxVersions":1}
> >
> >
> >
> >
> >
> > On Mon, Oct 28, 2013 at 11:55 PM, Asaf Mesika <asaf.mesika@gmail.com
> >wrote:
> >
> >> Check through HDFS UI that your cluster haven't reached maximum disk
> >> capacity
> >>
> >> On Thursday, October 24, 2013, Vimal Jain wrote:
> >>
> >> > Hi Ted/Jean,
> >> > Can you please help here ?
> >> >
> >> >
> >> > On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vkjk89@gmail.com
> >> <javascript:;>>
> >> > wrote:
> >> >
> >> > > Hi Ted,
> >> > > Yes i checked namenode and datanode logs and i found below
> exceptions
> >> in
> >> > > both the logs:-
> >> > >
> >> > > Name node :-
> >> > > java.io.IOException: File
> >> > >
> >> >
> >>
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> >> > > could only be replicated to 0 nodes, instead of 1
> >> > >
> >> > > java.io.IOException: Got blockReceived message from unregistered or
> >> dead
> >> > > node blk_-2949905629769882833_52274
> >> > >
> >> > > Data node :-
> >> > > 480000 millis timeout while waiting for channel to be ready for
> >> write. ch
> >> > > : java.nio.channels.SocketChannel[connected local=/
> >> 192.168.20.30:50010
> >> > >  remote=/192.168.20.30:36188]
> >> > >
> >> > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> > > DatanodeRegistration(192.168.20.30:50010,
> >> > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> >> > infoPort=50075,
> >> > > ipcPort=50020):DataXceiver
> >> > >
> >> > > java.io.EOFException: while trying to read 39309 bytes
> >> > >
> >> > >
> >> > > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com>
> wrote:
> >> > >
> >> > >> bq. java.io.IOException: File /hbase/event_data/
> >> > >>
> >> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> >> > >> could
> >> > >> only be replicated to 0 nodes, instead of 1
> >> > >>
> >> > >> Have you checked Namenode / Datanode logs ?
> >> > >> Looks like hdfs was not stable.
> >> > >>
> >> > >>
> >> > >> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com>
> >> wrote:
> >> > >>
> >> > >> > HI Jean,
> >> > >> > Thanks for your reply.
> >> > >> > I have total 8 GB memory and distribution is as follows:-
> >> > >> >
> >> > >> > Region server  - 2 GB
> >> > >> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
> >> > >> > OS - 1 GB
> >> > >> >
> >> > >> > Please let me know if you need more information.
> >> > >> >
> >> > >> >
> >> > >> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> >> > >> > jean-marc@spaggiari.org> wrote:
> >> > >> >
> >> > >> > > Hi Vimal,
> >> > >> > >
> >> > >> > > What are your settings? Memory of the host, and memory
> allocated
> >> for
> >> > >> the
> >> > >> > > different HBase services?
> >> > >> > >
> >> > >> > > Thanks,
> >> > >> > >
> >> > >> > > JM
> >> > >> > >
> >> > >> > >
> >> > >> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
> >> > >> > >
> >> > >> > > > Hi,
> >> > >> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop
> >> > version -
> >> > >> > > 1.1.2
> >> > >> > > > , Hbase version - 0.94.7 )
> >> > >> > > > I am getting few exceptions in both hadoop ( namenode ,
> >> datanode)
> >> > >> logs
> >> > >> > > and
> >> > >> > > > hbase(region server).
> >> > >> > > > When i search for these exceptions on google , i concluded
> >>  that
> >> > >> > problem
> >> > >> > > is
> >> > >> > > > mainly due to large number of full GC in region server
> process.
> >> > >> > > >
> >> > >> > > > I used jstat and found that there are total of 950 full GCs
> in
> >> > span
> >> > >> of
> >> > >> > 4
> >> > >> > > > days for region server process.Is this ok?
> >> > >> > > >
> >> > >> > > > I am totally confused by number of exceptions i am getting.
> >> > >> > > > Also i get below exceptions intermittently.
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > Region server:-
> >> > >> > > >
> >> > >> > > > 2013-10-22 12:00:26,627 WARN
> org.apache.hadoop.ipc.HBaseServer:
> >> > >> > > > (responseTooSlow):
> >> > >> > > > {"processingtimems":15312,"call":"next(-6681408251916104762,
> >> > 1000),
> >> > >> rpc
> >> > >> > > > version=1, client version=29,
> >> > >> > methodsFingerPrint=-1368823753","client":"
> >> > >> > > > 192.168.20.31:48270
> >> > >> > > >
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> >> > >> > > > 2013-10-22 12:06:17,606 WARN
> org.apache.hadoop.ipc.HBaseServer:
> >> > >> > > > (operationTooSlow): {"processingtimems":14759,"client":"
> >> > >> > > > 192.168.20.31:48247
> >> > >> > > >
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"gin
> >>
> >
> >
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
> >
>
>
>
> --
> Thanks and Regards,
> Vimal Jain
>



-- 
Adrien Mogenet
http://www.borntosegfault.com

Re: High Full GC count for Region server

Posted by Vimal Jain <vk...@gmail.com>.
Hi,
Can anyone please reply to the above query?


On Tue, Oct 29, 2013 at 10:48 AM, Vimal Jain <vk...@gmail.com> wrote:

> Hi,
> Here is my analysis of this problem.Please correct me if i wrong somewhere.
> I have assigned 2 GB to region server process.I think its sufficient
> enough to handle around 9GB of data.
> I have not changed much of the parameters , especially memstore size which
> is 128 GB for 0.94.7 by default.
> Also as per my understanding , each col-family has one memstore associated
> with it.So my memstores are taking 128*3 = 384 MB ( I have 3 column
> families).
> So i think i should reduce memstore size to something like 32/64 MB so
> that data is flushed to disk at higher frequency then current
> frequency.This will save some memory.
> Is there any other parameter other then memstore size which affects memory
> utilization.
>
> Also I am getting below exceptions in data node log and region server log
> every day.Is it due to long GC pauses ?
>
> Data node logs :-
>
> hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 192.168.20.30:5001
> 0, storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> infoPort=50075, ipcPort=50020):Got exception while serving
> blk_-560908881317618221_58058
>  to /192.168.20.30:
> hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000
> millis timeout while waiting for channel to be ready for write. ch :
> java.nio
> .channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/
> 192.168.20.30:39413]
> hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 192.168.20.30:500
>
> 10, storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> infoPort=50075, ipcPort=50020):DataXceiver
> hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000
> millis timeout while waiting for channel to be ready for write. ch :
> java.nio
> .channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/
> 192.168.20.30:39413]
>
>
> Region server logs :-
>
> hbase-hadoop-regionserver-woody.log:2013-10-29 01:01:16,475 WARN
> org.apache.hadoop.ipc.HBaseServer: (responseTooSlow):
> {"processingtimems":15827,"call
> ":"multi(org.apache.hadoop.hbase.client.MultiAction@2918e464), rpc
> version=1, client version=29,
> methodsFingerPrint=-1368823753","client":"192.168.20.
>
> 31:50619","starttimems":1382988660645,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> hbase-hadoop-regionserver-woody.log:2013-10-29 06:01:27,459 WARN
> org.apache.hadoop.ipc.HBaseServer: (operationTooSlow):
> {"processingtimems":14745,"cli
> ent":"192.168.20.31:50908
> ","timeRange":[0,9223372036854775807],"starttimems":1383006672707,"responsesize":55,"class":"HRegionServer","table":"event_da
>
> ta","cacheBlocks":true,"families":{"oinfo":["clubStatus"]},"row":"1752869","queuetimems":1,"method":"get","totalColumns":1,"maxVersions":1}
>
>
>
>
>
> On Mon, Oct 28, 2013 at 11:55 PM, Asaf Mesika <as...@gmail.com>wrote:
>
>> Check through HDFS UI that your cluster haven't reached maximum disk
>> capacity
>>
>> On Thursday, October 24, 2013, Vimal Jain wrote:
>>
>> > Hi Ted/Jean,
>> > Can you please help here ?
>> >
>> >
>> > On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vkjk89@gmail.com
>> <javascript:;>>
>> > wrote:
>> >
>> > > Hi Ted,
>> > > Yes i checked namenode and datanode logs and i found below exceptions
>> in
>> > > both the logs:-
>> > >
>> > > Name node :-
>> > > java.io.IOException: File
>> > >
>> >
>> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
>> > > could only be replicated to 0 nodes, instead of 1
>> > >
>> > > java.io.IOException: Got blockReceived message from unregistered or
>> dead
>> > > node blk_-2949905629769882833_52274
>> > >
>> > > Data node :-
>> > > 480000 millis timeout while waiting for channel to be ready for
>> write. ch
>> > > : java.nio.channels.SocketChannel[connected local=/
>> 192.168.20.30:50010
>> > >  remote=/192.168.20.30:36188]
>> > >
>> > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> > > DatanodeRegistration(192.168.20.30:50010,
>> > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
>> > infoPort=50075,
>> > > ipcPort=50020):DataXceiver
>> > >
>> > > java.io.EOFException: while trying to read 39309 bytes
>> > >
>> > >
>> > > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com> wrote:
>> > >
>> > >> bq. java.io.IOException: File /hbase/event_data/
>> > >>
>> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
>> > >> could
>> > >> only be replicated to 0 nodes, instead of 1
>> > >>
>> > >> Have you checked Namenode / Datanode logs ?
>> > >> Looks like hdfs was not stable.
>> > >>
>> > >>
>> > >> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com>
>> wrote:
>> > >>
>> > >> > HI Jean,
>> > >> > Thanks for your reply.
>> > >> > I have total 8 GB memory and distribution is as follows:-
>> > >> >
>> > >> > Region server  - 2 GB
>> > >> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
>> > >> > OS - 1 GB
>> > >> >
>> > >> > Please let me know if you need more information.
>> > >> >
>> > >> >
>> > >> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
>> > >> > jean-marc@spaggiari.org> wrote:
>> > >> >
>> > >> > > Hi Vimal,
>> > >> > >
>> > >> > > What are your settings? Memory of the host, and memory allocated
>> for
>> > >> the
>> > >> > > different HBase services?
>> > >> > >
>> > >> > > Thanks,
>> > >> > >
>> > >> > > JM
>> > >> > >
>> > >> > >
>> > >> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
>> > >> > >
>> > >> > > > Hi,
>> > >> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop
>> > version -
>> > >> > > 1.1.2
>> > >> > > > , Hbase version - 0.94.7 )
>> > >> > > > I am getting few exceptions in both hadoop ( namenode ,
>> datanode)
>> > >> logs
>> > >> > > and
>> > >> > > > hbase(region server).
>> > >> > > > When i search for these exceptions on google , i concluded
>>  that
>> > >> > problem
>> > >> > > is
>> > >> > > > mainly due to large number of full GC in region server process.
>> > >> > > >
>> > >> > > > I used jstat and found that there are total of 950 full GCs in
>> > span
>> > >> of
>> > >> > 4
>> > >> > > > days for region server process.Is this ok?
>> > >> > > >
>> > >> > > > I am totally confused by number of exceptions i am getting.
>> > >> > > > Also i get below exceptions intermittently.
>> > >> > > >
>> > >> > > >
>> > >> > > > Region server:-
>> > >> > > >
>> > >> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
>> > >> > > > (responseTooSlow):
>> > >> > > > {"processingtimems":15312,"call":"next(-6681408251916104762,
>> > 1000),
>> > >> rpc
>> > >> > > > version=1, client version=29,
>> > >> > methodsFingerPrint=-1368823753","client":"
>> > >> > > > 192.168.20.31:48270
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
>> > >> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
>> > >> > > > (operationTooSlow): {"processingtimems":14759,"client":"
>> > >> > > > 192.168.20.31:48247
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"gin
>>
>
>
>
> --
> Thanks and Regards,
> Vimal Jain
>



-- 
Thanks and Regards,
Vimal Jain

Re: High Full GC count for Region server

Posted by Vimal Jain <vk...@gmail.com>.
Hi,
Here is my analysis of this problem. Please correct me if I am wrong somewhere.
I have assigned 2 GB to the region server process. I think that is sufficient to
handle around 9 GB of data.
I have not changed many of the parameters, in particular the memstore flush size,
which is 128 MB by default in 0.94.7.
Also, as per my understanding, each column family has one memstore associated
with it, so my memstores are taking 128*3 = 384 MB (I have 3 column families).
So I think I should reduce the memstore size to something like 32/64 MB so that
data is flushed to disk at a higher frequency than it is currently. This should
save some memory.
Is there any other parameter, besides memstore size, that affects memory
utilization?
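
A minimal sketch of the change I have in mind for hbase-site.xml is below (the 64 MB
value is only my guess, not something I have tested yet, and the region server would
need a restart to pick it up):

  <property>
    <!-- default in 0.94.x is 134217728 (128 MB); a smaller value flushes memstores sooner -->
    <name>hbase.hregion.memstore.flush.size</name>
    <value>67108864</value>
  </property>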

Also, I am getting the below exceptions in the data node log and region server log
every day. Are they due to long GC pauses?

Data node logs :-

hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):Got exception while serving blk_-560908881317618221_58058 to /192.168.20.30:
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:39413]
hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:39413]


Region server logs :-

hbase-hadoop-regionserver-woody.log:2013-10-29 01:01:16,475 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":15827,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2918e464), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"192.168.20.31:50619","starttimems":1382988660645,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
hbase-hadoop-regionserver-woody.log:2013-10-29 06:01:27,459 WARN org.apache.hadoop.ipc.HBaseServer: (operationTooSlow): {"processingtimems":14745,"client":"192.168.20.31:50908","timeRange":[0,9223372036854775807],"starttimems":1383006672707,"responsesize":55,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"oinfo":["clubStatus"]},"row":"1752869","queuetimems":1,"method":"get","totalColumns":1,"maxVersions":1}





On Mon, Oct 28, 2013 at 11:55 PM, Asaf Mesika <as...@gmail.com> wrote:

> Check through HDFS UI that your cluster haven't reached maximum disk
> capacity
>
> On Thursday, October 24, 2013, Vimal Jain wrote:
>
> > Hi Ted/Jean,
> > Can you please help here ?
> >
> >
> > On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vkjk89@gmail.com>
> > wrote:
> >
> > > Hi Ted,
> > > Yes i checked namenode and datanode logs and i found below exceptions
> in
> > > both the logs:-
> > >
> > > Name node :-
> > > java.io.IOException: File
> > >
> >
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > > could only be replicated to 0 nodes, instead of 1
> > >
> > > java.io.IOException: Got blockReceived message from unregistered or
> dead
> > > node blk_-2949905629769882833_52274
> > >
> > > Data node :-
> > > 480000 millis timeout while waiting for channel to be ready for write.
> ch
> > > : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> > >  remote=/192.168.20.30:36188]
> > >
> > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > > DatanodeRegistration(192.168.20.30:50010,
> > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> > infoPort=50075,
> > > ipcPort=50020):DataXceiver
> > >
> > > java.io.EOFException: while trying to read 39309 bytes
> > >
> > >
> > > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > >> bq. java.io.IOException: File /hbase/event_data/
> > >> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> > >> could
> > >> only be replicated to 0 nodes, instead of 1
> > >>
> > >> Have you checked Namenode / Datanode logs ?
> > >> Looks like hdfs was not stable.
> > >>
> > >>
> > >> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com> wrote:
> > >>
> > >> > HI Jean,
> > >> > Thanks for your reply.
> > >> > I have total 8 GB memory and distribution is as follows:-
> > >> >
> > >> > Region server  - 2 GB
> > >> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
> > >> > OS - 1 GB
> > >> >
> > >> > Please let me know if you need more information.
> > >> >
> > >> >
> > >> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> > >> > jean-marc@spaggiari.org> wrote:
> > >> >
> > >> > > Hi Vimal,
> > >> > >
> > >> > > What are your settings? Memory of the host, and memory allocated
> for
> > >> the
> > >> > > different HBase services?
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > JM
> > >> > >
> > >> > >
> > >> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
> > >> > >
> > >> > > > Hi,
> > >> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop
> > version -
> > >> > > 1.1.2
> > >> > > > , Hbase version - 0.94.7 )
> > >> > > > I am getting few exceptions in both hadoop ( namenode ,
> datanode)
> > >> logs
> > >> > > and
> > >> > > > hbase(region server).
> > >> > > > When i search for these exceptions on google , i concluded  that
> > >> > problem
> > >> > > is
> > >> > > > mainly due to large number of full GC in region server process.
> > >> > > >
> > >> > > > I used jstat and found that there are total of 950 full GCs in
> > span
> > >> of
> > >> > 4
> > >> > > > days for region server process.Is this ok?
> > >> > > >
> > >> > > > I am totally confused by number of exceptions i am getting.
> > >> > > > Also i get below exceptions intermittently.
> > >> > > >
> > >> > > >
> > >> > > > Region server:-
> > >> > > >
> > >> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> > >> > > > (responseTooSlow):
> > >> > > > {"processingtimems":15312,"call":"next(-6681408251916104762,
> > 1000),
> > >> rpc
> > >> > > > version=1, client version=29,
> > >> > methodsFingerPrint=-1368823753","client":"
> > >> > > > 192.168.20.31:48270
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> > >> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> > >> > > > (operationTooSlow): {"processingtimems":14759,"client":"
> > >> > > > 192.168.20.31:48247
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"gin
>



-- 
Thanks and Regards,
Vimal Jain

Re: High Full GC count for Region server

Posted by Asaf Mesika <as...@gmail.com>.
Check through the HDFS UI that your cluster hasn't reached maximum disk capacity.
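
If the web UI is not handy, something like the following from the command line should
show the same picture on Hadoop 1.x (just a sketch, adjust to your install):

  hadoop dfsadmin -report    # per-datanode "DFS Used%" and "DFS Remaining"
  hadoop fsck /              # also reports under-replicated and corrupt blocks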

On Thursday, October 24, 2013, Vimal Jain wrote:

> Hi Ted/Jean,
> Can you please help here ?
>
>
> On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vkjk89@gmail.com>
> wrote:
>
> > Hi Ted,
> > Yes i checked namenode and datanode logs and i found below exceptions in
> > both the logs:-
> >
> > Name node :-
> > java.io.IOException: File
> >
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > could only be replicated to 0 nodes, instead of 1
> >
> > java.io.IOException: Got blockReceived message from unregistered or dead
> > node blk_-2949905629769882833_52274
> >
> > Data node :-
> > 480000 millis timeout while waiting for channel to be ready for write. ch
> > : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> >  remote=/192.168.20.30:36188]
> >
> > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(192.168.20.30:50010,
> > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> infoPort=50075,
> > ipcPort=50020):DataXceiver
> >
> > java.io.EOFException: while trying to read 39309 bytes
> >
> >
> > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> bq. java.io.IOException: File /hbase/event_data/
> >> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> >> could
> >> only be replicated to 0 nodes, instead of 1
> >>
> >> Have you checked Namenode / Datanode logs ?
> >> Looks like hdfs was not stable.
> >>
> >>
> >> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com> wrote:
> >>
> >> > HI Jean,
> >> > Thanks for your reply.
> >> > I have total 8 GB memory and distribution is as follows:-
> >> >
> >> > Region server  - 2 GB
> >> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
> >> > OS - 1 GB
> >> >
> >> > Please let me know if you need more information.
> >> >
> >> >
> >> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org> wrote:
> >> >
> >> > > Hi Vimal,
> >> > >
> >> > > What are your settings? Memory of the host, and memory allocated for
> >> the
> >> > > different HBase services?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > JM
> >> > >
> >> > >
> >> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
> >> > >
> >> > > > Hi,
> >> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop
> version -
> >> > > 1.1.2
> >> > > > , Hbase version - 0.94.7 )
> >> > > > I am getting few exceptions in both hadoop ( namenode , datanode)
> >> logs
> >> > > and
> >> > > > hbase(region server).
> >> > > > When i search for these exceptions on google , i concluded  that
> >> > problem
> >> > > is
> >> > > > mainly due to large number of full GC in region server process.
> >> > > >
> >> > > > I used jstat and found that there are total of 950 full GCs in
> span
> >> of
> >> > 4
> >> > > > days for region server process.Is this ok?
> >> > > >
> >> > > > I am totally confused by number of exceptions i am getting.
> >> > > > Also i get below exceptions intermittently.
> >> > > >
> >> > > >
> >> > > > Region server:-
> >> > > >
> >> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> >> > > > (responseTooSlow):
> >> > > > {"processingtimems":15312,"call":"next(-6681408251916104762,
> 1000),
> >> rpc
> >> > > > version=1, client version=29,
> >> > methodsFingerPrint=-1368823753","client":"
> >> > > > 192.168.20.31:48270
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> >> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> >> > > > (operationTooSlow): {"processingtimems":14759,"client":"
> >> > > > 192.168.20.31:48247
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"gin

Re: High Full GC count for Region server

Posted by Vimal Jain <vk...@gmail.com>.
Hi Ted/Jean,
Can you please help here?


On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vk...@gmail.com> wrote:

> Hi Ted,
> Yes i checked namenode and datanode logs and i found below exceptions in
> both the logs:-
>
> Name node :-
> java.io.IOException: File
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> could only be replicated to 0 nodes, instead of 1
>
> java.io.IOException: Got blockReceived message from unregistered or dead
> node blk_-2949905629769882833_52274
>
> Data node :-
> 480000 millis timeout while waiting for channel to be ready for write. ch
> : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
>  remote=/192.168.20.30:36188]
>
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.20.30:50010,
> storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
> ipcPort=50020):DataXceiver
>
> java.io.EOFException: while trying to read 39309 bytes
>
>
> On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> bq. java.io.IOException: File /hbase/event_data/
>> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
>> could
>> only be replicated to 0 nodes, instead of 1
>>
>> Have you checked Namenode / Datanode logs ?
>> Looks like hdfs was not stable.
>>
>>
>> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com> wrote:
>>
>> > HI Jean,
>> > Thanks for your reply.
>> > I have total 8 GB memory and distribution is as follows:-
>> >
>> > Region server  - 2 GB
>> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
>> > OS - 1 GB
>> >
>> > Please let me know if you need more information.
>> >
>> >
>> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
>> > jean-marc@spaggiari.org> wrote:
>> >
>> > > Hi Vimal,
>> > >
>> > > What are your settings? Memory of the host, and memory allocated for
>> the
>> > > different HBase services?
>> > >
>> > > Thanks,
>> > >
>> > > JM
>> > >
>> > >
>> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
>> > >
>> > > > Hi,
>> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop version -
>> > > 1.1.2
>> > > > , Hbase version - 0.94.7 )
>> > > > I am getting few exceptions in both hadoop ( namenode , datanode)
>> logs
>> > > and
>> > > > hbase(region server).
>> > > > When i search for these exceptions on google , i concluded  that
>> > problem
>> > > is
>> > > > mainly due to large number of full GC in region server process.
>> > > >
>> > > > I used jstat and found that there are total of 950 full GCs in span
>> of
>> > 4
>> > > > days for region server process.Is this ok?
>> > > >
>> > > > I am totally confused by number of exceptions i am getting.
>> > > > Also i get below exceptions intermittently.
>> > > >
>> > > >
>> > > > Region server:-
>> > > >
>> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
>> > > > (responseTooSlow):
>> > > > {"processingtimems":15312,"call":"next(-6681408251916104762, 1000),
>> rpc
>> > > > version=1, client version=29,
>> > methodsFingerPrint=-1368823753","client":"
>> > > > 192.168.20.31:48270
>> > > >
>> > > >
>> > >
>> >
>> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
>> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
>> > > > (operationTooSlow): {"processingtimems":14759,"client":"
>> > > > 192.168.20.31:48247
>> > > >
>> > > >
>> > >
>> >
>> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
>> > > >
>> > > > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient:
>> > > DataStreamer
>> > > > Exception: org.apache.hadoop.ipc.RemoteException:
>> java.io.IOException:
>> > > File
>> > > >
>> > > >
>> > >
>> >
>> /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
>> > > > could only be replicated to 0 nodes, instead of 1
>> > > >     at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>> > > >
>> > > > Name node :-
>> > > > java.io.IOException: File
>> > > >
>> > > >
>> > >
>> >
>> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
>> > > > could only be replicated to 0 nodes, instead of 1
>> > > >
>> > > > java.io.IOException: Got blockReceived message from unregistered or
>> > dead
>> > > > node blk_-2949905629769882833_52274
>> > > >
>> > > > Data node :-
>> > > > 480000 millis timeout while waiting for channel to be ready for
>> write.
>> > > ch :
>> > > > java.nio.channels.SocketChannel[connected local=/
>> 192.168.20.30:50010
>> > > > remote=/
>> > > > 192.168.20.30:36188]
>> > > >
>> > > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> > > > DatanodeRegistration(
>> > > > 192.168.20.30:50010,
>> > > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
>> > > infoPort=50075,
>> > > > ipcPort=50020):DataXceiver
>> > > > java.io.EOFException: while trying to read 39309 bytes
>> > > >
>> > > >
>> > > > --
>> > > > Thanks and Regards,
>> > > > Vimal Jain
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks and Regards,
>> > Vimal Jain
>> >
>>
>
>
>
> --
> Thanks and Regards,
> Vimal Jain
>



-- 
Thanks and Regards,
Vimal Jain

Re: High Full GC count for Region server

Posted by Vimal Jain <vk...@gmail.com>.
Hi Ted/Jean,
Can you please help here ?


On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vk...@gmail.com> wrote:

> Hi Ted,
> Yes i checked namenode and datanode logs and i found below exceptions in
> both the logs:-
>
> Name node :-
> java.io.IOException: File
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> could only be replicated to 0 nodes, instead of 1
>
> java.io.IOException: Got blockReceived message from unregistered or dead
> node blk_-2949905629769882833_52274
>
> Data node :-
> 480000 millis timeout while waiting for channel to be ready for write. ch
> : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
>  remote=/192.168.20.30:36188]
>
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.20.30:50010,
> storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
> ipcPort=50020):DataXceiver
>
> java.io.EOFException: while trying to read 39309 bytes
>
>
> On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> bq. java.io.IOException: File /hbase/event_data/
>> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
>> could
>> only be replicated to 0 nodes, instead of 1
>>
>> Have you checked Namenode / Datanode logs ?
>> Looks like hdfs was not stable.
>>
>>
>> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com> wrote:
>>
>> > HI Jean,
>> > Thanks for your reply.
>> > I have total 8 GB memory and distribution is as follows:-
>> >
>> > Region server  - 2 GB
>> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
>> > OS - 1 GB
>> >
>> > Please let me know if you need more information.
>> >
>> >
>> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
>> > jean-marc@spaggiari.org> wrote:
>> >
>> > > Hi Vimal,
>> > >
>> > > What are your settings? Memory of the host, and memory allocated for
>> the
>> > > different HBase services?
>> > >
>> > > Thanks,
>> > >
>> > > JM
>> > >
>> > >
>> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
>> > >
>> > > > Hi,
>> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop version -
>> > > 1.1.2
>> > > > , Hbase version - 0.94.7 )
>> > > > I am getting few exceptions in both hadoop ( namenode , datanode)
>> logs
>> > > and
>> > > > hbase(region server).
>> > > > When i search for these exceptions on google , i concluded  that
>> > problem
>> > > is
>> > > > mainly due to large number of full GC in region server process.
>> > > >
>> > > > I used jstat and found that there are total of 950 full GCs in span
>> of
>> > 4
>> > > > days for region server process.Is this ok?
>> > > >
>> > > > I am totally confused by number of exceptions i am getting.
>> > > > Also i get below exceptions intermittently.
>> > > >
>> > > >
>> > > > Region server:-
>> > > >
>> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
>> > > > (responseTooSlow):
>> > > > {"processingtimems":15312,"call":"next(-6681408251916104762, 1000),
>> rpc
>> > > > version=1, client version=29,
>> > methodsFingerPrint=-1368823753","client":"
>> > > > 192.168.20.31:48270
>> > > >
>> > > >
>> > >
>> >
>> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
>> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
>> > > > (operationTooSlow): {"processingtimems":14759,"client":"
>> > > > 192.168.20.31:48247
>> > > >
>> > > >
>> > >
>> >
>> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
>> > > >
>> > > > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient:
>> > > DataStreamer
>> > > > Exception: org.apache.hadoop.ipc.RemoteException:
>> java.io.IOException:
>> > > File
>> > > >
>> > > >
>> > >
>> >
>> /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
>> > > > could only be replicated to 0 nodes, instead of 1
>> > > >     at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>> > > >
>> > > > Name node :-
>> > > > java.io.IOException: File
>> > > >
>> > > >
>> > >
>> >
>> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
>> > > > could only be replicated to 0 nodes, instead of 1
>> > > >
>> > > > java.io.IOException: Got blockReceived message from unregistered or
>> > dead
>> > > > node blk_-2949905629769882833_52274
>> > > >
>> > > > Data node :-
>> > > > 480000 millis timeout while waiting for channel to be ready for
>> write.
>> > > ch :
>> > > > java.nio.channels.SocketChannel[connected local=/
>> 192.168.20.30:50010
>> > > > remote=/
>> > > > 192.168.20.30:36188]
>> > > >
>> > > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> > > > DatanodeRegistration(
>> > > > 192.168.20.30:50010,
>> > > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
>> > > infoPort=50075,
>> > > > ipcPort=50020):DataXceiver
>> > > > java.io.EOFException: while trying to read 39309 bytes
>> > > >
>> > > >
>> > > > --
>> > > > Thanks and Regards,
>> > > > Vimal Jain
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks and Regards,
>> > Vimal Jain
>> >
>>
>
>
>
> --
> Thanks and Regards,
> Vimal Jain
>



-- 
Thanks and Regards,
Vimal Jain

Re: High Full GC count for Region server

Posted by Vimal Jain <vk...@gmail.com>.
Hi Ted/Jean,
Can you please help here ?


On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vk...@gmail.com> wrote:

> Hi Ted,
> Yes i checked namenode and datanode logs and i found below exceptions in
> both the logs:-
>
> Name node :-
> java.io.IOException: File
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> could only be replicated to 0 nodes, instead of 1
>
> java.io.IOException: Got blockReceived message from unregistered or dead
> node blk_-2949905629769882833_52274
>
> Data node :-
> 480000 millis timeout while waiting for channel to be ready for write. ch
> : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
>  remote=/192.168.20.30:36188]
>
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(192.168.20.30:50010,
> storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
> ipcPort=50020):DataXceiver
>
> java.io.EOFException: while trying to read 39309 bytes
>
>
> On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> bq. java.io.IOException: File /hbase/event_data/
>> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
>> could
>> only be replicated to 0 nodes, instead of 1
>>
>> Have you checked Namenode / Datanode logs ?
>> Looks like hdfs was not stable.
>>
>>
>> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com> wrote:
>>
>> > HI Jean,
>> > Thanks for your reply.
>> > I have total 8 GB memory and distribution is as follows:-
>> >
>> > Region server  - 2 GB
>> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
>> > OS - 1 GB
>> >
>> > Please let me know if you need more information.
>> >
>> >
>> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
>> > jean-marc@spaggiari.org> wrote:
>> >
>> > > Hi Vimal,
>> > >
>> > > What are your settings? Memory of the host, and memory allocated for
>> the
>> > > different HBase services?
>> > >
>> > > Thanks,
>> > >
>> > > JM
>> > >
>> > >
>> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
>> > >
>> > > > Hi,
>> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop version -
>> > > 1.1.2
>> > > > , Hbase version - 0.94.7 )
>> > > > I am getting few exceptions in both hadoop ( namenode , datanode)
>> logs
>> > > and
>> > > > hbase(region server).
>> > > > When i search for these exceptions on google , i concluded  that
>> > problem
>> > > is
>> > > > mainly due to large number of full GC in region server process.
>> > > >
>> > > > I used jstat and found that there are total of 950 full GCs in span
>> of
>> > 4
>> > > > days for region server process.Is this ok?
>> > > >
>> > > > I am totally confused by number of exceptions i am getting.
>> > > > Also i get below exceptions intermittently.
>> > > >
>> > > >
>> > > > Region server:-
>> > > >
>> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
>> > > > (responseTooSlow):
>> > > > {"processingtimems":15312,"call":"next(-6681408251916104762, 1000),
>> rpc
>> > > > version=1, client version=29,
>> > methodsFingerPrint=-1368823753","client":"
>> > > > 192.168.20.31:48270
>> > > >
>> > > >
>> > >
>> >
>> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
>> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
>> > > > (operationTooSlow): {"processingtimems":14759,"client":"
>> > > > 192.168.20.31:48247
>> > > >
>> > > >
>> > >
>> >
>> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
>> > > >
>> > > > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient:
>> > > DataStreamer
>> > > > Exception: org.apache.hadoop.ipc.RemoteException:
>> java.io.IOException:
>> > > File
>> > > >
>> > > >
>> > >
>> >
>> /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
>> > > > could only be replicated to 0 nodes, instead of 1
>> > > >     at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>> > > >
>> > > > Name node :-
>> > > > java.io.IOException: File
>> > > >
>> > > >
>> > >
>> >
>> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
>> > > > could only be replicated to 0 nodes, instead of 1
>> > > >
>> > > > java.io.IOException: Got blockReceived message from unregistered or
>> > dead
>> > > > node blk_-2949905629769882833_52274
>> > > >
>> > > > Data node :-
>> > > > 480000 millis timeout while waiting for channel to be ready for
>> write.
>> > > ch :
>> > > > java.nio.channels.SocketChannel[connected local=/
>> 192.168.20.30:50010
>> > > > remote=/
>> > > > 192.168.20.30:36188]
>> > > >
>> > > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>> > > > DatanodeRegistration(
>> > > > 192.168.20.30:50010,
>> > > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
>> > > infoPort=50075,
>> > > > ipcPort=50020):DataXceiver
>> > > > java.io.EOFException: while trying to read 39309 bytes
>> > > >
>> > > >
>> > > > --
>> > > > Thanks and Regards,
>> > > > Vimal Jain
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks and Regards,
>> > Vimal Jain
>> >
>>
>
>
>
> --
> Thanks and Regards,
> Vimal Jain
>



-- 
Thanks and Regards,
Vimal Jain

Re: High Full GC count for Region server

Posted by Vimal Jain <vk...@gmail.com>.
Hi Ted,
Yes, I checked the namenode and datanode logs and found the exceptions below in
both of them:

Name node :-
java.io.IOException: File
/hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
could only be replicated to 0 nodes, instead of 1

java.io.IOException: Got blockReceived message from unregistered or dead
node blk_-2949905629769882833_52274

Data node :-
480000 millis timeout while waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
 remote=/192.168.20.30:36188]

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
192.168.20.30:50010,
storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 39309 bytes


On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. java.io.IOException: File /hbase/event_data/
> 4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> could
> only be replicated to 0 nodes, instead of 1
>
> Have you checked Namenode / Datanode logs ?
> Looks like hdfs was not stable.
>
>
> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com> wrote:
>
> > HI Jean,
> > Thanks for your reply.
> > I have total 8 GB memory and distribution is as follows:-
> >
> > Region server  - 2 GB
> > Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
> > OS - 1 GB
> >
> > Please let me know if you need more information.
> >
> >
> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > Hi Vimal,
> > >
> > > What are your settings? Memory of the host, and memory allocated for
> the
> > > different HBase services?
> > >
> > > Thanks,
> > >
> > > JM
> > >
> > >
> > > 2013/10/22 Vimal Jain <vk...@gmail.com>
> > >
> > > > Hi,
> > > > I am running in Hbase in pseudo distributed mode. ( Hadoop version -
> > > 1.1.2
> > > > , Hbase version - 0.94.7 )
> > > > I am getting few exceptions in both hadoop ( namenode , datanode)
> logs
> > > and
> > > > hbase(region server).
> > > > When i search for these exceptions on google , i concluded  that
> > problem
> > > is
> > > > mainly due to large number of full GC in region server process.
> > > >
> > > > I used jstat and found that there are total of 950 full GCs in span
> of
> > 4
> > > > days for region server process.Is this ok?
> > > >
> > > > I am totally confused by number of exceptions i am getting.
> > > > Also i get below exceptions intermittently.
> > > >
> > > >
> > > > Region server:-
> > > >
> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":15312,"call":"next(-6681408251916104762, 1000),
> rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 192.168.20.31:48270
> > > >
> > > >
> > >
> >
> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (operationTooSlow): {"processingtimems":14759,"client":"
> > > > 192.168.20.31:48247
> > > >
> > > >
> > >
> >
> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
> > > >
> > > > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient:
> > > DataStreamer
> > > > Exception: org.apache.hadoop.ipc.RemoteException:
> java.io.IOException:
> > > File
> > > >
> > > >
> > >
> >
> /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> > > > could only be replicated to 0 nodes, instead of 1
> > > >     at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
> > > >
> > > > Name node :-
> > > > java.io.IOException: File
> > > >
> > > >
> > >
> >
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > > > could only be replicated to 0 nodes, instead of 1
> > > >
> > > > java.io.IOException: Got blockReceived message from unregistered or
> > dead
> > > > node blk_-2949905629769882833_52274
> > > >
> > > > Data node :-
> > > > 480000 millis timeout while waiting for channel to be ready for
> write.
> > > ch :
> > > > java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> > > > remote=/
> > > > 192.168.20.30:36188]
> > > >
> > > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > > > DatanodeRegistration(
> > > > 192.168.20.30:50010,
> > > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> > > infoPort=50075,
> > > > ipcPort=50020):DataXceiver
> > > > java.io.EOFException: while trying to read 39309 bytes
> > > >
> > > >
> > > > --
> > > > Thanks and Regards,
> > > > Vimal Jain
> > > >
> > >
> >
> >
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
> >
>



-- 
Thanks and Regards,
Vimal Jain
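
The namenode error above ("could only be replicated to 0 nodes, instead of 1")
means the namenode could not find any live datanode to accept the block, so the
single datanode was either down, unregistered, or out of space at that moment.
A quick way to check this on a Hadoop 1.x single-node setup (standard commands,
not output captured from this cluster):

    jps                            # confirm the NameNode and DataNode processes are up
    hadoop dfsadmin -report        # live/dead datanodes and remaining capacity
    hadoop fsck / -files -blocks   # filesystem health; best run while HBase is stopped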

Re: High Full GC count for Region server

Posted by Ted Yu <yu...@gmail.com>.
bq. java.io.IOException: File /hbase/event_data/
4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0 could
only be replicated to 0 nodes, instead of 1

Have you checked the Namenode / Datanode logs?
It looks like HDFS was not stable.


On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vk...@gmail.com> wrote:

> HI Jean,
> Thanks for your reply.
> I have total 8 GB memory and distribution is as follows:-
>
> Region server  - 2 GB
> Master,Namenode,Datanode,Secondary Namenode,Zookepeer - 1 GB
> OS - 1 GB
>
> Please let me know if you need more information.
>
>
> On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Hi Vimal,
> >
> > What are your settings? Memory of the host, and memory allocated for the
> > different HBase services?
> >
> > Thanks,
> >
> > JM
> >
> >
> > 2013/10/22 Vimal Jain <vk...@gmail.com>
> >
> > > Hi,
> > > I am running in Hbase in pseudo distributed mode. ( Hadoop version -
> > 1.1.2
> > > , Hbase version - 0.94.7 )
> > > I am getting few exceptions in both hadoop ( namenode , datanode) logs
> > and
> > > hbase(region server).
> > > When i search for these exceptions on google , i concluded  that
> problem
> > is
> > > mainly due to large number of full GC in region server process.
> > >
> > > I used jstat and found that there are total of 950 full GCs in span of
> 4
> > > days for region server process.Is this ok?
> > >
> > > I am totally confused by number of exceptions i am getting.
> > > Also i get below exceptions intermittently.
> > >
> > >
> > > Region server:-
> > >
> > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":15312,"call":"next(-6681408251916104762, 1000), rpc
> > > version=1, client version=29,
> methodsFingerPrint=-1368823753","client":"
> > > 192.168.20.31:48270
> > >
> > >
> >
> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (operationTooSlow): {"processingtimems":14759,"client":"
> > > 192.168.20.31:48247
> > >
> > >
> >
> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
> > >
> > > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient:
> > DataStreamer
> > > Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> > File
> > >
> > >
> >
> /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> > > could only be replicated to 0 nodes, instead of 1
> > >     at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
> > >
> > > Name node :-
> > > java.io.IOException: File
> > >
> > >
> >
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > > could only be replicated to 0 nodes, instead of 1
> > >
> > > java.io.IOException: Got blockReceived message from unregistered or
> dead
> > > node blk_-2949905629769882833_52274
> > >
> > > Data node :-
> > > 480000 millis timeout while waiting for channel to be ready for write.
> > ch :
> > > java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> > > remote=/
> > > 192.168.20.30:36188]
> > >
> > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > > DatanodeRegistration(
> > > 192.168.20.30:50010,
> > > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> > infoPort=50075,
> > > ipcPort=50020):DataXceiver
> > > java.io.EOFException: while trying to read 39309 bytes
> > >
> > >
> > > --
> > > Thanks and Regards,
> > > Vimal Jain
> > >
> >
>
>
>
> --
> Thanks and Regards,
> Vimal Jain
>
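
One note on the datanode message quoted in this thread: the "480000 millis
timeout while waiting for channel to be ready for write" matches the default
HDFS write timeout (dfs.datanode.socket.write.timeout, 480000 ms), i.e. the
datanode waited about eight minutes for a stalled peer. That is consistent with
a client paused in long GCs rather than a network fault. The property lives in
hdfs-site.xml; raising it only masks the symptom (the value below is purely
illustrative):

    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>600000</value>
    </property>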

Re: High Full GC count for Region server

Posted by Vimal Jain <vk...@gmail.com>.
Hi Jean,
Thanks for your reply.
I have 8 GB of memory in total, and the distribution is as follows:

Region server - 2 GB
Master, Namenode, Datanode, Secondary Namenode, Zookeeper - 1 GB
OS - 1 GB

Please let me know if you need more information.


On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Vimal,
>
> What are your settings? Memory of the host, and memory allocated for the
> different HBase services?
>
> Thanks,
>
> JM
>
>
> 2013/10/22 Vimal Jain <vk...@gmail.com>
>
> > Hi,
> > I am running in Hbase in pseudo distributed mode. ( Hadoop version -
> 1.1.2
> > , Hbase version - 0.94.7 )
> > I am getting few exceptions in both hadoop ( namenode , datanode) logs
> and
> > hbase(region server).
> > When i search for these exceptions on google , i concluded  that problem
> is
> > mainly due to large number of full GC in region server process.
> >
> > I used jstat and found that there are total of 950 full GCs in span of 4
> > days for region server process.Is this ok?
> >
> > I am totally confused by number of exceptions i am getting.
> > Also i get below exceptions intermittently.
> >
> >
> > Region server:-
> >
> > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":15312,"call":"next(-6681408251916104762, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=-1368823753","client":"
> > 192.168.20.31:48270
> >
> >
> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> > (operationTooSlow): {"processingtimems":14759,"client":"
> > 192.168.20.31:48247
> >
> >
> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
> >
> > 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer
> > Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> File
> >
> >
> /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> > could only be replicated to 0 nodes, instead of 1
> >     at
> >
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
> >
> > Name node :-
> > java.io.IOException: File
> >
> >
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> > could only be replicated to 0 nodes, instead of 1
> >
> > java.io.IOException: Got blockReceived message from unregistered or dead
> > node blk_-2949905629769882833_52274
> >
> > Data node :-
> > 480000 millis timeout while waiting for channel to be ready for write.
> ch :
> > java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> > remote=/
> > 192.168.20.30:36188]
> >
> > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(
> > 192.168.20.30:50010,
> > storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
> infoPort=50075,
> > ipcPort=50020):DataXceiver
> > java.io.EOFException: while trying to read 39309 bytes
> >
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
> >
>



-- 
Thanks and Regards,
Vimal Jain
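
For the 2 GB region server figure above, the heap and GC options would normally
be set in conf/hbase-env.sh. A minimal sketch for HBase 0.94 (the flags and log
path are assumptions, not the poster's actual configuration); CMS plus GC
logging makes it much easier to see whether full GC pauses line up with the
responseTooSlow warnings:

    # conf/hbase-env.sh (sketch; adjust values and paths for the actual install)
    export HBASE_REGIONSERVER_OPTS="-Xms2g -Xmx2g \
      -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -Xloggc:/path/to/logs/regionserver-gc.log"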

Re: High Full GC count for Region server

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Vimal,

What are your settings? How much memory does the host have, and how much is
allocated to the different HBase services?

Thanks,

JM


2013/10/22 Vimal Jain <vk...@gmail.com>

> Hi,
> I am running in Hbase in pseudo distributed mode. ( Hadoop version - 1.1.2
> , Hbase version - 0.94.7 )
> I am getting few exceptions in both hadoop ( namenode , datanode) logs and
> hbase(region server).
> When i search for these exceptions on google , i concluded  that problem is
> mainly due to large number of full GC in region server process.
>
> I used jstat and found that there are total of 950 full GCs in span of 4
> days for region server process.Is this ok?
>
> I am totally confused by number of exceptions i am getting.
> Also i get below exceptions intermittently.
>
>
> Region server:-
>
> 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer:
> (responseTooSlow):
> {"processingtimems":15312,"call":"next(-6681408251916104762, 1000), rpc
> version=1, client version=29, methodsFingerPrint=-1368823753","client":"
> 192.168.20.31:48270
>
> ","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer:
> (operationTooSlow): {"processingtimems":14759,"client":"
> 192.168.20.31:48247
>
> ","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"ginfo":["netGainPool"]},"row":"1629657","queuetimems":0,"method":"get","totalColumns":1,"maxVersions":1}
>
> 2013-10-18 10:37:45,008 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>
> /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0
> could only be replicated to 0 nodes, instead of 1
>     at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
> Name node :-
> java.io.IOException: File
>
> /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e
> could only be replicated to 0 nodes, instead of 1
>
> java.io.IOException: Got blockReceived message from unregistered or dead
> node blk_-2949905629769882833_52274
>
> Data node :-
> 480000 millis timeout while waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
> remote=/
> 192.168.20.30:36188]
>
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(
> 192.168.20.30:50010,
> storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075,
> ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 39309 bytes
>
>
> --
> Thanks and Regards,
> Vimal Jain
>
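
On the question of memory allocated to the different services: in a
pseudo-distributed setup each Hadoop daemon (NameNode, SecondaryNameNode,
DataNode) takes its heap from HADOOP_HEAPSIZE in conf/hadoop-env.sh, and each
HBase daemon from HBASE_HEAPSIZE in conf/hbase-env.sh unless overridden per
process. A short sketch of where those knobs live (values are illustrative
only, not a recommendation for this host):

    # conf/hadoop-env.sh
    export HADOOP_HEAPSIZE=1000    # MB, applied to each Hadoop daemon

    # conf/hbase-env.sh
    export HBASE_HEAPSIZE=2000     # MB, applied to each HBase daemon unless
                                   # overridden (e.g. via HBASE_REGIONSERVER_OPTS)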