Posted to user@hbase.apache.org by Federico Gaule <fg...@despegar.com> on 2013/12/05 13:49:59 UTC

RPC - Queue Time when handlers are all waiting

Hi,

I have 2 clusters, Master (a) - Slave (b) replication.
B doesn't have any client writes or reads and all handlers (100) are waiting,
but rpc.metrics.RpcQueueTime_num_ops and rpc.metrics.RpcQueueTime_avg_time
report that RPC calls are being queued.
There are some screenshots below showing the Ganglia metrics. How is this
behaviour explained? I have looked for metrics specifications but can't
find much information.

Handlers
http://i42.tinypic.com/242ssoz.png

NumOps
http://tinypic.com/r/of2c8k/5

AvgTime
http://tinypic.com/r/2lsvg5w/5

Cheers

Re: RPC - Queue Time when handlers are all waiting

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Ok. So I would say there is nothing to worry about. There is a millisecond
spent from time to time on a call. It sounds like the time it takes to go
into the queue and get picked up: the call stayed in the queue for 1 ms until a
handler picked it up. And that's why you see your metric growing very slowly.

Do you see any value bigger than 1? How much bigger?
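
If it helps, something like this (untested, and the log path is only an
example; point it at your RegionServer log) pulls the queueTime values out
of the DEBUG log and shows the largest ones:

  grep -o 'queueTime=[0-9]*' /var/log/hbase/hbase-regionserver.log \
    | cut -d= -f2 | sort -n | uniq -c | tail

Each output line is a count followed by a queueTime value in milliseconds,
so the last line is the biggest value seen so far.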


2013/12/10 Federico Gaule <fg...@despegar.com>

> When activating DEBUG level for
> org.apache.hadoop.hbase.ipc.WritableRpcEngine
> I didn't get any line in my log. Checking the WritableRpcEngine code, it looks like
> *org.apache.hadoop.ipc.HBaseServer.trace* should be the right one.
> Here are some lines. In case you need/want more, just tell me:
>
> 2013-12-10 13:18:59,272 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
> #148; Served: HRegionInterface#multi *queueTime=1* processingTime=6
> contents=1 Put, 16 values [ min=1 max=1 avg=1 ], 16 KeyValues
> 2013-12-10 13:18:59,693 DEBUG org.apache.hadoop.ipc.RPCEngine: Call:
> regionServerReport 2
> 2013-12-10 13:19:02,001 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
> #157; Served: HRegionInterface#multi queueTime=0 processingTime=9
> contents=62 values [ min=8 max=8 avg=8 ], 62 Puts, 62 KeyValues
> 2013-12-10 13:19:02,697 DEBUG org.apache.hadoop.ipc.RPCEngine: Call:
> regionServerReport 2
> 2013-12-10 13:19:04,731 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
> #372; Served: HRegionInterface#multi *queueTime=1* processingTime=5
> contents=2 Puts, 9 values [ min=2 max=2 avg=2 ], 9 KeyValues
> 2013-12-10 13:19:05,702 DEBUG org.apache.hadoop.ipc.RPCEngine: Call:
> regionServerReport 3
> 2013-12-10 13:19:08,675 DEBUG org.apache.hadoop.ipc.RPCEngine: Call: multi
> 13
> 2013-12-10 13:19:08,675 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
> #85378; Served: HRegionInterface#replicateLogEntries queueTime=0
> processingTime=14 contents=1 Entry
> 2013-12-10 13:19:08,705 DEBUG org.apache.hadoop.ipc.RPCEngine: Call:
> regionServerReport 2
> 2013-12-10 13:19:09,335 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call
> #149; Served: HRegionInterface#multi *queueTime=1* processingTime=6
> contents=2 Puts, 19 values [ min=1 max=1 avg=1 ], 19 KeyValues
>
> Thanks!
>
>
> 2013/12/10 Jean-Marc Spaggiari <je...@spaggiari.org>
>
> > Can you activate the DEBUG log level on
> > org.apache.hadoop.hbase.ipc.WritableRpcEngine and look for something
> > like
> >
> > Call #xxxx; Served:xxxx#xxxx  queueTime=YYYY....
> >
> > We want to see what you have for YYYY. This is what is added to your
> > RpcQueueTime.
> >
> > Thanks,
> >
> > JM
> >
>
>
>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
When activating DEBUG level for org.apache.hadoop.hbase.ipc.WritableRpcEngine
I didn't get any line in my log. Checking the WritableRpcEngine code, it looks like
*org.apache.hadoop.ipc.HBaseServer.trace* should be the right one.
Here are some lines. In case you need/want more, just tell me:

2013-12-10 13:18:59,272 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #148; Served: HRegionInterface#multi *queueTime=1* processingTime=6 contents=1 Put, 16 values [ min=1 max=1 avg=1 ], 16 KeyValues
2013-12-10 13:18:59,693 DEBUG org.apache.hadoop.ipc.RPCEngine: Call: regionServerReport 2
2013-12-10 13:19:02,001 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #157; Served: HRegionInterface#multi queueTime=0 processingTime=9 contents=62 values [ min=8 max=8 avg=8 ], 62 Puts, 62 KeyValues
2013-12-10 13:19:02,697 DEBUG org.apache.hadoop.ipc.RPCEngine: Call: regionServerReport 2
2013-12-10 13:19:04,731 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #372; Served: HRegionInterface#multi *queueTime=1* processingTime=5 contents=2 Puts, 9 values [ min=2 max=2 avg=2 ], 9 KeyValues
2013-12-10 13:19:05,702 DEBUG org.apache.hadoop.ipc.RPCEngine: Call: regionServerReport 3
2013-12-10 13:19:08,675 DEBUG org.apache.hadoop.ipc.RPCEngine: Call: multi 13
2013-12-10 13:19:08,675 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #85378; Served: HRegionInterface#replicateLogEntries queueTime=0 processingTime=14 contents=1 Entry
2013-12-10 13:19:08,705 DEBUG org.apache.hadoop.ipc.RPCEngine: Call: regionServerReport 2
2013-12-10 13:19:09,335 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call #149; Served: HRegionInterface#multi *queueTime=1* processingTime=6 contents=2 Puts, 19 values [ min=1 max=1 avg=1 ], 19 KeyValues

Thanks!


2013/12/10 Jean-Marc Spaggiari <je...@spaggiari.org>

> Can you activate debug loglevel on
> org.apache.hadoop.hbase.ipc.WritableRpcEngine and look at something looking
> like
>
> Call #xxxx; Served:xxxx#xxxx  queueTime=YYYY....
>
> We want to see what you have for YYYY.. This is what is added to your
> RpcQueueTime.
>
> Thanks,
>
> JM
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Can you activate the DEBUG log level on
org.apache.hadoop.hbase.ipc.WritableRpcEngine and look for something
like

Call #xxxx; Served:xxxx#xxxx  queueTime=YYYY....

We want to see what you have for YYYY. This is what is added to your
RpcQueueTime.
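
If the Hadoop logLevel servlet is exposed on the RegionServer info port
(60030 by default; I have not tried it on your version, so treat this as a
sketch), you can also flip the level at runtime instead of editing
log4j.properties, e.g.:

  wget -qO- "http://RS_HOST:60030/logLevel?log=org.apache.hadoop.hbase.ipc.WritableRpcEngine&level=DEBUG"

RS_HOST is just a placeholder for one of your region servers.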

Thanks,

JM

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
I thought we could, somehow, check the queued items. The replication handlers
are in WAITING state most of the time.
Here's a sample over 1 minute, refreshing every second. I always got the same
output while RpcQueueTime_avg_time was reporting activity (handlers 29...0
were all WAITING):

hbase@pam-hb-replica-b-00:~$ jstack 15585 | grep 'IPC Server handler' -A 20
| egrep 'RUNNABLE|BLOCKED' -B 1 -A 20
"IPC Server listener on 60020" daemon prio=10 tid=0x00007f3658189000
nid=0x3d3a runnable [0x00007f363ca68000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x000000070030f3a8> (a sun.nio.ch.Util$2)
- locked <0x000000070030f390> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000007003d31b8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:636)
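
For reference, a loop along these lines (untested sketch; 15585 is the
RegionServer PID from the command above) can take that one-minute sample and
summarize the handler thread states:

  for i in $(seq 1 60); do
    jstack 15585 | grep 'IPC Server handler' -A 2 \
      | grep 'java.lang.Thread.State' | sort | uniq -c
    sleep 1
  done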

Thanks!



2013/12/10 Jean-Marc Spaggiari <je...@spaggiari.org>

> You can do something like this (typed in the email, not tested):
>
> for i in {1..240}; do wget "http://MASTER_IP:6001/master-status?format=json&filter=handler"; done
>
> That will create 240 JSON files that you can look at to see the handlers'
> status. Also, on this same page, you should see your 30 replication
> handlers as well as your 30 "standard" handlers.
>
> Last, I would recommend you move back to fewer replication handlers since
> I don't think you need 30.
>
> JM
>
>
> 2013/12/10 Federico Gaule <fg...@despegar.com>
>
> > There isn't any coprocessor on the slaves, nor on the master.
> > How can I dump the RPC queues?
> >
> > Thanks!
> >
> >
> > 2013/12/10 Jean-Marc Spaggiari <je...@spaggiari.org>
> >
> > > Here are the properties from the code regarding the handlers.
> > > hbase.master.handler.count
> > > hbase.regionserver.handler.count
> > > hbase.regionserver.replication.handler.count
> > > hbase.regionserver.metahandler.count
> > >
> > > Do you have any coprocessor configured on your slave cluster? Are you
> > > able to dump the RPC queues every 5 seconds to see what is in them?
> > >
> > > JM
> > >
> > >
> > >
> > > 2013/12/10 Federico Gaule <fg...@despegar.com>
> > >
> > > > I'm using hbase 0.94.13.
> > > > hbase.regionserver.metahandler.count is a more intuitive name for those
> > > > handlers :)
> > > >
> > > >
> > > > 2013/12/10 Nicolas Liochon <nk...@gmail.com>
> > > >
> > > > > It's hbase.regionserver.metahandler.count. Not sure it causes the issue
> > > > > you're facing, though. What's your HBase version?
> > > > >
> > > > >
> > > > > On Tue, Dec 10, 2013 at 1:21 PM, Federico Gaule <
> fgaule@despegar.com
> > >
> > > > > wrote:
> > > > >
> > > > > > There is another set of handlers we haven't customized, "PRI IPC"
> > > > > > (priority?). What are those handlers used for? What is the property
> > > > > > used to increase the number of handlers?
> > > > > > hbase.regionserver.custom.priority.handler.count ?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > >
> > > > > > 2013/12/10 Federico Gaule <fg...@despegar.com>
> > > > > >
> > > > > > > I've increased hbase.regionserver.replication.handler.count 10x (30)
> > > > > > > but nothing has changed. rpc.metrics.RpcQueueTime_avg_time still shows
> > > > > > > activity :(
> > > > > > >
> > > > > > > Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 29 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > > > > > > Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 28 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > > > > > > Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 27 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > > > > > > Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 26 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > > > > > > ...
> > > > > > > ...
> > > > > > > Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 2 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > > > > > > Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 1 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > > > > > > Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 0 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > > > > > > Thanks JM
> > > > > > >
> > > > > > >
> > > > > > > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > > > > > >
> > > > > > >> Yes, default value is 3 in 0.94.14. If you have not changed
> it,
> > > then
> > > > > > it's
> > > > > > >> still 3.
> > > > > > >>
> > > > > > >> conf.getInt("hbase.regionserver.replication.handler.count",
> 3);
> > > > > > >>
> > > > > > >> Keep us posted on the results.
> > > > > > >>
> > > > > > >> JM
> > > > > > >>
> > > > > > >>
> > > > > > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > > > > >>
> > > > > > >> > Default value for hbase.regionserver.replication.handler.count
> > > > > > >> > (can't find what the default is, is it 3?)
> > > > > > >> > I'll try increasing that property.
> > > > > > >> >
> > > > > > >> > Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 2 on 60020  WAITING (since 8sec ago)  Waiting for a call (since 8sec ago)
> > > > > > >> > Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 1 on 60020  WAITING (since 8sec ago)  Waiting for a call (since 8sec ago)
> > > > > > >> > Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 0 on 60020  WAITING (since 2sec ago)  Waiting for a call (since 2sec ago)
> > > > > > >> > Thanks JM
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > > > > > >> >
> > > > > > >> > > For replication, the handlers used on the slave cluster are
> > > > > > >> > > configured by hbase.regionserver.replication.handler.count. What
> > > > > > >> > > value do you have for this property?
> > > > > > >> > >
> > > > > > >> > > JM
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > > > > >> > >
> > > > > > >> > > > Here is a thread saying what i think it should be (
> > > > > > >> > > >
> > > > > >
> http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time
> > )
> > > > > > >> > > >
> > > > > > >> > > > "The RpcQueueTime metrics are a measurement of how long
> > > > > individual
> > > > > > >> > calls
> > > > > > >> > > > stay in this queued state. If your handlers were never
> > 100%
> > > > > > >> occupied,
> > > > > > >> > > this
> > > > > > >> > > > value would be 0. An average of 3 hours is concerning,
> it
> > > > > > basically
> > > > > > >> > means
> > > > > > >> > > > that when a call comes into the RegionServer it takes on
> > > > > average 3
> > > > > > >> > hours
> > > > > > >> > > to
> > > > > > >> > > > start processing, because handlers are all occupied for
> > that
> > > > > > amount
> > > > > > >> of
> > > > > > >> > > > time."
> > > > > > >> > > >
> > > > > > >> > > > Is that correct?
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > > > > >> > > >
> > > > > > >> > > > > Correct me if I'm wrong, but queues should be used only
> > > > > > >> > > > > when handlers are all busy, shouldn't they?
> > > > > > >> > > > > If that's true, I don't get why there is activity related
> > > > > > >> > > > > to queues.
> > > > > > >> > > > >
> > > > > > >> > > > > Maybe I'm missing some piece of knowledge about when hbase
> > > > > > >> > > > > is using queues :)
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > 2013/12/9 Jean-Marc Spaggiari <
> jean-marc@spaggiari.org>
> > > > > > >> > > > >
> > > > > > >> > > > >> There might be something I'm missing ;)
> > > > > > >> > > > >>
> > > > > > >> > > > >> On cluster B, as you said, never more than 50% of
> your
> > > > > handlers
> > > > > > >> are
> > > > > > >> > > > used.
> > > > > > >> > > > >> Your Ganglia metrics are showing that there is activity
> > > > > > >> > > > >> (num ops is increasing), which is correct.
> > > > > > >> > > > >>
> > > > > > >> > > > >> Can you please confirm what you think is wrong from
> > your
> > > > > > charts?
> > > > > > >> > > > >>
> > > > > > >> > > > >> Thanks,
> > > > > > >> > > > >>
> > > > > > >> > > > >> JM
> > > > > > >> > > > >>
> > > > > > >> > > > >>
> > > > > > >> > > > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > > > > >> > > > >>
> > > > > > >> > > > >> > Hi JM,
> > > > > > >> > > > >> > Cluster B is only receiving replication data
> > (writes),
> > > > but
> > > > > > >> > handlers
> > > > > > >> > > > are
> > > > > > >> > > > >> > waiting most of the time (never 50% of them are
> > used).
> > > > As i
> > > > > > >> have
> > > > > > >> > > read,
> > > > > > >> > > > >> RPC
> > > > > > >> > > > >> > queue is only used when handlers are all waiting,
> > does
> > > it
> > > > > > count
> > > > > > >> > for
> > > > > > >> > > > >> > replication as well?
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > Thanks!
> > > > > > >> > > > >> >
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > 2013/12/9 Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org
> > > >
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > > Hi,
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > When you say that B doesn't get any read/write
> > > > operation,
> > > > > > >> does
> > > > > > >> > it
> > > > > > >> > > > mean
> > > > > > >> > > > >> > you
> > > > > > >> > > > >> > > stopped the replication? Or B is still getting
> the
> > > > write
> > > > > > >> > > operations
> > > > > > >> > > > >> from
> > > > > > >> > > > >> > A
> > > > > > >> > > > >> > > because of the replication? If so, that's why your
> > > > > > >> > > > >> > > RPC queue is used...
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > JM
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> > > > Not much information in RS logs (DEBUG level
> set
> > to
> > > > > > >> > > > >> > > > org.apache.hadoop.hbase). Here is a sample of
> one
> > > > > > >> regionserver
> > > > > > >> > > > >> showing
> > > > > > >> > > > >> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
> > > > > > >> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > > > > > >> > > > >> > > > activity:
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > > 2013-12-09 08:09:10,699 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB, free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501, hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378, cachingHitsRatio=99.97%, , evictions=0, evicted=6768, evictedPerRun=Infinity
> > > > > > >> > > > >> > > > 2013-12-09 08:09:11,396 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 1
> > > > > > >> > > > >> > > > 2013-12-09 08:09:14,979 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 2
> > > > > > >> > > > >> > > > 2013-12-09 08:09:16,016 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 1
> > > > > > >> > > > >> > > > ...
> > > > > > >> > > > >> > > > 2013-12-09 08:14:07,659 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 1
> > > > > > >> > > > >> > > > 2013-12-09 08:14:08,713 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 3
> > > > > > >> > > > >> > > > 2013-12-09 08:14:10,699 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB, free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501, hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378, cachingHitsRatio=99.97%, , evictions=0, evicted=6768, evictedPerRun=Infinity
> > > > > > >> > > > >> > > > 2013-12-09 08:14:12,711 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 1
> > > > > > >> > > > >> > > > 2013-12-09 08:14:14,778 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 3
> > > > > > >> > > > >> > > > ...
> > > > > > >> > > > >> > > > 2013-12-09 08:15:09,199 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 3
> > > > > > >> > > > >> > > > 2013-12-09 08:15:12,243 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 2
> > > > > > >> > > > >> > > > 2013-12-09 08:15:22,086 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 2
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > > Thanks
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > > 2013/12/7 Bharath Vissapragada <
> > > > bharathv@cloudera.com>
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > > > I'd look into the RS logs to see what's
> > > > > > >> > > > >> > > > > happening there.
> > > > > > >> > > > Difficult
> > > > > > >> > > > >> to
> > > > > > >> > > > >> > > > guess
> > > > > > >> > > > >> > > > > from the given information!
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico
> Gaule
> > <
> > > > > > >> > > > >> fgaule@despegar.com>
> > > > > > >> > > > >> > > > > wrote:
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > > Any clue?
> > > > > > >> > > > >> > > > > > On Dec 5, 2013, 9:49 a.m., "Federico Gaule" <fgaule@despegar.com> wrote:
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > > > > Hi,
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > I have 2 clusters, Master (a) - Slave (b)
> > > > > > >> replication.
> > > > > > >> > > > >> > > > > > > B doesn't have client write or reads, all
> > > > > handlers
> > > > > > >> (100)
> > > > > > >> > > are
> > > > > > >> > > > >> > > waiting
> > > > > > >> > > > >> > > > > but
> > > > > > >> > > > >> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
> > > > > > >> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > > > > > >> > > > >> > > > > > reports
> > > > > > >> > > > >> > > > > > > to be rpc calls to be queued.
> > > > > > >> > > > >> > > > > > > There are some screenshots below to show
> > > > ganglia
> > > > > > >> > metrics.
> > > > > > >> > > > How
> > > > > > >> > > > >> is
> > > > > > >> > > > >> > > this
> > > > > > >> > > > >> > > > > > > behaviour explained? I have looked for
> > > metrics
> > > > > > >> > > > specifications
> > > > > > >> > > > >> but
> > > > > > >> > > > >> > > > can't
> > > > > > >> > > > >> > > > > > > find much information.
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > Handlers
> > > > > > >> > > > >> > > > > > > http://i42.tinypic.com/242ssoz.png
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > NumOps
> > > > > > >> > > > >> > > > > > > http://tinypic.com/r/of2c8k/5
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > AvgTime
> > > > > > >> > > > >> > > > > > > http://tinypic.com/r/2lsvg5w/5
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > > > Cheers
> > > > > > >> > > > >> > > > > > >
> > > > > > >> > > > >> > > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > > > --
> > > > > > >> > > > >> > > > > Bharath Vissapragada
> > > > > > >> > > > >> > > > > <http://www.cloudera.com>
> > > > > > >> > > > >> > > > >
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > > >
> > > > > > >> > > > >> > >
> > > > > > >> > > > >> >
> > > > > > >> > > > >> >
> > > > > > >> > > > >> >
> > > > > > >> > > > >>
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
You can do something like this (typed in the email, not tested):

for i in {1..240}; do wget "http://MASTER_IP:6001/master-status?format=json&filter=handler"; done

That will create 240 JSON files that you can look at to see the handlers'
status. Also, on this same page, you should see your 30 replication
handlers as well as your 30 "standard" handlers.

Last, I would recommend you move back to fewer replication handlers since
I don't think you need 30.
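
A slightly more explicit variant (also untested; MASTER_IP and the port are
placeholders for wherever your master web UI runs) that writes one numbered
file every 5 seconds instead of relying on wget's default file naming:

  for i in $(seq 1 240); do
    wget -q -O "handlers-$i.json" "http://MASTER_IP:6001/master-status?format=json&filter=handler"
    sleep 5
  done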

JM


> > > > > >> > > > [image:
> > > > > >> > > > > Seguinos en Facebook!] <
> http://www.facebook.com/despegar>
> > > > > [image:
> > > > > >> > > > Seguinos
> > > > > >> > > > > en YouTube!] <http://www.youtube.com/Despegar>*
> > > > > >> > > > > *Despegar.com, Inc. *
> > > > > >> > > > >
> > > > > >> > > > > El mejor precio para tu viaje.
> > > > > >> > > > >
> > > > > >> > > > > Este mensaje es confidencial y puede contener
> información
> > > > > >> amparada
> > > > > >> > por
> > > > > >> > > > el
> > > > > >> > > > > secreto profesional.
> > > > > >> > > > > Si usted ha recibido este e-mail por error, por favor
> > > > > >> comuníquenoslo
> > > > > >> > > > > inmediatamente respondiendo a este e-mail y luego
> > > > eliminándolo
> > > > > >> de su
> > > > > >> > > > > sistema.
> > > > > >> > > > > El contenido de este mensaje no deberá ser copiado ni
> > > > > divulgado a
> > > > > >> > > > ninguna
> > > > > >> > > > > persona.
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > >
> > > > > >> > > > [image:
> > > > > >> http://www.despegar.com/galeria/images/promos/isodespegar1.png
> > > > > >> > ]
> > > > > >> > > >
> > > > > >> > > > *Ing. Federico Gaule*
> > > > > >> > > > Líder Técnico - PAM <ho...@despegar.com>
> > > > > >> > > > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > > > > >> > > > tel. +54 (11) 4894-3500
> > > > > >> > > >
> > > > > >> > > > *[image: Seguinos en Twitter!] <
> > > > http://twitter.com/#!/despegarar>
> > > > > >> > > [image:
> > > > > >> > > > Seguinos en Facebook!] <http://www.facebook.com/despegar>
> > > > [image:
> > > > > >> > > Seguinos
> > > > > >> > > > en YouTube!] <http://www.youtube.com/Despegar>*
> > > > > >> > > > *Despegar.com, Inc. *
> > > > > >> > > > El mejor precio para tu viaje.
> > > > > >> > > >
> > > > > >> > > > Este mensaje es confidencial y puede contener información
> > > > > amparada
> > > > > >> por
> > > > > >> > > el
> > > > > >> > > > secreto profesional.
> > > > > >> > > > Si usted ha recibido este e-mail por error, por favor
> > > > > >> comuníquenoslo
> > > > > >> > > > inmediatamente respondiendo a este e-mail y luego
> > > eliminándolo
> > > > de
> > > > > >> su
> > > > > >> > > > sistema.
> > > > > >> > > > El contenido de este mensaje no deberá ser copiado ni
> > > > divulgado a
> > > > > >> > > ninguna
> > > > > >> > > > persona.
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > --
> > > > > >> >
> > > > > >> > [image:
> > > > > http://www.despegar.com/galeria/images/promos/isodespegar1.png]
> > > > > >> >
> > > > > >> > *Ing. Federico Gaule*
> > > > > >> > Líder Técnico - PAM <ho...@despegar.com>
> > > > > >> > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > > > > >> > tel. +54 (11) 4894-3500
> > > > > >> >
> > > > > >> > *[image: Seguinos en Twitter!] <
> > http://twitter.com/#!/despegarar>
> > > > > >> [image:
> > > > > >> > Seguinos en Facebook!] <http://www.facebook.com/despegar>
> > [image:
> > > > > >> Seguinos
> > > > > >> > en YouTube!] <http://www.youtube.com/Despegar>*
> > > > > >> > *Despegar.com, Inc. *
> > > > > >> > El mejor precio para tu viaje.
> > > > > >> >
> > > > > >> > Este mensaje es confidencial y puede contener información
> > > amparada
> > > > > por
> > > > > >> el
> > > > > >> > secreto profesional.
> > > > > >> > Si usted ha recibido este e-mail por error, por favor
> > > > comuníquenoslo
> > > > > >> > inmediatamente respondiendo a este e-mail y luego
> eliminándolo
> > de
> > > > su
> > > > > >> > sistema.
> > > > > >> > El contenido de este mensaje no deberá ser copiado ni
> > divulgado a
> > > > > >> ninguna
> > > > > >> > persona.
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > [image:
> > > http://www.despegar.com/galeria/images/promos/isodespegar1.png
> > > > ]
> > > > > >
> > > > > > *Ing. Federico Gaule*
> > > > > > Líder Técnico - PAM <ho...@despegar.com>
> > > > > >
> > > > > > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > > > > > tel. +54 (11) 4894-3500
> > > > > >
> > > > > >
> > > > > > *[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar
> >
> > > > > [image:
> > > > > > Seguinos en Facebook!] <http://www.facebook.com/despegar>
> [image:
> > > > > Seguinos
> > > > > > en YouTube!] <http://www.youtube.com/Despegar>*
> > > > > > *Despegar.com, Inc. *
> > > > > >
> > > > > > El mejor precio para tu viaje.
> > > > > >
> > > > > > Este mensaje es confidencial y puede contener información
> amparada
> > > por
> > > > > el
> > > > > > secreto profesional.
> > > > > > Si usted ha recibido este e-mail por error, por favor
> > comuníquenoslo
> > > > > > inmediatamente respondiendo a este e-mail y luego eliminándolo
> de
> > su
> > > > > > sistema.
> > > > > > El contenido de este mensaje no deberá ser copiado ni divulgado
> a
> > > > > ninguna
> > > > > > persona.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > [image:
> > http://www.despegar.com/galeria/images/promos/isodespegar1.png
> > > ]
> > > > >
> > > > > *Ing. Federico Gaule*
> > > > > Líder Técnico - PAM <ho...@despegar.com>
> > > > > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > > > > tel. +54 (11) 4894-3500
> > > > >
> > > > > *[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar>
> > > > [image:
> > > > > Seguinos en Facebook!] <http://www.facebook.com/despegar> [image:
> > > > Seguinos
> > > > > en YouTube!] <http://www.youtube.com/Despegar>*
> > > > > *Despegar.com, Inc. *
> > > > > El mejor precio para tu viaje.
> > > > >
> > > > > Este mensaje es confidencial y puede contener información amparada
> > por
> > > > el
> > > > > secreto profesional.
> > > > > Si usted ha recibido este e-mail por error, por favor
> comuníquenoslo
> > > > > inmediatamente respondiendo a este e-mail y luego eliminándolo de
> su
> > > > > sistema.
> > > > > El contenido de este mensaje no deberá ser copiado ni divulgado a
> > > > ninguna
> > > > > persona.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > [image: http://www.despegar.com/galeria/images/promos/isodespegar1.png
> ]
> > >
> > > *Ing. Federico Gaule*
> > > Líder Técnico - PAM <ho...@despegar.com>
> > > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > > tel. +54 (11) 4894-3500
> > >
> > > *[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar>
> > [image:
> > > Seguinos en Facebook!] <http://www.facebook.com/despegar> [image:
> > Seguinos
> > > en YouTube!] <http://www.youtube.com/Despegar>*
> > > *Despegar.com, Inc. *
> > > El mejor precio para tu viaje.
> > >
> > > Este mensaje es confidencial y puede contener información amparada por
> > el
> > > secreto profesional.
> > > Si usted ha recibido este e-mail por error, por favor comuníquenoslo
> > > inmediatamente respondiendo a este e-mail y luego eliminándolo de su
> > > sistema.
> > > El contenido de este mensaje no deberá ser copiado ni divulgado a
> > ninguna
> > > persona.
> > >
> >
>
>
>
> --
>
> [image: http://www.despegar.com/galeria/images/promos/isodespegar1.png]
>
> *Ing. Federico Gaule*
> Líder Técnico - PAM <ho...@despegar.com>
> Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> tel. +54 (11) 4894-3500
>
> *[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar> [image:
> Seguinos en Facebook!] <http://www.facebook.com/despegar> [image: Seguinos
> en YouTube!] <http://www.youtube.com/Despegar>*
> *Despegar.com, Inc. *
> El mejor precio para tu viaje.
>
> Este mensaje es confidencial y puede contener información amparada por el
> secreto profesional.
> Si usted ha recibido este e-mail por error, por favor comuníquenoslo
> inmediatamente respondiendo a este e-mail y luego eliminándolo de su
> sistema.
> El contenido de este mensaje no deberá ser copiado ni divulgado a ninguna
> persona.
>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
There isn't any coprocessor on the slaves, nor on the master.
How can I dump the RPC queues?

Thanks!
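
A minimal sketch of one way to do that, assuming the RegionServer info server
(default port 60030 on 0.94) serves a /dump debug page; the host name, port
and path below are placeholders, not something taken from this thread:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    // Polls a RegionServer's /dump page every 5 seconds so handler and
    // call-queue state can be compared between samples. Host, port and the
    // /dump path are assumptions; adjust them to the actual deployment.
    public class RpcQueueDumper {
      public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0]
            : "http://regionserver-host:60030/dump";
        while (true) {
          System.out.println("==== " + new java.util.Date() + " ====");
          BufferedReader in = new BufferedReader(
              new InputStreamReader(new URL(url).openStream()));
          String line;
          while ((line = in.readLine()) != null) {
            System.out.println(line);
          }
          in.close();
          Thread.sleep(5000L);
        }
      }
    }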


2013/12/10 Jean-Marc Spaggiari <je...@spaggiari.org>

> Here are the properties from the code regarding the handlers.
> hbase.master.handler.count
> hbase.regionserver.handler.count
> hbase.regionserver.replication.handler.count
> hbase.regionserver.metahandler.count
>
> Do you have any coprocessor configured on your slave cluster? Are you able
> to dump the RPC queues every 5 seconds to see what is in them?
>
> JM

Re: RPC - Queue Time when handlers are all waiting

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Here are the properties from the code regarding the handlers.
hbase.master.handler.count
hbase.regionserver.handler.count
hbase.regionserver.replication.handler.count
hbase.regionserver.metahandler.count

Do you have any coprocessor configured on your slave cluster? Are you able
to dump the RPC queues every 5 seconds to see what is in them?

JM
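
A rough sketch of how to check what those four settings resolve to against a
given hbase-site.xml; the fallback values below are assumed 0.94 defaults and
the class name is made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Prints the handler-count properties as resolved from the hbase-site.xml
    // found on the classpath. The fallback values are assumed 0.94 defaults.
    public class HandlerCounts {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        System.out.println("hbase.master.handler.count = "
            + conf.getInt("hbase.master.handler.count", 25));
        System.out.println("hbase.regionserver.handler.count = "
            + conf.getInt("hbase.regionserver.handler.count", 10));
        System.out.println("hbase.regionserver.replication.handler.count = "
            + conf.getInt("hbase.regionserver.replication.handler.count", 3));
        System.out.println("hbase.regionserver.metahandler.count = "
            + conf.getInt("hbase.regionserver.metahandler.count", 10));
      }
    }

Note that handler counts are read when the servers start, so any change to
them on the slave cluster only takes effect after a RegionServer restart.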



2013/12/10 Federico Gaule <fg...@despegar.com>

> I'm using hbase 0.94.13.
> hbase.regionserver.metahandler.count is a more intuitive name for those
> handlers :)
> > > *Despegar.com, Inc. *
> > > El mejor precio para tu viaje.
> > >
> > > Este mensaje es confidencial y puede contener información amparada por
> > el
> > > secreto profesional.
> > > Si usted ha recibido este e-mail por error, por favor comuníquenoslo
> > > inmediatamente respondiendo a este e-mail y luego eliminándolo de su
> > > sistema.
> > > El contenido de este mensaje no deberá ser copiado ni divulgado a
> > ninguna
> > > persona.
> > >
> >
>
>
>
> --
>
> [image: http://www.despegar.com/galeria/images/promos/isodespegar1.png]
>
> *Ing. Federico Gaule*
> Líder Técnico - PAM <ho...@despegar.com>
> Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> tel. +54 (11) 4894-3500
>
> *[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar> [image:
> Seguinos en Facebook!] <http://www.facebook.com/despegar> [image: Seguinos
> en YouTube!] <http://www.youtube.com/Despegar>*
> *Despegar.com, Inc. *
> El mejor precio para tu viaje.
>
> Este mensaje es confidencial y puede contener información amparada por el
> secreto profesional.
> Si usted ha recibido este e-mail por error, por favor comuníquenoslo
> inmediatamente respondiendo a este e-mail y luego eliminándolo de su
> sistema.
> El contenido de este mensaje no deberá ser copiado ni divulgado a ninguna
> persona.
>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
I'm using HBase 0.94.13.
hbase.regionserver.metahandler.count is a more intuitive name for those
handlers :)


2013/12/10 Nicolas Liochon <nk...@gmail.com>

> It's hbase.regionserver.metahandler.count. Not sure it causes the issue
> you're facing, thought. What's your HBase version?
>
>
> On Tue, Dec 10, 2013 at 1:21 PM, Federico Gaule <fg...@despegar.com>
> wrote:
>
> > There is another set of handler we haven't customized "PRI IPC" (priority
> > ?). What are those handlers used for? What is the property used to
> increase
> > the number of handlers? hbase.regionserver.custom.priority.handler.count
> ?
> >
> > Thanks!
> >
> >
> > 2013/12/10 Federico Gaule <fg...@despegar.com>
> >
> > > I've increased hbase.regionserver.replication.handler.count 10x (30)
> but
> > > nothing have changed. rpc.metrics.RpcQueueTime_avg_time still shows
> > > activity :(
> > >
> > > Mon Dec 09 14:04:10 EST 2013REPL IPC Server handler 29 on 60000 WAITING
> > > (since 16hrs, 58mins, 56sec ago)Waiting for a call (since 16hrs,
> 58mins,
> > > 56sec ago) Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 28 on
> > > 60000WAITING (since 16hrs, 58mins, 56sec ago) Waiting for a call (since
> > > 16hrs, 58mins, 56sec ago)Mon Dec 09 14:04:10 EST 2013 REPL IPC Server
> > > handler 27 on 60000 WAITING (since 16hrs, 58mins, 56sec ago)Waiting
> for a
> > > call (since 16hrs, 58mins, 56sec ago) Mon Dec 09 14:04:10 EST 2013 REPL
> > > IPC Server handler 26 on 60000WAITING (since 16hrs, 58mins, 56sec
> > ago)Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > > ... ...
> > > ...
> > > ...
> > > Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 2 on 60000WAITING
> > > (since 16hrs, 58mins, 56sec ago) Waiting for a call (since 16hrs,
> 58mins,
> > > 56sec ago)Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 1 on
> > 60000WAITING (since 16hrs, 58mins, 56sec ago)Waiting
> > > for a call (since 16hrs, 58mins, 56sec ago) Mon Dec 09 14:04:10 EST
> > 2013REPL IPC Server handler 0 on 60000WAITING
> > > (since 16hrs, 58mins, 56sec ago) Waiting for a call (since 16hrs,
> 58mins,
> > > 56sec ago)
> > > Thanks JM
> > >
> > >
> > > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > >
> > >> Yes, default value is 3 in 0.94.14. If you have not changed it, then
> > it's
> > >> still 3.
> > >>
> > >> conf.getInt("hbase.regionserver.replication.handler.count", 3);
> > >>
> > >> Keep us posted on the results.
> > >>
> > >> JM
> > >>
> > >>
> > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >>
> > >> > Default value for hbase.regionserver.replication.handler.count
> (can't
> > >> find
> > >> > what is the default, Is it 3?)
> > >> > I'll do a try increasing that property
> > >> >
> > >> > Fri Dec 06 12:44:12 EST 2013REPL IPC Server handler 2 on
> 60020WAITING
> > >> > (since 8sec ago)Waiting for a call (since 8sec ago)Fri Dec 06
> 12:44:12
> > >> EST
> > >> > 2013REPL IPC Server handler 1 on 60020WAITING (since 8sec
> ago)Waiting
> > >> for a
> > >> > call (since 8sec ago)Fri Dec 06 12:44:12 EST 2013REPL IPC Server
> > >> handler 0
> > >> > on 60020WAITING (since 2sec ago)Waiting for a call (since 2sec ago)
> > >> > Thanks JM
> > >> >
> > >> >
> > >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > >> >
> > >> > > For replications, the handlers used on the salve cluster are
> > >> configured
> > >> > by
> > >> > > hbase.regionserver.replication.handler.count. What value do you
> have
> > >> for
> > >> > > this property?
> > >> > >
> > >> > > JM
> > >> > >
> > >> > >
> > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >> > >
> > >> > > > Here is a thread saying what i think it should be (
> > >> > > >
> > http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time)
> > >> > > >
> > >> > > > "The RpcQueueTime metrics are a measurement of how long
> individual
> > >> > calls
> > >> > > > stay in this queued state. If your handlers were never 100%
> > >> occupied,
> > >> > > this
> > >> > > > value would be 0. An average of 3 hours is concerning, it
> > basically
> > >> > means
> > >> > > > that when a call comes into the RegionServer it takes on
> average 3
> > >> > hours
> > >> > > to
> > >> > > > start processing, because handlers are all occupied for that
> > amount
> > >> of
> > >> > > > time."
> > >> > > >
> > >> > > > Is that correct?
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >> > > >
> > >> > > > > Correct me if i'm wrong, but, Queues should be used only when
> > >> > handlers
> > >> > > > are
> > >> > > > > all busy, shouldn't it?.
> > >> > > > > If that's true, i don't get why there is activity related to
> > >> queues.
> > >> > > > >
> > >> > > > > Maybe i'm missing some piece of knowledge about when hbase is
> > >> using
> > >> > > > queues
> > >> > > > > :)
> > >> > > > >
> > >> > > > > Thanks
> > >> > > > >
> > >> > > > >
> > >> > > > > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > >> > > > >
> > >> > > > >> There might be something I'm missing ;)
> > >> > > > >>
> > >> > > > >> On cluster B, as you said, never more than 50% of your
> handlers
> > >> are
> > >> > > > used.
> > >> > > > >> Your Ganglia metrics are showing that there is activities
> (num
> > >> ops
> > >> > is
> > >> > > > >> increasing), which is correct.
> > >> > > > >>
> > >> > > > >> Can you please confirm what you think is wrong from your
> > charts?
> > >> > > > >>
> > >> > > > >> Thanks,
> > >> > > > >>
> > >> > > > >> JM
> > >> > > > >>
> > >> > > > >>
> > >> > > > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >> > > > >>
> > >> > > > >> > Hi JM,
> > >> > > > >> > Cluster B is only receiving replication data (writes), but
> > >> > handlers
> > >> > > > are
> > >> > > > >> > waiting most of the time (never 50% of them are used). As i
> > >> have
> > >> > > read,
> > >> > > > >> RPC
> > >> > > > >> > queue is only used when handlers are all waiting, does it
> > count
> > >> > for
> > >> > > > >> > replication as well?
> > >> > > > >> >
> > >> > > > >> > Thanks!
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > >> > > > >> >
> > >> > > > >> > > Hi,
> > >> > > > >> > >
> > >> > > > >> > > When you say that B doesn't get any read/write operation,
> > >> does
> > >> > it
> > >> > > > mean
> > >> > > > >> > you
> > >> > > > >> > > stopped the replication? Or B is still getting the write
> > >> > > operations
> > >> > > > >> from
> > >> > > > >> > A
> > >> > > > >> > > because of the replication? If so, that's why you RPC
> queue
> > >> is
> > >> > > > used...
> > >> > > > >> > >
> > >> > > > >> > > JM
> > >> > > > >> > >
> > >> > > > >> > >
> > >> > > > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >> > > > >> > >
> > >> > > > >> > > > Not much information in RS logs (DEBUG level set to
> > >> > > > >> > > > org.apache.hadoop.hbase). Here is a sample of one
> > >> regionserver
> > >> > > > >> showing
> > >> > > > >> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
> > >> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > >> > > > >> > > > activity:
> > >> > > > >> > > >
> > >> > > > >> > > > 2013-12-09 08:09:10,699 DEBUG
> > >> > > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> > >> > > total=23.14
> > >> > > > >> MB,
> > >> > > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0,
> accesses=122442151,
> > >> > > > >> > hits=122168501,
> > >> > > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> > >> > > > cachingHits=122162378,
> > >> > > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > >> > > > >> > > > evictedPerRun=Infinity
> > >> > > > >> > > > 2013-12-09 08:09:11,396 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 1
> > >> > > > >> > > > 2013-12-09 08:09:14,979 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 2
> > >> > > > >> > > > 2013-12-09 08:09:16,016 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 1
> > >> > > > >> > > > ...
> > >> > > > >> > > > 2013-12-09 08:14:07,659 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 1
> > >> > > > >> > > > 2013-12-09 08:14:08,713 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 3
> > >> > > > >> > > > 2013-12-09 08:14:10,699 DEBUG
> > >> > > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> > >> > > total=23.14
> > >> > > > >> MB,
> > >> > > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0,
> accesses=122442151,
> > >> > > > >> > hits=122168501,
> > >> > > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> > >> > > > cachingHits=122162378,
> > >> > > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > >> > > > >> > > > evictedPerRun=Infinity
> > >> > > > >> > > > 2013-12-09 08:14:12,711 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 1
> > >> > > > >> > > > 2013-12-09 08:14:14,778 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 3
> > >> > > > >> > > > ...
> > >> > > > >> > > > 2013-12-09 08:15:09,199 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 3
> > >> > > > >> > > > 2013-12-09 08:15:12,243 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 2
> > >> > > > >> > > > 2013-12-09 08:15:22,086 INFO
> > >> > > > >> > > >
> > >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> > > > >> Total
> > >> > > > >> > > > replicated: 2
> > >> > > > >> > > >
> > >> > > > >> > > > Thanks
> > >> > > > >> > > >
> > >> > > > >> > > >
> > >> > > > >> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> > >> > > > >> > > >
> > >> > > > >> > > > > I'd look into the RS logs to see whats happening
> there.
> > >> > > > Difficult
> > >> > > > >> to
> > >> > > > >> > > > guess
> > >> > > > >> > > > > from the given information!
> > >> > > > >> > > > >
> > >> > > > >> > > > >
> > >> > > > >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
> > >> > > > >> fgaule@despegar.com>
> > >> > > > >> > > > > wrote:
> > >> > > > >> > > > >
> > >> > > > >> > > > > > Any clue?
> > >> > > > >> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <
> > >> > > > fgaule@despegar.com
> > >> > > > >> >
> > >> > > > >> > > > > escribió:
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > > Hi,
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > I have 2 clusters, Master (a) - Slave (b)
> > >> replication.
> > >> > > > >> > > > > > > B doesn't have client write or reads, all
> handlers
> > >> (100)
> > >> > > are
> > >> > > > >> > > waiting
> > >> > > > >> > > > > but
> > >> > > > >> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
> > >> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > >> > > > >> > > > > > reports
> > >> > > > >> > > > > > > to be rpc calls to be queued.
> > >> > > > >> > > > > > > There are some screenshots below to show ganglia
> > >> > metrics.
> > >> > > > How
> > >> > > > >> is
> > >> > > > >> > > this
> > >> > > > >> > > > > > > behaviour explained? I have looked for metrics
> > >> > > > specifications
> > >> > > > >> but
> > >> > > > >> > > > can't
> > >> > > > >> > > > > > > find much information.
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > Handlers
> > >> > > > >> > > > > > > http://i42.tinypic.com/242ssoz.png
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > NumOps
> > >> > > > >> > > > > > > http://tinypic.com/r/of2c8k/5
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > AvgTime
> > >> > > > >> > > > > > > http://tinypic.com/r/2lsvg5w/5
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > Cheers
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > >
> > >> > > > >> > > > >
> > >> > > > >> > > > >
> > >> > > > >> > > > >
> > >> > > > >> > > > > --
> > >> > > > >> > > > > Bharath Vissapragada
> > >> > > > >> > > > > <http://www.cloudera.com>
> > >> > > > >> > > > >
> > >> > > > >> > > >
> > >> > > > >> > > >
> > >> > > > >> > > >
> > >> > > > >> > > > --
> > >> > > > >> > > >
> > >> > > > >> > > > [image:
> > >> > > > >>
> http://www.despegar.com/galeria/images/promos/isodespegar1.png
> > >> > > > >> > ]
> > >> > > > >> > > >
> > >> > > > >> > > > *Ing. Federico Gaule*
> > >> > > > >> > > > Líder Técnico - PAM <ho...@despegar.com>
> > >> > > > >> > > > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > >> > > > >> > > > tel. +54 (11) 4894-3500
> > >> > > > >> > > >
> > >> > > > >> > > > *[image: Seguinos en Twitter!] <
> > >> > > http://twitter.com/#!/despegarar>
> > >> > > > >> > > [image:
> > >> > > > >> > > > Seguinos en Facebook!] <
> http://www.facebook.com/despegar
> > >
> > >> > > [image:
> > >> > > > >> > > Seguinos
> > >> > > > >> > > > en YouTube!] <http://www.youtube.com/Despegar>*
> > >> > > > >> > > > *Despegar.com, Inc. *
> > >> > > > >> > > > El mejor precio para tu viaje.
> > >> > > > >> > > >
> > >> > > > >> > > > Este mensaje es confidencial y puede contener
> > información
> > >> > > > amparada
> > >> > > > >> por
> > >> > > > >> > > el
> > >> > > > >> > > > secreto profesional.
> > >> > > > >> > > > Si usted ha recibido este e-mail por error, por favor
> > >> > > > >> comuníquenoslo
> > >> > > > >> > > > inmediatamente respondiendo a este e-mail y luego
> > >> > eliminándolo
> > >> > > de
> > >> > > > >> su
> > >> > > > >> > > > sistema.
> > >> > > > >> > > > El contenido de este mensaje no deberá ser copiado ni
> > >> > > divulgado a
> > >> > > > >> > > ninguna
> > >> > > > >> > > > persona.
> > >> > > > >> > > >
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >> > --
> > >> > > > >> >
> > >> > > > >> > [image:
> > >> > > > http://www.despegar.com/galeria/images/promos/isodespegar1.png]
> > >> > > > >> >
> > >> > > > >> > *Ing. Federico Gaule*
> > >> > > > >> > Líder Técnico - PAM <ho...@despegar.com>
> > >> > > > >> > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > >> > > > >> > tel. +54 (11) 4894-3500
> > >> > > > >> >
> > >> > > > >> > *[image: Seguinos en Twitter!] <
> > >> http://twitter.com/#!/despegarar>
> > >> > > > >> [image:
> > >> > > > >> > Seguinos en Facebook!] <http://www.facebook.com/despegar>
> > >> [image:
> > >> > > > >> Seguinos
> > >> > > > >> > en YouTube!] <http://www.youtube.com/Despegar>*
> > >> > > > >> > *Despegar.com, Inc. *
> > >> > > > >> > El mejor precio para tu viaje.
> > >> > > > >> >
> > >> > > > >> > Este mensaje es confidencial y puede contener información
> > >> > amparada
> > >> > > > por
> > >> > > > >> el
> > >> > > > >> > secreto profesional.
> > >> > > > >> > Si usted ha recibido este e-mail por error, por favor
> > >> > > comuníquenoslo
> > >> > > > >> > inmediatamente respondiendo a este e-mail y luego
> > >> eliminándolo de
> > >> > > su
> > >> > > > >> > sistema.
> > >> > > > >> > El contenido de este mensaje no deberá ser copiado ni
> > >> divulgado a
> > >> > > > >> ninguna
> > >> > > > >> > persona.
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > >
> > >> > > > > [image:
> > >> > http://www.despegar.com/galeria/images/promos/isodespegar1.png
> > >> > > ]
> > >> > > > >
> > >> > > > > *Ing. Federico Gaule*
> > >> > > > > Líder Técnico - PAM <ho...@despegar.com>
> > >> > > > >
> > >> > > > > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > >> > > > > tel. +54 (11) 4894-3500
> > >> > > > >
> > >> > > > >
> > >> > > > > *[image: Seguinos en Twitter!] <
> > http://twitter.com/#!/despegarar>
> > >> > > > [image:
> > >> > > > > Seguinos en Facebook!] <http://www.facebook.com/despegar>
> > [image:
> > >> > > > Seguinos
> > >> > > > > en YouTube!] <http://www.youtube.com/Despegar>*
> > >> > > > > *Despegar.com, Inc. *
> > >> > > > >
> > >> > > > > El mejor precio para tu viaje.
> > >> > > > >
> > >> > > > > Este mensaje es confidencial y puede contener información
> > >> amparada
> > >> > por
> > >> > > > el
> > >> > > > > secreto profesional.
> > >> > > > > Si usted ha recibido este e-mail por error, por favor
> > >> comuníquenoslo
> > >> > > > > inmediatamente respondiendo a este e-mail y luego
> eliminándolo
> > >> de su
> > >> > > > > sistema.
> > >> > > > > El contenido de este mensaje no deberá ser copiado ni
> > divulgado a
> > >> > > > ninguna
> > >> > > > > persona.
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > >
> > >> > > > [image:
> > >> http://www.despegar.com/galeria/images/promos/isodespegar1.png
> > >> > ]
> > >> > > >
> > >> > > > *Ing. Federico Gaule*
> > >> > > > Líder Técnico - PAM <ho...@despegar.com>
> > >> > > > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > >> > > > tel. +54 (11) 4894-3500
> > >> > > >
> > >> > > > *[image: Seguinos en Twitter!] <
> http://twitter.com/#!/despegarar>
> > >> > > [image:
> > >> > > > Seguinos en Facebook!] <http://www.facebook.com/despegar>
> [image:
> > >> > > Seguinos
> > >> > > > en YouTube!] <http://www.youtube.com/Despegar>*
> > >> > > > *Despegar.com, Inc. *
> > >> > > > El mejor precio para tu viaje.
> > >> > > >
> > >> > > > Este mensaje es confidencial y puede contener información
> > amparada
> > >> por
> > >> > > el
> > >> > > > secreto profesional.
> > >> > > > Si usted ha recibido este e-mail por error, por favor
> > >> comuníquenoslo
> > >> > > > inmediatamente respondiendo a este e-mail y luego eliminándolo
> de
> > >> su
> > >> > > > sistema.
> > >> > > > El contenido de este mensaje no deberá ser copiado ni
> divulgado a
> > >> > > ninguna
> > >> > > > persona.
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > [image:
> > http://www.despegar.com/galeria/images/promos/isodespegar1.png]
> > >> >
> > >> > *Ing. Federico Gaule*
> > >> > Líder Técnico - PAM <ho...@despegar.com>
> > >> > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > >> > tel. +54 (11) 4894-3500
> > >> >
> > >> > *[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar>
> > >> [image:
> > >> > Seguinos en Facebook!] <http://www.facebook.com/despegar> [image:
> > >> Seguinos
> > >> > en YouTube!] <http://www.youtube.com/Despegar>*
> > >> > *Despegar.com, Inc. *
> > >> > El mejor precio para tu viaje.
> > >> >
> > >> > Este mensaje es confidencial y puede contener información amparada
> > por
> > >> el
> > >> > secreto profesional.
> > >> > Si usted ha recibido este e-mail por error, por favor
> comuníquenoslo
> > >> > inmediatamente respondiendo a este e-mail y luego eliminándolo de
> su
> > >> > sistema.
> > >> > El contenido de este mensaje no deberá ser copiado ni divulgado a
> > >> ninguna
> > >> > persona.
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > [image: http://www.despegar.com/galeria/images/promos/isodespegar1.png
> ]
> > >
> > > *Ing. Federico Gaule*
> > > Líder Técnico - PAM <ho...@despegar.com>
> > >
> > > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > > tel. +54 (11) 4894-3500
> > >
> > >
> > > *[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar>
> > [image:
> > > Seguinos en Facebook!] <http://www.facebook.com/despegar> [image:
> > Seguinos
> > > en YouTube!] <http://www.youtube.com/Despegar>*
> > > *Despegar.com, Inc. *
> > >
> > > El mejor precio para tu viaje.
> > >
> > > Este mensaje es confidencial y puede contener información amparada por
> > el
> > > secreto profesional.
> > > Si usted ha recibido este e-mail por error, por favor comuníquenoslo
> > > inmediatamente respondiendo a este e-mail y luego eliminándolo de su
> > > sistema.
> > > El contenido de este mensaje no deberá ser copiado ni divulgado a
> > ninguna
> > > persona.
> > >
> >
> >
> >
> > --
> >
> > [image: http://www.despegar.com/galeria/images/promos/isodespegar1.png]
> >
> > *Ing. Federico Gaule*
> > Líder Técnico - PAM <ho...@despegar.com>
> > Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
> > tel. +54 (11) 4894-3500
> >
> > *[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar>
> [image:
> > Seguinos en Facebook!] <http://www.facebook.com/despegar> [image:
> Seguinos
> > en YouTube!] <http://www.youtube.com/Despegar>*
> > *Despegar.com, Inc. *
> > El mejor precio para tu viaje.
> >
> > Este mensaje es confidencial y puede contener información amparada por
> el
> > secreto profesional.
> > Si usted ha recibido este e-mail por error, por favor comuníquenoslo
> > inmediatamente respondiendo a este e-mail y luego eliminándolo de su
> > sistema.
> > El contenido de este mensaje no deberá ser copiado ni divulgado a
> ninguna
> > persona.
> >
>



-- 

[image: http://www.despegar.com/galeria/images/promos/isodespegar1.png]

*Ing. Federico Gaule*
Líder Técnico - PAM <ho...@despegar.com>
Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
tel. +54 (11) 4894-3500

*[image: Seguinos en Twitter!] <http://twitter.com/#!/despegarar> [image:
Seguinos en Facebook!] <http://www.facebook.com/despegar> [image: Seguinos
en YouTube!] <http://www.youtube.com/Despegar>*
*Despegar.com, Inc. *
El mejor precio para tu viaje.

Este mensaje es confidencial y puede contener información amparada por el
secreto profesional.
Si usted ha recibido este e-mail por error, por favor comuníquenoslo
inmediatamente respondiendo a este e-mail y luego eliminándolo de su
sistema.
El contenido de este mensaje no deberá ser copiado ni divulgado a ninguna
persona.

Re: RPC - Queue Time when handlers are all waiting

Posted by Nicolas Liochon <nk...@gmail.com>.
It's hbase.regionserver.metahandler.count. Not sure it causes the issue
you're facing, though. What's your HBase version?
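
For reference, all of the handler pools discussed in this thread are sized
through hbase-site.xml on the region servers and picked up on restart. A
minimal sketch of the relevant properties (the values below are only
illustrative, not recommendations):

  <!-- hbase-site.xml fragment: handler pool sizes (example values only) -->
  <property>
    <name>hbase.regionserver.handler.count</name>             <!-- regular client IPC handlers -->
    <value>100</value>
  </property>
  <property>
    <name>hbase.regionserver.replication.handler.count</name> <!-- REPL IPC handlers (replication sink) -->
    <value>30</value>
  </property>
  <property>
    <name>hbase.regionserver.metahandler.count</name>         <!-- PRI IPC (priority) handlers -->
    <value>10</value>
  </property>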


On Tue, Dec 10, 2013 at 1:21 PM, Federico Gaule <fg...@despegar.com> wrote:

> There is another set of handler we haven't customized "PRI IPC" (priority
> ?). What are those handlers used for? What is the property used to increase
> the number of handlers? hbase.regionserver.custom.priority.handler.count ?
>
> Thanks!
>
>
> 2013/12/10 Federico Gaule <fg...@despegar.com>
>
> > I've increased hbase.regionserver.replication.handler.count 10x (30) but
> > nothing have changed. rpc.metrics.RpcQueueTime_avg_time still shows
> > activity :(
> >
> > Mon Dec 09 14:04:10 EST 2013REPL IPC Server handler 29 on 60000 WAITING
> > (since 16hrs, 58mins, 56sec ago)Waiting for a call (since 16hrs, 58mins,
> > 56sec ago) Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 28 on
> > 60000WAITING (since 16hrs, 58mins, 56sec ago) Waiting for a call (since
> > 16hrs, 58mins, 56sec ago)Mon Dec 09 14:04:10 EST 2013 REPL IPC Server
> > handler 27 on 60000 WAITING (since 16hrs, 58mins, 56sec ago)Waiting for a
> > call (since 16hrs, 58mins, 56sec ago) Mon Dec 09 14:04:10 EST 2013 REPL
> > IPC Server handler 26 on 60000WAITING (since 16hrs, 58mins, 56sec
> ago)Waiting for a call (since 16hrs, 58mins, 56sec ago)
> > ... ...
> > ...
> > ...
> > Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 2 on 60000WAITING
> > (since 16hrs, 58mins, 56sec ago) Waiting for a call (since 16hrs, 58mins,
> > 56sec ago)Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 1 on
> 60000WAITING (since 16hrs, 58mins, 56sec ago)Waiting
> > for a call (since 16hrs, 58mins, 56sec ago) Mon Dec 09 14:04:10 EST
> 2013REPL IPC Server handler 0 on 60000WAITING
> > (since 16hrs, 58mins, 56sec ago) Waiting for a call (since 16hrs, 58mins,
> > 56sec ago)
> > Thanks JM
> >
> >
> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> >
> >> Yes, default value is 3 in 0.94.14. If you have not changed it, then
> it's
> >> still 3.
> >>
> >> conf.getInt("hbase.regionserver.replication.handler.count", 3);
> >>
> >> Keep us posted on the results.
> >>
> >> JM
> >>
> >>
> >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> >>
> >> > Default value for hbase.regionserver.replication.handler.count (can't
> >> find
> >> > what is the default, Is it 3?)
> >> > I'll do a try increasing that property
> >> >
> >> > Fri Dec 06 12:44:12 EST 2013REPL IPC Server handler 2 on 60020WAITING
> >> > (since 8sec ago)Waiting for a call (since 8sec ago)Fri Dec 06 12:44:12
> >> EST
> >> > 2013REPL IPC Server handler 1 on 60020WAITING (since 8sec ago)Waiting
> >> for a
> >> > call (since 8sec ago)Fri Dec 06 12:44:12 EST 2013REPL IPC Server
> >> handler 0
> >> > on 60020WAITING (since 2sec ago)Waiting for a call (since 2sec ago)
> >> > Thanks JM
> >> >
> >> >
> >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> >> >
> >> > > For replications, the handlers used on the salve cluster are
> >> configured
> >> > by
> >> > > hbase.regionserver.replication.handler.count. What value do you have
> >> for
> >> > > this property?
> >> > >
> >> > > JM
> >> > >
> >> > >
> >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> >> > >
> >> > > > Here is a thread saying what i think it should be (
> >> > > >
> http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time)
> >> > > >
> >> > > > "The RpcQueueTime metrics are a measurement of how long individual
> >> > calls
> >> > > > stay in this queued state. If your handlers were never 100%
> >> occupied,
> >> > > this
> >> > > > value would be 0. An average of 3 hours is concerning, it
> basically
> >> > means
> >> > > > that when a call comes into the RegionServer it takes on average 3
> >> > hours
> >> > > to
> >> > > > start processing, because handlers are all occupied for that
> amount
> >> of
> >> > > > time."
> >> > > >
> >> > > > Is that correct?
> >> > > >
> >> > > >
> >> > > >
> >> > > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> >> > > >
> >> > > > > Correct me if i'm wrong, but, Queues should be used only when
> >> > handlers
> >> > > > are
> >> > > > > all busy, shouldn't it?.
> >> > > > > If that's true, i don't get why there is activity related to
> >> queues.
> >> > > > >
> >> > > > > Maybe i'm missing some piece of knowledge about when hbase is
> >> using
> >> > > > queues
> >> > > > > :)
> >> > > > >
> >> > > > > Thanks
> >> > > > >
> >> > > > >
> >> > > > > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> >> > > > >
> >> > > > >> There might be something I'm missing ;)
> >> > > > >>
> >> > > > >> On cluster B, as you said, never more than 50% of your handlers
> >> are
> >> > > > used.
> >> > > > >> Your Ganglia metrics are showing that there is activities (num
> >> ops
> >> > is
> >> > > > >> increasing), which is correct.
> >> > > > >>
> >> > > > >> Can you please confirm what you think is wrong from your
> charts?
> >> > > > >>
> >> > > > >> Thanks,
> >> > > > >>
> >> > > > >> JM
> >> > > > >>
> >> > > > >>
> >> > > > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> >> > > > >>
> >> > > > >> > Hi JM,
> >> > > > >> > Cluster B is only receiving replication data (writes), but
> >> > handlers
> >> > > > are
> >> > > > >> > waiting most of the time (never 50% of them are used). As i
> >> have
> >> > > read,
> >> > > > >> RPC
> >> > > > >> > queue is only used when handlers are all waiting, does it
> count
> >> > for
> >> > > > >> > replication as well?
> >> > > > >> >
> >> > > > >> > Thanks!
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> >> > > > >> >
> >> > > > >> > > Hi,
> >> > > > >> > >
> >> > > > >> > > When you say that B doesn't get any read/write operation,
> >> does
> >> > it
> >> > > > mean
> >> > > > >> > you
> >> > > > >> > > stopped the replication? Or B is still getting the write
> >> > > operations
> >> > > > >> from
> >> > > > >> > A
> >> > > > >> > > because of the replication? If so, that's why you RPC queue
> >> is
> >> > > > used...
> >> > > > >> > >
> >> > > > >> > > JM
> >> > > > >> > >
> >> > > > >> > >
> >> > > > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> >> > > > >> > >
> >> > > > >> > > > Not much information in RS logs (DEBUG level set to
> >> > > > >> > > > org.apache.hadoop.hbase). Here is a sample of one
> >> regionserver
> >> > > > >> showing
> >> > > > >> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
> >> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> >> > > > >> > > > activity:
> >> > > > >> > > >
> >> > > > >> > > > 2013-12-09 08:09:10,699 DEBUG
> >> > > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> >> > > total=23.14
> >> > > > >> MB,
> >> > > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> >> > > > >> > hits=122168501,
> >> > > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> >> > > > cachingHits=122162378,
> >> > > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> >> > > > >> > > > evictedPerRun=Infinity
> >> > > > >> > > > 2013-12-09 08:09:11,396 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 1
> >> > > > >> > > > 2013-12-09 08:09:14,979 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 2
> >> > > > >> > > > 2013-12-09 08:09:16,016 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 1
> >> > > > >> > > > ...
> >> > > > >> > > > 2013-12-09 08:14:07,659 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 1
> >> > > > >> > > > 2013-12-09 08:14:08,713 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 3
> >> > > > >> > > > 2013-12-09 08:14:10,699 DEBUG
> >> > > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> >> > > total=23.14
> >> > > > >> MB,
> >> > > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> >> > > > >> > hits=122168501,
> >> > > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> >> > > > cachingHits=122162378,
> >> > > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> >> > > > >> > > > evictedPerRun=Infinity
> >> > > > >> > > > 2013-12-09 08:14:12,711 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 1
> >> > > > >> > > > 2013-12-09 08:14:14,778 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 3
> >> > > > >> > > > ...
> >> > > > >> > > > 2013-12-09 08:15:09,199 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 3
> >> > > > >> > > > 2013-12-09 08:15:12,243 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 2
> >> > > > >> > > > 2013-12-09 08:15:22,086 INFO
> >> > > > >> > > >
> >> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> > > > >> Total
> >> > > > >> > > > replicated: 2
> >> > > > >> > > >
> >> > > > >> > > > Thanks
> >> > > > >> > > >
> >> > > > >> > > >
> >> > > > >> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> >> > > > >> > > >
> >> > > > >> > > > > I'd look into the RS logs to see whats happening there.
> >> > > > Difficult
> >> > > > >> to
> >> > > > >> > > > guess
> >> > > > >> > > > > from the given information!
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
> >> > > > >> fgaule@despegar.com>
> >> > > > >> > > > > wrote:
> >> > > > >> > > > >
> >> > > > >> > > > > > Any clue?
> >> > > > >> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <
> >> > > > fgaule@despegar.com
> >> > > > >> >
> >> > > > >> > > > > escribió:
> >> > > > >> > > > > >
> >> > > > >> > > > > > > Hi,
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > I have 2 clusters, Master (a) - Slave (b)
> >> replication.
> >> > > > >> > > > > > > B doesn't have client write or reads, all handlers
> >> (100)
> >> > > are
> >> > > > >> > > waiting
> >> > > > >> > > > > but
> >> > > > >> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
> >> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> >> > > > >> > > > > > reports
> >> > > > >> > > > > > > to be rpc calls to be queued.
> >> > > > >> > > > > > > There are some screenshots below to show ganglia
> >> > metrics.
> >> > > > How
> >> > > > >> is
> >> > > > >> > > this
> >> > > > >> > > > > > > behaviour explained? I have looked for metrics
> >> > > > specifications
> >> > > > >> but
> >> > > > >> > > > can't
> >> > > > >> > > > > > > find much information.
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > Handlers
> >> > > > >> > > > > > > http://i42.tinypic.com/242ssoz.png
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > NumOps
> >> > > > >> > > > > > > http://tinypic.com/r/of2c8k/5
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > AvgTime
> >> > > > >> > > > > > > http://tinypic.com/r/2lsvg5w/5
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > Cheers
> >> > > > >> > > > > > >
> >> > > > >> > > > > >
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > > --
> >> > > > >> > > > > Bharath Vissapragada
> >> > > > >> > > > > <http://www.cloudera.com>
> >> > > > >> > > > >
> >> > > > >> > > >

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
There is another set of handlers we haven't customized, "PRI IPC" (priority?).
What are those handlers used for? What is the property used to increase
their number? hbase.regionserver.custom.priority.handler.count?

Thanks!


2013/12/10 Federico Gaule <fg...@despegar.com>

> I've increased hbase.regionserver.replication.handler.count 10x (30) but
> nothing have changed. rpc.metrics.RpcQueueTime_avg_time still shows
> activity :(
>
> Mon Dec 09 14:04:10 EST 2013REPL IPC Server handler 29 on 60000 WAITING
> (since 16hrs, 58mins, 56sec ago)Waiting for a call (since 16hrs, 58mins,
> 56sec ago) Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 28 on
> 60000WAITING (since 16hrs, 58mins, 56sec ago) Waiting for a call (since
> 16hrs, 58mins, 56sec ago)Mon Dec 09 14:04:10 EST 2013 REPL IPC Server
> handler 27 on 60000 WAITING (since 16hrs, 58mins, 56sec ago)Waiting for a
> call (since 16hrs, 58mins, 56sec ago) Mon Dec 09 14:04:10 EST 2013 REPL
> IPC Server handler 26 on 60000WAITING (since 16hrs, 58mins, 56sec ago)Waiting for a call (since 16hrs, 58mins, 56sec ago)
> ... ...
> ...
> ...
> Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 2 on 60000WAITING
> (since 16hrs, 58mins, 56sec ago) Waiting for a call (since 16hrs, 58mins,
> 56sec ago)Mon Dec 09 14:04:10 EST 2013 REPL IPC Server handler 1 on 60000WAITING (since 16hrs, 58mins, 56sec ago)Waiting
> for a call (since 16hrs, 58mins, 56sec ago) Mon Dec 09 14:04:10 EST 2013REPL IPC Server handler 0 on 60000WAITING
> (since 16hrs, 58mins, 56sec ago) Waiting for a call (since 16hrs, 58mins,
> 56sec ago)
> Thanks JM
>
>
> 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
>
>> Yes, default value is 3 in 0.94.14. If you have not changed it, then it's
>> still 3.
>>
>> conf.getInt("hbase.regionserver.replication.handler.count", 3);
>>
>> Keep us posted on the results.
>>
>> JM
>>
>>
>> 2013/12/9 Federico Gaule <fg...@despegar.com>
>>
>> > Default value for hbase.regionserver.replication.handler.count (can't
>> find
>> > what is the default, Is it 3?)
>> > I'll do a try increasing that property
>> >
>> > Fri Dec 06 12:44:12 EST 2013REPL IPC Server handler 2 on 60020WAITING
>> > (since 8sec ago)Waiting for a call (since 8sec ago)Fri Dec 06 12:44:12
>> EST
>> > 2013REPL IPC Server handler 1 on 60020WAITING (since 8sec ago)Waiting
>> for a
>> > call (since 8sec ago)Fri Dec 06 12:44:12 EST 2013REPL IPC Server
>> handler 0
>> > on 60020WAITING (since 2sec ago)Waiting for a call (since 2sec ago)
>> > Thanks JM
>> >
>> >
>> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
>> >
>> > > For replications, the handlers used on the salve cluster are
>> configured
>> > by
>> > > hbase.regionserver.replication.handler.count. What value do you have
>> for
>> > > this property?
>> > >
>> > > JM
>> > >
>> > >
>> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
>> > >
>> > > > Here is a thread saying what i think it should be (
>> > > > http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time)
>> > > >
>> > > > "The RpcQueueTime metrics are a measurement of how long individual
>> > calls
>> > > > stay in this queued state. If your handlers were never 100%
>> occupied,
>> > > this
>> > > > value would be 0. An average of 3 hours is concerning, it basically
>> > means
>> > > > that when a call comes into the RegionServer it takes on average 3
>> > hours
>> > > to
>> > > > start processing, because handlers are all occupied for that amount
>> of
>> > > > time."
>> > > >
>> > > > Is that correct?
>> > > >
>> > > >
>> > > >
>> > > > 2013/12/9 Federico Gaule <fg...@despegar.com>
>> > > >
>> > > > > Correct me if i'm wrong, but, Queues should be used only when
>> > handlers
>> > > > are
>> > > > > all busy, shouldn't it?.
>> > > > > If that's true, i don't get why there is activity related to
>> queues.
>> > > > >
>> > > > > Maybe i'm missing some piece of knowledge about when hbase is
>> using
>> > > > queues
>> > > > > :)
>> > > > >
>> > > > > Thanks
>> > > > >
>> > > > >
>> > > > > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
>> > > > >
>> > > > >> There might be something I'm missing ;)
>> > > > >>
>> > > > >> On cluster B, as you said, never more than 50% of your handlers
>> are
>> > > > used.
>> > > > >> Your Ganglia metrics are showing that there is activities (num
>> ops
>> > is
>> > > > >> increasing), which is correct.
>> > > > >>
>> > > > >> Can you please confirm what you think is wrong from your charts?
>> > > > >>
>> > > > >> Thanks,
>> > > > >>
>> > > > >> JM
>> > > > >>
>> > > > >>
>> > > > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
>> > > > >>
>> > > > >> > Hi JM,
>> > > > >> > Cluster B is only receiving replication data (writes), but
>> > handlers
>> > > > are
>> > > > >> > waiting most of the time (never 50% of them are used). As i
>> have
>> > > read,
>> > > > >> RPC
>> > > > >> > queue is only used when handlers are all waiting, does it count
>> > for
>> > > > >> > replication as well?
>> > > > >> >
>> > > > >> > Thanks!
>> > > > >> >
>> > > > >> >
>> > > > >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
>> > > > >> >
>> > > > >> > > Hi,
>> > > > >> > >
>> > > > >> > > When you say that B doesn't get any read/write operation,
>> does
>> > it
>> > > > mean
>> > > > >> > you
>> > > > >> > > stopped the replication? Or B is still getting the write
>> > > operations
>> > > > >> from
>> > > > >> > A
>> > > > >> > > because of the replication? If so, that's why you RPC queue
>> is
>> > > > used...
>> > > > >> > >
>> > > > >> > > JM
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
>> > > > >> > >
>> > > > >> > > > Not much information in RS logs (DEBUG level set to
>> > > > >> > > > org.apache.hadoop.hbase). Here is a sample of one
>> regionserver
>> > > > >> showing
>> > > > >> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
>> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
>> > > > >> > > > activity:
>> > > > >> > > >
>> > > > >> > > > 2013-12-09 08:09:10,699 DEBUG
>> > > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
>> > > total=23.14
>> > > > >> MB,
>> > > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
>> > > > >> > hits=122168501,
>> > > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
>> > > > cachingHits=122162378,
>> > > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
>> > > > >> > > > evictedPerRun=Infinity
>> > > > >> > > > 2013-12-09 08:09:11,396 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 1
>> > > > >> > > > 2013-12-09 08:09:14,979 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 2
>> > > > >> > > > 2013-12-09 08:09:16,016 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 1
>> > > > >> > > > ...
>> > > > >> > > > 2013-12-09 08:14:07,659 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 1
>> > > > >> > > > 2013-12-09 08:14:08,713 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 3
>> > > > >> > > > 2013-12-09 08:14:10,699 DEBUG
>> > > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
>> > > total=23.14
>> > > > >> MB,
>> > > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
>> > > > >> > hits=122168501,
>> > > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
>> > > > cachingHits=122162378,
>> > > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
>> > > > >> > > > evictedPerRun=Infinity
>> > > > >> > > > 2013-12-09 08:14:12,711 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 1
>> > > > >> > > > 2013-12-09 08:14:14,778 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 3
>> > > > >> > > > ...
>> > > > >> > > > 2013-12-09 08:15:09,199 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 3
>> > > > >> > > > 2013-12-09 08:15:12,243 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 2
>> > > > >> > > > 2013-12-09 08:15:22,086 INFO
>> > > > >> > > >
>> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> > > > >> Total
>> > > > >> > > > replicated: 2
>> > > > >> > > >
>> > > > >> > > > Thanks
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
>> > > > >> > > >
>> > > > >> > > > > I'd look into the RS logs to see whats happening there.
>> > > > Difficult
>> > > > >> to
>> > > > >> > > > guess
>> > > > >> > > > > from the given information!
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
>> > > > >> fgaule@despegar.com>
>> > > > >> > > > > wrote:
>> > > > >> > > > >
>> > > > >> > > > > > Any clue?
>> > > > >> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <
>> > > > fgaule@despegar.com
>> > > > >> >
>> > > > >> > > > > escribió:
>> > > > >> > > > > >
>> > > > >> > > > > > > Hi,
>> > > > >> > > > > > >
>> > > > >> > > > > > > I have 2 clusters, Master (a) - Slave (b)
>> replication.
>> > > > >> > > > > > > B doesn't have client write or reads, all handlers
>> (100)
>> > > are
>> > > > >> > > waiting
>> > > > >> > > > > but
>> > > > >> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
>> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
>> > > > >> > > > > > reports
>> > > > >> > > > > > > to be rpc calls to be queued.
>> > > > >> > > > > > > There are some screenshots below to show ganglia
>> > metrics.
>> > > > How
>> > > > >> is
>> > > > >> > > this
>> > > > >> > > > > > > behaviour explained? I have looked for metrics
>> > > > specifications
>> > > > >> but
>> > > > >> > > > can't
>> > > > >> > > > > > > find much information.
>> > > > >> > > > > > >
>> > > > >> > > > > > > Handlers
>> > > > >> > > > > > > http://i42.tinypic.com/242ssoz.png
>> > > > >> > > > > > >
>> > > > >> > > > > > > NumOps
>> > > > >> > > > > > > http://tinypic.com/r/of2c8k/5
>> > > > >> > > > > > >
>> > > > >> > > > > > > AvgTime
>> > > > >> > > > > > > http://tinypic.com/r/2lsvg5w/5
>> > > > >> > > > > > >
>> > > > >> > > > > > > Cheers
>> > > > >> > > > > > >
>> > > > >> > > > > >
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > > --
>> > > > >> > > > > Bharath Vissapragada
>> > > > >> > > > > <http://www.cloudera.com>
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >> >
>> > > > >> >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> >
>> >
>> >
>>
>
>
>
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
I've increased hbase.regionserver.replication.handler.count 10x (to 30), but
nothing has changed. rpc.metrics.RpcQueueTime_avg_time still shows
activity :(
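
For reference, a minimal hbase-site.xml sketch of the change described above
(an illustration only; it assumes the property is added on every slave-cluster
region server and that the region servers are restarted so it takes effect):

  <property>
    <name>hbase.regionserver.replication.handler.count</name>
    <!-- shipped default is 3; 30 is the 10x value tried here -->
    <value>30</value>
  </property>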

Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 29 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 28 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 27 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 26 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
...
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 2 on 60000   WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 1 on 60000   WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 0 on 60000   WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Thanks JM


2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>

> Yes, default value is 3 in 0.94.14. If you have not changed it, then it's
> still 3.
>
> conf.getInt("hbase.regionserver.replication.handler.count", 3);
>
> Keep us posted on the results.
>
> JM
>
>
> 2013/12/9 Federico Gaule <fg...@despegar.com>
>
> > Default value for hbase.regionserver.replication.handler.count (can't
> find
> > what is the default, Is it 3?)
> > I'll do a try increasing that property
> >
> > Fri Dec 06 12:44:12 EST 2013REPL IPC Server handler 2 on 60020WAITING
> > (since 8sec ago)Waiting for a call (since 8sec ago)Fri Dec 06 12:44:12
> EST
> > 2013REPL IPC Server handler 1 on 60020WAITING (since 8sec ago)Waiting
> for a
> > call (since 8sec ago)Fri Dec 06 12:44:12 EST 2013REPL IPC Server handler
> 0
> > on 60020WAITING (since 2sec ago)Waiting for a call (since 2sec ago)
> > Thanks JM
> >
> >
> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> >
> > > For replications, the handlers used on the salve cluster are configured
> > by
> > > hbase.regionserver.replication.handler.count. What value do you have
> for
> > > this property?
> > >
> > > JM
> > >
> > >
> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >
> > > > Here is a thread saying what i think it should be (
> > > > http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time)
> > > >
> > > > "The RpcQueueTime metrics are a measurement of how long individual
> > calls
> > > > stay in this queued state. If your handlers were never 100% occupied,
> > > this
> > > > value would be 0. An average of 3 hours is concerning, it basically
> > means
> > > > that when a call comes into the RegionServer it takes on average 3
> > hours
> > > to
> > > > start processing, because handlers are all occupied for that amount
> of
> > > > time."
> > > >
> > > > Is that correct?
> > > >
> > > >
> > > >
> > > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > >
> > > > > Correct me if i'm wrong, but, Queues should be used only when
> > handlers
> > > > are
> > > > > all busy, shouldn't it?.
> > > > > If that's true, i don't get why there is activity related to
> queues.
> > > > >
> > > > > Maybe i'm missing some piece of knowledge about when hbase is using
> > > > queues
> > > > > :)
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > > > >
> > > > >> There might be something I'm missing ;)
> > > > >>
> > > > >> On cluster B, as you said, never more than 50% of your handlers
> are
> > > > used.
> > > > >> Your Ganglia metrics are showing that there is activities (num ops
> > is
> > > > >> increasing), which is correct.
> > > > >>
> > > > >> Can you please confirm what you think is wrong from your charts?
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> JM
> > > > >>
> > > > >>
> > > > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > > >>
> > > > >> > Hi JM,
> > > > >> > Cluster B is only receiving replication data (writes), but
> > handlers
> > > > are
> > > > >> > waiting most of the time (never 50% of them are used). As i have
> > > read,
> > > > >> RPC
> > > > >> > queue is only used when handlers are all waiting, does it count
> > for
> > > > >> > replication as well?
> > > > >> >
> > > > >> > Thanks!
> > > > >> >
> > > > >> >
> > > > >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > > > >> >
> > > > >> > > Hi,
> > > > >> > >
> > > > >> > > When you say that B doesn't get any read/write operation, does
> > it
> > > > mean
> > > > >> > you
> > > > >> > > stopped the replication? Or B is still getting the write
> > > operations
> > > > >> from
> > > > >> > A
> > > > >> > > because of the replication? If so, that's why you RPC queue is
> > > > used...
> > > > >> > >
> > > > >> > > JM
> > > > >> > >
> > > > >> > >
> > > > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > > >> > >
> > > > >> > > > Not much information in RS logs (DEBUG level set to
> > > > >> > > > org.apache.hadoop.hbase). Here is a sample of one
> regionserver
> > > > >> showing
> > > > >> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > > > >> > > > activity:
> > > > >> > > >
> > > > >> > > > 2013-12-09 08:09:10,699 DEBUG
> > > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> > > total=23.14
> > > > >> MB,
> > > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> > > > >> > hits=122168501,
> > > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> > > > cachingHits=122162378,
> > > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > > > >> > > > evictedPerRun=Infinity
> > > > >> > > > 2013-12-09 08:09:11,396 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 1
> > > > >> > > > 2013-12-09 08:09:14,979 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 2
> > > > >> > > > 2013-12-09 08:09:16,016 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 1
> > > > >> > > > ...
> > > > >> > > > 2013-12-09 08:14:07,659 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 1
> > > > >> > > > 2013-12-09 08:14:08,713 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 3
> > > > >> > > > 2013-12-09 08:14:10,699 DEBUG
> > > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> > > total=23.14
> > > > >> MB,
> > > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> > > > >> > hits=122168501,
> > > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> > > > cachingHits=122162378,
> > > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > > > >> > > > evictedPerRun=Infinity
> > > > >> > > > 2013-12-09 08:14:12,711 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 1
> > > > >> > > > 2013-12-09 08:14:14,778 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 3
> > > > >> > > > ...
> > > > >> > > > 2013-12-09 08:15:09,199 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 3
> > > > >> > > > 2013-12-09 08:15:12,243 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 2
> > > > >> > > > 2013-12-09 08:15:22,086 INFO
> > > > >> > > >
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > > >> Total
> > > > >> > > > replicated: 2
> > > > >> > > >
> > > > >> > > > Thanks
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> > > > >> > > >
> > > > >> > > > > I'd look into the RS logs to see whats happening there.
> > > > Difficult
> > > > >> to
> > > > >> > > > guess
> > > > >> > > > > from the given information!
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
> > > > >> fgaule@despegar.com>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Any clue?
> > > > >> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <
> > > > fgaule@despegar.com
> > > > >> >
> > > > >> > > > > escribió:
> > > > >> > > > > >
> > > > >> > > > > > > Hi,
> > > > >> > > > > > >
> > > > >> > > > > > > I have 2 clusters, Master (a) - Slave (b) replication.
> > > > >> > > > > > > B doesn't have client write or reads, all handlers
> (100)
> > > are
> > > > >> > > waiting
> > > > >> > > > > but
> > > > >> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > > > >> > > > > > reports
> > > > >> > > > > > > to be rpc calls to be queued.
> > > > >> > > > > > > There are some screenshots below to show ganglia
> > metrics.
> > > > How
> > > > >> is
> > > > >> > > this
> > > > >> > > > > > > behaviour explained? I have looked for metrics
> > > > specifications
> > > > >> but
> > > > >> > > > can't
> > > > >> > > > > > > find much information.
> > > > >> > > > > > >
> > > > >> > > > > > > Handlers
> > > > >> > > > > > > http://i42.tinypic.com/242ssoz.png
> > > > >> > > > > > >
> > > > >> > > > > > > NumOps
> > > > >> > > > > > > http://tinypic.com/r/of2c8k/5
> > > > >> > > > > > >
> > > > >> > > > > > > AvgTime
> > > > >> > > > > > > http://tinypic.com/r/2lsvg5w/5
> > > > >> > > > > > >
> > > > >> > > > > > > Cheers
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Bharath Vissapragada
> > > > >> > > > > <http://www.cloudera.com>
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Yes, the default value is 3 in 0.94.14. If you have not changed it, then it's
still 3.

conf.getInt("hbase.regionserver.replication.handler.count", 3);

Keep us posted on the results.

JM


2013/12/9 Federico Gaule <fg...@despegar.com>

> Default value for hbase.regionserver.replication.handler.count (can't find
> what is the default, Is it 3?)
> I'll do a try increasing that property
>
> Fri Dec 06 12:44:12 EST 2013REPL IPC Server handler 2 on 60020WAITING
> (since 8sec ago)Waiting for a call (since 8sec ago)Fri Dec 06 12:44:12 EST
> 2013REPL IPC Server handler 1 on 60020WAITING (since 8sec ago)Waiting for a
> call (since 8sec ago)Fri Dec 06 12:44:12 EST 2013REPL IPC Server handler 0
> on 60020WAITING (since 2sec ago)Waiting for a call (since 2sec ago)
> Thanks JM
>
>
> 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
>
> > For replications, the handlers used on the salve cluster are configured
> by
> > hbase.regionserver.replication.handler.count. What value do you have for
> > this property?
> >
> > JM
> >
> >
> > 2013/12/9 Federico Gaule <fg...@despegar.com>
> >
> > > Here is a thread saying what i think it should be (
> > > http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time)
> > >
> > > "The RpcQueueTime metrics are a measurement of how long individual
> calls
> > > stay in this queued state. If your handlers were never 100% occupied,
> > this
> > > value would be 0. An average of 3 hours is concerning, it basically
> means
> > > that when a call comes into the RegionServer it takes on average 3
> hours
> > to
> > > start processing, because handlers are all occupied for that amount of
> > > time."
> > >
> > > Is that correct?
> > >
> > >
> > >
> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >
> > > > Correct me if i'm wrong, but, Queues should be used only when
> handlers
> > > are
> > > > all busy, shouldn't it?.
> > > > If that's true, i don't get why there is activity related to queues.
> > > >
> > > > Maybe i'm missing some piece of knowledge about when hbase is using
> > > queues
> > > > :)
> > > >
> > > > Thanks
> > > >
> > > >
> > > > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > > >
> > > >> There might be something I'm missing ;)
> > > >>
> > > >> On cluster B, as you said, never more than 50% of your handlers are
> > > used.
> > > >> Your Ganglia metrics are showing that there is activities (num ops
> is
> > > >> increasing), which is correct.
> > > >>
> > > >> Can you please confirm what you think is wrong from your charts?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> JM
> > > >>
> > > >>
> > > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > >>
> > > >> > Hi JM,
> > > >> > Cluster B is only receiving replication data (writes), but
> handlers
> > > are
> > > >> > waiting most of the time (never 50% of them are used). As i have
> > read,
> > > >> RPC
> > > >> > queue is only used when handlers are all waiting, does it count
> for
> > > >> > replication as well?
> > > >> >
> > > >> > Thanks!
> > > >> >
> > > >> >
> > > >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > When you say that B doesn't get any read/write operation, does
> it
> > > mean
> > > >> > you
> > > >> > > stopped the replication? Or B is still getting the write
> > operations
> > > >> from
> > > >> > A
> > > >> > > because of the replication? If so, that's why you RPC queue is
> > > used...
> > > >> > >
> > > >> > > JM
> > > >> > >
> > > >> > >
> > > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > > >> > >
> > > >> > > > Not much information in RS logs (DEBUG level set to
> > > >> > > > org.apache.hadoop.hbase). Here is a sample of one regionserver
> > > >> showing
> > > >> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
> > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > > >> > > > activity:
> > > >> > > >
> > > >> > > > 2013-12-09 08:09:10,699 DEBUG
> > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> > total=23.14
> > > >> MB,
> > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> > > >> > hits=122168501,
> > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> > > cachingHits=122162378,
> > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > > >> > > > evictedPerRun=Infinity
> > > >> > > > 2013-12-09 08:09:11,396 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 1
> > > >> > > > 2013-12-09 08:09:14,979 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 2
> > > >> > > > 2013-12-09 08:09:16,016 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 1
> > > >> > > > ...
> > > >> > > > 2013-12-09 08:14:07,659 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 1
> > > >> > > > 2013-12-09 08:14:08,713 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 3
> > > >> > > > 2013-12-09 08:14:10,699 DEBUG
> > > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> > total=23.14
> > > >> MB,
> > > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> > > >> > hits=122168501,
> > > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> > > cachingHits=122162378,
> > > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > > >> > > > evictedPerRun=Infinity
> > > >> > > > 2013-12-09 08:14:12,711 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 1
> > > >> > > > 2013-12-09 08:14:14,778 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 3
> > > >> > > > ...
> > > >> > > > 2013-12-09 08:15:09,199 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 3
> > > >> > > > 2013-12-09 08:15:12,243 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 2
> > > >> > > > 2013-12-09 08:15:22,086 INFO
> > > >> > > >
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > > >> Total
> > > >> > > > replicated: 2
> > > >> > > >
> > > >> > > > Thanks
> > > >> > > >
> > > >> > > >
> > > >> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> > > >> > > >
> > > >> > > > > I'd look into the RS logs to see whats happening there.
> > > Difficult
> > > >> to
> > > >> > > > guess
> > > >> > > > > from the given information!
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
> > > >> fgaule@despegar.com>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Any clue?
> > > >> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <
> > > fgaule@despegar.com
> > > >> >
> > > >> > > > > escribió:
> > > >> > > > > >
> > > >> > > > > > > Hi,
> > > >> > > > > > >
> > > >> > > > > > > I have 2 clusters, Master (a) - Slave (b) replication.
> > > >> > > > > > > B doesn't have client write or reads, all handlers (100)
> > are
> > > >> > > waiting
> > > >> > > > > but
> > > >> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
> > > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > > >> > > > > > reports
> > > >> > > > > > > to be rpc calls to be queued.
> > > >> > > > > > > There are some screenshots below to show ganglia
> metrics.
> > > How
> > > >> is
> > > >> > > this
> > > >> > > > > > > behaviour explained? I have looked for metrics
> > > specifications
> > > >> but
> > > >> > > > can't
> > > >> > > > > > > find much information.
> > > >> > > > > > >
> > > >> > > > > > > Handlers
> > > >> > > > > > > http://i42.tinypic.com/242ssoz.png
> > > >> > > > > > >
> > > >> > > > > > > NumOps
> > > >> > > > > > > http://tinypic.com/r/of2c8k/5
> > > >> > > > > > >
> > > >> > > > > > > AvgTime
> > > >> > > > > > > http://tinypic.com/r/2lsvg5w/5
> > > >> > > > > > >
> > > >> > > > > > > Cheers
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Bharath Vissapragada
> > > >> > > > > <http://www.cloudera.com>
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> >
>
>
>
>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
We're using the default value for hbase.regionserver.replication.handler.count
(I can't find what the default is; is it 3?).
I'll try increasing that property.

Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 2 on 60020  WAITING (since 8sec ago)  Waiting for a call (since 8sec ago)
Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 1 on 60020  WAITING (since 8sec ago)  Waiting for a call (since 8sec ago)
Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 0 on 60020  WAITING (since 2sec ago)  Waiting for a call (since 2sec ago)
Thanks JM


2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>

> For replications, the handlers used on the salve cluster are configured by
> hbase.regionserver.replication.handler.count. What value do you have for
> this property?
>
> JM
>
>
> 2013/12/9 Federico Gaule <fg...@despegar.com>
>
> > Here is a thread saying what i think it should be (
> > http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time)
> >
> > "The RpcQueueTime metrics are a measurement of how long individual calls
> > stay in this queued state. If your handlers were never 100% occupied,
> this
> > value would be 0. An average of 3 hours is concerning, it basically means
> > that when a call comes into the RegionServer it takes on average 3 hours
> to
> > start processing, because handlers are all occupied for that amount of
> > time."
> >
> > Is that correct?
> >
> >
> >
> > 2013/12/9 Federico Gaule <fg...@despegar.com>
> >
> > > Correct me if i'm wrong, but, Queues should be used only when handlers
> > are
> > > all busy, shouldn't it?.
> > > If that's true, i don't get why there is activity related to queues.
> > >
> > > Maybe i'm missing some piece of knowledge about when hbase is using
> > queues
> > > :)
> > >
> > > Thanks
> > >
> > >
> > > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > >
> > >> There might be something I'm missing ;)
> > >>
> > >> On cluster B, as you said, never more than 50% of your handlers are
> > used.
> > >> Your Ganglia metrics are showing that there is activities (num ops is
> > >> increasing), which is correct.
> > >>
> > >> Can you please confirm what you think is wrong from your charts?
> > >>
> > >> Thanks,
> > >>
> > >> JM
> > >>
> > >>
> > >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >>
> > >> > Hi JM,
> > >> > Cluster B is only receiving replication data (writes), but handlers
> > are
> > >> > waiting most of the time (never 50% of them are used). As i have
> read,
> > >> RPC
> > >> > queue is only used when handlers are all waiting, does it count for
> > >> > replication as well?
> > >> >
> > >> > Thanks!
> > >> >
> > >> >
> > >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > When you say that B doesn't get any read/write operation, does it
> > mean
> > >> > you
> > >> > > stopped the replication? Or B is still getting the write
> operations
> > >> from
> > >> > A
> > >> > > because of the replication? If so, that's why you RPC queue is
> > used...
> > >> > >
> > >> > > JM
> > >> > >
> > >> > >
> > >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >> > >
> > >> > > > Not much information in RS logs (DEBUG level set to
> > >> > > > org.apache.hadoop.hbase). Here is a sample of one regionserver
> > >> showing
> > >> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
> > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > >> > > > activity:
> > >> > > >
> > >> > > > 2013-12-09 08:09:10,699 DEBUG
> > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> total=23.14
> > >> MB,
> > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> > >> > hits=122168501,
> > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> > cachingHits=122162378,
> > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > >> > > > evictedPerRun=Infinity
> > >> > > > 2013-12-09 08:09:11,396 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 1
> > >> > > > 2013-12-09 08:09:14,979 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 2
> > >> > > > 2013-12-09 08:09:16,016 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 1
> > >> > > > ...
> > >> > > > 2013-12-09 08:14:07,659 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 1
> > >> > > > 2013-12-09 08:14:08,713 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 3
> > >> > > > 2013-12-09 08:14:10,699 DEBUG
> > >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats:
> total=23.14
> > >> MB,
> > >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> > >> > hits=122168501,
> > >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> > cachingHits=122162378,
> > >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > >> > > > evictedPerRun=Infinity
> > >> > > > 2013-12-09 08:14:12,711 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 1
> > >> > > > 2013-12-09 08:14:14,778 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 3
> > >> > > > ...
> > >> > > > 2013-12-09 08:15:09,199 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 3
> > >> > > > 2013-12-09 08:15:12,243 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 2
> > >> > > > 2013-12-09 08:15:22,086 INFO
> > >> > > >
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> > >> Total
> > >> > > > replicated: 2
> > >> > > >
> > >> > > > Thanks
> > >> > > >
> > >> > > >
> > >> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> > >> > > >
> > >> > > > > I'd look into the RS logs to see whats happening there.
> > Difficult
> > >> to
> > >> > > > guess
> > >> > > > > from the given information!
> > >> > > > >
> > >> > > > >
> > >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
> > >> fgaule@despegar.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Any clue?
> > >> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <
> > fgaule@despegar.com
> > >> >
> > >> > > > > escribió:
> > >> > > > > >
> > >> > > > > > > Hi,
> > >> > > > > > >
> > >> > > > > > > I have 2 clusters, Master (a) - Slave (b) replication.
> > >> > > > > > > B doesn't have client write or reads, all handlers (100)
> are
> > >> > > waiting
> > >> > > > > but
> > >> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
> > >> > > > rpc.metrics.RpcQueueTime_avg_time
> > >> > > > > > reports
> > >> > > > > > > to be rpc calls to be queued.
> > >> > > > > > > There are some screenshots below to show ganglia metrics.
> > How
> > >> is
> > >> > > this
> > >> > > > > > > behaviour explained? I have looked for metrics
> > specifications
> > >> but
> > >> > > > can't
> > >> > > > > > > find much information.
> > >> > > > > > >
> > >> > > > > > > Handlers
> > >> > > > > > > http://i42.tinypic.com/242ssoz.png
> > >> > > > > > >
> > >> > > > > > > NumOps
> > >> > > > > > > http://tinypic.com/r/of2c8k/5
> > >> > > > > > >
> > >> > > > > > > AvgTime
> > >> > > > > > > http://tinypic.com/r/2lsvg5w/5
> > >> > > > > > >
> > >> > > > > > > Cheers
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > Bharath Vissapragada
> > >> > > > > <http://www.cloudera.com>
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
For replication, the handlers used on the slave cluster are configured by
hbase.regionserver.replication.handler.count. What value do you have for
this property?

JM


2013/12/9 Federico Gaule <fg...@despegar.com>

> Here is a thread saying what i think it should be (
> http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time)
>
> "The RpcQueueTime metrics are a measurement of how long individual calls
> stay in this queued state. If your handlers were never 100% occupied, this
> value would be 0. An average of 3 hours is concerning, it basically means
> that when a call comes into the RegionServer it takes on average 3 hours to
> start processing, because handlers are all occupied for that amount of
> time."
>
> Is that correct?
>
>
>
> 2013/12/9 Federico Gaule <fg...@despegar.com>
>
> > Correct me if i'm wrong, but, Queues should be used only when handlers
> are
> > all busy, shouldn't it?.
> > If that's true, i don't get why there is activity related to queues.
> >
> > Maybe i'm missing some piece of knowledge about when hbase is using
> queues
> > :)
> >
> > Thanks
> >
> >
> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> >
> >> There might be something I'm missing ;)
> >>
> >> On cluster B, as you said, never more than 50% of your handlers are
> used.
> >> Your Ganglia metrics are showing that there is activities (num ops is
> >> increasing), which is correct.
> >>
> >> Can you please confirm what you think is wrong from your charts?
> >>
> >> Thanks,
> >>
> >> JM
> >>
> >>
> >> 2013/12/9 Federico Gaule <fg...@despegar.com>
> >>
> >> > Hi JM,
> >> > Cluster B is only receiving replication data (writes), but handlers
> are
> >> > waiting most of the time (never 50% of them are used). As i have read,
> >> RPC
> >> > queue is only used when handlers are all waiting, does it count for
> >> > replication as well?
> >> >
> >> > Thanks!
> >> >
> >> >
> >> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> >> >
> >> > > Hi,
> >> > >
> >> > > When you say that B doesn't get any read/write operation, does it
> mean
> >> > you
> >> > > stopped the replication? Or B is still getting the write operations
> >> from
> >> > A
> >> > > because of the replication? If so, that's why you RPC queue is
> used...
> >> > >
> >> > > JM
> >> > >
> >> > >
> >> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> >> > >
> >> > > > Not much information in RS logs (DEBUG level set to
> >> > > > org.apache.hadoop.hbase). Here is a sample of one regionserver
> >> showing
> >> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
> >> > > > rpc.metrics.RpcQueueTime_avg_time
> >> > > > activity:
> >> > > >
> >> > > > 2013-12-09 08:09:10,699 DEBUG
> >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14
> >> MB,
> >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> >> > hits=122168501,
> >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> cachingHits=122162378,
> >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> >> > > > evictedPerRun=Infinity
> >> > > > 2013-12-09 08:09:11,396 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 1
> >> > > > 2013-12-09 08:09:14,979 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 2
> >> > > > 2013-12-09 08:09:16,016 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 1
> >> > > > ...
> >> > > > 2013-12-09 08:14:07,659 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 1
> >> > > > 2013-12-09 08:14:08,713 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 3
> >> > > > 2013-12-09 08:14:10,699 DEBUG
> >> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14
> >> MB,
> >> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> >> > hits=122168501,
> >> > > > hitRatio=99.77%, , cachingAccesses=122192927,
> cachingHits=122162378,
> >> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> >> > > > evictedPerRun=Infinity
> >> > > > 2013-12-09 08:14:12,711 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 1
> >> > > > 2013-12-09 08:14:14,778 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 3
> >> > > > ...
> >> > > > 2013-12-09 08:15:09,199 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 3
> >> > > > 2013-12-09 08:15:12,243 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 2
> >> > > > 2013-12-09 08:15:22,086 INFO
> >> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> >> Total
> >> > > > replicated: 2
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > >
> >> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> >> > > >
> >> > > > > I'd look into the RS logs to see whats happening there.
> Difficult
> >> to
> >> > > > guess
> >> > > > > from the given information!
> >> > > > >
> >> > > > >
> >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
> >> fgaule@despegar.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Any clue?
> >> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <
> fgaule@despegar.com
> >> >
> >> > > > > escribió:
> >> > > > > >
> >> > > > > > > Hi,
> >> > > > > > >
> >> > > > > > > I have 2 clusters, Master (a) - Slave (b) replication.
> >> > > > > > > B doesn't have client write or reads, all handlers (100) are
> >> > > waiting
> >> > > > > but
> >> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
> >> > > > rpc.metrics.RpcQueueTime_avg_time
> >> > > > > > reports
> >> > > > > > > to be rpc calls to be queued.
> >> > > > > > > There are some screenshots below to show ganglia metrics.
> How
> >> is
> >> > > this
> >> > > > > > > behaviour explained? I have looked for metrics
> specifications
> >> but
> >> > > > can't
> >> > > > > > > find much information.
> >> > > > > > >
> >> > > > > > > Handlers
> >> > > > > > > http://i42.tinypic.com/242ssoz.png
> >> > > > > > >
> >> > > > > > > NumOps
> >> > > > > > > http://tinypic.com/r/of2c8k/5
> >> > > > > > >
> >> > > > > > > AvgTime
> >> > > > > > > http://tinypic.com/r/2lsvg5w/5
> >> > > > > > >
> >> > > > > > > Cheers
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > Bharath Vissapragada
> >> > > > > <http://www.cloudera.com>
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
> >
>
>
>
>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
Here is a thread saying what I think it should be (
http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time)

"The RpcQueueTime metrics are a measurement of how long individual calls
stay in this queued state. If your handlers were never 100% occupied, this
value would be 0. An average of 3 hours is concerning, it basically means
that when a call comes into the RegionServer it takes on average 3 hours to
start processing, because handlers are all occupied for that amount of
time."

Is that correct?
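
For what it's worth, the usual way such a metric is produced can be sketched
in a few lines of plain Java (illustrative only -- these are made-up class
names, not the actual org.apache.hadoop.ipc.HBaseServer code): every call is
timestamped when the reader thread enqueues it, and the handler that dequeues
it records the elapsed time. Read that way, num_ops grows by one per call and
avg_time stays near zero as long as handlers keep up, even if they sit idle
most of the time.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Minimal sketch of how an RPC queue-time metric is typically derived.
    class QueueTimeSketch {

        static class Call {
            final long enqueuedAt = System.currentTimeMillis(); // stamped on arrival
        }

        private final BlockingQueue<Call> callQueue = new LinkedBlockingQueue<>();
        private long numOps = 0;            // analogous to RpcQueueTime_num_ops
        private long totalQueueTimeMs = 0;  // feeds an average like RpcQueueTime_avg_time

        // Reader side: every incoming call goes onto the queue, idle handlers or not.
        void enqueue(Call call) throws InterruptedException {
            callQueue.put(call);
        }

        // Handler side: queue time is simply dequeue time minus enqueue time.
        void handleOne() throws InterruptedException {
            Call call = callQueue.take();
            long queueTimeMs = System.currentTimeMillis() - call.enqueuedAt;
            numOps++;
            totalQueueTimeMs += queueTimeMs;
            // ... process the call ...
        }
    }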



2013/12/9 Federico Gaule <fg...@despegar.com>

> Correct me if i'm wrong, but, Queues should be used only when handlers are
> all busy, shouldn't it?.
> If that's true, i don't get why there is activity related to queues.
>
> Maybe i'm missing some piece of knowledge about when hbase is using queues
> :)
>
> Thanks
>
>
> 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
>
>> There might be something I'm missing ;)
>>
>> On cluster B, as you said, never more than 50% of your handlers are used.
>> Your Ganglia metrics are showing that there is activities (num ops is
>> increasing), which is correct.
>>
>> Can you please confirm what you think is wrong from your charts?
>>
>> Thanks,
>>
>> JM
>>
>>
>> 2013/12/9 Federico Gaule <fg...@despegar.com>
>>
>> > Hi JM,
>> > Cluster B is only receiving replication data (writes), but handlers are
>> > waiting most of the time (never 50% of them are used). As i have read,
>> RPC
>> > queue is only used when handlers are all waiting, does it count for
>> > replication as well?
>> >
>> > Thanks!
>> >
>> >
>> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
>> >
>> > > Hi,
>> > >
>> > > When you say that B doesn't get any read/write operation, does it mean
>> > you
>> > > stopped the replication? Or B is still getting the write operations
>> from
>> > A
>> > > because of the replication? If so, that's why you RPC queue is used...
>> > >
>> > > JM
>> > >
>> > >
>> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
>> > >
>> > > > Not much information in RS logs (DEBUG level set to
>> > > > org.apache.hadoop.hbase). Here is a sample of one regionserver
>> showing
>> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
>> > > > rpc.metrics.RpcQueueTime_avg_time
>> > > > activity:
>> > > >
>> > > > 2013-12-09 08:09:10,699 DEBUG
>> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14
>> MB,
>> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
>> > hits=122168501,
>> > > > hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
>> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
>> > > > evictedPerRun=Infinity
>> > > > 2013-12-09 08:09:11,396 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 1
>> > > > 2013-12-09 08:09:14,979 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 2
>> > > > 2013-12-09 08:09:16,016 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 1
>> > > > ...
>> > > > 2013-12-09 08:14:07,659 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 1
>> > > > 2013-12-09 08:14:08,713 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 3
>> > > > 2013-12-09 08:14:10,699 DEBUG
>> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14
>> MB,
>> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
>> > hits=122168501,
>> > > > hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
>> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
>> > > > evictedPerRun=Infinity
>> > > > 2013-12-09 08:14:12,711 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 1
>> > > > 2013-12-09 08:14:14,778 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 3
>> > > > ...
>> > > > 2013-12-09 08:15:09,199 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 3
>> > > > 2013-12-09 08:15:12,243 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 2
>> > > > 2013-12-09 08:15:22,086 INFO
>> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
>> Total
>> > > > replicated: 2
>> > > >
>> > > > Thanks
>> > > >
>> > > >
>> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
>> > > >
>> > > > > I'd look into the RS logs to see whats happening there. Difficult
>> to
>> > > > guess
>> > > > > from the given information!
>> > > > >
>> > > > >
>> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
>> fgaule@despegar.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Any clue?
>> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <fgaule@despegar.com
>> >
>> > > > > escribió:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > I have 2 clusters, Master (a) - Slave (b) replication.
>> > > > > > > B doesn't have client write or reads, all handlers (100) are
>> > > waiting
>> > > > > but
>> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
>> > > > rpc.metrics.RpcQueueTime_avg_time
>> > > > > > reports
>> > > > > > > to be rpc calls to be queued.
>> > > > > > > There are some screenshots below to show ganglia metrics. How
>> is
>> > > this
>> > > > > > > behaviour explained? I have looked for metrics specifications
>> but
>> > > > can't
>> > > > > > > find much information.
>> > > > > > >
>> > > > > > > Handlers
>> > > > > > > http://i42.tinypic.com/242ssoz.png
>> > > > > > >
>> > > > > > > NumOps
>> > > > > > > http://tinypic.com/r/of2c8k/5
>> > > > > > >
>> > > > > > > AvgTime
>> > > > > > > http://tinypic.com/r/2lsvg5w/5
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Bharath Vissapragada
>> > > > > <http://www.cloudera.com>
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> >
>> >
>> >
>>
>
>
>
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
Correct me if I'm wrong, but queues should only be used when all handlers
are busy, shouldn't they?
If that's true, I don't get why there is activity related to the queues.

Maybe I'm missing some piece of knowledge about when HBase uses its
queues :)

Thanks
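
One way to convince yourself that "all handlers WAITING" and a growing queue
metric are not contradictory is the small self-contained sketch below (plain
JDK code, not HBase's own classes): a handler thread parked on an empty call
queue is reported in the WAITING state, yet any call that does arrive still
transits that queue for an instant before the handler picks it up.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class IdleHandlerSketch {
        public static void main(String[] args) throws Exception {
            BlockingQueue<Runnable> callQueue = new LinkedBlockingQueue<>();

            // A "handler" thread that blocks on the queue, like an IPC handler does.
            Thread handler = new Thread(() -> {
                try {
                    callQueue.take().run(); // parks here while there is nothing to do
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "IPC Server handler 0");
            handler.start();

            Thread.sleep(200);
            // Prints WAITING: the thread is parked inside take(), which is what the
            // region server UI shows as "Waiting for a call (since ...)".
            System.out.println(handler.getName() + ": " + handler.getState());

            // A call still passes through the queue even though the handler was idle.
            callQueue.put(() -> System.out.println("call picked up"));
            handler.join();
        }
    }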


2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>

> There might be something I'm missing ;)
>
> On cluster B, as you said, never more than 50% of your handlers are used.
> Your Ganglia metrics are showing that there is activities (num ops is
> increasing), which is correct.
>
> Can you please confirm what you think is wrong from your charts?
>
> Thanks,
>
> JM
>
>
> 2013/12/9 Federico Gaule <fg...@despegar.com>
>
> > Hi JM,
> > Cluster B is only receiving replication data (writes), but handlers are
> > waiting most of the time (never 50% of them are used). As i have read,
> RPC
> > queue is only used when handlers are all waiting, does it count for
> > replication as well?
> >
> > Thanks!
> >
> >
> > 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
> >
> > > Hi,
> > >
> > > When you say that B doesn't get any read/write operation, does it mean
> > you
> > > stopped the replication? Or B is still getting the write operations
> from
> > A
> > > because of the replication? If so, that's why you RPC queue is used...
> > >
> > > JM
> > >
> > >
> > > 2013/12/9 Federico Gaule <fg...@despegar.com>
> > >
> > > > Not much information in RS logs (DEBUG level set to
> > > > org.apache.hadoop.hbase). Here is a sample of one regionserver
> showing
> > > > increasing rpc.metrics.RpcQueueTime_num_ops and
> > > > rpc.metrics.RpcQueueTime_avg_time
> > > > activity:
> > > >
> > > > 2013-12-09 08:09:10,699 DEBUG
> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14
> MB,
> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> > hits=122168501,
> > > > hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > > > evictedPerRun=Infinity
> > > > 2013-12-09 08:09:11,396 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 1
> > > > 2013-12-09 08:09:14,979 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 2
> > > > 2013-12-09 08:09:16,016 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 1
> > > > ...
> > > > 2013-12-09 08:14:07,659 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 1
> > > > 2013-12-09 08:14:08,713 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 3
> > > > 2013-12-09 08:14:10,699 DEBUG
> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14
> MB,
> > > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> > hits=122168501,
> > > > hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
> > > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > > > evictedPerRun=Infinity
> > > > 2013-12-09 08:14:12,711 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 1
> > > > 2013-12-09 08:14:14,778 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 3
> > > > ...
> > > > 2013-12-09 08:15:09,199 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 3
> > > > 2013-12-09 08:15:12,243 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 2
> > > > 2013-12-09 08:15:22,086 INFO
> > > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink:
> Total
> > > > replicated: 2
> > > >
> > > > Thanks
> > > >
> > > >
> > > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> > > >
> > > > > I'd look into the RS logs to see whats happening there. Difficult
> to
> > > > guess
> > > > > from the given information!
> > > > >
> > > > >
> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <
> fgaule@despegar.com>
> > > > > wrote:
> > > > >
> > > > > > Any clue?
> > > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <fg...@despegar.com>
> > > > > escribió:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I have 2 clusters, Master (a) - Slave (b) replication.
> > > > > > > B doesn't have client write or reads, all handlers (100) are
> > > waiting
> > > > > but
> > > > > > > rpc.metrics.RpcQueueTime_num_ops and
> > > > rpc.metrics.RpcQueueTime_avg_time
> > > > > > reports
> > > > > > > to be rpc calls to be queued.
> > > > > > > There are some screenshots below to show ganglia metrics. How
> is
> > > this
> > > > > > > behaviour explained? I have looked for metrics specifications
> but
> > > > can't
> > > > > > > find much information.
> > > > > > >
> > > > > > > Handlers
> > > > > > > http://i42.tinypic.com/242ssoz.png
> > > > > > >
> > > > > > > NumOps
> > > > > > > http://tinypic.com/r/of2c8k/5
> > > > > > >
> > > > > > > AvgTime
> > > > > > > http://tinypic.com/r/2lsvg5w/5
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Bharath Vissapragada
> > > > > <http://www.cloudera.com>
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
There might be something I'm missing ;)

On cluster B, as you said, never more than 50% of your handlers are used.
Your Ganglia metrics are showing that there is activity (num ops is
increasing), which is correct.

Can you please confirm what you think is wrong in your charts?

Thanks,

JM


2013/12/9 Federico Gaule <fg...@despegar.com>

> Hi JM,
> Cluster B is only receiving replication data (writes), but handlers are
> waiting most of the time (never 50% of them are used). As i have read, RPC
> queue is only used when handlers are all waiting, does it count for
> replication as well?
>
> Thanks!
>
>
> 2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>
>
> > Hi,
> >
> > When you say that B doesn't get any read/write operation, does it mean
> you
> > stopped the replication? Or B is still getting the write operations from
> A
> > because of the replication? If so, that's why you RPC queue is used...
> >
> > JM
> >
> >
> > 2013/12/9 Federico Gaule <fg...@despegar.com>
> >
> > > Not much information in RS logs (DEBUG level set to
> > > org.apache.hadoop.hbase). Here is a sample of one regionserver showing
> > > increasing rpc.metrics.RpcQueueTime_num_ops and
> > > rpc.metrics.RpcQueueTime_avg_time
> > > activity:
> > >
> > > 2013-12-09 08:09:10,699 DEBUG
> > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB,
> > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> hits=122168501,
> > > hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
> > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > > evictedPerRun=Infinity
> > > 2013-12-09 08:09:11,396 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 1
> > > 2013-12-09 08:09:14,979 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 2
> > > 2013-12-09 08:09:16,016 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 1
> > > ...
> > > 2013-12-09 08:14:07,659 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 1
> > > 2013-12-09 08:14:08,713 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 3
> > > 2013-12-09 08:14:10,699 DEBUG
> > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB,
> > > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151,
> hits=122168501,
> > > hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
> > > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > > evictedPerRun=Infinity
> > > 2013-12-09 08:14:12,711 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 1
> > > 2013-12-09 08:14:14,778 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 3
> > > ...
> > > 2013-12-09 08:15:09,199 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 3
> > > 2013-12-09 08:15:12,243 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 2
> > > 2013-12-09 08:15:22,086 INFO
> > > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > > replicated: 2
> > >
> > > Thanks
> > >
> > >
> > > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> > >
> > > > I'd look into the RS logs to see whats happening there. Difficult to
> > > guess
> > > > from the given information!
> > > >
> > > >
> > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <fg...@despegar.com>
> > > > wrote:
> > > >
> > > > > Any clue?
> > > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <fg...@despegar.com>
> > > > escribió:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have 2 clusters, Master (a) - Slave (b) replication.
> > > > > > B doesn't have client write or reads, all handlers (100) are
> > waiting
> > > > but
> > > > > > rpc.metrics.RpcQueueTime_num_ops and
> > > rpc.metrics.RpcQueueTime_avg_time
> > > > > reports
> > > > > > to be rpc calls to be queued.
> > > > > > There are some screenshots below to show ganglia metrics. How is
> > this
> > > > > > behaviour explained? I have looked for metrics specifications but
> > > can't
> > > > > > find much information.
> > > > > >
> > > > > > Handlers
> > > > > > http://i42.tinypic.com/242ssoz.png
> > > > > >
> > > > > > NumOps
> > > > > > http://tinypic.com/r/of2c8k/5
> > > > > >
> > > > > > AvgTime
> > > > > > http://tinypic.com/r/2lsvg5w/5
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Bharath Vissapragada
> > > > <http://www.cloudera.com>
> > > >
> > >
> > >
> > >
> > >
> >
>
>
>
>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
Hi JM,
Cluster B is only receiving replication data (writes), but the handlers are
waiting most of the time (never more than 50% of them are used). As I have
read, the RPC queue is only used when all handlers are busy; does it count
for replication as well?

Thanks!


2013/12/9 Jean-Marc Spaggiari <je...@spaggiari.org>

> Hi,
>
> When you say that B doesn't get any read/write operation, does it mean you
> stopped the replication? Or B is still getting the write operations from A
> because of the replication? If so, that's why you RPC queue is used...
>
> JM
>
>
> 2013/12/9 Federico Gaule <fg...@despegar.com>
>
> > Not much information in RS logs (DEBUG level set to
> > org.apache.hadoop.hbase). Here is a sample of one regionserver showing
> > increasing rpc.metrics.RpcQueueTime_num_ops and
> > rpc.metrics.RpcQueueTime_avg_time
> > activity:
> >
> > 2013-12-09 08:09:10,699 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB,
> > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501,
> > hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
> > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > evictedPerRun=Infinity
> > 2013-12-09 08:09:11,396 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 1
> > 2013-12-09 08:09:14,979 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 2
> > 2013-12-09 08:09:16,016 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 1
> > ...
> > 2013-12-09 08:14:07,659 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 1
> > 2013-12-09 08:14:08,713 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 3
> > 2013-12-09 08:14:10,699 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB,
> > free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501,
> > hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
> > cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> > evictedPerRun=Infinity
> > 2013-12-09 08:14:12,711 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 1
> > 2013-12-09 08:14:14,778 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 3
> > ...
> > 2013-12-09 08:15:09,199 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 3
> > 2013-12-09 08:15:12,243 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 2
> > 2013-12-09 08:15:22,086 INFO
> > org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> > replicated: 2
> >
> > Thanks
> >
> >
> > 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
> >
> > > I'd look into the RS logs to see whats happening there. Difficult to
> > guess
> > > from the given information!
> > >
> > >
> > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <fg...@despegar.com>
> > > wrote:
> > >
> > > > Any clue?
> > > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <fg...@despegar.com>
> > > escribió:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have 2 clusters, Master (a) - Slave (b) replication.
> > > > > B doesn't have client write or reads, all handlers (100) are
> waiting
> > > but
> > > > > rpc.metrics.RpcQueueTime_num_ops and
> > rpc.metrics.RpcQueueTime_avg_time
> > > > reports
> > > > > to be rpc calls to be queued.
> > > > > There are some screenshots below to show ganglia metrics. How is
> this
> > > > > behaviour explained? I have looked for metrics specifications but
> > can't
> > > > > find much information.
> > > > >
> > > > > Handlers
> > > > > http://i42.tinypic.com/242ssoz.png
> > > > >
> > > > > NumOps
> > > > > http://tinypic.com/r/of2c8k/5
> > > > >
> > > > > AvgTime
> > > > > http://tinypic.com/r/2lsvg5w/5
> > > > >
> > > > > Cheers
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Bharath Vissapragada
> > > <http://www.cloudera.com>
> > >
> >
> >
> >
> >
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi,

When you say that B doesn't get any read/write operation, does it mean you
stopped the replication? Or is B still getting the write operations from A
because of the replication? If so, that's why your RPC queue is used...

JM


2013/12/9 Federico Gaule <fg...@despegar.com>

> Not much information in RS logs (DEBUG level set to
> org.apache.hadoop.hbase). Here is a sample of one regionserver showing
> increasing rpc.metrics.RpcQueueTime_num_ops and
> rpc.metrics.RpcQueueTime_avg_time
> activity:
>
> 2013-12-09 08:09:10,699 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB,
> free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501,
> hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
> cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> evictedPerRun=Infinity
> 2013-12-09 08:09:11,396 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 1
> 2013-12-09 08:09:14,979 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 2
> 2013-12-09 08:09:16,016 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 1
> ...
> 2013-12-09 08:14:07,659 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 1
> 2013-12-09 08:14:08,713 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 3
> 2013-12-09 08:14:10,699 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB,
> free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501,
> hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
> cachingHitsRatio=99.97%, , evictions=0, evicted=6768,
> evictedPerRun=Infinity
> 2013-12-09 08:14:12,711 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 1
> 2013-12-09 08:14:14,778 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 3
> ...
> 2013-12-09 08:15:09,199 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 3
> 2013-12-09 08:15:12,243 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 2
> 2013-12-09 08:15:22,086 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
> replicated: 2
>
> Thanks
>
>
> 2013/12/7 Bharath Vissapragada <bh...@cloudera.com>
>
> > I'd look into the RS logs to see whats happening there. Difficult to
> guess
> > from the given information!
> >
> >
> > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <fg...@despegar.com>
> > wrote:
> >
> > > Any clue?
> > > El dic 5, 2013 9:49 a.m., "Federico Gaule" <fg...@despegar.com>
> > escribió:
> > >
> > > > Hi,
> > > >
> > > > I have 2 clusters, Master (a) - Slave (b) replication.
> > > > B doesn't have client write or reads, all handlers (100) are waiting
> > but
> > > > rpc.metrics.RpcQueueTime_num_ops and
> rpc.metrics.RpcQueueTime_avg_time
> > > reports
> > > > to be rpc calls to be queued.
> > > > There are some screenshots below to show ganglia metrics. How is this
> > > > behaviour explained? I have looked for metrics specifications but
> can't
> > > > find much information.
> > > >
> > > > Handlers
> > > > http://i42.tinypic.com/242ssoz.png
> > > >
> > > > NumOps
> > > > http://tinypic.com/r/of2c8k/5
> > > >
> > > > AvgTime
> > > > http://tinypic.com/r/2lsvg5w/5
> > > >
> > > > Cheers
> > > >
> > >
> >
> >
> >
> > --
> > Bharath Vissapragada
> > <http://www.cloudera.com>
> >
>
>
>
>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
Not much information in the RS logs (DEBUG level set on
org.apache.hadoop.hbase). Here is a sample from one regionserver while
rpc.metrics.RpcQueueTime_num_ops and rpc.metrics.RpcQueueTime_avg_time
keep increasing:

2013-12-09 08:09:10,699 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB,
free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501,
hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
cachingHitsRatio=99.97%, , evictions=0, evicted=6768, evictedPerRun=Infinity
2013-12-09 08:09:11,396 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 1
2013-12-09 08:09:14,979 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 2
2013-12-09 08:09:16,016 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 1
...
2013-12-09 08:14:07,659 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 1
2013-12-09 08:14:08,713 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 3
2013-12-09 08:14:10,699 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB,
free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501,
hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378,
cachingHitsRatio=99.97%, , evictions=0, evicted=6768, evictedPerRun=Infinity
2013-12-09 08:14:12,711 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 1
2013-12-09 08:14:14,778 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 3
...
2013-12-09 08:15:09,199 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 3
2013-12-09 08:15:12,243 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 2
2013-12-09 08:15:22,086 INFO
org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total
replicated: 2

Thanks
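
(In case it helps anyone reproducing this: the DEBUG level mentioned above
was set with a single log4j.properties line along the lines of the one below;
the exact file location depends on the install, so treat it as a sketch.)

    log4j.logger.org.apache.hadoop.hbase=DEBUG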


2013/12/7 Bharath Vissapragada <bh...@cloudera.com>

> I'd look into the RS logs to see whats happening there. Difficult to guess
> from the given information!
>
>
> On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <fg...@despegar.com>
> wrote:
>
> > Any clue?
> > El dic 5, 2013 9:49 a.m., "Federico Gaule" <fg...@despegar.com>
> escribió:
> >
> > > Hi,
> > >
> > > I have 2 clusters, Master (a) - Slave (b) replication.
> > > B doesn't have client write or reads, all handlers (100) are waiting
> but
> > > rpc.metrics.RpcQueueTime_num_ops and rpc.metrics.RpcQueueTime_avg_time
> > reports
> > > to be rpc calls to be queued.
> > > There are some screenshots below to show ganglia metrics. How is this
> > > behaviour explained? I have looked for metrics specifications but can't
> > > find much information.
> > >
> > > Handlers
> > > http://i42.tinypic.com/242ssoz.png
> > >
> > > NumOps
> > > http://tinypic.com/r/of2c8k/5
> > >
> > > AvgTime
> > > http://tinypic.com/r/2lsvg5w/5
> > >
> > > Cheers
> > >
> >
>
>
>
> --
> Bharath Vissapragada
> <http://www.cloudera.com>
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Bharath Vissapragada <bh...@cloudera.com>.
I'd look into the RS logs to see what's happening there. Difficult to guess
from the given information!


On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <fg...@despegar.com> wrote:

> Any clue?
> El dic 5, 2013 9:49 a.m., "Federico Gaule" <fg...@despegar.com> escribió:
>
> > Hi,
> >
> > I have 2 clusters, Master (a) - Slave (b) replication.
> > B doesn't have client write or reads, all handlers (100) are waiting but
> > rpc.metrics.RpcQueueTime_num_ops and rpc.metrics.RpcQueueTime_avg_time
> reports
> > to be rpc calls to be queued.
> > There are some screenshots below to show ganglia metrics. How is this
> > behaviour explained? I have looked for metrics specifications but can't
> > find much information.
> >
> > Handlers
> > http://i42.tinypic.com/242ssoz.png
> >
> > NumOps
> > http://tinypic.com/r/of2c8k/5
> >
> > AvgTime
> > http://tinypic.com/r/2lsvg5w/5
> >
> > Cheers
> >
>



-- 
Bharath Vissapragada
<http://www.cloudera.com>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
Any clue?
El dic 5, 2013 9:49 a.m., "Federico Gaule" <fg...@despegar.com> escribió:

> Hi,
>
> I have 2 clusters, Master (a) - Slave (b) replication.
> B doesn't have client write or reads, all handlers (100) are waiting but
> rpc.metrics.RpcQueueTime_num_ops and rpc.metrics.RpcQueueTime_avg_time reports
> to be rpc calls to be queued.
> There are some screenshots below to show ganglia metrics. How is this
> behaviour explained? I have looked for metrics specifications but can't
> find much information.
>
> Handlers
> http://i42.tinypic.com/242ssoz.png
>
> NumOps
> http://tinypic.com/r/of2c8k/5
>
> AvgTime
> http://tinypic.com/r/2lsvg5w/5
>
> Cheers
>

Re: RPC - Queue Time when handlers are all waiting

Posted by Federico Gaule <fg...@despegar.com>.
Nope. Since I increased the replication handlers from their default, it's
reporting with a ceiling of 1; a few days ago, before that change, it
reported 15.
Why are RPCs queued in spite of all handlers being WAITING?
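
(For readers following along: the replication handler pool is sized by a
property separate from the main hbase.regionserver.handler.count; if I recall
the name correctly it is hbase.regionserver.replication.handler.count in
hbase-site.xml, so the bump described above would look roughly like the
snippet below. Treat the property name and value as assumptions to verify,
not gospel.)

    <!-- hbase-site.xml sketch; property name recalled from memory, verify before use -->
    <property>
      <name>hbase.regionserver.replication.handler.count</name>
      <value>25</value>
    </property>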


2013/12/10 Andrew Purtell <ap...@apache.org>

> Aside from other issues being explored on this thread, "REPL IPC Server
> handler N on PORT WAITING Waiting for a call (since 22 hrs, 57mins, 38sec
> ago)" looks to me like ReplicationSource setting up IPC with INT_MAX for a
> timeout.
>
>
> On Thu, Dec 5, 2013 at 8:49 PM, Federico Gaule <fg...@despegar.com>
> wrote:
>
> > Hi,
> >
> > I have 2 clusters, Master (a) - Slave (b) replication.
> > B doesn't have client write or reads, all handlers (100) are waiting but
> > rpc.metrics.RpcQueueTime_num_ops and rpc.metrics.RpcQueueTime_avg_time
> > reports
> > to be rpc calls to be queued.
> > There are some screenshots below to show ganglia metrics. How is this
> > behaviour explained? I have looked for metrics specifications but can't
> > find much information.
> >
> > Handlers
> > http://i42.tinypic.com/242ssoz.png
> >
> > NumOps
> > http://tinypic.com/r/of2c8k/5
> >
> > AvgTime
> > http://tinypic.com/r/2lsvg5w/5
> >
> > Cheers
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>




Re: RPC - Queue Time when handlers are all waiting

Posted by Andrew Purtell <ap...@apache.org>.
Aside from other issues being explored on this thread, "REPL IPC Server
handler N on PORT WAITING Waiting for a call (since 22 hrs, 57mins, 38sec
ago)" looks to me like ReplicationSource setting up IPC with INT_MAX for a
timeout.


On Thu, Dec 5, 2013 at 8:49 PM, Federico Gaule <fg...@despegar.com> wrote:

> Hi,
>
> I have 2 clusters, Master (a) - Slave (b) replication.
> B doesn't have client write or reads, all handlers (100) are waiting but
> rpc.metrics.RpcQueueTime_num_ops and rpc.metrics.RpcQueueTime_avg_time
> reports
> to be rpc calls to be queued.
> There are some screenshots below to show ganglia metrics. How is this
> behaviour explained? I have looked for metrics specifications but can't
> find much information.
>
> Handlers
> http://i42.tinypic.com/242ssoz.png
>
> NumOps
> http://tinypic.com/r/of2c8k/5
>
> AvgTime
> http://tinypic.com/r/2lsvg5w/5
>
> Cheers
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)