Posted to common-user@hadoop.apache.org by Dejan Menges <de...@gmail.com> on 2015/06/20 10:54:27 UTC

HDFS Short-Circuit Local Reads

Hi,

We have been using HDP 2.1 for quite some time now (still, until Monday), and SC
local reads have been enabled the whole time. In the beginning we followed the
Hortonworks recommendations and set the SC cache size to 256, with the default
5 minutes to invalidate entries, and that's where the problems started.
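
In case it helps, here is a small Java sketch of the client-side settings I am
referring to (property names as we have them in hdfs-site.xml; the domain
socket path below is only an example, ours is whatever HDP set up):

    import org.apache.hadoop.conf.Configuration;

    public class ShortCircuitSettings {
        // Sketch of the initial short-circuit read configuration:
        // 256 cached descriptors, 5 minute (300000 ms) expiry.
        public static Configuration initial() {
            Configuration conf = new Configuration();
            conf.setBoolean("dfs.client.read.shortcircuit", true);
            // Example socket path only.
            conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
            // The two settings this mail is about: cache size and expiry.
            conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 256);
            conf.setLong("dfs.client.read.shortcircuit.streams.cache.expiry.ms", 300000L);
            return conf;
        }
    }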

At some point we started using multigets. After a very short time they started
timing out on our side. We played with different timeouts, and Graphite (metric
hbase.regionserver.RegionServer.get_mean) showed that load on three nodes, out
of all of them, increased drastically. After digging through logs, googling, and
going through the documentation over and over again, we found a discussion
saying the SC cache should be no lower than 4096. After setting it to 4096, our
problem was solved. For some time.

At some point our data usage patterns changed, and since we already had
monitoring for this, we saw multigets starting to time out again. Monitoring
showed they were timing out on two nodes where the number of open sockets was
~3-4k per node, while on all the others it was 400-500. Narrowing this down a
little, we found some strangely oversized regions, did some splitting and some
manual merges, and HBase redistributed them, but the issue was still there. And
then I found the following things (here come the questions):

- With a cache size of 4096 and a 300000 ms cache expiry timeout, we saw this
error in the logs exactly every ten minutes:

2015-06-18 14:26:07,093 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error
creating DomainSocket
2015-06-18 14:26:07,093 WARN
org.apache.hadoop.hdfs.client.ShortCircuitCache:
ShortCircuitCache(0x3d1dc8c9): failed to load
1109699858_BP-1988583858-172.22.5.40-1424448407690
--
2015-06-18 14:36:07,135 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error
creating DomainSocket
2015-06-18 14:36:07,136 WARN
org.apache.hadoop.hdfs.client.ShortCircuitCache:
ShortCircuitCache(0x3d1dc8c9): failed to load
1109704764_BP-1988583858-172.22.5.40-1424448407690
--
2015-06-18 14:46:07,137 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error
creating DomainSocket
2015-06-18 14:46:07,138 WARN
org.apache.hadoop.hdfs.client.ShortCircuitCache:
ShortCircuitCache(0x3d1dc8c9): failed to load
1105787899_BP-1988583858-172.22.5.40-1424448407690

- After increasing the SC cache to 8192 (since on those couple of nodes that
were getting up to 5-7k open sockets, 4096 obviously wasn't enough):
    - Our multigets are no longer taking 20-30 seconds but are again completing
within 5 seconds, which is our client timeout.
    - netstat -tanlp | grep -c 50010 now shows ~2800 open local SC sockets on
every node.
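
Expressed against the sketch above (again, just illustrative, not the exact
lines from our config management), the only thing we changed was the cache
size:

    import org.apache.hadoop.conf.Configuration;

    // Same settings as in ShortCircuitSettings.initial(), but with the cache
    // bumped to 8192; the expiry stays at the default 300000 ms.
    Configuration conf = ShortCircuitSettings.initial();
    conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 8192);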

Why would those errors be logged exactly every 10 minutes with a 4096 cache
size and a 5-minute expiry timeout?

Why would increasing the SC cache also 'balance' the number of open SC sockets
across all nodes?

Am I right that hbase.regionserver.RegionServer.get_mean shows the mean number
of gets per unit of time, not the time needed to perform a get? If I'm right,
then increasing the cache made gets faster in our case. If I'm wrong, it made
gets slower, yet it still sped up our multigets, which has been twisting my
brain after narrowing this down for a week.

How should the cache size and expiry timeout correlate with each other?

Thanks a lot!