Posted to mapreduce-user@hadoop.apache.org by Nick Dimiduk <nd...@apache.org> on 2016/02/12 22:38:59 UTC

Re: HDFS short-circuit tokens expiring

This explanation makes sense and is consistent with what I'm seeing. From
what I can tell, the requests do succeed eventually. I filed HDFS-9803 as
you suggested.

Thanks,
Nick


On Wed, Jan 20, 2016 at 11:01 AM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> Hi Nick,
>
> This is something that the client does recover from internally.  I expect
> the exception would not propagate to the caller, and therefore the reading
> process would continue successfully.  There is a potential for a latency
> hit on the extra NameNode RPC required for recovery.
>
> On opening a file, the client issues an RPC to the NameNode to determine
> block locations for reading the blocks from DataNodes.  In a secured
> cluster, this RPC response also includes a block access token for each
> block, which the client needs for subsequent authorization to the DataNodes
> hosting replicas of those blocks.  In typical configuration, this
> information (block locations and access tokens) is pre-fetched for up to 10
> blocks, possibly capped by file size of course.  Block access tokens have
> an expiration.  Let's say for example that the application does not read
> all the way through to the tenth block within the block access token
> expiration.  At that point, the client would recover internally by issuing
> another RPC to the NameNode to fetch a fresh block access token with a new
> expiration.  Then, the read would continue as normal.
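>
> To make the sequence concrete, here is a rough sketch of a slow
> sequential read through the ordinary FileSystem API (illustrative only;
> the recovery described above lives inside DFSInputStream, not in
> application code):
>
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class SlowSequentialRead {
>   public static void main(String[] args) throws IOException {
>     FileSystem fs = FileSystem.get(new Configuration());
>     // open() issues the NameNode RPC that returns block locations plus a
>     // block access token per block (in typical configuration, pre-fetched
>     // for up to 10 blocks).
>     try (FSDataInputStream in = fs.open(new Path(args[0]))) {
>       byte[] buf = new byte[8192];
>       int n;
>       while ((n = in.read(buf)) > 0) {
>         // If reading is slow enough that a pre-fetched token expires
>         // before its block is reached, the client re-issues the block
>         // location RPC internally and the read continues; the caller only
>         // sees the extra latency, not an exception.
>         consume(buf, n);
>       }
>     }
>   }
>
>   private static void consume(byte[] buf, int len) { /* application logic */ }
> }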
>
> Does this explanation jibe with what you're seeing?  Are the reads
> successful, but experiencing undesired latency (the "response too slow"
> warning you mentioned)?  If so, then maybe this hints at a feature request
> for HDFS.  For example, maybe it would be desirable to detect "almost
> expired" block access tokens in a background thread and refetch them
> proactively.  That way, the caller wouldn't absorb the latency hit of the
> extra RPC as part of its read call.  Please feel free to file an HDFS JIRA
> if this makes sense, or if you have something else like it in mind.
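>
> In rough pseudo-Java, such a background refresher might look something
> like the following (the names cachedTokens(), refetchLocatedBlocks(), and
> BlockTokenCacheEntry are invented for illustration; nothing like this
> exists in the client today):
>
> import java.util.concurrent.Executors;
> import java.util.concurrent.ScheduledExecutorService;
> import java.util.concurrent.TimeUnit;
>
> class ProactiveTokenRefresher {
>   private final ScheduledExecutorService pool =
>       Executors.newSingleThreadScheduledExecutor();
>
>   void start() {
>     // Check periodically instead of waiting for a read to trip over an
>     // expired token.
>     pool.scheduleWithFixedDelay(this::refreshAlmostExpired, 30, 30,
>         TimeUnit.SECONDS);
>   }
>
>   private void refreshAlmostExpired() {
>     long now = System.currentTimeMillis();
>     for (BlockTokenCacheEntry entry : cachedTokens()) {
>       // Refresh anything expiring within the next minute so the reading
>       // thread never absorbs the extra NameNode round trip.
>       if (entry.expiryMillis() - now < TimeUnit.MINUTES.toMillis(1)) {
>         refetchLocatedBlocks(entry);  // the extra block location RPC
>       }
>     }
>   }
>
>   // Placeholders for state that would live inside the real client.
>   private Iterable<BlockTokenCacheEntry> cachedTokens() {
>     return java.util.Collections.emptyList();
>   }
>   private void refetchLocatedBlocks(BlockTokenCacheEntry entry) { }
>
>   interface BlockTokenCacheEntry { long expiryMillis(); }
> }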
>
> --Chris Nauroth
>
> From: Dejan Menges <de...@gmail.com>
> Date: Wednesday, January 20, 2016 at 3:26 AM
> To: Nick Dimiduk <nd...@apache.org>, "user@hadoop.apache.org" <user@hadoop.apache.org>
> Cc: hbase-user <us...@hbase.apache.org>
> Subject: Re: HDFS short-circuit tokens expiring
>
> Hi Nick,
>
> I had exactly the same case, and in our case it was that tokens were
> expiring too quickly. What we increased was
> dfs.client.read.shortcircuit.streams.cache.size and
> dfs.client.read.shortcircuit.streams.cache.expiry.ms.
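>
> For reference, those are client-side settings, so they go in
> hdfs-site.xml on the hosts doing the reads (RegionServers in this case).
> Something along these lines; the values are illustrative, not
> recommendations:
>
> <property>
>   <name>dfs.client.read.shortcircuit.streams.cache.size</name>
>   <value>1000</value>
> </property>
> <property>
>   <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
>   <value>600000</value> <!-- 10 minutes -->
> </property>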
>
> Hope this helps.
>
> Best,
> Dejan
>
> On Wed, Jan 20, 2016 at 12:15 AM Nick Dimiduk <ndimiduk@apache.org> wrote:
> Hi folks,
>
> This looks like it sits in the intersection between HDFS and HBase. My
> region server logs are flooding with messages like
> "SecretManager$InvalidToken: access control error while attempting to set
> up short-circuit access to <hdfs-file-path> ... is expired" [0].
>
> These logs correspond with responseTooSlow WARNings from the region server.
>
> Maybe I have misconfigured short-circuit reads? Such an expiration seems
> like something the client or client consumer should handle by re-negotiating.
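>
> For reference, short-circuit reads are enabled here with the usual
> client-side settings, roughly the following (the socket path below is
> just an example; the real one is site-specific):
>
> <property>
>   <name>dfs.client.read.shortcircuit</name>
>   <value>true</value>
> </property>
> <property>
>   <name>dfs.domain.socket.path</name>
>   <value>/var/lib/hadoop-hdfs/dn_socket</value>
> </property>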
>
> Thanks a lot,
> -n
>
> [0]
>
> 2016-01-19 22:10:14,432 INFO
> [B.defaultRpcServer.handler=4,queue=1,port=16020]
> shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not
> load 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken
> exception.
> org.apache.hadoop.security.token.SecretManager$InvalidToken: access
> control error while attempting to set up short-circuit access to <hfile
> path> token with block_token_identifier (expiryDate=1453194430724,
> keyId=1508822027, userId=hbase,
> blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633, access
> modes=[READ]) is expired.
> at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
> at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
> at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
> at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
> at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
> at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
> at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
> at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
> at org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
> at org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
> at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:745)
>
>