Posted to common-user@hadoop.apache.org by Nick Dimiduk <nd...@apache.org> on 2016/01/20 00:14:55 UTC

HDFS short-circuit tokens expiring

Hi folks,

This looks like it sits in the intersection between HDFS and HBase. My
region server logs are flooding with messages like
"SecretManager$InvalidToken: access control error while attempting to set
up short-circuit access to <hdfs-file-path> ... is expired" [0].

These logs correspond with responseTooSlow WARNings from the region server.

Maybe I have misconfigured short-circuit reads? Such an expiration seems
like something the client, or the client's consumer, should handle by re-negotiating.

Thanks a lot,
-n

[0]

2016-01-19 22:10:14,432 INFO
 [B.defaultRpcServer.handler=4,queue=1,port=16020]
shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not
load 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken
exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control
error while attempting to set up short-circuit access to <hfile path> token
with block_token_identifier (expiryDate=1453194430724, keyId=1508822027,
userId=hbase, blockPoolId=BP-1145309065-XXX-1448053136416,
blockId=1074037633, access modes=[READ]) is expired.
at
org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
at
org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
at
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
at
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
at
org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
at
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
at
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
at java.io.DataInputStream.read(DataInputStream.java:149)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
at
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
at
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
at
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
at
org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
at
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
at
org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
at
org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
at
org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

Re: HDFS short-circuit tokens expiring

Posted by Nick Dimiduk <nd...@apache.org>.
This explanation makes sense and is consistent with what I'm seeing. From
what I can tell, the requests do succeed eventually. I filed HDFS-9803 as
you suggested.

Thanks,
Nick


On Wed, Jan 20, 2016 at 11:01 AM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> Hi Nick,
>
> This is something that the client does recover from internally.  I expect
> the exception would not propagate to the caller, and therefore the reading
> process would continue successfully.  There is a potential for a latency
> hit on the extra NameNode RPC required for recovery.
>
> On opening a file, the client issues an RPC to the NameNode to determine
> block locations for reading the blocks from DataNodes.  In a secured
> cluster, this RPC response also includes a block access token for each
> block, which the client needs for subsequent authorization to the DataNodes
> hosting replicas of those blocks.  In typical configuration, this
> information (block locations and access tokens) is pre-fetched for up to 10
> blocks, possibly capped by file size of course.  Block access tokens have
> an expiration.  Let's say for example that the application does not read
> all the way through to the tenth block within the block access token
> expiration.  At that point, the client would recover internally by issuing
> another RPC to the NameNode to fetch a fresh block access token with a new
> expiration.  Then, the read would continue as normal.
>
> Does this explanation jive with what you're seeing?  Are the reads
> successful, but experiencing undesired latency (the "response too slow"
> warning you mentioned)?  If so, then maybe this hints at a feature request
> for HDFS.  For example, maybe it would be desirable to detect "almost
> expired" block access tokens in a background thread and refetch them
> proactively.  That way, the caller wouldn't absorb the latency hit of the
> extra RPC as part of its read call.  Please feel free to file an HDFS JIRA
> if this makes sense, or if you have something else like it in mind.
>
> --Chris Nauroth
>
> From: Dejan Menges <de...@gmail.com>
> Date: Wednesday, January 20, 2016 at 3:26 AM
> To: Nick Dimiduk <nd...@apache.org>, "user@hadoop.apache.org" <user@hadoop.apache.org>
> Cc: hbase-user <us...@hbase.apache.org>
> Subject: Re: HDFS short-circuit tokens expiring
>
> Hi Nick,
>
> I had exactly the same case, and in our case it was that tokens were
> expiring too quickly. What we increased was
> dfs.client.read.shortcircuit.streams.cache.size and
> dfs.client.read.shortcircuit.streams.cache.expiry.ms.
>
> Hope this helps.
>
> Best,
> Dejan
>
> On Wed, Jan 20, 2016 at 12:15 AM Nick Dimiduk <ndimiduk@apache.org> wrote:
> Hi folks,
>
> This looks like it sits in the intersection between HDFS and HBase. My
> region server logs are flooding with messages like
> "SecretManager$InvalidToken: access control error while attempting to set
> up short-circuit access to <hdfs-file-path> ... is expired" [0].
>
> These logs correspond with responseTooSlow WARNings from the region server.
>
> Maybe I have misconfigured short-circuit reads? Such an expiration seems
> like something the client, or the client's consumer, should handle by re-negotiating.
>
> Thanks a lot,
> -n
>
> [0]
>
> 2016-01-19 22:10:14,432 INFO
> [B.defaultRpcServer.handler=4,queue=1,port=16020]
> shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not
> load 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken
> exception.
> org.apache.hadoop.security.token.SecretManager$InvalidToken: access
> control error while attempting to set up short-circuit access to <hfile
> path> token with block_token_identifier (expiryDate=1453194430724,
> keyId=1508822027, userId=hbase,
> blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633, access
> modes=[READ]) is expired.
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
> at
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
> at
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
> at
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
> at
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
> at
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
> at
> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
> at
> org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
> at
> org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
> at
> org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:745)
>
>

Re: HDFS short-circuit tokens expiring

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Nick,

This is something that the client does recover from internally.  I expect the exception would not propagate to the caller, and therefore the reading process would continue successfully.  There is a potential for a latency hit on the extra NameNode RPC required for recovery.

On opening a file, the client issues an RPC to the NameNode to determine block locations for reading the blocks from DataNodes.  In a secured cluster, this RPC response also includes a block access token for each block, which the client needs for subsequent authorization to the DataNodes hosting replicas of those blocks.  In typical configuration, this information (block locations and access tokens) is pre-fetched for up to 10 blocks, possibly capped by file size of course.  Block access tokens have an expiration.  Let's say for example that the application does not read all the way through to the tenth block within the block access token expiration.  At that point, the client would recover internally by issuing another RPC to the NameNode to fetch a fresh block access token with a new expiration.  Then, the read would continue as normal.
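
In pseudocode, that recovery path looks roughly like the sketch below. This is purely illustrative, with made-up placeholder types (BlockToken, NameNode, DataNode); it is not the actual DFSClient code.

import java.util.List;

// Illustrative only: hypothetical types standing in for HDFS client internals.
class BlockTokenRecoverySketch {

  /** Hypothetical block access token carrying its expiry time. */
  static class BlockToken {
    final long blockId;
    final long expiryMillis;
    BlockToken(long blockId, long expiryMillis) { this.blockId = blockId; this.expiryMillis = expiryMillis; }
    boolean isExpired() { return System.currentTimeMillis() >= expiryMillis; }
  }

  /** Hypothetical NameNode RPC that returns a fresh token (with locations) for a block. */
  interface NameNode {
    BlockToken refetchToken(String path, long blockId);
  }

  /** Hypothetical DataNode read that uses the token for authorization. */
  interface DataNode {
    byte[] readBlock(long blockId, BlockToken token);
  }

  /**
   * Tokens are pre-fetched when the file is opened; if one has expired by the time
   * the reader reaches that block, the client pays one extra NameNode RPC to refresh
   * it and then continues. The caller sees added latency, not an error.
   */
  static void readAllBlocks(String path, List<BlockToken> prefetched, NameNode nn, DataNode dn) {
    for (BlockToken token : prefetched) {
      if (token.isExpired()) {
        token = nn.refetchToken(path, token.blockId);   // internal recovery RPC
      }
      dn.readBlock(token.blockId, token);               // short-circuit or remote read
    }
  }
}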

Does this explanation jive with what you're seeing?  Are the reads successful, but experiencing undesired latency (the "response too slow" warning you mentioned)?  If so, then maybe this hints at a feature request for HDFS.  For example, maybe it would be desirable to detect "almost expired" block access tokens in a background thread and refetch them proactively.  That way, the caller wouldn't absorb the latency hit of the extra RPC as part of its read call.  Please feel free to file an HDFS JIRA if this makes sense, or if you have something else like it in mind.
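
A rough sketch of that background-refresh idea follows, again with hypothetical placeholder types; a real implementation would live inside the HDFS client and reuse its own token and RPC machinery.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: a background refresher for "almost expired" block access tokens.
class ProactiveTokenRefreshSketch {

  /** Hypothetical block access token with an expiry time. */
  static class BlockToken {
    final long blockId;
    final long expiryMillis;
    BlockToken(long blockId, long expiryMillis) { this.blockId = blockId; this.expiryMillis = expiryMillis; }
  }

  /** Hypothetical NameNode RPC that returns a fresh token for a block. */
  interface NameNode {
    BlockToken refetchToken(long blockId);
  }

  private final Map<Long, BlockToken> cachedTokens = new ConcurrentHashMap<Long, BlockToken>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /**
   * Periodically refresh any cached token that would expire within refreshWindowMillis,
   * so the read path never has to pay the extra NameNode RPC inline.
   */
  void start(final NameNode nn, final long refreshWindowMillis, long checkPeriodMillis) {
    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override public void run() {
        long deadline = System.currentTimeMillis() + refreshWindowMillis;
        for (BlockToken token : cachedTokens.values()) {
          if (token.expiryMillis <= deadline) {
            cachedTokens.put(token.blockId, nn.refetchToken(token.blockId));  // background RPC
          }
        }
      }
    }, checkPeriodMillis, checkPeriodMillis, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}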

--Chris Nauroth

From: Dejan Menges <de...@gmail.com>
Date: Wednesday, January 20, 2016 at 3:26 AM
To: Nick Dimiduk <nd...@apache.org>, "user@hadoop.apache.org" <us...@hadoop.apache.org>
Cc: hbase-user <us...@hbase.apache.org>
Subject: Re: HDFS short-circuit tokens expiring

Hi Nick,

I had exactly the same case, and in our case it was that tokens were expiring too quickly. What we increased was dfs.client.read.shortcircuit.streams.cache.size and dfs.client.read.shortcircuit.streams.cache.expiry.ms.
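
For reference, that tuning would go in the client-side hdfs-site.xml along these lines; the values below are only an illustration of raising the defaults, not a recommendation for any particular cluster.

<!-- hdfs-site.xml on the HDFS client side (e.g. the HBase RegionServers). -->
<!-- Example values only; size the cache and expiry for your own workload. -->
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>1000</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
  <value>1000000</value>
</property>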

Hope this helps.

Best,
Dejan

On Wed, Jan 20, 2016 at 12:15 AM Nick Dimiduk <nd...@apache.org> wrote:
Hi folks,

This looks like it sits in the intersection between HDFS and HBase. My region server logs are flooding with messages like "SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to <hdfs-file-path> ... is expired" [0].

These logs correspond with responseTooSlow WARNings from the region server.

Maybe I have misconfigured short-circuit reads? Such an expiration seems like something the client, or the client's consumer, should handle by re-negotiating.

Thanks a lot,
-n

[0]

2016-01-19 22:10:14,432 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not load 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to <hfile path> token with block_token_identifier (expiryDate=1453194430724, keyId=1508822027, userId=hbase, blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633, access modes=[READ]) is expired.
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
at org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
at org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)


Re: HDFS short-circuit tokens expiring

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Nick,

This is something that the client does recover from internally.  I expect the exception would not propagate to the caller, and therefore the reading process would continue successfully.  There is a potential for a latency hit on the extra NameNode RPC required for recovery.

On opening a file, the client issues an RPC to the NameNode to determine block locations for reading the blocks from DataNodes.  In a secured cluster, this RPC response also includes a block access token for each block, which the client needs for subsequent authorization to the DataNodes hosting replicas of those blocks.  In typical configuration, this information (block locations and access tokens) is pre-fetched for up to 10 blocks, possibly capped by file size of course.  Block access tokens have an expiration.  Let's say for example that the application does not read all the way through to the tenth block within the block access token expiration.  At that point, the client would recover internally by issuing another RPC to the NameNode to fetch a fresh block access token with a new expiration.  Then, the read would continue as normal.

Does this explanation jive with what you're seeing?  Are the reads successful, but experiencing undesired latency (the "response too slow" warning you mentioned)?  If so, then maybe this hints at a feature request for HDFS.  For example, maybe it would be desirable to detect "almost expired" block access tokens in a background thread and refetch them proactively.  That way, the caller wouldn't absorb the latency hit of the extra RPC as part of its read call.  Please feel free to file an HDFS JIRA if this makes sense, or if you have something else like it in mind.

--Chris Nauroth

From: Dejan Menges <de...@gmail.com>>
Date: Wednesday, January 20, 2016 at 3:26 AM
To: Nick Dimiduk <nd...@apache.org>>, "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Cc: hbase-user <us...@hbase.apache.org>>
Subject: Re: HDFS short-circuit tokens expiring

Hi Nick,

I had exactly the same case, and in our case it was that tokens were expiring too quickly. What we increased was dfs.client.read.shortcircuit.streams.cache.size and dfs.client.read.shortcircuit.streams.cache.expiry.ms<http://dfs.client.read.shortcircuit.streams.cache.expiry.ms>.

Hope this helps.

Best,
Dejan

On Wed, Jan 20, 2016 at 12:15 AM Nick Dimiduk <nd...@apache.org>> wrote:
Hi folks,

This looks like it sits in the intersection between HDFS and HBase. My region server logs are flooding with messages like "SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to <hdfs-file-path> ... is expired" [0].

These logs correspond with responseTooSlow WARNings from the region server.

Maybe I have misconfigured short-circuit reads? Such an expiration seems like something the client or client consumer should handle re-negotiations.

Thanks a lot,
-n

[0]

2016-01-19 22:10:14,432 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not load 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to <hfile path> token with block_token_identifier (expiryDate=1453194430724, keyId=1508822027, userId=hbase, blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633, access modes=[READ]) is expired.
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
at org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
at org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)


Re: HDFS short-circuit tokens expiring

Posted by Dejan Menges <de...@gmail.com>.
Hi Nick,

I had exactly the same case, and in our case it was that tokens were
expiring too quickly. What we increased was
dfs.client.read.shortcircuit.streams.cache.size
and dfs.client.read.shortcircuit.streams.cache.expiry.ms.
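
For illustration, raising those two values programmatically might look like the
minimal snippet below. The property keys are the ones named above; everything
else, including the values, is just an example to adapt per cluster, and the
same keys can go straight into hdfs-site.xml instead.

import org.apache.hadoop.conf.Configuration;

// Example values only -- tune for the cluster. The keys are the two settings
// named above; the numbers here are placeholders, not recommendations.
public class ShortCircuitCacheTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Number of short-circuit read streams the DFS client keeps cached.
    conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 4096);
    // How long (in ms) a cached stream may live before it expires.
    conf.setLong("dfs.client.read.shortcircuit.streams.cache.expiry.ms", 3600000L);
    System.out.println("cache.size = "
        + conf.get("dfs.client.read.shortcircuit.streams.cache.size"));
  }
}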

Hope this helps.

Best,
Dejan

On Wed, Jan 20, 2016 at 12:15 AM Nick Dimiduk <nd...@apache.org> wrote:

> Hi folks,
>
> This looks like it sits in the intersection between HDFS and HBase. My
> region server logs are flooding with messages like
> "SecretManager$InvalidToken: access control error while attempting to set
> up short-circuit access to <hdfs-file-path> ... is expired" [0].
>
> These logs correspond with responseTooSlow WARNings from the region server.
>
> Maybe I have misconfigured short-circuit reads? Such an expiration seems
> like something the client or client consumer should handle re-negotiations.
>
> Thanks a lot,
> -n
>
> [0]
>
> 2016-01-19 22:10:14,432 INFO
>  [B.defaultRpcServer.handler=4,queue=1,port=16020]
> shortcircuit.ShortCircuitCache: ShortCircuitCache(0x71bdc547): could not
> load 1074037633_BP-1145309065-XXX-1448053136416 due to InvalidToken
> exception.
> org.apache.hadoop.security.token.SecretManager$InvalidToken: access
> control error while attempting to set up short-circuit access to <hfile
> path> token with block_token_identifier (expiryDate=1453194430724,
> keyId=1508822027, userId=hbase,
> blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633, access
> modes=[READ]) is expired.
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
> at
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
> at
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
> at
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
> at
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
> at
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
> at
> org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
> at
> org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
> at
> org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
> at
> org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:745)
>
>
