Posted to user@hbase.apache.org by Tianying Chang <ty...@gmail.com> on 2014/04/30 18:20:44 UTC

export snapshot fail sometime due to LeaseExpiredException

Hi,

When I export a large table with 460+ regions, I see the exportSnapshot job
fail sometimes (not every time). The error from the map task is below. I
verified the file highlighted below, and it does exist. Smaller tables seem
to always pass. Any idea? Is it because the table is too big and the
session times out?

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /hbase/.archive/rich_pin_data_v1/7713d5331180cb610834ba1c4ebbb9b3/d/eef3642f49244547bb6606d4d0f15f1f
File does not exist. Holder DFSClient_NONMAPREDUCE_279781617_1 does
not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
	at org.apache.hadoop.ipc.ProtobufR
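
For reference, the export was run with the standard ExportSnapshot tool; a
typical invocation looks roughly like the following (the snapshot name,
destination, and mapper count are placeholders, not the actual values used):

  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot my_table_snapshot \
      -copy-to hdfs://dest-cluster:8020/hbase \
      -mappers 16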



Thanks

Tian-Ying

Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Tianying Chang <ty...@gmail.com>.
The namenode has the same error. It seems the exceptions are all for HFiles
bigger than 1G; smaller HFiles all seem fine. Is it because the default
timeout is too short for those copy tasks?
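
One way to cross-check on the destination cluster is to ask HDFS fsck which
files under the archive directory are still open for write (a sketch; the
path matches the log lines below):

  hadoop fsck /hbase/.archive/rich_pin_data_v1 -files -blocks -openforwrite | grep OPENFORWRITE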

org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/.archive/rich_pin_data_v1/ce168540863baee7b05b741775b38816/d/862ae8d7e1cd48d0a2b97658a816af8b
File does not exist. Holder DFSClient_NONMAPREDUCE_1731821456_1 does not
have any open files.

2014-04-30 16:49:51,923 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:tychang (auth:SIMPLE) cause:org.apache.hadoop.hdfs.server.namenode.
LeaseExpiredException: No lease on
/hbase/.archive/rich_pin_data_v1/7dfa2b0afee391a7c65e097722ea237e/d/6bc9add98fe1402c8d556195ba6be8e9
File does not exist. Holder DFSClient_NONMAPREDUCE_224703696_1 does not
have any open files.

2014-04-30 16:49:51,923 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 25 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from
10.183.170.97:52160: error:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
No lease on
/hbase/.archive/rich_pin_data_v1/7dfa2b0afee391a7c65e097722ea237e/d/6bc9add98fe1402c8d556195ba6be8e9
File does not exist. Holder DFSClient_NONMAPREDUCE_224703696_1 does not
have any open files.

org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/.archive/rich_pin_data_v1/7dfa2b0afee391a7c65e097722ea237e/d/6bc9add98fe1402c8d556195ba6be8e9
File does not exist. Holder DFSClient_NONMAPREDUCE_224703696_1 does not
have any open files.

2014-04-30 16:51:09,548 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:tychang (auth:SIMPLE) cause:org.apache.hadoop.hdfs.server.namenode.
LeaseExpiredException: No lease on
/hbase/.archive/rich_pin_data_v1/f1df459c3cd1be662133f9ae2b6f12cc/d/ccd80ef4569a41b0b84381014e5f7266
File does not exist. Holder DFSClient_NONMAPREDUCE_1186324301_1 does not
have any open files.

2014-04-30 16:51:09,548 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 47 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from
10.180.161.116:48239: error:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
No lease on
/hbase/.archive/rich_pin_data_v1/f1df459c3cd1be662133f9ae2b6f12cc/d/ccd80ef4569a41b0b84381014e5f7266
File does not exist. Holder DFSClient_NONMAPREDUCE_1186324301_1 does not
have any open files.

org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/.archive/rich_pin_data_v1/f1df459c3cd1be662133f9ae2b6f12cc/d/ccd80ef4569a41b0b84381014e5f7266
File does not exist. Holder DFSClient_NONMAPREDUCE_1186324301_1 does not
have any open files.

2014-04-30 16:53:31,996 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:tychang (auth:SIMPLE) cause:org.apache.hadoop.hdfs.server.namenode.
LeaseExpiredException: No lease on
/hbase/.archive/rich_pin_data_v1/93d87cd353d06fda40054553924f3554/d/b8b832530fe0413595357b291c4e9701
File does not exist. Holder DFSClient_NONMAPREDUCE_-1410010161_1 does not
have any open files.

2014-04-30 16:53:31,996 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 58 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from
10.180.170.209:54236: error:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
No lease on
/hbase/.archive/rich_pin_data_v1/93d87cd353d06fda40054553924f3554/d/b8b832530fe0413595357b291c4e9701
File does not exist. Holder DFSClient_NONMAPREDUCE_-1410010161_1 does not
have any open files.

org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/.archive/rich_pin_data_v1/93d87cd353d06fda40054553924f3554/d/b8b832530fe0413595357b291c4e9701
File does not exist. Holder DFSClient_NONMAPREDUCE_-1410010161_1 does not
have any open files.


On Wed, Apr 30, 2014 at 9:53 AM, Tianying Chang <ty...@gmail.com> wrote:

> We are using Hadoop 2.0.0-cdh4.2.0 and hbase 0.94.7. We also backported
> several snapshot-related jiras, e.g. HBASE-10111 (verify snapshot) and
> HBASE-11083 (bandwidth throttle in exportSnapshot).
>
> I found that when the LeaseExpiredException was first reported, the file
> was indeed not there, and the map task retried. I verified a couple of
> minutes later that the HFile does exist under /.archive. But the retried
> map task still complains with the same file-does-not-exist error...
>
> I will check the namenode log for the LeaseExpiredException.
>
> Thanks
>
> Tian-Ying
>
>
> On Wed, Apr 30, 2014 at 9:33 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> Can you give us the hbase and hadoop releases you're using?
>>
>> Can you check the namenode log around the time the LeaseExpiredException
>> was encountered?
>>
>> Cheers

Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Tianying Chang <ty...@gmail.com>.
Hi,

I found that only the cluster I was using for my test has this issue. When
I changed the destination to another cluster, the problem went away. I
still don't know what is special about that cluster that causes the job to
fail sometimes, especially on the 2nd, 3rd... runs. But at least I know
exportSnapshot is stable.

Thanks a lot for your help.
Tian-Ying


> On Wed, Apr 30, 2014 at 1:25 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> Tianying:
>> Have you checked the audit log on the namenode for a deletion event
>> corresponding to the files involved in the LeaseExpiredException?
>>
>> Cheers
>>
>>
>> On Wed, Apr 30, 2014 at 10:44 AM, Tianying Chang <ty...@gmail.com> wrote:
>>
>>> This time the re-run passed (although with many failed/retried tasks)
>>> with my throttle bandwidth set to 200M (although per iftop, it never
>>> reached close to that number). Is there a way to increase the lease
>>> expiry time for a low throttle bandwidth for an individual export job?
>>>
>>> Thanks
>>> Tian-Ying
>>>
>>>
>>> On Wed, Apr 30, 2014 at 10:17 AM, Tianying Chang <ty...@gmail.com> wrote:
>>>
>>>> Yes, I am using the bandwidth throttle feature. The export job for
>>>> this table actually succeeded on its first run. When I rerun it (for
>>>> my robustness testing) it seems to never pass. I am wondering if it
>>>> has some weird state (I did clean up the target cluster, and even
>>>> removed the /hbase/.archive/rich_pin_data_v1 folder).
>>>>
>>>> It seems that even if I set the throttle value really large, it still
>>>> fails. And I think even after I replace the jar with the one without
>>>> the throttle, it still fails on re-runs.
>>>>
>>>> Is there some way I can increase the lease to be very large, to test
>>>> it out?
>>>>
>>>>
>>>> On Wed, Apr 30, 2014 at 10:02 AM, Matteo Bertozzi <th...@gmail.com> wrote:
>>>>
>>>>> The file is the file in the export, so you are creating that file.
>>>>> Do you have the bandwidth throttle on?
>>>>>
>>>>> I'm thinking that the file is being written slowly, e.g. write(few
>>>>> bytes), wait, write(few bytes), and during the wait your lease
>>>>> expires. Something like that can also happen if your MR job is stuck
>>>>> in some way (slow machine or similar) and is not writing within the
>>>>> lease timeout.
>>>>>
>>>>> Matteo

Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Ted Yu <yu...@gmail.com>.
bq. 1. delete_snapshot 'myTable'

myTable is a table, not the name of a snapshot, right?


HBASE-10766 was not among the list of patches in your earlier email. Can
you apply the patch and try again ?
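
For example, snapshots are created and deleted under their own names in the
hbase shell, separately from the table name ('myTable-snapshot' below is
hypothetical):

  list_snapshots
  snapshot 'myTable', 'myTable-snapshot'
  delete_snapshot 'myTable-snapshot'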

Cheers



Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Tianying Chang <ty...@gmail.com>.
Actually, my testing on a 90G table always succeeds, never fails. The
failing one is a production table which has about 400G and 460 regions.

The weird thing is that the first run after I refresh the jar (either the
throttle or non-throttle build) always succeeds with no failed tasks. But
the 2nd, 3rd... runs will always fail. And the error message says the
destination file does not exist. But since that is the file it is trying
to copy into, this is very strange.

BTW, I clean up the destination cluster by doing 3 things:
1. delete_snapshot 'myTable'
2. hadoop dfs -rmr /hbase/.hbase-snapshot/.tmp
3. hadoop dfs -rmr /hbase/.archive/myTable

Thanks
Tian-Ying



Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Matteo Bertozzi <th...@gmail.com>.
Can you post your ExportSnapshot.java code?
Is your destination an hbase cluster? If yes, do you have HBASE-10766? If
not, try to export to an hdfs path (not a /hbase subdir).
Do you have other stuff playing with the files in .archive, or multiple
ExportSnapshot jobs running against the same set of files?
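
For example, a neutral HDFS destination instead of the live /hbase tree
(the destination path is illustrative):

  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot my_table_snapshot \
      -copy-to hdfs://dest-cluster:8020/backup/snapshots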

We have tested ExportSnapshot with 40G files, so the problem is not the
size. It may be one of the above, or your lease timeout may be too low for
the "busy" state of your machines.

Matteo



On Wed, Apr 30, 2014 at 2:55 PM, Tianying Chang <ty...@gmail.com> wrote:

> I think it is not directly caused by the throttle. On the 2nd run on the
> non-throttle jar, the LeaseExpiredException shows up again(for big file).
> So it does seem like the exportSnapshot is not reliable for big file.
>
> The weird thing is when I replace the jar and restart the cluster, the
> first run of the big table always succeed. But then the later run always
> fail with these LeaseExpiredException.  Smaller table has no problem no
> matter how many times I re-run.
>
> Thanks
> Tian-Ying
>
>
> On Wed, Apr 30, 2014 at 2:24 PM, Tianying Chang <ty...@gmail.com> wrote:
>
> > Ted,
> >
> > it seems it is due to the Jira-11083: throttle bandwidth during snapshot
> > export <https://issues.apache.org/jira/browse/HBASE-11083> After I
> revert
> > it back, the job succeed again. It seems even when I set the throttle
> > bandwidth high, like 200M, iftop shows much lower value. Maybe the
> throttle
> > is sleeping longer than it supposed to? But I am not clear why a slow
> copy
> > job can cause LeaseExpiredException. Any idea?
> >
> >
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /hbase/.archive/rich_pin_data_v1/b50ab10bb4812acc2e9fa6c564c9adef/d/bac3c661a897466aaf1706a9e1bd9e9a
> File does not exist. Holder DFSClient_NONMAPREDUCE_-2096088484_1 does not
> have any open files.
> >       at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
> >       at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
> >       at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2454)
> >       at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2431)
> >       at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:536)
> >       at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:335)
> >       at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$
> >
> >
> > Thanks
> > Tian-Ying
> >
> >
> > On Wed, Apr 30, 2014 at 1:25 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> Tianying:
> >> Have you checked audit log on namenode for deletion event corresponding
> to
> >> the files involved in LeaseExpiredException ?
> >>
> >> Cheers
> >>
> >>
> >> On Wed, Apr 30, 2014 at 10:44 AM, Tianying Chang <ty...@gmail.com>
> >> wrote:
> >>
> >> > This time re-run passed (although with many failed/retry tasks) with
> my
> >> > throttle bandwidth as 200M(although by iftop, it never reach close to
> >> that
> >> > number). Is there a way to increase the lease expire time for low
> >> throttle
> >> > bandwidth for individual export job?
> >> >
> >> > Thanks
> >> > Tian-Ying
> >> >
> >> >
> >> >
> >> > On Wed, Apr 30, 2014 at 10:17 AM, Tianying Chang <ty...@gmail.com>
> >> > wrote:
> >> >
> >> > > yes, I am using the bandwidth throttle feature. The export job of
> this
> >> > > table actually succeed for its first run. When I rerun it (for my
> >> robust
> >> > > testing) it seems never pass.  I am wondering if it has some werid
> >> state
> >> > (I
> >> > > did clean up the target cluster even removed
> >> > > /hbase/.archive/rich_pint_data_v1 folder)
> >> > >
> >> > > It seems even if I set the throttle value really large, it still
> fail.
> >> > And
> >> > > I think even after I replace the jar back to the one without
> >> throttle, it
> >> > > still fail for re-run.
> >> > >
> >> > > Is there some way that I can increase the lease to be very large to
> >> test
> >> > > it out?
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Apr 30, 2014 at 10:02 AM, Matteo Bertozzi <
> >> > theo.bertozzi@gmail.com
> >> > > > wrote:
> >> > >
> >> > >> the file is the file in export, so you are creating that file.
> >> > >> do you have the bandwidth throttle on?
> >> > >>
> >> > >> I'm thinking that the file is slow writing: e.g. write(few bytes)
> >> wait
> >> > >> write(few bytes)
> >> > >> and on the wait your lease expire
> >> > >> or something like that can happen if your MR job is stuck in
> someway
> >> > (slow
> >> > >> machine or similar) and it is not writing within the lease timeout
> >> > >>
> >> > >> Matteo
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Wed, Apr 30, 2014 at 9:53 AM, Tianying Chang <tychang@gmail.com
> >
> >> > >> wrote:
> >> > >>
> >> > >> > we are using
> >> > >> >
> >> > >> > Hadoop 2.0.0-cdh4.2.0 and hbase 0.94.7. We also backported
> several
> >> > >> snapshot
> >> > >> > related jira, e.g 10111(verify snapshot), 11083 (bandwidth
> >> throttle in
> >> > >> > exportSnapshot)
> >> > >> >
> >> > >> > I found when the  LeaseExpiredException first reported, that file
> >> > indeed
> >> > >> > not there, and the map task retry. And I verifified couple
> minutes
> >> > >> later,
> >> > >> > that HFile does exist under /.archive. But the retry map task
> still
> >> > >> > complain the same error of file  not exist...
> >> > >> >
> >> > >> > I will check the namenode log for the LeaseExpiredException.
> >> > >> >
> >> > >> >
> >> > >> > Thanks
> >> > >> >
> >> > >> > Tian-Ying
> >> > >> >
> >> > >> >
> >> > >> > On Wed, Apr 30, 2014 at 9:33 AM, Ted Yu <yu...@gmail.com>
> >> wrote:
> >> > >> >
> >> > >> > > Can you give us the hbase and hadoop releases you're using ?
> >> > >> > >
> >> > >> > > Can you check namenode log around the time
> LeaseExpiredException
> >> was
> >> > >> > > encountered ?
> >> > >> > >
> >> > >> > > Cheers
> >> > >> > >
> >> > >> > >
> >> > >> > > On Wed, Apr 30, 2014 at 9:20 AM, Tianying Chang <
> >> tychang@gmail.com>
> >> > >> > wrote:
> >> > >> > >
> >> > >> > > > Hi,
> >> > >> > > >
> >> > >> > > > When I export large table with 460+ regions, I saw the
> >> > >> exportSnapshot
> >> > >> > job
> >> > >> > > > fail sometime (not all the time). The error of the map task
> is
> >> > >> below:
> >> > >> > > But I
> >> > >> > > > verified the file highlighted below, it does exist. Smaller
> >> table
> >> > >> seems
> >> > >> > > > always pass. Any idea? Is it because it is too big and get
> >> session
> >> > >> > > timeout?
> >> > >> > > >
> >> > >> > > >
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> >> > >> > > > No lease on
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> /hbase/.archive/rich_pin_data_v1/7713d5331180cb610834ba1c4ebbb9b3/d/eef3642f49244547bb6606d4d0f15f1f
> >> > >> > > > File does not exist. Holder
> DFSClient_NONMAPREDUCE_279781617_1
> >> > does
> >> > >> > > > not have any open files.
> >> > >> > > >         at
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
> >> > >> > > >         at
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
> >> > >> > > >         at
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
> >> > >> > > >         at
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
> >> > >> > > >         at
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
> >> > >> > > >         at
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> >
> >>
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
> >> > >> > > >         at org.apache.hadoop.ipc.ProtobufR
> >> > >> > > >
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > Thanks
> >> > >> > > >
> >> > >> > > > Tian-Ying
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>

Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Tianying Chang <ty...@gmail.com>.
I think it is not directly caused by the throttle. On the 2nd run with the
non-throttle jar, the LeaseExpiredException shows up again (for a big
file). So it does seem like exportSnapshot is not reliable for big files.

The weird thing is that when I replace the jar and restart the cluster,
the first run on the big table always succeeds. But the later runs always
fail with these LeaseExpiredExceptions. A smaller table has no problem no
matter how many times I re-run.
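
Since the failures only show up on re-runs, one thing worth checking before
each re-run is whether partial files from the previous attempt are still
sitting in the destination archive (a sketch, using the same paths as in
the earlier logs):

  hadoop dfs -ls /hbase/.hbase-snapshot
  hadoop dfs -lsr /hbase/.archive/rich_pin_data_v1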

Thanks
Tian-Ying



Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Tianying Chang <ty...@gmail.com>.
Ted,

It seems to be due to HBASE-11083: throttle bandwidth during snapshot
export <https://issues.apache.org/jira/browse/HBASE-11083>. After I
reverted it, the job succeeded again. Even when I set the throttle
bandwidth high, like 200M, iftop shows a much lower value. Maybe the
throttle is sleeping longer than it is supposed to? But I am not clear on
why a slow copy job would cause a LeaseExpiredException. Any idea?

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /hbase/.archive/rich_pin_data_v1/b50ab10bb4812acc2e9fa6c564c9adef/d/bac3c661a897466aaf1706a9e1bd9e9a
File does not exist. Holder DFSClient_NONMAPREDUCE_-2096088484_1 does
not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2454)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2431)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:536)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:335)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$
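
To illustrate the suspicion: a per-chunk throttle usually looks something
like the hypothetical sketch below (this is not the actual HBASE-11083
patch). If the sleep target is computed too conservatively, or the chunks
are small, the pauses between writes dominate and the observed rate stays
far below the configured one:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical per-chunk throttle, in the spirit of a bandwidth-limited copy. */
public class ThrottledCopy {
  private static final int CHUNK = 64 * 1024;

  public static void copy(InputStream in, OutputStream out, long bytesPerSec)
      throws IOException, InterruptedException {
    byte[] buf = new byte[CHUNK];
    long startMs = System.currentTimeMillis();
    long written = 0;
    int n;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);
      written += n;
      // Time budget we should have used so far at the target rate.
      long expectedMs = written * 1000 / bytesPerSec;
      long elapsedMs = System.currentTimeMillis() - startMs;
      if (expectedMs > elapsedMs) {
        // No bytes reach HDFS while we sleep; this is exactly the
        // write-a-few-bytes-then-wait pattern in question.
        Thread.sleep(expectedMs - elapsedMs);
      }
    }
  }
}

At 200M configured, the sleeps should be rare; if they are not, the
arithmetic above (or its equivalent in the patch) would be the place to look.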


Thanks
Tian-Ying



Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Ted Yu <yu...@gmail.com>.
Tianying:
Have you checked the audit log on the namenode for deletion events
corresponding to the files involved in the LeaseExpiredException?
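
If a cleaner removed the file, you would expect a cmd=delete entry for it.
A rough sketch of the scan (the audit log location below is an assumption;
it is deployment-specific):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/** Scan an HDFS audit log for delete events that touch a given path. */
public class AuditGrep {
  public static void main(String[] args) throws IOException {
    String log = args.length > 0 ? args[0]
        : "/var/log/hadoop-hdfs/hdfs-audit.log";   // location is an assumption
    String path = args.length > 1 ? args[1]
        : "/hbase/.archive/rich_pin_data_v1";      // prefix of the files in question
    BufferedReader r = new BufferedReader(new FileReader(log));
    try {
      String line;
      while ((line = r.readLine()) != null) {
        // HDFS audit entries carry cmd=<op> and src=<path> fields.
        if (line.contains("cmd=delete") && line.contains(path)) {
          System.out.println(line);
        }
      }
    } finally {
      r.close();
    }
  }
}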

Cheers



Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Tianying Chang <ty...@gmail.com>.
This time the re-run passed (although with many failed/retried tasks) with
my throttle bandwidth set to 200M (although per iftop, it never gets close
to that number). Is there a way to increase the lease expiry time to suit a
low throttle bandwidth, per individual export job?

Thanks
Tian-Ying




Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Tianying Chang <ty...@gmail.com>.
Yes, I am using the bandwidth throttle feature. The export job of this
table actually succeeded on its first run. When I rerun it (for my
robustness testing) it seems to never pass. I am wondering if it has some
weird state left behind (I did clean up the target cluster and even removed
the /hbase/.archive/rich_pin_data_v1 folder).

It seems that even if I set the throttle value really large, it still
fails. And I think that even after I put the jar without the throttle back,
the re-run still fails.

Is there some way that I can increase the lease to be very large, to test
it out?




Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Matteo Bertozzi <th...@gmail.com>.
The file is the one being exported, so you are the one creating it.
Do you have the bandwidth throttle on?

I'm thinking the file is being written slowly: e.g. write(a few bytes),
wait, write(a few bytes), and during a wait your lease expires. Something
like that can also happen if your MR job gets stuck in some way (slow
machine or similar) and is not writing within the lease timeout.
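
For reference, the timeouts in play. As far as I recall from the HDFS
sources these are compile-time constants rather than configuration knobs,
so double-check against your exact Hadoop version:

// My reading of org.apache.hadoop.hdfs.protocol.HdfsConstants in Hadoop 2.x;
// verify against your version.
public class LeaseLimits {
  // Soft limit: after ~1 minute without renewal, another client may
  // preempt the lease on the file.
  public static final long LEASE_SOFTLIMIT_PERIOD = 60 * 1000L;
  // Hard limit: after ~1 hour, the namenode force-recovers the lease.
  public static final long LEASE_HARDLIMIT_PERIOD = 60 * LEASE_SOFTLIMIT_PERIOD;
}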

Matteo




Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Tianying Chang <ty...@gmail.com>.
We are using

Hadoop 2.0.0-cdh4.2.0 and HBase 0.94.7. We also backported several
snapshot-related JIRAs, e.g. HBASE-10111 (verify snapshot) and HBASE-11083
(bandwidth throttle in exportSnapshot).

I found that when the LeaseExpiredException was first reported, the file
was indeed not there, and the map task retried. A couple of minutes later I
verified that the HFile does exist under /.archive. But the retried map
task still complains with the same file-does-not-exist error...
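
For what it's worth, the existence check was roughly the sketch below (the
region and file names are the ones from the stack trace; it assumes the
default FileSystem points at the destination cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Verify that an archived HFile is present on the destination cluster. */
public class CheckArchivedHFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // loads core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    Path hfile = new Path("/hbase/.archive/rich_pin_data_v1/"
        + "7713d5331180cb610834ba1c4ebbb9b3/d/eef3642f49244547bb6606d4d0f15f1f");
    System.out.println(hfile + " exists? " + fs.exists(hfile));
  }
}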

I will check the namenode log for the LeaseExpiredException.


Thanks

Tian-Ying



Re: export snapshot fail sometime due to LeaseExpiredException

Posted by Ted Yu <yu...@gmail.com>.
Can you give us the HBase and Hadoop releases you're using?

Can you check the namenode log around the time the LeaseExpiredException
was encountered?

Cheers

