Posted to user@hbase.apache.org by Brian Jeltema <br...@digitalenvoy.net> on 2014/09/06 15:09:51 UTC
directory usage question
I'm trying to track down a problem I'm having running map/reduce jobs against snapshots.
Can someone explain the difference between files stored in:
/apps/hbase/data/archive/data/default
and files stored in
/apps/hbase/data/data/default
(Hadoop 2.4, HBase 0.98)
Thanks
Re: directory usage question
Posted by Brian Jeltema <br...@digitalenvoy.net>.
> Your cluster is an insecure HBase deployment, right ?
Yes
>
> Are all files under /apps/hbase/data/archive/data/default owned by user
> 'hdfs' ?
No. However the ownership failure isn’t what I’m concerned about; I understand what caused that.
But the stack trace illustrated behavior of initTableSnapshotMapperJob that I didn’t expect, and
I’m just trying to understand what it’s doing.
Re: directory usage question
Posted by Ted Yu <yu...@gmail.com>.
Your cluster is an insecure HBase deployment, right ?
Are all files under /apps/hbase/data/archive/data/default owned by user
'hdfs' ?
BTW in tip of 0.98, with HBASE-11742, related code looks a bit different.
Cheers
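One quick way to check the ownership Ted is asking about is with the Hadoop FileSystem API (a sketch; the archive path is taken from this thread, and the class name is a hypothetical placeholder — the shell equivalent is `hdfs dfs -ls /apps/hbase/data/archive/data/default`):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckArchiveOwners {
    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS points at the cluster from this thread
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Archive directory from this thread; adjust for your hbase.rootdir
        Path archive = new Path("/apps/hbase/data/archive/data/default");
        for (FileStatus status : fs.listStatus(archive)) {
            // Print owner:group so entries not owned by 'hbase' stand out
            System.out.println(status.getOwner() + ":" + status.getGroup()
                + "  " + status.getPath());
        }
    }
}
```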
On Sun, Sep 7, 2014 at 8:27 AM, Brian Jeltema <
brian.jeltema@digitalenvoy.net> wrote:
Re: directory usage question
Posted by Brian Jeltema <br...@digitalenvoy.net>.
> Eclipse doesn't show that RestoreSnapshotHelper.restoreHdfsRegions() is
> called by initTableSnapshotMapperJob (in master branch)
>
> Looking at TableMapReduceUtil.java in 0.98, I don't see direct relation
> between the two.
>
> Do you have stack trace or something else showing the relationship ?
Right. That’s what I meant by ‘indirectly’. This is a stack trace that was caused by an ownership conflict:
java.io.IOException: java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/apps/hbase/data/archive/data/default/Host/c41d632d5eee02e1883215460e5c261d/p":hdfs:hdfs:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5509)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5491)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5465)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3608)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3578)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3552)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
	at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:131)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:475)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:208)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:733)
	at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:397)
	at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:301)
	at net.digitalenvoy.hp.job.ParseHostnamesJob.run(ParseHostnamesJob.java:77)
	at net.digitalenvoy.hp.HostProcessor.run(HostProcessor.java:165)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at net.digitalenvoy.hp.HostProcessor.main(HostProcessor.java:47)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Re: directory usage question
Posted by Ted Yu <yu...@gmail.com>.
Eclipse doesn't show that RestoreSnapshotHelper.restoreHdfsRegions() is
called by initTableSnapshotMapperJob (in master branch)
Looking at TableMapReduceUtil.java in 0.98, I don't see direct relation
between the two.
Do you have stack trace or something else showing the relationship ?
Cheers
On Sun, Sep 7, 2014 at 5:48 AM, Brian Jeltema <
brian.jeltema@digitalenvoy.net> wrote:
Re: directory usage question
Posted by Brian Jeltema <br...@digitalenvoy.net>.
initTableSnapshotMapperJob writes into this directory (indirectly) via RestoreSnapshotHelper.restoreHdfsRegions
Is this expected? I would have expected writes to be limited to the temp directory passed in the init call
Brian
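For context, the init call in question looks roughly like this against HBase 0.98 (a sketch, not this thread's actual job: the snapshot name, mapper, and restore path are hypothetical placeholders). The final Path argument is the temp restore directory that writes were expected to be confined to:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotScanJob {
    // Hypothetical mapper; just counts the rows it sees
    static class RowCountMapper
            extends TableMapper<ImmutableBytesWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context ctx) {
            ctx.getCounter("snapshot", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "snapshot-scan");
        job.setJarByClass(SnapshotScanJob.class);
        // The last argument is the temp restore directory handed to
        // copySnapshotForScanner; the surprise discussed in this thread is
        // that region cloning also touches the archive directory.
        TableMapReduceUtil.initTableSnapshotMapperJob(
            "host_snapshot",                     // hypothetical snapshot name
            new Scan(),
            RowCountMapper.class,
            ImmutableBytesWritable.class,
            NullWritable.class,
            job,
            true,                                // addDependencyJars
            new Path("/tmp/snapshot-restore"));  // hypothetical temp restore dir
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```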
On Sep 7, 2014, at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
> The files under archive directory are referenced by snapshots.
> Please don't delete them manually.
>
> You can delete unused snapshots.
>
> Cheers
>
Re: directory usage question
Posted by Ted Yu <yu...@gmail.com>.
The files under archive directory are referenced by snapshots.
Please don't delete them manually.
You can delete unused snapshots.
Cheers
On Sep 7, 2014, at 4:08 AM, Brian Jeltema <br...@digitalenvoy.net> wrote:
Re: directory usage question
Posted by Brian Jeltema <br...@digitalenvoy.net>.
On Sep 6, 2014, at 9:32 AM, Ted Yu <yu...@gmail.com> wrote:
> Can you post your hbase-site.xml ?
>
> /apps/hbase/data/archive/data/default is where HFiles are archived (e.g.
> when a column family is deleted, HFiles for this column family are stored
> here).
> /apps/hbase/data/data/default seems to be your hbase.rootdir
>
>
hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I think that's the default that Ambari creates.
So the HFiles in the archive subdirectory have been discarded and can be deleted safely?
> bq. a problem I'm having running map/reduce jobs against snapshots
>
> Can you describe the problem in a bit more detail ?
>
>
I don't understand what I'm seeing well enough to ask an intelligent question yet.
I appear to be scanning duplicate rows when using initTableSnapshotMapperJob,
but I'm trying to get a better understanding of how this works, since it's probably just
something I'm doing wrong.
Brian
Re: directory usage question
Posted by Ted Yu <yu...@gmail.com>.
Can you post your hbase-site.xml ?
/apps/hbase/data/archive/data/default is where HFiles are archived (e.g.
when a column family is deleted, HFiles for this column family are stored
here).
/apps/hbase/data/data/default seems to be your hbase.rootdir
bq. a problem I'm having running map/reduce jobs against snapshots
Can you describe the problem in a bit more detail ?
Cheers
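Putting the two paths in context, the layout under hbase.rootdir looks roughly like this in 0.98 (only the directories discussed in this thread are shown):

```
/apps/hbase/data                <- hbase.rootdir
|-- data/default/               <- live HFiles for tables in the 'default' namespace
|-- archive/data/default/       <- HFiles no longer live (compacted away, dropped
|                                  column families) but possibly still referenced
|                                  by snapshots; cleaned by the master when unreferenced
`-- .hbase-snapshot/            <- snapshot metadata (manifests), not the data itself
```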
On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
brian.jeltema@digitalenvoy.net> wrote: