Posted to user@hbase.apache.org by Brian Jeltema <br...@digitalenvoy.net> on 2014/09/06 15:09:51 UTC

directory usage question

I'm trying to track down a problem I'm having running map/reduce jobs against snapshots.
Can someone explain the difference between files stored in:

    /apps/hbase/data/archive/data/default

and files stored in 

    /apps/hbase/data/data/default

(Hadoop 2.4, HBase 0.98)

Thanks

Re: directory usage question

Posted by Brian Jeltema <br...@digitalenvoy.net>.
> Your cluster is an insecure HBase deployment, right ?

Yes

> 
> Are all files under /apps/hbase/data/archive/data/default owned by user
> 'hdfs' ?

No. However, the ownership failure isn’t what I’m concerned about; I understand what caused that.
But the stack trace illustrates behavior of initTableSnapshotMapperJob that I didn’t expect, and
I’m just trying to understand what it’s doing.

> 
> BTW, at the tip of 0.98, with HBASE-11742, the related code looks a bit different.
> 
> Cheers
> 
> 
> On Sun, Sep 7, 2014 at 8:27 AM, Brian Jeltema <
> brian.jeltema@digitalenvoy.net> wrote:
> 
>> 
>>> Eclipse doesn't show that RestoreSnapshotHelper.restoreHdfsRegions() is
>>> called by initTableSnapshotMapperJob (in master branch)
>>> 
>>> Looking at TableMapReduceUtil.java in 0.98, I don't see a direct relation
>>> between the two.
>>> 
>>> Do you have a stack trace or something else showing the relationship ?
>> 
>> Right. That’s what I meant by ‘indirectly’. This is a stack trace that was
>> caused by an ownership conflict:
>> 
>> java.io.IOException: java.util.concurrent.ExecutionException:
>> org.apache.hadoop.security.AccessControlException: Permission denied:
>> user=hbase, access=WRITE,
>> inode="/apps/hbase/data/archive/data/default/Host/c41d632d5eee02e1883215460e5c261d/p":hdfs:hdfs:drwxr-xr-x
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5509)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5491)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5465)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3608)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3578)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3552)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
>> at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>> at
>> org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:131)
>> at
>> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:475)
>> at
>> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:208)
>> at
>> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:733)
>> at
>> org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:397)
>> at
>> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:301)
>> at
>> net.digitalenvoy.hp.job.ParseHostnamesJob.run(ParseHostnamesJob.java:77)
>> at net.digitalenvoy.hp.HostProcessor.run(HostProcessor.java:165)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at net.digitalenvoy.hp.HostProcessor.main(HostProcessor.java:47)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>> 
>>> 
>>> Cheers
>>> 
>>> 
>>> On Sun, Sep 7, 2014 at 5:48 AM, Brian Jeltema <
>>> brian.jeltema@digitalenvoy.net> wrote:
>>> 
>>>> initTableSnapshotMapperJob writes into this directory (indirectly) via
>>>> RestoreSnapshotHelper.restoreHdfsRegions
>>>> 
>>>> Is this expected? I would have expected writes to be limited to the temp
>>>> directory passed in the init call
>>>> 
>>>> Brian
>>>> 
>>>> On Sep 7, 2014, at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
>>>> 
>>>>> The files under archive directory are referenced by snapshots.
>>>>> Please don't delete them manually.
>>>>> 
>>>>> You can delete unused snapshots.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> On Sep 7, 2014, at 4:08 AM, Brian Jeltema <
>>>> brian.jeltema@digitalenvoy.net> wrote:
>>>>> 
>>>>>> 
>>>>>> On Sep 6, 2014, at 9:32 AM, Ted Yu <yu...@gmail.com> wrote:
>>>>>> 
>>>>>>> Can you post your hbase-site.xml ?
>>>>>>> 
>>>>>>> /apps/hbase/data/archive/data/default is where HFiles are archived
>>>> (e.g.
>>>>>>> when a column family is deleted, HFiles for this column family are
>>>> stored
>>>>>>> here).
>>>>>>> /apps/hbase/data/data/default seems to be your hbase.rootdir
>>>>>> 
>>>>>> hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I
>> think
>>>> that's the default that Ambari creates.
>>>>>> 
>>>>>> So the HFiles in the archive subdirectory have been discarded and can
>>>> be deleted safely?
>>>>>> 
>>>>>>> bq. a problem I'm having running map/reduce jobs against snapshots
>>>>>>> 
>>>>>>> Can you describe the problem in a bit more detail ?
>>>>>> 
>>>>>> I don't understand what I'm seeing well enough to ask an intelligent
>>>> question yet.
>>>>>> I appear to be scanning duplicate rows when using
>>>> initTableSnapshotMapperJob,
>>>>>> but I'm trying to get a better understanding of how this works, since
>>>> it's probably just
>>>>>> something I'm doing wrong.
>>>>>> 
>>>>>> Brian
>>>>>> 
>>>>>>> Cheers
>>>>>>> 
>>>>>>> 
>>>>>>> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
>>>>>>> brian.jeltema@digitalenvoy.net> wrote:
>>>>>>> 
>>>>>>>> I'm trying to track down a problem I'm having running map/reduce
>> jobs
>>>>>>>> against snapshots.
>>>>>>>> Can someone explain the difference between files stored in:
>>>>>>>> 
>>>>>>>> /apps/hbase/data/archive/data/default
>>>>>>>> 
>>>>>>>> and files stored in
>>>>>>>> 
>>>>>>>> /apps/hbase/data/data/default
>>>>>>>> 
>>>>>>>> (Hadoop 2.4, HBase 0.98)
>>>>>>>> 
>>>>>>>> Thanks
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 


Re: directory usage question

Posted by Ted Yu <yu...@gmail.com>.
Your cluster is an insecure HBase deployment, right ?

Are all files under /apps/hbase/data/archive/data/default owned by user
'hdfs' ?

BTW, at the tip of 0.98, with HBASE-11742, the related code looks a bit different.

Cheers


On Sun, Sep 7, 2014 at 8:27 AM, Brian Jeltema <
brian.jeltema@digitalenvoy.net> wrote:

>
> > Eclipse doesn't show that RestoreSnapshotHelper.restoreHdfsRegions() is
> > called by initTableSnapshotMapperJob (in master branch)
> >
> > Looking at TableMapReduceUtil.java in 0.98, I don't see a direct relation
> > between the two.
> >
> > Do you have a stack trace or something else showing the relationship ?
>
> Right. That’s what I meant by ‘indirectly’. This is a stack trace that was
> caused by an ownership conflict:
>
> java.io.IOException: java.util.concurrent.ExecutionException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=hbase, access=WRITE,
> inode="/apps/hbase/data/archive/data/default/Host/c41d632d5eee02e1883215460e5c261d/p":hdfs:hdfs:drwxr-xr-x
>  at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5509)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5491)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5465)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3608)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3578)
>  at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3552)
>  at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
>  at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
>  at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:396)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>  at
> org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:131)
>  at
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:475)
>  at
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:208)
>  at
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:733)
>  at
> org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:397)
>  at
> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:301)
>  at
> net.digitalenvoy.hp.job.ParseHostnamesJob.run(ParseHostnamesJob.java:77)
>  at net.digitalenvoy.hp.HostProcessor.run(HostProcessor.java:165)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>  at net.digitalenvoy.hp.HostProcessor.main(HostProcessor.java:47)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>
> >
> > Cheers
> >
> >
> > On Sun, Sep 7, 2014 at 5:48 AM, Brian Jeltema <
> > brian.jeltema@digitalenvoy.net> wrote:
> >
> >> initTableSnapshotMapperJob writes into this directory (indirectly) via
> >> RestoreSnapshotHelper.restoreHdfsRegions
> >>
> >> Is this expected? I would have expected writes to be limited to the temp
> >> directory passed in the init call
> >>
> >> Brian
> >>
> >> On Sep 7, 2014, at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
> >>
> >>> The files under archive directory are referenced by snapshots.
> >>> Please don't delete them manually.
> >>>
> >>> You can delete unused snapshots.
> >>>
> >>> Cheers
> >>>
> >>> On Sep 7, 2014, at 4:08 AM, Brian Jeltema <
> >> brian.jeltema@digitalenvoy.net> wrote:
> >>>
> >>>>
> >>>> On Sep 6, 2014, at 9:32 AM, Ted Yu <yu...@gmail.com> wrote:
> >>>>
> >>>>> Can you post your hbase-site.xml ?
> >>>>>
> >>>>> /apps/hbase/data/archive/data/default is where HFiles are archived
> >> (e.g.
> >>>>> when a column family is deleted, HFiles for this column family are
> >> stored
> >>>>> here).
> >>>>> /apps/hbase/data/data/default seems to be your hbase.rootdir
> >>>>
> >>>> hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I
> think
> >> that's the default that Ambari creates.
> >>>>
> >>>> So the HFiles in the archive subdirectory have been discarded and can
> >> be deleted safely?
> >>>>
> >>>>> bq. a problem I'm having running map/reduce jobs against snapshots
> >>>>>
> >>>>> Can you describe the problem in a bit more detail ?
> >>>>
> >>>> I don't understand what I'm seeing well enough to ask an intelligent
> >> question yet.
> >>>> I appear to be scanning duplicate rows when using
> >> initTableSnapshotMapperJob,
> >>>> but I'm trying to get a better understanding of how this works, since
> >> it's probably just
> >>>> something I'm doing wrong.
> >>>>
> >>>> Brian
> >>>>
> >>>>> Cheers
> >>>>>
> >>>>>
> >>>>> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
> >>>>> brian.jeltema@digitalenvoy.net> wrote:
> >>>>>
> >>>>>> I'm trying to track down a problem I'm having running map/reduce
> jobs
> >>>>>> against snapshots.
> >>>>>> Can someone explain the difference between files stored in:
> >>>>>>
> >>>>>> /apps/hbase/data/archive/data/default
> >>>>>>
> >>>>>> and files stored in
> >>>>>>
> >>>>>> /apps/hbase/data/data/default
> >>>>>>
> >>>>>> (Hadoop 2.4, HBase 0.98)
> >>>>>>
> >>>>>> Thanks
> >>>>
> >>>
> >>
> >>
>
>

Re: directory usage question

Posted by Brian Jeltema <br...@digitalenvoy.net>.
> Eclipse doesn't show that RestoreSnapshotHelper.restoreHdfsRegions() is
> called by initTableSnapshotMapperJob (in master branch)
> 
> Looking at TableMapReduceUtil.java in 0.98, I don't see a direct relation
> between the two.
> 
> Do you have a stack trace or something else showing the relationship ?

Right. That’s what I meant by ‘indirectly’. This is a stack trace that was caused by an ownership conflict:

java.io.IOException: java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/apps/hbase/data/archive/data/default/Host/c41d632d5eee02e1883215460e5c261d/p":hdfs:hdfs:drwxr-xr-x
 at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
 at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
 at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
 at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5509)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5491)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5465)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3608)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3578)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3552)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
 at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:131)
 at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:475)
 at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:208)
 at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:733)
 at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:397)
 at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:301)
 at net.digitalenvoy.hp.job.ParseHostnamesJob.run(ParseHostnamesJob.java:77)
 at net.digitalenvoy.hp.HostProcessor.run(HostProcessor.java:165)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at net.digitalenvoy.hp.HostProcessor.main(HostProcessor.java:47)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

> 
> Cheers
> 
> 
> On Sun, Sep 7, 2014 at 5:48 AM, Brian Jeltema <
> brian.jeltema@digitalenvoy.net> wrote:
> 
>> initTableSnapshotMapperJob writes into this directory (indirectly) via
>> RestoreSnapshotHelper.restoreHdfsRegions
>> 
>> Is this expected? I would have expected writes to be limited to the temp
>> directory passed in the init call
>> 
>> Brian
>> 
>> On Sep 7, 2014, at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
>> 
>>> The files under archive directory are referenced by snapshots.
>>> Please don't delete them manually.
>>> 
>>> You can delete unused snapshots.
>>> 
>>> Cheers
>>> 
>>> On Sep 7, 2014, at 4:08 AM, Brian Jeltema <
>> brian.jeltema@digitalenvoy.net> wrote:
>>> 
>>>> 
>>>> On Sep 6, 2014, at 9:32 AM, Ted Yu <yu...@gmail.com> wrote:
>>>> 
>>>>> Can you post your hbase-site.xml ?
>>>>> 
>>>>> /apps/hbase/data/archive/data/default is where HFiles are archived
>> (e.g.
>>>>> when a column family is deleted, HFiles for this column family are
>> stored
>>>>> here).
>>>>> /apps/hbase/data/data/default seems to be your hbase.rootdir
>>>> 
>>>> hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I think
>> that's the default that Ambari creates.
>>>> 
>>>> So the HFiles in the archive subdirectory have been discarded and can
>> be deleted safely?
>>>> 
>>>>> bq. a problem I'm having running map/reduce jobs against snapshots
>>>>> 
>>>>> Can you describe the problem in a bit more detail ?
>>>> 
>>>> I don't understand what I'm seeing well enough to ask an intelligent
>> question yet.
>>>> I appear to be scanning duplicate rows when using
>> initTableSnapshotMapperJob,
>>>> but I'm trying to get a better understanding of how this works, since
>> it's probably just
>>>> something I'm doing wrong.
>>>> 
>>>> Brian
>>>> 
>>>>> Cheers
>>>>> 
>>>>> 
>>>>> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
>>>>> brian.jeltema@digitalenvoy.net> wrote:
>>>>> 
>>>>>> I'm trying to track down a problem I'm having running map/reduce jobs
>>>>>> against snapshots.
>>>>>> Can someone explain the difference between files stored in:
>>>>>> 
>>>>>> /apps/hbase/data/archive/data/default
>>>>>> 
>>>>>> and files stored in
>>>>>> 
>>>>>> /apps/hbase/data/data/default
>>>>>> 
>>>>>> (Hadoop 2.4, HBase 0.98)
>>>>>> 
>>>>>> Thanks
>>>> 
>>> 
>> 
>> 


Re: directory usage question

Posted by Ted Yu <yu...@gmail.com>.
Eclipse doesn't show that RestoreSnapshotHelper.restoreHdfsRegions() is
called by initTableSnapshotMapperJob (in master branch)

Looking at TableMapReduceUtil.java in 0.98, I don't see a direct relation
between the two.

Do you have a stack trace or something else showing the relationship ?

Cheers


On Sun, Sep 7, 2014 at 5:48 AM, Brian Jeltema <
brian.jeltema@digitalenvoy.net> wrote:

> initTableSnapshotMapperJob writes into this directory (indirectly) via
> RestoreSnapshotHelper.restoreHdfsRegions
>
> Is this expected? I would have expected writes to be limited to the temp
> directory passed in the init call
>
> Brian
>
> On Sep 7, 2014, at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > The files under archive directory are referenced by snapshots.
> > Please don't delete them manually.
> >
> > You can delete unused snapshots.
> >
> > Cheers
> >
> > On Sep 7, 2014, at 4:08 AM, Brian Jeltema <
> brian.jeltema@digitalenvoy.net> wrote:
> >
> >>
> >> On Sep 6, 2014, at 9:32 AM, Ted Yu <yu...@gmail.com> wrote:
> >>
> >>> Can you post your hbase-site.xml ?
> >>>
> >>> /apps/hbase/data/archive/data/default is where HFiles are archived
> (e.g.
> >>> when a column family is deleted, HFiles for this column family are
> stored
> >>> here).
> >>> /apps/hbase/data/data/default seems to be your hbase.rootdir
> >>
> >> hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I think
> that's the default that Ambari creates.
> >>
> >> So the HFiles in the archive subdirectory have been discarded and can
> be deleted safely?
> >>
> >>> bq. a problem I'm having running map/reduce jobs against snapshots
> >>>
> >>> Can you describe the problem in a bit more detail ?
> >>
> >> I don't understand what I'm seeing well enough to ask an intelligent
> question yet.
> >> I appear to be scanning duplicate rows when using
> initTableSnapshotMapperJob,
> >> but I'm trying to get a better understanding of how this works, since
> it's probably just
> >> something I'm doing wrong.
> >>
> >> Brian
> >>
> >>> Cheers
> >>>
> >>>
> >>> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
> >>> brian.jeltema@digitalenvoy.net> wrote:
> >>>
> >>>> I'm trying to track down a problem I'm having running map/reduce jobs
> >>>> against snapshots.
> >>>> Can someone explain the difference between files stored in:
> >>>>
> >>>>  /apps/hbase/data/archive/data/default
> >>>>
> >>>> and files stored in
> >>>>
> >>>>  /apps/hbase/data/data/default
> >>>>
> >>>> (Hadoop 2.4, HBase 0.98)
> >>>>
> >>>> Thanks
> >>
> >
>
>

Re: directory usage question

Posted by Brian Jeltema <br...@digitalenvoy.net>.
initTableSnapshotMapperJob writes into this directory (indirectly) via RestoreSnapshotHelper.restoreHdfsRegions

Is this expected? I would have expected writes to be limited to the temp directory passed in the init call
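
For reference, the call has roughly this shape (a sketch, not my actual job; the
snapshot name, the mapper class and the restore path below are placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanSketch {

  // Stand-in for whatever TableMapper subclass the real job uses.
  public static class HostMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(key, value);  // identity map, just for illustration
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-snapshot");
    job.setJarByClass(SnapshotScanSketch.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);  // map-only sketch; discard output

    Scan scan = new Scan();
    scan.setCacheBlocks(false);  // usual advice for MR scans

    // The last argument is the temp/restore directory that I expected
    // all of the restore writes to be confined to.
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "host_snapshot",                     // snapshot name (placeholder)
        scan,
        HostMapper.class,
        ImmutableBytesWritable.class,
        Result.class,
        job,
        true,                                // addDependencyJars
        new Path("/tmp/snapshot-restore"));  // restore dir passed to the init call

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}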

Brian

On Sep 7, 2014, at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:

> The files under archive directory are referenced by snapshots. 
> Please don't delete them manually. 
> 
> You can delete unused snapshots. 
> 
> Cheers
> 
> On Sep 7, 2014, at 4:08 AM, Brian Jeltema <br...@digitalenvoy.net> wrote:
> 
>> 
>> On Sep 6, 2014, at 9:32 AM, Ted Yu <yu...@gmail.com> wrote:
>> 
>>> Can you post your hbase-site.xml ?
>>> 
>>> /apps/hbase/data/archive/data/default is where HFiles are archived (e.g.
>>> when a column family is deleted, HFiles for this column family are stored
>>> here).
>>> /apps/hbase/data/data/default seems to be your hbase.rootdir
>> 
>> hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I think that's the default that Ambari creates.
>> 
>> So the HFiles in the archive subdirectory have been discarded and can be deleted safely? 
>> 
>>> bq. a problem I'm having running map/reduce jobs against snapshots
>>> 
>>> Can you describe the problem in a bit more detail ?
>> 
>> I don't understand what I'm seeing well enough to ask an intelligent question yet.
>> I appear to be scanning duplicate rows when using initTableSnapshotMapperJob,
>> but I'm trying to get a better understanding of how this works, since it's probably just
>> something I'm doing wrong.
>> 
>> Brian
>> 
>>> Cheers
>>> 
>>> 
>>> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
>>> brian.jeltema@digitalenvoy.net> wrote:
>>> 
>>>> I'm trying to track down a problem I'm having running map/reduce jobs
>>>> against snapshots.
>>>> Can someone explain the difference between files stored in:
>>>> 
>>>>  /apps/hbase/data/archive/data/default
>>>> 
>>>> and files stored in
>>>> 
>>>>  /apps/hbase/data/data/default
>>>> 
>>>> (Hadoop 2.4, HBase 0.98)
>>>> 
>>>> Thanks
>> 
> 


Re: directory usage question

Posted by Ted Yu <yu...@gmail.com>.
The files under archive directory are referenced by snapshots. 
Please don't delete them manually. 

You can delete unused snapshots. 
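
If you want to do that from code rather than the shell, a minimal sketch (the
snapshot name below is only an example) would be the following; once nothing
references the archived HFiles any more, the cleaner chore should eventually
remove them:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.protobuf.generated.HBaseProtos.SnapshotDescription;

public class SnapshotCleanup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // List what exists before removing anything.
      for (SnapshotDescription snap : admin.listSnapshots()) {
        System.out.println(snap.getName() + " (table " + snap.getTable() + ")");
      }
      admin.deleteSnapshot("old_host_snapshot");  // example name only
    } finally {
      admin.close();
    }
  }
}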

Cheers

On Sep 7, 2014, at 4:08 AM, Brian Jeltema <br...@digitalenvoy.net> wrote:

> 
> On Sep 6, 2014, at 9:32 AM, Ted Yu <yu...@gmail.com> wrote:
> 
>> Can you post your hbase-site.xml ?
>> 
>> /apps/hbase/data/archive/data/default is where HFiles are archived (e.g.
>> when a column family is deleted, HFiles for this column family are stored
>> here).
>> /apps/hbase/data/data/default seems to be your hbase.rootdir
> 
> hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I think that's the default that Ambari creates.
> 
> So the HFiles in the archive subdirectory have been discarded and can be deleted safely? 
> 
>> bq. a problem I'm having running map/reduce jobs against snapshots
>> 
>> Can you describe the problem in a bit more detail ?
> 
> I don't understand what I'm seeing well enough to ask an intelligent question yet.
> I appear to be scanning duplicate rows when using initTableSnapshotMapperJob,
> but I'm trying to get a better understanding of how this works, since it's probably just
> something I'm doing wrong.
> 
> Brian
> 
>> Cheers
>> 
>> 
>> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
>> brian.jeltema@digitalenvoy.net> wrote:
>> 
>>> I'm trying to track down a problem I'm having running map/reduce jobs
>>> against snapshots.
>>> Can someone explain the difference between files stored in:
>>> 
>>>   /apps/hbase/data/archive/data/default
>>> 
>>> and files stored in
>>> 
>>>   /apps/hbase/data/data/default
>>> 
>>> (Hadoop 2.4, HBase 0.98)
>>> 
>>> Thanks
> 

Re: directory usage question

Posted by Brian Jeltema <br...@digitalenvoy.net>.
On Sep 6, 2014, at 9:32 AM, Ted Yu <yu...@gmail.com> wrote:

> Can you post your hbase-site.xml ?
> 
> /apps/hbase/data/archive/data/default is where HFiles are archived (e.g.
> when a column family is deleted, HFiles for this column family are stored
> here).
> /apps/hbase/data/data/default seems to be your hbase.rootdir
> 
> 

hbase.rootdir is defined to be hdfs://foo:8020/apps/hbase/data. I think that's the default that Ambari creates.
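
To be concrete, the relevant hbase-site.xml entry is just ('foo' standing in for
the real namenode host, as above):

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://foo:8020/apps/hbase/data</value>
  </property>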

So the HFiles in the archive subdirectory have been discarded and can be deleted safely? 

> bq. a problem I'm having running map/reduce jobs against snapshots
> 
> Can you describe the problem in a bit more detail ?
> 
> 

I don't understand what I'm seeing well enough to ask an intelligent question yet.
I appear to be scanning duplicate rows when using initTableSnapshotMapperJob,
but I'm trying to get a better understanding of how this works, since it's probably just
something I'm doing wrong.

Brian

> Cheers
> 
> 
> On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
> brian.jeltema@digitalenvoy.net> wrote:
> 
>> I'm trying to track down a problem I'm having running map/reduce jobs
>> against snapshots.
>> Can someone explain the difference between files stored in:
>> 
>>    /apps/hbase/data/archive/data/default
>> 
>> and files stored in
>> 
>>    /apps/hbase/data/data/default
>> 
>> (Hadoop 2.4, HBase 0.98)
>> 
>> Thanks


Re: directory usage question

Posted by Ted Yu <yu...@gmail.com>.
Can you post your hbase-site.xml ?

/apps/hbase/data/archive/data/default is where HFiles are archived (e.g.
when a column family is deleted, HFiles for this column family are stored
here).
/apps/hbase/data/data/default seems to be your hbase.rootdir
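
The two trees largely mirror each other (namespace / table / region / column family).
If you want to compare them from the Java API instead of 'hdfs dfs -ls', a rough
sketch (paths copied from your mail) would be:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHBaseDirs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    // Live table data under hbase.rootdir, and its archive counterpart.
    for (String dir : new String[] { "/apps/hbase/data/data/default",
                                     "/apps/hbase/data/archive/data/default" }) {
      System.out.println(dir + ":");
      for (FileStatus st : fs.listStatus(new Path(dir))) {
        System.out.println("  " + st.getPath().getName() + "  owner=" + st.getOwner());
      }
    }
  }
}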

bq. a problem I'm having running map/reduce jobs against snapshots

Can you describe the problem in a bit more detail ?

Cheers


On Sat, Sep 6, 2014 at 6:09 AM, Brian Jeltema <
brian.jeltema@digitalenvoy.net> wrote:

> I'm trying to track down a problem I'm having running map/reduce jobs
> against snapshots.
> Can someone explain the difference between files stored in:
>
>     /apps/hbase/data/archive/data/default
>
> and files stored in
>
>     /apps/hbase/data/data/default
>
> (Hadoop 2.4, HBase 0.98)
>
> Thanks