Posted to user@hbase.apache.org by Daisy Zhou <da...@wibidata.com> on 2014/09/16 23:26:26 UTC

Bulk-loading HFiles after table split (on ACL enabled cluster)

Hi,

I can't find mention of this issue on the Jira.  Is it known?  I think that
if a split of the HFiles is required, LoadIncrementalHFiles should create
the new HFiles with the correct permissions to be bulk-loaded. Currently it
just hangs because the permissions are wrong.

Here is how I reproduce my issue:

On a cluster with ACL enabled, I generate HFiles for a bulk-load, then
*force a table split*, and then attempt to bulk-load the HFiles.  The
bulk-load hangs (similar to when the hfiles' directory is not chown'ed
properly):

14/09/15 15:44:41 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/00000 first=\x00fs\xC0song-32\x00 last=\xFEI\x99~song-44\x00
14/09/15 15:44:41 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: HFile at hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/00000 no longer fits inside a single region. Splitting...
14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/_tmp/kiji.kiji_music.table.songs,1.bottom and hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/_tmp/kiji.kiji_music.table.songs,1.top
14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1 with 2 files remaining to group or split
14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/_tmp/kiji.kiji_music.table.songs,1.top first=c\xA8\x0D\x81song-9\x00 last=\xFEI\x99~song-44\x00
14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/_tmp/kiji.kiji_music.table.songs,1.bottom first=\x00fs\xC0song-32\x00 last=^49\xDEsong-13\x00


If I chmod -R 777 the directory and try again, the bulk load completes
successfully.
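
For concreteness, the workaround amounts to something like the sketch below,
written against the plain Hadoop and HBase client APIs rather than the actual
tooling used here. The recursive chmod helper is only illustrative, and the
table name is inferred from the split file names in the log above; the same
effect can also be had from the command line with hadoop fs -chmod -R 777
before re-running the load.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class ChmodThenBulkLoad {

  // Equivalent of `hadoop fs -chmod -R 777 <dir>`: open up the directory,
  // every file in it, and every subdirectory, so both the submitting user
  // and the hbase user can read and write them.
  static void chmod777Recursive(FileSystem fs, Path dir) throws Exception {
    fs.setPermission(dir, new FsPermission((short) 0777));
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDirectory()) {
        chmod777Recursive(fs, status.getPath());
      } else {
        fs.setPermission(status.getPath(), new FsPermission((short) 0777));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Directory holding one subdirectory per column family (B/ in the log above).
    Path hfileDir = new Path(
        "hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile");
    FileSystem fs = hfileDir.getFileSystem(conf);

    chmod777Recursive(fs, hfileDir);

    // Table name inferred from the split file names (kiji.kiji_music.table.songs,1.top etc.).
    HTable table = new HTable(conf, "kiji.kiji_music.table.songs");
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
    table.close();
  }
}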


Daisy

Re: Bulk-loading HFiles after table split (on ACL enabled cluster)

Posted by Daisy Zhou <da...@wibidata.com>.
All right, thank you.  I've modified my client code to chmod while the
bulk-load is running instead, since even if I manually chmod beforehand,
the newly split HFiles need to be chmod'd before the bulk-load can continue.
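
A rough sketch of that kind of client-side workaround, assuming the plain
HBase API rather than the actual client code discussed here: a background
thread keeps re-applying open permissions to the HFile directory while the
load runs, so the .top/.bottom files that LoadIncrementalHFiles writes under
_tmp after a split get picked up as well.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadWithBackgroundChmod {

  // Open up the directory, its files, and its subdirectories recursively.
  static void chmod777Recursive(FileSystem fs, Path dir) throws Exception {
    fs.setPermission(dir, new FsPermission((short) 0777));
    for (FileStatus status : fs.listStatus(dir)) {
      if (status.isDirectory()) {
        chmod777Recursive(fs, status.getPath());
      } else {
        fs.setPermission(status.getPath(), new FsPermission((short) 0777));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    final Path hfileDir = new Path(args[0]);   // e.g. .../hfile-output/part-r-00000.hfile
    String tableName = args[1];                // e.g. kiji.kiji_music.table.songs
    final FileSystem fs = hfileDir.getFileSystem(conf);

    // Best effort: re-open permissions every few seconds while the bulk load
    // runs, so split files created under _tmp are chmod'd before the region
    // servers try to open them. A sketch only, not production code.
    ScheduledExecutorService chmodder = Executors.newSingleThreadScheduledExecutor();
    chmodder.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        try {
          chmod777Recursive(fs, hfileDir);
        } catch (Exception e) {
          // Ignore and retry on the next pass.
        }
      }
    }, 0, 5, TimeUnit.SECONDS);

    try {
      HTable table = new HTable(conf, tableName);
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
      table.close();
    } finally {
      chmodder.shutdownNow();
    }
  }
}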

On Wed, Sep 17, 2014 at 5:28 PM, Matteo Bertozzi <th...@gmail.com>
wrote:

> yeah, in a non-secure cluster you have to do the chmod manually.
> there was discussion about implementing something like the SecureBulkLoadEndpoint
> even for the non-secure setup, but at the moment there is no jira/patch
> available.
> (the SecureBulkLoadEndpoint basically does a chmod 777 before starting
> the bulkload)
>
> Matteo

Re: Bulk-loading HFiles after table split (on ACL enabled cluster)

Posted by Matteo Bertozzi <th...@gmail.com>.
yeah, in a non-secure cluster you have to do the chmod manually.
there was discussion about implementing something like the SecureBulkLoadEndpoint
even for the non-secure setup, but at the moment there is no jira/patch
available.
(the SecureBulkLoadEndpoint basically does a chmod 777 before starting
the bulkload)

Matteo


On Wed, Sep 17, 2014 at 12:58 PM, Daisy Zhou <da...@wibidata.com> wrote:

> Thanks for the response, Matteo.
>
> My HBase is not a secure HBase; I only have ACL enabled on HDFS.  I did try
> adding the SecureBulkLoadEndpoint coprocessor to my HBase cluster, but I
> think it does something different, and it didn't help.
>
> I normally have to chmod -R a+rwx the HFile directory in order to bulk-load
> the files, because the hbase user and the current user both need write access.
> Even then, the newly created split HFiles do not have those same permissions
> unless I chmod them specifically.  Am I doing something wrong?
>
> Daisy

Re: Bulk-loading HFiles after table split (on ACL enabled cluster)

Posted by Daisy Zhou <da...@wibidata.com>.
Thanks for the response, Matteo.

My HBase is not a secure HBase; I only have ACL enabled on HDFS.  I did try
adding the SecureBulkLoadEndpoint coprocessor to my HBase cluster, but I
think it does something different, and it didn't help.

I normally have to chmod -R a+rwx the HFile directory in order to bulk-load
the files, because the hbase user and the current user both need write access.
Even then, the newly created split HFiles do not have those same permissions
unless I chmod them specifically.  Am I doing something wrong?

Daisy

On Tue, Sep 16, 2014 at 2:28 PM, Matteo Bertozzi <th...@gmail.com>
wrote:

> are you using the SecureBulkLoadEndpoint? that should take care of
> permissions
> http://hbase.apache.org/book/hbase.secure.bulkload.html
>
> Matteo

Re: Bulk-loading HFiles after table split (on ACL enabled cluster)

Posted by Matteo Bertozzi <th...@gmail.com>.
are you using the SecureBulkLoadEndpoint? that should take care of
permissions
http://hbase.apache.org/book/hbase.secure.bulkload.html
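
(for reference, per that section of the book, enabling it is roughly the
hbase-site.xml change below on the region servers; the staging directory
shown is just the documented default, and the exact settings depend on the
HBase version. hbase.coprocessor.region.classes is a comma-separated list,
so keep any coprocessors already configured there.)

<property>
  <name>hbase.bulkload.staging.dir</name>
  <value>/tmp/hbase-staging</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
</property>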

Matteo


On Tue, Sep 16, 2014 at 2:26 PM, Daisy Zhou <da...@wibidata.com> wrote:

> Hi,
>
> I can't find mention of this issue on the Jira.  Is it known?  I think that
> if a split of the HFiles is required, LoadIncrementalHFiles should create
> the new HFiles with the correct permissions to be bulk-loaded. Currently it
> just hangs because the permissions are wrong.
>
> Here is how I reproduce my issue:
>
> On a cluster with ACL enabled, I generate HFiles for a bulk-load, then
> *force a table split*, and then attempt to bulk-load the HFiles.  The
> bulk-load hangs (similar to when the hfiles' directory is not chown'ed
> properly):
>
> 14/09/15 15:44:41 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/00000 first=\x00fs\xC0song-32\x00 last=\xFEI\x99~song-44\x00
> 14/09/15 15:44:41 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: HFile at hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/00000 no longer fits inside a single region. Splitting...
> 14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/_tmp/kiji.kiji_music.table.songs,1.bottom and hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/_tmp/kiji.kiji_music.table.songs,1.top
> 14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1 with 2 files remaining to group or split
> 14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/_tmp/kiji.kiji_music.table.songs,1.top first=c\xA8\x0D\x81song-9\x00 last=\xFEI\x99~song-44\x00
> 14/09/15 15:44:42 INFO org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://bento:8020/user/daisy/kiji-mr-tutorial/hfile-output/part-r-00000.hfile/B/_tmp/kiji.kiji_music.table.songs,1.bottom first=\x00fs\xC0song-32\x00 last=^49\xDEsong-13\x00
>
>
> If I chmod -R 777 the directory and try again, the bulk load completes
> successfully.
>
>
> Daisy
>