Posted to user@hbase.apache.org by Oleg Ruchovets <or...@gmail.com> on 2012/08/28 18:29:44 UTC

bulk loading problem

Hi,
   I am in the process of writing my first bulk loading job. I am using
Cloudera CDH3u3 with HBase 0.90.4.

Running the job, I can see the HFiles it created after it finished, but there
are no entries in HBase: in the hbase shell, count 'uu_bulk' returns 0.

Here is my job configuration:

        Configuration conf = HBaseConfiguration.create();

        Job job = new Job(conf, getClass().getSimpleName());

        job.setJarByClass(UuPushMapReduceJobFactory.class);
        job.setMapperClass(UuPushMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        job.setOutputFormatClass(HFileOutputFormat.class);

        String path = uuAggregationContext.getUuInputPath();
        String outputPath = "/bulk_loading_hbase/output/" + System.currentTimeMillis();
        LOG.info("path = " + path);
        LOG.info("outputPath = " + outputPath);

        final String tableName = "uu_bulk";
        LOG.info("hbase tableName: " + tableName);
        createRegions(conf, Bytes.toBytes(tableName));

        FileInputFormat.addInputPath(job, new Path(path));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));

        HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, tableName));
//=====================================================================================
The reducer log ends with:

2012-08-28 11:53:40,643 INFO org.apache.hadoop.mapred.Merger: Down to
the last merge-pass, with 10 segments left of total size: 222885367
bytes
2012-08-28 11:53:54,137 INFO
org.apache.hadoop.hbase.mapreduce.HFileOutputFormat:
Writer=hdfs://hdn16/bulk_loading_hbase/output/1346194117045/_temporary/_attempt_201208260949_0026_r_000005_0/d/3908303205246218823,
wrote=268435455
2012-08-28 11:54:11,966 INFO org.apache.hadoop.mapred.Task:
Task:attempt_201208260949_0026_r_000005_0 is done. And is in the
process of commiting
2012-08-28 11:54:12,975 INFO org.apache.hadoop.mapred.Task: Task
attempt_201208260949_0026_r_000005_0 is allowed to commit now
2012-08-28 11:54:13,007 INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved
output of task 'attempt_201208260949_0026_r_000005_0' to
/bulk_loading_hbase/output/1346194117045
2012-08-28 11:54:13,009 INFO org.apache.hadoop.mapred.Task: Task
'attempt_201208260949_0026_r_000005_0' done.
2012-08-28 11:54:13,010 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1

As I understand it, the HFiles were written
to /bulk_loading_hbase/output/1346194117045, but I don't see any activity
related to moving the HFiles into HBase.

What am I doing wrong? What do I need to do to get the result written to
HBase?

Thanks in advance
Oleg.

Re: bulk loading problem

Posted by Oleg Ruchovets <or...@gmail.com>.
Great.
It works!!!!


Re: bulk loading problem

Posted by Igal Shilman <ig...@wix.com>.
As suggested by the book, take a look at the
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles class.

This tool expects two arguments: (1) the path to the generated HFiles (in
your case, outputPath) and (2) the target table.
To use it programmatically, you can either invoke it via ToolRunner or
call LoadIncrementalHFiles.doBulkLoad() yourself
(after your M/R job has successfully finished).

If you are loading into an existing table, then (following your code):

        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path(outputPath), new HTable(conf, tableName));


Otherwise:

        int ret = ToolRunner.run(new LoadIncrementalHFiles(conf),
                new String[] { outputPath, tableName });
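
Putting it together with your job setup, a minimal end-to-end sketch (assuming
the job, conf, outputPath and tableName variables from your code, and that the
table already exists):

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

        // run the M/R job and block until it finishes
        boolean success = job.waitForCompletion(true);
        if (success) {
            // hand the generated HFiles under outputPath to the region servers
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(new Path(outputPath), new HTable(conf, tableName));
        }

Note that doBulkLoad() returns void and throws an IOException on failure, so
there is no return code to check in this variant.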



Good luck,
Igal.


Re: bulk loading problem

Posted by Oleg Ruchovets <or...@gmail.com>.
Hi Igal, thank you for the quick response.
   Can I execute this step programmatically?

From the link you sent:

9.8.5. Advanced Usage

Although the importtsv tool is useful in many cases, advanced users may
want to generate data programmatically, or import data from other formats.
To get started doing so, dig into ImportTsv.java and check the JavaDoc for
HFileOutputFormat.

The import step of the bulk load can also be done programmatically. See the
LoadIncrementalHFiles class for more information.
The question is: what should I add to my job so that the generated HFiles
are written to HBase programmatically?





Re: bulk loading problem

Posted by Igal Shilman <ig...@wix.com>.
Hi,
You need to complete the bulk load step.
Check out section 9.8.2 of http://hbase.apache.org/book/arch.bulk.load.html
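
That section describes the completebulkload tool, which moves the generated
HFiles into the table. Roughly (substitute the actual HBase jar name from
your CDH install, and your own output path):

    hadoop jar hbase-VERSION.jar completebulkload <path to your HFiles> uu_bulk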

Igal.
