Posted to user@hbase.apache.org by Shuja Rehman <sh...@gmail.com> on 2010/11/09 21:02:42 UTC

Bulk Load Sample Code

Hi

I am trying to investigate the bulk load option as described in the
following link.

http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html

Does anybody have sample code, or has anyone used it before?
Can it be helpful for inserting data into an existing table? In my scenario, I
have one table with 1 column family into which data will be inserted every 15
minutes.

Kindly share your experiences

Thanks
-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>
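
A minimal end-to-end driver sketch of the approach discussed in the replies
below: map the input to Puts, let HFileOutputFormat.configureIncrementalLoad()
wire up the output format, reducer count and TotalOrderPartitioner from the
existing table's regions, then load the generated HFiles with
LoadIncrementalHFiles. The table name "mytable", the column family "cf" and
the tab-separated toy input are placeholders, and the exact class and method
signatures should be checked against the HBase version in use.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.mapreduce.PutSortReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Toy mapper: each input line is "rowkey<TAB>value"; emits one Put per line.
  public static class TsvToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      if (parts.length < 2) return;
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("val"), Bytes.toBytes(parts[1]));
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulkload");
    job.setJarByClass(BulkLoadDriver.class);

    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    Path hfileDir = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, hfileDir);

    job.setMapperClass(TsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    job.setReducerClass(PutSortReducer.class);

    // Configures HFileOutputFormat, one reducer per region of the existing
    // table, and a TotalOrderPartitioner over the table's region boundaries.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);
    TableMapReduceUtil.addDependencyJars(job);

    if (!job.waitForCompletion(true)) System.exit(1);

    // Move the finished HFiles into the live table; this is the programmatic
    // equivalent of the completebulkload command line tool.
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
  }
}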

Re: Bulk Load Sample Code

Posted by Adam Phelps <am...@opendns.com>.
On 11/12/10 10:34 AM, Shuja Rehman wrote:
> @Adam
> Why u not use the configureIncrementalLoad() which automatically sets up a
> TotalOrderPartitioner ?

I just wasn't aware of this method until today.  If it produces a more
efficient result, then I'll be switching to it.

- Adam

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
@stack

There is no job running in the JobTracker interface, while the JobTracker log
shows that the job was submitted successfully, as shown below.

2010-11-12 10:17:35,795 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201011120442_0010 added successfully for user 'root' to queue 'default'
2010-11-12 10:17:35,795 INFO org.apache.hadoop.mapred.AuditLogger:
USER=root    IP=10.10.10.2   OPERATION=SUBMIT_JOB
TARGET=job_201011120442_0010    RESULT=SUCCESS

And in the TaskTracker log, there is nothing about this job; I think the job
was not assigned to any TaskTracker. About the jar mismatches: I have checked
the jars of the program and the server, and both have the same version, i.e.
Cloudera b3. I have also looked at the userlogs, and no folder is created
there for this job. Any other guess where to look?

@Adam
Why do you not use configureIncrementalLoad(), which automatically sets up a
TotalOrderPartitioner?





On Fri, Nov 12, 2010 at 11:10 PM, Adam Phelps <am...@opendns.com> wrote:

> On 11/10/10 11:57 AM, Stack wrote:
>
>> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman<sh...@gmail.com>
>>  wrote:
>>
>>> oh! I think u have not read the full post. The essay has 3 paragraphs  :)
>>>
>>> *Should I need to add the following line also
>>>
>>>  job.setPartitionerClass(TotalOrderPartitioner.class);
>>>
>>>
>> You need to specify other than default partitioner so yes, above seems
>> necessary (Be aware that if only one reducer, all may appear to work
>> though your partitioner is bad... its when you have multiple reducers
>> that bad partitioner will show).
>>
>
> I skimmed over this thread as we've been using LoadIncrementalHFiles to
> load the output of our MR jobs, however it looks like we're using
> HRegionPartitioner rather than TotalOrderPartitioner.  The current code is
> definitely working, however the page regarding bulk loads that was posted
> earlier implies that TotalOrderPartitioner is best for efficiency.  What is
> the difference between the two?
>
> - Adam
>



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Bulk Load Sample Code

Posted by Alex Baranau <al...@gmail.com>.
I believe "by chance" here = *1* reduce job.

Alex Baranau
----
Sematext :: http://sematext.com/
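
In other words: the partition number a Partitioner returns is only meaningful
modulo the number of reduce tasks, so with a single reducer every key lands in
partition 0 and even a badly chosen partitioner appears to work. A tiny,
HBase-independent illustration using Hadoop's standard HashPartitioner (the
class names here are just for the demo):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class OneReducerDemo {
  public static void main(String[] args) {
    HashPartitioner<Text, IntWritable> p = new HashPartitioner<Text, IntWritable>();
    IntWritable v = new IntWritable(1);
    // With 1 reduce task every key goes to partition 0, so a wrong or missing
    // partitioner is invisible.
    System.out.println(p.getPartition(new Text("row-a"), v, 1)); // 0
    System.out.println(p.getPartition(new Text("row-z"), v, 1)); // 0
    // With several reduce tasks the partitioner choice starts to matter;
    // the default hash gives no total ordering of keys across reducers.
    System.out.println(p.getPartition(new Text("row-a"), v, 4));
    System.out.println(p.getPartition(new Text("row-z"), v, 4));
  }
}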

On Mon, Nov 15, 2010 at 7:09 PM, Todd Lipcon <to...@cloudera.com> wrote:

> On Mon, Nov 15, 2010 at 12:24 AM, Shuja Rehman <shujamughal@gmail.com
> >wrote:
>
> > If HRegionPartitioner works correctly then what is the use of
> > configureIncrementalLoad() as discussed here
> >
> > http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >
> > and why this link did not discuss about HRegionPartitioner? Where is the
> > documentaion of HRegionPartitioner and in which case we give preference
> to
> > HRegionPartitioner  over configureIncrementalLoad(). What is the
> difference
> > in both?
> >
> >
> Please read my email again - I'm surprised that it works correctly - if it
> does work correctly it's by chance. It should *not* be used for incremental
> load.
>
> -Todd
>
>
> >
> > On Sat, Nov 13, 2010 at 5:12 AM, Todd Lipcon <to...@cloudera.com> wrote:
> >
> > > I'm surprised that HRegionPartitioner works correctly for incremental
> > load.
> > > It definitely won't work if the regions are also shifting during the MR
> > > job.
> > >
> > > Thanks
> > > -Todd
> > >
> > > On Fri, Nov 12, 2010 at 10:10 AM, Adam Phelps <am...@opendns.com> wrote:
> > >
> > > > On 11/10/10 11:57 AM, Stack wrote:
> > > >
> > > >> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman<
> shujamughal@gmail.com>
> > > >>  wrote:
> > > >>
> > > >>> oh! I think u have not read the full post. The essay has 3
> paragraphs
> > >  :)
> > > >>>
> > > >>> *Should I need to add the following line also
> > > >>>
> > > >>>  job.setPartitionerClass(TotalOrderPartitioner.class);
> > > >>>
> > > >>>
> > > >> You need to specify other than default partitioner so yes, above
> seems
> > > >> necessary (Be aware that if only one reducer, all may appear to work
> > > >> though your partitioner is bad... its when you have multiple
> reducers
> > > >> that bad partitioner will show).
> > > >>
> > > >
> > > > I skimmed over this thread as we've been using LoadIncrementalHFiles
> to
> > > > load the output of our MR jobs, however it looks like we're using
> > > > HRegionPartitioner rather than TotalOrderPartitioner.  The current
> code
> > > is
> > > > definitely working, however the page regarding bulk loads that was
> > posted
> > > > earlier implies that TotalOrderPartitioner is best for efficiency.
> >  What
> > > is
> > > > the difference between the two?
> > > >
> > > > - Adam
> > > >
> > >
> > >
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
> >
> >
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Bulk Load Sample Code

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Nov 15, 2010 at 12:24 AM, Shuja Rehman <sh...@gmail.com>wrote:

> If HRegionPartitioner works correctly then what is the use of
> configureIncrementalLoad() as discussed here
>
> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>
> and why this link did not discuss about HRegionPartitioner? Where is the
> documentaion of HRegionPartitioner and in which case we give preference to
> HRegionPartitioner  over configureIncrementalLoad(). What is the difference
> in both?
>
>
Please read my email again - I'm surprised that it works correctly - if it
does work correctly it's by chance. It should *not* be used for incremental
load.

-Todd


>
> On Sat, Nov 13, 2010 at 5:12 AM, Todd Lipcon <to...@cloudera.com> wrote:
>
> > I'm surprised that HRegionPartitioner works correctly for incremental
> load.
> > It definitely won't work if the regions are also shifting during the MR
> > job.
> >
> > Thanks
> > -Todd
> >
> > On Fri, Nov 12, 2010 at 10:10 AM, Adam Phelps <am...@opendns.com> wrote:
> >
> > > On 11/10/10 11:57 AM, Stack wrote:
> > >
> > >> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman<sh...@gmail.com>
> > >>  wrote:
> > >>
> > >>> oh! I think u have not read the full post. The essay has 3 paragraphs
> >  :)
> > >>>
> > >>> *Should I need to add the following line also
> > >>>
> > >>>  job.setPartitionerClass(TotalOrderPartitioner.class);
> > >>>
> > >>>
> > >> You need to specify other than default partitioner so yes, above seems
> > >> necessary (Be aware that if only one reducer, all may appear to work
> > >> though your partitioner is bad... its when you have multiple reducers
> > >> that bad partitioner will show).
> > >>
> > >
> > > I skimmed over this thread as we've been using LoadIncrementalHFiles to
> > > load the output of our MR jobs, however it looks like we're using
> > > HRegionPartitioner rather than TotalOrderPartitioner.  The current code
> > is
> > > definitely working, however the page regarding bulk loads that was
> posted
> > > earlier implies that TotalOrderPartitioner is best for efficiency.
>  What
> > is
> > > the difference between the two?
> > >
> > > - Adam
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
If HRegionPartitioner works correctly, then what is the use of
configureIncrementalLoad() as discussed here:

http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html

And why does this link not discuss HRegionPartitioner? Where is the
documentation of HRegionPartitioner, and in which case do we prefer
HRegionPartitioner over configureIncrementalLoad()? What is the difference
between the two?


On Sat, Nov 13, 2010 at 5:12 AM, Todd Lipcon <to...@cloudera.com> wrote:

> I'm surprised that HRegionPartitioner works correctly for incremental load.
> It definitely won't work if the regions are also shifting during the MR
> job.
>
> Thanks
> -Todd
>
> On Fri, Nov 12, 2010 at 10:10 AM, Adam Phelps <am...@opendns.com> wrote:
>
> > On 11/10/10 11:57 AM, Stack wrote:
> >
> >> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman<sh...@gmail.com>
> >>  wrote:
> >>
> >>> oh! I think u have not read the full post. The essay has 3 paragraphs
>  :)
> >>>
> >>> *Should I need to add the following line also
> >>>
> >>>  job.setPartitionerClass(TotalOrderPartitioner.class);
> >>>
> >>>
> >> You need to specify other than default partitioner so yes, above seems
> >> necessary (Be aware that if only one reducer, all may appear to work
> >> though your partitioner is bad... its when you have multiple reducers
> >> that bad partitioner will show).
> >>
> >
> > I skimmed over this thread as we've been using LoadIncrementalHFiles to
> > load the output of our MR jobs, however it looks like we're using
> > HRegionPartitioner rather than TotalOrderPartitioner.  The current code
> is
> > definitely working, however the page regarding bulk loads that was posted
> > earlier implies that TotalOrderPartitioner is best for efficiency.  What
> is
> > the difference between the two?
> >
> > - Adam
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Bulk Load Sample Code

Posted by Todd Lipcon <to...@cloudera.com>.
I'm surprised that HRegionPartitioner works correctly for incremental load.
It definitely won't work if the regions are also shifting during the MR job.

Thanks
-Todd

On Fri, Nov 12, 2010 at 10:10 AM, Adam Phelps <am...@opendns.com> wrote:

> On 11/10/10 11:57 AM, Stack wrote:
>
>> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman<sh...@gmail.com>
>>  wrote:
>>
>>> oh! I think u have not read the full post. The essay has 3 paragraphs  :)
>>>
>>> *Should I need to add the following line also
>>>
>>>  job.setPartitionerClass(TotalOrderPartitioner.class);
>>>
>>>
>> You need to specify other than default partitioner so yes, above seems
>> necessary (Be aware that if only one reducer, all may appear to work
>> though your partitioner is bad... its when you have multiple reducers
>> that bad partitioner will show).
>>
>
> I skimmed over this thread as we've been using LoadIncrementalHFiles to
> load the output of our MR jobs, however it looks like we're using
> HRegionPartitioner rather than TotalOrderPartitioner.  The current code is
> definitely working, however the page regarding bulk loads that was posted
> earlier implies that TotalOrderPartitioner is best for efficiency.  What is
> the difference between the two?
>
> - Adam
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Bulk Load Sample Code

Posted by Adam Phelps <am...@opendns.com>.
On 11/10/10 11:57 AM, Stack wrote:
> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman<sh...@gmail.com>  wrote:
>> oh! I think u have not read the full post. The essay has 3 paragraphs  :)
>>
>> *Should I need to add the following line also
>>
>>   job.setPartitionerClass(TotalOrderPartitioner.class);
>>
>
> You need to specify other than default partitioner so yes, above seems
> necessary (Be aware that if only one reducer, all may appear to work
> though your partitioner is bad... its when you have multiple reducers
> that bad partitioner will show).

I skimmed over this thread since we've been using LoadIncrementalHFiles to
load the output of our MR jobs; however, it looks like we're using
HRegionPartitioner rather than TotalOrderPartitioner.  The current code
is definitely working, but the page regarding bulk loads that was
posted earlier implies that TotalOrderPartitioner is best for
efficiency.  What is the difference between the two?

- Adam

Re: Bulk Load Sample Code

Posted by Stack <st...@duboce.net>.
Look in the JobTracker interface.  Figure out which jobs are running.  Are
they progressing?  Probably not.  Go to the machine that is hosting
the task.  Tail the TaskTracker logs.  Figure out who the map is trying to
talk to.  Stack dump the map task for good measure to figure out where it's
blocked.  If it's trying to talk to HBase or HDFS, go to where it's
connecting.  Tail the logs there and/or take a stack trace.

The task times out after ten minutes.  Check the logs then.  There might be a clue.

Sounds like mismatched jars or something your program is doing.

St.Ack


On Fri, Nov 12, 2010 at 8:01 AM, Shuja Rehman <sh...@gmail.com> wrote:
> St.Ack,
>
> Here is one new problem now. When I run the job with 1 file, everything goes
> smooth. but when i give a set of files as input, it stuck and did not doing
> anything. here is output.
>
>
> 10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Looking up current
> regions for table org.apache.hadoop.hbase.client.HTable@55bb93
> 10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce
> partitions to match current region count
> 10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Writing partition
> information to
> hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289576574949
> 10/11/12 07:42:55 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 10/11/12 07:42:55 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 10/11/12 07:42:55 INFO compress.CodecPool: Got brand-new compressor
> 10/11/12 07:42:55 INFO mapreduce.HFileOutputFormat: Incremental table output
> configured.
> 10/11/12 07:42:55 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 10/11/12 07:42:56 INFO input.FileInputFormat: Total input paths to process :
> 96
> 10/11/12 07:42:56 INFO mapred.JobClient: Running job: job_201011120442_0004
> 10/11/12 07:42:57 INFO mapred.JobClient:  map 0% reduce 0%
>
> Any guess why it is not proceeding forward?
>
> Thanks
>
>
> On Fri, Nov 12, 2010 at 8:04 PM, Shuja Rehman <sh...@gmail.com> wrote:
>
>> Thanks St.Ack
>>
>> It solved the problem.
>>
>>
>> On Fri, Nov 12, 2010 at 7:41 PM, Stack <st...@duboce.net> wrote:
>>
>>> Fix your classpath.  Add the google library.  See
>>>
>>> http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>>> for more on classpath.
>>>
>>> St.Ack
>>>
>>>
>>> On Fri, Nov 12, 2010 at 5:07 AM, Shuja Rehman <sh...@gmail.com>
>>> wrote:
>>> > Hi
>>> >
>>> > I am trying to use configureIncrementalLoad() function to handle the
>>> > totalOrderPartitioning but it throws this exception.
>>> >
>>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Opening socket connection
>>> to
>>> > server /10.10.10.2:2181
>>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Socket connection
>>> established
>>> > to app4.hsd1.wa.comcast.net./10.10.10.2:2181, initiating session
>>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Session establishment
>>> complete
>>> > on server app4.hsd1.wa.comcast.net./10.10.10.2:2181, sessionid =
>>> > 0x12c401bfdae0008, negotiated timeout = 40000
>>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Looking up current
>>> > regions for table org.apache.hadoop.hbase.client.HTable@21e554
>>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce
>>> > partitions to match current region count
>>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Writing partition
>>> > information to
>>> > hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289566941504
>>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>> > com/google/common/base/Preconditions
>>> >        at
>>> >
>>> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.writePartitions(HFileOutputFormat.java:185)
>>> >        at
>>> >
>>> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureIncrementalLoad(HFileOutputFormat.java:258)
>>> >        at ParserDriver.runJob(ParserDriver.java:162)
>>> >        at ParserDriver.main(ParserDriver.java:109)
>>> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >        at
>>> >
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >        at
>>> >
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >        at java.lang.reflect.Method.invoke(Method.java:597)
>>> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>> > Caused by: java.lang.ClassNotFoundException:
>>> > com.google.common.base.Preconditions
>>> >        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>> >        at java.security.AccessController.doPrivileged(Native Method)
>>> >        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>> >        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>> >        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>> >        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>> >        ... 9 more
>>> > 10/11/12 05:02:21 INFO zookeeper.ZooKeeper: Session: 0x12c401bfdae0008
>>> > closed
>>> >
>>> > Here is the code.
>>> >
>>> >  Configuration conf = HBaseConfiguration.create();
>>> >
>>> >   Job job = new Job(conf, "j");
>>> >
>>> >    HTable table = new HTable(conf, "mytab");
>>> >
>>> >    FileInputFormat.setInputPaths(job, input);
>>> >    job.setJarByClass(ParserDriver.class);
>>> >    job.setMapperClass(MyParserMapper.class);
>>> >
>>> >    job.setInputFormatClass(XmlInputFormat.class);
>>> >    job.setReducerClass(PutSortReducer.class);
>>> >    Path outPath = new Path(output);
>>> >    FileOutputFormat.setOutputPath(job, outPath);
>>> >
>>> >    job.setMapOutputValueClass(Put.class);
>>> >    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>> > *    HFileOutputFormat.configureIncrementalLoad(job, table);*
>>> >    TableMapReduceUtil.addDependencyJars(job);
>>> >     job.waitForCompletion(true);
>>> >
>>> > I guess, there are some jar files missing. if yes then from where to get
>>> > these?
>>> >
>>> > Thanks
>>> >
>>> > On Thu, Nov 11, 2010 at 12:57 AM, Stack <st...@duboce.net> wrote:
>>> >
>>> >> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman <sh...@gmail.com>
>>> >> wrote:
>>> >> > oh! I think u have not read the full post. The essay has 3 paragraphs
>>>  :)
>>> >> >
>>> >> > *Should I need to add the following line also
>>> >> >
>>> >> >  job.setPartitionerClass(TotalOrderPartitioner.class);
>>> >> >
>>> >>
>>> >> You need to specify other than default partitioner so yes, above seems
>>> >> necessary (Be aware that if only one reducer, all may appear to work
>>> >> though your partitioner is bad... its when you have multiple reducers
>>> >> that bad partitioner will show).
>>> >>
>>> >> > which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>>> >> >
>>> >>
>>> >> Yes.  Or 2nd edition, October 2010.
>>> >>
>>> >> St.Ack
>>> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Thu, Nov 11, 2010 at 12:49 AM, Stack <st...@duboce.net> wrote:
>>> >> >
>>> >> >> Which two questions (you wrote an essay that looked like one big
>>> >> >> question -- smile).
>>> >> >> St.Ack
>>> >> >>
>>> >> >> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <
>>> shujamughal@gmail.com>
>>> >> >> wrote:
>>> >> >> > yeah, I tried it and it did not fails. can u answer other 2
>>> questions
>>> >> as
>>> >> >> > well?
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:
>>> >> >> >
>>> >> >> >> All below looks reasonable (I did not do detailed review of your
>>> code
>>> >> >> >> posting).  Have you tried it?  Did it fail?
>>> >> >> >> St.Ack
>>> >> >> >>
>>> >> >> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <
>>> >> shujamughal@gmail.com>
>>> >> >> >> wrote:
>>> >> >> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net>
>>> wrote:
>>> >> >> >> >
>>> >> >> >> >> What you need?  bulk-upload, in the scheme of things, is a
>>> well
>>> >> >> >> >> documented feature.  Its also one that has had some exercise
>>> and
>>> >> is
>>> >> >> >> >> known to work well.  For a 0.89 release and trunk,
>>> documentation
>>> >> is
>>> >> >> >> >> here:
>>> http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html
>>> >> .
>>> >> >> >> >> The unit test you refer to below is good for figuring how to
>>> run a
>>> >> >> job
>>> >> >> >> >> (Bulk-upload was redone for 0.89/trunk and is much improved
>>> over
>>> >> what
>>> >> >> >> >> was available in 0.20.x)
>>> >> >> >> >>
>>> >> >> >> >
>>> >> >> >> > *I need to load data into hbase using Hfiles.  *
>>> >> >> >> >
>>> >> >> >> > Ok, let me tell what I understand from all these things.
>>> Basically
>>> >> >> there
>>> >> >> >> are
>>> >> >> >> > two ways to bulk load into hbase.
>>> >> >> >> >
>>> >> >> >> > 1- Using Command Line tools (importtsv, completebulkload )
>>> >> >> >> > 2- Mapreduce job using HFileOutputFormat
>>> >> >> >> >
>>> >> >> >> > At the moment, I have generated the Hfiles using
>>> HFileOutputFormat
>>> >> and
>>> >> >> >> > loading these files into hbase using completebulkload command
>>> line
>>> >> >> tool.
>>> >> >> >> > here is my basic code skeleton. Correct me if I do anything
>>> wrong.
>>> >> >> >> >
>>> >> >> >> > Configuration conf = new Configuration();
>>> >> >> >> > Job job = new Job(conf, "myjob");
>>> >> >> >> >
>>> >> >> >> >    FileInputFormat.setInputPaths(job, input);
>>> >> >> >> >    job.setJarByClass(ParserDriver.class);
>>> >> >> >> >    job.setMapperClass(MyParserMapper.class);
>>> >> >> >> >    job.setNumReduceTasks(1);
>>> >> >> >> >    job.setInputFormatClass(XmlInputFormat.class);
>>> >> >> >> >    job.setOutputFormatClass(HFileOutputFormat.class);
>>> >> >> >> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
>>> >> >> >> >    job.setOutputValueClass(Put.class);
>>> >> >> >> >    job.setReducerClass(PutSortReducer.class);
>>> >> >> >> >
>>> >> >> >> >    Path outPath = new Path(output);
>>> >> >> >> >    FileOutputFormat.setOutputPath(job, outPath);
>>> >> >> >> >          job.waitForCompletion(true);
>>> >> >> >> >
>>> >> >> >> > and here is mapper skeleton
>>> >> >> >> >
>>> >> >> >> > public class MyParserMapper   extends
>>> >> >> >> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>>> >> >> >> >  while(true)
>>> >> >> >> >   {
>>> >> >> >> >       Put put = new Put(rowId);
>>> >> >> >> >      put.add(...);
>>> >> >> >> >      context.write(rwId, put);
>>> >> >> >> >   }
>>> >> >> >> >
>>> >> >> >> > The link says:
>>> >> >> >> > *In order to function efficiently, HFileOutputFormat must be
>>> >> >> configured
>>> >> >> >> such
>>> >> >> >> > that each output HFile fits within a single region. In order to
>>> do
>>> >> >> this,
>>> >> >> >> > jobs use Hadoop's TotalOrderPartitioner class to partition the
>>> map
>>> >> >> output
>>> >> >> >> > into disjoint ranges of the key space, corresponding to the key
>>> >> ranges
>>> >> >> of
>>> >> >> >> > the regions in the table. *"
>>> >> >> >> >
>>> >> >> >> > Now according to my configuration above  where i need to set
>>> >> >> >> > *TotalOrderPartitioner
>>> >> >> >> > ? *Should I need to add the following line also
>>> >> >> >> >
>>> >> >> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > On totalorderpartition, this is a partitioner class from
>>> hadoop.
>>> >>  The
>>> >> >> >> >> MR partitioner -- the class that dictates which reducers get
>>> what
>>> >> map
>>> >> >> >> >> outputs -- is pluggable. The default partitioner does a hash
>>> of
>>> >> the
>>> >> >> >> >> output key to figure which reducer.  This won't work if you
>>> are
>>> >> >> >> >> looking to have your hfile output totally sorted.
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >> If you can't figure what its about, I'd suggest you check out
>>> the
>>> >> >> >> >> hadoop book where it gets a good explication.
>>> >> >> >> >>
>>> >> >> >> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>>> >> >> >> >
>>> >> >> >> > On incremental upload, the doc. suggests you look at the output
>>> for
>>> >> >> >> >> LoadIncrementalHFiles command.  Have you done that?  You run
>>> the
>>> >> >> >> >> command and it'll add in whatever is ready for loading.
>>> >> >> >> >>
>>> >> >> >> >
>>> >> >> >> >   I just use the command line tool for bulk uplaod but not seen
>>> >> >> >> > LoadIncrementalHFiles  class yet to do it through program
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >  ------------------------------
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >> St.Ack
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <
>>> >> shujamughal@gmail.com
>>> >> >> >
>>> >> >> >> >> wrote:
>>> >> >> >> >> > Hey Community,
>>> >> >> >> >> >
>>> >> >> >> >> > Well...it seems that nobody has experienced with the bulk
>>> load
>>> >> >> option.
>>> >> >> >> I
>>> >> >> >> >> > have found one class which might help to write the code for
>>> it.
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >>
>>> >> >> >>
>>> >> >>
>>> >>
>>> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>>> >> >> >> >> >
>>> >> >> >> >> > From this, you can get the idea how to write map reduce job
>>> to
>>> >> >> output
>>> >> >> >> in
>>> >> >> >> >> > HFiles format. But There is a little confusion about these
>>> two
>>> >> >> things
>>> >> >> >> >> >
>>> >> >> >> >> > 1-TotalOrderPartitioner
>>> >> >> >> >> > 2-configureIncrementalLoad
>>> >> >> >> >> >
>>> >> >> >> >> > Does anybody have idea about how these things and how to
>>> >> configure
>>> >> >> it
>>> >> >> >> for
>>> >> >> >> >> > the job?
>>> >> >> >> >> >
>>> >> >> >> >> > Thanks
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <
>>> >> >> shujamughal@gmail.com>
>>> >> >> >> >> wrote:
>>> >> >> >> >> >
>>> >> >> >> >> >> Hi
>>> >> >> >> >> >>
>>> >> >> >> >> >> I am trying to investigate the bulk load option as
>>> described in
>>> >> >> the
>>> >> >> >> >> >> following link.
>>> >> >> >> >> >>
>>> >> >> >> >> >>
>>> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>>> >> >> >> >> >>
>>> >> >> >> >> >> Does anybody have sample code or have used it before?
>>> >> >> >> >> >> Can it be helpful to insert data into existing table. In my
>>> >> >> scenario,
>>> >> >> >> I
>>> >> >> >> >> >> have one table with 1 column family in which data will be
>>> >> inserted
>>> >> >> >> every
>>> >> >> >> >> 15
>>> >> >> >> >> >> minutes.
>>> >> >> >> >> >>
>>> >> >> >> >> >> Kindly share your experiences
>>> >> >> >> >> >>
>>> >> >> >> >> >> Thanks
>>> >> >> >> >> >> --
>>> >> >> >> >> >> Regards
>>> >> >> >> >> >> Shuja-ur-Rehman Baig
>>> >> >> >> >> >> <http://pk.linkedin.com/in/shujamughal>
>>> >> >> >> >> >>
>>> >> >> >> >> >>
>>> >> >> >> >> >
>>> >> >> >> >> >
>>> >> >> >> >> > --
>>> >> >> >> >> > Regards
>>> >> >> >> >> > Shuja-ur-Rehman Baig
>>> >> >> >> >> > <http://pk.linkedin.com/in/shujamughal>
>>> >> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > --
>>> >> >> >> > Regards
>>> >> >> >> > Shuja-ur-Rehman Baig
>>> >> >> >> > <http://pk.linkedin.com/in/shujamughal>
>>> >> >> >> >
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Regards
>>> >> >> > Shuja-ur-Rehman Baig
>>> >> >> > <http://pk.linkedin.com/in/shujamughal>
>>> >> >> >
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Regards
>>> >> > Shuja-ur-Rehman Baig
>>> >> > <http://pk.linkedin.com/in/shujamughal>
>>> >> >
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Regards
>>> > Shuja-ur-Rehman Baig
>>> > <http://pk.linkedin.com/in/shujamughal>
>>> >
>>>
>>
>>
>>
>> --
>> Regards
>> Shuja-ur-Rehman Baig
>> <http://pk.linkedin.com/in/shujamughal>
>>
>>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
St.Ack,

Here is a new problem now. When I run the job with 1 file, everything goes
smoothly, but when I give a set of files as input, it gets stuck and does not
do anything. Here is the output.


10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Looking up current
regions for table org.apache.hadoop.hbase.client.HTable@55bb93
10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce
partitions to match current region count
10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Writing partition
information to
hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289576574949
10/11/12 07:42:55 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
10/11/12 07:42:55 INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library
10/11/12 07:42:55 INFO compress.CodecPool: Got brand-new compressor
10/11/12 07:42:55 INFO mapreduce.HFileOutputFormat: Incremental table output
configured.
10/11/12 07:42:55 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
10/11/12 07:42:56 INFO input.FileInputFormat: Total input paths to process :
96
10/11/12 07:42:56 INFO mapred.JobClient: Running job: job_201011120442_0004
10/11/12 07:42:57 INFO mapred.JobClient:  map 0% reduce 0%

Any guess why it is not proceeding forward?

Thanks


On Fri, Nov 12, 2010 at 8:04 PM, Shuja Rehman <sh...@gmail.com> wrote:

> Thanks St.Ack
>
> It solved the problem.
>
>
> On Fri, Nov 12, 2010 at 7:41 PM, Stack <st...@duboce.net> wrote:
>
>> Fix your classpath.  Add the google library.  See
>>
>> http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>> for more on classpath.
>>
>> St.Ack
>>
>>
>> On Fri, Nov 12, 2010 at 5:07 AM, Shuja Rehman <sh...@gmail.com>
>> wrote:
>> > Hi
>> >
>> > I am trying to use configureIncrementalLoad() function to handle the
>> > totalOrderPartitioning but it throws this exception.
>> >
>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Opening socket connection
>> to
>> > server /10.10.10.2:2181
>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Socket connection
>> established
>> > to app4.hsd1.wa.comcast.net./10.10.10.2:2181, initiating session
>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Session establishment
>> complete
>> > on server app4.hsd1.wa.comcast.net./10.10.10.2:2181, sessionid =
>> > 0x12c401bfdae0008, negotiated timeout = 40000
>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Looking up current
>> > regions for table org.apache.hadoop.hbase.client.HTable@21e554
>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce
>> > partitions to match current region count
>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Writing partition
>> > information to
>> > hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289566941504
>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>> > com/google/common/base/Preconditions
>> >        at
>> >
>> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.writePartitions(HFileOutputFormat.java:185)
>> >        at
>> >
>> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureIncrementalLoad(HFileOutputFormat.java:258)
>> >        at ParserDriver.runJob(ParserDriver.java:162)
>> >        at ParserDriver.main(ParserDriver.java:109)
>> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >        at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >        at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >        at java.lang.reflect.Method.invoke(Method.java:597)
>> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> > Caused by: java.lang.ClassNotFoundException:
>> > com.google.common.base.Preconditions
>> >        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> >        at java.security.AccessController.doPrivileged(Native Method)
>> >        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> >        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>> >        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> >        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>> >        ... 9 more
>> > 10/11/12 05:02:21 INFO zookeeper.ZooKeeper: Session: 0x12c401bfdae0008
>> > closed
>> >
>> > Here is the code.
>> >
>> >  Configuration conf = HBaseConfiguration.create();
>> >
>> >   Job job = new Job(conf, "j");
>> >
>> >    HTable table = new HTable(conf, "mytab");
>> >
>> >    FileInputFormat.setInputPaths(job, input);
>> >    job.setJarByClass(ParserDriver.class);
>> >    job.setMapperClass(MyParserMapper.class);
>> >
>> >    job.setInputFormatClass(XmlInputFormat.class);
>> >    job.setReducerClass(PutSortReducer.class);
>> >    Path outPath = new Path(output);
>> >    FileOutputFormat.setOutputPath(job, outPath);
>> >
>> >    job.setMapOutputValueClass(Put.class);
>> >    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>> > *    HFileOutputFormat.configureIncrementalLoad(job, table);*
>> >    TableMapReduceUtil.addDependencyJars(job);
>> >     job.waitForCompletion(true);
>> >
>> > I guess, there are some jar files missing. if yes then from where to get
>> > these?
>> >
>> > Thanks
>> >
>> > On Thu, Nov 11, 2010 at 12:57 AM, Stack <st...@duboce.net> wrote:
>> >
>> >> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman <sh...@gmail.com>
>> >> wrote:
>> >> > oh! I think u have not read the full post. The essay has 3 paragraphs
>>  :)
>> >> >
>> >> > *Should I need to add the following line also
>> >> >
>> >> >  job.setPartitionerClass(TotalOrderPartitioner.class);
>> >> >
>> >>
>> >> You need to specify other than default partitioner so yes, above seems
>> >> necessary (Be aware that if only one reducer, all may appear to work
>> >> though your partitioner is bad... its when you have multiple reducers
>> >> that bad partitioner will show).
>> >>
>> >> > which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >> >
>> >>
>> >> Yes.  Or 2nd edition, October 2010.
>> >>
>> >> St.Ack
>> >>
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Nov 11, 2010 at 12:49 AM, Stack <st...@duboce.net> wrote:
>> >> >
>> >> >> Which two questions (you wrote an essay that looked like one big
>> >> >> question -- smile).
>> >> >> St.Ack
>> >> >>
>> >> >> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <
>> shujamughal@gmail.com>
>> >> >> wrote:
>> >> >> > yeah, I tried it and it did not fails. can u answer other 2
>> questions
>> >> as
>> >> >> > well?
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:
>> >> >> >
>> >> >> >> All below looks reasonable (I did not do detailed review of your
>> code
>> >> >> >> posting).  Have you tried it?  Did it fail?
>> >> >> >> St.Ack
>> >> >> >>
>> >> >> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <
>> >> shujamughal@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net>
>> wrote:
>> >> >> >> >
>> >> >> >> >> What you need?  bulk-upload, in the scheme of things, is a
>> well
>> >> >> >> >> documented feature.  Its also one that has had some exercise
>> and
>> >> is
>> >> >> >> >> known to work well.  For a 0.89 release and trunk,
>> documentation
>> >> is
>> >> >> >> >> here:
>> http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html
>> >> .
>> >> >> >> >> The unit test you refer to below is good for figuring how to
>> run a
>> >> >> job
>> >> >> >> >> (Bulk-upload was redone for 0.89/trunk and is much improved
>> over
>> >> what
>> >> >> >> >> was available in 0.20.x)
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > *I need to load data into hbase using Hfiles.  *
>> >> >> >> >
>> >> >> >> > Ok, let me tell what I understand from all these things.
>> Basically
>> >> >> there
>> >> >> >> are
>> >> >> >> > two ways to bulk load into hbase.
>> >> >> >> >
>> >> >> >> > 1- Using Command Line tools (importtsv, completebulkload )
>> >> >> >> > 2- Mapreduce job using HFileOutputFormat
>> >> >> >> >
>> >> >> >> > At the moment, I have generated the Hfiles using
>> HFileOutputFormat
>> >> and
>> >> >> >> > loading these files into hbase using completebulkload command
>> line
>> >> >> tool.
>> >> >> >> > here is my basic code skeleton. Correct me if I do anything
>> wrong.
>> >> >> >> >
>> >> >> >> > Configuration conf = new Configuration();
>> >> >> >> > Job job = new Job(conf, "myjob");
>> >> >> >> >
>> >> >> >> >    FileInputFormat.setInputPaths(job, input);
>> >> >> >> >    job.setJarByClass(ParserDriver.class);
>> >> >> >> >    job.setMapperClass(MyParserMapper.class);
>> >> >> >> >    job.setNumReduceTasks(1);
>> >> >> >> >    job.setInputFormatClass(XmlInputFormat.class);
>> >> >> >> >    job.setOutputFormatClass(HFileOutputFormat.class);
>> >> >> >> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
>> >> >> >> >    job.setOutputValueClass(Put.class);
>> >> >> >> >    job.setReducerClass(PutSortReducer.class);
>> >> >> >> >
>> >> >> >> >    Path outPath = new Path(output);
>> >> >> >> >    FileOutputFormat.setOutputPath(job, outPath);
>> >> >> >> >          job.waitForCompletion(true);
>> >> >> >> >
>> >> >> >> > and here is mapper skeleton
>> >> >> >> >
>> >> >> >> > public class MyParserMapper   extends
>> >> >> >> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>> >> >> >> >  while(true)
>> >> >> >> >   {
>> >> >> >> >       Put put = new Put(rowId);
>> >> >> >> >      put.add(...);
>> >> >> >> >      context.write(rwId, put);
>> >> >> >> >   }
>> >> >> >> >
>> >> >> >> > The link says:
>> >> >> >> > *In order to function efficiently, HFileOutputFormat must be
>> >> >> configured
>> >> >> >> such
>> >> >> >> > that each output HFile fits within a single region. In order to
>> do
>> >> >> this,
>> >> >> >> > jobs use Hadoop's TotalOrderPartitioner class to partition the
>> map
>> >> >> output
>> >> >> >> > into disjoint ranges of the key space, corresponding to the key
>> >> ranges
>> >> >> of
>> >> >> >> > the regions in the table. *"
>> >> >> >> >
>> >> >> >> > Now according to my configuration above  where i need to set
>> >> >> >> > *TotalOrderPartitioner
>> >> >> >> > ? *Should I need to add the following line also
>> >> >> >> >
>> >> >> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On totalorderpartition, this is a partitioner class from
>> hadoop.
>> >>  The
>> >> >> >> >> MR partitioner -- the class that dictates which reducers get
>> what
>> >> map
>> >> >> >> >> outputs -- is pluggable. The default partitioner does a hash
>> of
>> >> the
>> >> >> >> >> output key to figure which reducer.  This won't work if you
>> are
>> >> >> >> >> looking to have your hfile output totally sorted.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >> If you can't figure what its about, I'd suggest you check out
>> the
>> >> >> >> >> hadoop book where it gets a good explication.
>> >> >> >> >>
>> >> >> >> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >> >> >> >
>> >> >> >> > On incremental upload, the doc. suggests you look at the output
>> for
>> >> >> >> >> LoadIncrementalHFiles command.  Have you done that?  You run
>> the
>> >> >> >> >> command and it'll add in whatever is ready for loading.
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >   I just use the command line tool for bulk uplaod but not seen
>> >> >> >> > LoadIncrementalHFiles  class yet to do it through program
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >  ------------------------------
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> St.Ack
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <
>> >> shujamughal@gmail.com
>> >> >> >
>> >> >> >> >> wrote:
>> >> >> >> >> > Hey Community,
>> >> >> >> >> >
>> >> >> >> >> > Well...it seems that nobody has experienced with the bulk
>> load
>> >> >> option.
>> >> >> >> I
>> >> >> >> >> > have found one class which might help to write the code for
>> it.
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >>
>> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >> >> >> >> >
>> >> >> >> >> > From this, you can get the idea how to write map reduce job
>> to
>> >> >> output
>> >> >> >> in
>> >> >> >> >> > HFiles format. But There is a little confusion about these
>> two
>> >> >> things
>> >> >> >> >> >
>> >> >> >> >> > 1-TotalOrderPartitioner
>> >> >> >> >> > 2-configureIncrementalLoad
>> >> >> >> >> >
>> >> >> >> >> > Does anybody have idea about how these things and how to
>> >> configure
>> >> >> it
>> >> >> >> for
>> >> >> >> >> > the job?
>> >> >> >> >> >
>> >> >> >> >> > Thanks
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <
>> >> >> shujamughal@gmail.com>
>> >> >> >> >> wrote:
>> >> >> >> >> >
>> >> >> >> >> >> Hi
>> >> >> >> >> >>
>> >> >> >> >> >> I am trying to investigate the bulk load option as
>> described in
>> >> >> the
>> >> >> >> >> >> following link.
>> >> >> >> >> >>
>> >> >> >> >> >>
>> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >> >> >> >> >>
>> >> >> >> >> >> Does anybody have sample code or have used it before?
>> >> >> >> >> >> Can it be helpful to insert data into existing table. In my
>> >> >> scenario,
>> >> >> >> I
>> >> >> >> >> >> have one table with 1 column family in which data will be
>> >> inserted
>> >> >> >> every
>> >> >> >> >> 15
>> >> >> >> >> >> minutes.
>> >> >> >> >> >>
>> >> >> >> >> >> Kindly share your experiences
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks
>> >> >> >> >> >> --
>> >> >> >> >> >> Regards
>> >> >> >> >> >> Shuja-ur-Rehman Baig
>> >> >> >> >> >> <http://pk.linkedin.com/in/shujamughal>
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > --
>> >> >> >> >> > Regards
>> >> >> >> >> > Shuja-ur-Rehman Baig
>> >> >> >> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Regards
>> >> >> >> > Shuja-ur-Rehman Baig
>> >> >> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Regards
>> >> >> > Shuja-ur-Rehman Baig
>> >> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Regards
>> >> > Shuja-ur-Rehman Baig
>> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Regards
>> > Shuja-ur-Rehman Baig
>> > <http://pk.linkedin.com/in/shujamughal>
>> >
>>
>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>
>


-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
Thanks St.Ack

It solved the problem.

On Fri, Nov 12, 2010 at 7:41 PM, Stack <st...@duboce.net> wrote:

> Fix your classpath.  Add the google library.  See
>
> http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
> for more on classpath.
>
> St.Ack
>
>
> On Fri, Nov 12, 2010 at 5:07 AM, Shuja Rehman <sh...@gmail.com>
> wrote:
> > Hi
> >
> > I am trying to use configureIncrementalLoad() function to handle the
> > totalOrderPartitioning but it throws this exception.
> >
> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Opening socket connection to
> > server /10.10.10.2:2181
> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Socket connection
> established
> > to app4.hsd1.wa.comcast.net./10.10.10.2:2181, initiating session
> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Session establishment
> complete
> > on server app4.hsd1.wa.comcast.net./10.10.10.2:2181, sessionid =
> > 0x12c401bfdae0008, negotiated timeout = 40000
> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Looking up current
> > regions for table org.apache.hadoop.hbase.client.HTable@21e554
> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce
> > partitions to match current region count
> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Writing partition
> > information to
> > hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289566941504
> > Exception in thread "main" java.lang.NoClassDefFoundError:
> > com/google/common/base/Preconditions
> >        at
> >
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.writePartitions(HFileOutputFormat.java:185)
> >        at
> >
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureIncrementalLoad(HFileOutputFormat.java:258)
> >        at ParserDriver.runJob(ParserDriver.java:162)
> >        at ParserDriver.main(ParserDriver.java:109)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> > Caused by: java.lang.ClassNotFoundException:
> > com.google.common.base.Preconditions
> >        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >        at java.security.AccessController.doPrivileged(Native Method)
> >        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> >        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> >        ... 9 more
> > 10/11/12 05:02:21 INFO zookeeper.ZooKeeper: Session: 0x12c401bfdae0008
> > closed
> >
> > Here is the code.
> >
> >  Configuration conf = HBaseConfiguration.create();
> >
> >   Job job = new Job(conf, "j");
> >
> >    HTable table = new HTable(conf, "mytab");
> >
> >    FileInputFormat.setInputPaths(job, input);
> >    job.setJarByClass(ParserDriver.class);
> >    job.setMapperClass(MyParserMapper.class);
> >
> >    job.setInputFormatClass(XmlInputFormat.class);
> >    job.setReducerClass(PutSortReducer.class);
> >    Path outPath = new Path(output);
> >    FileOutputFormat.setOutputPath(job, outPath);
> >
> >    job.setMapOutputValueClass(Put.class);
> >    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> > *    HFileOutputFormat.configureIncrementalLoad(job, table);*
> >    TableMapReduceUtil.addDependencyJars(job);
> >     job.waitForCompletion(true);
> >
> > I guess, there are some jar files missing. if yes then from where to get
> > these?
> >
> > Thanks
> >
> > On Thu, Nov 11, 2010 at 12:57 AM, Stack <st...@duboce.net> wrote:
> >
> >> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman <sh...@gmail.com>
> >> wrote:
> >> > oh! I think u have not read the full post. The essay has 3 paragraphs
>  :)
> >> >
> >> > *Should I need to add the following line also
> >> >
> >> >  job.setPartitionerClass(TotalOrderPartitioner.class);
> >> >
> >>
> >> You need to specify other than default partitioner so yes, above seems
> >> necessary (Be aware that if only one reducer, all may appear to work
> >> though your partitioner is bad... its when you have multiple reducers
> >> that bad partitioner will show).
> >>
> >> > which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >> >
> >>
> >> Yes.  Or 2nd edition, October 2010.
> >>
> >> St.Ack
> >>
> >> >
> >> >
> >> >
> >> > On Thu, Nov 11, 2010 at 12:49 AM, Stack <st...@duboce.net> wrote:
> >> >
> >> >> Which two questions (you wrote an essay that looked like one big
> >> >> question -- smile).
> >> >> St.Ack
> >> >>
> >> >> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <
> shujamughal@gmail.com>
> >> >> wrote:
> >> >> > yeah, I tried it and it did not fails. can u answer other 2
> questions
> >> as
> >> >> > well?
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:
> >> >> >
> >> >> >> All below looks reasonable (I did not do detailed review of your
> code
> >> >> >> posting).  Have you tried it?  Did it fail?
> >> >> >> St.Ack
> >> >> >>
> >> >> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <
> >> shujamughal@gmail.com>
> >> >> >> wrote:
> >> >> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net>
> wrote:
> >> >> >> >
> >> >> >> >> What you need?  bulk-upload, in the scheme of things, is a well
> >> >> >> >> documented feature.  Its also one that has had some exercise
> and
> >> is
> >> >> >> >> known to work well.  For a 0.89 release and trunk,
> documentation
> >> is
> >> >> >> >> here:
> http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html
> >> .
> >> >> >> >> The unit test you refer to below is good for figuring how to
> run a
> >> >> job
> >> >> >> >> (Bulk-upload was redone for 0.89/trunk and is much improved
> over
> >> what
> >> >> >> >> was available in 0.20.x)
> >> >> >> >>
> >> >> >> >
> >> >> >> > *I need to load data into hbase using Hfiles.  *
> >> >> >> >
> >> >> >> > Ok, let me tell what I understand from all these things.
> Basically
> >> >> there
> >> >> >> are
> >> >> >> > two ways to bulk load into hbase.
> >> >> >> >
> >> >> >> > 1- Using Command Line tools (importtsv, completebulkload )
> >> >> >> > 2- Mapreduce job using HFileOutputFormat
> >> >> >> >
> >> >> >> > At the moment, I have generated the Hfiles using
> HFileOutputFormat
> >> and
> >> >> >> > loading these files into hbase using completebulkload command
> line
> >> >> tool.
> >> >> >> > here is my basic code skeleton. Correct me if I do anything
> wrong.
> >> >> >> >
> >> >> >> > Configuration conf = new Configuration();
> >> >> >> > Job job = new Job(conf, "myjob");
> >> >> >> >
> >> >> >> >    FileInputFormat.setInputPaths(job, input);
> >> >> >> >    job.setJarByClass(ParserDriver.class);
> >> >> >> >    job.setMapperClass(MyParserMapper.class);
> >> >> >> >    job.setNumReduceTasks(1);
> >> >> >> >    job.setInputFormatClass(XmlInputFormat.class);
> >> >> >> >    job.setOutputFormatClass(HFileOutputFormat.class);
> >> >> >> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
> >> >> >> >    job.setOutputValueClass(Put.class);
> >> >> >> >    job.setReducerClass(PutSortReducer.class);
> >> >> >> >
> >> >> >> >    Path outPath = new Path(output);
> >> >> >> >    FileOutputFormat.setOutputPath(job, outPath);
> >> >> >> >          job.waitForCompletion(true);
> >> >> >> >
> >> >> >> > and here is mapper skeleton
> >> >> >> >
> >> >> >> > public class MyParserMapper   extends
> >> >> >> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >> >> >> >  while(true)
> >> >> >> >   {
> >> >> >> >       Put put = new Put(rowId);
> >> >> >> >      put.add(...);
> >> >> >> >      context.write(rwId, put);
> >> >> >> >   }
> >> >> >> >
> >> >> >> > The link says:
> >> >> >> > *In order to function efficiently, HFileOutputFormat must be
> >> >> configured
> >> >> >> such
> >> >> >> > that each output HFile fits within a single region. In order to
> do
> >> >> this,
> >> >> >> > jobs use Hadoop's TotalOrderPartitioner class to partition the
> map
> >> >> output
> >> >> >> > into disjoint ranges of the key space, corresponding to the key
> >> ranges
> >> >> of
> >> >> >> > the regions in the table. *"
> >> >> >> >
> >> >> >> > Now according to my configuration above  where i need to set
> >> >> >> > *TotalOrderPartitioner
> >> >> >> > ? *Should I need to add the following line also
> >> >> >> >
> >> >> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On totalorderpartition, this is a partitioner class from hadoop.
> >>  The
> >> >> >> >> MR partitioner -- the class that dictates which reducers get
> what
> >> map
> >> >> >> >> outputs -- is pluggable. The default partitioner does a hash of
> >> the
> >> >> >> >> output key to figure which reducer.  This won't work if you are
> >> >> >> >> looking to have your hfile output totally sorted.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >> If you can't figure what its about, I'd suggest you check out
> the
> >> >> >> >> hadoop book where it gets a good explication.
> >> >> >> >>
> >> >> >> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >> >> >> >
> >> >> >> > On incremental upload, the doc. suggests you look at the output
> for
> >> >> >> >> LoadIncrementalHFiles command.  Have you done that?  You run
> the
> >> >> >> >> command and it'll add in whatever is ready for loading.
> >> >> >> >>
> >> >> >> >
> >> >> >> >   I just use the command line tool for bulk uplaod but not seen
> >> >> >> > LoadIncrementalHFiles  class yet to do it through program
> >> >> >> >
> >> >> >> >
> >> >> >> >  ------------------------------
> >> >> >> >
> >> >> >> >
> >> >> >> >>
> >> >> >> >> St.Ack
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <
> >> shujamughal@gmail.com
> >> >> >
> >> >> >> >> wrote:
> >> >> >> >> > Hey Community,
> >> >> >> >> >
> >> >> >> >> > Well...it seems that nobody has experienced with the bulk
> load
> >> >> option.
> >> >> >> I
> >> >> >> >> > have found one class which might help to write the code for
> it.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >> >> >> >> >
> >> >> >> >> > From this, you can get the idea how to write map reduce job
> to
> >> >> output
> >> >> >> in
> >> >> >> >> > HFiles format. But There is a little confusion about these
> two
> >> >> things
> >> >> >> >> >
> >> >> >> >> > 1-TotalOrderPartitioner
> >> >> >> >> > 2-configureIncrementalLoad
> >> >> >> >> >
> >> >> >> >> > Does anybody have idea about how these things and how to
> >> configure
> >> >> it
> >> >> >> for
> >> >> >> >> > the job?
> >> >> >> >> >
> >> >> >> >> > Thanks
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <
> >> >> shujamughal@gmail.com>
> >> >> >> >> wrote:
> >> >> >> >> >
> >> >> >> >> >> Hi
> >> >> >> >> >>
> >> >> >> >> >> I am trying to investigate the bulk load option as described
> in
> >> >> the
> >> >> >> >> >> following link.
> >> >> >> >> >>
> >> >> >> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >> >> >> >> >>
> >> >> >> >> >> Does anybody have sample code or have used it before?
> >> >> >> >> >> Can it be helpful to insert data into existing table. In my
> >> >> scenario,
> >> >> >> I
> >> >> >> >> >> have one table with 1 column family in which data will be
> >> inserted
> >> >> >> every
> >> >> >> >> 15
> >> >> >> >> >> minutes.
> >> >> >> >> >>
> >> >> >> >> >> Kindly share your experiences
> >> >> >> >> >>
> >> >> >> >> >> Thanks
> >> >> >> >> >> --
> >> >> >> >> >> Regards
> >> >> >> >> >> Shuja-ur-Rehman Baig
> >> >> >> >> >> <http://pk.linkedin.com/in/shujamughal>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > --
> >> >> >> >> > Regards
> >> >> >> >> > Shuja-ur-Rehman Baig
> >> >> >> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Regards
> >> >> >> > Shuja-ur-Rehman Baig
> >> >> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Regards
> >> >> > Shuja-ur-Rehman Baig
> >> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Regards
> >> > Shuja-ur-Rehman Baig
> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >
> >>
> >
> >
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
> >
>



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Bulk Load Sample Code

Posted by Stack <st...@duboce.net>.
Fix your classpath.  Add the Google library (Guava, which provides the
com.google.common.base.Preconditions class from your stack trace).  See
http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
for more on the classpath.
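
For illustration only (a hypothetical sanity check, not from the original
reply): since the missing class in the stack trace below is Guava's
com.google.common.base.Preconditions, a driver could fail fast with a clearer
message when that jar is absent from the client classpath, e.g.:

    // Hypothetical fail-fast check: configureIncrementalLoad() needs Guava,
    // so verify it is visible before building and submitting the job.
    try {
      Class.forName("com.google.common.base.Preconditions");
    } catch (ClassNotFoundException e) {
      throw new RuntimeException("Guava is not on the client classpath; "
          + "add the guava jar to HADOOP_CLASSPATH before running 'hadoop jar'.", e);
    }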

St.Ack


On Fri, Nov 12, 2010 at 5:07 AM, Shuja Rehman <sh...@gmail.com> wrote:
> Hi
>
> I am trying to use configureIncrementalLoad() function to handle the
> totalOrderPartitioning but it throws this exception.
>
> 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Opening socket connection to
> server /10.10.10.2:2181
> 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Socket connection established
> to app4.hsd1.wa.comcast.net./10.10.10.2:2181, initiating session
> 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Session establishment complete
> on server app4.hsd1.wa.comcast.net./10.10.10.2:2181, sessionid =
> 0x12c401bfdae0008, negotiated timeout = 40000
> 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Looking up current
> regions for table org.apache.hadoop.hbase.client.HTable@21e554
> 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce
> partitions to match current region count
> 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Writing partition
> information to
> hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289566941504
> Exception in thread "main" java.lang.NoClassDefFoundError:
> com/google/common/base/Preconditions
>        at
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.writePartitions(HFileOutputFormat.java:185)
>        at
> org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureIncrementalLoad(HFileOutputFormat.java:258)
>        at ParserDriver.runJob(ParserDriver.java:162)
>        at ParserDriver.main(ParserDriver.java:109)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.lang.ClassNotFoundException:
> com.google.common.base.Preconditions
>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>        ... 9 more
> 10/11/12 05:02:21 INFO zookeeper.ZooKeeper: Session: 0x12c401bfdae0008
> closed
>
> Here is the code.
>
>  Configuration conf = HBaseConfiguration.create();
>
>   Job job = new Job(conf, "j");
>
>    HTable table = new HTable(conf, "mytab");
>
>    FileInputFormat.setInputPaths(job, input);
>    job.setJarByClass(ParserDriver.class);
>    job.setMapperClass(MyParserMapper.class);
>
>    job.setInputFormatClass(XmlInputFormat.class);
>    job.setReducerClass(PutSortReducer.class);
>    Path outPath = new Path(output);
>    FileOutputFormat.setOutputPath(job, outPath);
>
>    job.setMapOutputValueClass(Put.class);
>    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> *    HFileOutputFormat.configureIncrementalLoad(job, table);*
>    TableMapReduceUtil.addDependencyJars(job);
>     job.waitForCompletion(true);
>
> I guess, there are some jar files missing. if yes then from where to get
> these?
>
> Thanks
>
> On Thu, Nov 11, 2010 at 12:57 AM, Stack <st...@duboce.net> wrote:
>
>> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman <sh...@gmail.com>
>> wrote:
>> > oh! I think u have not read the full post. The essay has 3 paragraphs  :)
>> >
>> > *Should I need to add the following line also
>> >
>> >  job.setPartitionerClass(TotalOrderPartitioner.class);
>> >
>>
>> You need to specify other than default partitioner so yes, above seems
>> necessary (Be aware that if only one reducer, all may appear to work
>> though your partitioner is bad... its when you have multiple reducers
>> that bad partitioner will show).
>>
>> > which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >
>>
>> Yes.  Or 2nd edition, October 2010.
>>
>> St.Ack
>>
>> >
>> >
>> >
>> > On Thu, Nov 11, 2010 at 12:49 AM, Stack <st...@duboce.net> wrote:
>> >
>> >> Which two questions (you wrote an essay that looked like one big
>> >> question -- smile).
>> >> St.Ack
>> >>
>> >> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <sh...@gmail.com>
>> >> wrote:
>> >> > yeah, I tried it and it did not fails. can u answer other 2 questions
>> as
>> >> > well?
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:
>> >> >
>> >> >> All below looks reasonable (I did not do detailed review of your code
>> >> >> posting).  Have you tried it?  Did it fail?
>> >> >> St.Ack
>> >> >>
>> >> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <
>> shujamughal@gmail.com>
>> >> >> wrote:
>> >> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net> wrote:
>> >> >> >
>> >> >> >> What you need?  bulk-upload, in the scheme of things, is a well
>> >> >> >> documented feature.  Its also one that has had some exercise and
>> is
>> >> >> >> known to work well.  For a 0.89 release and trunk, documentation
>> is
>> >> >> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html
>> .
>> >> >> >> The unit test you refer to below is good for figuring how to run a
>> >> job
>> >> >> >> (Bulk-upload was redone for 0.89/trunk and is much improved over
>> what
>> >> >> >> was available in 0.20.x)
>> >> >> >>
>> >> >> >
>> >> >> > *I need to load data into hbase using Hfiles.  *
>> >> >> >
>> >> >> > Ok, let me tell what I understand from all these things. Basically
>> >> there
>> >> >> are
>> >> >> > two ways to bulk load into hbase.
>> >> >> >
>> >> >> > 1- Using Command Line tools (importtsv, completebulkload )
>> >> >> > 2- Mapreduce job using HFileOutputFormat
>> >> >> >
>> >> >> > At the moment, I have generated the Hfiles using HFileOutputFormat
>> and
>> >> >> > loading these files into hbase using completebulkload command line
>> >> tool.
>> >> >> > here is my basic code skeleton. Correct me if I do anything wrong.
>> >> >> >
>> >> >> > Configuration conf = new Configuration();
>> >> >> > Job job = new Job(conf, "myjob");
>> >> >> >
>> >> >> >    FileInputFormat.setInputPaths(job, input);
>> >> >> >    job.setJarByClass(ParserDriver.class);
>> >> >> >    job.setMapperClass(MyParserMapper.class);
>> >> >> >    job.setNumReduceTasks(1);
>> >> >> >    job.setInputFormatClass(XmlInputFormat.class);
>> >> >> >    job.setOutputFormatClass(HFileOutputFormat.class);
>> >> >> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
>> >> >> >    job.setOutputValueClass(Put.class);
>> >> >> >    job.setReducerClass(PutSortReducer.class);
>> >> >> >
>> >> >> >    Path outPath = new Path(output);
>> >> >> >    FileOutputFormat.setOutputPath(job, outPath);
>> >> >> >          job.waitForCompletion(true);
>> >> >> >
>> >> >> > and here is mapper skeleton
>> >> >> >
>> >> >> > public class MyParserMapper   extends
>> >> >> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>> >> >> >  while(true)
>> >> >> >   {
>> >> >> >       Put put = new Put(rowId);
>> >> >> >      put.add(...);
>> >> >> >      context.write(rwId, put);
>> >> >> >   }
>> >> >> >
>> >> >> > The link says:
>> >> >> > *In order to function efficiently, HFileOutputFormat must be
>> >> configured
>> >> >> such
>> >> >> > that each output HFile fits within a single region. In order to do
>> >> this,
>> >> >> > jobs use Hadoop's TotalOrderPartitioner class to partition the map
>> >> output
>> >> >> > into disjoint ranges of the key space, corresponding to the key
>> ranges
>> >> of
>> >> >> > the regions in the table. *"
>> >> >> >
>> >> >> > Now according to my configuration above  where i need to set
>> >> >> > *TotalOrderPartitioner
>> >> >> > ? *Should I need to add the following line also
>> >> >> >
>> >> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On totalorderpartition, this is a partitioner class from hadoop.
>>  The
>> >> >> >> MR partitioner -- the class that dictates which reducers get what
>> map
>> >> >> >> outputs -- is pluggable. The default partitioner does a hash of
>> the
>> >> >> >> output key to figure which reducer.  This won't work if you are
>> >> >> >> looking to have your hfile output totally sorted.
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >> If you can't figure what its about, I'd suggest you check out the
>> >> >> >> hadoop book where it gets a good explication.
>> >> >> >>
>> >> >> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >> >> >
>> >> >> > On incremental upload, the doc. suggests you look at the output for
>> >> >> >> LoadIncrementalHFiles command.  Have you done that?  You run the
>> >> >> >> command and it'll add in whatever is ready for loading.
>> >> >> >>
>> >> >> >
>> >> >> >   I just use the command line tool for bulk uplaod but not seen
>> >> >> > LoadIncrementalHFiles  class yet to do it through program
>> >> >> >
>> >> >> >
>> >> >> >  ------------------------------
>> >> >> >
>> >> >> >
>> >> >> >>
>> >> >> >> St.Ack
>> >> >> >>
>> >> >> >>
>> >> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <
>> shujamughal@gmail.com
>> >> >
>> >> >> >> wrote:
>> >> >> >> > Hey Community,
>> >> >> >> >
>> >> >> >> > Well...it seems that nobody has experienced with the bulk load
>> >> option.
>> >> >> I
>> >> >> >> > have found one class which might help to write the code for it.
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >> >> >> >
>> >> >> >> > From this, you can get the idea how to write map reduce job to
>> >> output
>> >> >> in
>> >> >> >> > HFiles format. But There is a little confusion about these two
>> >> things
>> >> >> >> >
>> >> >> >> > 1-TotalOrderPartitioner
>> >> >> >> > 2-configureIncrementalLoad
>> >> >> >> >
>> >> >> >> > Does anybody have idea about how these things and how to
>> configure
>> >> it
>> >> >> for
>> >> >> >> > the job?
>> >> >> >> >
>> >> >> >> > Thanks
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <
>> >> shujamughal@gmail.com>
>> >> >> >> wrote:
>> >> >> >> >
>> >> >> >> >> Hi
>> >> >> >> >>
>> >> >> >> >> I am trying to investigate the bulk load option as described in
>> >> the
>> >> >> >> >> following link.
>> >> >> >> >>
>> >> >> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >> >> >> >>
>> >> >> >> >> Does anybody have sample code or have used it before?
>> >> >> >> >> Can it be helpful to insert data into existing table. In my
>> >> scenario,
>> >> >> I
>> >> >> >> >> have one table with 1 column family in which data will be
>> inserted
>> >> >> every
>> >> >> >> 15
>> >> >> >> >> minutes.
>> >> >> >> >>
>> >> >> >> >> Kindly share your experiences
>> >> >> >> >>
>> >> >> >> >> Thanks
>> >> >> >> >> --
>> >> >> >> >> Regards
>> >> >> >> >> Shuja-ur-Rehman Baig
>> >> >> >> >> <http://pk.linkedin.com/in/shujamughal>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Regards
>> >> >> >> > Shuja-ur-Rehman Baig
>> >> >> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Regards
>> >> >> > Shuja-ur-Rehman Baig
>> >> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Regards
>> >> > Shuja-ur-Rehman Baig
>> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Regards
>> > Shuja-ur-Rehman Baig
>> > <http://pk.linkedin.com/in/shujamughal>
>> >
>>
>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
Hi

I am trying to use the configureIncrementalLoad() function to handle the
total order partitioning, but it throws this exception.

10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Opening socket connection to
server /10.10.10.2:2181
10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Socket connection established
to app4.hsd1.wa.comcast.net./10.10.10.2:2181, initiating session
10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Session establishment complete
on server app4.hsd1.wa.comcast.net./10.10.10.2:2181, sessionid =
0x12c401bfdae0008, negotiated timeout = 40000
10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Looking up current
regions for table org.apache.hadoop.hbase.client.HTable@21e554
10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce
partitions to match current region count
10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Writing partition
information to
hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289566941504
Exception in thread "main" java.lang.NoClassDefFoundError:
com/google/common/base/Preconditions
        at
org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.writePartitions(HFileOutputFormat.java:185)
        at
org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureIncrementalLoad(HFileOutputFormat.java:258)
        at ParserDriver.runJob(ParserDriver.java:162)
        at ParserDriver.main(ParserDriver.java:109)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.ClassNotFoundException:
com.google.common.base.Preconditions
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        ... 9 more
10/11/12 05:02:21 INFO zookeeper.ZooKeeper: Session: 0x12c401bfdae0008
closed

Here is the code.

    Configuration conf = HBaseConfiguration.create();

    Job job = new Job(conf, "j");

    HTable table = new HTable(conf, "mytab");

    FileInputFormat.setInputPaths(job, input);
    job.setJarByClass(ParserDriver.class);
    job.setMapperClass(MyParserMapper.class);

    job.setInputFormatClass(XmlInputFormat.class);
    job.setReducerClass(PutSortReducer.class);
    Path outPath = new Path(output);
    FileOutputFormat.setOutputPath(job, outPath);

    job.setMapOutputValueClass(Put.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    *HFileOutputFormat.configureIncrementalLoad(job, table);*
    TableMapReduceUtil.addDependencyJars(job);
    job.waitForCompletion(true);

I guess some jar files are missing. If so, where can I get them from?

Thanks

On Thu, Nov 11, 2010 at 12:57 AM, Stack <st...@duboce.net> wrote:

> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman <sh...@gmail.com>
> wrote:
> > oh! I think u have not read the full post. The essay has 3 paragraphs  :)
> >
> > *Should I need to add the following line also
> >
> >  job.setPartitionerClass(TotalOrderPartitioner.class);
> >
>
> You need to specify other than default partitioner so yes, above seems
> necessary (Be aware that if only one reducer, all may appear to work
> though your partitioner is bad... its when you have multiple reducers
> that bad partitioner will show).
>
> > which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >
>
> Yes.  Or 2nd edition, October 2010.
>
> St.Ack
>
> >
> >
> >
> > On Thu, Nov 11, 2010 at 12:49 AM, Stack <st...@duboce.net> wrote:
> >
> >> Which two questions (you wrote an essay that looked like one big
> >> question -- smile).
> >> St.Ack
> >>
> >> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <sh...@gmail.com>
> >> wrote:
> >> > yeah, I tried it and it did not fails. can u answer other 2 questions
> as
> >> > well?
> >> >
> >> >
> >> >
> >> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:
> >> >
> >> >> All below looks reasonable (I did not do detailed review of your code
> >> >> posting).  Have you tried it?  Did it fail?
> >> >> St.Ack
> >> >>
> >> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <
> shujamughal@gmail.com>
> >> >> wrote:
> >> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net> wrote:
> >> >> >
> >> >> >> What you need?  bulk-upload, in the scheme of things, is a well
> >> >> >> documented feature.  Its also one that has had some exercise and
> is
> >> >> >> known to work well.  For a 0.89 release and trunk, documentation
> is
> >> >> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html
> .
> >> >> >> The unit test you refer to below is good for figuring how to run a
> >> job
> >> >> >> (Bulk-upload was redone for 0.89/trunk and is much improved over
> what
> >> >> >> was available in 0.20.x)
> >> >> >>
> >> >> >
> >> >> > *I need to load data into hbase using Hfiles.  *
> >> >> >
> >> >> > Ok, let me tell what I understand from all these things. Basically
> >> there
> >> >> are
> >> >> > two ways to bulk load into hbase.
> >> >> >
> >> >> > 1- Using Command Line tools (importtsv, completebulkload )
> >> >> > 2- Mapreduce job using HFileOutputFormat
> >> >> >
> >> >> > At the moment, I have generated the Hfiles using HFileOutputFormat
> and
> >> >> > loading these files into hbase using completebulkload command line
> >> tool.
> >> >> > here is my basic code skeleton. Correct me if I do anything wrong.
> >> >> >
> >> >> > Configuration conf = new Configuration();
> >> >> > Job job = new Job(conf, "myjob");
> >> >> >
> >> >> >    FileInputFormat.setInputPaths(job, input);
> >> >> >    job.setJarByClass(ParserDriver.class);
> >> >> >    job.setMapperClass(MyParserMapper.class);
> >> >> >    job.setNumReduceTasks(1);
> >> >> >    job.setInputFormatClass(XmlInputFormat.class);
> >> >> >    job.setOutputFormatClass(HFileOutputFormat.class);
> >> >> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
> >> >> >    job.setOutputValueClass(Put.class);
> >> >> >    job.setReducerClass(PutSortReducer.class);
> >> >> >
> >> >> >    Path outPath = new Path(output);
> >> >> >    FileOutputFormat.setOutputPath(job, outPath);
> >> >> >          job.waitForCompletion(true);
> >> >> >
> >> >> > and here is mapper skeleton
> >> >> >
> >> >> > public class MyParserMapper   extends
> >> >> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >> >> >  while(true)
> >> >> >   {
> >> >> >       Put put = new Put(rowId);
> >> >> >      put.add(...);
> >> >> >      context.write(rwId, put);
> >> >> >   }
> >> >> >
> >> >> > The link says:
> >> >> > *In order to function efficiently, HFileOutputFormat must be
> >> configured
> >> >> such
> >> >> > that each output HFile fits within a single region. In order to do
> >> this,
> >> >> > jobs use Hadoop's TotalOrderPartitioner class to partition the map
> >> output
> >> >> > into disjoint ranges of the key space, corresponding to the key
> ranges
> >> of
> >> >> > the regions in the table. *"
> >> >> >
> >> >> > Now according to my configuration above  where i need to set
> >> >> > *TotalOrderPartitioner
> >> >> > ? *Should I need to add the following line also
> >> >> >
> >> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
> >> >> >
> >> >> >
> >> >> >
> >> >> > On totalorderpartition, this is a partitioner class from hadoop.
>  The
> >> >> >> MR partitioner -- the class that dictates which reducers get what
> map
> >> >> >> outputs -- is pluggable. The default partitioner does a hash of
> the
> >> >> >> output key to figure which reducer.  This won't work if you are
> >> >> >> looking to have your hfile output totally sorted.
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >> >> If you can't figure what its about, I'd suggest you check out the
> >> >> >> hadoop book where it gets a good explication.
> >> >> >>
> >> >> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >> >> >
> >> >> > On incremental upload, the doc. suggests you look at the output for
> >> >> >> LoadIncrementalHFiles command.  Have you done that?  You run the
> >> >> >> command and it'll add in whatever is ready for loading.
> >> >> >>
> >> >> >
> >> >> >   I just use the command line tool for bulk uplaod but not seen
> >> >> > LoadIncrementalHFiles  class yet to do it through program
> >> >> >
> >> >> >
> >> >> >  ------------------------------
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> St.Ack
> >> >> >>
> >> >> >>
> >> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <
> shujamughal@gmail.com
> >> >
> >> >> >> wrote:
> >> >> >> > Hey Community,
> >> >> >> >
> >> >> >> > Well...it seems that nobody has experienced with the bulk load
> >> option.
> >> >> I
> >> >> >> > have found one class which might help to write the code for it.
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >>
> >>
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >> >> >> >
> >> >> >> > From this, you can get the idea how to write map reduce job to
> >> output
> >> >> in
> >> >> >> > HFiles format. But There is a little confusion about these two
> >> things
> >> >> >> >
> >> >> >> > 1-TotalOrderPartitioner
> >> >> >> > 2-configureIncrementalLoad
> >> >> >> >
> >> >> >> > Does anybody have idea about how these things and how to
> configure
> >> it
> >> >> for
> >> >> >> > the job?
> >> >> >> >
> >> >> >> > Thanks
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <
> >> shujamughal@gmail.com>
> >> >> >> wrote:
> >> >> >> >
> >> >> >> >> Hi
> >> >> >> >>
> >> >> >> >> I am trying to investigate the bulk load option as described in
> >> the
> >> >> >> >> following link.
> >> >> >> >>
> >> >> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >> >> >> >>
> >> >> >> >> Does anybody have sample code or have used it before?
> >> >> >> >> Can it be helpful to insert data into existing table. In my
> >> scenario,
> >> >> I
> >> >> >> >> have one table with 1 column family in which data will be
> inserted
> >> >> every
> >> >> >> 15
> >> >> >> >> minutes.
> >> >> >> >>
> >> >> >> >> Kindly share your experiences
> >> >> >> >>
> >> >> >> >> Thanks
> >> >> >> >> --
> >> >> >> >> Regards
> >> >> >> >> Shuja-ur-Rehman Baig
> >> >> >> >> <http://pk.linkedin.com/in/shujamughal>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Regards
> >> >> >> > Shuja-ur-Rehman Baig
> >> >> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Regards
> >> >> > Shuja-ur-Rehman Baig
> >> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Regards
> >> > Shuja-ur-Rehman Baig
> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >
> >>
> >
> >
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
> >
>



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Bulk Load Sample Code

Posted by Stack <st...@duboce.net>.
On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman <sh...@gmail.com> wrote:
> oh! I think u have not read the full post. The essay has 3 paragraphs  :)
>
> *Should I need to add the following line also
>
>  job.setPartitionerClass(TotalOrderPartitioner.class);
>

You need to specify something other than the default partitioner, so yes, the
above seems necessary.  (Be aware that with only one reducer everything may
appear to work even though your partitioner is bad... it is when you have
multiple reducers that a bad partitioner will show.)
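
For illustration, a minimal sketch of the manual wiring, assuming the
TotalOrderPartitioner backport shipped with HBase 0.89 (the package and exact
signatures vary by Hadoop/HBase version), and assuming the partition file
already holds the table's region start keys; note that
HFileOutputFormat.configureIncrementalLoad() does all of this for you:

    // import org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner;
    // (assumed 0.89 backport; newer Hadoop has it under mapreduce.lib.partition)
    Path partitionFile = new Path("/tmp/partitions");   // hypothetical path
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
    job.setPartitionerClass(TotalOrderPartitioner.class);
    job.setNumReduceTasks(numberOfRegions);   // placeholder: one reducer per region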

> which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>

Yes.  Or 2nd edition, October 2010.

St.Ack

>
>
>
> On Thu, Nov 11, 2010 at 12:49 AM, Stack <st...@duboce.net> wrote:
>
>> Which two questions (you wrote an essay that looked like one big
>> question -- smile).
>> St.Ack
>>
>> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <sh...@gmail.com>
>> wrote:
>> > yeah, I tried it and it did not fails. can u answer other 2 questions as
>> > well?
>> >
>> >
>> >
>> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:
>> >
>> >> All below looks reasonable (I did not do detailed review of your code
>> >> posting).  Have you tried it?  Did it fail?
>> >> St.Ack
>> >>
>> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <sh...@gmail.com>
>> >> wrote:
>> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net> wrote:
>> >> >
>> >> >> What you need?  bulk-upload, in the scheme of things, is a well
>> >> >> documented feature.  Its also one that has had some exercise and is
>> >> >> known to work well.  For a 0.89 release and trunk, documentation is
>> >> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
>> >> >> The unit test you refer to below is good for figuring how to run a
>> job
>> >> >> (Bulk-upload was redone for 0.89/trunk and is much improved over what
>> >> >> was available in 0.20.x)
>> >> >>
>> >> >
>> >> > *I need to load data into hbase using Hfiles.  *
>> >> >
>> >> > Ok, let me tell what I understand from all these things. Basically
>> there
>> >> are
>> >> > two ways to bulk load into hbase.
>> >> >
>> >> > 1- Using Command Line tools (importtsv, completebulkload )
>> >> > 2- Mapreduce job using HFileOutputFormat
>> >> >
>> >> > At the moment, I have generated the Hfiles using HFileOutputFormat and
>> >> > loading these files into hbase using completebulkload command line
>> tool.
>> >> > here is my basic code skeleton. Correct me if I do anything wrong.
>> >> >
>> >> > Configuration conf = new Configuration();
>> >> > Job job = new Job(conf, "myjob");
>> >> >
>> >> >    FileInputFormat.setInputPaths(job, input);
>> >> >    job.setJarByClass(ParserDriver.class);
>> >> >    job.setMapperClass(MyParserMapper.class);
>> >> >    job.setNumReduceTasks(1);
>> >> >    job.setInputFormatClass(XmlInputFormat.class);
>> >> >    job.setOutputFormatClass(HFileOutputFormat.class);
>> >> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
>> >> >    job.setOutputValueClass(Put.class);
>> >> >    job.setReducerClass(PutSortReducer.class);
>> >> >
>> >> >    Path outPath = new Path(output);
>> >> >    FileOutputFormat.setOutputPath(job, outPath);
>> >> >          job.waitForCompletion(true);
>> >> >
>> >> > and here is mapper skeleton
>> >> >
>> >> > public class MyParserMapper   extends
>> >> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>> >> >  while(true)
>> >> >   {
>> >> >       Put put = new Put(rowId);
>> >> >      put.add(...);
>> >> >      context.write(rwId, put);
>> >> >   }
>> >> >
>> >> > The link says:
>> >> > *In order to function efficiently, HFileOutputFormat must be
>> configured
>> >> such
>> >> > that each output HFile fits within a single region. In order to do
>> this,
>> >> > jobs use Hadoop's TotalOrderPartitioner class to partition the map
>> output
>> >> > into disjoint ranges of the key space, corresponding to the key ranges
>> of
>> >> > the regions in the table. *"
>> >> >
>> >> > Now according to my configuration above  where i need to set
>> >> > *TotalOrderPartitioner
>> >> > ? *Should I need to add the following line also
>> >> >
>> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
>> >> >
>> >> >
>> >> >
>> >> > On totalorderpartition, this is a partitioner class from hadoop.  The
>> >> >> MR partitioner -- the class that dictates which reducers get what map
>> >> >> outputs -- is pluggable. The default partitioner does a hash of the
>> >> >> output key to figure which reducer.  This won't work if you are
>> >> >> looking to have your hfile output totally sorted.
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >> If you can't figure what its about, I'd suggest you check out the
>> >> >> hadoop book where it gets a good explication.
>> >> >>
>> >> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >> >
>> >> > On incremental upload, the doc. suggests you look at the output for
>> >> >> LoadIncrementalHFiles command.  Have you done that?  You run the
>> >> >> command and it'll add in whatever is ready for loading.
>> >> >>
>> >> >
>> >> >   I just use the command line tool for bulk uplaod but not seen
>> >> > LoadIncrementalHFiles  class yet to do it through program
>> >> >
>> >> >
>> >> >  ------------------------------
>> >> >
>> >> >
>> >> >>
>> >> >> St.Ack
>> >> >>
>> >> >>
>> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <shujamughal@gmail.com
>> >
>> >> >> wrote:
>> >> >> > Hey Community,
>> >> >> >
>> >> >> > Well...it seems that nobody has experienced with the bulk load
>> option.
>> >> I
>> >> >> > have found one class which might help to write the code for it.
>> >> >> >
>> >> >> >
>> >> >>
>> >>
>> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >> >> >
>> >> >> > From this, you can get the idea how to write map reduce job to
>> output
>> >> in
>> >> >> > HFiles format. But There is a little confusion about these two
>> things
>> >> >> >
>> >> >> > 1-TotalOrderPartitioner
>> >> >> > 2-configureIncrementalLoad
>> >> >> >
>> >> >> > Does anybody have idea about how these things and how to configure
>> it
>> >> for
>> >> >> > the job?
>> >> >> >
>> >> >> > Thanks
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <
>> shujamughal@gmail.com>
>> >> >> wrote:
>> >> >> >
>> >> >> >> Hi
>> >> >> >>
>> >> >> >> I am trying to investigate the bulk load option as described in
>> the
>> >> >> >> following link.
>> >> >> >>
>> >> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >> >> >>
>> >> >> >> Does anybody have sample code or have used it before?
>> >> >> >> Can it be helpful to insert data into existing table. In my
>> scenario,
>> >> I
>> >> >> >> have one table with 1 column family in which data will be inserted
>> >> every
>> >> >> 15
>> >> >> >> minutes.
>> >> >> >>
>> >> >> >> Kindly share your experiences
>> >> >> >>
>> >> >> >> Thanks
>> >> >> >> --
>> >> >> >> Regards
>> >> >> >> Shuja-ur-Rehman Baig
>> >> >> >> <http://pk.linkedin.com/in/shujamughal>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Regards
>> >> >> > Shuja-ur-Rehman Baig
>> >> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Regards
>> >> > Shuja-ur-Rehman Baig
>> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Regards
>> > Shuja-ur-Rehman Baig
>> > <http://pk.linkedin.com/in/shujamughal>
>> >
>>
>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
Oh! I think you have not read the full post. The essay has 3 paragraphs  :)

Should I also add the following line?

  job.setPartitionerClass(TotalOrderPartitioner.class);

And which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?




On Thu, Nov 11, 2010 at 12:49 AM, Stack <st...@duboce.net> wrote:

> Which two questions (you wrote an essay that looked like one big
> question -- smile).
> St.Ack
>
> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <sh...@gmail.com>
> wrote:
> > yeah, I tried it and it did not fails. can u answer other 2 questions as
> > well?
> >
> >
> >
> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:
> >
> >> All below looks reasonable (I did not do detailed review of your code
> >> posting).  Have you tried it?  Did it fail?
> >> St.Ack
> >>
> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <sh...@gmail.com>
> >> wrote:
> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net> wrote:
> >> >
> >> >> What you need?  bulk-upload, in the scheme of things, is a well
> >> >> documented feature.  Its also one that has had some exercise and is
> >> >> known to work well.  For a 0.89 release and trunk, documentation is
> >> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
> >> >> The unit test you refer to below is good for figuring how to run a
> job
> >> >> (Bulk-upload was redone for 0.89/trunk and is much improved over what
> >> >> was available in 0.20.x)
> >> >>
> >> >
> >> > *I need to load data into hbase using Hfiles.  *
> >> >
> >> > Ok, let me tell what I understand from all these things. Basically
> there
> >> are
> >> > two ways to bulk load into hbase.
> >> >
> >> > 1- Using Command Line tools (importtsv, completebulkload )
> >> > 2- Mapreduce job using HFileOutputFormat
> >> >
> >> > At the moment, I have generated the Hfiles using HFileOutputFormat and
> >> > loading these files into hbase using completebulkload command line
> tool.
> >> > here is my basic code skeleton. Correct me if I do anything wrong.
> >> >
> >> > Configuration conf = new Configuration();
> >> > Job job = new Job(conf, "myjob");
> >> >
> >> >    FileInputFormat.setInputPaths(job, input);
> >> >    job.setJarByClass(ParserDriver.class);
> >> >    job.setMapperClass(MyParserMapper.class);
> >> >    job.setNumReduceTasks(1);
> >> >    job.setInputFormatClass(XmlInputFormat.class);
> >> >    job.setOutputFormatClass(HFileOutputFormat.class);
> >> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
> >> >    job.setOutputValueClass(Put.class);
> >> >    job.setReducerClass(PutSortReducer.class);
> >> >
> >> >    Path outPath = new Path(output);
> >> >    FileOutputFormat.setOutputPath(job, outPath);
> >> >          job.waitForCompletion(true);
> >> >
> >> > and here is mapper skeleton
> >> >
> >> > public class MyParserMapper   extends
> >> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >> >  while(true)
> >> >   {
> >> >       Put put = new Put(rowId);
> >> >      put.add(...);
> >> >      context.write(rwId, put);
> >> >   }
> >> >
> >> > The link says:
> >> > *In order to function efficiently, HFileOutputFormat must be
> configured
> >> such
> >> > that each output HFile fits within a single region. In order to do
> this,
> >> > jobs use Hadoop's TotalOrderPartitioner class to partition the map
> output
> >> > into disjoint ranges of the key space, corresponding to the key ranges
> of
> >> > the regions in the table. *"
> >> >
> >> > Now according to my configuration above  where i need to set
> >> > *TotalOrderPartitioner
> >> > ? *Should I need to add the following line also
> >> >
> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
> >> >
> >> >
> >> >
> >> > On totalorderpartition, this is a partitioner class from hadoop.  The
> >> >> MR partitioner -- the class that dictates which reducers get what map
> >> >> outputs -- is pluggable. The default partitioner does a hash of the
> >> >> output key to figure which reducer.  This won't work if you are
> >> >> looking to have your hfile output totally sorted.
> >> >>
> >> >>
> >> >
> >> >
> >> >> If you can't figure what its about, I'd suggest you check out the
> >> >> hadoop book where it gets a good explication.
> >> >>
> >> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >> >
> >> > On incremental upload, the doc. suggests you look at the output for
> >> >> LoadIncrementalHFiles command.  Have you done that?  You run the
> >> >> command and it'll add in whatever is ready for loading.
> >> >>
> >> >
> >> >   I just use the command line tool for bulk uplaod but not seen
> >> > LoadIncrementalHFiles  class yet to do it through program
> >> >
> >> >
> >> >  ------------------------------
> >> >
> >> >
> >> >>
> >> >> St.Ack
> >> >>
> >> >>
> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <shujamughal@gmail.com
> >
> >> >> wrote:
> >> >> > Hey Community,
> >> >> >
> >> >> > Well...it seems that nobody has experienced with the bulk load
> option.
> >> I
> >> >> > have found one class which might help to write the code for it.
> >> >> >
> >> >> >
> >> >>
> >>
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >> >> >
> >> >> > From this, you can get the idea how to write map reduce job to
> output
> >> in
> >> >> > HFiles format. But There is a little confusion about these two
> things
> >> >> >
> >> >> > 1-TotalOrderPartitioner
> >> >> > 2-configureIncrementalLoad
> >> >> >
> >> >> > Does anybody have idea about how these things and how to configure
> it
> >> for
> >> >> > the job?
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <
> shujamughal@gmail.com>
> >> >> wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> I am trying to investigate the bulk load option as described in
> the
> >> >> >> following link.
> >> >> >>
> >> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >> >> >>
> >> >> >> Does anybody have sample code or have used it before?
> >> >> >> Can it be helpful to insert data into existing table. In my
> scenario,
> >> I
> >> >> >> have one table with 1 column family in which data will be inserted
> >> every
> >> >> 15
> >> >> >> minutes.
> >> >> >>
> >> >> >> Kindly share your experiences
> >> >> >>
> >> >> >> Thanks
> >> >> >> --
> >> >> >> Regards
> >> >> >> Shuja-ur-Rehman Baig
> >> >> >> <http://pk.linkedin.com/in/shujamughal>
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Regards
> >> >> > Shuja-ur-Rehman Baig
> >> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Regards
> >> > Shuja-ur-Rehman Baig
> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >
> >>
> >
> >
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
> >
>



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Bulk Load Sample Code

Posted by Stack <st...@duboce.net>.
Which two questions (you wrote an essay that looked like one big
question -- smile).
St.Ack

On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <sh...@gmail.com> wrote:
> yeah, I tried it and it did not fails. can u answer other 2 questions as
> well?
>
>
>
> On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:
>
>> All below looks reasonable (I did not do detailed review of your code
>> posting).  Have you tried it?  Did it fail?
>> St.Ack
>>
>> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <sh...@gmail.com>
>> wrote:
>> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net> wrote:
>> >
>> >> What you need?  bulk-upload, in the scheme of things, is a well
>> >> documented feature.  Its also one that has had some exercise and is
>> >> known to work well.  For a 0.89 release and trunk, documentation is
>> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
>> >> The unit test you refer to below is good for figuring how to run a job
>> >> (Bulk-upload was redone for 0.89/trunk and is much improved over what
>> >> was available in 0.20.x)
>> >>
>> >
>> > *I need to load data into hbase using Hfiles.  *
>> >
>> > Ok, let me tell what I understand from all these things. Basically there
>> are
>> > two ways to bulk load into hbase.
>> >
>> > 1- Using Command Line tools (importtsv, completebulkload )
>> > 2- Mapreduce job using HFileOutputFormat
>> >
>> > At the moment, I have generated the Hfiles using HFileOutputFormat and
>> > loading these files into hbase using completebulkload command line tool.
>> > here is my basic code skeleton. Correct me if I do anything wrong.
>> >
>> > Configuration conf = new Configuration();
>> > Job job = new Job(conf, "myjob");
>> >
>> >    FileInputFormat.setInputPaths(job, input);
>> >    job.setJarByClass(ParserDriver.class);
>> >    job.setMapperClass(MyParserMapper.class);
>> >    job.setNumReduceTasks(1);
>> >    job.setInputFormatClass(XmlInputFormat.class);
>> >    job.setOutputFormatClass(HFileOutputFormat.class);
>> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
>> >    job.setOutputValueClass(Put.class);
>> >    job.setReducerClass(PutSortReducer.class);
>> >
>> >    Path outPath = new Path(output);
>> >    FileOutputFormat.setOutputPath(job, outPath);
>> >          job.waitForCompletion(true);
>> >
>> > and here is mapper skeleton
>> >
>> > public class MyParserMapper   extends
>> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>> >  while(true)
>> >   {
>> >       Put put = new Put(rowId);
>> >      put.add(...);
>> >      context.write(rwId, put);
>> >   }
>> >
>> > The link says:
>> > *In order to function efficiently, HFileOutputFormat must be configured
>> such
>> > that each output HFile fits within a single region. In order to do this,
>> > jobs use Hadoop's TotalOrderPartitioner class to partition the map output
>> > into disjoint ranges of the key space, corresponding to the key ranges of
>> > the regions in the table. *"
>> >
>> > Now according to my configuration above  where i need to set
>> > *TotalOrderPartitioner
>> > ? *Should I need to add the following line also
>> >
>> > job.setPartitionerClass(TotalOrderPartitioner.class);
>> >
>> >
>> >
>> > On totalorderpartition, this is a partitioner class from hadoop.  The
>> >> MR partitioner -- the class that dictates which reducers get what map
>> >> outputs -- is pluggable. The default partitioner does a hash of the
>> >> output key to figure which reducer.  This won't work if you are
>> >> looking to have your hfile output totally sorted.
>> >>
>> >>
>> >
>> >
>> >> If you can't figure what its about, I'd suggest you check out the
>> >> hadoop book where it gets a good explication.
>> >>
>> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >
>> > On incremental upload, the doc. suggests you look at the output for
>> >> LoadIncrementalHFiles command.  Have you done that?  You run the
>> >> command and it'll add in whatever is ready for loading.
>> >>
>> >
>> >   I just use the command line tool for bulk uplaod but not seen
>> > LoadIncrementalHFiles  class yet to do it through program
>> >
>> >
>> >  ------------------------------
>> >
>> >
>> >>
>> >> St.Ack
>> >>
>> >>
>> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <sh...@gmail.com>
>> >> wrote:
>> >> > Hey Community,
>> >> >
>> >> > Well...it seems that nobody has experienced with the bulk load option.
>> I
>> >> > have found one class which might help to write the code for it.
>> >> >
>> >> >
>> >>
>> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >> >
>> >> > From this, you can get the idea how to write map reduce job to output
>> in
>> >> > HFiles format. But There is a little confusion about these two things
>> >> >
>> >> > 1-TotalOrderPartitioner
>> >> > 2-configureIncrementalLoad
>> >> >
>> >> > Does anybody have idea about how these things and how to configure it
>> for
>> >> > the job?
>> >> >
>> >> > Thanks
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> Hi
>> >> >>
>> >> >> I am trying to investigate the bulk load option as described in the
>> >> >> following link.
>> >> >>
>> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >> >>
>> >> >> Does anybody have sample code or have used it before?
>> >> >> Can it be helpful to insert data into existing table. In my scenario,
>> I
>> >> >> have one table with 1 column family in which data will be inserted
>> every
>> >> 15
>> >> >> minutes.
>> >> >>
>> >> >> Kindly share your experiences
>> >> >>
>> >> >> Thanks
>> >> >> --
>> >> >> Regards
>> >> >> Shuja-ur-Rehman Baig
>> >> >> <http://pk.linkedin.com/in/shujamughal>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> > Regards
>> >> > Shuja-ur-Rehman Baig
>> >> > <http://pk.linkedin.com/in/shujamughal>
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Regards
>> > Shuja-ur-Rehman Baig
>> > <http://pk.linkedin.com/in/shujamughal>
>> >
>>
>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
Yeah, I tried it and it did not fail. Can you answer the other 2 questions as
well?



On Thu, Nov 11, 2010 at 12:15 AM, Stack <st...@duboce.net> wrote:

> All below looks reasonable (I did not do detailed review of your code
> posting).  Have you tried it?  Did it fail?
> St.Ack
>
> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <sh...@gmail.com>
> wrote:
> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net> wrote:
> >
> >> What you need?  bulk-upload, in the scheme of things, is a well
> >> documented feature.  Its also one that has had some exercise and is
> >> known to work well.  For a 0.89 release and trunk, documentation is
> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
> >> The unit test you refer to below is good for figuring how to run a job
> >> (Bulk-upload was redone for 0.89/trunk and is much improved over what
> >> was available in 0.20.x)
> >>
> >
> > *I need to load data into hbase using Hfiles.  *
> >
> > Ok, let me tell what I understand from all these things. Basically there
> are
> > two ways to bulk load into hbase.
> >
> > 1- Using Command Line tools (importtsv, completebulkload )
> > 2- Mapreduce job using HFileOutputFormat
> >
> > At the moment, I have generated the Hfiles using HFileOutputFormat and
> > loading these files into hbase using completebulkload command line tool.
> > here is my basic code skeleton. Correct me if I do anything wrong.
> >
> > Configuration conf = new Configuration();
> > Job job = new Job(conf, "myjob");
> >
> >    FileInputFormat.setInputPaths(job, input);
> >    job.setJarByClass(ParserDriver.class);
> >    job.setMapperClass(MyParserMapper.class);
> >    job.setNumReduceTasks(1);
> >    job.setInputFormatClass(XmlInputFormat.class);
> >    job.setOutputFormatClass(HFileOutputFormat.class);
> >    job.setOutputKeyClass(ImmutableBytesWritable.class);
> >    job.setOutputValueClass(Put.class);
> >    job.setReducerClass(PutSortReducer.class);
> >
> >    Path outPath = new Path(output);
> >    FileOutputFormat.setOutputPath(job, outPath);
> >          job.waitForCompletion(true);
> >
> > and here is mapper skeleton
> >
> > public class MyParserMapper   extends
> >    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >  while(true)
> >   {
> >       Put put = new Put(rowId);
> >      put.add(...);
> >      context.write(rwId, put);
> >   }
> >
> > The link says:
> > *In order to function efficiently, HFileOutputFormat must be configured
> such
> > that each output HFile fits within a single region. In order to do this,
> > jobs use Hadoop's TotalOrderPartitioner class to partition the map output
> > into disjoint ranges of the key space, corresponding to the key ranges of
> > the regions in the table. *"
> >
> > Now according to my configuration above  where i need to set
> > *TotalOrderPartitioner
> > ? *Should I need to add the following line also
> >
> > job.setPartitionerClass(TotalOrderPartitioner.class);
> >
> >
> >
> > On totalorderpartition, this is a partitioner class from hadoop.  The
> >> MR partitioner -- the class that dictates which reducers get what map
> >> outputs -- is pluggable. The default partitioner does a hash of the
> >> output key to figure which reducer.  This won't work if you are
> >> looking to have your hfile output totally sorted.
> >>
> >>
> >
> >
> >> If you can't figure what its about, I'd suggest you check out the
> >> hadoop book where it gets a good explication.
> >>
> >>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >
> > On incremental upload, the doc. suggests you look at the output for
> >> LoadIncrementalHFiles command.  Have you done that?  You run the
> >> command and it'll add in whatever is ready for loading.
> >>
> >
> >   I just use the command line tool for bulk uplaod but not seen
> > LoadIncrementalHFiles  class yet to do it through program
> >
> >
> >  ------------------------------
> >
> >
> >>
> >> St.Ack
> >>
> >>
> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <sh...@gmail.com>
> >> wrote:
> >> > Hey Community,
> >> >
> >> > Well...it seems that nobody has experienced with the bulk load option.
> I
> >> > have found one class which might help to write the code for it.
> >> >
> >> >
> >>
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >> >
> >> > From this, you can get the idea how to write map reduce job to output
> in
> >> > HFiles format. But There is a little confusion about these two things
> >> >
> >> > 1-TotalOrderPartitioner
> >> > 2-configureIncrementalLoad
> >> >
> >> > Does anybody have idea about how these things and how to configure it
> for
> >> > the job?
> >> >
> >> > Thanks
> >> >
> >> >
> >> >
> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com>
> >> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> I am trying to investigate the bulk load option as described in the
> >> >> following link.
> >> >>
> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >> >>
> >> >> Does anybody have sample code or have used it before?
> >> >> Can it be helpful to insert data into existing table. In my scenario,
> I
> >> >> have one table with 1 column family in which data will be inserted
> every
> >> 15
> >> >> minutes.
> >> >>
> >> >> Kindly share your experiences
> >> >>
> >> >> Thanks
> >> >> --
> >> >> Regards
> >> >> Shuja-ur-Rehman Baig
> >> >> <http://pk.linkedin.com/in/shujamughal>
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Regards
> >> > Shuja-ur-Rehman Baig
> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >
> >>
> >
> >
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
> >
>



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Bulk Load Sample Code

Posted by Stack <st...@duboce.net>.
All of the below looks reasonable (I did not do a detailed review of your code
posting).  Have you tried it?  Did it fail?
St.Ack

On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <sh...@gmail.com> wrote:
> On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net> wrote:
>
>> What you need?  bulk-upload, in the scheme of things, is a well
>> documented feature.  Its also one that has had some exercise and is
>> known to work well.  For a 0.89 release and trunk, documentation is
>> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
>> The unit test you refer to below is good for figuring how to run a job
>> (Bulk-upload was redone for 0.89/trunk and is much improved over what
>> was available in 0.20.x)
>>
>
> *I need to load data into hbase using Hfiles.  *
>
> Ok, let me tell what I understand from all these things. Basically there are
> two ways to bulk load into hbase.
>
> 1- Using Command Line tools (importtsv, completebulkload )
> 2- Mapreduce job using HFileOutputFormat
>
> At the moment, I have generated the Hfiles using HFileOutputFormat and
> loading these files into hbase using completebulkload command line tool.
> here is my basic code skeleton. Correct me if I do anything wrong.
>
> Configuration conf = new Configuration();
> Job job = new Job(conf, "myjob");
>
>    FileInputFormat.setInputPaths(job, input);
>    job.setJarByClass(ParserDriver.class);
>    job.setMapperClass(MyParserMapper.class);
>    job.setNumReduceTasks(1);
>    job.setInputFormatClass(XmlInputFormat.class);
>    job.setOutputFormatClass(HFileOutputFormat.class);
>    job.setOutputKeyClass(ImmutableBytesWritable.class);
>    job.setOutputValueClass(Put.class);
>    job.setReducerClass(PutSortReducer.class);
>
>    Path outPath = new Path(output);
>    FileOutputFormat.setOutputPath(job, outPath);
>          job.waitForCompletion(true);
>
> and here is mapper skeleton
>
> public class MyParserMapper   extends
>    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>  while(true)
>   {
>       Put put = new Put(rowId);
>      put.add(...);
>      context.write(rwId, put);
>   }
>
> The link says:
> *In order to function efficiently, HFileOutputFormat must be configured such
> that each output HFile fits within a single region. In order to do this,
> jobs use Hadoop's TotalOrderPartitioner class to partition the map output
> into disjoint ranges of the key space, corresponding to the key ranges of
> the regions in the table. *"
>
> Now according to my configuration above  where i need to set
> *TotalOrderPartitioner
> ? *Should I need to add the following line also
>
> job.setPartitionerClass(TotalOrderPartitioner.class);
>
>
>
> On totalorderpartition, this is a partitioner class from hadoop.  The
>> MR partitioner -- the class that dictates which reducers get what map
>> outputs -- is pluggable. The default partitioner does a hash of the
>> output key to figure which reducer.  This won't work if you are
>> looking to have your hfile output totally sorted.
>>
>>
>
>
>> If you can't figure what its about, I'd suggest you check out the
>> hadoop book where it gets a good explication.
>>
>>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>
> On incremental upload, the doc. suggests you look at the output for
>> LoadIncrementalHFiles command.  Have you done that?  You run the
>> command and it'll add in whatever is ready for loading.
>>
>
>   I just use the command line tool for bulk uplaod but not seen
> LoadIncrementalHFiles  class yet to do it through program
>
>
>  ------------------------------
>
>
>>
>> St.Ack
>>
>>
>> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <sh...@gmail.com>
>> wrote:
>> > Hey Community,
>> >
>> > Well...it seems that nobody has experienced with the bulk load option. I
>> > have found one class which might help to write the code for it.
>> >
>> >
>> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >
>> > From this, you can get the idea how to write map reduce job to output in
>> > HFiles format. But There is a little confusion about these two things
>> >
>> > 1-TotalOrderPartitioner
>> > 2-configureIncrementalLoad
>> >
>> > Does anybody have idea about how these things and how to configure it for
>> > the job?
>> >
>> > Thanks
>> >
>> >
>> >
>> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com>
>> wrote:
>> >
>> >> Hi
>> >>
>> >> I am trying to investigate the bulk load option as described in the
>> >> following link.
>> >>
>> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >>
>> >> Does anybody have sample code or have used it before?
>> >> Can it be helpful to insert data into existing table. In my scenario, I
>> >> have one table with 1 column family in which data will be inserted every
>> 15
>> >> minutes.
>> >>
>> >> Kindly share your experiences
>> >>
>> >> Thanks
>> >> --
>> >> Regards
>> >> Shuja-ur-Rehman Baig
>> >> <http://pk.linkedin.com/in/shujamughal>
>> >>
>> >>
>> >
>> >
>> > --
>> > Regards
>> > Shuja-ur-Rehman Baig
>> > <http://pk.linkedin.com/in/shujamughal>
>> >
>>
>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
On Wed, Nov 10, 2010 at 9:20 PM, Stack <st...@duboce.net> wrote:

> What you need?  bulk-upload, in the scheme of things, is a well
> documented feature.  Its also one that has had some exercise and is
> known to work well.  For a 0.89 release and trunk, documentation is
> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
> The unit test you refer to below is good for figuring how to run a job
> (Bulk-upload was redone for 0.89/trunk and is much improved over what
> was available in 0.20.x)
>

*I need to load data into HBase using HFiles.*

OK, let me tell you what I understand from all this. Basically, there are
two ways to bulk load into HBase.

1- Using command line tools (importtsv, completebulkload)
2- A MapReduce job using HFileOutputFormat

At the moment, I generate the HFiles using HFileOutputFormat and load these
files into HBase using the completebulkload command line tool. Here is my
basic code skeleton. Correct me if I am doing anything wrong.

Configuration conf = new Configuration();
Job job = new Job(conf, "myjob");

    FileInputFormat.setInputPaths(job, input);
    job.setJarByClass(ParserDriver.class);
    job.setMapperClass(MyParserMapper.class);
    job.setNumReduceTasks(1);
    job.setInputFormatClass(XmlInputFormat.class);
    job.setOutputFormatClass(HFileOutputFormat.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    job.setReducerClass(PutSortReducer.class);

    Path outPath = new Path(output);
    FileOutputFormat.setOutputPath(job, outPath);
    job.waitForCompletion(true);

and here is the mapper skeleton

public class MyParserMapper extends
    Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    while (true) {   // loops over the records parsed from the XML; the exit condition is elided here
      Put put = new Put(rowId);
      put.add(...);
      context.write(new ImmutableBytesWritable(rowId), put);
    }
  }
}

The link says:
*In order to function efficiently, HFileOutputFormat must be configured such
that each output HFile fits within a single region. In order to do this,
jobs use Hadoop's TotalOrderPartitioner class to partition the map output
into disjoint ranges of the key space, corresponding to the key ranges of
the regions in the table. *"

Now, according to my configuration above, where do I need to set
*TotalOrderPartitioner*? Should I also add the following line?

job.setPartitionerClass(TotalOrderPartitioner.class);
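
(For comparison, here is a minimal sketch of the configureIncrementalLoad() route
shown in the test class linked above. It is only a sketch: it assumes an HBase
0.89+ client on the classpath and that the target table already exists, and the
table name "mytable" is made up.)

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "myjob");
job.setJarByClass(ParserDriver.class);
job.setMapperClass(MyParserMapper.class);
job.setInputFormatClass(XmlInputFormat.class);
FileInputFormat.setInputPaths(job, input);
FileOutputFormat.setOutputPath(job, new Path(output));

// configureIncrementalLoad() inspects the table's regions and sets up the
// output key/value classes, PutSortReducer, HFileOutputFormat, a
// TotalOrderPartitioner over the region start keys, and one reducer per region.
HTable table = new HTable(conf, "mytable");
HFileOutputFormat.configureIncrementalLoad(job, table);

job.waitForCompletion(true);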



On totalorderpartition, this is a partitioner class from hadoop.  The
> MR partitioner -- the class that dictates which reducers get what map
> outputs -- is pluggable. The default partitioner does a hash of the
> output key to figure which reducer.  This won't work if you are
> looking to have your hfile output totally sorted.
>
>


> If you can't figure what its about, I'd suggest you check out the
> hadoop book where it gets a good explication.
>
>   which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?

On incremental upload, the doc. suggests you look at the output for
> LoadIncrementalHFiles command.  Have you done that?  You run the
> command and it'll add in whatever is ready for loading.
>

   I just use the command line tool for the bulk upload, but I have not yet
looked at the LoadIncrementalHFiles class to do it through a program.
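
(For reference, a minimal sketch of driving that load from code rather than the
command line. Only a sketch: the output path and table name are made up, and it
assumes the HFiles were already written by the job above and the table exists.)

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable");                  // made-up table name
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
// Moves each generated HFile into the region that covers its key range.
loader.doBulkLoad(new Path("/user/shuja/hfile-output"), table);  // made-up path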


 ------------------------------


>
> St.Ack
>
>
> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <sh...@gmail.com>
> wrote:
> > Hey Community,
> >
> > Well...it seems that nobody has experienced with the bulk load option. I
> > have found one class which might help to write the code for it.
> >
> >
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >
> > From this, you can get the idea how to write map reduce job to output in
> > HFiles format. But There is a little confusion about these two things
> >
> > 1-TotalOrderPartitioner
> > 2-configureIncrementalLoad
> >
> > Does anybody have idea about how these things and how to configure it for
> > the job?
> >
> > Thanks
> >
> >
> >
> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com>
> wrote:
> >
> >> Hi
> >>
> >> I am trying to investigate the bulk load option as described in the
> >> following link.
> >>
> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >>
> >> Does anybody have sample code or have used it before?
> >> Can it be helpful to insert data into existing table. In my scenario, I
> >> have one table with 1 column family in which data will be inserted every
> 15
> >> minutes.
> >>
> >> Kindly share your experiences
> >>
> >> Thanks
> >> --
> >> Regards
> >> Shuja-ur-Rehman Baig
> >> <http://pk.linkedin.com/in/shujamughal>
> >>
> >>
> >
> >
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
> >
>



-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: Bulk Load Sample Code

Posted by Stack <st...@duboce.net>.
What you need?  bulk-upload, in the scheme of things, is a well
documented feature.  It's also one that has had some exercise and is
known to work well.  For a 0.89 release and trunk, documentation is
here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
The unit test you refer to below is good for figuring how to run a job
(Bulk-upload was redone for 0.89/trunk and is much improved over what
was available in 0.20.x)

On totalorderpartition, this is a partitioner class from hadoop.  The
MR partitioner -- the class that dictates which reducers get what map
outputs -- is pluggable. The default partitioner does a hash of the
output key to figure which reducer.  This won't work if you are
looking to have your hfile output totally sorted.

If you can't figure out what it's about, I'd suggest you check out the
hadoop book where it gets a good explication.
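
(To make the contrast concrete, this is roughly what Hadoop's default
HashPartitioner does; neighbouring row keys can hash to different reducers, so
the concatenated reducer outputs are not totally ordered.)

// Roughly Hadoop's default partitioner: the reducer is chosen by hashing the key.
public class HashPartitioner<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}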

On incremental upload, the doc. suggests you look at the output for
LoadIncrementalHFiles command.  Have you done that?  You run the
command and it'll add in whatever is ready for loading.

St.Ack


On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <sh...@gmail.com> wrote:
> Hey Community,
>
> Well...it seems that nobody has experienced with the bulk load option. I
> have found one class which might help to write the code for it.
>
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>
> From this, you can get the idea how to write map reduce job to output in
> HFiles format. But There is a little confusion about these two things
>
> 1-TotalOrderPartitioner
> 2-configureIncrementalLoad
>
> Does anybody have idea about how these things and how to configure it for
> the job?
>
> Thanks
>
>
>
> On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com> wrote:
>
>> Hi
>>
>> I am trying to investigate the bulk load option as described in the
>> following link.
>>
>> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>>
>> Does anybody have sample code or have used it before?
>> Can it be helpful to insert data into existing table. In my scenario, I
>> have one table with 1 column family in which data will be inserted every 15
>> minutes.
>>
>> Kindly share your experiences
>>
>> Thanks
>> --
>> Regards
>> Shuja-ur-Rehman Baig
>> <http://pk.linkedin.com/in/shujamughal>
>>
>>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>

Re: Bulk Load Sample Code

Posted by Oleg Ruchovets <or...@gmail.com>.
Hi, what do you mean by "shouldn't be hard to hack it up"?
    Does it require changes to the hbase code itself, or are there some tricks
that could be done without changing the hbase code?

And another question:
    In that case I am going to change my hbase schema to use only one column
family.
I'll have one column family and 2000 qualifier:data pairs.
I didn't find any information about limits on the number of qualifiers. Does it bring
any performance penalties?

Thanks Oleg.

On Wed, Nov 10, 2010 at 6:08 PM, Stack <st...@duboce.net> wrote:

> Currently one column family only.  Shouldn't be hard to hack it up to
> do mulitple.  No one has done it yet.
> St.Ack
>
> On Wed, Nov 10, 2010 at 7:57 AM, Oleg Ruchovets <or...@gmail.com>
> wrote:
> > Hi ,
> >    Am I write that version hbase 20.X supports bulk load only in case
> data
> > has ONE column family?
> > My question is :
> >    Does latest (or any other) version support bulk load data with
> multiple
> > column families?
> >
> > Thanks
> > Oleg.
> >
> > On Wed, Nov 10, 2010 at 4:47 PM, Shuja Rehman <sh...@gmail.com>
> wrote:
> >
> >> Hey Community,
> >>
> >> Well...it seems that nobody has experienced with the bulk load option. I
> >> have found one class which might help to write the code for it.
> >>
> >>
> >>
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >>
> >> From this, you can get the idea how to write map reduce job to output in
> >> HFiles format. But There is a little confusion about these two things
> >>
> >> 1-TotalOrderPartitioner
> >> 2-configureIncrementalLoad
> >>
> >> Does anybody have idea about how these things and how to configure it
> for
> >> the job?
> >>
> >> Thanks
> >>
> >>
> >>
> >> On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com>
> >> wrote:
> >>
> >> > Hi
> >> >
> >> > I am trying to investigate the bulk load option as described in the
> >> > following link.
> >> >
> >> > http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >> >
> >> > Does anybody have sample code or have used it before?
> >> > Can it be helpful to insert data into existing table. In my scenario,
> I
> >> > have one table with 1 column family in which data will be inserted
> every
> >> 15
> >> > minutes.
> >> >
> >> > Kindly share your experiences
> >> >
> >> > Thanks
> >> > --
> >> > Regards
> >> > Shuja-ur-Rehman Baig
> >> > <http://pk.linkedin.com/in/shujamughal>
> >> >
> >> >
> >>
> >>
> >> --
> >> Regards
> >> Shuja-ur-Rehman Baig
> >> <http://pk.linkedin.com/in/shujamughal>
> >>
> >
>

Re: Bulk Load Sample Code

Posted by Stack <st...@duboce.net>.
Currently one column family only.  Shouldn't be hard to hack it up to
do multiple.  No one has done it yet.
St.Ack

On Wed, Nov 10, 2010 at 7:57 AM, Oleg Ruchovets <or...@gmail.com> wrote:
> Hi ,
>    Am I write that version hbase 20.X supports bulk load only in case data
> has ONE column family?
> My question is :
>    Does latest (or any other) version support bulk load data with multiple
> column families?
>
> Thanks
> Oleg.
>
> On Wed, Nov 10, 2010 at 4:47 PM, Shuja Rehman <sh...@gmail.com> wrote:
>
>> Hey Community,
>>
>> Well...it seems that nobody has experienced with the bulk load option. I
>> have found one class which might help to write the code for it.
>>
>>
>> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>>
>> From this, you can get the idea how to write map reduce job to output in
>> HFiles format. But There is a little confusion about these two things
>>
>> 1-TotalOrderPartitioner
>> 2-configureIncrementalLoad
>>
>> Does anybody have idea about how these things and how to configure it for
>> the job?
>>
>> Thanks
>>
>>
>>
>> On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com>
>> wrote:
>>
>> > Hi
>> >
>> > I am trying to investigate the bulk load option as described in the
>> > following link.
>> >
>> > http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >
>> > Does anybody have sample code or have used it before?
>> > Can it be helpful to insert data into existing table. In my scenario, I
>> > have one table with 1 column family in which data will be inserted every
>> 15
>> > minutes.
>> >
>> > Kindly share your experiences
>> >
>> > Thanks
>> > --
>> > Regards
>> > Shuja-ur-Rehman Baig
>> > <http://pk.linkedin.com/in/shujamughal>
>> >
>> >
>>
>>
>> --
>> Regards
>> Shuja-ur-Rehman Baig
>> <http://pk.linkedin.com/in/shujamughal>
>>
>

Re: Bulk Load Sample Code

Posted by Oleg Ruchovets <or...@gmail.com>.
Hi,
    Am I right that hbase version 0.20.x supports bulk load only in the case where the data
has ONE column family?
My question is:
    Does the latest (or any other) version support bulk loading data with multiple
column families?

Thanks
Oleg.

On Wed, Nov 10, 2010 at 4:47 PM, Shuja Rehman <sh...@gmail.com> wrote:

> Hey Community,
>
> Well...it seems that nobody has experienced with the bulk load option. I
> have found one class which might help to write the code for it.
>
>
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>
> From this, you can get the idea how to write map reduce job to output in
> HFiles format. But There is a little confusion about these two things
>
> 1-TotalOrderPartitioner
> 2-configureIncrementalLoad
>
> Does anybody have idea about how these things and how to configure it for
> the job?
>
> Thanks
>
>
>
> On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com>
> wrote:
>
> > Hi
> >
> > I am trying to investigate the bulk load option as described in the
> > following link.
> >
> > http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >
> > Does anybody have sample code or have used it before?
> > Can it be helpful to insert data into existing table. In my scenario, I
> > have one table with 1 column family in which data will be inserted every
> 15
> > minutes.
> >
> > Kindly share your experiences
> >
> > Thanks
> > --
> > Regards
> > Shuja-ur-Rehman Baig
> > <http://pk.linkedin.com/in/shujamughal>
> >
> >
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>

Re: Bulk Load Sample Code

Posted by Shuja Rehman <sh...@gmail.com>.
Hey Community,

Well...it seems that nobody has experience with the bulk load option. I
have found one class which might help with writing the code for it.

https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java

From this, you can get the idea of how to write a map reduce job that outputs in
HFile format. But there is a little confusion about these two things:

1-TotalOrderPartitioner
2-configureIncrementalLoad

Does anybody have an idea about what these things are and how to configure them for
the job?

Thanks



On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <sh...@gmail.com> wrote:

> Hi
>
> I am trying to investigate the bulk load option as described in the
> following link.
>
> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>
> Does anybody have sample code or have used it before?
> Can it be helpful to insert data into existing table. In my scenario, I
> have one table with 1 column family in which data will be inserted every 15
> minutes.
>
> Kindly share your experiences
>
> Thanks
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>
>


-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>