Posted to user@hbase.apache.org by Rama Ramani <ra...@live.com> on 2015/01/09 21:53:52 UTC

RE: HBase - bulk loading files

Is there a way to specify Salted buckets with HBase ImportTsv while doing bulk load?
 
Thanks
Rama
 
From: rama.ramani@live.com
To: user@hbase.apache.org
Subject: RE: HBase - bulk loading files
Date: Fri, 19 Dec 2014 14:09:09 -0800




HBase 0.98.0.2.1.9.0-2196-hadoop2
Hadoop 2.4.0.2.1.9.0-2196
Subversion git@github.com:hortonworks/hadoop-monarch.git -r cb50542bc92fb77dee52
No, the clusters were not taking additional load.
Thanks
Rama
> Date: Fri, 19 Dec 2014 13:50:30 -0800
> Subject: Re: HBase - bulk loading files
> From: yuzhihong@gmail.com
> To: user@hbase.apache.org
> 
> Can you let us know the HBase and Hadoop versions you're using?
> 
> Were the clusters taking load from other sources when ImportTsv was running?
> 
> Cheers
> 
> On Fri, Dec 19, 2014 at 1:43 PM, Rama Ramani <ra...@live.com> wrote:
> 
> > Hello,
> > I am bulk loading a set of files (about 400 MB each), with "|" as the
> > delimiter, using ImportTsv. The 'map' job takes a long time to complete
> > on both a 4-node and a 16-node cluster. I also tried generating
> > bulk-load output (passing -Dimporttsv.bulk.output), and that step was
> > similarly slow, which suggests the HFile generation itself is the
> > bottleneck.
> > I am seeing about 8,000 rows/sec on this dataset; ingesting 400 MB takes
> > about 5-6 minutes. How can I improve this? Is there an alternate tool I
> > can use?
> > Thanks
> > Rama

Re: HBase - bulk loading files

Posted by Ted Yu <yu...@gmail.com>.
Salted buckets seem to be a concept from other projects, such as Phoenix.

Can you be a bit more specific about your requirement?

Cheers

On Fri, Jan 9, 2015 at 12:53 PM, Rama Ramani <ra...@live.com> wrote:

> Is there a way to specify Salted buckets with HBase ImportTsv while doing
> bulk load?
>
> Thanks
> Rama