Posted to user@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2016/12/03 20:59:57 UTC

Re: Writting bottleneck in HBase ?

I was in China the past 10 days where I didn't have access to gmail.

bq. repeat this sequence a thousand times

You mean proceeding with the next parameter ?

bq. use hashing mechanism to transform this long string

How is the hash generated ?
The hash prefix should presumably evenly distribute the write load.
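As an illustration of what an evenly distributing prefix could look like, here is a minimal sketch, not taken from the poster's code. It assumes the row key is the 8-byte Long hash and prepends one salt byte derived from it; the class name `SaltedRowKey`, the method names, and the bucket count of 12 are hypothetical.

```java
import java.nio.ByteBuffer;

public class SaltedRowKey {

    // Picks a bucket in [0, buckets) from the hashed key.
    // Long.hashCode folds the high 32 bits into the low 32 bits first.
    public static int bucket(long keyHash, int buckets) {
        return Math.floorMod(Long.hashCode(keyHash), buckets);
    }

    // Builds a 9-byte row key: 1 salt byte followed by the 8-byte big-endian hash.
    // Assumes buckets <= 256 so that the bucket fits in one byte.
    public static byte[] saltedKey(long keyHash, int buckets) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) bucket(keyHash, buckets))
                .putLong(keyHash)
                .array();
    }

    public static void main(String[] args) {
        int buckets = 12; // hypothetical: one bucket per pre-split region
        long keyHash = 1125899906842597L; // any hashed row id
        byte[] key = saltedKey(keyHash, buckets);
        System.out.println("salt byte = " + key[0] + ", key length = " + key.length);
    }
}
```

With a key laid out this way, the table would presumably be pre-split on the salt byte values rather than on hex strings, and readers would need to recompute the salt from the hash to find a row.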

Thanks

On Thu, Nov 24, 2016 at 8:13 AM, schausson <sc...@softera.fr> wrote:

> Hi, thanks for your answer.
>
> About your question related to thread management : yes, I have several
> threads (up to 4) that may call my persistence method.
>
> When I wrote the post, I had not configured anything special about regions
> for my table so it basically used default splitting policy I guess.
> Following your answer, I tried this:
> byte[][] splits = new RegionSplitter.HexStringSplit().split(numberOfRegionServers);
> Which led to 12 regions at table creation time.
>
> It slightly improved performance: persistence dropped from 2min to
> approximately 1min40s.
>
> I tried with 24 regions but nothing changed then...
>
> About how parameter IDs are distributed: to keep it simple, I read 5
> values per parameter (*2000), call persistence, and repeat this sequence
> a thousand times. So they should be distributed across all my region servers,
> right?
> One additional clue: parameter IDs are alphanumeric, evenly distributed
> between AAAAA and ZZZZZ, but I add a prefix to them, which is a long string
> (about 25 characters). To save storage space (because the rowId is duplicated
> for each cell), I use hashing mechanism to transform this long string into
> a Long value (and I have a mapping table next to the main table), so I don't
> really know how these Long values "distribute"...
>
> Not sure I'm clear...
>
> --
> View this message in context: http://apache-hbase.679495.n3.nabble.com/Writting-bottleneck-in-HBase-tp4084656p4084678.html
> Sent from the HBase User mailing list archive at Nabble.com.
>

Re: Writting bottleneck in HBase ?

Posted by schausson <sc...@softera.fr>.

Hi Ted, thanks for your help !

It seems I was not clear in my explanation, so let me try again:
In my input file, let's say I have 2000 parameters and, for each parameter,
5000 values recorded over a given timeframe.
When I read the file, I read it part by part, using a sliding time
window: for instance, I read all parameter values between t0 and t1,
which returns approximately 5 values per parameter. I write this chunk of
data to HBase, then read the file for the next time window (t1 to t2), write
that data to HBase, and so on...

About the hashing mechanism applied to the rowId, here is the algorithm:

    public long hash(String string) {
        long h = 1125899906842597L; // prime
        int len = string.length();

        // Polynomial rolling hash (base 31); long overflow wraps silently.
        for (int i = 0; i < len; i++) {
            h = 31 * h + string.charAt(i);
        }
        return h;
    }

From what I understand, this does not guarantee an even distribution...
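One way to check the distribution empirically is sketched below. It is only an illustration under stated assumptions: the row key is taken to be the 8-byte big-endian Long, the key space is cut into 12 equal-width ranges (which is not necessarily how HexStringSplit places its split points), and the prefix string, the random 5-letter IDs, and the class name `HashDistributionCheck` are all hypothetical.

```java
import java.util.Random;

public class HashDistributionCheck {

    // Same hash as in the message above.
    public static long hash(String string) {
        long h = 1125899906842597L;
        for (int i = 0; i < string.length(); i++) {
            h = 31 * h + string.charAt(i);
        }
        return h;
    }

    // Maps a hash to one of `ranges` equal-width slices of the unsigned
    // 64-bit key space, using the top 16 bits (byte-wise unsigned order).
    public static int rangeIndex(long h, int ranges) {
        int top = (int) (h >>> 48); // 0..65535
        return (int) ((long) top * ranges / 65536);
    }

    // Hashes `samples` ids made of a shared prefix plus 5 random letters
    // and counts how many land in each range.
    public static int[] histogram(int samples, int ranges) {
        String prefix = "HYPOTHETICAL_PREFIX_25CH_"; // assumed: 25 chars, same for all ids
        Random random = new Random(42);
        int[] counts = new int[ranges];
        for (int n = 0; n < samples; n++) {
            StringBuilder id = new StringBuilder(prefix);
            for (int i = 0; i < 5; i++) {
                id.append((char) ('A' + random.nextInt(26)));
            }
            counts[rangeIndex(hash(id.toString()), ranges)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = histogram(2000, 12);
        for (int r = 0; r < counts.length; r++) {
            System.out.println("range " + r + ": " + counts[r]);
        }
    }
}
```

Under these assumptions the counts concentrate in one or two ranges: with a shared prefix, only the last five characters vary, and they move the hash by less than 2^25, which is tiny compared with the 2^64 key space. If the real prefix differs from one parameter to another, the picture may be different, so the sketch would need to be fed the actual row ids.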

Regards



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Writting-bottleneck-in-HBase-tp4084656p4084985.html
Sent from the HBase User mailing list archive at Nabble.com.