Posted to user@hbase.apache.org by Chris Tarnas <cf...@email.com> on 2011/02/15 18:01:11 UTC

Put errors via thrift

I have a long-running Hadoop streaming job that also puts about a billion sub-1KB rows into HBase via Thrift, and last night I got quite a few errors like this one:

Still had 34 puts left after retrying 10 times.

Could that be caused by one or more long-running compactions and a split? I'm using GZ (license problems preclude LZO for the time being), and compactions and a split were pretty much all I saw in the logs. I'm sure the long-running compactions were a result of raising hbase.hstore.blockingStoreFiles to 20 and hbase.hregion.memstore.block.multiplier to 24 - that worked well to circumvent HBASE-3483 and other pauses for the smaller ~50M-row inserts we had been doing.
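
For reference, the two settings in question look like this in hbase-site.xml on the region servers (values as above):

  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>24</value>
  </property>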

This is on 10 datanodes, each with 12 processors, 48GB RAM and 12 x 2TB drives. 3 other nodes host the masters and the ZooKeeper quorum.

thanks,
-chris

Re: Put errors via thrift

Posted by Ryan Rawson <ry...@gmail.com>.
0.90.0 has been out since Jan 19th (nearly a month).  The 0.89 variant
you are running is substantially different in key areas from what is
current and published.

There are no fees for upgrading btw, it's completely free!
-ryan

On Tue, Feb 15, 2011 at 11:26 AM, Chris Tarnas <cf...@email.com> wrote:
> We are running cdh3b3 - so next week when they go to b4 we'll be up to 0.90 - I'm looking forward to it.
>
> -chris
>
> On Feb 15, 2011, at 11:05 AM, Ryan Rawson wrote:
>
>> If you were using 0.90, that unhelpful error message would be much more helpful!
>>
>> On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>>> Compactions are done in the background, they won't block writes.
>>>
>>> Regarding splitting time, it could be that it had to retry a bunch of
>>> times in such a way that the write timed out, but I can't say for sure
>>> without the logs.
>>>
>>> Have you considered using the bulk loader? I personally would never
>>> try to insert a few billion rows via Thrift in a streaming job, sounds
>>> like a recipe for trouble ;)
>>>
>>> At the very least, you should consider pre-splitting your table so
>>> that you don't have to wait after the splits, splitting only makes
>>> sense when the data is slowly growing and not under an import. See
>>> this API call: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
>>> byte[][])
>>>
>>> J-D
>>>
>>> On Tue, Feb 15, 2011 at 9:01 AM, Chris Tarnas <cf...@email.com> wrote:
>>>> I have a long running Hadoop streaming job that also puts about a billion sub 1kb rows into Hbase via thrift, and last night I got quite a few errors like this one:
>>>>
>>>> Still had 34 puts left after retrying 10 times.
>>>>
>>>> Could that be caused by one or more long running compactions and a split? I'm using GZ (license problems preclude LZO for the time being) and pretty much compactions and a split were all that I saw in the logs. I'm sure the long running compactions were a result of raising hbase.hstore.blockingStoreFile 20 and hbase.hregion.memstore.block.multiplier to 24, - that worked well to circumvent HBASE-3483 and other pauses for the smaller ~50M row inserts we had been doing.
>>>>
>>>> This is on a 10 datanode, each with 12 processors, 48GB RAM and 12 2TB drives. 3 other nodes are the masters and zookeeper quorum.
>>>>
>>>> thanks,
>>>> -chris
>>>
>
>

Re: Put errors via thrift

Posted by Chris Tarnas <cf...@email.com>.
We are running cdh3b3 - so next week when they go to b4 we'll be up to 0.90 - I'm looking forward to it.

-chris

On Feb 15, 2011, at 11:05 AM, Ryan Rawson wrote:

> If you were using 0.90, that unhelpful error message would be much more helpful!
> 
> On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:
>> Compactions are done in the background, they won't block writes.
>> 
>> Regarding splitting time, it could be that it had to retry a bunch of
>> times in such a way that the write timed out, but I can't say for sure
>> without the logs.
>> 
>> Have you considered using the bulk loader? I personally would never
>> try to insert a few billion rows via Thrift in a streaming job, sounds
>> like a recipe for trouble ;)
>> 
>> At the very least, you should consider pre-splitting your table so
>> that you don't have to wait after the splits, splitting only makes
>> sense when the data is slowly growing and not under an import. See
>> this API call: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
>> byte[][])
>> 
>> J-D
>> 
>> On Tue, Feb 15, 2011 at 9:01 AM, Chris Tarnas <cf...@email.com> wrote:
>>> I have a long running Hadoop streaming job that also puts about a billion sub 1kb rows into Hbase via thrift, and last night I got quite a few errors like this one:
>>> 
>>> Still had 34 puts left after retrying 10 times.
>>> 
>>> Could that be caused by one or more long running compactions and a split? I'm using GZ (license problems preclude LZO for the time being) and pretty much compactions and a split were all that I saw in the logs. I'm sure the long running compactions were a result of raising hbase.hstore.blockingStoreFile 20 and hbase.hregion.memstore.block.multiplier to 24, - that worked well to circumvent HBASE-3483 and other pauses for the smaller ~50M row inserts we had been doing.
>>> 
>>> This is on a 10 datanode, each with 12 processors, 48GB RAM and 12 2TB drives. 3 other nodes are the masters and zookeeper quorum.
>>> 
>>> thanks,
>>> -chris
>> 


Re: Put errors via thrift

Posted by Ryan Rawson <ry...@gmail.com>.
If you were using 0.90, that unhelpful error message would be much more helpful!

On Tue, Feb 15, 2011 at 9:56 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Compactions are done in the background, they won't block writes.
>
> Regarding splitting time, it could be that it had to retry a bunch of
> times in such a way that the write timed out, but I can't say for sure
> without the logs.
>
> Have you considered using the bulk loader? I personally would never
> try to insert a few billion rows via Thrift in a streaming job, sounds
> like a recipe for trouble ;)
>
> At the very least, you should consider pre-splitting your table so
> that you don't have to wait after the splits, splitting only makes
> sense when the data is slowly growing and not under an import. See
> this API call: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
> byte[][])
>
> J-D
>
> On Tue, Feb 15, 2011 at 9:01 AM, Chris Tarnas <cf...@email.com> wrote:
>> I have a long running Hadoop streaming job that also puts about a billion sub 1kb rows into Hbase via thrift, and last night I got quite a few errors like this one:
>>
>> Still had 34 puts left after retrying 10 times.
>>
>> Could that be caused by one or more long running compactions and a split? I'm using GZ (license problems preclude LZO for the time being) and pretty much compactions and a split were all that I saw in the logs. I'm sure the long running compactions were a result of raising hbase.hstore.blockingStoreFile 20 and hbase.hregion.memstore.block.multiplier to 24, - that worked well to circumvent HBASE-3483 and other pauses for the smaller ~50M row inserts we had been doing.
>>
>> This is on a 10 datanode, each with 12 processors, 48GB RAM and 12 2TB drives. 3 other nodes are the masters and zookeeper quorum.
>>
>> thanks,
>> -chris
>

RE: Row Key Question

Posted by Gary Gilbert - SQLstream <ga...@sqlstream.com>.
Hi
I've been considering a slightly different scenario.

In this scenario I'd hash the column qualifier, mod by some constant, and
append the result to the rowkey. The idea is to spread the writes for a
specific rowkey among the various regions. The mod constant gives control
over how many ranges would exist. It assumes that all column qualifiers
are equally likely.

Depending on the app, scans could either (1) append x00's or xff's to a
specific rowkey to gather all the columns for that key, or (2) enumerate
the rowkey values and merge the x00, x01, x02, etc. ranges, depending on
requirements.
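
A rough Java sketch of the write-side salting I have in mind (hypothetical
helper; a two-digit bucket suffix is assumed):

import java.util.Arrays;

// Hash the column qualifier, mod by a constant, and append the bucket
// to the rowkey so writes for one logical rowkey spread over N ranges.
public static byte[] saltedRowKey(byte[] rowKey, byte[] qualifier, int buckets) {
    int bucket = (Arrays.hashCode(qualifier) & Integer.MAX_VALUE) % buckets;
    byte[] suffix = String.format("%02d", bucket).getBytes();
    byte[] salted = new byte[rowKey.length + suffix.length];
    System.arraycopy(rowKey, 0, salted, 0, rowKey.length);
    System.arraycopy(suffix, 0, salted, rowKey.length, suffix.length);
    return salted;
}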

Any thoughts?
Gary

-----Original Message-----
From: Peter Haidinyak [mailto:phaidinyak@local.com] 
Sent: Tuesday, February 15, 2011 6:38 PM
To: user@hbase.apache.org
Subject: Row Key Question

Hi All,
  A couple of weeks ago I asked about how to distribute my rows across the
servers if the key always starts with the date in the format...

YYYY-MM-DD

I believe Stack, although I could be wrong, suggested pre-pending a 'X-'
when 'X' is a number from 1 to the number of servers I have. This way a scan
can be threaded out where there is one thread per server and each thread
'owns' one 'X-' range of the keys. 
My question is on the import side, should I have one thread per server and
round-robin each line of our log files to the threads for the 'put' to the
server? Does this buy me anymore throughput?

Thanks again.

-Pete


Re: Row Key Question

Posted by Stack <st...@duboce.net>.
On Wed, Feb 16, 2011 at 10:48 AM, Peter Haidinyak <ph...@local.com> wrote:
> I'm not using the Timestamp alone, it is part of a compound key.
> My old key included
> <timestamp>|<vendor name>|<other data>
>
> My new key will include
> <vendor name>|<timestamp>|<other data>
>

Yes.  Got that.  Was just trying to give you a bit more background to
highlight what the lads were saying before me.


> This is still not ideal since a couple of vendor makes up over 50% of the logs. It would be nice to prefix the key with a server Id and force the row to that server. With my limited knowledge I don't know how  to do that yet.
>

You don't want to do that (You'll learn why when you pick up more hbasics).

Would suggest you not worry about the distribution. That's the point
of HBase. You don't have to worry about where the stuff is.

St.Ack

RE: Row Key Question

Posted by Peter Haidinyak <ph...@local.com>.
I'm not using the timestamp alone; it is part of a compound key.
My old key included
<timestamp>|<vendor name>|<other data>

My new key will include
<vendor name>|<timestamp>|<other data>

This is still not ideal since a couple of vendors make up over 50% of the logs. It would be nice to prefix the key with a server ID and force the row to that server. With my limited knowledge I don't know how to do that yet.

Thanks

-Pete


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Wednesday, February 16, 2011 10:15 AM
To: user@hbase.apache.org
Subject: Re: Row Key Question

See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
byte[][])

For illustration of why ts alone is a bad key for sorted hbase, see
http://hbase.apache.org/schema.html#d0e2139

St.Ack

On Wed, Feb 16, 2011 at 10:01 AM, Peter Haidinyak <ph...@local.com> wrote:
> Thanks, I'm storing log files and need to scan the tables by date and vendor. Since the vendor is limited to at most 16 characters I can put a padded version in the front followed by the date (vendor**********|DD-MM-YYYY|other date) and I can still scan by setting the start row to (vendor**********|DD-MM-YYYY|other date) and the end row to (w***************|DD-MM-YYYY|other date).
>
> Can anyone point me to information about 'pre-creating regions'? That sounds like an interesting solution.
>
> Thanks again
>
> -Pete
>
>
>
> -----Original Message-----
> From: Doug Meil [mailto:doug.meil@explorysmedical.com]
> Sent: Wednesday, February 16, 2011 9:41 AM
> To: user@hbase.apache.org
> Subject: RE: Row Key Question
>
> Hi there-
>
> As was described in the HBase chapter in the Hadoop book by Tom White, you don't want to insert a lot of data at one time with incrementing keys.
>
> YYYY-MM-DD would seem to me to be a reasonable lead-portion of a key - as long as you aren't trying to insert everything in time-order (and all at one time).  There are other posts about randomizing the input records.  That would provide scan-ability, assuming that is important to you.   There are also tricks where you can reverse the date (e.g., dd-mm-yyyy, or hash the date, etc.) for better spread if randoming the input records isn't possible.
>
> Another big performance benefit we've seen is pre-creating regions for tables.  One of our guys posted something about that this week.  You'll have more servers participating in the load right off the bat.
>
> Doug
>
>
> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Tuesday, February 15, 2011 7:38 PM
> To: user@hbase.apache.org
> Subject: Row Key Question
>
> Hi All,
>  A couple of weeks ago I asked about how to distribute my rows across the servers if the key always starts with the date in the format...
>
> YYYY-MM-DD
>
> I believe Stack, although I could be wrong, suggested pre-pending a 'X-' when 'X' is a number from 1 to the number of servers I have. This way a scan can be threaded out where there is one thread per server and each thread 'owns' one 'X-' range of the keys.
> My question is on the import side, should I have one thread per server and round-robin each line of our log files to the threads for the 'put' to the server? Does this buy me anymore throughput?
>
> Thanks again.
>
> -Pete
>
>

Re: Row Key Question

Posted by Stack <st...@duboce.net>.
See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
byte[][])

For illustration of why ts alone is a bad key for sorted hbase, see
http://hbase.apache.org/schema.html#d0e2139

St.Ack

On Wed, Feb 16, 2011 at 10:01 AM, Peter Haidinyak <ph...@local.com> wrote:
> Thanks, I'm storing log files and need to scan the tables by date and vendor. Since the vendor is limited to at most 16 characters I can put a padded version in the front followed by the date (vendor**********|DD-MM-YYYY|other date) and I can still scan by setting the start row to (vendor**********|DD-MM-YYYY|other date) and the end row to (w***************|DD-MM-YYYY|other date).
>
> Can anyone point me to information about 'pre-creating regions'? That sounds like an interesting solution.
>
> Thanks again
>
> -Pete
>
>
>
> -----Original Message-----
> From: Doug Meil [mailto:doug.meil@explorysmedical.com]
> Sent: Wednesday, February 16, 2011 9:41 AM
> To: user@hbase.apache.org
> Subject: RE: Row Key Question
>
> Hi there-
>
> As was described in the HBase chapter in the Hadoop book by Tom White, you don't want to insert a lot of data at one time with incrementing keys.
>
> YYYY-MM-DD would seem to me to be a reasonable lead-portion of a key - as long as you aren't trying to insert everything in time-order (and all at one time).  There are other posts about randomizing the input records.  That would provide scan-ability, assuming that is important to you.   There are also tricks where you can reverse the date (e.g., dd-mm-yyyy, or hash the date, etc.) for better spread if randoming the input records isn't possible.
>
> Another big performance benefit we've seen is pre-creating regions for tables.  One of our guys posted something about that this week.  You'll have more servers participating in the load right off the bat.
>
> Doug
>
>
> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Tuesday, February 15, 2011 7:38 PM
> To: user@hbase.apache.org
> Subject: Row Key Question
>
> Hi All,
>  A couple of weeks ago I asked about how to distribute my rows across the servers if the key always starts with the date in the format...
>
> YYYY-MM-DD
>
> I believe Stack, although I could be wrong, suggested pre-pending a 'X-' when 'X' is a number from 1 to the number of servers I have. This way a scan can be threaded out where there is one thread per server and each thread 'owns' one 'X-' range of the keys.
> My question is on the import side, should I have one thread per server and round-robin each line of our log files to the threads for the 'put' to the server? Does this buy me anymore throughput?
>
> Thanks again.
>
> -Pete
>
>

RE: Row Key Question

Posted by Peter Haidinyak <ph...@local.com>.
Thanks, I'm storing log files and need to scan the tables by date and vendor. Since the vendor is limited to at most 16 characters, I can put a padded version in the front followed by the date (vendor**********|DD-MM-YYYY|other data), and I can still scan by setting the start row to (vendor**********|DD-MM-YYYY|other data) and the end row to (w***************|DD-MM-YYYY|other data).
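
Roughly what I have in mind for the scan, as a sketch (vendor, date and
padding character are made up):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Bound a scan to one vendor and day. '}' sorts just after '|', so the
// stop row caps everything under vendor|date|.
public static Scan vendorDayScan() {
    String vendor = String.format("%-16s", "acme").replace(' ', '*'); // pad to 16 chars
    String date = "16-02-2011";
    return new Scan(Bytes.toBytes(vendor + "|" + date + "|"),   // start row
                    Bytes.toBytes(vendor + "|" + date + "}"));  // stop row (exclusive)
}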

Can anyone point me to information about 'pre-creating regions'? That sounds like an interesting solution.

Thanks again

-Pete



-----Original Message-----
From: Doug Meil [mailto:doug.meil@explorysmedical.com] 
Sent: Wednesday, February 16, 2011 9:41 AM
To: user@hbase.apache.org
Subject: RE: Row Key Question

Hi there-

As was described in the HBase chapter in the Hadoop book by Tom White, you don't want to insert a lot of data at one time with incrementing keys.

YYYY-MM-DD would seem to me to be a reasonable lead-portion of a key - as long as you aren't trying to insert everything in time-order (and all at one time).  There are other posts about randomizing the input records.  That would provide scan-ability, assuming that is important to you.   There are also tricks where you can reverse the date (e.g., dd-mm-yyyy, or hash the date, etc.) for better spread if randoming the input records isn't possible.  

Another big performance benefit we've seen is pre-creating regions for tables.  One of our guys posted something about that this week.  You'll have more servers participating in the load right off the bat.

Doug


-----Original Message-----
From: Peter Haidinyak [mailto:phaidinyak@local.com] 
Sent: Tuesday, February 15, 2011 7:38 PM
To: user@hbase.apache.org
Subject: Row Key Question

Hi All,
  A couple of weeks ago I asked about how to distribute my rows across the servers if the key always starts with the date in the format...

YYYY-MM-DD

I believe Stack, although I could be wrong, suggested pre-pending a 'X-' when 'X' is a number from 1 to the number of servers I have. This way a scan can be threaded out where there is one thread per server and each thread 'owns' one 'X-' range of the keys. 
My question is on the import side, should I have one thread per server and round-robin each line of our log files to the threads for the 'put' to the server? Does this buy me anymore throughput?

Thanks again.

-Pete


RE: Row Key Question

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

As was described in the HBase chapter in the Hadoop book by Tom White, you don't want to insert a lot of data at one time with incrementing keys.

YYYY-MM-DD would seem to me to be a reasonable lead-portion of a key - as long as you aren't trying to insert everything in time-order (and all at one time).  There are other posts about randomizing the input records.  That would provide scan-ability, assuming that is important to you.  There are also tricks where you can reverse the date (e.g., dd-mm-yyyy, or hash the date, etc.) for better spread if randomizing the input records isn't possible.
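
For example, a quick sketch of both tricks (dates and the hash width are made up):

// Reorder the date so the leading bytes change day to day,
// or prefix with a small hash of the date for better spread.
public static String reversedDate(String yyyymmdd) {       // "2011-02-16" -> "16-02-2011"
    String[] p = yyyymmdd.split("-");
    return p[2] + "-" + p[1] + "-" + p[0];
}
public static String hashedDate(String yyyymmdd) {         // e.g. "a3|2011-02-16"
    return String.format("%02x", yyyymmdd.hashCode() & 0xff) + "|" + yyyymmdd;
}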

Another big performance benefit we've seen is pre-creating regions for tables.  One of our guys posted something about that this week.  You'll have more servers participating in the load right off the bat.

Doug


-----Original Message-----
From: Peter Haidinyak [mailto:phaidinyak@local.com] 
Sent: Tuesday, February 15, 2011 7:38 PM
To: user@hbase.apache.org
Subject: Row Key Question

Hi All,
  A couple of weeks ago I asked about how to distribute my rows across the servers if the key always starts with the date in the format...

YYYY-MM-DD

I believe Stack, although I could be wrong, suggested pre-pending a 'X-' when 'X' is a number from 1 to the number of servers I have. This way a scan can be threaded out where there is one thread per server and each thread 'owns' one 'X-' range of the keys. 
My question is on the import side, should I have one thread per server and round-robin each line of our log files to the threads for the 'put' to the server? Does this buy me anymore throughput?

Thanks again.

-Pete


Re: Row Key Question

Posted by Chris Tarnas <cf...@email.com>.
I've been playing with salting my keys as well. My current experiments are around hashing the rowkey and using digits of that to create the prefix. That would make your salts and your puts idempotent, but you do lose control of data locality.
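
Roughly what I'm experimenting with, as a sketch (MD5 and a two-hex-digit
prefix are just one choice):

import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

// Derive the salt from a hash of the key itself: re-running the same put
// always yields the same salted key (idempotent), but you give up control
// over which bucket, and therefore which region, a given key lands in.
public static byte[] hashPrefixedKey(byte[] key) throws Exception {
    byte[] md5 = MessageDigest.getInstance("MD5").digest(key);
    String prefix = String.format("%02x", md5[0] & 0xff);  // first hash byte as two hex digits
    return Bytes.add(Bytes.toBytes(prefix + "-"), key);
}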

-chris

On Feb 15, 2011, at 4:38 PM, Peter Haidinyak wrote:

> Hi All,
>  A couple of weeks ago I asked about how to distribute my rows across the servers if the key always starts with the date in the format...
> 
> YYYY-MM-DD
> 
> I believe Stack, although I could be wrong, suggested pre-pending a 'X-' when 'X' is a number from 1 to the number of servers I have. This way a scan can be threaded out where there is one thread per server and each thread 'owns' one 'X-' range of the keys. 
> My question is on the import side, should I have one thread per server and round-robin each line of our log files to the threads for the 'put' to the server? Does this buy me anymore throughput?
> 
> Thanks again.
> 
> -Pete
> 


Row Key Question

Posted by Peter Haidinyak <ph...@local.com>.
Hi All,
  A couple of weeks ago I asked about how to distribute my rows across the servers if the key always starts with the date in the format...

YYYY-MM-DD

I believe Stack, although I could be wrong, suggested prepending an 'X-', where 'X' is a number from 1 to the number of servers I have. This way a scan can be threaded out, with one thread per server, and each thread 'owns' one 'X-' range of the keys. 
My question is on the import side: should I have one thread per server and round-robin each line of our log files to the threads for the 'put' to the server? Does this buy me any more throughput?
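
For reference, the read-side pattern I described above would look roughly
like this (sketch; pool size, table name and date are made up):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedScan {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    int numServers = 10;                        // one salt bucket per server
    ExecutorService pool = Executors.newFixedThreadPool(numServers);
    for (int i = 1; i <= numServers; i++) {
      final String prefix = i + "-2011-02-15";  // "X-YYYY-MM-DD" bucket for one day
      pool.submit(new Runnable() {
        public void run() {
          try {
            HTable table = new HTable(conf, "logs");           // hypothetical table
            Scan scan = new Scan(Bytes.toBytes(prefix),
                                 Bytes.toBytes(prefix + "~")); // '~' caps this bucket's range
            ResultScanner rs = table.getScanner(scan);
            for (Result r : rs) {
              // merge/process this bucket's rows
            }
            rs.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
  }
}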

Thanks again.

-Pete


Re: Put errors via thrift

Posted by Chris Tarnas <cf...@email.com>.
Thanks for the help. It definitely looks like the move to 0.90 would resolve many of these issues.

-chris

On Feb 15, 2011, at 2:33 PM, Jean-Daniel Cryans wrote:

> That would make sense... although I've done testing and the more files
> you have to split, the longer it takes to create the reference files
> so the longer the split. Now that I think of it, with your high
> blocking store files setting, you may be running into an extreme case
> of https://issues.apache.org/jira/browse/HBASE-3308
> 
> J-D
> 
> On Tue, Feb 15, 2011 at 2:27 PM, Chris Tarnas <cf...@email.com> wrote:
>> No swapping, about 30% of the total CPU is idle, looking through ganglia I do see a spike in cpu_wio at that time - but only to 2%. My suspect though is GZ compression is just taking a while.
>> 
>> 
>> 
>> On Feb 15, 2011, at 2:10 PM, Jean-Daniel Cryans wrote:
>> 
>>> Yeah if it's the same key space that splits, it could explain the
>>> issue... 65 seconds is a long time! Is there any swapping going on?
>>> CPU or IO starvation?
>>> 
>>> In that context I don't see any problem setting the pausing time higher.
>>> 
>>> J-D
>>> 
>>> On Tue, Feb 15, 2011 at 1:54 PM, Chris Tarnas <cf...@email.com> wrote:
>>>> Hi JD,
>>>> 
>>>> Two splits happened within 90 seconds of each other on one server - one took 65 seconds, the next took 43 seconds. with only a 10 second timeout (10 tries, 1 second between) I think that was the issue. Are their any hidden issues to raising those retry parameters so I can withstand a 120 second pause?
>>>> 
>>>> thanks,
>>>> -chris
>>>> 
>>>> On Feb 15, 2011, at 1:37 PM, Chris Tarnas wrote:
>>>> 
>>>>> 
>>>>> On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:
>>>>> 
>>>>>> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <cf...@email.com> wrote:
>>>>>>> We are definitely considering writing a bulk loader, but as it is this fits into an existing processing pipeline that is not Java and does not fit into the importtsv tool (we use column names as data as well) we have not done it yet. I do foresee a Java bulk loader in our future though.
>>>>>> 
>>>>>> Well I was referring to THE bulk loader: http://hbase.apache.org/bulk-loads.html
>>>>>> 
>>>>> 
>>>>> It has the same problem really for us. Also - does that needs 0.92 for multi-column support? I'm pretty sure we will be moving to a bulk loader soon.
>>>>> 
>>>>>>> 
>>>>>>> Does the shell expose the createTable method that defines the number of columns (or I suppose I'll probably need to brush up on my JRuby...). Splits were definitely happening then. Currently I'm using 1GB regions, I'll probably go larger (~5) and salt my keys to distribute them better.
>>>>>> 
>>>>>> I don't think that method is in the shell, it'd be weird anyway to
>>>>>> write down hundreds of bytes in the shell IMO... Do you see a region
>>>>>> hotspots? If so, definitely solve the key distribution as it's going
>>>>>> to kill your performance. Bigger regions won't really help if you're
>>>>>> still always writing to the same few ones.
>>>>>> 
>>>>> 
>>>>> We use schema files that we redirect into the shell like DDL. My other reason to go to large reasons was we are going to have lots of older data as well. The top few loads will be hot and used most often but we do need access to the older data as well. I foresee up to about 2-4 billion rows a week, so at the rate we are creating these tables that would be quite a few regions per server at 1GB regions.
>>>>> 
>>>>>>> 
>>>>>>> The reason I had thought it might be compaction related is I saw that we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.
>>>>>>> 
>>>>>> 
>>>>>> Well the writes would block on flushing, so unless all the handlers
>>>>>> are filled then you shouldn't see retries exhausted. You could grep
>>>>>> your logs to see how log the splits took btw, but the total locking
>>>>>> time isn't exactly that time... it's less than that. 0.90.1 would
>>>>>> definitely help here.
>>>>>> 
>>>>> 
>>>>> Most splits look to be about 5-7 seconds. I'll investigate more around the error times and see if any were longer.
>>>>> 
>>>>> We'll be upgrading next week.
>>>>> 
>>>>> Thanks again!
>>>>> -chris
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 


Re: Put errors via thrift

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That would make sense... although I've done testing, and the more files
you have to split, the longer it takes to create the reference files,
and so the longer the split. Now that I think of it, with your high
blocking store files setting, you may be running into an extreme case
of https://issues.apache.org/jira/browse/HBASE-3308

J-D

On Tue, Feb 15, 2011 at 2:27 PM, Chris Tarnas <cf...@email.com> wrote:
> No swapping, about 30% of the total CPU is idle, looking through ganglia I do see a spike in cpu_wio at that time - but only to 2%. My suspect though is GZ compression is just taking a while.
>
>
>
> On Feb 15, 2011, at 2:10 PM, Jean-Daniel Cryans wrote:
>
>> Yeah if it's the same key space that splits, it could explain the
>> issue... 65 seconds is a long time! Is there any swapping going on?
>> CPU or IO starvation?
>>
>> In that context I don't see any problem setting the pausing time higher.
>>
>> J-D
>>
>> On Tue, Feb 15, 2011 at 1:54 PM, Chris Tarnas <cf...@email.com> wrote:
>>> Hi JD,
>>>
>>> Two splits happened within 90 seconds of each other on one server - one took 65 seconds, the next took 43 seconds. with only a 10 second timeout (10 tries, 1 second between) I think that was the issue. Are their any hidden issues to raising those retry parameters so I can withstand a 120 second pause?
>>>
>>> thanks,
>>> -chris
>>>
>>> On Feb 15, 2011, at 1:37 PM, Chris Tarnas wrote:
>>>
>>>>
>>>> On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:
>>>>
>>>>> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <cf...@email.com> wrote:
>>>>>> We are definitely considering writing a bulk loader, but as it is this fits into an existing processing pipeline that is not Java and does not fit into the importtsv tool (we use column names as data as well) we have not done it yet. I do foresee a Java bulk loader in our future though.
>>>>>
>>>>> Well I was referring to THE bulk loader: http://hbase.apache.org/bulk-loads.html
>>>>>
>>>>
>>>> It has the same problem really for us. Also - does that needs 0.92 for multi-column support? I'm pretty sure we will be moving to a bulk loader soon.
>>>>
>>>>>>
>>>>>> Does the shell expose the createTable method that defines the number of columns (or I suppose I'll probably need to brush up on my JRuby...). Splits were definitely happening then. Currently I'm using 1GB regions, I'll probably go larger (~5) and salt my keys to distribute them better.
>>>>>
>>>>> I don't think that method is in the shell, it'd be weird anyway to
>>>>> write down hundreds of bytes in the shell IMO... Do you see a region
>>>>> hotspots? If so, definitely solve the key distribution as it's going
>>>>> to kill your performance. Bigger regions won't really help if you're
>>>>> still always writing to the same few ones.
>>>>>
>>>>
>>>> We use schema files that we redirect into the shell like DDL. My other reason to go to large reasons was we are going to have lots of older data as well. The top few loads will be hot and used most often but we do need access to the older data as well. I foresee up to about 2-4 billion rows a week, so at the rate we are creating these tables that would be quite a few regions per server at 1GB regions.
>>>>
>>>>>>
>>>>>> The reason I had thought it might be compaction related is I saw that we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.
>>>>>>
>>>>>
>>>>> Well the writes would block on flushing, so unless all the handlers
>>>>> are filled then you shouldn't see retries exhausted. You could grep
>>>>> your logs to see how log the splits took btw, but the total locking
>>>>> time isn't exactly that time... it's less than that. 0.90.1 would
>>>>> definitely help here.
>>>>>
>>>>
>>>> Most splits look to be about 5-7 seconds. I'll investigate more around the error times and see if any were longer.
>>>>
>>>> We'll be upgrading next week.
>>>>
>>>> Thanks again!
>>>> -chris
>>>>>
>>>>
>>>
>>>
>
>

Re: Put errors via thrift

Posted by Chris Tarnas <cf...@email.com>.
No swapping, and about 30% of the total CPU is idle; looking through Ganglia I do see a spike in cpu_wio at that time - but only to 2%. My suspicion, though, is that GZ compression is just taking a while. 



On Feb 15, 2011, at 2:10 PM, Jean-Daniel Cryans wrote:

> Yeah if it's the same key space that splits, it could explain the
> issue... 65 seconds is a long time! Is there any swapping going on?
> CPU or IO starvation?
> 
> In that context I don't see any problem setting the pausing time higher.
> 
> J-D
> 
> On Tue, Feb 15, 2011 at 1:54 PM, Chris Tarnas <cf...@email.com> wrote:
>> Hi JD,
>> 
>> Two splits happened within 90 seconds of each other on one server - one took 65 seconds, the next took 43 seconds. with only a 10 second timeout (10 tries, 1 second between) I think that was the issue. Are their any hidden issues to raising those retry parameters so I can withstand a 120 second pause?
>> 
>> thanks,
>> -chris
>> 
>> On Feb 15, 2011, at 1:37 PM, Chris Tarnas wrote:
>> 
>>> 
>>> On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:
>>> 
>>>> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <cf...@email.com> wrote:
>>>>> We are definitely considering writing a bulk loader, but as it is this fits into an existing processing pipeline that is not Java and does not fit into the importtsv tool (we use column names as data as well) we have not done it yet. I do foresee a Java bulk loader in our future though.
>>>> 
>>>> Well I was referring to THE bulk loader: http://hbase.apache.org/bulk-loads.html
>>>> 
>>> 
>>> It has the same problem really for us. Also - does that needs 0.92 for multi-column support? I'm pretty sure we will be moving to a bulk loader soon.
>>> 
>>>>> 
>>>>> Does the shell expose the createTable method that defines the number of columns (or I suppose I'll probably need to brush up on my JRuby...). Splits were definitely happening then. Currently I'm using 1GB regions, I'll probably go larger (~5) and salt my keys to distribute them better.
>>>> 
>>>> I don't think that method is in the shell, it'd be weird anyway to
>>>> write down hundreds of bytes in the shell IMO... Do you see a region
>>>> hotspots? If so, definitely solve the key distribution as it's going
>>>> to kill your performance. Bigger regions won't really help if you're
>>>> still always writing to the same few ones.
>>>> 
>>> 
>>> We use schema files that we redirect into the shell like DDL. My other reason to go to large reasons was we are going to have lots of older data as well. The top few loads will be hot and used most often but we do need access to the older data as well. I foresee up to about 2-4 billion rows a week, so at the rate we are creating these tables that would be quite a few regions per server at 1GB regions.
>>> 
>>>>> 
>>>>> The reason I had thought it might be compaction related is I saw that we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.
>>>>> 
>>>> 
>>>> Well the writes would block on flushing, so unless all the handlers
>>>> are filled then you shouldn't see retries exhausted. You could grep
>>>> your logs to see how log the splits took btw, but the total locking
>>>> time isn't exactly that time... it's less than that. 0.90.1 would
>>>> definitely help here.
>>>> 
>>> 
>>> Most splits look to be about 5-7 seconds. I'll investigate more around the error times and see if any were longer.
>>> 
>>> We'll be upgrading next week.
>>> 
>>> Thanks again!
>>> -chris
>>>> 
>>> 
>> 
>> 


Re: Put errors via thrift

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Yeah if it's the same key space that splits, it could explain the
issue... 65 seconds is a long time! Is there any swapping going on?
CPU or IO starvation?

In that context I don't see any problem setting the pausing time higher.

J-D

On Tue, Feb 15, 2011 at 1:54 PM, Chris Tarnas <cf...@email.com> wrote:
> Hi JD,
>
> Two splits happened within 90 seconds of each other on one server - one took 65 seconds, the next took 43 seconds. with only a 10 second timeout (10 tries, 1 second between) I think that was the issue. Are their any hidden issues to raising those retry parameters so I can withstand a 120 second pause?
>
> thanks,
> -chris
>
> On Feb 15, 2011, at 1:37 PM, Chris Tarnas wrote:
>
>>
>> On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:
>>
>>> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <cf...@email.com> wrote:
>>>> We are definitely considering writing a bulk loader, but as it is this fits into an existing processing pipeline that is not Java and does not fit into the importtsv tool (we use column names as data as well) we have not done it yet. I do foresee a Java bulk loader in our future though.
>>>
>>> Well I was referring to THE bulk loader: http://hbase.apache.org/bulk-loads.html
>>>
>>
>> It has the same problem really for us. Also - does that needs 0.92 for multi-column support? I'm pretty sure we will be moving to a bulk loader soon.
>>
>>>>
>>>> Does the shell expose the createTable method that defines the number of columns (or I suppose I'll probably need to brush up on my JRuby...). Splits were definitely happening then. Currently I'm using 1GB regions, I'll probably go larger (~5) and salt my keys to distribute them better.
>>>
>>> I don't think that method is in the shell, it'd be weird anyway to
>>> write down hundreds of bytes in the shell IMO... Do you see a region
>>> hotspots? If so, definitely solve the key distribution as it's going
>>> to kill your performance. Bigger regions won't really help if you're
>>> still always writing to the same few ones.
>>>
>>
>> We use schema files that we redirect into the shell like DDL. My other reason to go to large reasons was we are going to have lots of older data as well. The top few loads will be hot and used most often but we do need access to the older data as well. I foresee up to about 2-4 billion rows a week, so at the rate we are creating these tables that would be quite a few regions per server at 1GB regions.
>>
>>>>
>>>> The reason I had thought it might be compaction related is I saw that we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.
>>>>
>>>
>>> Well the writes would block on flushing, so unless all the handlers
>>> are filled then you shouldn't see retries exhausted. You could grep
>>> your logs to see how log the splits took btw, but the total locking
>>> time isn't exactly that time... it's less than that. 0.90.1 would
>>> definitely help here.
>>>
>>
>> Most splits look to be about 5-7 seconds. I'll investigate more around the error times and see if any were longer.
>>
>> We'll be upgrading next week.
>>
>> Thanks again!
>> -chris
>>>
>>
>
>

Re: Put errors via thrift

Posted by Chris Tarnas <cf...@email.com>.
Hi JD,

Two splits happened within 90 seconds of each other on one server - one took 65 seconds, the next took 43 seconds. With only a 10-second timeout window (10 tries, 1 second between) I think that was the issue. Are there any hidden issues with raising those retry parameters so I can withstand a 120-second pause?
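
For reference, the parameters I mean are the client retry settings read by
the Thrift server's embedded HBase client - example values only, aimed at
riding out a pause on the order of a couple of minutes:

  <property>
    <name>hbase.client.retries.number</name>
    <value>20</value>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>6000</value>
  </property>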

thanks,
-chris

On Feb 15, 2011, at 1:37 PM, Chris Tarnas wrote:

> 
> On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:
> 
>> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <cf...@email.com> wrote:
>>> We are definitely considering writing a bulk loader, but as it is this fits into an existing processing pipeline that is not Java and does not fit into the importtsv tool (we use column names as data as well) we have not done it yet. I do foresee a Java bulk loader in our future though.
>> 
>> Well I was referring to THE bulk loader: http://hbase.apache.org/bulk-loads.html
>> 
> 
> It has the same problem really for us. Also - does that needs 0.92 for multi-column support? I'm pretty sure we will be moving to a bulk loader soon.
> 
>>> 
>>> Does the shell expose the createTable method that defines the number of columns (or I suppose I'll probably need to brush up on my JRuby...). Splits were definitely happening then. Currently I'm using 1GB regions, I'll probably go larger (~5) and salt my keys to distribute them better.
>> 
>> I don't think that method is in the shell, it'd be weird anyway to
>> write down hundreds of bytes in the shell IMO... Do you see a region
>> hotspots? If so, definitely solve the key distribution as it's going
>> to kill your performance. Bigger regions won't really help if you're
>> still always writing to the same few ones.
>> 
> 
> We use schema files that we redirect into the shell like DDL. My other reason to go to large reasons was we are going to have lots of older data as well. The top few loads will be hot and used most often but we do need access to the older data as well. I foresee up to about 2-4 billion rows a week, so at the rate we are creating these tables that would be quite a few regions per server at 1GB regions.
> 
>>> 
>>> The reason I had thought it might be compaction related is I saw that we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.
>>> 
>> 
>> Well the writes would block on flushing, so unless all the handlers
>> are filled then you shouldn't see retries exhausted. You could grep
>> your logs to see how log the splits took btw, but the total locking
>> time isn't exactly that time... it's less than that. 0.90.1 would
>> definitely help here.
>> 
> 
> Most splits look to be about 5-7 seconds. I'll investigate more around the error times and see if any were longer. 
> 
> We'll be upgrading next week.
> 
> Thanks again!
> -chris
>> 
> 


Re: Put errors via thrift

Posted by Chris Tarnas <cf...@email.com>.
On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:

> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <cf...@email.com> wrote:
>> We are definitely considering writing a bulk loader, but as it is this fits into an existing processing pipeline that is not Java and does not fit into the importtsv tool (we use column names as data as well) we have not done it yet. I do foresee a Java bulk loader in our future though.
> 
> Well I was referring to THE bulk loader: http://hbase.apache.org/bulk-loads.html
> 

It has the same problem for us, really. Also - does that need 0.92 for multi-column support? I'm pretty sure we will be moving to a bulk loader soon.

>> 
>> Does the shell expose the createTable method that defines the number of columns (or I suppose I'll probably need to brush up on my JRuby...). Splits were definitely happening then. Currently I'm using 1GB regions, I'll probably go larger (~5) and salt my keys to distribute them better.
> 
> I don't think that method is in the shell, it'd be weird anyway to
> write down hundreds of bytes in the shell IMO... Do you see a region
> hotspots? If so, definitely solve the key distribution as it's going
> to kill your performance. Bigger regions won't really help if you're
> still always writing to the same few ones.
> 

We use schema files that we redirect into the shell like DDL. My other reason to go to larger regions was that we are going to have lots of older data as well. The top few loads will be hot and used most often, but we do need access to the older data as well. I foresee up to about 2-4 billion rows a week, so at the rate we are creating these tables that would be quite a few regions per server at 1GB regions.

>> 
>> The reason I had thought it might be compaction related is I saw that we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.
>> 
> 
> Well the writes would block on flushing, so unless all the handlers
> are filled then you shouldn't see retries exhausted. You could grep
> your logs to see how log the splits took btw, but the total locking
> time isn't exactly that time... it's less than that. 0.90.1 would
> definitely help here.
> 

Most splits look to be about 5-7 seconds. I'll investigate more around the error times and see if any were longer. 

We'll be upgrading next week.

Thanks again!
-chris
> 


Re: Put errors via thrift

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <cf...@email.com> wrote:
> We are definitely considering writing a bulk loader, but as it is this fits into an existing processing pipeline that is not Java and does not fit into the importtsv tool (we use column names as data as well) we have not done it yet. I do foresee a Java bulk loader in our future though.

Well I was referring to THE bulk loader: http://hbase.apache.org/bulk-loads.html

>
> Does the shell expose the createTable method that defines the number of columns (or I suppose I'll probably need to brush up on my JRuby...). Splits were definitely happening then. Currently I'm using 1GB regions, I'll probably go larger (~5) and salt my keys to distribute them better.

I don't think that method is in the shell, it'd be weird anyway to
write down hundreds of bytes in the shell IMO... Do you see region
hotspots? If so, definitely solve the key distribution, as it's going
to kill your performance. Bigger regions won't really help if you're
still always writing to the same few ones.

>
> The reason I had thought it might be compaction related is I saw that we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.
>

Well the writes would block on flushing, so unless all the handlers
are filled you shouldn't see retries exhausted. You could grep
your logs to see how long the splits took btw, but the total locking
time isn't exactly that time... it's less than that. 0.90.1 would
definitely help here.

J-D

Re: Put errors via thrift

Posted by Chris Tarnas <cf...@email.com>.
We are definitely considering writing a bulk loader, but as it stands this fits into an existing processing pipeline that is not Java and does not fit the importtsv tool (we use column names as data as well), so we have not done it yet. I do foresee a Java bulk loader in our future, though.

Does the shell expose the createTable method that pre-defines the regions (or I suppose I'll probably need to brush up on my JRuby...)? Splits were definitely happening then. Currently I'm using 1GB regions; I'll probably go larger (~5GB) and salt my keys to distribute them better.

The reason I had thought it might be compaction related is I saw that we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.

thanks,
-chris

On Feb 15, 2011, at 9:56 AM, Jean-Daniel Cryans wrote:

> Compactions are done in the background, they won't block writes.
> 
> Regarding splitting time, it could be that it had to retry a bunch of
> times in such a way that the write timed out, but I can't say for sure
> without the logs.
> 
> Have you considered using the bulk loader? I personally would never
> try to insert a few billion rows via Thrift in a streaming job, sounds
> like a recipe for trouble ;)
> 
> At the very least, you should consider pre-splitting your table so
> that you don't have to wait after the splits, splitting only makes
> sense when the data is slowly growing and not under an import. See
> this API call: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
> byte[][])
> 
> J-D
> 
> On Tue, Feb 15, 2011 at 9:01 AM, Chris Tarnas <cf...@email.com> wrote:
>> I have a long running Hadoop streaming job that also puts about a billion sub 1kb rows into Hbase via thrift, and last night I got quite a few errors like this one:
>> 
>> Still had 34 puts left after retrying 10 times.
>> 
>> Could that be caused by one or more long running compactions and a split? I'm using GZ (license problems preclude LZO for the time being) and pretty much compactions and a split were all that I saw in the logs. I'm sure the long running compactions were a result of raising hbase.hstore.blockingStoreFile 20 and hbase.hregion.memstore.block.multiplier to 24, - that worked well to circumvent HBASE-3483 and other pauses for the smaller ~50M row inserts we had been doing.
>> 
>> This is on a 10 datanode, each with 12 processors, 48GB RAM and 12 2TB drives. 3 other nodes are the masters and zookeeper quorum.
>> 
>> thanks,
>> -chris


Re: Put errors via thrift

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Compactions are done in the background, they won't block writes.

Regarding splitting time, it could be that it had to retry a bunch of
times in such a way that the write timed out, but I can't say for sure
without the logs.

Have you considered using the bulk loader? I personally would never
try to insert a few billion rows via Thrift in a streaming job, sounds
like a recipe for trouble ;)

At the very least, you should consider pre-splitting your table so
that you don't have to wait on splits; splitting only makes
sense when the data is slowly growing, not during an import. See
this API call: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
byte[][])
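
For example, a rough sketch (table name, column family and split points
are made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    // Create the table pre-split so the import starts out spread over
    // several regions instead of waiting for splits under load.
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("f"));
    byte[][] splits = {
        Bytes.toBytes("2-"), Bytes.toBytes("4-"),
        Bytes.toBytes("6-"), Bytes.toBytes("8-")
    };
    admin.createTable(desc, splits);
  }
}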

J-D

On Tue, Feb 15, 2011 at 9:01 AM, Chris Tarnas <cf...@email.com> wrote:
> I have a long running Hadoop streaming job that also puts about a billion sub 1kb rows into Hbase via thrift, and last night I got quite a few errors like this one:
>
> Still had 34 puts left after retrying 10 times.
>
> Could that be caused by one or more long running compactions and a split? I'm using GZ (license problems preclude LZO for the time being) and pretty much compactions and a split were all that I saw in the logs. I'm sure the long running compactions were a result of raising hbase.hstore.blockingStoreFile 20 and hbase.hregion.memstore.block.multiplier to 24, - that worked well to circumvent HBASE-3483 and other pauses for the smaller ~50M row inserts we had been doing.
>
> This is on a 10 datanode, each with 12 processors, 48GB RAM and 12 2TB drives. 3 other nodes are the masters and zookeeper quorum.
>
> thanks,
> -chris