Posted to user@hbase.apache.org by 王凯 <wj...@163.com> on 2008/11/11 09:36:16 UTC

HBase read performance

Hello, everyone. I tested performance with PE, but the performance was not good enough. In particular, the table format is not what I need. So I created a table and wrote a string into every cell. Then I ran count; the time that took is count_1. After that, I ran count on all the tables again; the time that took is count_2. count_2 is almost half of count_1!

I do not know why this happens. Perhaps the cache?

column   row      cell   write   count_1   count_2
10       10000    10B    17.2    13.5      7.2
10       10000    50B    17      13.1      7.3
10       10000    200B   19.7    13.6      7.6
10       100000   10B    128.4   131.5     74.7
10       100000   50B    134.6   143.1     66.2
10       100000   200B   138.1   100.1     77.3
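
For scale, the timings in the table can be turned into rows per second and a count_1/count_2 ratio. This is only a rough sketch: it assumes the times are in seconds and that "row" is the number of rows in the table (both are stated later in this thread), and the helper names `rows_per_second` and `summarize` are illustrative, not part of HBase.

```python
# Timings copied from the table: (rows, cell size, write, count_1, count_2).
# Assumption: all times are in seconds.
RESULTS = [
    (10_000, "10B", 17.2, 13.5, 7.2),
    (10_000, "50B", 17.0, 13.1, 7.3),
    (10_000, "200B", 19.7, 13.6, 7.6),
    (100_000, "10B", 128.4, 131.5, 74.7),
    (100_000, "50B", 134.6, 143.1, 66.2),
    (100_000, "200B", 138.1, 100.1, 77.3),
]


def rows_per_second(rows, seconds):
    """Hypothetical helper: average number of rows handled per second."""
    return rows / seconds


def summarize(results):
    """Return one dict per table row with throughput and the count_1/count_2 ratio."""
    summary = []
    for rows, cell, write, count_1, count_2 in results:
        summary.append({
            "rows": rows,
            "cell": cell,
            "write_rps": rows_per_second(rows, write),
            "count_1_rps": rows_per_second(rows, count_1),
            "count_2_rps": rows_per_second(rows, count_2),
            "speedup": count_1 / count_2,
        })
    return summary


if __name__ == "__main__":
    for s in summarize(RESULTS):
        print(f"{s['rows']:>7} rows, {s['cell']:>4}: "
              f"write {s['write_rps']:.0f} rows/s, "
              f"count_1 {s['count_1_rps']:.0f} rows/s, "
              f"count_2 {s['count_2_rps']:.0f} rows/s, "
              f"ratio {s['speedup']:.2f}")
```

For the 10000-row, 10B case this works out to roughly 740 rows/s on the first count and roughly 1390 rows/s on the second.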

Re: HBase read performance

Posted by stack <st...@duboce.net>.
So, all is running on single machine? Can you figure where the time is 
being spent? Run iostat, etc? Can you try with more than one machine?
Thanks,
St.Ack


王凯 wrote:
> On 2008-11-12, "Michael Stack" <stack@duboce.net> wrote:
>
>> 王凯 wrote:
>>     
>>>   
>>>       
>>>> Are you using hbase TRUNK? If so, and if your checkout was recent, 
>>>> you'll see benefit/disadvantage of cache.
>>>>     
>>>>         
>>> Hadoop 0.18.1, HBase 0.18.0. I am not using TRUNK. Any useful updates?
>>> What do you mean by the disadvantage of cache?
>>>   
>>>       
>> Disadvantage is that if you are getting mostly cache-misses, then you 
>> will be paying the price of filling the cache but getting no benefit.
>>
>> There is no data block cache in 0.18.x (by default) so this is not the 
>> issue here. Ignore my comments on cache effect from earlier.
>>
>>     
>>>>     
>>>>         
>>>>> Sorry, I did not explain this clearly. There are 10 columns in the table, 10000 rows in a column, and 10 bytes in a row.
>>>>> The times are 17s, 13.5s, and 7.2s.
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>> 10000 rows in a column? Do you mean 10000 rows in the table and each row 
>>>> has an entry in the column? Or do you mean 10 rows in the table and each 
>>>> row has 10000 columns?
>>>>
>>>>     
>>>>         
>>> 10000 rows in the table and each row has an entry in the column
>>>   
>>>       
>> Then the numbers would seem to be way off. Something else must be going 
>> on. Is the machine swapping?
>>
>>
>>     
>>> DELL PowerEdge 430 , P4 2.8G, 1G Memory. Tooooo poor
>>>   
>>>       
>> Is the machine swapping? Are the datanodes running on same machines?
>>     
> There is no swapping, because we will add swap files in another project.
> And the table is built on only one machine!
>
>   
>> St.Ack
>>     


Re:Re: HBase read performance

Posted by 王凯 <wj...@163.com>.
 
 


On 2008-11-12, "Michael Stack" <stack@duboce.net> wrote:
>王凯 wrote:
>>   
>>> Are you using hbase TRUNK? If so, and if your checkout was recent, 
>>> you'll see benefit/disadvantage of cache.
>>>     
>> Hadoop 0.18.1, HBase 0.18.0. I am not using TRUNK. Any useful updates?
>> What do you mean by the disadvantage of cache?
>>   
>Disadvantage is that if you are getting mostly cache-misses, then you 
>will be paying the price of filling the cache but getting no benefit.
>
>There is no data block cache in 0.18.x (by default) so this is not the 
>issue here. Ignore my comments on cache effect from earlier.
>
>>>     
>>>> Sorry, I did not explain this clearly. There are 10 columns in the table, 10000 rows in a column, and 10 bytes in a row.
>>>> The times are 17s, 13.5s, and 7.2s.
>>>>
>>>>   
>>>>       
>>> 10000 rows in a column? Do you mean 10000 rows in the table and each row 
>>> has an entry in the column? Or do you mean 10 rows in the table and each 
>>> row has 10000 columns?
>>>
>>>     
>> 10000 rows in the table and each row has an entry in the column
>>   
>
>Then the numbers would seem to be way off. Something else must be going 
>on. Is the machine swapping?
>
>
>> DELL PowerEdge 430 , P4 2.8G, 1G Memory. Tooooo poor
>>   
>Is the machine swapping? Are the datanodes running on same machines?
There is no swapping, because we will add swap files in another project.
And the table is built on only one machine!

>St.Ack

Re: HBase read performance

Posted by Michael Stack <st...@duboce.net>.
王凯 wrote:
>   
>> Are you using hbase TRUNK? If so, and if your checkout was recent, 
>> you'll see benefit/disadvantage of cache.
>>     
> Hadoop 0.18.1, HBase 0.18.0. I am not using TRUNK. Any useful updates?
> What do you mean by the disadvantage of cache?
>   
Disadvantage is that if you are getting mostly cache-misses, then you 
will be paying the price of filling the cache but getting no benefit.

There is no data block cache in 0.18.x (by default) so this is not the 
issue here. Ignore my comments on cache effect from earlier.

>>     
>>>>> column   row      cell   write   count_1   count_2
>>>>> 10       10000    10B    17.2    13.5      7.2
>>>>> 10       10000    50B    17      13.1      7.3
>>>>> 10       10000    200B   19.7    13.6      7.6
>>>>> 10       100000   10B    128.4   131.5     74.7
>>>>> 10       100000   50B    134.6   143.1     66.2
>>>>> 10       100000   200B   138.1   100.1     77.3
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>> What is above saying?  That in column 10, you wrote 1000 items of size 
>>>> ten bytes?  The write took 17.2ms, first read 13.5ms and the second 7.2ms?
>>>>
>>>>     
>>>>         
>>> Sorry, I did not explain this clearly. There are 10 columns in the table, 10000 rows in a column, and 10 bytes in a row.
>>> The times are 17s, 13.5s, and 7.2s.
>>>
>>>   
>>>       
>> 10000 rows in a column? Do you mean 10000 rows in the table and each row 
>> has an entry in the column? Or do you mean 10 rows in the table and each 
>> row has 10000 columns?
>>
>>     
> 10000 rows in the table and each row has an entry in the column
>   

Then the numbers would seem to be way off. Something else must be going 
on. Is the machine swapping?


> DELL PowerEdge 430 , P4 2.8G, 1G Memory. Tooooo poor
>   
Is the machine swapping? Are the datanodes running on same machines?

St.Ack
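
One way to check the swapping question on a Linux box is to sample the kernel's swap counters before and after a count run. The sketch below is a hedged illustration rather than anything from HBase: it assumes Linux, where `/proc/vmstat` exposes `pswpin` and `pswpout`, and the function names and the 10-second window are arbitrary choices.

```python
import time


def parse_vmstat(text):
    """Return the pswpin/pswpout counters found in /proc/vmstat-style text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in/out between two samples; non-zero values suggest swapping."""
    return {key: after[key] - before[key] for key in ("pswpin", "pswpout")}


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    before = read_vmstat()
    time.sleep(10)  # run the count from another shell during this window
    after = read_vmstat()
    print(swap_delta(before, after))
```

If both deltas stay at zero while the count runs, swapping is unlikely to explain the timings.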

Re:Re: HBase read performance

Posted by 王凯 <wj...@163.com>.
 
 


On 2008-11-12, "Michael Stack" <st...@duboce.net> wrote:
>王凯 wrote:
>>  
>>
>>
>>
>>
>> On 2008-11-12, "Michael Stack" <st...@duboce.net> wrote:
>>   
>>> 王凯 wrote:
>>>     
>>>> Hello, everyone. I tested performance with PE, but the performance was not good enough.
>>>>
>>> Please say more.  What kind of numbers were you getting?
>>>
>>>
>>>> In particular, the table format is not what I need. So I created a table and wrote a string into every cell. Then I ran count; the time that took is count_1.
>>>> After that, I ran count on all the tables again; the time that took is count_2. count_2 is almost half of count_1!
>>>>
>>>> I do not know why this happens. Perhaps the cache?
>>>>   
>>>>       
>>> Perhaps. If you enable DEBUG and look in the regionserver log, you can 
>>> see log of cache hits and misses.  Try and get general sense of how 
>>> first run compares to second.  Are your reads random or serial?  If 
>>> serial, then yeah, cache is going to help.
>>>     
>> Thanks, I am a newcomer.
>> When would the data be in the cache? Sometimes the count time never changes!
>>   
>
>Are you using hbase TRUNK? If so, and if your checkout was recent, 
>you'll see benefit/disadvantage of cache.
Hadoop 0.18.1, HBase 0.18.0. I am not using TRUNK. Any useful updates?
What do you mean by the disadvantage of cache?
>
>
>>>> column   row      cell   write   count_1   count_2
>>>> 10       10000    10B    17.2    13.5      7.2
>>>> 10       10000    50B    17      13.1      7.3
>>>> 10       10000    200B   19.7    13.6      7.6
>>>> 10       100000   10B    128.4   131.5     74.7
>>>> 10       100000   50B    134.6   143.1     66.2
>>>> 10       100000   200B   138.1   100.1     77.3
>>>>
>>>>   
>>>>       
>>> What is above saying?  That in column 10, you wrote 1000 items of size 
>>> ten bytes?  The write took 17.2ms, first read 13.5ms and the second 7.2ms?
>>>
>>>     
>>
>> Sorry, I did not explain this clearly. There are 10 columns in the table, 10000 rows in a column, and 10 bytes in a row.
>> The times are 17s, 13.5s, and 7.2s.
>>
>>   
>10000 rows in a column? Do you mean 10000 rows in the table and each row 
>has an entry in the column? Or do you mean 10 rows in the table and each 
>row has 10000 columns?
>
10000 rows in the table and each row has an entry in the column
>
>17seconds, 13.5seconds and 7.2seconds are not what we usually see. Tell 
>us more about your hardware setup.

DELL PowerEdge 430 , P4 2.8G, 1G Memory. Tooooo poor!

>Thanks,
>St.Ack

Re: HBase read performance

Posted by Michael Stack <st...@duboce.net>.
王凯 wrote:
>  
>
>
>
>
> On 2008-11-12, "Michael Stack" <st...@duboce.net> wrote:
>
>> 王凯 wrote:
>>     
>>> Hello, everyone. I tested performance with PE, but the performance was not good enough.
>>>
>> Please say more.  What kind of numbers were you getting?
>>
>>
>>> In particular, the table format is not what I need. So I created a table and wrote a string into every cell. Then I ran count; the time that took is count_1.
>>> After that, I ran count on all the tables again; the time that took is count_2. count_2 is almost half of count_1!
>>>
>>> I do not know why this happens. Perhaps the cache?
>>>   
>>>       
>> Perhaps. If you enable DEBUG and look in the regionserver log, you can 
>> see log of cache hits and misses.  Try and get general sense of how 
>> first run compares to second.  Are your reads random or serial?  If 
>> serial, then yeah, cache is going to help.
>>     
> Thanks, I am a newcomer.
> When would the data be in the cache? Sometimes the count time never changes!
>   

Are you using hbase TRUNK? If so, and if your checkout was recent, 
you'll see benefit/disadvantage of cache.


>>> column   row      cell   write   count_1   count_2
>>> 10       10000    10B    17.2    13.5      7.2
>>> 10       10000    50B    17      13.1      7.3
>>> 10       10000    200B   19.7    13.6      7.6
>>> 10       100000   10B    128.4   131.5     74.7
>>> 10       100000   50B    134.6   143.1     66.2
>>> 10       100000   200B   138.1   100.1     77.3
>>>
>>>   
>>>       
>> What is above saying?  That in column 10, you wrote 1000 items of size 
>> ten bytes?  The write took 17.2ms, first read 13.5ms and the second 7.2ms?
>>
>>     
>
> Sorry, I did not explain this clearly. There are 10 columns in the table, 10000 rows in a column, and 10 bytes in a row.
> The times are 17s, 13.5s, and 7.2s.
>
>   
10000 rows in a column? Do you mean 10000 rows in the table and each row 
has an entry in the column? Or do you mean 10 rows in the table and each 
row has 10000 columns?



17seconds, 13.5seconds and 7.2seconds are not what we usually see. Tell 
us more about your hardware setup.

Thanks,
St.Ack

Re:Re: HBase read performance

Posted by 王凯 <wj...@163.com>.
 




On 2008-11-12, "Michael Stack" <st...@duboce.net> wrote:
>王凯 wrote:
>> Hello, everyone. I tested performance with PE, but the performance was not good enough.
>Please say more.  What kind of numbers were you getting?
>
>> In particular, the table format is not what I need. So I created a table and wrote a string into every cell. Then I ran count; the time that took is count_1.
>> After that, I ran count on all the tables again; the time that took is count_2. count_2 is almost half of count_1!
>>
>> I do not know why this happens. Perhaps the cache?
>>   
>Perhaps. If you enable DEBUG and look in the regionserver log, you can 
>see log of cache hits and misses.  Try and get general sense of how 
>first run compares to second.  Are your reads random or serial?  If 
>serial, then yeah, cache is going to help.
Thanks, I am a newcomer.
When would the data be in the cache? Sometimes the count time never changes!

>
>> column   row      cell   write   count_1   count_2
>> 10       10000    10B    17.2    13.5      7.2
>> 10       10000    50B    17      13.1      7.3
>> 10       10000    200B   19.7    13.6      7.6
>> 10       100000   10B    128.4   131.5     74.7
>> 10       100000   50B    134.6   143.1     66.2
>> 10       100000   200B   138.1   100.1     77.3
>>
>>   
>What is above saying?  That in column 10, you wrote 1000 items of size 
>ten bytes?  The write took 17.2ms, first read 13.5ms and the second 7.2ms?
>

Sorry, I did not explain this clearly. There are 10 columns in the table, 10000 rows in a column, and 10 bytes in a row.
The times are 17s, 13.5s, and 7.2s.

>Thanks,
>St.Ack

Re: HBase read performance

Posted by Michael Stack <st...@duboce.net>.
王凯 wrote:
> Hello, everyone. I tested performance with PE, but the performance was not good enough.
Please say more.  What kind of numbers were you getting?

> In particular, the table format is not what I need. So I created a table and wrote a string into every cell. Then I ran count; the time that took is count_1.
> After that, I ran count on all the tables again; the time that took is count_2. count_2 is almost half of count_1!
>
> I do not know why this happens. Perhaps the cache?
>   
Perhaps. If you enable DEBUG and look in the regionserver log, you can 
see log of cache hits and misses.  Try and get general sense of how 
first run compares to second.  Are your reads random or serial?  If 
serial, then yeah, cache is going to help.

> column   row      cell   write   count_1   count_2
> 10       10000    10B    17.2    13.5      7.2
> 10       10000    50B    17      13.1      7.3
> 10       10000    200B   19.7    13.6      7.6
> 10       100000   10B    128.4   131.5     74.7
> 10       100000   50B    134.6   143.1     66.2
> 10       100000   200B   138.1   100.1     77.3
>
>   
What is above saying?  That in column 10, you wrote 1000 items of size 
ten bytes?  The write took 17.2ms, first read 13.5ms and the second 7.2ms?

Thanks,
St.Ack
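
To get a general sense of how the first run compares to the second, as suggested above, the same operation can be timed several times in a row. This is a minimal sketch, not HBase code: `time_runs` is a hypothetical helper, and `fake_count` is a placeholder standing in for whatever call actually counts the table.

```python
import time


def time_runs(operation, runs=2):
    """Call `operation` `runs` times; return a list of (result, elapsed_seconds)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        result = operation()
        timings.append((result, time.perf_counter() - start))
    return timings


def fake_count():
    """Placeholder for a real table count; replace with your own call."""
    return sum(1 for _ in range(10_000))


if __name__ == "__main__":
    for i, (rows, secs) in enumerate(time_runs(fake_count, runs=3), start=1):
        print(f"count_{i}: {rows} rows in {secs:.3f}s")
```

Running more than two passes helps show whether the drop from count_1 to count_2 is a one-time warm-up effect or keeps changing from run to run.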