Posted to user@hbase.apache.org by Panayotis Antonopoulos <an...@hotmail.com> on 2011/05/17 14:16:39 UTC

HFiles created by MR Jobs and HBase Performance




Hello,
I am writing an MR job in which each reducer outputs one HFile containing some of the rows of the table that will be created.
At first I thought of using the HashPartitioner to balance the load, but it would mix the rows, so the output of each reducer would not be a contiguous part of the HBase table created by combining all these files.
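The difference between the two partitioning strategies can be sketched in plain Java. This is a standalone illustration, not Hadoop code: hashPartition copies the formula Hadoop's HashPartitioner uses, while rangePartition is a hypothetical helper that imitates the binary search TotalOrderPartitioner performs over its sorted split points. The class name PartitionSketch, the sample keys and the split points "row03" and "row06" are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.ToIntFunction;

// Standalone sketch (not Hadoop code): hash vs. total-order (range) partitioning.
public class PartitionSketch {

    // Same formula as Hadoop's HashPartitioner: non-negative hash modulo reducers.
    static int hashPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Range partitioning over sorted split points (numReducers - 1 of them):
    // reducer i receives keys k with splitPoints[i-1] <= k < splitPoints[i].
    static int rangePartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Exact hit: the key starts the next range. Miss: binarySearch returns
        // -(insertionPoint) - 1, and the insertion point is the reducer index.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Groups keys by the reducer the partitioner chooses for them.
    static List<List<String>> assign(List<String> keys, int numReducers,
                                     ToIntFunction<String> partitioner) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            parts.get(partitioner.applyAsInt(key)).add(key);
        }
        return parts;
    }

    // True if every reducer's keys form one unbroken run of the sorted key list,
    // i.e. each reducer's output covers a contiguous slice of the table.
    static boolean allContiguous(List<List<String>> parts, List<String> sortedKeys) {
        for (List<String> part : parts) {
            int[] idx = part.stream().mapToInt(sortedKeys::indexOf).sorted().toArray();
            for (int i = 1; i < idx.length; i++) {
                if (idx[i] != idx[i - 1] + 1) {
                    return false;
                }
            }
        }
        return true;
    }

    static List<String> sampleKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(String.format(Locale.ROOT, "row%02d", i)); // zero-padded row keys
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = sampleKeys(10);   // row00 .. row09, already sorted
        String[] splits = {"row03", "row06"}; // hypothetical split points, 3 reducers
        List<List<String>> byHash = assign(keys, 3, k -> hashPartition(k, 3));
        List<List<String>> byRange = assign(keys, 3, k -> rangePartition(k, splits));
        System.out.println("hash:  " + byHash + " contiguous=" + allContiguous(byHash, keys));
        System.out.println("range: " + byRange + " contiguous=" + allContiguous(byRange, keys));
    }
}
```

With these ten keys, hash partitioning sends row01, row04 and row07 to reducer 0, so no reducer's output covers an unbroken slice of the table; range partitioning gives reducer 0 row00 to row02, reducer 1 row03 to row05 and reducer 2 the rest.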

So I would like to ask: is it important to use a Partitioner (the TotalOrderPartitioner, for example) that gives each reducer a contiguous part of the table?

If I do not, will this hurt HBase performance when executing queries or running compactions, since rows that are supposed to be next to each other will end up in different HFiles and the number of disk seeks will increase?
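The read-side concern can be illustrated with a simplified model, assuming a point lookup has to consult every file whose first/last key range covers the requested row. Real HBase can also skip files using Bloom filters and time ranges, which this ignores. LookupCostSketch, buildFiles and the round-robin assignment standing in for hash scattering are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model (not HBase code): an "HFile" is reduced to its first and
// last row key, and a point lookup checks every file whose range covers the key.
public class LookupCostSketch {

    static final class FileRange {
        final String firstKey;
        final String lastKey;

        FileRange(List<String> sortedKeys) {
            this.firstKey = sortedKeys.get(0);
            this.lastKey = sortedKeys.get(sortedKeys.size() - 1);
        }

        boolean mayContain(String key) {
            return key.compareTo(firstKey) >= 0 && key.compareTo(lastKey) <= 0;
        }
    }

    // Number of files a lookup for `key` would have to consult in this model.
    static int filesToCheck(List<FileRange> files, String key) {
        int count = 0;
        for (FileRange f : files) {
            if (f.mayContain(key)) {
                count++;
            }
        }
        return count;
    }

    // Distributes sorted keys over numFiles files.
    // contiguous = true: each file gets one consecutive slice (range partitioning).
    // contiguous = false: key i goes to file i % numFiles, a hypothetical stand-in
    // for the way hash partitioning scatters neighbouring keys.
    static List<FileRange> buildFiles(List<String> sortedKeys, int numFiles, boolean contiguous) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numFiles; i++) {
            buckets.add(new ArrayList<>());
        }
        int sliceSize = (sortedKeys.size() + numFiles - 1) / numFiles; // ceiling division
        for (int i = 0; i < sortedKeys.size(); i++) {
            int file = contiguous ? i / sliceSize : i % numFiles;
            buckets.get(file).add(sortedKeys.get(i)); // input order keeps each bucket sorted
        }
        List<FileRange> files = new ArrayList<>();
        for (List<String> bucket : buckets) {
            if (!bucket.isEmpty()) {
                files.add(new FileRange(bucket));
            }
        }
        return files;
    }

    static List<String> sampleKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(String.format(Locale.ROOT, "row%03d", i)); // zero-padded so order is lexicographic
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = sampleKeys(100); // row000 .. row099
        for (boolean contiguous : new boolean[] {true, false}) {
            List<FileRange> files = buildFiles(keys, 4, contiguous);
            System.out.println((contiguous ? "range-partitioned" : "scattered")
                + ": lookup of row050 checks " + filesToCheck(files, "row050")
                + " of " + files.size() + " files");
        }
    }
}
```

With 100 keys in four files, a lookup of row050 consults one file when the files are range-partitioned and all four when neighbouring keys are scattered across files.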

Thank you for your help!
Panagiotis

RE: HFiles created by MR Jobs and HBase Performance

Posted by Panayotis Antonopoulos <an...@hotmail.com>.
Thank you for your help!
I hadn't understood the use of the TotalOrderPartitioner correctly.

> Date: Tue, 17 May 2011 09:14:37 -0500
> Subject: Re: HFiles created by MR Jobs and HBase Performance
> From: cft@email.com
> To: user@hbase.apache.org
> 
> If I understand hbase bulk loading correctly each hfile generated needs its
> keys to fit within one existing region - that is the reason the total order
> partitioner is used. I believe however that within one region before
> compaction you can have multiple hfiles for a given column family and each
> hfile does not need to have distinct key ranges, they just need to fit
> within the overall range of the region. This does impact read performance so
> multiple hfiles get cleaned up and condensed into one during a compaction.
> 
> -chris

Re: HFiles created by MR Jobs and HBase Performance

Posted by Christopher Tarnas <cf...@email.com>.
If I understand HBase bulk loading correctly, the keys in each generated HFile
need to fit within one existing region; that is the reason the
TotalOrderPartitioner is used. I believe, however, that within one region,
before compaction, you can have multiple HFiles for a given column family, and
the HFiles do not need distinct key ranges; they just need to fit within the
overall key range of the region. This does impact read performance, so the
multiple HFiles get cleaned up and condensed into one during a compaction.
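Both points in this reply can be sketched in plain Java under simplifying assumptions: a table's regions are described only by their sorted start keys (the first one empty), an HFile is reduced to its sorted row keys, and a compaction is modeled as a plain k-way merge that ignores versions and deletes. RegionFitSketch and its methods are hypothetical names, not HBase API. For reference, HBase's HFileOutputFormat.configureIncrementalLoad sets up a TotalOrderPartitioner whose split points are the table's region start keys, with one reducer per region; the sketch does not call that API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Standalone sketch (not HBase code): region-fit check and compaction-style merge.
public class RegionFitSketch {

    // Index of the region whose [startKey, nextStartKey) range holds `key`.
    // regionStartKeys must be sorted, with "" as the first region's start key.
    static int regionFor(String key, String[] regionStartKeys) {
        int pos = Arrays.binarySearch(regionStartKeys, key);
        // Exact hit: key is a region's start key. Miss: the region is the one
        // just before the insertion point -(pos + 1).
        return pos >= 0 ? pos : -(pos + 1) - 1;
    }

    // A sorted file fits one region if its first and last keys land in the same region.
    static boolean fitsOneRegion(List<String> sortedFileKeys, String[] regionStartKeys) {
        String first = sortedFileKeys.get(0);
        String last = sortedFileKeys.get(sortedFileKeys.size() - 1);
        return regionFor(first, regionStartKeys) == regionFor(last, regionStartKeys);
    }

    // Compaction modeled as a k-way merge: several sorted, possibly overlapping
    // files become one sorted file. Real compactions also handle versions and deletes.
    static List<String> mergeSorted(List<List<String>> sortedFiles) {
        // Heap entries are {fileIndex, positionInFile}, ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> sortedFiles.get(a[0]).get(a[1]).compareTo(sortedFiles.get(b[0]).get(b[1])));
        for (int f = 0; f < sortedFiles.size(); f++) {
            if (!sortedFiles.get(f).isEmpty()) {
                heap.add(new int[] {f, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> file = sortedFiles.get(top[0]);
            merged.add(file.get(top[1]));
            if (top[1] + 1 < file.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within that file
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String[] regionStarts = {"", "row100", "row200"}; // three hypothetical regions
        List<String> insideRegion1 = List.of("row120", "row150", "row199");
        List<String> spansTwoRegions = List.of("row180", "row210");
        System.out.println(fitsOneRegion(insideRegion1, regionStarts));   // true
        System.out.println(fitsOneRegion(spansTwoRegions, regionStarts)); // false

        // Two overlapping HFiles in the same region, condensed into one.
        List<List<String>> files = List.of(
            List.of("row110", "row130", "row170"),
            List.of("row120", "row140", "row160"));
        System.out.println(mergeSorted(files)); // [row110, row120, row130, row140, row160, row170]
    }
}
```

The two input files in main overlap in key range yet both fit region 1, which matches the point that HFiles in one region need not have distinct ranges; the merge is what turns them back into a single sorted file.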

-chris
