
Posted to user@hbase.apache.org by Daniel <d4...@gmail.com> on 2008/07/15 14:54:23 UTC

performance on getRow and get

Hi all,
   I'm writing a program that accesses my HBase table in a MapReduce job. My first
version fetched individual values with get(row, column name);
now I'm changing it to fetch one whole row at a time into a map and query that map
instead, within a single reduce call.
   I thought it would be better to access HBase only once per reduce
call, but the latter version seems to take longer to finish
during the reduce job. Does this mean get(row, column name) is less
expensive than get(row)?
  Thanks.

Daniel
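A sketch of the two versions Daniel describes. It is written against a hypothetical `RowSource` interface rather than the real HBase client API; the interface and the names `InMemoryRowSource`, `ReduceStrategies`, `perColumn` and `wholeRow` are illustrative assumptions. An in-memory implementation stands in for the table so the sketch runs without a cluster. Both strategies hand the reducer the same values. What differs is the work each call causes on the server, which the replies below address.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal view of a table; NOT the HBase client API.
interface RowSource {
    String get(String row, String column);   // fetch one cell
    Map<String, String> getRow(String row);   // fetch every cell in the row
}

// Hypothetical in-memory stand-in so the sketch runs without a cluster.
class InMemoryRowSource implements RowSource {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(column, value);
    }

    public String get(String row, String column) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    public Map<String, String> getRow(String row) {
        // Returns a copy of all cells, including columns the caller may not need.
        return new HashMap<>(rows.getOrDefault(row, Map.of()));
    }
}

class ReduceStrategies {
    // Version 1: one get(row, column) per value the reducer needs.
    static Map<String, String> perColumn(RowSource src, String row, List<String> columns) {
        Map<String, String> out = new HashMap<>();
        for (String c : columns) {
            out.put(c, src.get(row, c));
        }
        return out;
    }

    // Version 2: one getRow(row) per reduce call, then look values up in the map.
    static Map<String, String> wholeRow(RowSource src, String row, List<String> columns) {
        Map<String, String> cached = src.getRow(row);
        Map<String, String> out = new HashMap<>();
        for (String c : columns) {
            out.put(c, cached.get(c));
        }
        return out;
    }
}
```

For example, with cells `info:name`, `stats:count` and `blob:data` in row `r1`, calling either strategy with the columns `info:name` and `stats:count` returns the same map; only `wholeRow` retrieves `blob:data` along with them.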

RE: performance on getRow and get

Posted by Jim Kellerman <ji...@powerset.com>.

get(row, column) is more efficient than get(row) because get(row) must access multiple HStores (one per column family) and do multiple reads, while get(row, column) only accesses one HStore.

---
Jim Kellerman, Senior Engineer; Powerset
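A toy model of Jim's point, assuming one store per column family and looking at a single row: it counts how many stores each call reads. `SingleRowModel` and its methods are hypothetical names, and the model ignores caching, RPC round trips and the amount of data each store returns, so it illustrates the access pattern rather than predicting timings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of one row whose cells are split across
// one store per column family. Counts store reads only.
class SingleRowModel {
    // family -> (qualifier -> value)
    private final Map<String, Map<String, String>> stores = new LinkedHashMap<>();
    private int storeReads = 0;

    void put(String family, String qualifier, String value) {
        stores.computeIfAbsent(family, f -> new LinkedHashMap<>()).put(qualifier, value);
    }

    // get(row, column): only the store for that column's family is read.
    String get(String family, String qualifier) {
        Map<String, String> store = stores.get(family);
        if (store == null) {
            return null; // unknown family: nothing to read in this model
        }
        storeReads++;
        return store.get(qualifier);
    }

    // get(row): every store is read to assemble the whole row.
    Map<String, String> getRow() {
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : stores.entrySet()) {
            storeReads++;
            for (Map.Entry<String, String> cell : e.getValue().entrySet()) {
                row.put(e.getKey() + ":" + cell.getKey(), cell.getValue());
            }
        }
        return row;
    }

    int storeReads() { return storeReads; }

    void resetCounter() { storeReads = 0; }
}
```

With three families, `get("info", "name")` reads one store while `getRow()` reads all three, so a whole-row fetch pays for every family, even ones the reducer never looks at.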

Re: performance on getRow and get

Posted by ZhaoWei <wz...@gmail.com>.

One HStore per column family, really? So getting a whole column family is not expensive?

Thanks

From: "Sébastien Rainville" <se...@gmail.com>
Subject: Re: performance on getRow and get
Date: Tue, 15 Jul 2008 09:08:03 -0400

> Hi Daniel,
> 
> Yes, get(row) is more expensive than get(row, column name). Keep in mind that
> HBase is column oriented, so when you fetch data from multiple columns it
> needs to access multiple files (one per column family) in order to get the
> data for the whole row.
> 
> Sebastien

Re: performance on getRow and get

Posted by Sébastien Rainville <se...@gmail.com>.

Hi Daniel,

Yes, get(row) is more expensive than get(row, column name). Keep in mind that
HBase is column oriented, so when you fetch data from multiple columns it
needs to access multiple files (one per column family) in order to get the
data for the whole row.

Sebastien
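Sébastien's point can be sketched as a count of the files a request needs, assuming, as he describes, one file per column family and column names of the form `family:qualifier`. `FamilyFiles`, `familyOf` and `filesFor` are hypothetical helpers, not HBase APIs. Columns from the same family map to one file, which relates to ZhaoWei's question above, but the count says nothing about how much data each file read returns.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: which per-family files must be read to serve a request.
// Assumes column names of the form "family:qualifier".
final class FamilyFiles {
    private FamilyFiles() {}

    static String familyOf(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        return column.substring(0, colon);
    }

    // Distinct families (one file each, in this model) needed for the requested columns.
    static Set<String> filesFor(Collection<String> columns) {
        Set<String> families = new TreeSet<>();
        for (String c : columns) {
            families.add(familyOf(c));
        }
        return families;
    }
}
```

For example, `filesFor(List.of("info:name", "info:email"))` yields `[info]`, while the columns of a whole row spread over `info`, `stats` and `blob` yield `[blob, info, stats]`.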
