Posted to user@hbase.apache.org by "Sharma, Avani" <ag...@ebay.com> on 2010/06/29 21:11:46 UTC

speed up reads in hBase

I have about 2.8M rows in my HBase table, with multiple versions (max 6).

When I try to look up 1000 records, it takes a total of 20 minutes! Each read takes about a second or more.
I would appreciate any pointers on speeding these reads up.

Thanks,
-Avani

Re: speed up reads in hBase

Posted by Jean-Daniel Cryans <jd...@apache.org>.
If you have the RAM, you can increase hfile.block.cache.size so that
more data is cached. From conf/hbase-default.xml:

      <description>
          Percentage of maximum heap (-Xmx setting) to allocate to block cache
          used by HFile/StoreFile. Default of 0.2 means allocate 20%.
          Set to 0 to disable.
      </description>
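
For example, a minimal override in conf/hbase-site.xml might look like the
following (0.4, i.e. 40% of the heap, is only an illustrative value; pick it
based on how much heap your region servers can actually spare):

      <property>
        <name>hfile.block.cache.size</name>
        <value>0.4</value>
      </property>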

J-D

On Tue, Jun 29, 2010 at 5:52 PM, Sharma, Avani <ag...@ebay.com> wrote:
> Thanks. That was an obvious problem , now that I look at it. I can read about 10K records in 6.3seconds.
>
> What are the other possible things I could try to make it even faster ? Any pointers are appreciated.
>
> -Avani
>

RE: speed up reads in hBase

Posted by "Sharma, Avani" <ag...@ebay.com>.
Thanks. That was an obvious problem, now that I look at it. I can now read about 10K records in 6.3 seconds.

What else could I try to make it even faster? Any pointers are appreciated.

-Avani

-----Original Message-----
From: Jonathan Gray [mailto:jgray@facebook.com] 
Sent: Tuesday, June 29, 2010 4:35 PM
To: user@hbase.apache.org
Subject: RE: speed up reads in hBase

Avani,

Are you including the time to instantiate the HTable?  Are you instantiating a new one each time?  Creating an HBaseConfiguration object will actually parse up the XML and all that.  You should just reuse the same HTable instance (within a thread) or if you have a multi-threaded program, then use HTablePool.

Are there multiple families on this table?

Obviously 1 second per Get is extraordinarily slow so there may be several things at play here.

JG

> -----Original Message-----
> From: Sharma, Avani [mailto:agsharma@ebay.com]
> Sent: Tuesday, June 29, 2010 4:05 PM
> To: user@hbase.apache.org
> Subject: RE: speed up reads in hBase
> 
> Max 6 versions. The rows are very small. I am currently running a
> prototype to see if HBase will work for our application in a real-time
> environment.
> The initial result shows slower than what-we-need performance.
> 
> I am not querying older versions. I am querying the latest in the
> experiment results I sent across.
> I ran experiments, and the timing doesn't change for querying 6
> versions old data or by not using versioning (instead create
> key_timestamp as the row key).
> 
> 
> I was hoping there would be ways to optimize the querying of latest
> version (forget about older versions ).
> 
> -Avani
> 
> -----Original Message-----
> From: Todd Lipcon [mailto:todd@cloudera.com]
> Sent: Tuesday, June 29, 2010 3:21 PM
> To: user@hbase.apache.org
> Subject: Re: speed up reads in hBase
> 
> Hi Avani,
> 
> There are currently some optimizations that Jonathan Gray is working on
> to
> make selection of specific time ranges more efficient.
> 
> How many versions are retained for the rows in this column family?
> 
> -Todd
> 
> On Tue, Jun 29, 2010 at 1:08 PM, Sharma, Avani <ag...@ebay.com>
> wrote:
> 
> > Rows are very small (like 50 bytes max). I am accessing the latest
> version
> > after setting  timerange.
> >
> >        HTable table = new HTable(new HBaseConfiguration(),
> table_name);
> >
> >        Get getRes = new Get(Bytes.toBytes(lkp_key));
> >
> >        long maxStamp  = new
> SimpleDateFormat("yyyyMMdd").parse(date_for_ts,
> > new ParsePosition(0)).getTime();
> >        getRes.setTimeRange(0, maxStamp);
> >
> >        Result r   = table.get(getRes);
> >        NavigableMap<byte[], byte[]> kvMap   =
> > r.getFamilyMap((Bytes.toBytes("data")));
> >
> > -----Original Message-----
> > From: Michael Segel [mailto:michael_segel@hotmail.com]
> > Sent: Tuesday, June 29, 2010 12:46 PM
> > To: user@hbase.apache.org
> > Subject: RE: speed up reads in hBase
> >
> >
> >
> > How wide are your rows?
> > Are you accessing the last version or pulling back all of the
> versions per
> > row?
> > > From: agsharma@ebay.com
> > > To: user@hbase.apache.org
> > > Date: Tue, 29 Jun 2010 12:11:46 -0700
> > > Subject: speed up reads in hBase
> > >
> > >
> > > I have about 2.8M rows in my HBase table with multiple versions (
> max 6)
> > .
> > >
> > > When I try to lookup 1000 records, it takes a total time of 20
> minutes !
> > Per read takes about a second or more.
> > > Will appreciate any pointers on speeding these ?
> > >
> > > Thanks,
> > > -Avani
> >
> >
> 
> 
> 
> --
> Todd Lipcon
> Software Engineer, Cloudera

Client jdk1.5 (Jboss5.1)

Posted by Palaniappan Thiyagarajan <pt...@cashedge.com>.
All,

Is it necessary to upgrade the JDK used by JBoss 5.1 to 1.6 in order to connect to HBase? We have JDK 1.6 on the Hadoop server, and our client is a JBoss application.

Please clarify whether we need to upgrade our JBoss JDK to 1.6 to connect to Hadoop. Any pointers on JBoss/HBase connectivity would be appreciated.

Thanks
Palani

RE: speed up reads in hBase

Posted by Jonathan Gray <jg...@facebook.com>.
Avani,

Are you including the time to instantiate the HTable?  Are you instantiating a new one each time?  Creating an HBaseConfiguration object will actually parse up the XML and all that.  You should just reuse the same HTable instance (within a thread) or if you have a multi-threaded program, then use HTablePool.
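
A rough sketch of that reuse pattern (the table name "mytable" and the row
keys are placeholders, and the pool lines assume the 0.20-era HTablePool
getTable/putTable API):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // Build the configuration and HTable once, outside the lookup loop, so
    // the XML parsing and connection setup are not repeated for every Get.
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "mytable");
    byte[][] keys = { Bytes.toBytes("row1"), Bytes.toBytes("row2") };
    for (byte[] key : keys) {
        Result r = table.get(new Get(key));   // same HTable for every lookup
        // ... process r ...
    }

    // Multi-threaded clients: share one pool and check a handle out per use.
    HTablePool pool = new HTablePool(conf, 10);   // up to 10 cached handles per table
    HTable pooled = pool.getTable("mytable");
    try {
        Result r2 = pooled.get(new Get(Bytes.toBytes("somekey")));
        // ... process r2 ...
    } finally {
        pool.putTable(pooled);                    // return the handle for reuse
    }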

Are there multiple families on this table?

Obviously 1 second per Get is extraordinarily slow so there may be several things at play here.

JG

> -----Original Message-----
> From: Sharma, Avani [mailto:agsharma@ebay.com]
> Sent: Tuesday, June 29, 2010 4:05 PM
> To: user@hbase.apache.org
> Subject: RE: speed up reads in hBase
> 
> Max 6 versions. The rows are very small. I am currently running a
> prototype to see if HBase will work for our application in a real-time
> environment.
> The initial result shows slower than what-we-need performance.
> 
> I am not querying older versions. I am querying the latest in the
> experiment results I sent across.
> I ran experiments, and the timing doesn't change for querying 6
> versions old data or by not using versioning (instead create
> key_timestamp as the row key).
> 
> 
> I was hoping there would be ways to optimize the querying of latest
> version (forget about older versions ).
> 
> -Avani
> 
> -----Original Message-----
> From: Todd Lipcon [mailto:todd@cloudera.com]
> Sent: Tuesday, June 29, 2010 3:21 PM
> To: user@hbase.apache.org
> Subject: Re: speed up reads in hBase
> 
> Hi Avani,
> 
> There are currently some optimizations that Jonathan Gray is working on
> to
> make selection of specific time ranges more efficient.
> 
> How many versions are retained for the rows in this column family?
> 
> -Todd
> 
> On Tue, Jun 29, 2010 at 1:08 PM, Sharma, Avani <ag...@ebay.com>
> wrote:
> 
> > Rows are very small (like 50 bytes max). I am accessing the latest
> version
> > after setting  timerange.
> >
> >        HTable table = new HTable(new HBaseConfiguration(),
> table_name);
> >
> >        Get getRes = new Get(Bytes.toBytes(lkp_key));
> >
> >        long maxStamp  = new
> SimpleDateFormat("yyyyMMdd").parse(date_for_ts,
> > new ParsePosition(0)).getTime();
> >        getRes.setTimeRange(0, maxStamp);
> >
> >        Result r   = table.get(getRes);
> >        NavigableMap<byte[], byte[]> kvMap   =
> > r.getFamilyMap((Bytes.toBytes("data")));
> >
> > -----Original Message-----
> > From: Michael Segel [mailto:michael_segel@hotmail.com]
> > Sent: Tuesday, June 29, 2010 12:46 PM
> > To: user@hbase.apache.org
> > Subject: RE: speed up reads in hBase
> >
> >
> >
> > How wide are your rows?
> > Are you accessing the last version or pulling back all of the
> versions per
> > row?
> > > From: agsharma@ebay.com
> > > To: user@hbase.apache.org
> > > Date: Tue, 29 Jun 2010 12:11:46 -0700
> > > Subject: speed up reads in hBase
> > >
> > >
> > > I have about 2.8M rows in my HBase table with multiple versions (
> max 6)
> > .
> > >
> > > When I try to lookup 1000 records, it takes a total time of 20
> minutes !
> > Per read takes about a second or more.
> > > Will appreciate any pointers on speeding these ?
> > >
> > > Thanks,
> > > -Avani
> >
> >
> 
> 
> 
> --
> Todd Lipcon
> Software Engineer, Cloudera

RE: speed up reads in hBase

Posted by "Sharma, Avani" <ag...@ebay.com>.
Max 6 versions. The rows are very small. I am currently running a prototype to see if HBase will work for our application in a real-time environment.
The initial results show performance slower than what we need.

I am not querying older versions; I am querying only the latest version in the experiment results I sent across.
In my experiments, the timing does not change whether I query data that is 6 versions old or skip versioning altogether (using key_timestamp as the row key instead).

I was hoping there would be ways to optimize querying the latest version (ignoring older versions).

-Avani

-----Original Message-----
From: Todd Lipcon [mailto:todd@cloudera.com] 
Sent: Tuesday, June 29, 2010 3:21 PM
To: user@hbase.apache.org
Subject: Re: speed up reads in hBase

Hi Avani,

There are currently some optimizations that Jonathan Gray is working on to
make selection of specific time ranges more efficient.

How many versions are retained for the rows in this column family?

-Todd

On Tue, Jun 29, 2010 at 1:08 PM, Sharma, Avani <ag...@ebay.com> wrote:

> Rows are very small (like 50 bytes max). I am accessing the latest version
> after setting  timerange.
>
>        HTable table = new HTable(new HBaseConfiguration(), table_name);
>
>        Get getRes = new Get(Bytes.toBytes(lkp_key));
>
>        long maxStamp  = new SimpleDateFormat("yyyyMMdd").parse(date_for_ts,
> new ParsePosition(0)).getTime();
>        getRes.setTimeRange(0, maxStamp);
>
>        Result r   = table.get(getRes);
>        NavigableMap<byte[], byte[]> kvMap   =
> r.getFamilyMap((Bytes.toBytes("data")));
>
> -----Original Message-----
> From: Michael Segel [mailto:michael_segel@hotmail.com]
> Sent: Tuesday, June 29, 2010 12:46 PM
> To: user@hbase.apache.org
> Subject: RE: speed up reads in hBase
>
>
>
> How wide are your rows?
> Are you accessing the last version or pulling back all of the versions per
> row?
> > From: agsharma@ebay.com
> > To: user@hbase.apache.org
> > Date: Tue, 29 Jun 2010 12:11:46 -0700
> > Subject: speed up reads in hBase
> >
> >
> > I have about 2.8M rows in my HBase table with multiple versions ( max 6)
> .
> >
> > When I try to lookup 1000 records, it takes a total time of 20 minutes !
> Per read takes about a second or more.
> > Will appreciate any pointers on speeding these ?
> >
> > Thanks,
> > -Avani
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: speed up reads in hBase

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Avani,

There are currently some optimizations that Jonathan Gray is working on to
make selection of specific time ranges more efficient.

How many versions are retained for the rows in this column family?

-Todd

On Tue, Jun 29, 2010 at 1:08 PM, Sharma, Avani <ag...@ebay.com> wrote:

> Rows are very small (like 50 bytes max). I am accessing the latest version
> after setting  timerange.
>
>        HTable table = new HTable(new HBaseConfiguration(), table_name);
>
>        Get getRes = new Get(Bytes.toBytes(lkp_key));
>
>        long maxStamp  = new SimpleDateFormat("yyyyMMdd").parse(date_for_ts,
> new ParsePosition(0)).getTime();
>        getRes.setTimeRange(0, maxStamp);
>
>        Result r   = table.get(getRes);
>        NavigableMap<byte[], byte[]> kvMap   =
> r.getFamilyMap((Bytes.toBytes("data")));
>
> -----Original Message-----
> From: Michael Segel [mailto:michael_segel@hotmail.com]
> Sent: Tuesday, June 29, 2010 12:46 PM
> To: user@hbase.apache.org
> Subject: RE: speed up reads in hBase
>
>
>
> How wide are your rows?
> Are you accessing the last version or pulling back all of the versions per
> row?
> > From: agsharma@ebay.com
> > To: user@hbase.apache.org
> > Date: Tue, 29 Jun 2010 12:11:46 -0700
> > Subject: speed up reads in hBase
> >
> >
> > I have about 2.8M rows in my HBase table with multiple versions ( max 6)
> .
> >
> > When I try to lookup 1000 records, it takes a total time of 20 minutes !
> Per read takes about a second or more.
> > Will appreciate any pointers on speeding these ?
> >
> > Thanks,
> > -Avani
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera

RE: speed up reads in hBase

Posted by "Sharma, Avani" <ag...@ebay.com>.
Rows are very small (50 bytes at most). I am accessing the latest version after setting a time range.

        // Open the table; constructing HBaseConfiguration parses the HBase XML config
        HTable table = new HTable(new HBaseConfiguration(), table_name);

        Get getRes = new Get(Bytes.toBytes(lkp_key));

        // Only return cells with timestamps earlier than the given date; the newest
        // version within that range is what comes back
        long maxStamp = new SimpleDateFormat("yyyyMMdd").parse(date_for_ts, new ParsePosition(0)).getTime();
        getRes.setTimeRange(0, maxStamp);

        Result r = table.get(getRes);
        NavigableMap<byte[], byte[]> kvMap = r.getFamilyMap(Bytes.toBytes("data"));

-----Original Message-----
From: Michael Segel [mailto:michael_segel@hotmail.com] 
Sent: Tuesday, June 29, 2010 12:46 PM
To: user@hbase.apache.org
Subject: RE: speed up reads in hBase



How wide are your rows?
Are you accessing the last version or pulling back all of the versions per row?
> From: agsharma@ebay.com
> To: user@hbase.apache.org
> Date: Tue, 29 Jun 2010 12:11:46 -0700
> Subject: speed up reads in hBase
> 
> 
> I have about 2.8M rows in my HBase table with multiple versions ( max 6) .
> 
> When I try to lookup 1000 records, it takes a total time of 20 minutes ! Per read takes about a second or more.
> Will appreciate any pointers on speeding these ?
> 
> Thanks,
> -Avani
 		 	   		  

RE: speed up reads in hBase

Posted by Michael Segel <mi...@hotmail.com>.

How wide are your rows?
Are you accessing the last version or pulling back all of the versions per row?
> From: agsharma@ebay.com
> To: user@hbase.apache.org
> Date: Tue, 29 Jun 2010 12:11:46 -0700
> Subject: speed up reads in hBase
> 
> 
> I have about 2.8M rows in my HBase table with multiple versions ( max 6) .
> 
> When I try to lookup 1000 records, it takes a total time of 20 minutes ! Per read takes about a second or more.
> Will appreciate any pointers on speeding these ?
> 
> Thanks,
> -Avani
 		 	   		  