Posted to dev@hbase.apache.org by lars hofhansl <la...@apache.org> on 2013/01/25 23:00:51 UTC

Re: Hbase scans taking a lot of time

Enable scan batching in Hive.
You're probably performing 300m RPC requests, i.e. you're mostly measuring network latency.

-- Lars



________________________________
 From: Vibhav Mundra <mu...@gmail.com>
To: user@hbase.apache.org; dev@hbase.apache.org 
Sent: Friday, January 25, 2013 1:10 AM
Subject: Hbase scans taking a lot of time
 
I am facing a very strange problem with HBase.

This is what I did:
a) Created a table using pre-partitioned splits.
b) The column families are compressed with LZO.
c) With the above configuration I am able to populate 2 million rows per
min in HBase.
d) I have created a table with 300 million odd rows, which took me roughly 3
hours to populate; the data size is 25GB.

e) But when I query the data, the performance I am getting is very bad.
   Basically this is what I am seeing: high CPU, no disk I/O, and network
I/O happening at a rate of 6~7 MB/sec.


Because of this, scanning the entries of the table using Hive is taking
ages.
Basically it is taking around 24 hours to scan the table. Any idea how
to debug this?


-Vibhav
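
For scale, the figures in the post above can be turned into row rates. This is only a rough sketch in Python using the numbers quoted in the post (2 million rows per minute on load, 300 million rows, about 24 hours to scan); the function and variable names are made up for illustration, and the rates are averages that say nothing about where the time actually goes.

```python
# Rough throughput comparison using only the figures quoted in the post.
# All names here are illustrative, not part of HBase or Hive.

SECONDS_PER_HOUR = 60 * 60

TOTAL_ROWS = 300_000_000          # "300 million odd rows"
LOAD_ROWS_PER_MINUTE = 2_000_000  # "2 million rows per min"
SCAN_HOURS = 24                   # "around 24 hours to scan the table"


def rows_per_second(rows, seconds):
    """Average row rate over the given wall-clock time."""
    return rows / seconds


load_rate = rows_per_second(LOAD_ROWS_PER_MINUTE, 60)
scan_rate = rows_per_second(TOTAL_ROWS, SCAN_HOURS * SECONDS_PER_HOUR)
slowdown = load_rate / scan_rate

print(f"load: {load_rate:,.0f} rows/s")
print(f"scan: {scan_rate:,.0f} rows/s")
print(f"scan is about {slowdown:.1f}x slower than load")
```

Reading the table back being several times slower than writing it is a hint that the cost is per-request overhead rather than disk or raw data volume.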

Re: Hbase scans taking a lot of time

Posted by Alok Kumar <al...@gmail.com>.
Vibhav,

Hive submits a MapReduce job to the Hadoop cluster. How many nodes does your
cluster have?

>> Because of this, if I scan the entries of the table using Hive it is taking
>> ages.

Do you have an 'order by' or 'group by' clause in your query? Queries take
longer to execute with these clauses.
Try HBase filters if they fit your needs. They would be comparatively
faster, with limitations (no 'order by', no 'group by', no joins).

Regards,
Alok


On Sat, Jan 26, 2013 at 5:26 AM, lars hofhansl <la...@apache.org> wrote:

> Sorry I meant scan caching. (not batching)



-- 
Alok Kumar


Re: Hbase scans taking a lot of time

Posted by lars hofhansl <la...@apache.org>.
Sorry, I meant scan caching (not batching).
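
To put rough numbers on the RPC argument: scanner caching is the number of rows the client fetches per round trip, so with a caching value of 1 a 300-million-row scan needs about 300 million RPCs. The Python sketch below uses the thread's own figures and assumes (a) one RPC per `caching` rows and (b) a fixed cost per RPC derived from the reported 24-hour scan. Assumption (b) is optimistic, since larger batches carry more data per call, so treat the output as a lower bound and not a prediction. The helper names and the caching values 100 and 1000 are hypothetical. In the HBase client the setting is, to my knowledge, `Scan.setCaching` or the `hbase.client.scanner.caching` property; check the documentation for your version and for how Hive passes it through.

```python
# Back-of-the-envelope estimate of how scanner caching changes RPC count.
# Figures come from the thread; helper names and caching values are
# illustrative assumptions, not HBase or Hive APIs.

TOTAL_ROWS = 300_000_000
OBSERVED_SCAN_SECONDS = 24 * 60 * 60  # "around 24 hours"


def rpc_count(total_rows, caching):
    """Round trips needed if each RPC returns up to `caching` rows."""
    return (total_rows + caching - 1) // caching  # integer ceiling


def estimated_hours(total_rows, caching, seconds_per_rpc):
    """Scan time if every RPC costs the same, regardless of batch size."""
    return rpc_count(total_rows, caching) * seconds_per_rpc / 3600


# Assumption: the observed scan used one RPC per row, so the whole
# 24 hours is attributed to per-RPC overhead.
seconds_per_rpc = OBSERVED_SCAN_SECONDS / TOTAL_ROWS

for caching in (1, 100, 1000):
    rpcs = rpc_count(TOTAL_ROWS, caching)
    hours = estimated_hours(TOTAL_ROWS, caching, seconds_per_rpc)
    print(f"caching={caching:>4}: {rpcs:>11,} RPCs, >= {hours:.2f} h")
```

Under these assumptions each row costs roughly 288 microseconds, which is in the range of a network round trip and consistent with the scan being latency-bound.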



