Posted to user@hbase.apache.org by Gautham Acharya <ga...@alleninstitute.org> on 2019/09/13 01:14:21 UTC

Retrieving large rows from Hbase

Hi,

I'm new to this mailing list and to HBase in general, so I apologize if I'm asking a basic question.

I'm running an Apache HBase cluster on AWS EMR. I have a table with a single column family, 75,000 columns, and 50,000 rows. I'm trying to get all the column values for a single row. When the row is not sparse and has all 75,000 values, the read is extremely slow: it takes almost 3.5 seconds to fetch the data from the DB. I'm querying the table from a Lambda function running HappyBase.


What can I do to make this faster? This seems incredibly slow: the return payload is 75,000 value pairs and is only ~2MB. It should take much less than 3 seconds. I'm looking for millisecond return times.

I have a BLOCKCACHE size of 8194kb, a BLOOMFILTER of type ROW, and SNAPPY compression enabled on this table.
--gautham
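
For reference, 3.5 seconds across 75,000 cells works out to roughly 47 microseconds per cell. A minimal sketch of how the read described above could be timed on the client is shown below. The helper name `time_row_fetch` is hypothetical, and it only assumes a table object with a HappyBase-style `row(row_key)` method that returns a dict of column name to value.

```python
import time


def time_row_fetch(table, row_key):
    """Time one full-row read and report the cost per returned cell.

    `table` is assumed to expose a HappyBase-style `row(row_key)` method
    that returns a dict mapping column names to values.
    """
    start = time.perf_counter()
    cells = table.row(row_key)
    elapsed = time.perf_counter() - start

    n = len(cells)
    # Guard against an empty row so we never divide by zero.
    per_cell_us = (elapsed / n) * 1e6 if n else 0.0
    return {"seconds": elapsed, "cells": n, "us_per_cell": per_cell_us}
```

With HappyBase, `table` would typically come from `happybase.Connection(host).table(name)`, where the host and table name are placeholders for your own values. Note that this measures the whole client call, so it includes both the server round trip and the Python-side decoding.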

Re: Retrieving large rows from Hbase

Posted by Stack <st...@duboce.net>.

On Sat, Sep 14, 2019 at 9:32 AM Gautham Acharya <ga...@alleninstitute.org>
wrote:

> The 3.5 seconds is the time taken to fetch data from Hbase
>
>
Can you tell where the time is being spent (for example, by thread-dumping the
RegionServer during the query)? Is it assembling the 75k items? IIRC, it's
single-threaded at this point but still... Otherwise, suggest filing an issue.
I don't know of much by way of perf profiling of assembling wide rows. Would be
worth digging in.

Thanks,
S





RE: Retrieving large rows from Hbase

Posted by Gautham Acharya <ga...@alleninstitute.org>.

The 3.5 seconds is the time taken to fetch the data from HBase.


Re: Retrieving large rows from Hbase

Posted by Stack <st...@duboce.net>.

On Thu, Sep 12, 2019 at 6:14 PM Gautham Acharya <ga...@alleninstitute.org>
wrote:

> Hi,
>
> I'm new to this distribution list and to Hbase in general, so I apologize
> if I'm asking a basic question.
>
> I'm running an Apache Hbase Cluster on AWS EMR. I have a table that is a
> single column family, 75,000 columns and 50,000 rows. I'm trying to get all
> the column values for a single row, and when the row is not sparse, and has
> 75,000 values, the return time is extremely slow - it takes almost 3.5
> seconds for me to fetch the data from the DB. I'm querying the table from a
> Lambda function running Happybase.
>

Can you figure out where the time is being spent: in HBase or in the
HappyBase processing? HappyBase means an extra hop, recasting 75k items in
Python.

Thanks,
S


>
> What can I do to make this faster? This seems incredibly slow - the return
> payload is 75,000 value pairs, and is only ~2MB. It should be much faster
> than 3 seconds. I'm looking for millisecond return time.
>
> I have a BLOCKCACHE size of 8194kb, a BLOOMFILTER of type ROW, and SNAPPY
> compression enabled on this table.
>




> --gautham
>
>
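
To separate server time from client-side processing, as asked in the reply above, one option is to run the fetch under Python's built-in `cProfile`. The sketch below is an assumption about how that could be done, not something from the thread: `profile_call` is a hypothetical helper, and the function names that show up in the report depend on the Thrift library underneath the client.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return (result, report text).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```

For example, `row, report = profile_call(table.row, row_key)` followed by `print(report)`. Broadly, time accumulated in socket reads points at the server or the network, while time in deserialization and dict building points at the Python client.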