Posted to user@hbase.apache.org by Gautham Acharya <ga...@alleninstitute.org> on 2019/09/13 01:14:21 UTC
Retrieving large rows from Hbase
Hi,
I'm new to this distribution list and to HBase in general, so I apologize if I'm asking a basic question.
I'm running an Apache HBase cluster on AWS EMR. I have a table with a single column family, 75,000 columns, and 50,000 rows. I'm trying to get all the column values for a single row. When the row is not sparse and has all 75,000 values, the return time is extremely slow: it takes almost 3.5 seconds to fetch the data from the DB. I'm querying the table from a Lambda function running Happybase.
What can I do to make this faster? This seems incredibly slow: the return payload is 75,000 value pairs and is only ~2MB. It should be much faster than 3 seconds. I'm looking for millisecond return times.
I have a BLOCKCACHE size of 8194kb, a BLOOMFILTER of type ROW, and SNAPPY compression enabled on this table.
--gautham
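A rough back-of-the-envelope check of the figures in this post, as a sketch only. The inputs are the numbers quoted above; reading "~2MB" as 2 * 10**6 bytes is an assumption.

```python
# Back-of-the-envelope numbers from the post (inputs are the poster's figures).
cells = 75_000              # values in one dense row
payload_bytes = 2_000_000   # "~2MB", assumed here to mean 2 * 10**6 bytes
elapsed_s = 3.5             # observed time to fetch the row

bytes_per_cell = payload_bytes / cells
micros_per_cell = elapsed_s / cells * 1e6
throughput_mb_s = payload_bytes / 1e6 / elapsed_s

print(f"{bytes_per_cell:.1f} bytes per cell")
print(f"{micros_per_cell:.1f} microseconds per cell")
print(f"{throughput_mb_s:.2f} MB/s effective throughput")
```

On these assumptions that is about 27 bytes and about 47 microseconds per cell, at well under 1 MB/s. That suggests the time goes to per-cell work somewhere along the path rather than to raw transfer size, though it does not say which hop is responsible.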
Re: Retrieving large rows from Hbase
Posted by Stack <st...@duboce.net>.
On Sat, Sep 14, 2019 at 9:32 AM Gautham Acharya <ga...@alleninstitute.org>
wrote:
> The 3.5 seconds is the time taken to fetch data from Hbase
>
>
Can you tell where the time is being spent (by thread-dumping the RegionServer
during the query)? Is it in assembling the 75k items? IIRC, it's single-threaded
at this point, but still... Otherwise, I suggest filing an issue. I don't know of
much perf profiling of assembling wide rows. It would be worth digging in.
Thanks,
S
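One way to act on the thread-dump suggestion is to take several dumps while the slow query is running and count which frames sit on top of the stacks most often. The following is a minimal sketch, not an HBase tool: it assumes the JDK's `jstack` is on the PATH of the RegionServer host and that the RegionServer pid is already known. The function names are made up for illustration.

```python
import subprocess
import time
from collections import Counter
from typing import Callable, List


def run_jstack(pid: int) -> str:
    """Run the JDK's jstack against a JVM process and return the dump text."""
    return subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    ).stdout


def top_frames(dump: str) -> List[str]:
    """Return the topmost 'at ...' frame of each thread in one dump."""
    frames = []
    expecting_top = False
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith('"'):
            # Thread header lines in jstack output start with the quoted thread name.
            expecting_top = True
        elif expecting_top and stripped.startswith("at "):
            frames.append(stripped[3:])
            expecting_top = False
    return frames


def sample_hot_frames(
    pid: int,
    samples: int = 10,
    interval_s: float = 0.2,
    dump: Callable[[int], str] = run_jstack,
) -> Counter:
    """Take several dumps and count how often each top frame appears."""
    counts: Counter = Counter()
    for i in range(samples):
        counts.update(top_frames(dump(pid)))
        if i < samples - 1:
            time.sleep(interval_s)
    return counts
```

Calling `sample_hot_frames(pid).most_common(10)` during the query gives a crude picture of where threads are sitting. Idle threads parked in wait calls will show up too, so those entries need to be ignored when reading the result.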
RE: Retrieving large rows from Hbase
Posted by Gautham Acharya <ga...@alleninstitute.org>.
The 3.5 seconds is the time taken to fetch the data from HBase.
Re: Retrieving large rows from Hbase
Posted by Stack <st...@duboce.net>.
On Thu, Sep 12, 2019 at 6:14 PM Gautham Acharya <ga...@alleninstitute.org>
wrote:
> Hi,
>
> I'm new to this distribution list and to Hbase in general, so I apologize
> if I'm asking a basic question.
>
> I'm running an Apache Hbase Cluster on AWS EMR. I have a table that is a
> single column family, 75,000 columns and 50,000 rows. I'm trying to get all
> the column values for a single row, and when the row is not sparse, and has
> 75,000 values, the return time is extremely slow - it takes almost 3.5
> seconds for me to fetch the data from the DB. I'm querying the table from a
> Lambda function running Happybase.
>
Can you figure out where the time is being spent -- in HBase or in the
Happybase processing? Happybase means an extra hop, recasting 75k items in
Python.
Thanks,
S
>
> What can I do to make this faster? This seems incredibly slow - the return
> payload is 75,000 value pairs, and is only ~2MB. It should be much faster
> than 3 seconds. I'm looking for millisecond return time.
>
> I have a BLOCKCACHE size of 8194kb, a BLOOMFILTER of type ROW, and SNAPPY
> compression enabled on this table.
>
> --gautham
>
>
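The question of whether the time goes to the fetch or to Python-side processing can be approached by timing the two steps separately. Below is a minimal sketch with a stand-in fetch function so it runs without a cluster. The helper names are hypothetical. With Happybase, `fetch` would wrap something like `happybase.Connection(host).table(name).row(row_key)`, where the host, table name, and row key are placeholders.

```python
import time
from typing import Callable, Dict, Tuple


def time_fetch_and_decode(
    fetch: Callable[[], Dict[bytes, bytes]],
    decode: Callable[[Dict[bytes, bytes]], object],
) -> Tuple[float, float, object]:
    """Return (fetch_seconds, decode_seconds, decoded_result)."""
    t0 = time.perf_counter()
    raw = fetch()
    t1 = time.perf_counter()
    result = decode(raw)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, result


def fake_fetch() -> Dict[bytes, bytes]:
    # Stand-in for the real call: builds a dense row of 75,000 cells in memory.
    # With Happybase this would be roughly (names are placeholders):
    #   table = happybase.Connection("thrift-host").table("my_table")
    #   return table.row(b"row-key")
    return {b"cf:col%05d" % i: b"%d" % i for i in range(75_000)}


def decode_cells(raw: Dict[bytes, bytes]) -> Dict[str, int]:
    # Example application-side work: turn bytes keys and values into Python types.
    return {key.decode(): int(value) for key, value in raw.items()}


if __name__ == "__main__":
    fetch_s, decode_s, cells = time_fetch_and_decode(fake_fetch, decode_cells)
    print(f"fetch:  {fetch_s * 1000:.1f} ms")
    print(f"decode: {decode_s * 1000:.1f} ms for {len(cells)} cells")
```

Note that with a real Happybase call the fetch figure still includes the Thrift round trip and Happybase's own construction of the result dict, so a large fetch time does not by itself mean the RegionServer is slow. Server-side timing would be needed to separate those.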