
Posted to user@hbase.apache.org by AnandaVelMurugan Chandra Mohan <an...@gmail.com> on 2012/07/02 07:23:57 UTC

Re: ways to improve performance of Scan with SingleColumnValueFilter..Please help!!!

Thanks for the suggestions. I will fix my cluster set up.

On Fri, Jun 29, 2012 at 8:01 PM, Alex Baranau <al...@gmail.com> wrote:

> 1. Theoretically, scanning on each region server with coprocessors (cps)
> might help you, I think. But this is not a good way to go anyway...
>
> 2. The table is created on the cluster, not on individual RSs, though its
> regions are assigned to specific RSs. While creating the table you will be
> talking to the master node (even if you open the shell on a slave node), and
> it will decide where to place regions (depending on the number of regions
> on each slave). You can probably try to manually move regions to the
> desired RSs, but that is also not a good way to go.
>
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
>
> On Fri, Jun 29, 2012 at 8:50 AM, AnandaVelMurugan Chandra Mohan <
> ananthu2050@gmail.com> wrote:
>
> > Hi,
> >
> > Thanks for the suggestions. I also observed that a scan in the HBase shell
> > takes almost the same time.
> >
> > I will try to fix my HBase cluster setup.
> >
> > Meanwhile, I have two questions
> >
> >
> >   - Will endpoint coprocessors help my cause? (In case cluster
> >   modification is beyond my control, I would lean on this approach.)
> >   - I am logging into my HBase node (India) and creating the table. Does
> >   that mean my table is getting created in the region server on my node
> >   in India? Once the web application deployment is complete, I will move
> >   this web application to the US server farm. If there is a way to
> >   instruct HBase to create the table in a US region server, I hope it
> >   will solve the issue.
> >
> > Please advise. Thanks!!!
> >
> >
> > On Fri, Jun 29, 2012 at 5:52 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >
> > > I'd agree that HBase is not designed to be run in such an
> > > "inter-continental" single-cluster setup. Low latency in communication
> > > between nodes (slaves) is vital for the health of the cluster.
> > >
> > > So, the short answer: just don't do it that way.
> > >
> > > What is the reason to have nodes in these locations?
> > >
> > > Alex Baranau
> > > ------
> > > Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
> > >
> > > On Fri, Jun 29, 2012 at 7:06 AM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > > > Hi Anand,
> > > >
> > > > Having used HBase/Hadoop for tests for some weeks now, I have found
> > > > that it is very network-intensive. Using it from a computer on a
> > > > wireless connection was VERY slow. I moved to a 1000BASE-T network
> > > > and it's now WAY better. I'm not sure that spreading the nodes across
> > > > the internet that way will be efficient.
> > > >
> > > > Have you tried putting/retrieving some files with the Hadoop
> > > > command-line tool to check the performance? Can you analyse your
> > > > bandwidth usage at the same time?
> > > >
> > > > --
> > > > JM
> > > >
> > > > 2012/6/29, AnandaVelMurugan Chandra Mohan <an...@gmail.com>:
> > > > > Hi,
> > > > >
> > > > > I am using the HBase client API to access HBase. My HBase version
> > > > > is 0.92.1 and I have three nodes in my Hadoop cluster: two nodes
> > > > > are in the US and one node is in India. The HBase master is on one
> > > > > of the nodes in the US.
> > > > >
> > > > > In this HBase setup, I have a table with 1200+ rows. I am
> > > > > developing a web application which uses the HBase client Java API
> > > > > to retrieve data from this table. This is a GWT web application
> > > > > deployed in JBoss (running in a server farm in India). When I
> > > > > retrieve data from the HBase table based on a column value, it
> > > > > takes 6 minutes. In code, I am doing a scan on the table with a
> > > > > "SingleColumnValueFilter". Given the number of rows, this
> > > > > performance is very bad (6 minutes for 1200 records). Is there any
> > > > > way to improve the performance?
> > > > >
> > > > > Any help would be greatly appreciated.
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Anand
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Anand
> >
>



-- 
Regards,
Anand
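
Alex's second point, that the master rather than the machine running the shell decides where a new table's regions go, based on how many regions each slave already holds, can be pictured with a toy Java model. It is a deliberate simplification of his description (each new region goes to the server holding the fewest regions), not HBase's actual assignment code; the class name, server names and region counts are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of master-side region placement, simplified from the
 * description in this thread: each new region goes to the region server
 * currently holding the fewest regions. NOT HBase's real assignment logic.
 */
public class ToyRegionPlacement {

    /** Server with the fewest regions; ties go to the first server in iteration order. */
    static String leastLoaded(Map<String, Integer> regionCounts) {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : regionCounts.entrySet()) {
            if (e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Places newRegions regions one at a time; returns updated counts without mutating the input. */
    static Map<String, Integer> place(Map<String, Integer> regionCounts, int newRegions) {
        Map<String, Integer> counts = new LinkedHashMap<>(regionCounts);
        for (int i = 0; i < newRegions; i++) {
            counts.merge(leastLoaded(counts), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical servers and starting region counts.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("rs-us-1", 4);
        counts.put("rs-us-2", 5);
        counts.put("rs-india-1", 2);
        // Which machine ran the shell that issued "create" plays no part here.
        System.out.println(place(counts, 1));
    }
}
```

With these made-up counts the new region lands on rs-india-1 only because it holds the fewest regions; where the create command was typed never enters the decision.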

Re: ways to improve performance of Scan with SingleColumnValueFilter..Please help!!!

Posted by manoj p <eo...@gmail.com>.

Hi Anand,

     Try increasing the scanner caching, i.e. the number of rows fetched in a
single RPC call (for example with Scan.setCaching()). This might improve
the speed of retrieval.

Cheers,
Manoj.P
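
The numbers in the thread allow a rough sanity check of this suggestion: 6 minutes for 1200 rows is about 300 ms per row, roughly what one would expect if every row cost one intercontinental round trip. The sketch below assumes exactly that (each scanner RPC costs a fixed 300 ms and returns up to "caching" rows; server time and transfer time are ignored) and shows how the round-trip count falls as caching grows. In the 0.92 client the real setting is Scan.setCaching(int), or the hbase.client.scanner.caching property; the ScanRoundTrips class itself is hypothetical.

```java
import java.util.List;

/**
 * Back-of-envelope model of scanner round trips. Assumptions (not measured):
 * every scanner RPC costs one fixed round trip and returns up to `caching`
 * rows; server-side filtering and transfer time are ignored.
 */
public class ScanRoundTrips {

    /** RPCs needed to return `rows` rows when each RPC returns up to `caching` rows. */
    static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    /** Estimated wall-clock seconds, assuming each round trip costs rttMillis. */
    static double estimatedSeconds(long rows, int caching, double rttMillis) {
        return roundTrips(rows, caching) * rttMillis / 1000.0;
    }

    public static void main(String[] args) {
        long rows = 1200;         // table size reported in the thread
        double rttMillis = 300.0; // 6 min / 1200 rows, assumed to be pure latency
        for (int caching : List.of(1, 10, 100, 500)) {
            System.out.printf("caching=%d -> %d round trips, ~%.1f s%n",
                    caching, roundTrips(rows, caching),
                    estimatedSeconds(rows, caching, rttMillis));
        }
    }
}
```

With caching at 1 the model reproduces the reported 6 minutes (1200 round trips at 300 ms each is 360 s); at 100 it predicts about 3.6 s. Real timings will differ: the filter is still evaluated against every row on the server, and larger batches take longer to transfer over a slow link.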
