Posted to user@hbase.apache.org by Nishanth S <ni...@gmail.com> on 2014/09/25 18:57:50 UTC
Wide Rows vs Multiple column families
Hi everyone,
This question may have been asked many times, but I would really appreciate
it if someone could help me with how to go about this.
Currently my HBase table has about 10 columns per row, with an average total
size of 5 KB per row. Most of that size is held by one particular column
(more than 4 KB). Would it help reads to move this column out to a different
column family? There are cases where we only need to access the smaller
columns, and another set of use cases where we need both the data in the
smaller columns and this huge data chunk. In general, I am trying to answer
the questions below for this scenario.
1. Would separating the data into multiple column families affect HBase write
performance?
2. How would it affect my read performance, considering both read cases?
3. Is there any advantage that I gain by separating into multiple column
families?
I would really appreciate it if anyone could point me in the right direction.
-Thanks
Nishan
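To make question 2 concrete, here is a rough back-of-the-envelope sketch. It is a toy Python model, not HBase code. It relies on the fact that HBase keeps each column family in its own store (separate files), and it assumes that a read pulls in every byte the requested families hold for that row. It ignores block granularity (a read loads whole blocks, which are much larger than one row), the block cache, compression, and per-cell key overhead. The column names and per-column sizes are hypothetical, chosen only to match the figures above (about 10 columns, about 5 KB per row, one column over 4 KB).

```python
# Toy model (not HBase code): estimate how many bytes of a row a read has to
# pull in, assuming a read touches everything the requested families hold for
# that row. Column names and sizes are hypothetical, chosen to match the
# thread's figures (~10 columns, ~5 KB per row, one column over 4 KB).

SMALL_COLUMNS = {f"c{i}": 90 for i in range(9)}  # 9 small columns, 90 bytes each (assumed)
BLOB_COLUMN = {"blob": 4200}                      # the large column (assumed size)


def bytes_touched(layout: dict[str, dict[str, int]], families_read: list[str]) -> int:
    """Sum the sizes of all cells stored in the families a read must open."""
    return sum(sum(layout[family].values()) for family in families_read)


# Layout A: everything in one column family.
single_cf = {"d": {**SMALL_COLUMNS, **BLOB_COLUMN}}
# Layout B: small columns in "meta", the large column in its own "blob" family.
two_cf = {"meta": dict(SMALL_COLUMNS), "blob": dict(BLOB_COLUMN)}

if __name__ == "__main__":
    # Read case 1: only the small columns are needed.
    print("small only, one CF :", bytes_touched(single_cf, ["d"]))
    print("small only, two CFs:", bytes_touched(two_cf, ["meta"]))
    # Read case 2: the whole row is needed.
    print("full row,   one CF :", bytes_touched(single_cf, ["d"]))
    print("full row,   two CFs:", bytes_touched(two_cf, ["meta", "blob"]))
```

Under these assumptions the small-columns-only read touches 810 bytes per row instead of 5010, while the full-row read touches the same 5010 bytes either way but has to consult two stores instead of one. Real numbers depend on block size, caching, and compression, so measuring with the actual workload is the only reliable check.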
Re: Wide Rows vs Multiple column families
Posted by Ted Yu <yu...@gmail.com>.
bq. had to spawn multiple put requests in this case because there is no API
for sending insert requests to multiple column family.
Could this be related to the slowdown you observed?
Are you able to use the HTable API and see whether the same slowdown is reproduced?
BTW, try 0.98.6.1 if you can.
Cheers
On Mon, Sep 29, 2014 at 10:00 AM, Nishanth S <ni...@gmail.com>
wrote:
> HBase release: 0.96.1
> Number of column families at which the issue is observed: 2. Earlier I had a
> single column family where all the data was persisted. In the new case I
> was storing all metadata in column family 1 (less than 1 KB) and a blob
> in the second column family (around 7 KB).
> We have a 9-node cluster with 7 HBase region servers, using Hadoop 2.3.0.
>
> I am also using the asynchbase 1.5 client for ingesting data into HBase. I
> had to spawn multiple put requests in this case because there is no API for
> sending a single insert request to multiple column families.
>
> Thanks,
> Nishan
Re: Wide Rows vs Multiple column families
Posted by Nishanth S <ni...@gmail.com>.
HBase release: 0.96.1
Number of column families at which the issue is observed: 2. Earlier I had a
single column family where all the data was persisted. In the new case I
was storing all metadata in column family 1 (less than 1 KB) and a blob
in the second column family (around 7 KB).
We have a 9-node cluster with 7 HBase region servers, using Hadoop 2.3.0.
I am also using the asynchbase 1.5 client for ingesting data into HBase. I
had to spawn multiple put requests in this case because there is no API for
sending a single insert request to multiple column families.
Thanks,
Nishan
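A toy model of the write-path difference described here, as one possible explanation rather than a diagnosis. It is Python, not asynchbase or HBase code, and it assumes that each put request costs one client-server round trip, that the requests for a row end up serialized, and that each client thread waits for a row to finish before starting the next (a closed loop, as a benchmark thread typically does). For comparison, HBase's own HTable client lets a single Put carry cells from several column families, so one Put per row can cover both families. The family names, thread count, and latency below are hypothetical.

```python
# Toy model (not asynchbase or HBase code): compare round trips and a
# closed-loop throughput estimate for two ways of writing one logical row.
# Assumptions: one round trip per put request, requests for a row are issued
# one after another, and each client thread waits for a row to finish before
# starting the next. All numbers are hypothetical.


def round_trips_per_row(families: list[str], one_put_per_family: bool) -> int:
    """Requests needed to write one row whose cells span `families`."""
    return len(families) if one_put_per_family else 1


def rows_per_second(threads: int, rtt_ms: float, trips_per_row: int) -> float:
    """Closed-loop estimate: each thread completes one row every trips * rtt ms."""
    return threads * 1000.0 / (rtt_ms * trips_per_row)


if __name__ == "__main__":
    families = ["meta", "blob"]  # hypothetical family names
    for split in (False, True):
        trips = round_trips_per_row(families, one_put_per_family=split)
        rate = rows_per_second(threads=10, rtt_ms=2.0, trips_per_row=trips)
        label = "one put per family" if split else "one multi-family put"
        print(f"{label:22s}: {trips} round trip(s), ~{rate:.0f} rows/s")
```

Under these assumptions, two per-family puts halve the estimated rows per second (5000 vs 2500 with the numbers above), which would be consistent with the halved insert throughput reported in this thread. Whether it is the real cause depends on how the client batches and pipelines requests; retrying with the HTable API, as suggested in this thread, is one way to check.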
On Mon, Sep 29, 2014 at 10:49 AM, Ted Yu <yu...@gmail.com> wrote:
> Can you give a bit more detail, such as:
>
> the release of HBase you're using
> number of column families where slowdown is observed
> size of cluster
> release of hadoop you're using
>
> Thanks
Re: Wide Rows vs Multiple column families
Posted by Ted Yu <yu...@gmail.com>.
Can you give a bit more detail, such as:
the release of HBase you're using
number of column families where slowdown is observed
size of cluster
release of hadoop you're using
Thanks
On Mon, Sep 29, 2014 at 9:43 AM, Nishanth S <ni...@gmail.com> wrote:
> Hey Ted,
>
> I have been comparing insert throughput using YCSB, as we
> discussed. What I found is that when I split the data into
> multiple column families, insert throughput drops to half of what
> it is when persisting into a single column family. Do you think this is
> possible, or am I doing something wrong?
>
> -Nishan
Re: Wide Rows vs Multiple column families
Posted by Nishanth S <ni...@gmail.com>.
Hey Ted,
I have been comparing insert throughput using YCSB, as we
discussed. What I found is that when I split the data into
multiple column families, insert throughput drops to half of what
it is when persisting into a single column family. Do you think this is
possible, or am I doing something wrong?
-Nishan
On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu <yu...@gmail.com> wrote:
> There should be no impact on HBase write performance from using two column
> families.
>
> Cheers
Re: Wide Rows vs Multiple column families
Posted by Nishanth S <ni...@gmail.com>.
Thank you, Ted.
-Nishan
On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu <yu...@gmail.com> wrote:
> There should be no impact on HBase write performance from using two column
> families.
>
> Cheers
Re: Wide Rows vs Multiple column families
Posted by Ted Yu <yu...@gmail.com>.
There should be no impact on HBase write performance from using two column
families.
Cheers
On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S <ni...@gmail.com>
wrote:
> Thank you, Ted. No, I do not plan to use bulk loading, since the data is
> incremental in nature.
Re: Wide Rows vs Multiple column families
Posted by Nishanth S <ni...@gmail.com>.
Thank you, Ted. No, I do not plan to use bulk loading, since the data is
incremental in nature.
On Thu, Sep 25, 2014 at 11:36 AM, Ted Yu <yu...@gmail.com> wrote:
> For #1, do you plan to use bulk load?
>
> For #3, take a look at HBASE-5416, which introduced essential column families.
> In your query, you can designate the smaller column family as the essential
> column family where the smaller columns are queried.
>
> Cheers
Re: Wide Rows vs Multiple column families
Posted by Ted Yu <yu...@gmail.com>.
For #1, do you plan to use bulk load?
For #3, take a look at HBASE-5416, which introduced essential column families.
In your query, you can designate the smaller column family as the essential
column family where the smaller columns are queried.
Cheers
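A minimal sketch of what on-demand loading of non-essential families buys, written as a toy Python model rather than HBase code. It assumes a scan whose filter only inspects the small "meta" family (the essential family here) and counts how often the large "blob" family has to be read, eagerly versus only for rows the filter accepts. In the HBase Java client this behaviour is understood to be enabled per scan with Scan.setLoadColumnFamiliesOnDemand(true) together with a filter such as SingleColumnValueFilter; check the javadoc for the release in use before relying on it. The family names, row contents, and the is_error filter are hypothetical.

```python
# Toy model (not HBase code) of on-demand column family loading, the idea
# behind "essential" column families (HBASE-5416): the filter is evaluated
# against the essential family first, and the remaining families are read only
# for rows the filter accepts. Family names and row contents are hypothetical.
from typing import Callable

Row = dict[str, dict[str, bytes]]  # family -> qualifier -> value


def scan(rows: dict[str, Row], essential: str,
         accept: Callable[[dict[str, bytes]], bool],
         on_demand: bool) -> tuple[list[str], int]:
    """Return (matching row keys, number of non-essential family reads)."""
    matches: list[str] = []
    extra_reads = 0
    for key, families in sorted(rows.items()):
        others = len(families) - 1  # families other than the essential one
        if not on_demand:
            # Eager: every family is read before the filter runs.
            extra_reads += others
        if accept(families[essential]):
            matches.append(key)
            if on_demand:
                # Lazy: non-essential families are read only for accepted rows.
                extra_reads += others
    return matches, extra_reads


def is_error(meta: dict[str, bytes]) -> bool:
    """Hypothetical filter that only inspects the small 'meta' family."""
    return meta.get("type") == b"error"


if __name__ == "__main__":
    rows = {
        f"row{i}": {"meta": {"type": b"error" if i == 2 else b"ok"},
                    "blob": {"payload": b"x" * 4096}}
        for i in range(4)
    }
    print(scan(rows, "meta", is_error, on_demand=False))  # (['row2'], 4)
    print(scan(rows, "meta", is_error, on_demand=True))   # (['row2'], 1)
```

With four rows and one match, the eager scan reads the blob family four times and the on-demand scan reads it once. The saving grows with the fraction of rows the filter rejects and disappears when every row matches.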
On Thu, Sep 25, 2014 at 9:57 AM, Nishanth S <ni...@gmail.com> wrote:
> Hi everyone,
>
> This question may have been asked many times, but I would really appreciate
> it if someone could help me with how to go about this.
>
> Currently my HBase table has about 10 columns per row, with an average total
> size of 5 KB per row. Most of that size is held by one particular column
> (more than 4 KB). Would it help reads to move this column out to a different
> column family? There are cases where we only need to access the smaller
> columns, and another set of use cases where we need both the data in the
> smaller columns and this huge data chunk. In general, I am trying to answer
> the questions below for this scenario.
>
> 1. Would separating the data into multiple column families affect HBase write
> performance?
>
> 2. How would it affect my read performance, considering both read cases?
>
> 3. Is there any advantage that I gain by separating into multiple column
> families?
>
> I would really appreciate it if anyone could point me in the right
> direction.
>
> -Thanks
> Nishan