Posted to user@hbase.apache.org by Shushant Arora <sh...@gmail.com> on 2015/08/17 15:45:27 UTC

hbase doubts

1. Is there any maximum limit on the key size of an HBase table?
2. Which is preferred: multiple small tables or one large table?
3. For bulk load: when LoadIncrementalHFiles runs, it recalculates the
splits based on the current region boundaries. Does this division happen on
the client side, or on the server side at the region server or the HBase
master? Does it then assign the pieces that cross a target region boundary
to the desired region server?
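
For question 3, a reply quoted further down this thread says the adjustment
to current region boundaries is done on the client side. A minimal Python
sketch of that idea follows. It is not the LoadIncrementalHFiles code: the
function names, the region start keys, and the sample row keys are all
hypothetical, and it only decides whether a file fits one region or has to
be cut at a region boundary.

```python
from bisect import bisect_right

def region_index(start_keys, row):
    """Index of the region whose [start, next_start) range holds row.
    start_keys is sorted and begins with b"" (the first region's start)."""
    return bisect_right(start_keys, row) - 1

def group_or_split(start_keys, first_key, last_key):
    """Return ("load", region) if the file fits one region, otherwise
    ("split", boundary) with the region start key the file straddles."""
    first = region_index(start_keys, first_key)
    last = region_index(start_keys, last_key)
    if first == last:
        return ("load", first)
    # The file crosses into the next region: cut it at that region's start key.
    return ("split", start_keys[first + 1])

# Hypothetical table with three regions.
start_keys = [b"", b"2015-08-10", b"2015-08-20"]
fits = group_or_split(start_keys, b"2015-08-01#a", b"2015-08-05#z")
crosses = group_or_split(start_keys, b"2015-08-08#a", b"2015-08-12#z")
print(fits)     # ('load', 0)
print(crosses)  # ('split', b'2015-08-10')
```

In this sketch a file that straddles a boundary would be cut into two
pieces, and each piece checked again until every piece fits a single region.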

Re: hbase doubts

Posted by Shushant Arora <sh...@gmail.com>.
And will using KeyPrefixRegionSplitPolicy instead of the default
IncreasingToUpperBoundRegionSplitPolicy help here?
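
A rough sketch of how the two policies differ, under stated assumptions. To
my understanding of HBase 1.x, IncreasingToUpperBoundRegionSplitPolicy
decides when to split using min(max file size, initial size * R^3), where R
is the number of regions of the table on that region server and the initial
size defaults to twice the memstore flush size. KeyPrefixRegionSplitPolicy
builds on it and only changes where the split lands, by truncating the split
key to a fixed prefix length. The 128 MB and 10 GB defaults and both
function names below are assumptions for illustration, so check them against
your version.

```python
MB = 1024 * 1024

def split_size(region_count, flush_size=128 * MB, max_file_size=10 * 1024 * MB):
    """Store size at which a region becomes a split candidate, given how many
    regions of the same table the region server currently hosts."""
    if region_count == 0 or region_count > 100:
        return max_file_size
    initial_size = 2 * flush_size
    return min(max_file_size, initial_size * region_count ** 3)

def prefix_split_point(split_point, prefix_length):
    """Truncate a candidate split key so rows sharing a prefix stay together."""
    return split_point[:prefix_length]

for n in (1, 2, 3, 4):
    print(n, split_size(n) // MB, "MB")
print(prefix_split_point(b"2015-08-04#guid-123", 10))
```

If that reading is right, a key-prefix policy keeps all rows of one date in
the same region but does not change the fact that new dates always land in
the last region.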

On Wed, Aug 19, 2015 at 10:23 AM, Shushant Arora <sh...@gmail.com>
wrote:

> When the last region gets new data and is split in two, what is the split
> point? Say the last region has 10 files and the split algorithm decides to
> split this region.
>
> Will the two child regions get 5 files each? Or will the key space of the
> parent region, say (2015-08-01#guid to 2015-08-06#guid), be divided into
> two equal parts, so that child1 gets (2015-08-01#guid to 2015-08-03#guid)
> and child2 gets (2015-08-04#guid to 2015-08-06#guid), with all data
> rewritten into the child regions to match these key ranges? In that case,
> since the data is time-series based, new data will arrive only for
> increasing dates (dates > 2015-08-06) and will go to child2, so child1 will
> always stay half filled, and only child2 will lead to new splits when it
> reaches the split size threshold.
>
> On Wed, Aug 19, 2015 at 4:16 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> Since year and month are part of the row key in this scenario (instead of
>> just the day of month), the last region would get new data and be split.
>>
>> Is this effect desirable for your app ?
>>
>> Cheers
>>
>> On Tue, Aug 18, 2015 at 12:55 PM, Shushant Arora <shushantarora09@gmail.com> wrote:
>>
>> > For an HBase key with time as the prefix, say (yyyy-mm-dd#other fields of
>> > guid base), I am using bulk load to avoid hot-spotting a region server
>> > (and to avoid writing to the WAL).
>> >
>> > What should the initial region splits be? Say I have 30 region servers.
>> >
>> > Will it work to use the first 30 days as the initial splits and let auto
>> > split take care of splitting regions as they grow further? Or, since the
>> > key has the date as its prefix and a region is split in two at the
>> > midpoint, will new data arriving only for increasing dates leave one
>> > region always half filled, with the other half never filled?
>> >
>> > On Tue, Aug 18, 2015 at 9:41 PM, anil gupta <an...@gmail.com> wrote:
>> >
>> > > In my experience, Phoenix is far superior to the Hive-HBase integration
>> > > for SQL-like querying on HBase, because Phoenix is built on top of
>> > > HBase, unlike Hive.
>> > >
>> > > On Tue, Aug 18, 2015 at 9:09 AM, Ted Yu <yu...@gmail.com> wrote:
>> > >
>> > > > To my knowledge, Phoenix provides better integration with hbase.
>> > > >
>> > > > A third possibility is Spark on HBase.
>> > > >
>> > > > If you want to explore these alternatives, I suggest asking on the
>> > > > respective mailing lists where you can get expert opinions.
>> > > >
>> > > > Cheers
>> > > >
>> > > > On Tue, Aug 18, 2015 at 9:03 AM, Shushant Arora <shushantarora09@gmail.com> wrote:
>> > > >
>> > > > > Thanks!
>> > > > >
>> > > > > Which one is better for SQL-like queries over HBase (queries
>> > > > > involving filters, key range scans, and aggregates by column values)?
>> > > > >
>> > > > > 1. Hive storage handlers
>> > > > > 2. Phoenix
>> > > > >
>> > > > > On Tue, Aug 18, 2015 at 9:14 PM, Ted Yu <yu...@gmail.com> wrote:
>> > > > >
>> > > > > > For #1, if you want to count distinct values for F1, you can write a
>> > > > > > coprocessor which aggregates the count on the region server and
>> > > > > > returns the result to the client, which does the final aggregation.
>> > > > > >
>> > > > > > Take a look at
>> > > > > > hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
>> > > > > > and related classes for an example.
>> > > > > >
>> > > > > > On Mon, Aug 17, 2015 at 10:08 PM, Shushant Arora <shushantarora09@gmail.com> wrote:
>> > > > > >
>> > > > > > > Thanks!
>> > > > > > > A few more doubts:
>> > > > > > >
>> > > > > > > 1. Say the requirement is to count the distinct values of F1.
>> > > > > > >
>> > > > > > > If the field is part of the key, can't HBase just scan the keys,
>> > > > > > > skip value deserialization, and return the result to the client,
>> > > > > > > which will calculate the distinct count? In the second approach,
>> > > > > > > HBase will deserialize the value of the column containing F1 and
>> > > > > > > return it to the client, which will calculate the distinct count.
>> > > > > > >
>> > > > > > > 2. For bulk load, when LoadIncrementalHFiles runs and the region
>> > > > > > > server moves the HFiles from HDFS to the region directory: does the
>> > > > > > > region server localize the HFile by downloading it locally and then
>> > > > > > > uploading it again into the region directory? Or does it just move
>> > > > > > > it to the region directory and wait for the next compaction to
>> > > > > > > localize it, as in the region server failure case?
>> > > > > > >
>> > > > > > > On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > For both scenarios you mentioned, the field is not the leading
>> > > > > > > > part of the row key. You would need to specify a time range or
>> > > > > > > > start row / stop row to narrow the key range being scanned.
>> > > > > > > >
>> > > > > > > > I am leaning toward using the second approach.
>> > > > > > > >
>> > > > > > > > Cheers
>> > > > > > > >
>> > > > > > > > On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora <shushantarora09@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > ~8-10 fields (5 of 20 bytes each and 3 of 200 bytes each).
>> > > > > > > > >
>> > > > > > > > > On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > How many fields such as F1 are you considering for embedding
>> > > > > > > > > > in the row key?
>> > > > > > > > > >
>> > > > > > > > > > Suggested reading:
>> > > > > > > > > > http://hbase.apache.org/book.html#rowkey.design
>> > > > > > > > > > http://hbase.apache.org/book.html#client.filter.kvm (see
>> > > > > > > > > > ColumnPrefixFilter)
>> > > > > > > > > >
>> > > > > > > > > > Cheers
>> > > > > > > > > >
>> > > > > > > > > > On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora <shushantarora09@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > 1. So the size limit applies to a cell's identifier + value?
>> > > > > > > > > > >
>> > > > > > > > > > > Which is more optimal: to have a field in the key or in a
>> > > > > > > > > > > column of the column family, if the pattern is that every
>> > > > > > > > > > > row has that field?
>> > > > > > > > > > >
>> > > > > > > > > > > Say I have a field F1 in all rows, so:
>> > > > > > > > > > > Situation 1:
>> > > > > > > > > > > key1#F1 (as a composite key), with the rest of the fields
>> > > > > > > > > > > in columns.
>> > > > > > > > > > >
>> > > > > > > > > > > Situation 2:
>> > > > > > > > > > > key1 as the key, and F1 as part of the column family.
>> > > > > > > > > > >
>> > > > > > > > > > > This is the main reason I asked about the key size limit.
>> > > > > > > > > > > If I ask for the number of rows where F1 = 'someval', will
>> > > > > > > > > > > it be faster in situation 1 than in situation 2, since in
>> > > > > > > > > > > situation 1 it can return the result just by traversing
>> > > > > > > > > > > keys, with no need to read columns?
>> > > > > > > > > > >
>> > > > > > > > > > > On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > For #1, it is the limit on a single KeyValue, not the row,
>> > > > > > > > > > > > not the key.
>> > > > > > > > > > > >
>> > > > > > > > > > > > For #2, please see the following:
>> > > > > > > > > > > >
>> > > > > > > > > > > > http://hbase.apache.org/book.html#store.memstore
>> > > > > > > > > > > > http://hbase.apache.org/book.html#regionserver_splitting_implementation
>> > > > > > > > > > > >
>> > > > > > > > > > > > Cheers
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora <shushantarora09@gmail.com> wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 1. Is hbase.client.keyvalue.maxsize the max size of the
>> > > > > > > > > > > > > row or of the key only? Is there any limit on the key
>> > > > > > > > > > > > > size alone?
>> > > > > > > > > > > > > 2. The access pattern is mostly key based. Are memstores
>> > > > > > > > > > > > > and regions on a region server per table? If I have
>> > > > > > > > > > > > > multiple tables, will there be multiple memstores,
>> > > > > > > > > > > > > instead of the few there would have been with one large
>> > > > > > > > > > > > > table?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > For #1, take a look at the following in hbase-default.xml:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >     <name>hbase.client.keyvalue.maxsize</name>
>> > > > > > > > > > > > > >     <value>10485760</value>
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > For #2, it would be easier to answer if you can outline
>> > > > > > > > > > > > > > the access patterns in your app.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > For #3, adjustment according to current region
>> > > > > > > > > > > > > > boundaries is done on the client side. Take a look at
>> > > > > > > > > > > > > > the javadoc for LoadQueueItem in
>> > > > > > > > > > > > > > LoadIncrementalHFiles.java
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Cheers
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <shushantarora09@gmail.com> wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 1. Is there any maximum limit on the key size of an
>> > > > > > > > > > > > > > > HBase table?
>> > > > > > > > > > > > > > > 2. Which is preferred: multiple small tables or one
>> > > > > > > > > > > > > > > large table?
>> > > > > > > > > > > > > > > 3. For bulk load: when LoadIncrementalHFiles runs, it
>> > > > > > > > > > > > > > > recalculates the splits based on the current region
>> > > > > > > > > > > > > > > boundaries. Does this division happen on the client
>> > > > > > > > > > > > > > > side, or on the server side at the region server or
>> > > > > > > > > > > > > > > the HBase master? Does it then assign the pieces that
>> > > > > > > > > > > > > > > cross a target region boundary to the desired region
>> > > > > > > > > > > > > > > server?
>> > > > > > > > > > > > > > >
>> > >
>> > > --
>> > > Thanks & Regards,
>> > > Anil Gupta
>> > >
>> >
>>
>
>

Re: hbase doubts

Posted by Ted Yu <yu...@gmail.com>.
Please read the following w.r.t. region splits:

http://hbase.apache.org/book.html#arch.region.splits (there is a link to a
blog post with details)
http://hbase.apache.org/book.html#manual_region_splitting_decisions

FYI

On Tue, Aug 18, 2015 at 9:53 PM, Shushant Arora <sh...@gmail.com>
wrote:

> When the last region gets new data and is split in two, what is the split
> point? Say the last region has 10 files and the split algorithm decides to
> split this region.
>
> Will the two child regions get 5 files each? Or will the key space of the
> parent region, say (2015-08-01#guid to 2015-08-06#guid), be divided into
> two equal parts, so that child1 gets (2015-08-01#guid to 2015-08-03#guid)
> and child2 gets (2015-08-04#guid to 2015-08-06#guid), with all data
> rewritten into the child regions to match these key ranges? In that case,
> since the data is time-series based, new data will arrive only for
> increasing dates (dates > 2015-08-06) and will go to child2, so child1 will
> always stay half filled, and only child2 will lead to new splits when it
> reaches the split size threshold.
>

Re: hbase doubts

Posted by Shushant Arora <sh...@gmail.com>.
When the last region gets new data and is split in two, what is the split
point? Say the last region has 10 files and the split algorithm decides to
split this region.

Will the two child regions get 5 files each? Or will the key space of the
parent region, say (2015-08-01#guid to 2015-08-06#guid), be divided into two
equal parts, so that child1 gets (2015-08-01#guid to 2015-08-03#guid) and
child2 gets (2015-08-04#guid to 2015-08-06#guid), with all data rewritten
into the child regions to match these key ranges? In that case, since the
data is time-series based, new data will arrive only for increasing dates
(dates > 2015-08-06) and will go to child2, so child1 will always stay half
filled, and only child2 will lead to new splits when it reaches the split
size threshold.
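
The scenario in this question can be tried out with a toy model. As far as I
understand, HBase picks a split row near the middle of the data in the
largest store file rather than handing each child half of the files, and the
children keep referencing the parent's files until a compaction rewrites
them. Treat that as an assumption to check against the book links in this
thread. The Python sketch below only models the consequence for
date-prefixed keys. The row keys, the four rows per day, and the
split_region helper are made up for illustration.

```python
def split_region(rows):
    """Split a sorted list of row keys at its middle key, by row count."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]

# Parent region holding six days of rows, four rows per day, already sorted.
parent = [f"2015-08-{day:02d}#guid{i}" for day in range(1, 7) for i in range(4)]
child1, child2 = split_region(parent)
split_key = child2[0]

# New time-series rows always carry later dates, so they sort after split_key.
new_rows = [f"2015-08-{day:02d}#guid{i}" for day in range(7, 10) for i in range(4)]
for row in new_rows:
    (child2 if row >= split_key else child1).append(row)

print(split_key, len(child1), len(child2))
```

In this model child1 never grows after the split, while child2 takes every
new row and is the only region that can reach the split threshold again.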






On Wed, Aug 19, 2015 at 4:16 AM, Ted Yu <yu...@gmail.com> wrote:

> Since year and month are part of the row key in this scenario (instead of
> just the day of month), the last region would get new data and be split.
>
> Is this effect desirable for your app ?
>
> Cheers
> > have
> > > > > been
> > > > > > > one
> > > > > > > > > > large
> > > > > > > > > > > > > table ?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <
> > > > > yuzhihong@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > For #1, take a look at the following in
> > > > > hbase-default.xml :
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     <name>hbase.client.keyvalue.maxsize</name>
> > > > > > > > > > > > > >     <value>10485760</value>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For #2, it would be easier to answer if you can
> > > outline
> > > > > > > access
> > > > > > > > > > > patterns
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > your app.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For #3, adjustment according to current region
> > > > boundaries
> > > > > > is
> > > > > > > > done
> > > > > > > > > > > > client
> > > > > > > > > > > > > > side. Take a look at the javadoc for
> LoadQueueItem
> > > > > > > > > > > > > > in LoadIncrementalHFiles.java
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <
> > > > > > > > > > > > > shushantarora09@gmail.com
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1.Is there any max limit on key size of hbase
> > > table.
> > > > > > > > > > > > > > > 2.Is multiple small tables vs one large table
> > which
> > > > one
> > > > > > is
> > > > > > > > > > > preferred.
> > > > > > > > > > > > > > > 3.for bulk load -when  LoadIncremantalHfile is
> > run
> > > it
> > > > > > again
> > > > > > > > > > > > > recalculates
> > > > > > > > > > > > > > > the region splits based on region boundary - is
> > > this
> > > > > > > division
> > > > > > > > > > > happens
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > client side or server side again at region
> server
> > > or
> > > > > > hbase
> > > > > > > > > master
> > > > > > > > > > > and
> > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > it assigns the splits which cross target region
> > > > > boundary
> > > > > > to
> > > > > > > > > > desired
> > > > > > > > > > > > > > > regionserver.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> > >
> >
>

Re: hbase doubts

Posted by Ted Yu <yu...@gmail.com>.
Since year and month are part of the row key in this scenario (instead of
just the day of month), the last region would get new data and be split.

Is this effect desirable for your app?

Cheers
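
To make the effect concrete, here is a minimal sketch in plain Java with no
HBase dependency. It models a table as a sorted list of region start keys
and looks up which region a row key falls into. The class DatePrefixRegions
and its methods are hypothetical names used only for illustration; the
compare helper imitates the unsigned, byte-wise ordering HBase applies to
row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a table's regions, keyed by region start key. */
final class DatePrefixRegions {
    // Sorted region start keys; the first region starts at the empty key.
    private final List<String> startKeys = new ArrayList<>();

    DatePrefixRegions(List<String> sortedSplitPoints) {
        startKeys.add("");
        startKeys.addAll(sortedSplitPoints);
    }

    /** Lexicographic comparison of the UTF-8 bytes, treated as unsigned. */
    static int compare(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    /** Index of the last region whose start key is <= rowKey. */
    int regionFor(String rowKey) {
        int index = 0;
        for (int i = 0; i < startKeys.size(); i++) {
            if (compare(startKeys.get(i), rowKey) <= 0) {
                index = i;
            }
        }
        return index;
    }

    /** Splits a region at splitKey; the caller picks a key inside that region. */
    void split(int regionIndex, String splitKey) {
        startKeys.add(regionIndex + 1, splitKey);
    }

    int regionCount() {
        return startKeys.size();
    }
}
```

With split points 2015-08-03 and 2015-08-05, the key 2015-08-06#g1 maps to
region 2, the last one. After split(2, "2015-08-06"), the key 2015-08-07#g2
maps to the new region 3, and region 2 receives no further writes as long as
dates only increase.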

On Tue, Aug 18, 2015 at 12:55 PM, Shushant Arora <sh...@gmail.com>
wrote:

> For an HBase row key with a time prefix, say yyyy-mm-dd#<other fields,
> GUID-based>, I am using bulk load to avoid hotspotting a region server
> (and to avoid writing to the WAL).
>
> What should the initial region splits be? Say I have 30 region servers.
>
> Would it work to use the first 30 days as the initial splits and then let
> auto-split take care of splitting regions as they grow further?
> Or, since the key has the date as a prefix, when a region is split in two
> at the midpoint and new data arrives only for increasing dates, will that
> leave one region permanently half filled, with its other half never filled?

Re: hbase doubts

Posted by Shahab Yunus <sh...@gmail.com>.
One thought to ponder:

If regions are going to split continuously and at a quick pace, do you
have a strategy or plan to merge old regions? Otherwise, you can end up
with a proliferation of regions in the cluster.

Regards,
Shahab
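
One way to think about such a merge plan is sketched below in plain Java.
It only selects candidate pairs of adjacent regions whose combined size
stays under a threshold; it does not call HBase. The class MergePlanner, the
Region holder, and the size threshold are all assumptions made for
illustration. An actual merge would go through the region merge operations
of the HBase Admin API for your version.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner that pairs small neighbouring regions for merging. */
final class MergePlanner {

    /** Minimal stand-in for region metadata: a name and an on-disk size. */
    static final class Region {
        final String name;
        final long sizeMb;

        Region(String name, long sizeMb) {
            this.name = name;
            this.sizeMb = sizeMb;
        }
    }

    /**
     * Walks regions in key order and pairs neighbours whose combined size
     * is at most maxMergedMb. Each region appears in at most one pair.
     */
    static List<String[]> plan(List<Region> regionsInKeyOrder, long maxMergedMb) {
        List<String[]> pairs = new ArrayList<>();
        int i = 0;
        while (i + 1 < regionsInKeyOrder.size()) {
            Region left = regionsInKeyOrder.get(i);
            Region right = regionsInKeyOrder.get(i + 1);
            if (left.sizeMb + right.sizeMb <= maxMergedMb) {
                pairs.add(new String[] {left.name, right.name});
                i += 2; // both regions are consumed by this pair
            } else {
                i += 1; // try pairing the right-hand region with its next neighbour
            }
        }
        return pairs;
    }
}
```

Only adjacent regions are paired because a merged region must cover one
contiguous key range.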

On Tue, Aug 18, 2015 at 3:55 PM, Shushant Arora <sh...@gmail.com>
wrote:

> For an HBase row key with a time prefix, say yyyy-mm-dd#<other fields,
> GUID-based>, I am using bulk load to avoid hotspotting a region server
> (and to avoid writing to the WAL).
>
> What should the initial region splits be? Say I have 30 region servers.
>
> Would it work to use the first 30 days as the initial splits and then let
> auto-split take care of splitting regions as they grow further?
> Or, since the key has the date as a prefix, when a region is split in two
> at the midpoint and new data arrives only for increasing dates, will that
> leave one region permanently half filled, with its other half never filled?

Re: hbase doubts

Posted by Shushant Arora <sh...@gmail.com>.
For an HBase row key with a time prefix, say yyyy-mm-dd#<other fields,
GUID-based>, I am using bulk load to avoid hotspotting a region server
(and to avoid writing to the WAL).

What should the initial region splits be? Say I have 30 region servers.

Would it work to use the first 30 days as the initial splits and then let
auto-split take care of splitting regions as they grow further?
Or, since the key has the date as a prefix, when a region is split in two
at the midpoint and new data arrives only for increasing dates, will that
leave one region permanently half filled, with its other half never filled?
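
For reference, the "one split per day" idea can be sketched as follows in
plain Java. The class DailySplitKeys and its method are hypothetical names;
the sketch only builds the split-key array. As far as I know, an array of
this shape (byte[][]) is what the createTable overload of the HBase Admin
API accepts for pre-splitting, but check that against your HBase version.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that builds one split point per calendar day. */
final class DailySplitKeys {

    /**
     * Returns regions - 1 split keys, because N regions need N - 1 split
     * points. The first region covers everything before firstDay + 1 day.
     */
    static byte[][] dailySplits(LocalDate firstDay, int regions) {
        if (regions < 1) {
            throw new IllegalArgumentException("regions must be >= 1");
        }
        byte[][] splits = new byte[regions - 1][];
        for (int i = 0; i < splits.length; i++) {
            // ISO_LOCAL_DATE renders yyyy-MM-dd, matching the key prefix.
            String day = firstDay.plusDays(i + 1).format(DateTimeFormatter.ISO_LOCAL_DATE);
            splits[i] = day.getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }
}
```

For 30 regions starting on 2015-08-01 this yields 29 split keys, from
2015-08-02 through 2015-08-30. Rows dated after the last split key all land
in the final region, which is the concern raised in the question.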

On Tue, Aug 18, 2015 at 9:41 PM, anil gupta <an...@gmail.com> wrote:

> In my experience, Phoenix is far superior to the Hive-HBase integration
> for SQL-like querying on HBase, because Phoenix is built on top of
> HBase, unlike Hive.
>
> On Tue, Aug 18, 2015 at 9:09 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > To my knowledge, Phoenix provides better integration with hbase.
> >
> > A third possibility is Spark on HBase.
> >
> > If you want to explore these alternatives, I suggest asking on respective
> > mailing lists where you can get expert opinions.
> >
> > Cheers
> >
> > On Tue, Aug 18, 2015 at 9:03 AM, Shushant Arora <
> shushantarora09@gmail.com
> > >
> > wrote:
> >
> > > Thanks!
> > >
> > > Which one is better for sqlkind of queries over hbase (queries involve
> > > filter , key range scan), aggregates by column values.
> > > .
> > > 1.Hive storage handlers
> > > 2.or Phoenix
> > >
> > > On Tue, Aug 18, 2015 at 9:14 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > For #1, if you want to count distinct values for F1, you can write a
> > > > coprocessor which aggregates the count on region server and returns
> the
> > > > result to client which does the final aggregation.
> > > >
> > > > Take a look
> > > > at
> > > >
> > >
> >
> hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
> > > > and related classes for example.
> > > >
> > > > On Mon, Aug 17, 2015 at 10:08 PM, Shushant Arora <
> > > > shushantarora09@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks !
> > > > > few more doubts :
> > > > >
> > > > > 1.Say if requirement is to count distinct value of F1-
> > > > >
> > > > > If field is part of key- is hbase can't just scan key and skip
> value
> > > > > deserialsation and return result to client which will calculate
> > > distinct
> > > > > and in second approcah Hbase will desrialise the value of return
> > column
> > > > > containing F1 to cleint which will calculate the distinct.
> > > > >
> > > > > 2.For bulk load when LoadIncrementalHFiles runs and regionserver
> > moves
> > > > the
> > > > > hfiles from hdfs to region directory - does regionserver localise
> the
> > > > hfile
> > > > > by downloading it to local and then uploading again in region
> > > directory?
> > > > Or
> > > > > it just moves to to region directory and wait for next compaction
> to
> > > get
> > > > it
> > > > > localise  as in regionserver failure case?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > > > >
> > > > > > For both scenarios you mentioned, field is not leading part of
> row
> > > key.
> > > > > > You would need to specify timerange or start row / stop row to
> > narrow
> > > > the
> > > > > > key range being scanned.
> > > > > >
> > > > > > I am leaning toward using second approach.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora <
> > > > > shushantarora09@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > ~8-10 fields of size (5 of  20 bytes each )and 3 fields of size
> > 200
> > > > > bytes
> > > > > > > each.
> > > > > > >
> > > > > > > On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <yu...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > How many fields such as F1 are you considering for embedding
> in
> > > row
> > > > > > key ?
> > > > > > > >
> > > > > > > > Suggested reading:
> > > > > > > > http://hbase.apache.org/book.html#rowkey.design
> > > > > > > > http://hbase.apache.org/book.html#client.filter.kvm (see
> > > > > > > > ColumnPrefixFilter)
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora <
> > > > > > > shushantarora09@gmail.com
> > > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > 1.so size limit is per cell's identifier + value ?
> > > > > > > > >
> > > > > > > > > What is more optimise - to have field in key or in column
> > > > family's
> > > > > > > > column ?
> > > > > > > > > If pattern is like every row has that field.
> > > > > > > > >
> > > > > > > > > Say I have a field F1 in all rows so
> > > > > > > > > Situtatio -1
> > > > > > > > > key1#F1(as composite key)  - and rest fields in column
> > > > > > > > >
> > > > > > > > > Situation-2
> > > > > > > > > key1 as key and F1 part of column family.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This is the main reason I  asked the key size limit.
> > > > > > > > > If I asked for no of rows where F1 is = 'someval' will it
> be
> > > > faster
> > > > > > in
> > > > > > > > > situation-1 than in situation-2. Since in 1 it can return
> the
> > > > > result
> > > > > > > just
> > > > > > > > > by traversing keys no need to read columns?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > For #1, it is the limit on a single keyvalue, not row,
> not
> > > key.
> > > > > > > > > >
> > > > > > > > > > For #2, please see the following:
> > > > > > > > > >
> > > > > > > > > > http://hbase.apache.org/book.html#store.memstore
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > http://hbase.apache.org/book.html#regionserver_splitting_implementation
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > > On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora <
> > > > > > > > > shushantarora09@gmail.com
> > > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > 1.Is hbase.client.keyvalue.maxsize  is max size of row
> or
> > > key
> > > > > > only
> > > > > > > ?
> > > > > > > > Is
> > > > > > > > > > > there any limit on key size only ?
> > > > > > > > > > > 2.Access pattern is mostly on key based only- Is
> > memstores
> > > > and
> > > > > > > > regions
> > > > > > > > > > on a
> > > > > > > > > > > regionserver are per table basis? Is it if I have
> > multiple
> > > > > tables
> > > > > > > it
> > > > > > > > > will
> > > > > > > > > > > have multiple memstores instead of few if it would have
> > > been
> > > > > one
> > > > > > > > large
> > > > > > > > > > > table ?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <
> > > yuzhihong@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > For #1, take a look at the following in
> > > hbase-default.xml :
> > > > > > > > > > > >
> > > > > > > > > > > >     <name>hbase.client.keyvalue.maxsize</name>
> > > > > > > > > > > >     <value>10485760</value>
> > > > > > > > > > > >
> > > > > > > > > > > > For #2, it would be easier to answer if you can
> outline
> > > > > access
> > > > > > > > > patterns
> > > > > > > > > > > in
> > > > > > > > > > > > your app.
> > > > > > > > > > > >
> > > > > > > > > > > > For #3, adjustment according to current region
> > boundaries
> > > > is
> > > > > > done
> > > > > > > > > > client
> > > > > > > > > > > > side. Take a look at the javadoc for LoadQueueItem
> > > > > > > > > > > > in LoadIncrementalHFiles.java
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <
> > > > > > > > > > > shushantarora09@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > 1.Is there any max limit on key size of hbase
> table.
> > > > > > > > > > > > > 2.Is multiple small tables vs one large table which
> > one
> > > > is
> > > > > > > > > preferred.
> > > > > > > > > > > > > 3.for bulk load -when  LoadIncremantalHfile is run
> it
> > > > again
> > > > > > > > > > > recalculates
> > > > > > > > > > > > > the region splits based on region boundary - is
> this
> > > > > division
> > > > > > > > > happens
> > > > > > > > > > > on
> > > > > > > > > > > > > client side or server side again at region server
> or
> > > > hbase
> > > > > > > master
> > > > > > > > > and
> > > > > > > > > > > > then
> > > > > > > > > > > > > it assigns the splits which cross target region
> > > boundary
> > > > to
> > > > > > > > desired
> > > > > > > > > > > > > regionserver.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

Re: hbase doubts

Posted by anil gupta <an...@gmail.com>.
In my experience, Phoenix is far superior to the Hive-HBase integration for
SQL-like querying on HBase. That is because Phoenix is built on top of
HBase, unlike Hive.

On Tue, Aug 18, 2015 at 9:09 AM, Ted Yu <yu...@gmail.com> wrote:

> To my knowledge, Phoenix provides better integration with hbase.
>
> A third possibility is Spark on HBase.
>
> If you want to explore these alternatives, I suggest asking on respective
> mailing lists where you can get expert opinions.
>
> Cheers
>
> On Tue, Aug 18, 2015 at 9:03 AM, Shushant Arora <shushantarora09@gmail.com
> >
> wrote:
>
> > Thanks!
> >
> > Which one is better for sqlkind of queries over hbase (queries involve
> > filter , key range scan), aggregates by column values.
> > .
> > 1.Hive storage handlers
> > 2.or Phoenix
> >
> > On Tue, Aug 18, 2015 at 9:14 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > For #1, if you want to count distinct values for F1, you can write a
> > > coprocessor which aggregates the count on region server and returns the
> > > result to client which does the final aggregation.
> > >
> > > Take a look
> > > at
> > >
> >
> hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
> > > and related classes for example.
> > >
> > > On Mon, Aug 17, 2015 at 10:08 PM, Shushant Arora <
> > > shushantarora09@gmail.com>
> > > wrote:
> > >
> > > > Thanks !
> > > > few more doubts :
> > > >
> > > > 1.Say if requirement is to count distinct value of F1-
> > > >
> > > > If field is part of key- is hbase can't just scan key and skip value
> > > > deserialsation and return result to client which will calculate
> > distinct
> > > > and in second approcah Hbase will desrialise the value of return
> column
> > > > containing F1 to cleint which will calculate the distinct.
> > > >
> > > > 2.For bulk load when LoadIncrementalHFiles runs and regionserver
> moves
> > > the
> > > > hfiles from hdfs to region directory - does regionserver localise the
> > > hfile
> > > > by downloading it to local and then uploading again in region
> > directory?
> > > Or
> > > > it just moves to to region directory and wait for next compaction to
> > get
> > > it
> > > > localise  as in regionserver failure case?
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <yu...@gmail.com>
> wrote:
> > > >
> > > > > For both scenarios you mentioned, field is not leading part of row
> > key.
> > > > > You would need to specify timerange or start row / stop row to
> narrow
> > > the
> > > > > key range being scanned.
> > > > >
> > > > > I am leaning toward using second approach.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora <
> > > > shushantarora09@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > ~8-10 fields of size (5 of  20 bytes each )and 3 fields of size
> 200
> > > > bytes
> > > > > > each.
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > How many fields such as F1 are you considering for embedding in
> > row
> > > > > key ?
> > > > > > >
> > > > > > > Suggested reading:
> > > > > > > http://hbase.apache.org/book.html#rowkey.design
> > > > > > > http://hbase.apache.org/book.html#client.filter.kvm (see
> > > > > > > ColumnPrefixFilter)
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora <
> > > > > > shushantarora09@gmail.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > 1.so size limit is per cell's identifier + value ?
> > > > > > > >
> > > > > > > > What is more optimise - to have field in key or in column
> > > family's
> > > > > > > column ?
> > > > > > > > If pattern is like every row has that field.
> > > > > > > >
> > > > > > > > Say I have a field F1 in all rows so
> > > > > > > > Situtatio -1
> > > > > > > > key1#F1(as composite key)  - and rest fields in column
> > > > > > > >
> > > > > > > > Situation-2
> > > > > > > > key1 as key and F1 part of column family.
> > > > > > > >
> > > > > > > >
> > > > > > > > This is the main reason I  asked the key size limit.
> > > > > > > > If I asked for no of rows where F1 is = 'someval' will it be
> > > faster
> > > > > in
> > > > > > > > situation-1 than in situation-2. Since in 1 it can return the
> > > > result
> > > > > > just
> > > > > > > > by traversing keys no need to read columns?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <yuzhihong@gmail.com
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > For #1, it is the limit on a single keyvalue, not row, not
> > key.
> > > > > > > > >
> > > > > > > > > For #2, please see the following:
> > > > > > > > >
> > > > > > > > > http://hbase.apache.org/book.html#store.memstore
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> http://hbase.apache.org/book.html#regionserver_splitting_implementation
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora <
> > > > > > > > shushantarora09@gmail.com
> > > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > 1.Is hbase.client.keyvalue.maxsize  is max size of row or
> > key
> > > > > only
> > > > > > ?
> > > > > > > Is
> > > > > > > > > > there any limit on key size only ?
> > > > > > > > > > 2.Access pattern is mostly on key based only- Is
> memstores
> > > and
> > > > > > > regions
> > > > > > > > > on a
> > > > > > > > > > regionserver are per table basis? Is it if I have
> multiple
> > > > tables
> > > > > > it
> > > > > > > > will
> > > > > > > > > > have multiple memstores instead of few if it would have
> > been
> > > > one
> > > > > > > large
> > > > > > > > > > table ?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <
> > yuzhihong@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > For #1, take a look at the following in
> > hbase-default.xml :
> > > > > > > > > > >
> > > > > > > > > > >     <name>hbase.client.keyvalue.maxsize</name>
> > > > > > > > > > >     <value>10485760</value>
> > > > > > > > > > >
> > > > > > > > > > > For #2, it would be easier to answer if you can outline
> > > > access
> > > > > > > > patterns
> > > > > > > > > > in
> > > > > > > > > > > your app.
> > > > > > > > > > >
> > > > > > > > > > > For #3, adjustment according to current region
> boundaries
> > > is
> > > > > done
> > > > > > > > > client
> > > > > > > > > > > side. Take a look at the javadoc for LoadQueueItem
> > > > > > > > > > > in LoadIncrementalHFiles.java
> > > > > > > > > > >
> > > > > > > > > > > Cheers
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <
> > > > > > > > > > shushantarora09@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > 1.Is there any max limit on key size of hbase table.
> > > > > > > > > > > > 2.Is multiple small tables vs one large table which
> one
> > > is
> > > > > > > > preferred.
> > > > > > > > > > > > 3.for bulk load -when  LoadIncremantalHfile is run it
> > > again
> > > > > > > > > > recalculates
> > > > > > > > > > > > the region splits based on region boundary - is this
> > > > division
> > > > > > > > happens
> > > > > > > > > > on
> > > > > > > > > > > > client side or server side again at region server or
> > > hbase
> > > > > > master
> > > > > > > > and
> > > > > > > > > > > then
> > > > > > > > > > > > it assigns the splits which cross target region
> > boundary
> > > to
> > > > > > > desired
> > > > > > > > > > > > regionserver.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: hbase doubts

Posted by Ted Yu <yu...@gmail.com>.
To my knowledge, Phoenix provides better integration with HBase.

A third possibility is Spark on HBase.

If you want to explore these alternatives, I suggest asking on the respective
mailing lists, where you can get expert opinions.

Cheers

On Tue, Aug 18, 2015 at 9:03 AM, Shushant Arora <sh...@gmail.com>
wrote:

> Thanks!
>
> Which one is better for sqlkind of queries over hbase (queries involve
> filter , key range scan), aggregates by column values.
> .
> 1.Hive storage handlers
> 2.or Phoenix
>
> On Tue, Aug 18, 2015 at 9:14 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > For #1, if you want to count distinct values for F1, you can write a
> > coprocessor which aggregates the count on region server and returns the
> > result to client which does the final aggregation.
> >
> > Take a look
> > at
> >
> hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
> > and related classes for example.
> >
> > On Mon, Aug 17, 2015 at 10:08 PM, Shushant Arora <
> > shushantarora09@gmail.com>
> > wrote:
> >
> > > Thanks !
> > > few more doubts :
> > >
> > > 1.Say if requirement is to count distinct value of F1-
> > >
> > > If field is part of key- is hbase can't just scan key and skip value
> > > deserialsation and return result to client which will calculate
> distinct
> > > and in second approcah Hbase will desrialise the value of return column
> > > containing F1 to cleint which will calculate the distinct.
> > >
> > > 2.For bulk load when LoadIncrementalHFiles runs and regionserver moves
> > the
> > > hfiles from hdfs to region directory - does regionserver localise the
> > hfile
> > > by downloading it to local and then uploading again in region
> directory?
> > Or
> > > it just moves to to region directory and wait for next compaction to
> get
> > it
> > > localise  as in regionserver failure case?
> > >
> > >
> > >
> > >
> > > On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > For both scenarios you mentioned, field is not leading part of row
> key.
> > > > You would need to specify timerange or start row / stop row to narrow
> > the
> > > > key range being scanned.
> > > >
> > > > I am leaning toward using second approach.
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora <
> > > shushantarora09@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > ~8-10 fields of size (5 of  20 bytes each )and 3 fields of size 200
> > > bytes
> > > > > each.
> > > > >
> > > > > On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <yu...@gmail.com>
> wrote:
> > > > >
> > > > > > How many fields such as F1 are you considering for embedding in
> row
> > > > key ?
> > > > > >
> > > > > > Suggested reading:
> > > > > > http://hbase.apache.org/book.html#rowkey.design
> > > > > > http://hbase.apache.org/book.html#client.filter.kvm (see
> > > > > > ColumnPrefixFilter)
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora <
> > > > > shushantarora09@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > 1.so size limit is per cell's identifier + value ?
> > > > > > >
> > > > > > > What is more optimise - to have field in key or in column
> > family's
> > > > > > column ?
> > > > > > > If pattern is like every row has that field.
> > > > > > >
> > > > > > > Say I have a field F1 in all rows so
> > > > > > > Situtatio -1
> > > > > > > key1#F1(as composite key)  - and rest fields in column
> > > > > > >
> > > > > > > Situation-2
> > > > > > > key1 as key and F1 part of column family.
> > > > > > >
> > > > > > >
> > > > > > > This is the main reason I  asked the key size limit.
> > > > > > > If I asked for no of rows where F1 is = 'someval' will it be
> > faster
> > > > in
> > > > > > > situation-1 than in situation-2. Since in 1 it can return the
> > > result
> > > > > just
> > > > > > > by traversing keys no need to read columns?
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <yu...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > For #1, it is the limit on a single keyvalue, not row, not
> key.
> > > > > > > >
> > > > > > > > For #2, please see the following:
> > > > > > > >
> > > > > > > > http://hbase.apache.org/book.html#store.memstore
> > > > > > > >
> > > > > >
> > > >
> > http://hbase.apache.org/book.html#regionserver_splitting_implementation
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora <
> > > > > > > shushantarora09@gmail.com
> > > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > 1.Is hbase.client.keyvalue.maxsize  is max size of row or
> key
> > > > only
> > > > > ?
> > > > > > Is
> > > > > > > > > there any limit on key size only ?
> > > > > > > > > 2.Access pattern is mostly on key based only- Is memstores
> > and
> > > > > > regions
> > > > > > > > on a
> > > > > > > > > regionserver are per table basis? Is it if I have multiple
> > > tables
> > > > > it
> > > > > > > will
> > > > > > > > > have multiple memstores instead of few if it would have
> been
> > > one
> > > > > > large
> > > > > > > > > table ?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > For #1, take a look at the following in
> hbase-default.xml :
> > > > > > > > > >
> > > > > > > > > >     <name>hbase.client.keyvalue.maxsize</name>
> > > > > > > > > >     <value>10485760</value>
> > > > > > > > > >
> > > > > > > > > > For #2, it would be easier to answer if you can outline
> > > access
> > > > > > > patterns
> > > > > > > > > in
> > > > > > > > > > your app.
> > > > > > > > > >
> > > > > > > > > > For #3, adjustment according to current region boundaries
> > is
> > > > done
> > > > > > > > client
> > > > > > > > > > side. Take a look at the javadoc for LoadQueueItem
> > > > > > > > > > in LoadIncrementalHFiles.java
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <
> > > > > > > > > shushantarora09@gmail.com
> > > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > 1.Is there any max limit on key size of hbase table.
> > > > > > > > > > > 2.Is multiple small tables vs one large table which one
> > is
> > > > > > > preferred.
> > > > > > > > > > > 3.for bulk load -when  LoadIncremantalHfile is run it
> > again
> > > > > > > > > recalculates
> > > > > > > > > > > the region splits based on region boundary - is this
> > > division
> > > > > > > happens
> > > > > > > > > on
> > > > > > > > > > > client side or server side again at region server or
> > hbase
> > > > > master
> > > > > > > and
> > > > > > > > > > then
> > > > > > > > > > > it assigns the splits which cross target region
> boundary
> > to
> > > > > > desired
> > > > > > > > > > > regionserver.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: hbase doubts

Posted by Shushant Arora <sh...@gmail.com>.
Thanks!

Which one is better for SQL-like queries over HBase (the queries involve
filters, key range scans, and aggregates by column value)?

1. Hive storage handlers
2. Phoenix

On Tue, Aug 18, 2015 at 9:14 PM, Ted Yu <yu...@gmail.com> wrote:

> For #1, if you want to count distinct values for F1, you can write a
> coprocessor which aggregates the count on region server and returns the
> result to client which does the final aggregation.
>
> Take a look
> at
> hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
> and related classes for example.
>
> On Mon, Aug 17, 2015 at 10:08 PM, Shushant Arora <
> shushantarora09@gmail.com>
> wrote:
>
> > Thanks !
> > few more doubts :
> >
> > 1.Say if requirement is to count distinct value of F1-
> >
> > If field is part of key- is hbase can't just scan key and skip value
> > deserialsation and return result to client which will calculate distinct
> > and in second approcah Hbase will desrialise the value of return column
> > containing F1 to cleint which will calculate the distinct.
> >
> > 2.For bulk load when LoadIncrementalHFiles runs and regionserver moves
> the
> > hfiles from hdfs to region directory - does regionserver localise the
> hfile
> > by downloading it to local and then uploading again in region directory?
> Or
> > it just moves to to region directory and wait for next compaction to get
> it
> > localise  as in regionserver failure case?
> >
> >
> >
> >
> > On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > For both scenarios you mentioned, field is not leading part of row key.
> > > You would need to specify timerange or start row / stop row to narrow
> the
> > > key range being scanned.
> > >
> > > I am leaning toward using second approach.
> > >
> > > Cheers
> > >
> > > On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora <
> > shushantarora09@gmail.com
> > > >
> > > wrote:
> > >
> > > > ~8-10 fields of size (5 of  20 bytes each )and 3 fields of size 200
> > bytes
> > > > each.
> > > >
> > > > On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > How many fields such as F1 are you considering for embedding in row
> > > key ?
> > > > >
> > > > > Suggested reading:
> > > > > http://hbase.apache.org/book.html#rowkey.design
> > > > > http://hbase.apache.org/book.html#client.filter.kvm (see
> > > > > ColumnPrefixFilter)
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora <
> > > > shushantarora09@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > 1.so size limit is per cell's identifier + value ?
> > > > > >
> > > > > > What is more optimise - to have field in key or in column
> family's
> > > > > column ?
> > > > > > If pattern is like every row has that field.
> > > > > >
> > > > > > Say I have a field F1 in all rows so
> > > > > > Situtatio -1
> > > > > > key1#F1(as composite key)  - and rest fields in column
> > > > > >
> > > > > > Situation-2
> > > > > > key1 as key and F1 part of column family.
> > > > > >
> > > > > >
> > > > > > This is the main reason I  asked the key size limit.
> > > > > > If I asked for no of rows where F1 is = 'someval' will it be
> faster
> > > in
> > > > > > situation-1 than in situation-2. Since in 1 it can return the
> > result
> > > > just
> > > > > > by traversing keys no need to read columns?
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > For #1, it is the limit on a single keyvalue, not row, not key.
> > > > > > >
> > > > > > > For #2, please see the following:
> > > > > > >
> > > > > > > http://hbase.apache.org/book.html#store.memstore
> > > > > > >
> > > > >
> > >
> http://hbase.apache.org/book.html#regionserver_splitting_implementation
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora <
> > > > > > shushantarora09@gmail.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > 1.Is hbase.client.keyvalue.maxsize  is max size of row or key
> > > only
> > > > ?
> > > > > Is
> > > > > > > > there any limit on key size only ?
> > > > > > > > 2.Access pattern is mostly on key based only- Is memstores
> and
> > > > > regions
> > > > > > > on a
> > > > > > > > regionserver are per table basis? Is it if I have multiple
> > tables
> > > > it
> > > > > > will
> > > > > > > > have multiple memstores instead of few if it would have been
> > one
> > > > > large
> > > > > > > > table ?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <yuzhihong@gmail.com
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > For #1, take a look at the following in hbase-default.xml :
> > > > > > > > >
> > > > > > > > >     <name>hbase.client.keyvalue.maxsize</name>
> > > > > > > > >     <value>10485760</value>
> > > > > > > > >
> > > > > > > > > For #2, it would be easier to answer if you can outline
> > access
> > > > > > patterns
> > > > > > > > in
> > > > > > > > > your app.
> > > > > > > > >
> > > > > > > > > For #3, adjustment according to current region boundaries
> is
> > > done
> > > > > > > client
> > > > > > > > > side. Take a look at the javadoc for LoadQueueItem
> > > > > > > > > in LoadIncrementalHFiles.java
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <
> > > > > > > > shushantarora09@gmail.com
> > > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > 1.Is there any max limit on key size of hbase table.
> > > > > > > > > > 2.Is multiple small tables vs one large table which one
> is
> > > > > > preferred.
> > > > > > > > > > 3.for bulk load -when  LoadIncremantalHfile is run it
> again
> > > > > > > > recalculates
> > > > > > > > > > the region splits based on region boundary - is this
> > division
> > > > > > happens
> > > > > > > > on
> > > > > > > > > > client side or server side again at region server or
> hbase
> > > > master
> > > > > > and
> > > > > > > > > then
> > > > > > > > > > it assigns the splits which cross target region boundary
> to
> > > > > desired
> > > > > > > > > > regionserver.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: hbase doubts

Posted by Ted Yu <yu...@gmail.com>.
For #1, if you want to count distinct values for F1, you can write a
coprocessor that aggregates the count on each region server and returns the
result to the client, which does the final aggregation.

For an example, take a look at
hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
and related classes.
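
To illustrate the split between server-side and client-side work, here is a
rough plain-Java sketch. It does not use the HBase coprocessor API at all:
the class DistinctCountSketch and its methods are made-up names, and each
"region" is just an in-memory list of F1 values. The one point it is meant to
show is that, for a distinct count, the client should merge the per-region
sets of values rather than add up per-region counts, since the same F1 value
may occur in more than one region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; hypothetical names, no HBase classes involved.
class DistinctCountSketch {

    // Stand-in for the server-side step: each region reduces its own rows
    // to the set of distinct F1 values it holds.
    static Set<String> distinctInRegion(List<String> f1ValuesInRegion) {
        return new HashSet<>(f1ValuesInRegion);
    }

    // Stand-in for the client-side step: merge the per-region partial results.
    // The sets are unioned instead of summing their sizes, because one F1
    // value can appear in several regions.
    static int countDistinct(List<List<String>> regions) {
        Set<String> merged = new HashSet<>();
        for (List<String> region : regions) {
            merged.addAll(distinctInRegion(region));
        }
        return merged.size();
    }

    public static void main(String[] args) {
        List<List<String>> regions = new ArrayList<>();
        regions.add(Arrays.asList("a", "b", "a")); // region 1 holds {a, b}
        regions.add(Arrays.asList("b", "c"));      // region 2 holds {b, c}
        System.out.println(countDistinct(regions)); // prints 3
    }
}
```

In a real coprocessor the per-region step would run inside the region server
and only the partial result would travel over the network. If F1 has very
many distinct values, those partial sets can themselves get large.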

On Mon, Aug 17, 2015 at 10:08 PM, Shushant Arora <sh...@gmail.com>
wrote:

> Thanks !
> few more doubts :
>
> 1.Say if requirement is to count distinct value of F1-
>
> If field is part of key- is hbase can't just scan key and skip value
> deserialsation and return result to client which will calculate distinct
> and in second approcah Hbase will desrialise the value of return column
> containing F1 to cleint which will calculate the distinct.
>
> 2.For bulk load when LoadIncrementalHFiles runs and regionserver moves the
> hfiles from hdfs to region directory - does regionserver localise the hfile
> by downloading it to local and then uploading again in region directory? Or
> it just moves to to region directory and wait for next compaction to get it
> localise  as in regionserver failure case?
>
>
>
>
> On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > For both scenarios you mentioned, field is not leading part of row key.
> > You would need to specify timerange or start row / stop row to narrow the
> > key range being scanned.
> >
> > I am leaning toward using second approach.
> >
> > Cheers
> >
> > On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora <
> shushantarora09@gmail.com
> > >
> > wrote:
> >
> > > ~8-10 fields of size (5 of  20 bytes each )and 3 fields of size 200
> bytes
> > > each.
> > >
> > > On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > How many fields such as F1 are you considering for embedding in row
> > key ?
> > > >
> > > > Suggested reading:
> > > > http://hbase.apache.org/book.html#rowkey.design
> > > > http://hbase.apache.org/book.html#client.filter.kvm (see
> > > > ColumnPrefixFilter)
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora <
> > > shushantarora09@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > 1.so size limit is per cell's identifier + value ?
> > > > >
> > > > > What is more optimise - to have field in key or in column family's
> > > > column ?
> > > > > If pattern is like every row has that field.
> > > > >
> > > > > Say I have a field F1 in all rows so
> > > > > Situtatio -1
> > > > > key1#F1(as composite key)  - and rest fields in column
> > > > >
> > > > > Situation-2
> > > > > key1 as key and F1 part of column family.
> > > > >
> > > > >
> > > > > This is the main reason I  asked the key size limit.
> > > > > If I asked for no of rows where F1 is = 'someval' will it be faster
> > in
> > > > > situation-1 than in situation-2. Since in 1 it can return the
> result
> > > just
> > > > > by traversing keys no need to read columns?
> > > > >
> > > > >
> > > > > On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <yu...@gmail.com>
> wrote:
> > > > >
> > > > > > For #1, it is the limit on a single keyvalue, not row, not key.
> > > > > >
> > > > > > For #2, please see the following:
> > > > > >
> > > > > > http://hbase.apache.org/book.html#store.memstore
> > > > > >
> > > >
> > http://hbase.apache.org/book.html#regionserver_splitting_implementation
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora <
> > > > > shushantarora09@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > 1.Is hbase.client.keyvalue.maxsize  is max size of row or key
> > only
> > > ?
> > > > Is
> > > > > > > there any limit on key size only ?
> > > > > > > 2.Access pattern is mostly on key based only- Is memstores and
> > > > regions
> > > > > > on a
> > > > > > > regionserver are per table basis? Is it if I have multiple
> tables
> > > it
> > > > > will
> > > > > > > have multiple memstores instead of few if it would have been
> one
> > > > large
> > > > > > > table ?
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <yu...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > For #1, take a look at the following in hbase-default.xml :
> > > > > > > >
> > > > > > > >     <name>hbase.client.keyvalue.maxsize</name>
> > > > > > > >     <value>10485760</value>
> > > > > > > >
> > > > > > > > For #2, it would be easier to answer if you can outline
> access
> > > > > patterns
> > > > > > > in
> > > > > > > > your app.
> > > > > > > >
> > > > > > > > For #3, adjustment according to current region boundaries is
> > done
> > > > > > client
> > > > > > > > side. Take a look at the javadoc for LoadQueueItem
> > > > > > > > in LoadIncrementalHFiles.java
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <
> > > > > > > shushantarora09@gmail.com
> > > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > 1.Is there any max limit on key size of hbase table.
> > > > > > > > > 2.Is multiple small tables vs one large table which one is
> > > > > preferred.
> > > > > > > > > 3.for bulk load -when  LoadIncremantalHfile is run it again
> > > > > > > recalculates
> > > > > > > > > the region splits based on region boundary - is this
> division
> > > > > happens
> > > > > > > on
> > > > > > > > > client side or server side again at region server or hbase
> > > master
> > > > > and
> > > > > > > > then
> > > > > > > > > it assigns the splits which cross target region boundary to
> > > > desired
> > > > > > > > > regionserver.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: hbase doubts

Posted by Shushant Arora <sh...@gmail.com>.
Thanks!
A few more doubts:

1. Say the requirement is to count the distinct values of F1.

If the field is part of the key, can't HBase just scan the keys, skip value
deserialisation, and return the result to the client, which will calculate the
distinct count? In the second approach, HBase would have to deserialise the
value of the column containing F1 and return it to the client, which will
calculate the distinct count.

2. For bulk load, when LoadIncrementalHFiles runs and the regionserver moves
the HFiles from HDFS to the region directory, does the regionserver localise
the HFile by downloading it to local disk and then uploading it again into the
region directory? Or does it just move the file to the region directory and
wait for the next compaction to localise it, as in the regionserver failure
case?




On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <yu...@gmail.com> wrote:

> For both scenarios you mentioned, field is not leading part of row key.
> You would need to specify timerange or start row / stop row to narrow the
> key range being scanned.
>
> I am leaning toward using second approach.
>
> Cheers

Re: hbase doubts

Posted by Ted Yu <yu...@gmail.com>.
For both scenarios you mentioned, the field is not the leading part of the row key.
You would need to specify a time range or start row / stop row to narrow the
key range being scanned.

I am leaning toward using the second approach.
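
To make the point about narrowing concrete, here is a minimal plain-Python sketch. It is a toy model of sorted row keys, not HBase client code; the keys, the "key1#F1" layout and the helper names are made up for illustration. A start/stop row touches only a slice of the sorted keys, while matching on F1 (which trails key1) has to examine every key.

```python
from bisect import bisect_left

# Toy model: row keys kept in sorted order, in the hypothetical "key1#F1" layout.
rows = sorted([
    "user01#red", "user02#blue", "user03#red",
    "user04#green", "user05#red",
])

def range_scan(sorted_keys, start_row, stop_row):
    """Return keys in [start_row, stop_row); only that slice is touched."""
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]

def count_by_trailing_field(sorted_keys, value):
    """F1 is not the leading part of the key, so every key is examined."""
    examined = 0
    matches = 0
    for key in sorted_keys:
        examined += 1
        if key.split("#", 1)[1] == value:
            matches += 1
    return matches, examined

narrowed = range_scan(rows, "user02", "user04")
matches, examined = count_by_trailing_field(rows, "red")
```

Here `narrowed` holds two keys, while counting F1 = 'red' examines all five keys to find three matches. In this model, putting F1 in the key does not avoid the full pass unless F1 leads the key.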

Cheers

On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora <sh...@gmail.com>
wrote:

> ~8-10 fields (5 of 20 bytes each) and 3 fields of 200 bytes
> each.

Re: hbase doubts

Posted by Shushant Arora <sh...@gmail.com>.
~8-10 fields (5 of 20 bytes each) and 3 fields of 200 bytes
each.

On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <yu...@gmail.com> wrote:

> How many fields such as F1 are you considering embedding in the row key?
>
> Suggested reading:
> http://hbase.apache.org/book.html#rowkey.design
> http://hbase.apache.org/book.html#client.filter.kvm (see ColumnPrefixFilter)
>
> Cheers

Re: hbase doubts

Posted by Ted Yu <yu...@gmail.com>.
How many fields such as F1 are you considering embedding in the row key?

Suggested reading:
http://hbase.apache.org/book.html#rowkey.design
http://hbase.apache.org/book.html#client.filter.kvm (see ColumnPrefixFilter)

Cheers

On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora <sh...@gmail.com>
wrote:

> 1. So the size limit applies to a cell's identifier plus its value?
>
> Which is more efficient: having a field in the row key or in a column of a
> column family, given that every row has that field?
>
> Say I have a field F1 in all rows, so:
> Situation-1
> key1#F1 (as a composite key), with the remaining fields in columns
>
> Situation-2
> key1 as the key and F1 as a column in a column family.
>
>
> This is the main reason I asked about the key size limit.
> If I ask for the number of rows where F1 = 'someval', will it be faster in
> Situation-1 than in Situation-2, since in Situation-1 it can return the
> result just by traversing keys, with no need to read columns?

Re: hbase doubts

Posted by Shushant Arora <sh...@gmail.com>.
1. So the size limit applies to a cell's identifier plus its value?

Which is more efficient: having a field in the row key or in a column of a
column family, given that every row has that field?

Say I have a field F1 in all rows, so:
Situation-1
key1#F1 (as a composite key), with the remaining fields in columns

Situation-2
key1 as the key and F1 as a column in a column family.


This is the main reason I asked about the key size limit.
If I ask for the number of rows where F1 = 'someval', will it be faster in
Situation-1 than in Situation-2, since in Situation-1 it can return the
result just by traversing keys, with no need to read columns?


On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <yu...@gmail.com> wrote:

> For #1, it is the limit on a single keyvalue, not row, not key.
>
> For #2, please see the following:
>
> http://hbase.apache.org/book.html#store.memstore
> http://hbase.apache.org/book.html#regionserver_splitting_implementation
>
> Cheers

Re: hbase doubts

Posted by Ted Yu <yu...@gmail.com>.
For #1, it is the limit on a single keyvalue, not row, not key.

For #2, please see the following:

http://hbase.apache.org/book.html#store.memstore
http://hbase.apache.org/book.html#regionserver_splitting_implementation
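
As a rough illustration of the memstore part of #2 (a sketch of my own, not taken from the book sections above): each region keeps one memstore per column family, so the number of memstores follows regions times column families rather than the number of tables as such. The table names and region counts below are invented for the example.

```python
def memstore_count(tables):
    """tables maps a table name to (number_of_regions, number_of_column_families).

    One memstore exists per column family per region.
    """
    return sum(regions * families for regions, families in tables.values())

# Hypothetical layouts holding the same data with the same total region count.
one_large_table = {"events": (12, 1)}
three_small_tables = {"events_a": (4, 1), "events_b": (4, 1), "events_c": (4, 1)}

large_total = memstore_count(one_large_table)
small_total = memstore_count(three_small_tables)
```

Both layouts come out at 12 memstores here. In practice, many small tables tend to raise the total region count, since every table has at least one region, and that is where the extra memstores would come from.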

Cheers

On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora <sh...@gmail.com>
wrote:

> 1. Is hbase.client.keyvalue.maxsize the max size of a row or of the key only?
> Is there any limit on key size alone?
> 2. The access pattern is mostly key-based. Are memstores and regions on a
> regionserver allocated per table? If I have multiple tables, will there be
> multiple memstores, instead of the few there would be with one large
> table?

Re: hbase doubts

Posted by Shushant Arora <sh...@gmail.com>.
1. Is hbase.client.keyvalue.maxsize the max size of a row or of the key only?
Is there any limit on key size alone?
2. The access pattern is mostly key-based. Are memstores and regions on a
regionserver allocated per table? If I have multiple tables, will there be
multiple memstores, instead of the few there would be with one large
table?


On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <yu...@gmail.com> wrote:

> For #1, take a look at the following in hbase-default.xml:
>
>     <name>hbase.client.keyvalue.maxsize</name>
>     <value>10485760</value>
>
> For #2, it would be easier to answer if you can outline access patterns in
> your app.
>
> For #3, adjustment according to current region boundaries is done client
> side. Take a look at the javadoc for LoadQueueItem
> in LoadIncrementalHFiles.java
>
> Cheers

Re: hbase doubts

Posted by Ted Yu <yu...@gmail.com>.
For #1, take a look at the following in hbase-default.xml:

    <name>hbase.client.keyvalue.maxsize</name>
    <value>10485760</value>
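
The value is in bytes: 10485760 = 10 * 1024 * 1024, i.e. 10 MB. The property caps the size of a single KeyValue (one cell: its key parts plus its value). Below is a simplified plain-Python model of that check, assuming the usual KeyValue layout and ignoring tags; the function names and the sample cell are made up, and this is not the client's actual code.

```python
MAX_KEYVALUE_SIZE = 10485760  # default quoted above, in bytes

# Assumed fixed overhead of one serialized KeyValue (tags ignored):
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1)
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1

def approx_keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size of one cell, key parts plus value."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)

def fits(row: bytes, family: bytes, qualifier: bytes, value: bytes,
         limit: int = MAX_KEYVALUE_SIZE) -> bool:
    """True if the cell stays within the configured limit."""
    return approx_keyvalue_size(row, family, qualifier, value) <= limit

# Hypothetical cell: composite row key, a short column name, a 200-byte value.
small = approx_keyvalue_size(b"key1#F1", b"cf", b"q", b"x" * 200)
too_big = fits(b"key1#F1", b"cf", b"q", b"x" * MAX_KEYVALUE_SIZE)
```

The small cell comes to 230 bytes in this model, while a value that is itself 10 MB no longer fits once the key parts are added. Separately, the 2-byte row-length field in this layout suggests that a row key on its own cannot exceed 32767 bytes, as far as I know.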

For #2, it would be easier to answer if you can outline access patterns in
your app.

For #3, adjustment according to current region boundaries is done client
side. Take a look at the javadoc for LoadQueueItem
in LoadIncrementalHFiles.java
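
To picture what "adjustment according to current region boundaries" amounts to, here is a small plain-Python sketch. It only models the outcome: the row keys of one HFile are grouped by the region that currently owns them. The real tool works on whole HFiles and, as I understand it, splits a file in two at a region boundary and re-queues the halves; the function name, region start keys and row keys below are hypothetical.

```python
from bisect import bisect_right

def group_by_region(hfile_keys, region_start_keys):
    """Assign each row key of one HFile to the region that owns it.

    region_start_keys must be sorted and begin with "" (the first region's
    empty start key). Region i owns keys in [start_i, start_{i+1}).
    """
    pieces = {}
    for key in hfile_keys:
        region = bisect_right(region_start_keys, key) - 1
        pieces.setdefault(region, []).append(key)
    return pieces

# Hypothetical table with three regions: ["", "g"), ["g", "p"), ["p", end).
region_start_keys = ["", "g", "p"]
hfile_keys = ["a", "c", "h", "k", "q"]

pieces = group_by_region(hfile_keys, region_start_keys)
needs_split = len(pieces) > 1  # the file crosses at least one region boundary
```

An HFile whose keys all land in one region can be handed over as-is; one that spans several regions, as in this example, has to be cut at the boundaries first, and that cutting is the part done on the client.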

Cheers

On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <sh...@gmail.com>
wrote:

> 1. Is there any max limit on the key size of an HBase table?
> 2. Multiple small tables vs. one large table: which one is preferred?
> 3. For bulk load: when LoadIncrementalHFiles is run, it recalculates
> the region splits based on region boundaries. Does this division happen on
> the client side, or on the server side at the region server or HBase master?
> And does it then assign the splits that cross the target region boundary to
> the desired regionserver?
>