Posted to user@hbase.apache.org by James Johansville <ja...@gmail.com> on 2016/03/25 00:45:13 UTC

Inconsistent scan performance

Hello all,

So, I wrote a Java application for HBase that does a partitioned full-table
scan according to a set number of partitions. For example, if there are 20
partitions specified, then 20 separate full scans are launched that cover
an equal slice of the row identifier range.

The rows are uniformly distributed throughout the RegionServers. I
confirmed this through the hbase shell. I have only one column family, and
each row has the same number of column qualifiers.

My problem is that the individual scan performance is wildly inconsistent
even though each scan fetches approximately the same number of rows. This
inconsistency appears to be random with respect to hosts, regionservers,
partitions, and CPU cores. I am the only user of the fleet and am not
running any other concurrent HBase operations.

I started measuring from the beginning of the scan and stopped measuring
after the scan was completed. I am not doing any logic with the results,
just scanning them.

For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
100+ seconds. This seems a little too bouncy for me. Does anyone have any
insight? By comparison, a similar utility I wrote to upsert to
regionservers was very consistent in ops/sec and I had no issues with it.

Using 13 partitions on a machine that has 32 CPU cores and a 16 GB heap, I
see anywhere from 3K ops/sec to 82K ops/sec. Here's an example of the log
output I saved from a run that used 130 partitions.

total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
ops/sec:36358.38150289017
total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
ops/sec:31176.91380349608
total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
ops/sec:30772.08014764039
total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
ops/sec:7051.235410034865
total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
ops/sec:6046.3170939508955
total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
ops/sec:4803.316900101075
total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
ops/sec:4899.286583474505
total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
ops/sec:4886.875901705258
total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
ops/sec:4743.210480206996

I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
insight into how I can make the read performance more consistent?
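
For reference, here is a rough sketch of the kind of per-partition scan I am
describing (HBase 1.x client API; the table name, key encoding, and partition
arithmetic below are placeholders rather than my actual code):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PartitionScanSketch {
  // Scans one equal slice of a numeric row-id range [0, maxId); keys are
  // zero-padded so lexicographic order matches numeric order (placeholder scheme).
  static long scanPartition(Connection conn, int partition, int numPartitions, long maxId)
      throws IOException {
    long slice = maxId / numPartitions;
    long start = partition * slice;
    long stop = (partition == numPartitions - 1) ? maxId : start + slice;

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes(String.format("id:%010d", start)));
    scan.setStopRow(Bytes.toBytes(String.format("id:%010d", stop)));  // stop row is exclusive
    scan.setCacheBlocks(false);  // don't pollute the block cache with this full scan
    scan.setCaching(5000);       // rows fetched per RPC to the regionserver

    long rows = 0;
    try (Table table = conn.getTable(TableName.valueOf("my_table"));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        rows++;  // no per-row logic, just drain the scanner and count
      }
    }
    return rows;
  }
}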

Thanks!

Re: Inconsistent scan performance

Posted by "james.johansville" <ja...@gmail.com>.
Hello all,

Thank you for the help. It was network latency between datacenters that
caused it; once I started forcing each scan to run close to the
regionserver hosting its data, the performance issues disappeared. I am
surprised I did not see nearly as much variance in writes, even though
those crossed datacenters as well.
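
In case it helps anyone hitting the same problem, here is a minimal sketch
(not my exact code) of the kind of region-location lookup that can be used
to decide where each scan should run; mapping a hostname to a datacenter is
left as a site-specific placeholder:

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;

public class ScanPlacementSketch {
  // Hostname of the regionserver holding the region that contains startRow,
  // so the scan for that slice can be launched from (or routed to) a client
  // in the same datacenter as the data.
  static String hostFor(Connection conn, TableName table, byte[] startRow) throws IOException {
    try (RegionLocator locator = conn.getRegionLocator(table)) {
      return locator.getRegionLocation(startRow).getHostname();
    }
  }

  // Placeholder: map a regionserver hostname to a datacenter id (site-specific).
  static int datacenterOf(String hostname) {
    return 0;  // e.g. parse the DC out of the hostname or read a config file
  }
}

With that lookup, each partition's scan can be launched from a client in the
same datacenter as the region that serves it.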


On Sat, Mar 26, 2016 at 11:00 PM, <la...@apache.org> wrote:

> Hi James,
>
> The low numbers are _way_ slow. Way lower than what we typically see.
> I assume the region servers have spinning drives? How many? Can you report
> metrics like IOWAIT? Also, as Nicolas mentioned, can you report on the data
> locality index on each region server?
>
> What's the networking between the machines? I assume 1ge? And all NICs are
> good (none reports a lower link speed?)
>
> Is it always the same servers that are slow, or does it vary?
>
> Lastly, how big are the rows?
>
> Thanks.
>
> -- Lars
>
>       From: James Johansville <ja...@gmail.com>
>  To: user@hbase.apache.org
>  Sent: Friday, March 25, 2016 12:23 PM
>  Subject: Re: Inconsistent scan performance
>
> Hello all,
>
> I have 13 RegionServers and presplit into 13 regions (which motivated my
> comment that I aligned my queries with the regionservers, which obviously
> isn't accurate). I have been testing using a multiple of 13 for partitioned
> scans.
>
> Here are my current region setup -- I converted the row identifier to a
> logical identifier known to my code for easier read access. There are 26
> regions split roughly evenly over a logical range of 0-996.
>
> Here are region boundaries. The splits aren't perfect, so I assume for a 26
> partition scan, each scan is going to read most of one region and a tiny
> bit of another region (maybe three). I have given each server an index and
> marked its datacenter with an index as well -- so the first three regions
> are on server #9, and so on.
>
> 0 region range=[EMPTY]~id:38 server index=9 dc=1
> 1 region range=id:38~id:77 server index=9 dc=1
> 2 region range=id:77~id:115 server index=9 dc=1
> 3 region range=id:115~id:154 server index=10 dc=3
> 4 region range=id:154~id:192 server index=3 dc=2
> 5 region range=id:192~id:231 server index=10 dc=3
> 6 region range=id:231~id:269 server index=10 dc=3
> 7 region range=id:269~id:308 server index=10 dc=3
> 8 region range=id:308~id:346 server index=1 dc=1
> 9 region range=id:346~id:385 server index=4 dc=2
> 10 region range=id:385~id:423 server index=2 dc=2
> 11 region range=id:423~id:462 server index=5 dc=3
> 12 region range=id:462~id:500 server index=11 dc=1
> 13 region range=id:500~id:539 server index=10 dc=3
> 14 region range=id:539~id:577 server index=11 dc=1
> 15 region range=id:577~id:616 server index=11 dc=1
> 16 region range=id:616~id:654 server index=6 dc=2
> 17 region range=id:654~id:693 server index=12 dc=3
> 18 region range=id:693~id:730 server index=8 dc=1
> 19 region range=id:730~id:769 server index=4 dc=2
> 20 region range=id:769~id:806 server index=12 dc=3
> 21 region range=id:806~id:845 server index=11 dc=1
> 22 region range=id:845~id:883 server index=9 dc=1
> 23 region range=id:883~id:921 server index=12 dc=3
> 24 region range=id:921~id:958 server index=0 dc=3
> 25 region range=id:958:[EMPTY] server index=12 dc=3
>
>
> Here are my scan ranges -- the end isn't inclusive. Each scan range
> corresponds to a particular partition number in sequence.
> 0 scan range=id:0~id:39
> 1 scan range=id:39~id:78
> 2 scan range=id:78~id:117
> 3 scan range=id:117~id:156
> 4 scan range=id:156~id:195
> 5 scan range=id:195~id:234
> 6 scan range=id:234~id:273
> 7 scan range=id:273~id:312
> 8 scan range=id:312~id:351
> 9 scan range=id:351~id:389
> 10 scan range=id:389~id:427
> 11 scan range=id:427~id:465
> 12 scan range=id:465~id:503
> 13 scan range=id:503~id:541
> 14 scan range=id:541~id:579
> 15 scan range=id:579~id:617
> 16 scan range=id:617~id:655
> 17 scan range=id:655~id:693
> 18 scan range=id:693~id:731
> 19 scan range=id:731~id:769
> 20 scan range=id:769~id:807
> 21 scan range=id:807~id:845
> 22 scan range=id:845~id:883
> 23 scan range=id:883~id:921
> 24 scan range=id:921~id:959
> 25 scan range=id:959~id:997
>
> I ran the utility twice on 26-range.
> total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:433.525
> ops/sec:4095.191742114065
> total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:427.884
> ops/sec:4147.028166512419
> total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:674.357
> ops/sec:2631.773674774637
> total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:376.375
> ops/sec:4720.573895715709
> total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:47.484
> ops/sec:37431.74543003959
> total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.443
> ops/sec:45113.50556499252
> total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:407.076
> ops/sec:4356.2184948265185
> total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:436.169
> ops/sec:4066.749356327479
> total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:617.88
> ops/sec:2869.99255518871
> total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:27.348
> ops/sec:63163.0832236361
> total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:64.383
> ops/sec:26947.48613764503
> total # partitions:26; partition_number:11; rows:1729728
> elapsed_sec:381.221 ops/sec:4537.336610522505
> total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:59.049
> ops/sec:29348.49023692188
> total # partitions:26; partition_number:13; rows:1731442
> elapsed_sec:369.975 ops/sec:4679.889181701466
> total # partitions:26; partition_number:14; rows:1732026
> elapsed_sec:698.667 ops/sec:2479.0436645784043
> total # partitions:26; partition_number:15; rows:1727737
> elapsed_sec:398.964 ops/sec:4330.558646895459
> total # partitions:26; partition_number:16; rows:1728357
> elapsed_sec:695.636 ops/sec:2484.570953774675
> total # partitions:26; partition_number:17; rows:1733250
> elapsed_sec:392.337 ops/sec:4417.758202769558
> total # partitions:26; partition_number:18; rows:1742667
> elapsed_sec:410.238 ops/sec:4247.94143887207
> total # partitions:26; partition_number:19; rows:1742777
> elapsed_sec:374.509 ops/sec:4653.498313792192
> total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:44.447
> ops/sec:39204.445744369696
> total # partitions:26; partition_number:21; rows:1742248
> elapsed_sec:404.692 ops/sec:4305.120931473812
> total # partitions:26; partition_number:22; rows:1743288
> elapsed_sec:420.886 ops/sec:4141.948175990648
> total # partitions:26; partition_number:23; rows:1741106
> elapsed_sec:374.782 ops/sec:4645.650004535971
> total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.059
> ops/sec:51047.71132446637
> total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.873
> ops/sec:44671.67442698017
>
> total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:687.482
> ops/sec:2582.4210670243006
> total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:687.132
> ops/sec:2582.3961043875124
> total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:57.592
> ops/sec:30815.998749826365
> total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:369.096
> ops/sec:4813.6690725448125
> total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:49.486
> ops/sec:35917.41098492503
> total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.008
> ops/sec:45616.59146841673
> total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:405.897
> ops/sec:4368.871905926873
> total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:368.597
> ops/sec:4812.274652262498
> total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:401.001
> ops/sec:4422.210917179758
> total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:381.702
> ops/sec:4525.477990683832
> total # partitions:26; partition_number:10; rows:1734960
> elapsed_sec:659.132 ops/sec:2632.1890000788917
> total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:65.535
> ops/sec:26393.95742732891
> total # partitions:26; partition_number:12; rows:1732999
> elapsed_sec:367.694 ops/sec:4713.15550430521
> total # partitions:26; partition_number:13; rows:1731442
> elapsed_sec:394.045 ops/sec:4394.020987450672
> total # partitions:26; partition_number:14; rows:1732026
> elapsed_sec:692.832 ops/sec:2499.9220590272967
> total # partitions:26; partition_number:15; rows:1727737
> elapsed_sec:637.833 ops/sec:2708.760757126082
> total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:35.41
> ops/sec:48809.85597288902
> total # partitions:26; partition_number:17; rows:1733250
> elapsed_sec:348.711 ops/sec:4970.448308197906
> total # partitions:26; partition_number:18; rows:1742667
> elapsed_sec:747.852 ops/sec:2330.2297780844337
> total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:21.719
> ops/sec:80242.04613472075
> total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:41.713
> ops/sec:41774.02728166279
> total # partitions:26; partition_number:21; rows:1742248
> elapsed_sec:398.595 ops/sec:4370.97304281288
> total # partitions:26; partition_number:22; rows:1743288
> elapsed_sec:416.296 ops/sec:4187.61650364164
> total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:376.67
> ops/sec:4622.364403854833
> total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.025
> ops/sec:51098.72152828803
> total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.673
> ops/sec:44902.69697204768
>
> I can't make rhyme or reason of it. There seems to be an imperfect
> region-to-region correspondence, but some queries that could cross multiple
> datacenters in theory appear fast. Some scans that are restricted to only
> one regionserver are fast, some are slow. Anyone with more experience who
> might have insight?
>
>
> On Fri, Mar 25, 2016 at 9:20 AM, Stack <st...@duboce.net> wrote:
>
> > On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > James:
> > > Another experiment you can do is to enable region replica -
> HBASE-10070.
> > >
> > > This would bring down the read variance greatly.
> > >
> > >
> > Suggest you NOT do this James.
> >
> > Lets figure your issue as-is rather than compound by adding yet more
> moving
> > parts.
> >
> > St.Ack
> >
> >
> >
> >
> > > > On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nk...@gmail.com>
> > wrote:
> > > >
> > > > The read path is much more complex than the write one, so the
> response
> > > time
> > > > has much more variance.
> > > > The gap is so wide here that I would bet on Ted's or Stack's points,
> > but
> > > > here are a few other sources of variance:
> > > > - hbase cache: as Anoop said, may be the data is already in the hbase
> > > cache
> > > > (setCacheBlocks(false), means "don't add blocks to the cache", not
> > "don't
> > > > use the cache")
> > > > - OS cache: and if the data is not in HBase cache may be it is in the
> > > > operating system cache (for example if you run the test multiple
> times)
> > > > - data locality: if you're lucky the data is local to the region
> > server.
> > > If
> > > > you're not, the reads need an extra network hop.
> > > > - number of stores: more hfiles/stores per region => slower reads.
> > > > - number of versions and so on: sub case of the previous one: if the
> > rows
> > > > have been updated multiple times and the compaction has not run yet,
> > you
> > > > will read much more data.
> > > > - (another subcase): the data has not been flushed yet and is
> available
> > > in
> > > > the memstore => fast read.
> > > >
> > > > None of these points has any importance for the write path.
> > Basically
> > > > the writes variance says nothing about the variance you will get on
> the
> > > > reads.
> > > >
> > > > IIRC, locality and number of stores are visible in HBase UI. Doing a
> > > table
> > > > flush and then running a major compaction generally helps to
> stabilize
> > > > response time when you do a test. But it should not explain the x25
> > > you're
> > > > seeing, there is something else somewhere else. I don't get the
> > > > regionserver boundaries you're mentioning: there is no boundary
> between
> > > > regionservers. A regionserver can host A->D and M->S while another
> > hosts
> > > > D->M and S->Z for example.
> > > >
> > > >> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <an...@gmail.com>
> > > wrote:
> > > >>
> > > >> I see you set cacheBlocks to be false on the Scan.  By any chance on
> > > >> some other RS(s), the data you are looking for is already in cache?
> > > >> (Any previous scan or  by cache on write)  And there are no
> concurrent
> > > >> writes any way right?  This much difference in time !  One
> > > >> possibility is blocks avail or not avail in cache..
> > > >>
> > > >> -Anoop-
> > > >>
> > > >>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> > > >>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> > > >>> james.johansville@gmail.com> wrote:
> > > >>>
> > > >>>> Hello all,
> > > >>>>
> > > >>>> So, I wrote a Java application for HBase that does a partitioned
> > > >> full-table
> > > >>>> scan according to a set number of partitions. For example, if
> there
> > > are
> > > >> 20
> > > >>>> partitions specified, then 20 separate full scans are launched
> that
> > > >> cover
> > > >>>> an equal slice of the row identifier range.
> > > >>>>
> > > >>>> The rows are uniformly distributed throughout the RegionServers.
> > > >>>
> > > >>>
> > > >>> How many RegionServers? How many Regions? Are Regions evenly
> > > distributed
> > > >>> across the servers? If you put all partitions on one machine and
> then
> > > run
> > > >>> your client, do the timings even out?
> > > >>>
> > > >>> The disparity seems really wide.
> > > >>>
> > > >>> St.Ack
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>> I
> > > >>>> confirmed this through the hbase shell. I have only one column
> > family,
> > > >> and
> > > >>>> each row has the same number of column qualifiers.
> > > >>>>
> > > >>>> My problem is that the individual scan performance is wildly
> > > >> inconsistent
> > > >>>> even though they fetch approximately a similar number of rows.
> This
> > > >>>> inconsistency appears to be random with respect to hosts or
> > > >> regionservers
> > > >>>> or partitions or CPU cores. I am the only user of the fleet and
> not
> > > >> running
> > > >>>> any other concurrent HBase operation.
> > > >>>>
> > > >>>> I started measuring from the beginning of the scan and stopped
> > > measuring
> > > >>>> after the scan was completed. I am not doing any logic with the
> > > results,
> > > >>>> just scanning them.
> > > >>>>
> > > >>>> For ~230K rows fetched per scan, I am getting anywhere from 4
> > seconds
> > > to
> > > >>>> 100+ seconds. This seems a little too bouncy for me. Does anyone
> > have
> > > >> any
> > > >>>> insight? By comparison, a similar utility I wrote to upsert to
> > > >>>> regionservers was very consistent in ops/sec and I had no issues
> > with
> > > >> it.
> > > >>>>
> > > >>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB
> > > heap, I
> > > >>>> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example
> of
> > > log
> > > >>>> output I saved that used 130 partitions.
> > > >>>>
> > > >>>> total # partitions:130; partition id:47; rows:232730
> > elapsed_sec:6.401
> > > >>>> ops/sec:36358.38150289017
> > > >>>> total # partitions:130; partition id:100; rows:206890
> > > elapsed_sec:6.636
> > > >>>> ops/sec:31176.91380349608
> > > >>>> total # partitions:130; partition id:63; rows:233437
> > elapsed_sec:7.586
> > > >>>> ops/sec:30772.08014764039
> > > >>>> total # partitions:130; partition id:9; rows:232585
> > elapsed_sec:32.985
> > > >>>> ops/sec:7051.235410034865
> > > >>>> total # partitions:130; partition id:19; rows:234192
> > > elapsed_sec:38.733
> > > >>>> ops/sec:6046.3170939508955
> > > >>>> total # partitions:130; partition id:1; rows:232860
> > elapsed_sec:48.479
> > > >>>> ops/sec:4803.316900101075
> > > >>>> total # partitions:130; partition id:125; rows:205334
> > > elapsed_sec:41.911
> > > >>>> ops/sec:4899.286583474505
> > > >>>> total # partitions:130; partition id:123; rows:206622
> > > elapsed_sec:42.281
> > > >>>> ops/sec:4886.875901705258
> > > >>>> total # partitions:130; partition id:54; rows:232811
> > > elapsed_sec:49.083
> > > >>>> ops/sec:4743.210480206996
> > > >>>>
> > > >>>> I use setCacheBlocks(false), setCaching(5000).  Does anyone have
> any
> > > >>>> insight into how I can make the read performance more consistent?
> > > >>>>
> > > >>>> Thanks!
> > > >>
> > >
> >
>
>
>
>

Re: Inconsistent scan performance

Posted by la...@apache.org.
Hi James,

The low numbers are _way_ slow. Way lower than what we typically see.
I assume the region servers have spinning drives? How many? Can you report metrics like IOWAIT? Also, as Nicolas mentioned, can you report on the data locality index on each region server?

What's the networking between the machines? I assume 1ge? And all NICs are good (none reports a lower link speed?)

Is it always the same servers that are slow, or does it vary?

Lastly, how big are the rows?

Thanks.

-- Lars

      From: James Johansville <ja...@gmail.com>
 To: user@hbase.apache.org 
 Sent: Friday, March 25, 2016 12:23 PM
 Subject: Re: Inconsistent scan performance
   
Hello all,

I have 13 RegionServers and presplit into 13 regions (which motivated my
comment that I aligned my queries with the regionservers, which obviously
isn't accurate). I have been testing using a multiple of 13 for partitioned
scans.

Here is my current region setup -- I converted the row identifiers to
logical identifiers known to my code for easier reading. There are 26
regions split roughly evenly over a logical range of 0-996.

Here are the region boundaries. The splits aren't perfect, so I assume that
for a 26-partition scan, each scan is going to read most of one region and a
tiny bit of another (maybe even a third). I have given each server an index
and marked its datacenter with an index as well -- so the first three
regions are on server #9, and so on.

0 region range=[EMPTY]~id:38 server index=9 dc=1
1 region range=id:38~id:77 server index=9 dc=1
2 region range=id:77~id:115 server index=9 dc=1
3 region range=id:115~id:154 server index=10 dc=3
4 region range=id:154~id:192 server index=3 dc=2
5 region range=id:192~id:231 server index=10 dc=3
6 region range=id:231~id:269 server index=10 dc=3
7 region range=id:269~id:308 server index=10 dc=3
8 region range=id:308~id:346 server index=1 dc=1
9 region range=id:346~id:385 server index=4 dc=2
10 region range=id:385~id:423 server index=2 dc=2
11 region range=id:423~id:462 server index=5 dc=3
12 region range=id:462~id:500 server index=11 dc=1
13 region range=id:500~id:539 server index=10 dc=3
14 region range=id:539~id:577 server index=11 dc=1
15 region range=id:577~id:616 server index=11 dc=1
16 region range=id:616~id:654 server index=6 dc=2
17 region range=id:654~id:693 server index=12 dc=3
18 region range=id:693~id:730 server index=8 dc=1
19 region range=id:730~id:769 server index=4 dc=2
20 region range=id:769~id:806 server index=12 dc=3
21 region range=id:806~id:845 server index=11 dc=1
22 region range=id:845~id:883 server index=9 dc=1
23 region range=id:883~id:921 server index=12 dc=3
24 region range=id:921~id:958 server index=0 dc=3
25 region range=id:958:[EMPTY] server index=12 dc=3


Here are my scan ranges -- the end isn't inclusive. Each scan range
corresponds to a particular partition number in sequence.
0 scan range=id:0~id:39
1 scan range=id:39~id:78
2 scan range=id:78~id:117
3 scan range=id:117~id:156
4 scan range=id:156~id:195
5 scan range=id:195~id:234
6 scan range=id:234~id:273
7 scan range=id:273~id:312
8 scan range=id:312~id:351
9 scan range=id:351~id:389
10 scan range=id:389~id:427
11 scan range=id:427~id:465
12 scan range=id:465~id:503
13 scan range=id:503~id:541
14 scan range=id:541~id:579
15 scan range=id:579~id:617
16 scan range=id:617~id:655
17 scan range=id:655~id:693
18 scan range=id:693~id:731
19 scan range=id:731~id:769
20 scan range=id:769~id:807
21 scan range=id:807~id:845
22 scan range=id:845~id:883
23 scan range=id:883~id:921
24 scan range=id:921~id:959
25 scan range=id:959~id:997

I ran the utility twice on 26-range.
total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:433.525
ops/sec:4095.191742114065
total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:427.884
ops/sec:4147.028166512419
total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:674.357
ops/sec:2631.773674774637
total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:376.375
ops/sec:4720.573895715709
total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:47.484
ops/sec:37431.74543003959
total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.443
ops/sec:45113.50556499252
total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:407.076
ops/sec:4356.2184948265185
total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:436.169
ops/sec:4066.749356327479
total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:617.88
ops/sec:2869.99255518871
total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:27.348
ops/sec:63163.0832236361
total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:64.383
ops/sec:26947.48613764503
total # partitions:26; partition_number:11; rows:1729728
elapsed_sec:381.221 ops/sec:4537.336610522505
total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:59.049
ops/sec:29348.49023692188
total # partitions:26; partition_number:13; rows:1731442
elapsed_sec:369.975 ops/sec:4679.889181701466
total # partitions:26; partition_number:14; rows:1732026
elapsed_sec:698.667 ops/sec:2479.0436645784043
total # partitions:26; partition_number:15; rows:1727737
elapsed_sec:398.964 ops/sec:4330.558646895459
total # partitions:26; partition_number:16; rows:1728357
elapsed_sec:695.636 ops/sec:2484.570953774675
total # partitions:26; partition_number:17; rows:1733250
elapsed_sec:392.337 ops/sec:4417.758202769558
total # partitions:26; partition_number:18; rows:1742667
elapsed_sec:410.238 ops/sec:4247.94143887207
total # partitions:26; partition_number:19; rows:1742777
elapsed_sec:374.509 ops/sec:4653.498313792192
total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:44.447
ops/sec:39204.445744369696
total # partitions:26; partition_number:21; rows:1742248
elapsed_sec:404.692 ops/sec:4305.120931473812
total # partitions:26; partition_number:22; rows:1743288
elapsed_sec:420.886 ops/sec:4141.948175990648
total # partitions:26; partition_number:23; rows:1741106
elapsed_sec:374.782 ops/sec:4645.650004535971
total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.059
ops/sec:51047.71132446637
total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.873
ops/sec:44671.67442698017

total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:687.482
ops/sec:2582.4210670243006
total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:687.132
ops/sec:2582.3961043875124
total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:57.592
ops/sec:30815.998749826365
total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:369.096
ops/sec:4813.6690725448125
total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:49.486
ops/sec:35917.41098492503
total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.008
ops/sec:45616.59146841673
total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:405.897
ops/sec:4368.871905926873
total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:368.597
ops/sec:4812.274652262498
total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:401.001
ops/sec:4422.210917179758
total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:381.702
ops/sec:4525.477990683832
total # partitions:26; partition_number:10; rows:1734960
elapsed_sec:659.132 ops/sec:2632.1890000788917
total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:65.535
ops/sec:26393.95742732891
total # partitions:26; partition_number:12; rows:1732999
elapsed_sec:367.694 ops/sec:4713.15550430521
total # partitions:26; partition_number:13; rows:1731442
elapsed_sec:394.045 ops/sec:4394.020987450672
total # partitions:26; partition_number:14; rows:1732026
elapsed_sec:692.832 ops/sec:2499.9220590272967
total # partitions:26; partition_number:15; rows:1727737
elapsed_sec:637.833 ops/sec:2708.760757126082
total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:35.41
ops/sec:48809.85597288902
total # partitions:26; partition_number:17; rows:1733250
elapsed_sec:348.711 ops/sec:4970.448308197906
total # partitions:26; partition_number:18; rows:1742667
elapsed_sec:747.852 ops/sec:2330.2297780844337
total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:21.719
ops/sec:80242.04613472075
total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:41.713
ops/sec:41774.02728166279
total # partitions:26; partition_number:21; rows:1742248
elapsed_sec:398.595 ops/sec:4370.97304281288
total # partitions:26; partition_number:22; rows:1743288
elapsed_sec:416.296 ops/sec:4187.61650364164
total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:376.67
ops/sec:4622.364403854833
total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.025
ops/sec:51098.72152828803
total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.673
ops/sec:44902.69697204768

I can't find any rhyme or reason to it. There is an imperfect
correspondence between my scan ranges and the regions, but some scans that
could in theory cross multiple datacenters appear fast. Some scans that are
restricted to only one regionserver are fast, and some are slow. Does
anyone with more experience have insight?


On Fri, Mar 25, 2016 at 9:20 AM, Stack <st...@duboce.net> wrote:

> On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > James:
> > Another experiment you can do is to enable region replica - HBASE-10070.
> >
> > This would bring down the read variance greatly.
> >
> >
> Suggest you NOT do this James.
>
> Lets figure your issue as-is rather than compound by adding yet more moving
> parts.
>
> St.Ack
>
>
>
>
> > > On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nk...@gmail.com>
> wrote:
> > >
> > > The read path is much more complex than the write one, so the response
> > time
> > > has much more variance.
> > > The gap is so wide here that I would bet on Ted's or Stack's points,
> but
> > > here are a few other sources of variance:
> > > - hbase cache: as Anoop said, may be the data is already in the hbase
> > cache
> > > (setCacheBlocks(false), means "don't add blocks to the cache", not
> "don't
> > > use the cache")
> > > - OS cache: and if the data is not in HBase cache may be it is in the
> > > operating system cache (for example if you run the test multiple times)
> > > - data locality: if you're lucky the data is local to the region
> server.
> > If
> > > you're not, the reads need an extra network hop.
> > > - number of stores: more hfiles/stores per region => slower reads.
> > > - number of versions and so on: sub case of the previous one: if the
> rows
> > > have been updated multiple times and the compaction has not run yet,
> you
> > > will read much more data.
> > > - (another subcase): the data has not been flushed yet and is available
> > in
> > > the memstore => fast read.
> > >
> > > None of these points has any importance for the write path.
> Basically
> > > the writes variance says nothing about the variance you will get on the
> > > reads.
> > >
> > > IIRC, locality and number of stores are visible in HBase UI. Doing a
> > table
> > > flush and then running a major compaction generally helps to stabilize
> > > response time when you do a test. But it should not explain the x25
> > you're
> > > seeing, there is something else somewhere else. I don't get the
> > > regionserver boundaries you're mentioning: there is no boundary between
> > > regionservers. A regionserver can host A->D and M->S while another
> hosts
> > > D->M and S->Z for example.
> > >
> > >> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <an...@gmail.com>
> > wrote:
> > >>
> > >> I see you set cacheBlocks to be false on the Scan.  By any chance on
> > >> some other RS(s), the data you are looking for is already in cache?
> > >> (Any previous scan or  by cache on write)  And there are no concurrent
> > >> writes any way right?  This much difference in time !  One
> > >> possibility is blocks avail or not avail in cache..
> > >>
> > >> -Anoop-
> > >>
> > >>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> > >>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> > >>> james.johansville@gmail.com> wrote:
> > >>>
> > >>>> Hello all,
> > >>>>
> > >>>> So, I wrote a Java application for HBase that does a partitioned
> > >> full-table
> > >>>> scan according to a set number of partitions. For example, if there
> > are
> > >> 20
> > >>>> partitions specified, then 20 separate full scans are launched that
> > >> cover
> > >>>> an equal slice of the row identifier range.
> > >>>>
> > >>>> The rows are uniformly distributed throughout the RegionServers.
> > >>>
> > >>>
> > >>> How many RegionServers? How many Regions? Are Regions evenly
> > distributed
> > >>> across the servers? If you put all partitions on one machine and then
> > run
> > >>> your client, do the timings even out?
> > >>>
> > >>> The disparity seems really wide.
> > >>>
> > >>> St.Ack
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>> I
> > >>>> confirmed this through the hbase shell. I have only one column
> family,
> > >> and
> > >>>> each row has the same number of column qualifiers.
> > >>>>
> > >>>> My problem is that the individual scan performance is wildly
> > >> inconsistent
> > >>>> even though they fetch approximately a similar number of rows. This
> > >>>> inconsistency appears to be random with respect to hosts or
> > >> regionservers
> > >>>> or partitions or CPU cores. I am the only user of the fleet and not
> > >> running
> > >>>> any other concurrent HBase operation.
> > >>>>
> > >>>> I started measuring from the beginning of the scan and stopped
> > measuring
> > >>>> after the scan was completed. I am not doing any logic with the
> > results,
> > >>>> just scanning them.
> > >>>>
> > >>>> For ~230K rows fetched per scan, I am getting anywhere from 4
> seconds
> > to
> > >>>> 100+ seconds. This seems a little too bouncy for me. Does anyone
> have
> > >> any
> > >>>> insight? By comparison, a similar utility I wrote to upsert to
> > >>>> regionservers was very consistent in ops/sec and I had no issues
> with
> > >> it.
> > >>>>
> > >>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB
> > heap, I
> > >>>> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of
> > log
> > >>>> output I saved that used 130 partitions.
> > >>>>
> > >>>> total # partitions:130; partition id:47; rows:232730
> elapsed_sec:6.401
> > >>>> ops/sec:36358.38150289017
> > >>>> total # partitions:130; partition id:100; rows:206890
> > elapsed_sec:6.636
> > >>>> ops/sec:31176.91380349608
> > >>>> total # partitions:130; partition id:63; rows:233437
> elapsed_sec:7.586
> > >>>> ops/sec:30772.08014764039
> > >>>> total # partitions:130; partition id:9; rows:232585
> elapsed_sec:32.985
> > >>>> ops/sec:7051.235410034865
> > >>>> total # partitions:130; partition id:19; rows:234192
> > elapsed_sec:38.733
> > >>>> ops/sec:6046.3170939508955
> > >>>> total # partitions:130; partition id:1; rows:232860
> elapsed_sec:48.479
> > >>>> ops/sec:4803.316900101075
> > >>>> total # partitions:130; partition id:125; rows:205334
> > elapsed_sec:41.911
> > >>>> ops/sec:4899.286583474505
> > >>>> total # partitions:130; partition id:123; rows:206622
> > elapsed_sec:42.281
> > >>>> ops/sec:4886.875901705258
> > >>>> total # partitions:130; partition id:54; rows:232811
> > elapsed_sec:49.083
> > >>>> ops/sec:4743.210480206996
> > >>>>
> > >>>> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> > >>>> insight into how I can make the read performance more consistent?
> > >>>>
> > >>>> Thanks!
> > >>
> >
>


  

Re: Inconsistent scan performance

Posted by Stack <st...@duboce.net>.
On Fri, Mar 25, 2016 at 12:23 PM, James Johansville <
james.johansville@gmail.com> wrote:

> Hello all,
>
> I have 13 RegionServers and presplit into 13 regions (which motivated my
> comment that I aligned my queries with the regionservers, which obviously
> isn't accurate). I have been testing using a multiple of 13 for partitioned
> scans.
>
>
The 13 regions are distributed one per server (pardon the dumb question)?



> Here are my current region setup -- I converted the row identifier to a
> logical identifier known to my code for easier read access. There are 26
> regions split roughly evenly over a logical range of 0-996.
>
>
Any reason not to align on region boundaries --
Admin#getTableRegions(TableName)?
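
i.e., roughly something like the below (an untested sketch against the 1.x
API; scan settings copied from your mail), so each partition is exactly one
region:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Scan;

public class RegionAlignedScans {
  // One Scan per region, with boundaries taken straight from the cluster.
  static List<Scan> scanPerRegion(Admin admin, TableName tableName) throws IOException {
    List<Scan> scans = new ArrayList<>();
    for (HRegionInfo region : admin.getTableRegions(tableName)) {
      Scan scan = new Scan();
      scan.setStartRow(region.getStartKey());  // empty byte[] means open-ended
      scan.setStopRow(region.getEndKey());
      scan.setCacheBlocks(false);
      scan.setCaching(5000);
      scans.add(scan);
    }
    return scans;
  }
}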



> Here are region boundaries. The splits aren't perfect, so I assume for a 26
> partition scan, each scan is going to read most of one region and a tiny
> bit of another region (maybe three). I have given each server an index and
> marked its datacenter with an index as well -- so the first three regions
> are on server #9, and so on.
>
>
Some servers are in a different (physical) DataCenter?



> 0 region range=[EMPTY]~id:38 server index=9 dc=1
> 1 region range=id:38~id:77 server index=9 dc=1
> 2 region range=id:77~id:115 server index=9 dc=1
> 3 region range=id:115~id:154 server index=10 dc=3
> 4 region range=id:154~id:192 server index=3 dc=2
> 5 region range=id:192~id:231 server index=10 dc=3
> 6 region range=id:231~id:269 server index=10 dc=3
> 7 region range=id:269~id:308 server index=10 dc=3
> 8 region range=id:308~id:346 server index=1 dc=1
> 9 region range=id:346~id:385 server index=4 dc=2
> 10 region range=id:385~id:423 server index=2 dc=2
> 11 region range=id:423~id:462 server index=5 dc=3
> 12 region range=id:462~id:500 server index=11 dc=1
> 13 region range=id:500~id:539 server index=10 dc=3
> 14 region range=id:539~id:577 server index=11 dc=1
> 15 region range=id:577~id:616 server index=11 dc=1
> 16 region range=id:616~id:654 server index=6 dc=2
> 17 region range=id:654~id:693 server index=12 dc=3
> 18 region range=id:693~id:730 server index=8 dc=1
> 19 region range=id:730~id:769 server index=4 dc=2
> 20 region range=id:769~id:806 server index=12 dc=3
> 21 region range=id:806~id:845 server index=11 dc=1
> 22 region range=id:845~id:883 server index=9 dc=1
> 23 region range=id:883~id:921 server index=12 dc=3
> 24 region range=id:921~id:958 server index=0 dc=3
> 25 region range=id:958:[EMPTY] server index=12 dc=3
>
>
> Here are my scan ranges -- the end isn't inclusive. Each scan range
> corresponds to a particular partition number in sequence.
> 0 scan range=id:0~id:39
> 1 scan range=id:39~id:78
> 2 scan range=id:78~id:117
> 3 scan range=id:117~id:156
> 4 scan range=id:156~id:195
> 5 scan range=id:195~id:234
> 6 scan range=id:234~id:273
> 7 scan range=id:273~id:312
> 8 scan range=id:312~id:351
> 9 scan range=id:351~id:389
> 10 scan range=id:389~id:427
> 11 scan range=id:427~id:465
> 12 scan range=id:465~id:503
> 13 scan range=id:503~id:541
> 14 scan range=id:541~id:579
> 15 scan range=id:579~id:617
> 16 scan range=id:617~id:655
> 17 scan range=id:655~id:693
> 18 scan range=id:693~id:731
> 19 scan range=id:731~id:769
> 20 scan range=id:769~id:807
> 21 scan range=id:807~id:845
> 22 scan range=id:845~id:883
> 23 scan range=id:883~id:921
> 24 scan range=id:921~id:959
> 25 scan range=id:959~id:997
>
> I ran the utility twice on 26-range.
> total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:433.525
> ops/sec:4095.191742114065
> total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:427.884
> ops/sec:4147.028166512419
> total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:674.357
> ops/sec:2631.773674774637
> total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:376.375
> ops/sec:4720.573895715709
> total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:47.484
> ops/sec:37431.74543003959
> total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.443
> ops/sec:45113.50556499252
> total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:407.076
> ops/sec:4356.2184948265185
> total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:436.169
> ops/sec:4066.749356327479
> total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:617.88
> ops/sec:2869.99255518871
> total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:27.348
> ops/sec:63163.0832236361
> total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:64.383
> ops/sec:26947.48613764503
> total # partitions:26; partition_number:11; rows:1729728
> elapsed_sec:381.221 ops/sec:4537.336610522505
> total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:59.049
> ops/sec:29348.49023692188
> total # partitions:26; partition_number:13; rows:1731442
> elapsed_sec:369.975 ops/sec:4679.889181701466
> total # partitions:26; partition_number:14; rows:1732026
> elapsed_sec:698.667 ops/sec:2479.0436645784043
> total # partitions:26; partition_number:15; rows:1727737
> elapsed_sec:398.964 ops/sec:4330.558646895459
> total # partitions:26; partition_number:16; rows:1728357
> elapsed_sec:695.636 ops/sec:2484.570953774675
> total # partitions:26; partition_number:17; rows:1733250
> elapsed_sec:392.337 ops/sec:4417.758202769558
> total # partitions:26; partition_number:18; rows:1742667
> elapsed_sec:410.238 ops/sec:4247.94143887207
> total # partitions:26; partition_number:19; rows:1742777
> elapsed_sec:374.509 ops/sec:4653.498313792192
> total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:44.447
> ops/sec:39204.445744369696
> total # partitions:26; partition_number:21; rows:1742248
> elapsed_sec:404.692 ops/sec:4305.120931473812
> total # partitions:26; partition_number:22; rows:1743288
> elapsed_sec:420.886 ops/sec:4141.948175990648
> total # partitions:26; partition_number:23; rows:1741106
> elapsed_sec:374.782 ops/sec:4645.650004535971
> total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.059
> ops/sec:51047.71132446637
> total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.873
> ops/sec:44671.67442698017
>
> total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:687.482
> ops/sec:2582.4210670243006
> total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:687.132
> ops/sec:2582.3961043875124
> total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:57.592
> ops/sec:30815.998749826365
> total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:369.096
> ops/sec:4813.6690725448125
> total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:49.486
> ops/sec:35917.41098492503
> total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.008
> ops/sec:45616.59146841673
> total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:405.897
> ops/sec:4368.871905926873
> total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:368.597
> ops/sec:4812.274652262498
> total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:401.001
> ops/sec:4422.210917179758
> total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:381.702
> ops/sec:4525.477990683832
> total # partitions:26; partition_number:10; rows:1734960
> elapsed_sec:659.132 ops/sec:2632.1890000788917
> total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:65.535
> ops/sec:26393.95742732891
> total # partitions:26; partition_number:12; rows:1732999
> elapsed_sec:367.694 ops/sec:4713.15550430521
> total # partitions:26; partition_number:13; rows:1731442
> elapsed_sec:394.045 ops/sec:4394.020987450672
> total # partitions:26; partition_number:14; rows:1732026
> elapsed_sec:692.832 ops/sec:2499.9220590272967
> total # partitions:26; partition_number:15; rows:1727737
> elapsed_sec:637.833 ops/sec:2708.760757126082
> total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:35.41
> ops/sec:48809.85597288902
> total # partitions:26; partition_number:17; rows:1733250
> elapsed_sec:348.711 ops/sec:4970.448308197906
> total # partitions:26; partition_number:18; rows:1742667
> elapsed_sec:747.852 ops/sec:2330.2297780844337
> total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:21.719
> ops/sec:80242.04613472075
> total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:41.713
> ops/sec:41774.02728166279
> total # partitions:26; partition_number:21; rows:1742248
> elapsed_sec:398.595 ops/sec:4370.97304281288
> total # partitions:26; partition_number:22; rows:1743288
> elapsed_sec:416.296 ops/sec:4187.61650364164
> total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:376.67
> ops/sec:4622.364403854833
> total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.025
> ops/sec:51098.72152828803
> total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.673
> ops/sec:44902.69697204768
>
> I can't make rhyme or reason of it. There seems to be an imperfect
> region-to-region correspondence, but some queries that could cross multiple
> datacenters in theory appear fast. Some scans that are restricted to only
> one regionserver are fast, some are slow. Anyone with more experience who
> might have insight?
>
>
Try with one regionserver only; stop all servers, then restart just one
master and one regionserver. See if you have variance then?

St.Ack



>
> On Fri, Mar 25, 2016 at 9:20 AM, Stack <st...@duboce.net> wrote:
>
> > On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > James:
> > > Another experiment you can do is to enable region replica -
> HBASE-10070.
> > >
> > > This would bring down the read variance greatly.
> > >
> > >
> > Suggest you NOT do this James.
> >
> > Lets figure your issue as-is rather than compound by adding yet more
> moving
> > parts.
> >
> > St.Ack
> >
> >
> >
> >
> > > > On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nk...@gmail.com>
> > wrote:
> > > >
> > > > The read path is much more complex than the write one, so the
> response
> > > time
> > > > has much more variance.
> > > > The gap is so wide here that I would bet on Ted's or Stack's points,
> > but
> > > > here are a few other sources of variance:
> > > > - hbase cache: as Anoop said, may be the data is already in the hbase
> > > cache
> > > > (setCacheBlocks(false), means "don't add blocks to the cache", not
> > "don't
> > > > use the cache")
> > > > - OS cache: and if the data is not in HBase cache may be it is in the
> > > > operating system cache (for example if you run the test multiple
> times)
> > > > - data locality: if you're lucky the data is local to the region
> > server.
> > > If
> > > > you're not, the reads need an extra network hop.
> > > > - number of stores: more hfiles/stores per region => slower reads.
> > > > - number of versions and so on: sub case of the previous one: if the
> > rows
> > > > have been updated multiple times and the compaction has not run yet,
> > you
> > > > will read much more data.
> > > > - (another subcase): the data has not been flushed yet and is
> available
> > > in
> > > > the memstore => fast read.
> > > >
> > > > None of these points has any importance for the write path.
> > Basically
> > > > the writes variance says nothing about the variance you will get on
> the
> > > > reads.
> > > >
> > > > IIRC, locality and number of stores are visible in HBase UI. Doing a
> > > table
> > > > flush and then running a major compaction generally helps to
> stabilize
> > > > response time when you do a test. But it should not explain the x25
> > > you're
> > > > seeing, there is something else somewhere else. I don't get the
> > > > regionserver boundaries you're mentioning: there is no boundary
> between
> > > > regionservers. A regionserver can host A->D and M->S while another
> > hosts
> > > > D->M and S->Z for example.
> > > >
> > > >> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <an...@gmail.com>
> > > wrote:
> > > >>
> > > >> I see you set cacheBlocks to be false on the Scan.   By any chance
> on
> > > >> some other RS(s), the data you are looking for is already in cache?
> > > >> (Any previous scan or  by cache on write)  And there are no
> concurrent
> > > >> writes any way right?   This much difference in time !  One
> > > >> possibility is blocks avail or not avail in cache..
> > > >>
> > > >> -Anoop-
> > > >>
> > > >>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> > > >>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> > > >>> james.johansville@gmail.com> wrote:
> > > >>>
> > > >>>> Hello all,
> > > >>>>
> > > >>>> So, I wrote a Java application for HBase that does a partitioned
> > > >> full-table
> > > >>>> scan according to a set number of partitions. For example, if
> there
> > > are
> > > >> 20
> > > >>>> partitions specified, then 20 separate full scans are launched
> that
> > > >> cover
> > > >>>> an equal slice of the row identifier range.
> > > >>>>
> > > >>>> The rows are uniformly distributed throughout the RegionServers.
> > > >>>
> > > >>>
> > > >>> How many RegionServers? How many Regions? Are Regions evenly
> > > distributed
> > > >>> across the servers? If you put all partitions on one machine and
> then
> > > run
> > > >>> your client, do the timings even out?
> > > >>>
> > > >>> The disparity seems really wide.
> > > >>>
> > > >>> St.Ack
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>> I
> > > >>>> confirmed this through the hbase shell. I have only one column
> > family,
> > > >> and
> > > >>>> each row has the same number of column qualifiers.
> > > >>>>
> > > >>>> My problem is that the individual scan performance is wildly
> > > >> inconsistent
> > > >>>> even though they fetch approximately a similar number of rows.
> This
> > > >>>> inconsistency appears to be random with respect to hosts or
> > > >> regionservers
> > > >>>> or partitions or CPU cores. I am the only user of the fleet and
> not
> > > >> running
> > > >>>> any other concurrent HBase operation.
> > > >>>>
> > > >>>> I started measuring from the beginning of the scan and stopped
> > > measuring
> > > >>>> after the scan was completed. I am not doing any logic with the
> > > results,
> > > >>>> just scanning them.
> > > >>>>
> > > >>>> For ~230K rows fetched per scan, I am getting anywhere from 4
> > seconds
> > > to
> > > >>>> 100+ seconds. This seems a little too bouncy for me. Does anyone
> > have
> > > >> any
> > > >>>> insight? By comparison, a similar utility I wrote to upsert to
> > > >>>> regionservers was very consistent in ops/sec and I had no issues
> > with
> > > >> it.
> > > >>>>
> > > >>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB
> > > heap, I
> > > >>>> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example
> of
> > > log
> > > >>>> output I saved that used 130 partitions.
> > > >>>>
> > > >>>> total # partitions:130; partition id:47; rows:232730
> > elapsed_sec:6.401
> > > >>>> ops/sec:36358.38150289017
> > > >>>> total # partitions:130; partition id:100; rows:206890
> > > elapsed_sec:6.636
> > > >>>> ops/sec:31176.91380349608
> > > >>>> total # partitions:130; partition id:63; rows:233437
> > elapsed_sec:7.586
> > > >>>> ops/sec:30772.08014764039
> > > >>>> total # partitions:130; partition id:9; rows:232585
> > elapsed_sec:32.985
> > > >>>> ops/sec:7051.235410034865
> > > >>>> total # partitions:130; partition id:19; rows:234192
> > > elapsed_sec:38.733
> > > >>>> ops/sec:6046.3170939508955
> > > >>>> total # partitions:130; partition id:1; rows:232860
> > elapsed_sec:48.479
> > > >>>> ops/sec:4803.316900101075
> > > >>>> total # partitions:130; partition id:125; rows:205334
> > > elapsed_sec:41.911
> > > >>>> ops/sec:4899.286583474505
> > > >>>> total # partitions:130; partition id:123; rows:206622
> > > elapsed_sec:42.281
> > > >>>> ops/sec:4886.875901705258
> > > >>>> total # partitions:130; partition id:54; rows:232811
> > > elapsed_sec:49.083
> > > >>>> ops/sec:4743.210480206996
> > > >>>>
> > > >>>> I use setCacheBlocks(false), setCaching(5000).  Does anyone have
> any
> > > >>>> insight into how I can make the read performance more consistent?
> > > >>>>
> > > >>>> Thanks!
> > > >>
> > >
> >
>

Re: Inconsistent scan performance

Posted by James Johansville <ja...@gmail.com>.
Hello all,

I have 13 RegionServers and presplit into 13 regions (which motivated my
comment that I aligned my queries with the regionservers, which obviously
isn't accurate). I have been testing using a multiple of 13 for partitioned
scans.

Here is my current region setup -- I converted the row identifiers to
logical identifiers known to my code for easier reading. There are 26
regions split roughly evenly over a logical range of 0-996.

Here are the region boundaries. The splits aren't perfect, so I assume that
for a 26-partition scan, each scan is going to read most of one region and a
tiny bit of another (maybe even a third). I have given each server an index
and marked its datacenter with an index as well -- so the first three
regions are on server #9, and so on.

0 region range=[EMPTY]~id:38 server index=9 dc=1
1 region range=id:38~id:77 server index=9 dc=1
2 region range=id:77~id:115 server index=9 dc=1
3 region range=id:115~id:154 server index=10 dc=3
4 region range=id:154~id:192 server index=3 dc=2
5 region range=id:192~id:231 server index=10 dc=3
6 region range=id:231~id:269 server index=10 dc=3
7 region range=id:269~id:308 server index=10 dc=3
8 region range=id:308~id:346 server index=1 dc=1
9 region range=id:346~id:385 server index=4 dc=2
10 region range=id:385~id:423 server index=2 dc=2
11 region range=id:423~id:462 server index=5 dc=3
12 region range=id:462~id:500 server index=11 dc=1
13 region range=id:500~id:539 server index=10 dc=3
14 region range=id:539~id:577 server index=11 dc=1
15 region range=id:577~id:616 server index=11 dc=1
16 region range=id:616~id:654 server index=6 dc=2
17 region range=id:654~id:693 server index=12 dc=3
18 region range=id:693~id:730 server index=8 dc=1
19 region range=id:730~id:769 server index=4 dc=2
20 region range=id:769~id:806 server index=12 dc=3
21 region range=id:806~id:845 server index=11 dc=1
22 region range=id:845~id:883 server index=9 dc=1
23 region range=id:883~id:921 server index=12 dc=3
24 region range=id:921~id:958 server index=0 dc=3
25 region range=id:958:[EMPTY] server index=12 dc=3
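
For completeness, a minimal sketch (not my actual code) of how a
region-to-server listing like the one above can be pulled programmatically
with the 1.x RegionLocator API; the datacenter tag would come from a
site-specific lookup on the hostname:

import java.io.IOException;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionListingSketch {
  // Prints each region's key range and the regionserver currently hosting it.
  static void printRegionLocations(Connection conn, TableName table) throws IOException {
    try (RegionLocator locator = conn.getRegionLocator(table)) {
      int i = 0;
      for (HRegionLocation loc : locator.getAllRegionLocations()) {
        System.out.printf("%d region range=%s~%s server=%s:%d%n",
            i++,
            Bytes.toStringBinary(loc.getRegionInfo().getStartKey()),  // empty key == [EMPTY]
            Bytes.toStringBinary(loc.getRegionInfo().getEndKey()),
            loc.getHostname(),
            loc.getPort());
      }
    }
  }
}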


Here are my scan ranges -- the end isn't inclusive. Each scan range
corresponds to a particular partition number in sequence.
0 scan range=id:0~id:39
1 scan range=id:39~id:78
2 scan range=id:78~id:117
3 scan range=id:117~id:156
4 scan range=id:156~id:195
5 scan range=id:195~id:234
6 scan range=id:234~id:273
7 scan range=id:273~id:312
8 scan range=id:312~id:351
9 scan range=id:351~id:389
10 scan range=id:389~id:427
11 scan range=id:427~id:465
12 scan range=id:465~id:503
13 scan range=id:503~id:541
14 scan range=id:541~id:579
15 scan range=id:579~id:617
16 scan range=id:617~id:655
17 scan range=id:655~id:693
18 scan range=id:693~id:731
19 scan range=id:731~id:769
20 scan range=id:769~id:807
21 scan range=id:807~id:845
22 scan range=id:845~id:883
23 scan range=id:883~id:921
24 scan range=id:921~id:959
25 scan range=id:959~id:997

I ran the utility twice on 26-range.
total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:433.525
ops/sec:4095.191742114065
total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:427.884
ops/sec:4147.028166512419
total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:674.357
ops/sec:2631.773674774637
total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:376.375
ops/sec:4720.573895715709
total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:47.484
ops/sec:37431.74543003959
total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.443
ops/sec:45113.50556499252
total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:407.076
ops/sec:4356.2184948265185
total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:436.169
ops/sec:4066.749356327479
total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:617.88
ops/sec:2869.99255518871
total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:27.348
ops/sec:63163.0832236361
total # partitions:26; partition_number:10; rows:1734960 elapsed_sec:64.383
ops/sec:26947.48613764503
total # partitions:26; partition_number:11; rows:1729728
elapsed_sec:381.221 ops/sec:4537.336610522505
total # partitions:26; partition_number:12; rows:1732999 elapsed_sec:59.049
ops/sec:29348.49023692188
total # partitions:26; partition_number:13; rows:1731442
elapsed_sec:369.975 ops/sec:4679.889181701466
total # partitions:26; partition_number:14; rows:1732026
elapsed_sec:698.667 ops/sec:2479.0436645784043
total # partitions:26; partition_number:15; rows:1727737
elapsed_sec:398.964 ops/sec:4330.558646895459
total # partitions:26; partition_number:16; rows:1728357
elapsed_sec:695.636 ops/sec:2484.570953774675
total # partitions:26; partition_number:17; rows:1733250
elapsed_sec:392.337 ops/sec:4417.758202769558
total # partitions:26; partition_number:18; rows:1742667
elapsed_sec:410.238 ops/sec:4247.94143887207
total # partitions:26; partition_number:19; rows:1742777
elapsed_sec:374.509 ops/sec:4653.498313792192
total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:44.447
ops/sec:39204.445744369696
total # partitions:26; partition_number:21; rows:1742248
elapsed_sec:404.692 ops/sec:4305.120931473812
total # partitions:26; partition_number:22; rows:1743288
elapsed_sec:420.886 ops/sec:4141.948175990648
total # partitions:26; partition_number:23; rows:1741106
elapsed_sec:374.782 ops/sec:4645.650004535971
total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.059
ops/sec:51047.71132446637
total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.873
ops/sec:44671.67442698017

total # partitions:26; partition_number:0; rows:1775368 elapsed_sec:687.482
ops/sec:2582.4210670243006
total # partitions:26; partition_number:1; rows:1774447 elapsed_sec:687.132
ops/sec:2582.3961043875124
total # partitions:26; partition_number:2; rows:1774755 elapsed_sec:57.592
ops/sec:30815.998749826365
total # partitions:26; partition_number:3; rows:1776706 elapsed_sec:369.096
ops/sec:4813.6690725448125
total # partitions:26; partition_number:4; rows:1777409 elapsed_sec:49.486
ops/sec:35917.41098492503
total # partitions:26; partition_number:5; rows:1779412 elapsed_sec:39.008
ops/sec:45616.59146841673
total # partitions:26; partition_number:6; rows:1773312 elapsed_sec:405.897
ops/sec:4368.871905926873
total # partitions:26; partition_number:7; rows:1773790 elapsed_sec:368.597
ops/sec:4812.274652262498
total # partitions:26; partition_number:8; rows:1773311 elapsed_sec:401.001
ops/sec:4422.210917179758
total # partitions:26; partition_number:9; rows:1727384 elapsed_sec:381.702
ops/sec:4525.477990683832
total # partitions:26; partition_number:10; rows:1734960
elapsed_sec:659.132 ops/sec:2632.1890000788917
total # partitions:26; partition_number:11; rows:1729728 elapsed_sec:65.535
ops/sec:26393.95742732891
total # partitions:26; partition_number:12; rows:1732999
elapsed_sec:367.694 ops/sec:4713.15550430521
total # partitions:26; partition_number:13; rows:1731442
elapsed_sec:394.045 ops/sec:4394.020987450672
total # partitions:26; partition_number:14; rows:1732026
elapsed_sec:692.832 ops/sec:2499.9220590272967
total # partitions:26; partition_number:15; rows:1727737
elapsed_sec:637.833 ops/sec:2708.760757126082
total # partitions:26; partition_number:16; rows:1728357 elapsed_sec:35.41
ops/sec:48809.85597288902
total # partitions:26; partition_number:17; rows:1733250
elapsed_sec:348.711 ops/sec:4970.448308197906
total # partitions:26; partition_number:18; rows:1742667
elapsed_sec:747.852 ops/sec:2330.2297780844337
total # partitions:26; partition_number:19; rows:1742777 elapsed_sec:21.719
ops/sec:80242.04613472075
total # partitions:26; partition_number:20; rows:1742520 elapsed_sec:41.713
ops/sec:41774.02728166279
total # partitions:26; partition_number:21; rows:1742248
elapsed_sec:398.595 ops/sec:4370.97304281288
total # partitions:26; partition_number:22; rows:1743288
elapsed_sec:416.296 ops/sec:4187.61650364164
total # partitions:26; partition_number:23; rows:1741106 elapsed_sec:376.67
ops/sec:4622.364403854833
total # partitions:26; partition_number:24; rows:1738634 elapsed_sec:34.025
ops/sec:51098.72152828803
total # partitions:26; partition_number:25; rows:1736522 elapsed_sec:38.673
ops/sec:44902.69697204768

I can't make heads or tails of it. The scan ranges only roughly line up with
the regions, yet some scans that could in theory cross multiple datacenters
come back fast, while some scans confined to a single regionserver are fast
and others are slow. Does anyone with more experience have insight? (The
sketch below shows roughly how I map each range to its hosting regionserver.)
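
Roughly how the range-to-server mapping above was produced (untested sketch;
"my_table" and the key list are placeholders):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeToServerSketch {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) { // placeholder
      String[] startKeys = {"id:0", "id:39", "id:78"}; // one entry per partition in the real run
      for (String key : startKeys) {
        // Look up the region (and its current regionserver) that holds this start key.
        HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes(key));
        System.out.println(key + " -> host=" + loc.getHostname()
            + " region=" + loc.getRegionInfo().getRegionNameAsString());
      }
    }
  }
}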


On Fri, Mar 25, 2016 at 9:20 AM, Stack <st...@duboce.net> wrote:

> On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > James:
> > Another experiment you can do is to enable region replica - HBASE-10070.
> >
> > This would bring down the read variance greatly.
> >
> >
> Suggest you NOT do this James.
>
> Lets figure your issue as-is rather than compound by adding yet more moving
> parts.
>
> St.Ack
>
>
>
>
> > > On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nk...@gmail.com>
> wrote:
> > >
> > > The read path is much more complex than the write one, so the response
> > time
> > > has much more variance.
> > > The gap is so wide here that I would bet on Ted's or Stack's points,
> but
> > > here are a few other sources of variance:
> > > - hbase cache: as Anoop said, may be the data is already in the hbase
> > cache
> > > (setCacheBlocks(false), means "don't add blocks to the cache", not
> "don't
> > > use the cache")
> > > - OS cache: and if the data is not in HBase cache may be it is in the
> > > operating system cache (for example if you run the test multiple times)
> > > - data locality: if you're lucky the data is local to the region
> server.
> > If
> > > you're not, the reads need an extra network hoop.
> > > - number of store: more hfiles/stores per region => slower reads.
> > > - number of versions and so on: sub case of the previous one: if the
> rows
> > > have been updated multiple times and the compaction has not ran yet,
> you
> > > will read much more data.
> > > - (another subcase): the data has not been flushed yet and is available
> > in
> > > the memstore => fast read.
> > >
> > > None of these points has any importance for the the write path.
> Basically
> > > the writes variance says nothing about the variance you will get on the
> > > reads.
> > >
> > > IIRC, locality and number of stores are visible in HBase UI. Doing a
> > table
> > > flush and then running a major compaction generally helps to stabilize
> > > response time when you do a test. But it should not explain the x25
> > you're
> > > seeing, there is something else somewhere else. I don't get the
> > > regionserver boundaries you're mentioning: there is no boundary between
> > > regionservers. A regionserver can host A->D and M->S while another
> hosts
> > > D->M and S->Z for example.
> > >
> > >> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <an...@gmail.com>
> > wrote:
> > >>
> > >> I see you set cacheBlocks to be false on the Scan.   By any chance on
> > >> some other RS(s), the data you are looking for is already in cache?
> > >> (Any previous scan or  by cache on write)  And there are no concurrent
> > >> writes any way right?   This much difference in time !  One
> > >> possibility is blocks avail or not avail in cache..
> > >>
> > >> -Anoop-
> > >>
> > >>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> > >>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> > >>> james.johansville@gmail.com> wrote:
> > >>>
> > >>>> Hello all,
> > >>>>
> > >>>> So, I wrote a Java application for HBase that does a partitioned
> > >> full-table
> > >>>> scan according to a set number of partitions. For example, if there
> > are
> > >> 20
> > >>>> partitions specified, then 20 separate full scans are launched that
> > >> cover
> > >>>> an equal slice of the row identifier range.
> > >>>>
> > >>>> The rows are uniformly distributed throughout the RegionServers.
> > >>>
> > >>>
> > >>> How many RegionServers? How many Regions? Are Regions evenly
> > distributed
> > >>> across the servers? If you put all partitions on one machine and then
> > run
> > >>> your client, do the timings even out?
> > >>>
> > >>> The disparity seems really wide.
> > >>>
> > >>> St.Ack
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>> I
> > >>>> confirmed this through the hbase shell. I have only one column
> family,
> > >> and
> > >>>> each row has the same number of column qualifiers.
> > >>>>
> > >>>> My problem is that the individual scan performance is wildly
> > >> inconsistent
> > >>>> even though they fetch approximately a similar number of rows. This
> > >>>> inconsistency appears to be random with respect to hosts or
> > >> regionservers
> > >>>> or partitions or CPU cores. I am the only user of the fleet and not
> > >> running
> > >>>> any other concurrent HBase operation.
> > >>>>
> > >>>> I started measuring from the beginning of the scan and stopped
> > measuring
> > >>>> after the scan was completed. I am not doing any logic with the
> > results,
> > >>>> just scanning them.
> > >>>>
> > >>>> For ~230K rows fetched per scan, I am getting anywhere from 4
> seconds
> > to
> > >>>> 100+ seconds. This seems a little too bouncy for me. Does anyone
> have
> > >> any
> > >>>> insight? By comparison, a similar utility I wrote to upsert to
> > >>>> regionservers was very consistent in ops/sec and I had no issues
> with
> > >> it.
> > >>>>
> > >>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB
> > heap, I
> > >>>> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of
> > log
> > >>>> output I saved that used 130 partitions.
> > >>>>
> > >>>> total # partitions:130; partition id:47; rows:232730
> elapsed_sec:6.401
> > >>>> ops/sec:36358.38150289017
> > >>>> total # partitions:130; partition id:100; rows:206890
> > elapsed_sec:6.636
> > >>>> ops/sec:31176.91380349608
> > >>>> total # partitions:130; partition id:63; rows:233437
> elapsed_sec:7.586
> > >>>> ops/sec:30772.08014764039
> > >>>> total # partitions:130; partition id:9; rows:232585
> elapsed_sec:32.985
> > >>>> ops/sec:7051.235410034865
> > >>>> total # partitions:130; partition id:19; rows:234192
> > elapsed_sec:38.733
> > >>>> ops/sec:6046.3170939508955
> > >>>> total # partitions:130; partition id:1; rows:232860
> elapsed_sec:48.479
> > >>>> ops/sec:4803.316900101075
> > >>>> total # partitions:130; partition id:125; rows:205334
> > elapsed_sec:41.911
> > >>>> ops/sec:4899.286583474505
> > >>>> total # partitions:130; partition id:123; rows:206622
> > elapsed_sec:42.281
> > >>>> ops/sec:4886.875901705258
> > >>>> total # partitions:130; partition id:54; rows:232811
> > elapsed_sec:49.083
> > >>>> ops/sec:4743.210480206996
> > >>>>
> > >>>> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> > >>>> insight into how I can make the read performance more consistent?
> > >>>>
> > >>>> Thanks!
> > >>
> >
>

Re: Inconsistent scan performance

Posted by Stack <st...@duboce.net>.
On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <yu...@gmail.com> wrote:

> James:
> Another experiment you can do is to enable region replica - HBASE-10070.
>
> This would bring down the read variance greatly.
>
>
Suggest you NOT do this, James.

Let's figure out your issue as-is rather than compound it by adding yet more
moving parts.

St.Ack




> > On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nk...@gmail.com> wrote:
> >
> > The read path is much more complex than the write one, so the response
> time
> > has much more variance.
> > The gap is so wide here that I would bet on Ted's or Stack's points, but
> > here are a few other sources of variance:
> > - hbase cache: as Anoop said, may be the data is already in the hbase
> cache
> > (setCacheBlocks(false), means "don't add blocks to the cache", not "don't
> > use the cache")
> > - OS cache: and if the data is not in HBase cache may be it is in the
> > operating system cache (for example if you run the test multiple times)
> > - data locality: if you're lucky the data is local to the region server.
> If
> > you're not, the reads need an extra network hoop.
> > - number of store: more hfiles/stores per region => slower reads.
> > - number of versions and so on: sub case of the previous one: if the rows
> > have been updated multiple times and the compaction has not ran yet, you
> > will read much more data.
> > - (another subcase): the data has not been flushed yet and is available
> in
> > the memstore => fast read.
> >
> > None of these points has any importance for the the write path. Basically
> > the writes variance says nothing about the variance you will get on the
> > reads.
> >
> > IIRC, locality and number of stores are visible in HBase UI. Doing a
> table
> > flush and then running a major compaction generally helps to stabilize
> > response time when you do a test. But it should not explain the x25
> you're
> > seeing, there is something else somewhere else. I don't get the
> > regionserver boundaries you're mentioning: there is no boundary between
> > regionservers. A regionserver can host A->D and M->S while another hosts
> > D->M and S->Z for example.
> >
> >> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <an...@gmail.com>
> wrote:
> >>
> >> I see you set cacheBlocks to be false on the Scan.   By any chance on
> >> some other RS(s), the data you are looking for is already in cache?
> >> (Any previous scan or  by cache on write)  And there are no concurrent
> >> writes any way right?   This much difference in time !  One
> >> possibility is blocks avail or not avail in cache..
> >>
> >> -Anoop-
> >>
> >>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> >>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> >>> james.johansville@gmail.com> wrote:
> >>>
> >>>> Hello all,
> >>>>
> >>>> So, I wrote a Java application for HBase that does a partitioned
> >> full-table
> >>>> scan according to a set number of partitions. For example, if there
> are
> >> 20
> >>>> partitions specified, then 20 separate full scans are launched that
> >> cover
> >>>> an equal slice of the row identifier range.
> >>>>
> >>>> The rows are uniformly distributed throughout the RegionServers.
> >>>
> >>>
> >>> How many RegionServers? How many Regions? Are Regions evenly
> distributed
> >>> across the servers? If you put all partitions on one machine and then
> run
> >>> your client, do the timings even out?
> >>>
> >>> The disparity seems really wide.
> >>>
> >>> St.Ack
> >>>
> >>>
> >>>
> >>>
> >>>> I
> >>>> confirmed this through the hbase shell. I have only one column family,
> >> and
> >>>> each row has the same number of column qualifiers.
> >>>>
> >>>> My problem is that the individual scan performance is wildly
> >> inconsistent
> >>>> even though they fetch approximately a similar number of rows. This
> >>>> inconsistency appears to be random with respect to hosts or
> >> regionservers
> >>>> or partitions or CPU cores. I am the only user of the fleet and not
> >> running
> >>>> any other concurrent HBase operation.
> >>>>
> >>>> I started measuring from the beginning of the scan and stopped
> measuring
> >>>> after the scan was completed. I am not doing any logic with the
> results,
> >>>> just scanning them.
> >>>>
> >>>> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds
> to
> >>>> 100+ seconds. This seems a little too bouncy for me. Does anyone have
> >> any
> >>>> insight? By comparison, a similar utility I wrote to upsert to
> >>>> regionservers was very consistent in ops/sec and I had no issues with
> >> it.
> >>>>
> >>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB
> heap, I
> >>>> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of
> log
> >>>> output I saved that used 130 partitions.
> >>>>
> >>>> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> >>>> ops/sec:36358.38150289017
> >>>> total # partitions:130; partition id:100; rows:206890
> elapsed_sec:6.636
> >>>> ops/sec:31176.91380349608
> >>>> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> >>>> ops/sec:30772.08014764039
> >>>> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> >>>> ops/sec:7051.235410034865
> >>>> total # partitions:130; partition id:19; rows:234192
> elapsed_sec:38.733
> >>>> ops/sec:6046.3170939508955
> >>>> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> >>>> ops/sec:4803.316900101075
> >>>> total # partitions:130; partition id:125; rows:205334
> elapsed_sec:41.911
> >>>> ops/sec:4899.286583474505
> >>>> total # partitions:130; partition id:123; rows:206622
> elapsed_sec:42.281
> >>>> ops/sec:4886.875901705258
> >>>> total # partitions:130; partition id:54; rows:232811
> elapsed_sec:49.083
> >>>> ops/sec:4743.210480206996
> >>>>
> >>>> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> >>>> insight into how I can make the read performance more consistent?
> >>>>
> >>>> Thanks!
> >>
>

Re: Inconsistent scan performance

Posted by Ted Yu <yu...@gmail.com>.
James:
Another experiment you can do is to enable region replicas (HBASE-10070).

This would bring down the read variance greatly.

> On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <nk...@gmail.com> wrote:
> 
> The read path is much more complex than the write one, so the response time
> has much more variance.
> The gap is so wide here that I would bet on Ted's or Stack's points, but
> here are a few other sources of variance:
> - hbase cache: as Anoop said, may be the data is already in the hbase cache
> (setCacheBlocks(false), means "don't add blocks to the cache", not "don't
> use the cache")
> - OS cache: and if the data is not in HBase cache may be it is in the
> operating system cache (for example if you run the test multiple times)
> - data locality: if you're lucky the data is local to the region server. If
> you're not, the reads need an extra network hoop.
> - number of store: more hfiles/stores per region => slower reads.
> - number of versions and so on: sub case of the previous one: if the rows
> have been updated multiple times and the compaction has not ran yet, you
> will read much more data.
> - (another subcase): the data has not been flushed yet and is available in
> the memstore => fast read.
> 
> None of these points has any importance for the the write path. Basically
> the writes variance says nothing about the variance you will get on the
> reads.
> 
> IIRC, locality and number of stores are visible in HBase UI. Doing a table
> flush and then running a major compaction generally helps to stabilize
> response time when you do a test. But it should not explain the x25 you're
> seeing, there is something else somewhere else. I don't get the
> regionserver boundaries you're mentioning: there is no boundary between
> regionservers. A regionserver can host A->D and M->S while another hosts
> D->M and S->Z for example.
> 
>> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <an...@gmail.com> wrote:
>> 
>> I see you set cacheBlocks to be false on the Scan.   By any chance on
>> some other RS(s), the data you are looking for is already in cache?
>> (Any previous scan or  by cache on write)  And there are no concurrent
>> writes any way right?   This much difference in time !  One
>> possibility is blocks avail or not avail in cache..
>> 
>> -Anoop-
>> 
>>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
>>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
>>> james.johansville@gmail.com> wrote:
>>> 
>>>> Hello all,
>>>> 
>>>> So, I wrote a Java application for HBase that does a partitioned
>> full-table
>>>> scan according to a set number of partitions. For example, if there are
>> 20
>>>> partitions specified, then 20 separate full scans are launched that
>> cover
>>>> an equal slice of the row identifier range.
>>>> 
>>>> The rows are uniformly distributed throughout the RegionServers.
>>> 
>>> 
>>> How many RegionServers? How many Regions? Are Regions evenly distributed
>>> across the servers? If you put all partitions on one machine and then run
>>> your client, do the timings even out?
>>> 
>>> The disparity seems really wide.
>>> 
>>> St.Ack
>>> 
>>> 
>>> 
>>> 
>>>> I
>>>> confirmed this through the hbase shell. I have only one column family,
>> and
>>>> each row has the same number of column qualifiers.
>>>> 
>>>> My problem is that the individual scan performance is wildly
>> inconsistent
>>>> even though they fetch approximately a similar number of rows. This
>>>> inconsistency appears to be random with respect to hosts or
>> regionservers
>>>> or partitions or CPU cores. I am the only user of the fleet and not
>> running
>>>> any other concurrent HBase operation.
>>>> 
>>>> I started measuring from the beginning of the scan and stopped measuring
>>>> after the scan was completed. I am not doing any logic with the results,
>>>> just scanning them.
>>>> 
>>>> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
>>>> 100+ seconds. This seems a little too bouncy for me. Does anyone have
>> any
>>>> insight? By comparison, a similar utility I wrote to upsert to
>>>> regionservers was very consistent in ops/sec and I had no issues with
>> it.
>>>> 
>>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
>>>> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
>>>> output I saved that used 130 partitions.
>>>> 
>>>> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
>>>> ops/sec:36358.38150289017
>>>> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
>>>> ops/sec:31176.91380349608
>>>> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
>>>> ops/sec:30772.08014764039
>>>> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
>>>> ops/sec:7051.235410034865
>>>> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
>>>> ops/sec:6046.3170939508955
>>>> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
>>>> ops/sec:4803.316900101075
>>>> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
>>>> ops/sec:4899.286583474505
>>>> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
>>>> ops/sec:4886.875901705258
>>>> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
>>>> ops/sec:4743.210480206996
>>>> 
>>>> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
>>>> insight into how I can make the read performance more consistent?
>>>> 
>>>> Thanks!
>> 

Re: Inconsistent scan performance

Posted by Nicolas Liochon <nk...@gmail.com>.
The read path is much more complex than the write one, so the response time
has much more variance.
The gap is so wide here that I would bet on Ted's or Stack's points, but
here are a few other sources of variance:
- hbase cache: as Anoop said, maybe the data is already in the HBase cache
(setCacheBlocks(false) means "don't add blocks to the cache", not "don't
use the cache")
- OS cache: if the data is not in the HBase cache, maybe it is in the
operating system cache (for example if you run the test multiple times)
- data locality: if you're lucky the data is local to the region server; if
you're not, the reads need an extra network hop.
- number of stores: more hfiles/stores per region => slower reads.
- number of versions and so on: a sub-case of the previous one: if the rows
have been updated multiple times and the compaction has not run yet, you
will read much more data.
- (another sub-case): the data has not been flushed yet and is still in
the memstore => fast read.

None of these points has any importance for the write path. Basically
the write variance says nothing about the variance you will get on the
reads.

IIRC, locality and number of stores are visible in the HBase UI. Doing a
table flush and then running a major compaction generally helps to stabilize
response time when you do a test (a minimal Admin API sketch is below), but it
should not explain the 25x difference you're seeing; something else is going
on. I also don't get the regionserver boundaries you're mentioning: there is
no boundary between regionservers. A regionserver can host A->D and M->S
while another hosts D->M and S->Z, for example.
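
For example, something along these lines with the Admin API (minimal sketch;
"my_table" is a placeholder, and note that majorCompact only requests the
compaction, it returns before the compaction finishes):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushAndCompactSketch {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      TableName tn = TableName.valueOf("my_table"); // placeholder
      admin.flush(tn);        // flush memstores to hfiles
      admin.majorCompact(tn); // asynchronous: only schedules the major compaction
    }
  }
}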

On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <an...@gmail.com> wrote:

> I see you set cacheBlocks to be false on the Scan.   By any chance on
> some other RS(s), the data you are looking for is already in cache?
> (Any previous scan or  by cache on write)  And there are no concurrent
> writes any way right?   This much difference in time !  One
> possibility is blocks avail or not avail in cache..
>
> -Anoop-
>
> On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> > On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> > james.johansville@gmail.com> wrote:
> >
> >> Hello all,
> >>
> >> So, I wrote a Java application for HBase that does a partitioned
> full-table
> >> scan according to a set number of partitions. For example, if there are
> 20
> >> partitions specified, then 20 separate full scans are launched that
> cover
> >> an equal slice of the row identifier range.
> >>
> >> The rows are uniformly distributed throughout the RegionServers.
> >
> >
> > How many RegionServers? How many Regions? Are Regions evenly distributed
> > across the servers? If you put all partitions on one machine and then run
> > your client, do the timings even out?
> >
> > The disparity seems really wide.
> >
> > St.Ack
> >
> >
> >
> >
> >> I
> >> confirmed this through the hbase shell. I have only one column family,
> and
> >> each row has the same number of column qualifiers.
> >>
> >> My problem is that the individual scan performance is wildly
> inconsistent
> >> even though they fetch approximately a similar number of rows. This
> >> inconsistency appears to be random with respect to hosts or
> regionservers
> >> or partitions or CPU cores. I am the only user of the fleet and not
> running
> >> any other concurrent HBase operation.
> >>
> >> I started measuring from the beginning of the scan and stopped measuring
> >> after the scan was completed. I am not doing any logic with the results,
> >> just scanning them.
> >>
> >> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
> >> 100+ seconds. This seems a little too bouncy for me. Does anyone have
> any
> >> insight? By comparison, a similar utility I wrote to upsert to
> >> regionservers was very consistent in ops/sec and I had no issues with
> it.
> >>
> >> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
> >> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
> >> output I saved that used 130 partitions.
> >>
> >> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> >> ops/sec:36358.38150289017
> >> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
> >> ops/sec:31176.91380349608
> >> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> >> ops/sec:30772.08014764039
> >> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> >> ops/sec:7051.235410034865
> >> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
> >> ops/sec:6046.3170939508955
> >> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> >> ops/sec:4803.316900101075
> >> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
> >> ops/sec:4899.286583474505
> >> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
> >> ops/sec:4886.875901705258
> >> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
> >> ops/sec:4743.210480206996
> >>
> >> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> >> insight into how I can make the read performance more consistent?
> >>
> >> Thanks!
> >>
>

Re: Inconsistent scan performance

Posted by Anoop John <an...@gmail.com>.
I see you set cacheBlocks to false on the Scan. By any chance, on some other
RS(s), is the data you are looking for already in the cache (from a previous
scan, or via cache-on-write)? And there are no concurrent writes anyway,
right? With this much difference in time, one possibility is simply whether
the blocks are available in the cache or not.

-Anoop-

On Fri, Mar 25, 2016 at 11:04 AM, Stack <st...@duboce.net> wrote:
> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> james.johansville@gmail.com> wrote:
>
>> Hello all,
>>
>> So, I wrote a Java application for HBase that does a partitioned full-table
>> scan according to a set number of partitions. For example, if there are 20
>> partitions specified, then 20 separate full scans are launched that cover
>> an equal slice of the row identifier range.
>>
>> The rows are uniformly distributed throughout the RegionServers.
>
>
> How many RegionServers? How many Regions? Are Regions evenly distributed
> across the servers? If you put all partitions on one machine and then run
> your client, do the timings even out?
>
> The disparity seems really wide.
>
> St.Ack
>
>
>
>
>> I
>> confirmed this through the hbase shell. I have only one column family, and
>> each row has the same number of column qualifiers.
>>
>> My problem is that the individual scan performance is wildly inconsistent
>> even though they fetch approximately a similar number of rows. This
>> inconsistency appears to be random with respect to hosts or regionservers
>> or partitions or CPU cores. I am the only user of the fleet and not running
>> any other concurrent HBase operation.
>>
>> I started measuring from the beginning of the scan and stopped measuring
>> after the scan was completed. I am not doing any logic with the results,
>> just scanning them.
>>
>> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
>> 100+ seconds. This seems a little too bouncy for me. Does anyone have any
>> insight? By comparison, a similar utility I wrote to upsert to
>> regionservers was very consistent in ops/sec and I had no issues with it.
>>
>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
>> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
>> output I saved that used 130 partitions.
>>
>> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
>> ops/sec:36358.38150289017
>> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
>> ops/sec:31176.91380349608
>> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
>> ops/sec:30772.08014764039
>> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
>> ops/sec:7051.235410034865
>> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
>> ops/sec:6046.3170939508955
>> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
>> ops/sec:4803.316900101075
>> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
>> ops/sec:4899.286583474505
>> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
>> ops/sec:4886.875901705258
>> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
>> ops/sec:4743.210480206996
>>
>> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
>> insight into how I can make the read performance more consistent?
>>
>> Thanks!
>>

Re: Inconsistent scan performance

Posted by Stack <st...@duboce.net>.
On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
james.johansville@gmail.com> wrote:

> Hello all,
>
> So, I wrote a Java application for HBase that does a partitioned full-table
> scan according to a set number of partitions. For example, if there are 20
> partitions specified, then 20 separate full scans are launched that cover
> an equal slice of the row identifier range.
>
> The rows are uniformly distributed throughout the RegionServers.


How many RegionServers? How many Regions? Are Regions evenly distributed
across the servers? If you put all partitions on one machine and then run
your client, do the timings even out?
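
You can check the per-server region spread quickly with something like this
(rough sketch using the ClusterStatus API):

import java.io.IOException;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RegionSpreadSketch {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName sn : status.getServers()) {
        // Number of regions each regionserver is currently carrying.
        System.out.println(sn.getHostname() + " regions="
            + status.getLoad(sn).getNumberOfRegions());
      }
    }
  }
}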

The disparity seems really wide.

St.Ack




> I
> confirmed this through the hbase shell. I have only one column family, and
> each row has the same number of column qualifiers.
>
> My problem is that the individual scan performance is wildly inconsistent
> even though they fetch approximately a similar number of rows. This
> inconsistency appears to be random with respect to hosts or regionservers
> or partitions or CPU cores. I am the only user of the fleet and not running
> any other concurrent HBase operation.
>
> I started measuring from the beginning of the scan and stopped measuring
> after the scan was completed. I am not doing any logic with the results,
> just scanning them.
>
> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
> 100+ seconds. This seems a little too bouncy for me. Does anyone have any
> insight? By comparison, a similar utility I wrote to upsert to
> regionservers was very consistent in ops/sec and I had no issues with it.
>
> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
> output I saved that used 130 partitions.
>
> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> ops/sec:36358.38150289017
> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
> ops/sec:31176.91380349608
> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> ops/sec:30772.08014764039
> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> ops/sec:7051.235410034865
> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
> ops/sec:6046.3170939508955
> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> ops/sec:4803.316900101075
> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
> ops/sec:4899.286583474505
> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
> ops/sec:4886.875901705258
> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
> ops/sec:4743.210480206996
>
> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> insight into how I can make the read performance more consistent?
>
> Thanks!
>

Re: Inconsistent scan performance

Posted by Ted Yu <yu...@gmail.com>.
Crossing region boundaries that happen to fall on different servers might, yes.

On Thu, Mar 24, 2016 at 5:49 PM, James Johansville <
james.johansville@gmail.com> wrote:

> In theory they should be aligned with *regionserver* boundaries. Would
> crossing multiple regions on the same regionserver result in the big
> performance difference being seen here?
>
> I am using Hortonworks HBase 1.1.2
>
> On Thu, Mar 24, 2016 at 5:32 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > I assume the partitions' boundaries don't align with region boundaries,
> > right ?
> >
> > Meaning some partitions would cross region boundaries.
> >
> > Which hbase release do you use ?
> >
> > Thanks
> >
> > On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> > james.johansville@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > So, I wrote a Java application for HBase that does a partitioned
> > full-table
> > > scan according to a set number of partitions. For example, if there are
> > 20
> > > partitions specified, then 20 separate full scans are launched that
> cover
> > > an equal slice of the row identifier range.
> > >
> > > The rows are uniformly distributed throughout the RegionServers. I
> > > confirmed this through the hbase shell. I have only one column family,
> > and
> > > each row has the same number of column qualifiers.
> > >
> > > My problem is that the individual scan performance is wildly
> inconsistent
> > > even though they fetch approximately a similar number of rows. This
> > > inconsistency appears to be random with respect to hosts or
> regionservers
> > > or partitions or CPU cores. I am the only user of the fleet and not
> > running
> > > any other concurrent HBase operation.
> > >
> > > I started measuring from the beginning of the scan and stopped
> measuring
> > > after the scan was completed. I am not doing any logic with the
> results,
> > > just scanning them.
> > >
> > > For ~230K rows fetched per scan, I am getting anywhere from 4 seconds
> to
> > > 100+ seconds. This seems a little too bouncy for me. Does anyone have
> any
> > > insight? By comparison, a similar utility I wrote to upsert to
> > > regionservers was very consistent in ops/sec and I had no issues with
> it.
> > >
> > > Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap,
> I
> > > see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of
> log
> > > output I saved that used 130 partitions.
> > >
> > > total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> > > ops/sec:36358.38150289017
> > > total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
> > > ops/sec:31176.91380349608
> > > total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> > > ops/sec:30772.08014764039
> > > total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> > > ops/sec:7051.235410034865
> > > total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
> > > ops/sec:6046.3170939508955
> > > total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> > > ops/sec:4803.316900101075
> > > total # partitions:130; partition id:125; rows:205334
> elapsed_sec:41.911
> > > ops/sec:4899.286583474505
> > > total # partitions:130; partition id:123; rows:206622
> elapsed_sec:42.281
> > > ops/sec:4886.875901705258
> > > total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
> > > ops/sec:4743.210480206996
> > >
> > > I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> > > insight into how I can make the read performance more consistent?
> > >
> > > Thanks!
> > >
> >
>

Re: Inconsistent scan performance

Posted by James Johansville <ja...@gmail.com>.
In theory they should be aligned with *regionserver* boundaries. Would
crossing multiple regions on the same regionserver explain the big
performance difference I'm seeing here? (A rough sketch of deriving the scan
boundaries from the regions themselves is below.)

I am using Hortonworks HBase 1.1.2
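
A rough sketch of what I mean -- taking the split points straight from the
region boundaries so each scan stays within one region (untested; "my_table"
is a placeholder):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Pair;

public class RegionAlignedScanSketch {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) { // placeholder
      Pair<byte[][], byte[][]> keys = locator.getStartEndKeys();
      for (int i = 0; i < keys.getFirst().length; i++) {
        byte[] start = keys.getFirst()[i];  // empty array for the first region
        byte[] stop = keys.getSecond()[i];  // empty array for the last region
        Scan scan = new Scan(start, stop);  // one scan per region, so it never crosses a boundary
        scan.setCacheBlocks(false);
        scan.setCaching(5000);
        // hand this scan off to a worker / partition here
      }
    }
  }
}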

On Thu, Mar 24, 2016 at 5:32 PM, Ted Yu <yu...@gmail.com> wrote:

> I assume the partitions' boundaries don't align with region boundaries,
> right ?
>
> Meaning some partitions would cross region boundaries.
>
> Which hbase release do you use ?
>
> Thanks
>
> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> james.johansville@gmail.com> wrote:
>
> > Hello all,
> >
> > So, I wrote a Java application for HBase that does a partitioned
> full-table
> > scan according to a set number of partitions. For example, if there are
> 20
> > partitions specified, then 20 separate full scans are launched that cover
> > an equal slice of the row identifier range.
> >
> > The rows are uniformly distributed throughout the RegionServers. I
> > confirmed this through the hbase shell. I have only one column family,
> and
> > each row has the same number of column qualifiers.
> >
> > My problem is that the individual scan performance is wildly inconsistent
> > even though they fetch approximately a similar number of rows. This
> > inconsistency appears to be random with respect to hosts or regionservers
> > or partitions or CPU cores. I am the only user of the fleet and not
> running
> > any other concurrent HBase operation.
> >
> > I started measuring from the beginning of the scan and stopped measuring
> > after the scan was completed. I am not doing any logic with the results,
> > just scanning them.
> >
> > For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
> > 100+ seconds. This seems a little too bouncy for me. Does anyone have any
> > insight? By comparison, a similar utility I wrote to upsert to
> > regionservers was very consistent in ops/sec and I had no issues with it.
> >
> > Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
> > see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
> > output I saved that used 130 partitions.
> >
> > total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> > ops/sec:36358.38150289017
> > total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
> > ops/sec:31176.91380349608
> > total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> > ops/sec:30772.08014764039
> > total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> > ops/sec:7051.235410034865
> > total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
> > ops/sec:6046.3170939508955
> > total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> > ops/sec:4803.316900101075
> > total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
> > ops/sec:4899.286583474505
> > total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
> > ops/sec:4886.875901705258
> > total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
> > ops/sec:4743.210480206996
> >
> > I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> > insight into how I can make the read performance more consistent?
> >
> > Thanks!
> >
>

Re: Inconsistent scan performance

Posted by Ted Yu <yu...@gmail.com>.
I assume the partitions' boundaries don't align with region boundaries,
right?

Meaning some partitions would cross region boundaries.

Which HBase release do you use?

Thanks

On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
james.johansville@gmail.com> wrote:

> Hello all,
>
> So, I wrote a Java application for HBase that does a partitioned full-table
> scan according to a set number of partitions. For example, if there are 20
> partitions specified, then 20 separate full scans are launched that cover
> an equal slice of the row identifier range.
>
> The rows are uniformly distributed throughout the RegionServers. I
> confirmed this through the hbase shell. I have only one column family, and
> each row has the same number of column qualifiers.
>
> My problem is that the individual scan performance is wildly inconsistent
> even though they fetch approximately a similar number of rows. This
> inconsistency appears to be random with respect to hosts or regionservers
> or partitions or CPU cores. I am the only user of the fleet and not running
> any other concurrent HBase operation.
>
> I started measuring from the beginning of the scan and stopped measuring
> after the scan was completed. I am not doing any logic with the results,
> just scanning them.
>
> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
> 100+ seconds. This seems a little too bouncy for me. Does anyone have any
> insight? By comparison, a similar utility I wrote to upsert to
> regionservers was very consistent in ops/sec and I had no issues with it.
>
> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
> output I saved that used 130 partitions.
>
> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> ops/sec:36358.38150289017
> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
> ops/sec:31176.91380349608
> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> ops/sec:30772.08014764039
> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> ops/sec:7051.235410034865
> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
> ops/sec:6046.3170939508955
> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> ops/sec:4803.316900101075
> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
> ops/sec:4899.286583474505
> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
> ops/sec:4886.875901705258
> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
> ops/sec:4743.210480206996
>
> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> insight into how I can make the read performance more consistent?
>
> Thanks!
>