Posted to user@hbase.apache.org by Harry Waye <hw...@arachnys.com> on 2013/10/23 16:57:35 UTC

Optimizing bulk load performance

I'm trying to load data into HBase using HFileOutputFormat and incremental
bulk load, but am getting rather lackluster performance: 10h for ~0.5TB of
data, ~50000 blocks.  This is being loaded into a table that has 2
families, 9 columns, 2500 regions and is ~10TB in size.  Keys are MD5
hashes and regions are pretty evenly spread.  The majority of the time
appears to be spent in the reduce phase, with the map phase completing very
quickly.  The network doesn't appear to be saturated, but the load is
consistently at 6, which is the number of reduce tasks per node.

12 hosts (6 cores, 2 disks as RAID0, 1Gb Ethernet, no one else on the rack).

MR conf: 6 mappers, 6 reducers per node.

I spoke to someone on IRC and they recommended reducing job output
replication to 1 and reducing the number of mappers, which I reduced to 2.
Reducing replication appeared not to make any difference; reducing the
number of reducers appeared just to slow the job down.  I'm going to have a
look at running the benchmarks mentioned on Michael Noll's blog and see
what that turns up.  Some questions I have are:

How does the global number/size of blocks affect performance?  (I have a
lot of 10MB files, which are the input files.)

How does the job-local number/size of input blocks affect performance?

What is actually happening in the reduce phase that requires so much CPU?
I assume the actual construction of HFiles isn't CPU-intensive.

Ultimately, how can I improve performance?
Thanks
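
(For reference, the job follows the standard two-step incremental bulk load
pattern; a minimal sketch of such a driver is below.  The mapper, table
name, column family/qualifier and the dfs.replication override are
illustrative assumptions rather than the actual job, which is in the gist
linked later in the thread.)

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Illustrative mapper: assumes "rowkey<TAB>value" text input and writes
  // into a made-up family "d", qualifier "c".
  static class LineMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      ctx.write(new ImmutableBytesWritable(row),
          new KeyValue(row, Bytes.toBytes("d"), Bytes.toBytes("c"),
              Bytes.toBytes(parts[1])));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // The IRC suggestion: write the temporary HFiles with replication 1.
    conf.set("dfs.replication", "1");

    Job job = new Job(conf, "bulk load");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(LineMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Sets the TotalOrderPartitioner, the sorting reducer and the number
    // of reduce tasks to the table's current region count, so each reducer
    // writes the HFiles for exactly one region.
    HTable table = new HTable(conf, "my_table");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Second step of the incremental bulk load: hand the finished HFiles
      // over to the region servers.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    }
  }
}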

Re: Optimizing bulk load performance

Posted by Harry Waye <hw...@arachnys.com>.
Haven't had a chance to run netperf, but spotted messages in syslog of the
form:

Oct 25 21:03:22 ... kernel: [107058.190743] net_ratelimit: 136 callbacks suppressed
Oct 25 21:03:22 ... kernel: [107058.190746] nf_conntrack: table full, dropping packet.

This does rather suggest that packets (and hence RPC requests) are being
dropped.  There are ~16000 connections on port 50060, i.e. the tasktracker;
I guess I'll try raising the conntrack max and seeing what effect that has.
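
In case it's useful to anyone else who hits this, checking and raising the
conntrack limit looks roughly like the following (the new value is just an
example, and it should also go into /etc/sysctl.conf to survive a reboot):

# how close the table is to the limit
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# raise the limit
sysctl -w net.netfilter.nf_conntrack_max=262144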


On 24 October 2013 23:02, Harry Waye <hw...@arachnys.com> wrote:

> Got it!  Re. 50% utilisation, I forgot to mention that the 6 cores figure
> does not include hyper-threading.  Foolish, I know, but that would explain
> CPU0 being at 50%.  The nodes are as stated in
> http://www.hetzner.de/en/hosting/produkte_rootserver/ex10 bar the RAID1.

On 24 October 2013 22:50, Jean-Marc Spaggiari <je...@spaggiari.org> wrote:

> Remote calls to a server. Just forget about it ;) Please verify the
> network bandwidth between your nodes.

2013/10/24 Harry Waye <hw...@arachnys.com>

> Excuse the ignorance, RCP?

On 24 October 2013 22:28, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Your nodes are almost 50% idle... Might be something else.  Sounds like
> it's not your disks nor your CPU... Maybe too many RCPs?
>
> Have you investigated the network side?  netperf might be a good help for
> you.
>
> JM

2013/10/24 Harry Waye <hw...@arachnys.com>

> p.s. I guess this is turning into more of a general Hadoop issue, but I'll
> keep the discussion here seeing that I have an audience, unless there are
> objections.

On 24 October 2013 22:02, Harry Waye <hw...@arachnys.com> wrote:

> So just a short update, I'll read into it a little more tomorrow.  This is
> from three of the nodes:
> https://gist.github.com/hazzadous/1264af7c674e1b3cf867
>
> The first is the grey guy.  Just glancing at it, it looks to fluctuate
> more than the others.  I guess that could suggest that there are some
> issues with reading from the disks.  Interestingly, it's the only one that
> doesn't have smartd installed, which alerts us on changes for the other
> nodes.  I suspect there's probably some mileage in checking its SMART
> attributes.  Will do that tomorrow though.
>
> Out of curiosity, how do people normally monitor disk issues?  I'm going
> to set up collectd to push various things from smartctl tomorrow; at the
> moment all we do is receive emails, which are mostly noise about problem
> sector counts increasing +1.

On 24 October 2013 19:40, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Can you try vmstat 2?  2 is the interval in seconds at which it will
> display the disk usage.  On the extract here, nothing is running; only 8%
> is used (1% disk IO, 6% user, 1% sys).
>
> Run it on 2 or 3 different nodes while you are putting the load on the
> cluster, and take a look at the last 4 numbers: what is the value of the
> last one?
>
> On the user cpu0 graph, who is the gray guy showing high?
>
> JM

2013/10/24 Harry Waye <hw...@arachnys.com>

> Ok, I'm running a load job atm; I've added some possibly incomprehensible
> coloured lines to the graph: http://goo.gl/cUGCGG
>
> This is actually with one fewer node due to decommissioning to replace a
> disk, hence I guess the reason for one squiggly line showing no disk
> activity.  I've included only the cpu stats for CPU0 from each node.  The
> last graph should read "Memory Used".  vmstat from one of the nodes:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6  1 91  1
>
> To me the wait doesn't seem that high.  Job stats are
> http://goo.gl/ZYdUKp, the job setup is
> https://gist.github.com/hazzadous/ac57a384f2ab685f07f6
>
> Does anything jump out at you?
>
> Cheers
> H

On 24 October 2013 16:16, Harry Waye <hw...@arachnys.com> wrote:

> Hi JM
>
> I took a snapshot on the initial run, before the changes:
> https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png
>
> Good timing, disks appear to be exploding (ATA errors) atm, thus I'm
> decommissioning and reprovisioning with new disks.  I'll be reprovisioning
> without RAID (it's software RAID, just to compound the issue), although
> I'm not sure how I'll go about migrating all nodes.  I guess I'd need to
> put more correctly specced nodes in the rack and decommission the existing
> ones.
>
> We're using Hetzner at the moment, which may not have been a good choice.
> Has anyone had any experience with them wrt. Hadoop?  They offer 7 and 15
> disk options, but are low on the cpu front (quad core).  Our workload
> will, I assume, be on the high side.  There's also an 8-disk Dell
> PowerEdge which is a little more powerful.  What hosting providers would
> people recommend?  (And what would be the strategy for migrating?)
>
> Anyhow, when I have things more stable I'll have a look at checking out
> what's using the cpu.  In the mean time, can anything be gleaned from the
> above snap?
>
> Cheers
> H

On 24 October 2013 15:14, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Harry,
>
> Do you have more details on the exact load?  Can you run vmstat and see
> what kind of load it is?  Is it user? cpu? wio?
>
> I suspect your disks to be the issue.  There are 2 things here.
>
> First, we don't recommend RAID for the HDFS/HBase disks.  The best is to
> simply mount the disks on 2 mount points and give them to HDFS.  Second,
> 2 disks per node is very low.  On a dev cluster it is not even
> recommended.  In production, you should go with 12 or more.
>
> So with only 2 disks in RAID, I suspect your WIO to be high, which is what
> might slow your process.
>
> Can you take a look in that direction?  If it's not that, we will continue
> to investigate ;)
>
> Thanks,
>
> JM



-- 
Harry Waye, Co-founder/CTO
harry@arachnys.com
+44 7890 734289

Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>

---
Arachnys Information Services Limited is a company registered in England &
Wales. Company number: 7269723. Registered office: 40 Clarendon St,
Cambridge, CB1 1JX.

Re: Optimizing bulk load performance

Posted by Harry Waye <hw...@arachnys.com>.
Got it!  Re. 50% utilisation, I forgot to mention that the 6 cores figure
does not include hyper-threading.  Foolish, I know, but that would explain
CPU0 being at 50%.  The nodes are as stated in
http://www.hetzner.de/en/hosting/produkte_rootserver/ex10 bar the RAID1.


Re: Optimizing bulk load performance

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Remote calls to a server. Just forget about it ;) Please verify the network
bandwidth between your nodes.
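
For what it's worth, a concrete way to run that check with netperf (the
host name is a placeholder):

# on the receiving node
netserver

# on the sending node: 30-second TCP throughput test against node02
netperf -H node02 -t TCP_STREAM -l 30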


Re: Optimizing bulk load performance

Posted by Ted Yu <yu...@gmail.com>.
I guess Jean meant RPCs.


Re: Optimizing bulk load performance

Posted by Harry Waye <hw...@arachnys.com>.
Excuse the ignorance, RCP?





-- 
Harry Waye, Co-founder/CTO
harry@arachnys.com
+44 7890 734289

Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>

---
Arachnys Information Services Limited is a company registered in England &
Wales. Company number: 7269723. Registered office: 40 Clarendon St,
Cambridge, CB1 1JX.

Re: Optimizing bulk load performance

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Your nodes are almost 50% idle... Might be something else. Sounds like it's
not your disks nor your CPU... Maybe too many RPCs?

Have you investigated the network side? netperf might be a good help for
you.

JM


2013/10/24 Harry Waye <hw...@arachnys.com>

> p.s. I guess this is more turning into a general hadoop issue, but I'll
> keep the discussion here seeing that I have an audience, unless there are
> objections.
>
>
> On 24 October 2013 22:02, Harry Waye <hw...@arachnys.com> wrote:
>
> > So just a short update, I'll read into it a little more tomorrow.  This
> is
> > from three of the nodes:
> > https://gist.github.com/hazzadous/1264af7c674e1b3cf867
> >
> > The first is the grey guy.  Just glancing at it, it looks to fluctuate
> > more than the others.  I guess that could suggest that there are some
> > issues with reading from the disks.  Interestingly, it's the only one
> that
> > doesn't have smartd installed, which alerts us on changes for the other
> > nodes.  I suspect there's probably some mileage in checking its smart
> > attributes.  Will do that tomorrow though.
> >
> > Out of curiosity, how do people normally monitor disk issues?  I'm going
> > to set up collectd to push various things from smartctl tomorrow, at the
> > moment all we do is receive emails, which is mostly noise about problem
> > sector counts increasing +1.
> >
> >
> > On 24 October 2013 19:40, Jean-Marc Spaggiari <jean-marc@spaggiari.org
> >wrote:
> >
> >> Can you try vmstat 2? 2 is the interval in seconds it will display the
> >> disk
> >> usage. On the extract here, nothing is running. only 8% is used. (1%
> disk
> >> IO, 6% User, 1% sys)
> >>
> >> Run it on 2 or 3 different nodes while you are putting the load on the
> >> cluster. And take a look at the 4 last numbers and see what the value of
> >> the last one?
> >>
> >> On the usercpu0 graph, who is the gray guy showing hight?
> >>
> >> JM
> >>
> >> 2013/10/24 Harry Waye <hw...@arachnys.com>
> >>
> >> > Ok I'm running a load job atm, I've add some possibly incomprehensible
> >> > coloured lines to the graph: http://goo.gl/cUGCGG
> >> >
> >> > This is actually with one fewer nodes due to decommissioning to
> replace
> >> a
> >> > disk, hence I guess the reason for one squiggly line showing no disk
> >> > activity.  I've included only the cpu stats for CPU0 from each node.
> >>  The
> >> > last graph should read "Memory Used".  vmstat from one of the nodes:
> >> >
> >> > procs -----------memory---------- ---swap-- -----io---- -system--
> >> > ----cpu----
> >> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us
> sy
> >> id
> >> > wa
> >> >  6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6
> >>  1
> >> > 91  1
> >> >
> >> > To me the wait doesn't seem that high.  Job stats are
> >> > http://goo.gl/ZYdUKp,  the job setup is
> >> > https://gist.github.com/hazzadous/ac57a384f2ab685f07f6
> >> >
> >> > Does anything jump out at you?
> >> >
> >> > Cheers
> >> > H
> >> >
> >> >
> >> > On 24 October 2013 16:16, Harry Waye <hw...@arachnys.com> wrote:
> >> >
> >> > > Hi JM
> >> > >
> >> > > I took a snapshot on the initial run, before the changes:
> >> > >
> >> >
> >>
> https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png
> >> > >
> >> > > Good timing, disks appear to be exploding (ATA errors) atm thus I'm
> >> > > decommissioning and reprovisioning with new disks.  I'll be
> >> > reprovisioning
> >> > > as without RAID (it's software RAID just to compound the issue)
> >> although
> >> > > not sure how I'll go about migrating all nodes.  I guess I'd need to
> >> put
> >> > > more correctly speced nodes in the rack and decommission the
> existing.
> >> > >  Makes diff. to
> >> > >
> >> > > We're using hetzner at the moment which may not have been a good
> >> choice.
> >> > >  Has anyone had any experience with them wrt. Hadoop?  They offer 7
> >> and
> >> > 15
> >> > > disk options, but are low on the cpu front (quad core).  Our
> workload
> >> > will
> >> > > be I assume on the high side.  There's also a 8 disk Dell PowerEdge
> >> what
> >> > is
> >> > > a little more powerful.  What hosting providers would people
> >> recommended?
> >> > >  (And what would be the strategy for migrating?)
> >> > >
> >> > > Anyhow, when I have things more stable I'll have a look at checking
> >> out
> >> > > what's using the cpu.  In the mean time, can anything be gleamed
> from
> >> the
> >> > > above snap?
> >> > >
> >> > > Cheers
> >> > > H
> >> > >
> >> > >
> >> > > On 24 October 2013 15:14, Jean-Marc Spaggiari <
> >> jean-marc@spaggiari.org
> >> > >wrote:
> >> > >
> >> > >> Hi Harry,
> >> > >>
> >> > >> Do you have more details on the exact load? Can you run vmstats and
> >> see
> >> > >> what kind of load it is? Is it user? cpu? wio?
> >> > >>
> >> > >> I suspect your disks to be the issue. There is 2 things here.
> >> > >>
> >> > >> First, we don't recommend RAID for the HDFS/HBase disk. The best is
> >> to
> >> > >> simply mount the disks on 2 mounting points and give them to HDFS.
> >> > >> Second, 2 disks per not is very low. On a dev cluster is not even
> >> > >> recommended. In production, you should go with 12 or more.
> >> > >>
> >> > >> So with only 2 disks in RAID, I suspect your WIO to be high which
> is
> >> > what
> >> > >> might slow your process.
> >> > >>
> >> > >> Can you take a look on that direction? If it's not that, we will
> >> > continue
> >> > >> to investigate ;)
> >> > >>
> >> > >> Thanks,
> >> > >>
> >> > >> JM
> >> > >>
> >> > >>
> >> > >> 2013/10/23 Harry Waye <hw...@arachnys.com>
> >> > >>
> >> > >> > I'm trying to load data into hbase using HFileOutputFormat and
> >> > >> incremental
> >> > >> > bulk load but am getting rather lackluster performance, 10h for
> >> ~0.5TB
> >> > >> > data, ~50000 blocks.  This is being loaded into a table that has
> 2
> >> > >> > families, 9 columns, 2500 regions and is ~10TB in size.  Keys are
> >> md5
> >> > >> > hashes and regions are pretty evenly spread.  The majority of
> time
> >> > >> appears
> >> > >> > to be spend in the reduce phase, with the map phase completing
> very
> >> > >> > quickly.  The network doesn't appear to be saturated, but the
> load
> >> is
> >> > >> > consistently at 6 which is the number or reduce tasks per node.
> >> > >> >
> >> > >> > 12 hosts (6 cores, 2 disk as RAID0, 1GB eth, no one else on the
> >> rack).
> >> > >> >
> >> > >> > MR conf: 6 mappers, 6 reducers per node.
> >> > >> >
> >> > >> > I spoke to someone on IRC and they recommended reducing job
> output
> >> > >> > replication to 1, and reducing the number of mappers which I
> >> reduced
> >> > to
> >> > >> 2.
> >> > >> >  Reducing replication appeared not to make any difference,
> reducing
> >> > >> > reducers appeared just to slow the job down.  I'm going to have a
> >> look
> >> > >> at
> >> > >> > running the benchmarks mentioned on Michael Noll's blog and see
> >> what
> >> > >> that
> >> > >> > turns up.  I guess some questions I have are:
> >> > >> >
> >> > >> > How does the global number/size of blocks affect perf.?  (I have
> a
> >> lot
> >> > >> of
> >> > >> > 10mb files, which are the input files)
> >> > >> >
> >> > >> > How does the job local number/size of input blocks affect perf.?
> >> > >> >
> >> > >> > What is actually happening in the reduce phase that requires so
> >> much
> >> > >> CPU?
> >> > >> >  I assume the actual construction of HFiles isn't intensive.
> >> > >> >
> >> > >> > Ultimately, how can I improve performance?
> >> > >> > Thanks
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Harry Waye, Co-founder/CTO
> >> > > harry@arachnys.com
> >> > > +44 7890 734289
> >> > >
> >> > > Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
> >> > >
> >> > > ---
> >> > > Arachnys Information Services Limited is a company registered in
> >> England
> >> > &
> >> > > Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> >> > > Cambridge, CB1 1JX.
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Harry Waye, Co-founder/CTO
> >> > harry@arachnys.com
> >> > +44 7890 734289
> >> >
> >> > Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
> >> >
> >> > ---
> >> > Arachnys Information Services Limited is a company registered in
> >> England &
> >> > Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> >> > Cambridge, CB1 1JX.
> >> >
> >>
> >
> >
> >
> > --
> > Harry Waye, Co-founder/CTO
> > harry@arachnys.com
> > +44 7890 734289
> >
> > Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
> >
> > ---
> > Arachnys Information Services Limited is a company registered in England
> &
> > Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> > Cambridge, CB1 1JX.
> >
>
>
>
> --
> Harry Waye, Co-founder/CTO
> harry@arachnys.com
> +44 7890 734289
>
> Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
>
> ---
> Arachnys Information Services Limited is a company registered in England &
> Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> Cambridge, CB1 1JX.
>

Re: Optimizing bulk load performance

Posted by Harry Waye <hw...@arachnys.com>.
p.s. I guess this is turning more into a general Hadoop issue, but I'll
keep the discussion here seeing as I have an audience, unless there are
objections.


On 24 October 2013 22:02, Harry Waye <hw...@arachnys.com> wrote:

> So just a short update, I'll read into it a little more tomorrow.  This is
> from three of the nodes:
> https://gist.github.com/hazzadous/1264af7c674e1b3cf867
>
> The first is the grey guy.  Just glancing at it, it looks to fluctuate
> more than the others.  I guess that could suggest that there are some
> issues with reading from the disks.  Interestingly, it's the only one that
> doesn't have smartd installed, which alerts us on changes for the other
> nodes.  I suspect there's probably some mileage in checking its smart
> attributes.  Will do that tomorrow though.
>
> Out of curiosity, how do people normally monitor disk issues?  I'm going
> to set up collectd to push various things from smartctl tomorrow, at the
> moment all we do is receive emails, which is mostly noise about problem
> sector counts increasing +1.
>
>
> On 24 October 2013 19:40, Jean-Marc Spaggiari <je...@spaggiari.org>wrote:
>
>> Can you try vmstat 2? 2 is the interval in seconds it will display the
>> disk
>> usage. On the extract here, nothing is running. only 8% is used. (1% disk
>> IO, 6% User, 1% sys)
>>
>> Run it on 2 or 3 different nodes while you are putting the load on the
>> cluster. And take a look at the 4 last numbers and see what the value of
>> the last one?
>>
>> On the usercpu0 graph, who is the gray guy showing hight?
>>
>> JM
>>
>> 2013/10/24 Harry Waye <hw...@arachnys.com>
>>
>> > Ok I'm running a load job atm, I've add some possibly incomprehensible
>> > coloured lines to the graph: http://goo.gl/cUGCGG
>> >
>> > This is actually with one fewer nodes due to decommissioning to replace
>> a
>> > disk, hence I guess the reason for one squiggly line showing no disk
>> > activity.  I've included only the cpu stats for CPU0 from each node.
>>  The
>> > last graph should read "Memory Used".  vmstat from one of the nodes:
>> >
>> > procs -----------memory---------- ---swap-- -----io---- -system--
>> > ----cpu----
>> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
>> id
>> > wa
>> >  6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6
>>  1
>> > 91  1
>> >
>> > To me the wait doesn't seem that high.  Job stats are
>> > http://goo.gl/ZYdUKp,  the job setup is
>> > https://gist.github.com/hazzadous/ac57a384f2ab685f07f6
>> >
>> > Does anything jump out at you?
>> >
>> > Cheers
>> > H
>> >
>> >
>> > On 24 October 2013 16:16, Harry Waye <hw...@arachnys.com> wrote:
>> >
>> > > Hi JM
>> > >
>> > > I took a snapshot on the initial run, before the changes:
>> > >
>> >
>> https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png
>> > >
>> > > Good timing, disks appear to be exploding (ATA errors) atm thus I'm
>> > > decommissioning and reprovisioning with new disks.  I'll be
>> > reprovisioning
>> > > as without RAID (it's software RAID just to compound the issue)
>> although
>> > > not sure how I'll go about migrating all nodes.  I guess I'd need to
>> put
>> > > more correctly speced nodes in the rack and decommission the existing.
>> > >  Makes diff. to
>> > >
>> > > We're using hetzner at the moment which may not have been a good
>> choice.
>> > >  Has anyone had any experience with them wrt. Hadoop?  They offer 7
>> and
>> > 15
>> > > disk options, but are low on the cpu front (quad core).  Our workload
>> > will
>> > > be I assume on the high side.  There's also a 8 disk Dell PowerEdge
>> what
>> > is
>> > > a little more powerful.  What hosting providers would people
>> recommended?
>> > >  (And what would be the strategy for migrating?)
>> > >
>> > > Anyhow, when I have things more stable I'll have a look at checking
>> out
>> > > what's using the cpu.  In the mean time, can anything be gleamed from
>> the
>> > > above snap?
>> > >
>> > > Cheers
>> > > H
>> > >
>> > >
>> > > On 24 October 2013 15:14, Jean-Marc Spaggiari <
>> jean-marc@spaggiari.org
>> > >wrote:
>> > >
>> > >> Hi Harry,
>> > >>
>> > >> Do you have more details on the exact load? Can you run vmstats and
>> see
>> > >> what kind of load it is? Is it user? cpu? wio?
>> > >>
>> > >> I suspect your disks to be the issue. There is 2 things here.
>> > >>
>> > >> First, we don't recommend RAID for the HDFS/HBase disk. The best is
>> to
>> > >> simply mount the disks on 2 mounting points and give them to HDFS.
>> > >> Second, 2 disks per not is very low. On a dev cluster is not even
>> > >> recommended. In production, you should go with 12 or more.
>> > >>
>> > >> So with only 2 disks in RAID, I suspect your WIO to be high which is
>> > what
>> > >> might slow your process.
>> > >>
>> > >> Can you take a look on that direction? If it's not that, we will
>> > continue
>> > >> to investigate ;)
>> > >>
>> > >> Thanks,
>> > >>
>> > >> JM
>> > >>
>> > >>
>> > >> 2013/10/23 Harry Waye <hw...@arachnys.com>
>> > >>
>> > >> > I'm trying to load data into hbase using HFileOutputFormat and
>> > >> incremental
>> > >> > bulk load but am getting rather lackluster performance, 10h for
>> ~0.5TB
>> > >> > data, ~50000 blocks.  This is being loaded into a table that has 2
>> > >> > families, 9 columns, 2500 regions and is ~10TB in size.  Keys are
>> md5
>> > >> > hashes and regions are pretty evenly spread.  The majority of time
>> > >> appears
>> > >> > to be spend in the reduce phase, with the map phase completing very
>> > >> > quickly.  The network doesn't appear to be saturated, but the load
>> is
>> > >> > consistently at 6 which is the number or reduce tasks per node.
>> > >> >
>> > >> > 12 hosts (6 cores, 2 disk as RAID0, 1GB eth, no one else on the
>> rack).
>> > >> >
>> > >> > MR conf: 6 mappers, 6 reducers per node.
>> > >> >
>> > >> > I spoke to someone on IRC and they recommended reducing job output
>> > >> > replication to 1, and reducing the number of mappers which I
>> reduced
>> > to
>> > >> 2.
>> > >> >  Reducing replication appeared not to make any difference, reducing
>> > >> > reducers appeared just to slow the job down.  I'm going to have a
>> look
>> > >> at
>> > >> > running the benchmarks mentioned on Michael Noll's blog and see
>> what
>> > >> that
>> > >> > turns up.  I guess some questions I have are:
>> > >> >
>> > >> > How does the global number/size of blocks affect perf.?  (I have a
>> lot
>> > >> of
>> > >> > 10mb files, which are the input files)
>> > >> >
>> > >> > How does the job local number/size of input blocks affect perf.?
>> > >> >
>> > >> > What is actually happening in the reduce phase that requires so
>> much
>> > >> CPU?
>> > >> >  I assume the actual construction of HFiles isn't intensive.
>> > >> >
>> > >> > Ultimately, how can I improve performance?
>> > >> > Thanks
>> > >> >
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Harry Waye, Co-founder/CTO
>> > > harry@arachnys.com
>> > > +44 7890 734289
>> > >
>> > > Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
>> > >
>> > > ---
>> > > Arachnys Information Services Limited is a company registered in
>> England
>> > &
>> > > Wales. Company number: 7269723. Registered office: 40 Clarendon St,
>> > > Cambridge, CB1 1JX.
>> > >
>> >
>> >
>> >
>> > --
>> > Harry Waye, Co-founder/CTO
>> > harry@arachnys.com
>> > +44 7890 734289
>> >
>> > Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
>> >
>> > ---
>> > Arachnys Information Services Limited is a company registered in
>> England &
>> > Wales. Company number: 7269723. Registered office: 40 Clarendon St,
>> > Cambridge, CB1 1JX.
>> >
>>
>
>
>
> --
> Harry Waye, Co-founder/CTO
> harry@arachnys.com
> +44 7890 734289
>
> Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
>
> ---
> Arachnys Information Services Limited is a company registered in England &
> Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> Cambridge, CB1 1JX.
>



-- 
Harry Waye, Co-founder/CTO
harry@arachnys.com
+44 7890 734289

Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>

---
Arachnys Information Services Limited is a company registered in England &
Wales. Company number: 7269723. Registered office: 40 Clarendon St,
Cambridge, CB1 1JX.

Re: Optimizing bulk load performance

Posted by Harry Waye <hw...@arachnys.com>.
So just a short update; I'll read into it a little more tomorrow.  This is
from three of the nodes:
https://gist.github.com/hazzadous/1264af7c674e1b3cf867

The first is the grey guy.  Just glancing at it, it looks to fluctuate more
than the others.  I guess that could suggest that there are some issues
with reading from the disks.  Interestingly, it's the only one that doesn't
have smartd installed, which alerts us on changes for the other nodes.  I
suspect there's probably some mileage in checking its SMART attributes.
 Will do that tomorrow though.

Out of curiosity, how do people normally monitor disk issues?  I'm going to
set up collectd to push various things from smartctl tomorrow; at the
moment all we do is receive emails, which are mostly noise about problem
sector counts incrementing by 1.


On 24 October 2013 19:40, Jean-Marc Spaggiari <je...@spaggiari.org>wrote:

> Can you try vmstat 2? 2 is the interval in seconds it will display the disk
> usage. On the extract here, nothing is running. only 8% is used. (1% disk
> IO, 6% User, 1% sys)
>
> Run it on 2 or 3 different nodes while you are putting the load on the
> cluster. And take a look at the 4 last numbers and see what the value of
> the last one?
>
> On the usercpu0 graph, who is the gray guy showing hight?
>
> JM
>
> 2013/10/24 Harry Waye <hw...@arachnys.com>
>
> > Ok I'm running a load job atm, I've add some possibly incomprehensible
> > coloured lines to the graph: http://goo.gl/cUGCGG
> >
> > This is actually with one fewer nodes due to decommissioning to replace a
> > disk, hence I guess the reason for one squiggly line showing no disk
> > activity.  I've included only the cpu stats for CPU0 from each node.  The
> > last graph should read "Memory Used".  vmstat from one of the nodes:
> >
> > procs -----------memory---------- ---swap-- -----io---- -system--
> > ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
> id
> > wa
> >  6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6  1
> > 91  1
> >
> > To me the wait doesn't seem that high.  Job stats are
> > http://goo.gl/ZYdUKp,  the job setup is
> > https://gist.github.com/hazzadous/ac57a384f2ab685f07f6
> >
> > Does anything jump out at you?
> >
> > Cheers
> > H
> >
> >
> > On 24 October 2013 16:16, Harry Waye <hw...@arachnys.com> wrote:
> >
> > > Hi JM
> > >
> > > I took a snapshot on the initial run, before the changes:
> > >
> >
> https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png
> > >
> > > Good timing, disks appear to be exploding (ATA errors) atm thus I'm
> > > decommissioning and reprovisioning with new disks.  I'll be
> > reprovisioning
> > > as without RAID (it's software RAID just to compound the issue)
> although
> > > not sure how I'll go about migrating all nodes.  I guess I'd need to
> put
> > > more correctly speced nodes in the rack and decommission the existing.
> > >  Makes diff. to
> > >
> > > We're using hetzner at the moment which may not have been a good
> choice.
> > >  Has anyone had any experience with them wrt. Hadoop?  They offer 7 and
> > 15
> > > disk options, but are low on the cpu front (quad core).  Our workload
> > will
> > > be I assume on the high side.  There's also a 8 disk Dell PowerEdge
> what
> > is
> > > a little more powerful.  What hosting providers would people
> recommended?
> > >  (And what would be the strategy for migrating?)
> > >
> > > Anyhow, when I have things more stable I'll have a look at checking out
> > > what's using the cpu.  In the mean time, can anything be gleamed from
> the
> > > above snap?
> > >
> > > Cheers
> > > H
> > >
> > >
> > > On 24 October 2013 15:14, Jean-Marc Spaggiari <jean-marc@spaggiari.org
> > >wrote:
> > >
> > >> Hi Harry,
> > >>
> > >> Do you have more details on the exact load? Can you run vmstats and
> see
> > >> what kind of load it is? Is it user? cpu? wio?
> > >>
> > >> I suspect your disks to be the issue. There is 2 things here.
> > >>
> > >> First, we don't recommend RAID for the HDFS/HBase disk. The best is to
> > >> simply mount the disks on 2 mounting points and give them to HDFS.
> > >> Second, 2 disks per not is very low. On a dev cluster is not even
> > >> recommended. In production, you should go with 12 or more.
> > >>
> > >> So with only 2 disks in RAID, I suspect your WIO to be high which is
> > what
> > >> might slow your process.
> > >>
> > >> Can you take a look on that direction? If it's not that, we will
> > continue
> > >> to investigate ;)
> > >>
> > >> Thanks,
> > >>
> > >> JM
> > >>
> > >>
> > >> 2013/10/23 Harry Waye <hw...@arachnys.com>
> > >>
> > >> > I'm trying to load data into hbase using HFileOutputFormat and
> > >> incremental
> > >> > bulk load but am getting rather lackluster performance, 10h for
> ~0.5TB
> > >> > data, ~50000 blocks.  This is being loaded into a table that has 2
> > >> > families, 9 columns, 2500 regions and is ~10TB in size.  Keys are
> md5
> > >> > hashes and regions are pretty evenly spread.  The majority of time
> > >> appears
> > >> > to be spend in the reduce phase, with the map phase completing very
> > >> > quickly.  The network doesn't appear to be saturated, but the load
> is
> > >> > consistently at 6 which is the number or reduce tasks per node.
> > >> >
> > >> > 12 hosts (6 cores, 2 disk as RAID0, 1GB eth, no one else on the
> rack).
> > >> >
> > >> > MR conf: 6 mappers, 6 reducers per node.
> > >> >
> > >> > I spoke to someone on IRC and they recommended reducing job output
> > >> > replication to 1, and reducing the number of mappers which I reduced
> > to
> > >> 2.
> > >> >  Reducing replication appeared not to make any difference, reducing
> > >> > reducers appeared just to slow the job down.  I'm going to have a
> look
> > >> at
> > >> > running the benchmarks mentioned on Michael Noll's blog and see what
> > >> that
> > >> > turns up.  I guess some questions I have are:
> > >> >
> > >> > How does the global number/size of blocks affect perf.?  (I have a
> lot
> > >> of
> > >> > 10mb files, which are the input files)
> > >> >
> > >> > How does the job local number/size of input blocks affect perf.?
> > >> >
> > >> > What is actually happening in the reduce phase that requires so much
> > >> CPU?
> > >> >  I assume the actual construction of HFiles isn't intensive.
> > >> >
> > >> > Ultimately, how can I improve performance?
> > >> > Thanks
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Harry Waye, Co-founder/CTO
> > > harry@arachnys.com
> > > +44 7890 734289
> > >
> > > Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
> > >
> > > ---
> > > Arachnys Information Services Limited is a company registered in
> England
> > &
> > > Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> > > Cambridge, CB1 1JX.
> > >
> >
> >
> >
> > --
> > Harry Waye, Co-founder/CTO
> > harry@arachnys.com
> > +44 7890 734289
> >
> > Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
> >
> > ---
> > Arachnys Information Services Limited is a company registered in England
> &
> > Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> > Cambridge, CB1 1JX.
> >
>



-- 
Harry Waye, Co-founder/CTO
harry@arachnys.com
+44 7890 734289

Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>

---
Arachnys Information Services Limited is a company registered in England &
Wales. Company number: 7269723. Registered office: 40 Clarendon St,
Cambridge, CB1 1JX.

Re: Optimizing bulk load performance

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Can you try vmstat 2? The 2 is the interval in seconds at which it will
refresh the output. In the extract here, almost nothing is running; only 8%
is used (1% disk IO, 6% user, 1% sys).

Run it on 2 or 3 different nodes while you are putting the load on the
cluster, and take a look at the last 4 numbers (us, sy, id, wa) to see what
the value of the last one is.
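
For reference, the four CPU columns at the end of the vmstat line Harry
posted map like this:

us sy id wa
 6  1 91  1

so the last number there (wa, time spent waiting on IO) is only 1.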

On the usercpu0 graph, which is the grey one showing high?

JM

2013/10/24 Harry Waye <hw...@arachnys.com>

> Ok I'm running a load job atm, I've add some possibly incomprehensible
> coloured lines to the graph: http://goo.gl/cUGCGG
>
> This is actually with one fewer nodes due to decommissioning to replace a
> disk, hence I guess the reason for one squiggly line showing no disk
> activity.  I've included only the cpu stats for CPU0 from each node.  The
> last graph should read "Memory Used".  vmstat from one of the nodes:
>
> procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
> wa
>  6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6  1
> 91  1
>
> To me the wait doesn't seem that high.  Job stats are
> http://goo.gl/ZYdUKp,  the job setup is
> https://gist.github.com/hazzadous/ac57a384f2ab685f07f6
>
> Does anything jump out at you?
>
> Cheers
> H
>
>
> On 24 October 2013 16:16, Harry Waye <hw...@arachnys.com> wrote:
>
> > Hi JM
> >
> > I took a snapshot on the initial run, before the changes:
> >
> https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png
> >
> > Good timing, disks appear to be exploding (ATA errors) atm thus I'm
> > decommissioning and reprovisioning with new disks.  I'll be
> reprovisioning
> > as without RAID (it's software RAID just to compound the issue) although
> > not sure how I'll go about migrating all nodes.  I guess I'd need to put
> > more correctly speced nodes in the rack and decommission the existing.
> >  Makes diff. to
> >
> > We're using hetzner at the moment which may not have been a good choice.
> >  Has anyone had any experience with them wrt. Hadoop?  They offer 7 and
> 15
> > disk options, but are low on the cpu front (quad core).  Our workload
> will
> > be I assume on the high side.  There's also a 8 disk Dell PowerEdge what
> is
> > a little more powerful.  What hosting providers would people recommended?
> >  (And what would be the strategy for migrating?)
> >
> > Anyhow, when I have things more stable I'll have a look at checking out
> > what's using the cpu.  In the mean time, can anything be gleamed from the
> > above snap?
> >
> > Cheers
> > H
> >
> >
> > On 24 October 2013 15:14, Jean-Marc Spaggiari <jean-marc@spaggiari.org
> >wrote:
> >
> >> Hi Harry,
> >>
> >> Do you have more details on the exact load? Can you run vmstats and see
> >> what kind of load it is? Is it user? cpu? wio?
> >>
> >> I suspect your disks to be the issue. There is 2 things here.
> >>
> >> First, we don't recommend RAID for the HDFS/HBase disk. The best is to
> >> simply mount the disks on 2 mounting points and give them to HDFS.
> >> Second, 2 disks per not is very low. On a dev cluster is not even
> >> recommended. In production, you should go with 12 or more.
> >>
> >> So with only 2 disks in RAID, I suspect your WIO to be high which is
> what
> >> might slow your process.
> >>
> >> Can you take a look on that direction? If it's not that, we will
> continue
> >> to investigate ;)
> >>
> >> Thanks,
> >>
> >> JM
> >>
> >>
> >> 2013/10/23 Harry Waye <hw...@arachnys.com>
> >>
> >> > I'm trying to load data into hbase using HFileOutputFormat and
> >> incremental
> >> > bulk load but am getting rather lackluster performance, 10h for ~0.5TB
> >> > data, ~50000 blocks.  This is being loaded into a table that has 2
> >> > families, 9 columns, 2500 regions and is ~10TB in size.  Keys are md5
> >> > hashes and regions are pretty evenly spread.  The majority of time
> >> appears
> >> > to be spend in the reduce phase, with the map phase completing very
> >> > quickly.  The network doesn't appear to be saturated, but the load is
> >> > consistently at 6 which is the number or reduce tasks per node.
> >> >
> >> > 12 hosts (6 cores, 2 disk as RAID0, 1GB eth, no one else on the rack).
> >> >
> >> > MR conf: 6 mappers, 6 reducers per node.
> >> >
> >> > I spoke to someone on IRC and they recommended reducing job output
> >> > replication to 1, and reducing the number of mappers which I reduced
> to
> >> 2.
> >> >  Reducing replication appeared not to make any difference, reducing
> >> > reducers appeared just to slow the job down.  I'm going to have a look
> >> at
> >> > running the benchmarks mentioned on Michael Noll's blog and see what
> >> that
> >> > turns up.  I guess some questions I have are:
> >> >
> >> > How does the global number/size of blocks affect perf.?  (I have a lot
> >> of
> >> > 10mb files, which are the input files)
> >> >
> >> > How does the job local number/size of input blocks affect perf.?
> >> >
> >> > What is actually happening in the reduce phase that requires so much
> >> CPU?
> >> >  I assume the actual construction of HFiles isn't intensive.
> >> >
> >> > Ultimately, how can I improve performance?
> >> > Thanks
> >> >
> >>
> >
> >
> >
> > --
> > Harry Waye, Co-founder/CTO
> > harry@arachnys.com
> > +44 7890 734289
> >
> > Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
> >
> > ---
> > Arachnys Information Services Limited is a company registered in England
> &
> > Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> > Cambridge, CB1 1JX.
> >
>
>
>
> --
> Harry Waye, Co-founder/CTO
> harry@arachnys.com
> +44 7890 734289
>
> Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
>
> ---
> Arachnys Information Services Limited is a company registered in England &
> Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> Cambridge, CB1 1JX.
>

Re: Optimizing bulk load performance

Posted by Harry Waye <hw...@arachnys.com>.
Ok I'm running a load job atm; I've added some possibly incomprehensible
coloured lines to the graph: http://goo.gl/cUGCGG

This is actually with one fewer node due to decommissioning to replace a
disk, hence I guess the reason for one squiggly line showing no disk
activity.  I've included only the CPU stats for CPU0 from each node.  The
last graph should read "Memory Used".  vmstat from one of the nodes:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 6  0      0 392448 524668 43823900    0    0   501  1044    0    0  6  1 91  1

To me the wait doesn't seem that high.  Job stats are
http://goo.gl/ZYdUKp,  the job setup is
https://gist.github.com/hazzadous/ac57a384f2ab685f07f6

Does anything jump out at you?

Cheers
H


On 24 October 2013 16:16, Harry Waye <hw...@arachnys.com> wrote:

> Hi JM
>
> I took a snapshot on the initial run, before the changes:
> https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png
>
> Good timing, disks appear to be exploding (ATA errors) atm thus I'm
> decommissioning and reprovisioning with new disks.  I'll be reprovisioning
> as without RAID (it's software RAID just to compound the issue) although
> not sure how I'll go about migrating all nodes.  I guess I'd need to put
> more correctly speced nodes in the rack and decommission the existing.
>  Makes diff. to
>
> We're using hetzner at the moment which may not have been a good choice.
>  Has anyone had any experience with them wrt. Hadoop?  They offer 7 and 15
> disk options, but are low on the cpu front (quad core).  Our workload will
> be I assume on the high side.  There's also a 8 disk Dell PowerEdge what is
> a little more powerful.  What hosting providers would people recommended?
>  (And what would be the strategy for migrating?)
>
> Anyhow, when I have things more stable I'll have a look at checking out
> what's using the cpu.  In the mean time, can anything be gleamed from the
> above snap?
>
> Cheers
> H
>
>
> On 24 October 2013 15:14, Jean-Marc Spaggiari <je...@spaggiari.org>wrote:
>
>> Hi Harry,
>>
>> Do you have more details on the exact load? Can you run vmstats and see
>> what kind of load it is? Is it user? cpu? wio?
>>
>> I suspect your disks to be the issue. There is 2 things here.
>>
>> First, we don't recommend RAID for the HDFS/HBase disk. The best is to
>> simply mount the disks on 2 mounting points and give them to HDFS.
>> Second, 2 disks per not is very low. On a dev cluster is not even
>> recommended. In production, you should go with 12 or more.
>>
>> So with only 2 disks in RAID, I suspect your WIO to be high which is what
>> might slow your process.
>>
>> Can you take a look on that direction? If it's not that, we will continue
>> to investigate ;)
>>
>> Thanks,
>>
>> JM
>>
>>
>> 2013/10/23 Harry Waye <hw...@arachnys.com>
>>
>> > I'm trying to load data into hbase using HFileOutputFormat and
>> incremental
>> > bulk load but am getting rather lackluster performance, 10h for ~0.5TB
>> > data, ~50000 blocks.  This is being loaded into a table that has 2
>> > families, 9 columns, 2500 regions and is ~10TB in size.  Keys are md5
>> > hashes and regions are pretty evenly spread.  The majority of time
>> appears
>> > to be spend in the reduce phase, with the map phase completing very
>> > quickly.  The network doesn't appear to be saturated, but the load is
>> > consistently at 6 which is the number or reduce tasks per node.
>> >
>> > 12 hosts (6 cores, 2 disk as RAID0, 1GB eth, no one else on the rack).
>> >
>> > MR conf: 6 mappers, 6 reducers per node.
>> >
>> > I spoke to someone on IRC and they recommended reducing job output
>> > replication to 1, and reducing the number of mappers which I reduced to
>> 2.
>> >  Reducing replication appeared not to make any difference, reducing
>> > reducers appeared just to slow the job down.  I'm going to have a look
>> at
>> > running the benchmarks mentioned on Michael Noll's blog and see what
>> that
>> > turns up.  I guess some questions I have are:
>> >
>> > How does the global number/size of blocks affect perf.?  (I have a lot
>> of
>> > 10mb files, which are the input files)
>> >
>> > How does the job local number/size of input blocks affect perf.?
>> >
>> > What is actually happening in the reduce phase that requires so much
>> CPU?
>> >  I assume the actual construction of HFiles isn't intensive.
>> >
>> > Ultimately, how can I improve performance?
>> > Thanks
>> >
>>
>
>
>
> --
> Harry Waye, Co-founder/CTO
> harry@arachnys.com
> +44 7890 734289
>
> Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>
>
> ---
> Arachnys Information Services Limited is a company registered in England &
> Wales. Company number: 7269723. Registered office: 40 Clarendon St,
> Cambridge, CB1 1JX.
>



-- 
Harry Waye, Co-founder/CTO
harry@arachnys.com
+44 7890 734289

Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>

---
Arachnys Information Services Limited is a company registered in England &
Wales. Company number: 7269723. Registered office: 40 Clarendon St,
Cambridge, CB1 1JX.

Re: Optimizing bulk load performance

Posted by Harry Waye <hw...@arachnys.com>.
Hi JM

I took a snapshot on the initial run, before the changes:
https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png

Good timing: disks appear to be exploding (ATA errors) atm, thus I'm
decommissioning and reprovisioning with new disks.  I'll be reprovisioning
without RAID (it's software RAID, just to compound the issue), although I'm
not sure how I'll go about migrating all nodes.  I guess I'd need to put
more correctly specced nodes in the rack and decommission the existing
ones.  Makes diff. to

We're using Hetzner at the moment, which may not have been a good choice.
 Has anyone had any experience with them wrt. Hadoop?  They offer 7 and 15
disk options, but are low on the CPU front (quad core).  Our workload will,
I assume, be on the high side.  There's also an 8 disk Dell PowerEdge that
is a little more powerful.  What hosting providers would people recommend?
 (And what would be the strategy for migrating?)

Anyhow, when I have things more stable I'll have a look at checking out
what's using the CPU.  In the meantime, can anything be gleaned from the
above snap?

Cheers
H


On 24 October 2013 15:14, Jean-Marc Spaggiari <je...@spaggiari.org>wrote:

> Hi Harry,
>
> Do you have more details on the exact load? Can you run vmstats and see
> what kind of load it is? Is it user? cpu? wio?
>
> I suspect your disks to be the issue. There is 2 things here.
>
> First, we don't recommend RAID for the HDFS/HBase disk. The best is to
> simply mount the disks on 2 mounting points and give them to HDFS.
> Second, 2 disks per not is very low. On a dev cluster is not even
> recommended. In production, you should go with 12 or more.
>
> So with only 2 disks in RAID, I suspect your WIO to be high which is what
> might slow your process.
>
> Can you take a look on that direction? If it's not that, we will continue
> to investigate ;)
>
> Thanks,
>
> JM
>
>
> 2013/10/23 Harry Waye <hw...@arachnys.com>
>
> > I'm trying to load data into hbase using HFileOutputFormat and
> incremental
> > bulk load but am getting rather lackluster performance, 10h for ~0.5TB
> > data, ~50000 blocks.  This is being loaded into a table that has 2
> > families, 9 columns, 2500 regions and is ~10TB in size.  Keys are md5
> > hashes and regions are pretty evenly spread.  The majority of time
> appears
> > to be spend in the reduce phase, with the map phase completing very
> > quickly.  The network doesn't appear to be saturated, but the load is
> > consistently at 6 which is the number or reduce tasks per node.
> >
> > 12 hosts (6 cores, 2 disk as RAID0, 1GB eth, no one else on the rack).
> >
> > MR conf: 6 mappers, 6 reducers per node.
> >
> > I spoke to someone on IRC and they recommended reducing job output
> > replication to 1, and reducing the number of mappers which I reduced to
> 2.
> >  Reducing replication appeared not to make any difference, reducing
> > reducers appeared just to slow the job down.  I'm going to have a look at
> > running the benchmarks mentioned on Michael Noll's blog and see what that
> > turns up.  I guess some questions I have are:
> >
> > How does the global number/size of blocks affect perf.?  (I have a lot of
> > 10mb files, which are the input files)
> >
> > How does the job local number/size of input blocks affect perf.?
> >
> > What is actually happening in the reduce phase that requires so much CPU?
> >  I assume the actual construction of HFiles isn't intensive.
> >
> > Ultimately, how can I improve performance?
> > Thanks
> >
>



-- 
Harry Waye, Co-founder/CTO
harry@arachnys.com
+44 7890 734289

Follow us on Twitter: @arachnys <https://twitter.com/#!/arachnys>

---
Arachnys Information Services Limited is a company registered in England &
Wales. Company number: 7269723. Registered office: 40 Clarendon St,
Cambridge, CB1 1JX.

Re: Optimizing bulk load performance

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Harry,

Do you have more details on the exact load? Can you run vmstat and see
what kind of load it is? Is it user? cpu? wio?

I suspect your disks to be the issue. There are 2 things here.

First, we don't recommend RAID for the HDFS/HBase disks. The best is to
simply mount the disks on 2 mount points and give them both to HDFS.
Second, 2 disks per node is very low. Even on a dev cluster that's not
recommended. In production, you should go with 12 or more.
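
For illustration, the JBOD layout would look something like this (the
/mnt/disk1 and /mnt/disk2 paths are just placeholders for wherever the two
disks end up mounted, not your actual paths):

<!-- hdfs-site.xml: one entry per physical disk, no RAID underneath -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
</property>

<!-- mapred-site.xml: spread MapReduce spill space over the same disks -->
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/disk1/mapred/local,/mnt/disk2/mapred/local</value>
</property>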

So with only 2 disks in RAID, I suspect your WIO is high, which is what
might be slowing your process.

Can you take a look in that direction? If it's not that, we will continue
to investigate ;)

Thanks,

JM


2013/10/23 Harry Waye <hw...@arachnys.com>

> I'm trying to load data into hbase using HFileOutputFormat and incremental
> bulk load but am getting rather lackluster performance, 10h for ~0.5TB
> data, ~50000 blocks.  This is being loaded into a table that has 2
> families, 9 columns, 2500 regions and is ~10TB in size.  Keys are md5
> hashes and regions are pretty evenly spread.  The majority of time appears
> to be spend in the reduce phase, with the map phase completing very
> quickly.  The network doesn't appear to be saturated, but the load is
> consistently at 6 which is the number or reduce tasks per node.
>
> 12 hosts (6 cores, 2 disk as RAID0, 1GB eth, no one else on the rack).
>
> MR conf: 6 mappers, 6 reducers per node.
>
> I spoke to someone on IRC and they recommended reducing job output
> replication to 1, and reducing the number of mappers which I reduced to 2.
>  Reducing replication appeared not to make any difference, reducing
> reducers appeared just to slow the job down.  I'm going to have a look at
> running the benchmarks mentioned on Michael Noll's blog and see what that
> turns up.  I guess some questions I have are:
>
> How does the global number/size of blocks affect perf.?  (I have a lot of
> 10mb files, which are the input files)
>
> How does the job local number/size of input blocks affect perf.?
>
> What is actually happening in the reduce phase that requires so much CPU?
>  I assume the actual construction of HFiles isn't intensive.
>
> Ultimately, how can I improve performance?
> Thanks
>

Re: Optimizing bulk load performance

Posted by Premal Shah <pr...@gmail.com>.
Hi Harry,
I'm currently working on a MapReduce job which also involves incremental
bulk load using HFileOutputFormat, and I see similar performance in the
reduce phase. I believe this is the reason: the KeyValues have to be sorted
before being written to the HFiles, so the job uses a
TotalOrderPartitioner<http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapred/lib/TotalOrderPartitioner.html>
to route each region's key range to one reducer, which then sorts your map
output. Depending on how much data there is to sort and the allocated
memory, that sorting can be a performance bottleneck. The number of
reducers = number of regions, and that cannot be overridden in the job
config. I guess this is related to your issue.
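
To make that concrete, here's a rough sketch of how such a job is usually
wired up (HBase 0.94-era APIs assumed; the table name, column family and
tab-separated input format below are just placeholders, not Harry's actual
setup). The configureIncrementalLoad() call is what installs the
TotalOrderPartitioner and pins the reducer count to the number of regions:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadJob {

  // Placeholder mapper: assumes "rowkey<TAB>value" input lines and a single
  // column family "f"; emits (rowkey, KeyValue) pairs for the reducers.
  static class TsvToKeyValueMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("f"),
          Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hfile-bulk-load");
    job.setJarByClass(BulkLoadJob.class);

    job.setMapperClass(TsvToKeyValueMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Installs TotalOrderPartitioner over the table's region boundaries,
    // sets KeyValueSortReducer, and sets numReduceTasks = number of regions.
    HTable table = new HTable(conf, "my_table");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Move the finished HFiles into the regions.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    }
  }
}

So the expensive part of the reduce phase is mostly the sort/merge of the
KeyValues destined for each region, rather than the HFile writing itself.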

Hope this helps.



On Wed, Oct 23, 2013 at 7:57 AM, Harry Waye <hw...@arachnys.com> wrote:

> I'm trying to load data into hbase using HFileOutputFormat and incremental
> bulk load but am getting rather lackluster performance, 10h for ~0.5TB
> data, ~50000 blocks.  This is being loaded into a table that has 2
> families, 9 columns, 2500 regions and is ~10TB in size.  Keys are md5
> hashes and regions are pretty evenly spread.  The majority of time appears
> to be spend in the reduce phase, with the map phase completing very
> quickly.  The network doesn't appear to be saturated, but the load is
> consistently at 6 which is the number or reduce tasks per node.
>
> 12 hosts (6 cores, 2 disk as RAID0, 1GB eth, no one else on the rack).
>
> MR conf: 6 mappers, 6 reducers per node.
>
> I spoke to someone on IRC and they recommended reducing job output
> replication to 1, and reducing the number of mappers which I reduced to 2.
>  Reducing replication appeared not to make any difference, reducing
> reducers appeared just to slow the job down.  I'm going to have a look at
> running the benchmarks mentioned on Michael Noll's blog and see what that
> turns up.  I guess some questions I have are:
>
> How does the global number/size of blocks affect perf.?  (I have a lot of
> 10mb files, which are the input files)
>
> How does the job local number/size of input blocks affect perf.?
>
> What is actually happening in the reduce phase that requires so much CPU?
>  I assume the actual construction of HFiles isn't intensive.
>
> Ultimately, how can I improve performance?
> Thanks
>



-- 
Regards,
Premal Shah.