Posted to dev@zookeeper.apache.org by Mahadev Konar <ma...@yahoo-inc.com> on 2009/11/07 03:40:01 UTC
ApacheCon 2009 Meetup talk.
Hi all,
I gave a brief overview of ZooKeeper and BookKeeper at the ApacheCon
meetup this week. The talk is uploaded at
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations
In case you guys are interested.
Thanks
mahadev
Re: ZK on EC2
Posted by Ted Dunning <te...@gmail.com>.
Several of our search engines use pretty large heaps (12-24GB). That means
that if they *ever* do a full collection, disaster ensues because it can
take so long.
That means that we have to use concurrent collectors as much as possible and
make sure that the concurrent collectors get all the ephemeral garbage. One
server, for instance, uses the following java options:
-verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
These options give us lots of detail about what is happening in the
collections. Most importantly, we need to know that the tenuring
distribution never has any significant tail of objects that might survive
into the space that will cause a full collection. This is pretty safe in
general because our servers either create objects to respond to a single
request or create cached items that survive essentially forever.
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
Concurrent collectors are critical. We use the HBase recommendations here.
-XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6
Max tenuring threshold is related to what we saw on the tenuring
distribution. We very rarely see any objects last 4 collections so we set
it so that it would have to last two more collections in order to become
tenured. The survivor ratio is related to this and is set based on
recommendations for non-stop, low latency servers.
-XX:CMSInitiatingOccupancyFraction=60
-XX:+UseCMSInitiatingOccupancyOnly
CMS collections have a couple of ways to be triggered. We limit it to a
single way to make the world simpler. Again, this is taken from outside
recommendations from the HBase folks and other commentators on the web.
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC
I doubt that these are important. It is always nice to get more information
and I want to avoid any possibility of some library triggering a huge
collection.
-XX:ParallelGCThreads=8
If the parallel GC needs horsepower, I want it to get it.
-Xdebug
Very rarely useful, but a royal pain if not installed. I don't know if it
has a performance impact (I think not).
-Xms8000m -Xmx8000m
Setting the minimum heap equal to the maximum helps avoid full GCs during
the early life of the server.
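Pulling the flags above together, a launch command might look like the sketch
below. This is a reconstruction, not Ted's actual script: the classpath, jar
name, and main class are illustrative placeholders, and heap sizes should be
tuned to your own hardware.

```shell
#!/bin/sh
# Illustrative launcher combining the GC flags discussed above.
# The classpath and main class are placeholders for your own setup.
exec java \
  -verbose:gc \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -XX:+PrintTenuringDistribution \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6 \
  -XX:CMSInitiatingOccupancyFraction=60 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC \
  -XX:ParallelGCThreads=8 \
  -Xdebug \
  -Xms8000m -Xmx8000m \
  -cp zookeeper.jar:conf org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg
```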
On Tue, Nov 10, 2009 at 11:27 AM, Patrick Hunt <ph...@apache.org> wrote:
> Can you elaborate on "gc tuning" - you are using the incremental collector?
>
> Patrick
>
>
> Ted Dunning wrote:
>
>> The server side is a fairly standard (but old) config:
>>
>> tickTime=2000
>> dataDir=/home/zookeeper/
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>>
>> Most of our clients now use 5 seconds as the timeout, but I think that we
>> went to longer timeouts in the past. Without digging in to determine the
>> truth of the matter, my guess is that we needed the longer timeouts before
>> we tuned the GC parameters and that after tuning GC, we were able to return
>> to a more reasonable timeout. In retrospect, I think that we blamed EC2 for
>> some of our own GC misconfiguration.
>>
>> I would not use our configuration here as canonical since we didn't apply a
>> whole lot of brainpower to this problem.
>>
>> On Tue, Nov 10, 2009 at 9:29 AM, Patrick Hunt <ph...@apache.org> wrote:
>>
>>> Ted, could you provide your configuration information for the cluster (incl
>>> the client timeout you use), if you're willing I'd be happy to put this up
>>> on the wiki for others interested in running in EC2.
>>>
>>>
>>
>>
>>
--
Ted Dunning, CTO
DeepDyve
Re: ZK on EC2
Posted by Patrick Hunt <ph...@apache.org>.
Can you elaborate on "gc tuning" - you are using the incremental collector?
Patrick
Ted Dunning wrote:
> The server side is a fairly standard (but old) config:
>
> tickTime=2000
> dataDir=/home/zookeeper/
> clientPort=2181
> initLimit=5
> syncLimit=2
>
> Most of our clients now use 5 seconds as the timeout, but I think that we
> went to longer timeouts in the past. Without digging in to determine the
> truth of the matter, my guess is that we needed the longer timeouts before
> we tuned the GC parameters and that after tuning GC, we were able to return
> to a more reasonable timeout. In retrospect, I think that we blamed EC2 for
> some of our own GC misconfiguration.
>
> I would not use our configuration here as canonical since we didn't apply a
> whole lot of brainpower to this problem.
>
> On Tue, Nov 10, 2009 at 9:29 AM, Patrick Hunt <ph...@apache.org> wrote:
>
>> Ted, could you provide your configuration information for the cluster (incl
>> the client timeout you use), if you're willing I'd be happy to put this up
>> on the wiki for others interested in running in EC2.
>>
>
>
>
Re: ZK on EC2
Posted by Ted Dunning <te...@gmail.com>.
The server side is a fairly standard (but old) config:
tickTime=2000
dataDir=/home/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
Most of our clients now use 5 seconds as the timeout, but I think that we
went to longer timeouts in the past. Without digging in to determine the
truth of the matter, my guess is that we needed the longer timeouts before
we tuned the GC parameters and that after tuning GC, we were able to return
to a more reasonable timeout. In retrospect, I think that we blamed EC2 for
some of our own GC misconfiguration.
I would not use our configuration here as canonical since we didn't apply a
whole lot of brainpower to this problem.
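For reference, ZooKeeper clamps a client-requested session timeout to the
range [2 x tickTime, 20 x tickTime], so the config above implies which of the
timeouts mentioned in this thread the server will actually honor. A quick
arithmetic check:

```shell
#!/bin/sh
# ZooKeeper negotiates session timeouts into [2 * tickTime, 20 * tickTime].
# With the tickTime=2000 (ms) config above:
TICK_TIME=2000
MIN_TIMEOUT=$((2 * TICK_TIME))    # 4000 ms
MAX_TIMEOUT=$((20 * TICK_TIME))   # 40000 ms
echo "session timeout range: ${MIN_TIMEOUT}-${MAX_TIMEOUT} ms"
# A 5 s client timeout (5000 ms) falls inside this range, as do the
# 10-30 s values mentioned elsewhere in the thread.
```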
On Tue, Nov 10, 2009 at 9:29 AM, Patrick Hunt <ph...@apache.org> wrote:
> Ted, could you provide your configuration information for the cluster (incl
> the client timeout you use), if you're willing I'd be happy to put this up
> on the wiki for others interested in running in EC2.
>
--
Ted Dunning, CTO
DeepDyve
Re: ZK on EC2
Posted by Patrick Hunt <ph...@apache.org>.
Ok, good. Based on the comparison of perf numbers, and Ted's experience
with large instances on ec2 running zk, this makes sense to me. A large
is about half (very roughly) the horsepower of what I was using for my
tests. Ted mentioned that he hasn't seen issues on ec2 running with
large instances and that correlates to these numbers (again, this is all
rough back of the envelope type stuff but good enough imo).
Anyone have a small that they could run the same cpu/disk/network tests?
I'd be interested to see how that stacks up.
Ted, could you provide your configuration information for the cluster
(incl the client timeout you use), if you're willing I'd be happy to put
this up on the wiki for others interested in running in EC2.
Thanks!
Patrick
Ted Dunning wrote:
> I only have one large instance live. My impression from previous experience
> is that between-host bandwidth is generally about what you saw. We have been able
> to sustain 20-30MB/s into EC2 to a single node which should be harder than
> moving data between nodes. I have heard rumors that others were able to get
> double what I got for incoming transfer.
>
> On Mon, Nov 9, 2009 at 9:47 PM, Patrick Hunt <ph...@apache.org> wrote:
>
>> Could you test networking - scping data between hosts? (I was seeing
>> 64.1MB/s for a 512mb file - the one created by dd, random data)
>>
>
>
>
Re: ZK on EC2
Posted by Ted Dunning <te...@gmail.com>.
I only have one large instance live. My impression from previous experience is
that between-host bandwidth is generally about what you saw. We have been able
to sustain 20-30MB/s into EC2 to a single node which should be harder than
moving data between nodes. I have heard rumors that others were able to get
double what I got for incoming transfer.
On Mon, Nov 9, 2009 at 9:47 PM, Patrick Hunt <ph...@apache.org> wrote:
> Could you test networking - scping data between hosts? (I was seeing
> 64.1MB/s for a 512mb file - the one created by dd, random data)
>
--
Ted Dunning, CTO
DeepDyve
Re: ZK on EC2
Posted by Patrick Hunt <ph...@apache.org>.
Interesting, so comparing a large (4 cores and "high" I/O performance)
ec2 instance (the first number on each line below) vs the host I used in
the latency test (the second number on each line):
ebs cache 817 vs 11532 ~ 7% (ec2 7% as performant)
ebs bufread 53 vs 88 ~ 60%
native cache 829 vs 11532 ~ 7%
native bufread 80 vs 88 ~ 90%
dd 512m 106s vs 74s ~ 43% longer for ec2 large
md5sum 512m 2.13s vs 1.5 ~ 42% longer
Good thing we don't rely on disk cache. ;-) Raw processing power looks
about half. Could you test networking - scping data between hosts? (I
was seeing 64.1MB/s for a 512mb file - the one created by dd, random data)
Small anyone?
Patrick
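The dd/md5sum portion of the comparison can be reproduced with a short script
like the one below. It is a sketch of the same steps: it uses a small 16 MB
file so it finishes quickly; for numbers comparable to the thread, bump the
count to 1050 (roughly 512 MB) as in Ted's run.

```shell
#!/bin/sh
# Rough disk/CPU micro-benchmark along the lines used in this thread.
# 32 * 512000 bytes = ~16 MB; the thread used count=1050 (~512 MB).
TESTFILE=/tmp/memtest
dd if=/dev/urandom bs=512000 of="$TESTFILE" count=32 2>/dev/null
# CPU/disk-cache check: hash the file a few times, as in the thread.
time md5sum "$TESTFILE"
time md5sum "$TESTFILE"
# Network check: scp the same file between hosts and note the MB/s reported.
# ("other-host" is a placeholder for a second instance.)
# scp "$TESTFILE" other-host:/tmp/
rm -f "$TESTFILE"
```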
Ted Dunning wrote:
> /dev/sdp is an EBS volume. /dev/sdb is a native volume.
>
> This is a large instance.
>
> root@domU-<mumble>#:~# hdparm -tT /dev/sdp
>
> /dev/sdp:
> Timing cached reads: 1634 MB in 2.00 seconds = 817.30 MB/sec
> Timing buffered disk reads: 160 MB in 3.00 seconds = 53.27 MB/sec
> root@domU-<mumble>:~# hdparm -tT /dev/sdb
>
> /dev/sdb:
> Timing cached reads: 1658 MB in 2.00 seconds = 829.44 MB/sec
> Timing buffered disk reads: 242 MB in 3.00 seconds = 80.56 MB/sec
> root@domU-<mumble>:~# time dd if=/dev/urandom bs=512000 of=/tmp/memtest
> count=1050
> 1050+0 records in
> 1050+0 records out
> 537600000 bytes (538 MB) copied, 106.525 s, 5.0 MB/s
>
> real 1m46.517s
> user 0m0.000s
> sys 1m46.127s
> root@domU-<mumble>:~# time md5sum /tmp/memtest; time md5sum /tmp/memtest;
> time md5sum /tmp/memtest
> f79304f68ce04011ca0aebfbd548134a /tmp/memtest
>
> real 0m2.234s
> user 0m1.613s
> sys 0m0.590s
> f79304f68ce04011ca0aebfbd548134a /tmp/memtest
>
> real 0m2.136s
> user 0m1.560s
> sys 0m0.584s
> f79304f68ce04011ca0aebfbd548134a /tmp/memtest
>
> real 0m2.123s
> user 0m1.640s
> sys 0m0.481s
> root@domU-<mumble>:~#
>
>
> On Mon, Nov 9, 2009 at 4:54 PM, Patrick Hunt <ph...@apache.org> wrote:
>
>> I'm really interested to know how ec2 compares wrt disk and network
>> performance to what I've documented here under the "hardware" section:
>> http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview#Hardware
>>
>> Is it possible for someone to compare the network and disk performance
>> (scp, dd, md5sum, etc...) that I document in the wiki page on say, EC2
>> small/large nodes? I'd do it myself but I've not used ec2. If anyone could
>> try these and report I'd appreciate it.
>>
>> Patrick
>>
>>
>> Ted Dunning wrote:
>>
>>> Worked pretty well for me. We did extend all of our timeouts. The
>>> biggest
>>> worry for us was timeouts on the client side. The ZK server side was no
>>> problem in that respect.
>>>
>>> On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao <ju...@almaden.ibm.com> wrote:
>>>
>>>> Has anyone deployed ZK on EC2? What's the experience there? Are there more
>>>> timeouts, leader re-election, etc? Thanks,
>>>>
>>>> Jun
>>>> IBM Almaden Research Center
>>>> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>>>>
>>>> junrao@almaden.ibm.com
>>>>
>>>
>>>
>>>
>>>
>
>
Re: ZK on EC2
Posted by Ted Dunning <te...@gmail.com>.
/dev/sdp is an EBS volume. /dev/sdb is a native volume.
This is a large instance.
root@domU-<mumble>#:~# hdparm -tT /dev/sdp
/dev/sdp:
Timing cached reads: 1634 MB in 2.00 seconds = 817.30 MB/sec
Timing buffered disk reads: 160 MB in 3.00 seconds = 53.27 MB/sec
root@domU-<mumble>:~# hdparm -tT /dev/sdb
/dev/sdb:
Timing cached reads: 1658 MB in 2.00 seconds = 829.44 MB/sec
Timing buffered disk reads: 242 MB in 3.00 seconds = 80.56 MB/sec
root@domU-<mumble>:~# time dd if=/dev/urandom bs=512000 of=/tmp/memtest
count=1050
1050+0 records in
1050+0 records out
537600000 bytes (538 MB) copied, 106.525 s, 5.0 MB/s
real 1m46.517s
user 0m0.000s
sys 1m46.127s
root@domU-<mumble>:~# time md5sum /tmp/memtest; time md5sum /tmp/memtest;
time md5sum /tmp/memtest
f79304f68ce04011ca0aebfbd548134a /tmp/memtest
real 0m2.234s
user 0m1.613s
sys 0m0.590s
f79304f68ce04011ca0aebfbd548134a /tmp/memtest
real 0m2.136s
user 0m1.560s
sys 0m0.584s
f79304f68ce04011ca0aebfbd548134a /tmp/memtest
real 0m2.123s
user 0m1.640s
sys 0m0.481s
root@domU-<mumble>:~#
On Mon, Nov 9, 2009 at 4:54 PM, Patrick Hunt <ph...@apache.org> wrote:
> I'm really interested to know how ec2 compares wrt disk and network
> performance to what I've documented here under the "hardware" section:
> http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview#Hardware
>
> Is it possible for someone to compare the network and disk performance
> (scp, dd, md5sum, etc...) that I document in the wiki page on say, EC2
> small/large nodes? I'd do it myself but I've not used ec2. If anyone could
> try these and report I'd appreciate it.
>
> Patrick
>
>
> Ted Dunning wrote:
>
>> Worked pretty well for me. We did extend all of our timeouts. The
>> biggest
>> worry for us was timeouts on the client side. The ZK server side was no
>> problem in that respect.
>>
>> On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao <ju...@almaden.ibm.com> wrote:
>>
>>> Has anyone deployed ZK on EC2? What's the experience there? Are there more
>>> timeouts, leader re-election, etc? Thanks,
>>>
>>> Jun
>>> IBM Almaden Research Center
>>> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>>>
>>> junrao@almaden.ibm.com
>>>
>>
>>
>>
>>
>>
--
Ted Dunning, CTO
DeepDyve
Re: ZK on EC2
Posted by Patrick Hunt <ph...@apache.org>.
I'm really interested to know how ec2 compares wrt disk and network
performance to what I've documented here under the "hardware" section:
http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview#Hardware
Is it possible for someone to compare the network and disk performance
(scp, dd, md5sum, etc...) that I document in the wiki page on say, EC2
small/large nodes? I'd do it myself but I've not used ec2. If anyone
could try these and report I'd appreciate it.
Patrick
Ted Dunning wrote:
> Worked pretty well for me. We did extend all of our timeouts. The biggest
> worry for us was timeouts on the client side. The ZK server side was no
> problem in that respect.
>
> On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao <ju...@almaden.ibm.com> wrote:
>
>> Has anyone deployed ZK on EC2? What's the experience there? Are there more
>> timeouts, leader re-election, etc? Thanks,
>>
>> Jun
>> IBM Almaden Research Center
>> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>>
>> junrao@almaden.ibm.com
>
>
>
>
Re: ZK on EC2
Posted by Ted Dunning <te...@gmail.com>.
10-30s at different times. Not sure what the final numbers were.
On Mon, Nov 9, 2009 at 4:39 PM, Jun Rao <ju...@almaden.ibm.com> wrote:
> Thanks, Ted.
>
> How long did you set the client timeout?
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>
> junrao@almaden.ibm.com
>
>
> Ted Dunning <te...@gmail.com> wrote on 11/09/2009 04:24:16 PM:
>
> > Worked pretty well for me. We did extend all of our timeouts. The biggest
> > worry for us was timeouts on the client side. The ZK server side was no
> > problem in that respect.
> >
> > On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao <ju...@almaden.ibm.com> wrote:
> >
> > > Has anyone deployed ZK on EC2? What's the experience there? Are there more
> > > timeouts, leader re-election, etc? Thanks,
> > >
> > > Jun
> > > IBM Almaden Research Center
> > > K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> > >
> > > junrao@almaden.ibm.com
> >
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
>
--
Ted Dunning, CTO
DeepDyve
Re: ZK on EC2
Posted by Jun Rao <ju...@almaden.ibm.com>.
Thanks, Ted.
How long did you set the client timeout?
Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
junrao@almaden.ibm.com
Ted Dunning <te...@gmail.com> wrote on 11/09/2009 04:24:16 PM:
> Worked pretty well for me. We did extend all of our timeouts. The biggest
> worry for us was timeouts on the client side. The ZK server side was no
> problem in that respect.
>
> On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao <ju...@almaden.ibm.com> wrote:
>
> > Has anyone deployed ZK on EC2? What's the experience there? Are there more
> > timeouts, leader re-election, etc? Thanks,
> >
> > Jun
> > IBM Almaden Research Center
> > K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> >
> > junrao@almaden.ibm.com
>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
Re: ZK on EC2
Posted by Ted Dunning <te...@gmail.com>.
Worked pretty well for me. We did extend all of our timeouts. The biggest
worry for us was timeouts on the client side. The ZK server side was no
problem in that respect.
On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao <ju...@almaden.ibm.com> wrote:
> Has anyone deployed ZK on EC2? What's the experience there? Are there more
> timeouts, leader re-election, etc? Thanks,
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>
> junrao@almaden.ibm.com
--
Ted Dunning, CTO
DeepDyve
ZK on EC2
Posted by Jun Rao <ju...@almaden.ibm.com>.
Has anyone deployed ZK on EC2? What's the experience there? Are there more
timeouts, leader re-election, etc? Thanks,
Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
junrao@almaden.ibm.com