Posted to user@zookeeper.apache.org by N Keywal <nk...@gmail.com> on 2012/02/14 17:00:41 UTC

sync vs. async vs. multi performances

Hi,

I've run a test with ZooKeeper 3.4.2 to compare the performance of the
synchronous, asynchronous, and multi APIs when creating znodes (variations
on calling zk.create("/dummyTest", "dummy".getBytes(),
ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); 10000 times). The code
is at the end of the mail.

I've tested different environments:
- 1 linux server with the client and 1 zookeeper node on the same machine
- 1 linux server for the client, 1 for 1 zookeeper node.
- 6 linux servers, 1 for the client, 5 for 5 zookeeper nodes.

The servers are mid-range, with 4*2 cores and JDK 1.6. ZK was on its own HD.

The results are comparable in all three environments:

Using the sync API, it takes 200 seconds for 10K creations, so around 0.02
second per call.
Using the async API, it takes 2 seconds for 10K (including waiting for the
last callback).
Using "multi", available since 3.4, it takes less than 1 second, again
for 10K.
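
To put the totals above on a per-operation footing, here is a small sketch of the arithmetic. The class and method names are made up for illustration; the inputs are the totals reported above (multi is left out because only an upper bound of 1 second is given). It works out to about 20 ms and 50 operations per second for sync, and about 0.2 ms and 5000 operations per second for async.

```java
final class PerOpCost {
  /** Average latency per operation, in milliseconds. */
  static double millisPerOp(double totalSeconds, int ops) {
    return totalSeconds * 1000.0 / ops;
  }

  /** Operations completed per second. */
  static double opsPerSecond(double totalSeconds, int ops) {
    return ops / totalSeconds;
  }

  public static void main(String[] args) {
    int ops = 10000; // number of creations in each test run
    System.out.println("sync:  " + millisPerOp(200, ops) + " ms/op, "
        + opsPerSecond(200, ops) + " ops/s");
    System.out.println("async: " + millisPerOp(2, ops) + " ms/op, "
        + opsPerSecond(2, ops) + " ops/s");
  }
}
```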

I'm surprised by the time taken by the sync operations; I was not expecting
them to be that slow. The gap between async and sync is huge.

Is this expected? ZooKeeper is used in critical functions in
Hadoop/HBase. I was looking at the possible benefits of using "multi", but
the gain seems small compared to what async already brings (well, ~3 times
faster :-). There are many small data creations/deletions with the sync API
in the existing HBase algorithms, and it would not be simple to replace them
all with asynchronous calls...

Cheers,

N.

--

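import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZookeeperTest {
  static ZooKeeper zk;
  static int nbTests = 10000;

  private ZookeeperTest() {
  }

  // Sync API: each create() blocks until the server has answered.
  public static void test11() throws Exception {
    for (int i = 0; i < nbTests; ++i) {
      zk.create("/dummyTest_" + i, "dummy".getBytes(),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
  }

  // Async API: queue all the creates, then wait for the last callback.
  public static void test51() throws Exception {
    final AtomicInteger counter = new AtomicInteger(0);

    for (int i = 0; i < nbTests; ++i) {
      zk.create("/dummyTest_" + i, "dummy".getBytes(),
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT,
          new AsyncCallback.StringCallback() {
            // The result code rc is not checked, so failed creates are counted too.
            public void processResult(int rc, String path, Object ctx, String name) {
              counter.incrementAndGet();
            }
          },
          null);
    }

    // Poll until every callback has fired.
    while (counter.get() != nbTests) {
      Thread.sleep(1);
    }
  }

  // Multi API: send all the creates as a single request.
  public static void test41() throws Exception {
    ArrayList<Op> ops = new ArrayList<Op>(nbTests);
    for (int i = 0; i < nbTests; ++i) {
      ops.add(
        Op.create("/dummyTest_" + i, "dummy".getBytes(),
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)
      );
    }

    zk.multi(ops);
  }

  // Cleanup: deletes the znodes created by any of the tests above.
  public static void delete() throws Exception {
    ArrayList<Op> ops = new ArrayList<Op>(nbTests);

    for (int i = 0; i < nbTests; ++i) {
      ops.add(
        Op.delete("/dummyTest_" + i, -1)  // -1 matches any version
      );
    }

    zk.multi(ops);
  }

  // Looks up the public static method named testName and times one run of it.
  public static void test(String connection, String testName) throws Throwable {
    Method m = ZookeeperTest.class.getMethod(testName);

    zk = new ZooKeeper(connection, 20000, new Watcher() {
      public void process(WatchedEvent watchedEvent) {
      }
    });

    final long start = System.currentTimeMillis();

    try {
      m.invoke(null);
    } catch (InvocationTargetException e) {
      // Rethrow the exception raised by the test itself.
      throw e.getTargetException();
    }

    final long end = System.currentTimeMillis();

    zk.close();

    // Elapsed time in milliseconds, excluding the close.
    System.out.println(testName + ":  ExeTime= " + (end - start));
  }

  // Usage: ZookeeperTest <connection string> <test11|test51|test41|delete>
  public static void main(String... args) throws Throwable {
    test(args[0], args[1]);
  }
}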
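
The async test above waits for the last callback by polling a counter with Thread.sleep(1). One possible alternative, sketched here under assumptions, is a CountDownLatch that also records failed result codes. This is not part of the original test: ResultCallback is a hypothetical stand-in for AsyncCallback.StringCallback, and a single-threaded executor plays the role of the ZooKeeper client's event thread so that the sketch runs without a server. The made-up rule that every 1000th operation fails is only there to exercise the failure counter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchWaitSketch {
  /** Hypothetical stand-in for an async callback such as StringCallback. */
  interface ResultCallback {
    void processResult(int rc);
  }

  /**
   * Submits nbOps simulated async operations, blocks until every callback
   * has fired, and returns how many reported a non-zero result code.
   */
  static int runAndWait(int nbOps, long timeoutSeconds) {
    final CountDownLatch done = new CountDownLatch(nbOps);
    final AtomicInteger failures = new AtomicInteger(0);
    final ResultCallback callback = new ResultCallback() {
      public void processResult(int rc) {
        if (rc != 0) {
          failures.incrementAndGet();
        }
        done.countDown();
      }
    };

    // Stand-in for the client's event thread (an assumption of this sketch).
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    try {
      for (int i = 0; i < nbOps; ++i) {
        // Made-up failure pattern: every 1000th operation returns an error code.
        final int rc = (i % 1000 == 999) ? -1 : 0;
        eventThread.execute(new Runnable() {
          public void run() {
            callback.processResult(rc);
          }
        });
      }
      // Block without polling; give up after the timeout.
      if (!done.await(timeoutSeconds, TimeUnit.SECONDS)) {
        throw new IllegalStateException("timed out waiting for callbacks");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    } finally {
      eventThread.shutdown();
    }
    return failures.get();
  }

  public static void main(String[] args) {
    System.out.println("failed callbacks: " + runAndWait(10000, 30));
  }
}
```

Whether this changes the measured time in any meaningful way is untested; the main gain is that a failed create no longer counts silently as a success.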

Re: sync vs. async vs. multi performances

Posted by N Keywal <nk...@gmail.com>.
Yes, but it's dedicated to ZK (the OS and tmp are on another disk).

On Tue, Feb 14, 2012 at 5:11 PM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> From your description, it sounds like you're using one disk device for the
> ZooKeeper server. Is it correct?
>
> -Flavio

Re: sync vs. async vs. multi performances

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
From your description, it sounds like you're using one disk device for the ZooKeeper server. Is it correct?

-Flavio


flavio
junqueira
 
research scientist
 
fpj@yahoo-inc.com
direct +34 93-183-8828
 
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301


Re: sync vs. async vs. multi performances

Posted by Ted Dunning <te...@gmail.com>.
Just as an arithmetic check, hundreds x 20 ms = seconds, not minutes.  Even
1000 x 0.02 s = 20 s which isn't all that long.  Faster is nice, but this
doesn't reach "minutes".  And as Flavio points out, if the recovery is
threaded, it will be faster.
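
A small sketch of this arithmetic check, assuming strictly serial writes at 20 ms each. The class name, method names, and the writes-per-region parameter are illustrative and not taken from HBase. With these figures, 1000 writes take 20 seconds, and it takes about 3000 serial writes to reach one minute.

```java
final class SerialRecoveryEstimate {
  /** Total time in milliseconds when every write waits for the previous one. */
  static long serialMillis(int regions, int writesPerRegion, long millisPerWrite) {
    return (long) regions * writesPerRegion * millisPerWrite;
  }

  /** Number of serial writes needed before the total reaches budgetMillis. */
  static long writesToReach(long budgetMillis, long millisPerWrite) {
    return (budgetMillis + millisPerWrite - 1) / millisPerWrite; // ceiling division
  }

  public static void main(String[] args) {
    long millisPerWrite = 20; // 0.02 s per sync call, as measured in the thread
    for (int regions : new int[] {100, 300, 1000}) {
      System.out.println(regions + " regions, 1 write each: "
          + serialMillis(regions, 1, millisPerWrite) + " ms");
    }
    System.out.println("writes needed to reach one minute: "
        + writesToReach(60000, millisPerWrite));
  }
}
```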

On Tue, Feb 14, 2012 at 3:37 PM, N Keywal <nk...@gmail.com> wrote:

> Hi,
>
> Thanks for the replies.
>
> It's used when assigning the regions (a kind of dataset) to the regionservers
> (a JVM process on a physical server). There is one ZooKeeper node per region.
> On a server failure, there are typically a few hundred regions to reassign,
> with multiple statuses written in . On paper, if we need 0.02 s per node, that
> adds up to about a minute to recover, just for ZooKeeper.
>
> That's theory. I haven't done a precise measurement yet.
>
>
> Anyway, if ZooKeeper can be faster, it's always very interesting :-)
>
>
> Cheers,
>
> N.
>
>
> On Tue, Feb 14, 2012 at 8:00 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > These results are about what is expected, although they might be a little
> > more extreme.
> >
> > I doubt very much that hbase is mutating zk nodes fast enough for this to
> > matter much.
> >
> > Sent from my iPhone

Re: sync vs. async vs. multi performances

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
Hi Ariel, here is what they mean:

- Net: the overhead of the replication protocol only, with no writes to disk.
- Net+disk: the overhead of the replication protocol with writes to disk enabled.
- Net+disk (no write cache): same as the previous one, but with the disk's write cache turned off.

-Flavio
 
On Feb 18, 2012, at 4:17 PM, Ariel Weisberg wrote:

> Hi,
> 
> In that diagram, what is the difference between net, net + disk, and net +
> disk (no write cache)?
> 
> Thanks,
> Ariel
> 
> On Fri, Feb 17, 2012 at 3:41 AM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:
> 
>> Hi Ariel, That wiki is stale. Check it here:
>> 
>> 
>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeperPresentations
>> 
>> In particular check the HIC talk, slide 57. We were using 1k byte writes
>> for those tests.
>> 
>> -Flavio
>> 
>> On Feb 15, 2012, at 12:18 AM, Ariel Weisberg wrote:
>> 
>>> Hi,
>>> 
>>> I tried to look at the presentations on the wiki, but the links aren't
>>> working. I was using
>>> http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations and the
>>> error at the top of the page is "You are not allowed to do AttachFile on
>>> this page. Login and try again."
>>> 
>>> I used (http://pastebin.com/uu7igM3J) and the results for 4k writes were
>>> http://pastebin.com/N26CJtQE: 8.5 milliseconds, which is a bit slower
>>> than 5. Is it possible to beat the rotation speed?
>>> 
>>> You can increase the write size quite a bit, to 240k, and it only goes up
>>> to 10 milliseconds. http://pastebin.com/MSTwaHYN
>>> 
>>> My recollection was being in the 12-14 range, but I may be thinking of
>>> when I was pushing throughput.
>>> 
>>> Ariel
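
On the rotation-speed question, a rough sketch of the numbers involved. It assumes a plain rotating disk where a synchronous commit may have to wait for up to one platter revolution; that model, like the class and method names, is an assumption of this sketch and not something measured in the thread. For the two spindle speeds mentioned in the thread, one revolution takes about 8.3 ms at 7200 RPM and about 11.1 ms at 5400 RPM.

```java
final class RotationalLatency {
  /** Time for one full platter revolution, in milliseconds. */
  static double fullRotationMillis(double rpm) {
    return 60000.0 / rpm;
  }

  /** Average wait for a random sector: half a revolution. */
  static double averageLatencyMillis(double rpm) {
    return fullRotationMillis(rpm) / 2.0;
  }

  public static void main(String[] args) {
    for (double rpm : new double[] {5400, 7200}) {
      System.out.println(rpm + " RPM: full rotation "
          + fullRotationMillis(rpm) + " ms, average "
          + averageLatencyMillis(rpm) + " ms");
    }
  }
}
```

Under this model, a commit latency well below one revolution suggests that a write cache is absorbing the write, which matches the remarks in the thread about write caches being on.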
>>> 
>>> On Tue, Feb 14, 2012 at 4:02 PM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:
>>> 
>>>> Some of our previous measurements gave us around 5ms; check some of the
>>>> presentations we uploaded to the wiki. Those use 7.2k RPM disks and not
>>>> only volatile storage or battery backed cache. We do have the write
>>>> cache on for the numbers I'm referring to. There are also numbers there
>>>> for when the write cache is off.
>>>> 
>>>> -Flavio
>>>> 
>>>> On Feb 14, 2012, at 9:48 PM, Ariel Weisberg wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> It's only a minute if you process each region serially. Process 100 or
>>>>> 1000 in parallel and it will go a lot faster.
>>>>> 
>>>>> 20 milliseconds to synchronously commit to a 5.4k disk is about right.
>>>>> This is assuming the configuration for this is correct. On ext3 you need
>>>>> to mount with barrier=1 (ext4 and xfs enable write barriers by default).
>>>>> If someone is getting significantly faster numbers they are probably
>>>>> writing to a volatile or battery-backed cache.
>>>>> 
>>>>> Performance is relative. The number of operations the DB can do is
>>>>> roughly constant, although multi may be able to batch operations more
>>>>> efficiently by amortizing all the coordination overhead.
>>>>> 
>>>>> In the synchronous case the DB is starved for work 99% of the time, so
>>>>> it is not surprising that it is slow. You are benchmarking round-trip
>>>>> time in that case, and that is dominated by the time it takes to
>>>>> synchronously commit something to disk.
>>>>> 
>>>>> In the asynchronous case there is plenty of work and you can fully
>>>>> utilize all the throughput available to get it done, because each fsync
>>>>> makes multiple operations durable. However, the work is still presented
>>>>> piecemeal, so there is per-operation overhead.
>>>>> 
>>>>> Caveat: I am on 3.3.3 so I haven't read how multi operations are
>>>>> implemented, but the numbers you are getting bear this out. In the
>>>>> multi case you are getting the benefit of keeping the DB fully utilized
>>>>> plus amortizing the coordination overhead across multiple operations, so
>>>>> you get a boost in throughput beyond just async.
>>>>> 
>>>>> Ariel
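
The sync versus async explanation above can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of ZooKeeper's implementation: it assumes total time is dominated by fsync calls of fixed cost and ignores network and per-operation overhead. The class name, method names, and the batch size are hypothetical. A batch of 100 operations per fsync is used only because it reproduces the 2-second async figure; the thread does not report the actual batch size.

```java
final class GroupCommitModel {
  /** Sync client: one fsync per operation, each awaited before the next is sent. */
  static double syncSeconds(int ops, double fsyncSeconds) {
    return ops * fsyncSeconds;
  }

  /** Pipelined client: each fsync makes up to opsPerFsync queued operations durable. */
  static double pipelinedSeconds(int ops, double fsyncSeconds, int opsPerFsync) {
    int fsyncs = (ops + opsPerFsync - 1) / opsPerFsync; // ceiling division
    return fsyncs * fsyncSeconds;
  }

  public static void main(String[] args) {
    int ops = 10000;
    double fsyncSeconds = 0.02; // the 20 ms commit time discussed in the thread
    System.out.println("sync:      " + syncSeconds(ops, fsyncSeconds) + " s");
    System.out.println("pipelined: " + pipelinedSeconds(ops, fsyncSeconds, 100) + " s");
  }
}
```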


Re: sync vs. async vs. multi performances

Posted by Ariel Weisberg <aw...@voltdb.com>.
Hi,

In that diagram, what is the difference between net, net + disk, and net +
disk (no write cache)?

Thanks,
Ariel

> >>>>> On Feb 14, 2012, at 8:00, N Keywal <nk...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I've done a test with Zookeeper 3.4.2 to compare the performances of
> >>>>>> synchronous vs. asynchronous vs. multi when creating znode
> (variations
> >>>>>> around:
> >>>>>> calling 10000 times zk.create("/dummyTest", "dummy".getBytes(),
> >>>>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);) The code is at
> >>>> the
> >>>>>> end of the mail.
> >>>>>>
> >>>>>> I've tested different environments:
> >>>>>> - 1 linux server with the client and 1 zookeeper node on the same
> >>>> machine
> >>>>>> - 1 linux server for the client, 1 for 1 zookeeper node.
> >>>>>> - 6 linux servers, 1 for the client, 5 for 5 zookeeper nodes.
> >>>>>>
> >>>>>> Server are middle range, with 4*2 cores, jdk 1.6. ZK was on its own
> >> HD.
> >>>>>>
> >>>>>> But the results are comparable:
> >>>>>>
> >>>>>> Using the sync API, it takes 200 seconds for 10K creations, so
> around
> >>>>> 0.02
> >>>>>> second per call.
> >>>>>> Using the async API, it takes 2 seconds for 10K (including waiting
> for
> >>>>> the
> >>>>>> last callback message)
> >>>>>> Using the "multi" available since 3.4, it takes less than 1 second,
> >>>> again
> >>>>>> for 10K.
> >>>>>>
> >>>>>> I'm surprised by the time taken by the sync operation, I was not
> >>>>> expecting
> >>>>>> it to be that slow. The gap between async & sync is quite huge.
> >>>>>>
> >>>>>> Is this something expected? Zookeeper is used in critical functions
> in
> >>>>>> Hadoop/Hbase, I was looking at the possible benefits of using
> "multi",
> >>>>> but
> >>>>>> it seems low compared to async (well ~3 times faster :-). There are
> >>>> many
> >>>>>> small data creations/deletions with the sync API in the existing
> hbase
> >>>>>> algorithms, it would not be simple to replace them all by
> asynchronous
> >>>>>> calls...
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> N.
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> public class ZookeeperTest {
> >>>>>> static ZooKeeper zk;
> >>>>>> static int nbTests = 10000;
> >>>>>>
> >>>>>> private ZookeeperTest() {
> >>>>>> }
> >>>>>>
> >>>>>> public static void test11() throws Exception {
> >>>>>>  for (int i = 0; i < nbTests; ++i) {
> >>>>>>    zk.create("/dummyTest_" + i, "dummy".getBytes(),
> >>>>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
> >>>>>>  }
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> public static void test51() throws Exception {
> >>>>>>  final AtomicInteger counter = new AtomicInteger(0);
> >>>>>>
> >>>>>>  for (int i = 0; i < nbTests; ++i) {
> >>>>>>    zk.create("/dummyTest_" + i, "dummy".getBytes(),
> >>>>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT,
> >>>>>>      new AsyncCallback.StringCallback() {
> >>>>>>        public void processResult(int i, String s, Object o, String
> >>>> s1)
> >>>>> {
> >>>>>>          counter.incrementAndGet();
> >>>>>>        }
> >>>>>>      }
> >>>>>>      , null);
> >>>>>>  }
> >>>>>>
> >>>>>>  while (counter.get() != nbTests) {
> >>>>>>    Thread.sleep(1);
> >>>>>>  }
> >>>>>> }
> >>>>>>
> >>>>>> public static void test41() throws Exception {
> >>>>>>  ArrayList<Op> ops = new ArrayList<Op>(nbTests);
> >>>>>>  for (int i = 0; i < nbTests; ++i) {
> >>>>>>    ops.add(
> >>>>>>      Op.create("/dummyTest_" + i, "dummy".getBytes(),
> >>>>>> ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)
> >>>>>>    );
> >>>>>>  }
> >>>>>>
> >>>>>>  zk.multi(ops);
> >>>>>> }
> >>>>>>
> >>>>>> public static void delete() throws Exception {
> >>>>>>  ArrayList<Op> ops = new ArrayList<Op>(nbTests);
> >>>>>>
> >>>>>>  for (int i = 0; i < nbTests; ++i) {
> >>>>>>    ops.add(
> >>>>>>      Op.delete("/dummyTest_" + i,-1)
> >>>>>>    );
> >>>>>>  }
> >>>>>>
> >>>>>>  zk.multi(ops);
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> public static void test(String connection, String testName) throws
> >>>>>> Throwable{
> >>>>>>  Method m = ZookeeperTest.class.getMethod(testName);
> >>>>>>
> >>>>>>  zk = new ZooKeeper(connection, 20000, new Watcher() {
> >>>>>>    public void process(WatchedEvent watchedEvent) {
> >>>>>>    }
> >>>>>>  });
> >>>>>>
> >>>>>>  final long start = System.currentTimeMillis();
> >>>>>>
> >>>>>>  try {
> >>>>>>    m.invoke(null);
> >>>>>>  } catch (IllegalAccessException e) {
> >>>>>>    throw e;
> >>>>>>  } catch (InvocationTargetException e) {
> >>>>>>    throw e.getTargetException();
> >>>>>>  }
> >>>>>>
> >>>>>>  final long end = System.currentTimeMillis();
> >>>>>>
> >>>>>>  zk.close();
> >>>>>>
> >>>>>>  final long endClose = System.currentTimeMillis();
> >>>>>>
> >>>>>>  System.out.println(testName+":  ExeTime= " + (end - start) );
> >>>>>> }
> >>>>>>
> >>>>>> public static void main(String... args) throws Throwable {
> >>>>>>    test(args[0], args[1]);
> >>>>>> }
> >>>>>> }
> >>>>>
> >>>>
> >>
> >> flavio
> >> junqueira
> >>
> >> research scientist
> >>
> >> fpj@yahoo-inc.com
> >> direct +34 93-183-8828
> >>
> >> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> >> phone (408) 349 3300    fax (408) 349 3301
> >>
> >>
>
> flavio
> junqueira
>
> research scientist
>
> fpj@yahoo-inc.com
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301
>
>

Re: sync vs. async vs. multi performances

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
Hi Ariel, That wiki is stale. Check it here:

	https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeperPresentations

In particular check the HIC talk, slide 57. We were using 1k byte writes for those tests.

-Flavio

On Feb 15, 2012, at 12:18 AM, Ariel Weisberg wrote:

> Hi,
> 
> I tried to look at the presentations on the wiki, but the links aren't
> working? I was using
> http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations and the
> error at the top of the page is "You are not allowed to do AttachFile on
> this page. Login and try again."
> 
> I used (http://pastebin.com/uu7igM3J) and the results for 4k writes were
> http://pastebin.com/N26CJtQE. 8.5 milliseconds, which is a bit slower than
> 5. Is it possible to beat the rotation speed?
> 
> You can increase the write size quite a bit to 240k and it only goes up to
> 10 milliseconds. http://pastebin.com/MSTwaHYN
> 
> My recollection was being in the 12-14 range, but I may be thinking of when
> I was pushing throughput.
> 
> Ariel
> 

flavio
junqueira
 
research scientist
 
fpj@yahoo-inc.com
direct +34 93-183-8828
 
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301


Re: sync vs. async vs. multi performances

Posted by Ted Dunning <te...@gmail.com>.
Yes it is possible.

With a loaded server, each group of transactions will take about one
rotation.  But the time from when they arrived to the time that they are
committed will be roughly 0 ... 8 ms for a 7200 RPM drive because the
transactions will be arriving at different times.

There will be overheads which make this untrue, but the basic idea that you
don't necessarily have to wait for a full rotation if you arrive partway
through a rotation is correct.
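
As a rough illustration of the figures above, the rotation period and the expected wait can be computed directly. This is only a back-of-the-envelope sketch: the class and method names are hypothetical, arrivals are assumed to be spread uniformly over a rotation, and seek time, controller and file-system overhead are ignored, so real numbers will be higher.

```java
// Back-of-the-envelope rotational latency. Ignores seek time and all
// software overhead; the uniform-arrival model is an assumption.
final class RotationLatency {

    /** Time for one full platter rotation, in milliseconds. */
    static double rotationMillis(int rpm) {
        return 60000.0 / rpm;
    }

    /** Expected wait when requests arrive uniformly within a rotation. */
    static double averageWaitMillis(int rpm) {
        return rotationMillis(rpm) / 2.0;
    }

    public static void main(String[] args) {
        for (int rpm : new int[] {5400, 7200}) {
            System.out.printf("%d RPM: rotation %.2f ms, average wait %.2f ms%n",
                    rpm, rotationMillis(rpm), averageWaitMillis(rpm));
        }
    }
}
```

For a 7200 RPM drive the rotation period comes out a little over 8 ms, which is where the "0 ... 8 ms" range comes from.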

On Tue, Feb 14, 2012 at 6:18 PM, Ariel Weisberg <aw...@voltdb.com> wrote:

> ...
> I used (http://pastebin.com/uu7igM3J) and the results for 4k writes were
> http://pastebin.com/N26CJtQE. 8.5 milliseconds, which is a bit slower than
> 5. Is it possible to beat the rotation speed?
>
>

Re: sync vs. async vs. multi performances

Posted by Ariel Weisberg <aw...@voltdb.com>.
Hi,

I tried to look at the presentations on the wiki, but the links aren't
working? I was using
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations and the
error at the top of the page is "You are not allowed to do AttachFile on
this page. Login and try again."

I used (http://pastebin.com/uu7igM3J) and the results for 4k writes were
http://pastebin.com/N26CJtQE. 8.5 milliseconds, which is a bit slower than
5. Is it possible to beat the rotation speed?

You can increase the write size quite a bit to 240k and it only goes up to
10 milliseconds. http://pastebin.com/MSTwaHYN

My recollection was being in the 12-14 range, but I may be thinking of when
I was pushing throughput.

Ariel

On Tue, Feb 14, 2012 at 4:02 PM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> Some of our previous measurements gave us around 5ms, check some of our
> presentations we uploaded to the wiki. Those use 7.2k RPM disks and not
> only volatile storage or battery backed cache. We do have the write cache
> on for the numbers I'm referring to. There are also numbers there when the
> write cache is off.
>
> -Flavio
>

Re: sync vs. async vs. multi performances

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
Some of our previous measurements gave us around 5 ms; check the presentations we uploaded to the wiki. Those tests write to 7.2k RPM disks, not only to volatile storage or a battery-backed cache. We do have the write cache on for the numbers I'm referring to. There are also numbers there for when the write cache is off.

-Flavio

On Feb 14, 2012, at 9:48 PM, Ariel Weisberg wrote:

> Hi,
> 
> It's only a minute if you process each region serially. Process 100 or 1000
> in parallel and it will go a lot faster.
> 
> 20 milliseconds to synchronously commit to a 5.4k disk is about right. This
> is assuming the configuration for this is correct. On ext3 you need to
> mount with barrier=1 (ext4, xfs enable write barriers by default). If
> someone is getting significantly faster numbers they are probably writing
> to a volatile or battery backed cache.
> 
> Performance is relative. The number of operations the DB can do is roughly
> constant although multi may be able to more efficiently batch operations by
> amortizing all the coordination overhead.
> 
> In the synchronous case the DB is starved for work 99% of the time so it is
> not surprising that it is slow. You are benchmarking round trip time in
> that case, and that is dominated by the time it takes to synchronously
> commit something to disk.
> 
> In the asynchronous case there is plenty of work and you can fully utilize
> all the throughput available to get it done because each fsync makes
> multiple operations durable. However the work is still presented piecemeal
> so there is per-operation overhead.
> 
> Caveat, I am on 3.3.3 so I haven't read how multi operations are
> implemented, but the numbers you are getting bear this out. In the
> multi-case you are getting the benefit of keeping the DB fully utilized
> plus amortizing the coordination overhead across multiple operations so you
> get a boost in throughput beyond just async.
> 
> Ariel
> 

flavio
junqueira
 
research scientist
 
fpj@yahoo-inc.com
direct +34 93-183-8828
 
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301


Re: sync vs. async vs. multi performances

Posted by Ariel Weisberg <aw...@voltdb.com>.
Hi,

It's only a minute if you process each region serially. Process 100 or 1000
in parallel and it will go a lot faster.

20 milliseconds to synchronously commit to a 5.4k disk is about right. This
assumes the disk is configured correctly: on ext3 you need to mount with
barrier=1 (ext4 and xfs enable write barriers by default). If someone is
getting significantly faster numbers, they are probably writing to a
volatile or battery-backed cache.

Performance is relative. The number of operations the DB can do is roughly
constant although multi may be able to more efficiently batch operations by
amortizing all the coordination overhead.

In the synchronous case the DB is starved for work 99% of the time, so it is
not surprising that it is slow. You are benchmarking round-trip time in
that case, and that is dominated by the time it takes to synchronously
commit something to disk.

In the asynchronous case there is plenty of work and you can fully utilize
all the throughput available to get it done because each fsync makes
multiple operations durable. However the work is still presented piecemeal
so there is per-operation overhead.

Caveat: I am on 3.3.3, so I haven't read how multi operations are
implemented, but the numbers you are getting bear this out. In the multi
case you get the benefit of keeping the DB fully utilized, plus the
coordination overhead is amortized across multiple operations, so you get a
boost in throughput beyond just async.
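
As a rough illustration of the point about each fsync covering several
operations, here is a back-of-the-envelope model in Java. It is only a
sketch: the class and method names are made up, the 20 ms fsync cost is the
per-call figure reported in this thread, and the "100 creates per fsync"
batch size is an assumption picked because it reproduces the reported 2
seconds for async. It ignores network time and per-operation overhead.

```java
public class GroupCommitModel {
    /**
     * Estimated wall-clock seconds to make `ops` writes durable when each
     * fsync takes `fsyncMillis` and covers up to `opsPerFsync` queued writes.
     */
    static double estimateSeconds(int ops, int opsPerFsync, double fsyncMillis) {
        int fsyncs = (ops + opsPerFsync - 1) / opsPerFsync; // ceiling division
        return fsyncs * fsyncMillis / 1000.0;
    }

    public static void main(String[] args) {
        // One outstanding request at a time: every create pays for its own fsync.
        System.out.println("sync  ~ " + estimateSeconds(10000, 1, 20.0) + " s");
        // Many outstanding requests: assume about 100 creates share each fsync.
        System.out.println("async ~ " + estimateSeconds(10000, 100, 20.0) + " s");
    }
}
```

With one create per fsync the model gives the 200 seconds seen with the sync
API; letting about 100 creates share each fsync brings it down to 2 seconds.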

Ariel

On Tue, Feb 14, 2012 at 3:37 PM, N Keywal <nk...@gmail.com> wrote:

> Hi,
>
> Thanks for the replies.
>
> It's used when assigning the regions (a kind of dataset) to the
> regionservers (JVM processes, each on a physical server). There is one
> ZooKeeper node per region. On a server failure, there are typically a few
> hundred regions to reassign, with multiple statuses written in . On paper,
> if we need 0.02 s per node, that adds up to about a minute to recover, just
> for ZooKeeper.
>
> That's theory. I haven't done a precise measurement yet.
>
>
> Anyway, if ZooKeeper can be faster, it's always very interesting :-)
>
>
> Cheers,
>
> N.
>

Re: sync vs. async vs. multi performances

Posted by N Keywal <nk...@gmail.com>.
Hi,

Thanks for the replies.

It's used when assigning the regions (a kind of dataset) to the
regionservers (JVM processes, each on a physical server). There is one
ZooKeeper node per region. On a server failure, there are typically a few
hundred regions to reassign, with multiple statuses written in . On paper,
if we need 0.02 s per node, that adds up to about a minute to recover, just
for ZooKeeper.

That's the theory. I haven't done a precise measurement yet.
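
To make the estimate concrete, the arithmetic can be written out as below.
This is only a sketch: 300 regions and 10 status writes per region are
assumed values standing in for "a few hundred" and "multiple", and the class
name is made up. Only the 0.02 s per write comes from the measurement.

```java
public class RecoveryEstimate {
    /** Seconds spent on ZooKeeper writes if every write is issued serially. */
    static double serialSeconds(int regions, int writesPerRegion, double secondsPerWrite) {
        return regions * writesPerRegion * secondsPerWrite;
    }

    public static void main(String[] args) {
        // Assumed workload: 300 regions, 10 synchronous writes each, 0.02 s per write.
        double seconds = serialSeconds(300, 10, 0.02);
        System.out.println("Estimated serial ZooKeeper time: " + seconds + " s");
    }
}
```

With those assumed values the serial cost comes out at about a minute.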


Anyway, if ZooKeeper can be faster, it's always very interesting :-)


Cheers,

N.


On Tue, Feb 14, 2012 at 8:00 PM, Ted Dunning <te...@gmail.com> wrote:

> These results are about what is expected, although they might be a little
> more extreme.
>
> I doubt very much that hbase is mutating zk nodes fast enough for this to
> matter much.
>
> Sent from my iPhone

Re: sync vs. async vs. multi performances

Posted by Ted Dunning <te...@gmail.com>.
These results are about what is expected, although they might be a little more extreme.

I doubt very much that hbase is mutating zk nodes fast enough for this to matter much. 

Sent from my iPhone


Re: sync vs. async vs. multi performances

Posted by Camille Fournier <ca...@apache.org>.
True. We do have an outstanding jira that Ben filed on a perf problem in
3.4.X that could be contributing:
https://issues.apache.org/jira/browse/ZOOKEEPER-1390


On Tue, Feb 14, 2012 at 12:42 PM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> If I'm reading it correctly, these tests are getting 20ms per op. This is
> too high. I think we were getting something like 5ms before.
>
> -Flavio

Re: sync vs. async vs. multi performances

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
If I'm reading it correctly, these tests are getting 20ms per op. This is too high. I think we were getting something like 5ms before.

-Flavio

On Feb 14, 2012, at 6:24 PM, Camille Fournier wrote:

> Sync calls have to make a complete roundtrip before the next call from that
> client will happen. It's not surprising at all that it would take quite a
> bit longer to do a sync call than an async call. It could be that the
> bottleneck in this case is your client, not your server. If the sync calls
> are happening amongst clients on many different servers, it probably
> doesn't matter.
> 
> C


Re: sync vs. async vs. multi performances

Posted by Camille Fournier <ca...@apache.org>.
Sync calls have to make a complete roundtrip before the next call from that
client will happen. It's not surprising at all that it would take quite a
bit longer to do a sync call than an async call. It could be that the
bottleneck in this case is your client, not your server. If the sync calls
are happening amongst clients on many different servers, it probably
doesn't matter.
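
The effect can be reproduced without ZooKeeper at all. The sketch below
(assuming Java 8 or later) uses a timer as a stand-in for a server that
answers every request after a fixed delay; fakeRequest and the other names
are made up for illustration and are not part of the ZooKeeper API. The
blocking variant waits for each reply before sending the next request, while
the pipelined variant sends everything first and then waits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RoundTripDemo {
    // Stand-in for a server: completes each request after a fixed delay.
    static CompletableFuture<Void> fakeRequest(ScheduledExecutorService timer, long delayMillis) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        timer.schedule(() -> { done.complete(null); }, delayMillis, TimeUnit.MILLISECONDS);
        return done;
    }

    /** Sends requests one at a time, waiting for each reply; returns elapsed ms. */
    static long blockingMillis(int requests, long delayMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            long start = System.nanoTime();
            for (int i = 0; i < requests; i++) {
                fakeRequest(timer, delayMillis).join(); // full round trip per call
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            timer.shutdown();
        }
    }

    /** Sends all requests up front, then waits for every reply; returns elapsed ms. */
    static long pipelinedMillis(int requests, long delayMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            long start = System.nanoTime();
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                pending.add(fakeRequest(timer, delayMillis)); // do not wait here
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking : " + blockingMillis(10, 20) + " ms");
        System.out.println("pipelined: " + pipelinedMillis(10, 20) + " ms");
    }
}
```

The blocking loop cannot finish in less than the number of requests times
the delay, whereas the pipelined version overlaps the waits.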

C
