You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by Thomas Koch <th...@koch.ro> on 2010/12/03 21:05:33 UTC

DataTree as HashMap and tree?

Hi,

the DataTree structure on the ZK server stores each DataNode in a 
ConcurrentHashMap (DataTree.nodes) and in a a double-linked tree of DataNodes. 
The class' javadoc says that the additional hashmap is used for fast lookup.

Have you ever done a benchmark, whether the hashmap really provides a 
significant speed improvement compared to traversing the tree?

Example: I want to lookup the DataNode /a/b/c. Traversing the tree means I 
have to make three lookups in children lists of nodes. The current code makes 
one lookup in the big hashmap of _all_ nodes. I doubt, that the current code 
is really faster, but I wanted to ask here before programming a benchmark.

Together with this question goes another one: I have some ideas about 
improving Path handling, which could significantly save memory but could make 
path operations more CPU intensive by one magnitude. In your experience with 
large ZK installations, is CPU an issue? Is memory such an issue that you 
would trade some CPU for it?

Best regards,

Thomas Koch, http://www.koch.ro

Re: DataTree as HashMap and tree?

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Thomas,
  It would be great to have benchmarks to see if the hashmap helps or not. For deeper trees I think it would help. This is what we usually do in Hadoop as well for faster look ups. But please feel free to benchmark and post your results.

Thanks
mahadev


On 12/3/10 12:05 PM, "Thomas Koch" <th...@koch.ro> wrote:

Hi,

the DataTree structure on the ZK server stores each DataNode in a
ConcurrentHashMap (DataTree.nodes) and in a a double-linked tree of DataNodes.
The class' javadoc says that the additional hashmap is used for fast lookup.

Have you ever done a benchmark, whether the hashmap really provides a
significant speed improvement compared to traversing the tree?

Example: I want to lookup the DataNode /a/b/c. Traversing the tree means I
have to make three lookups in children lists of nodes. The current code makes
one lookup in the big hashmap of _all_ nodes. I doubt, that the current code
is really faster, but I wanted to ask here before programming a benchmark.

Together with this question goes another one: I have some ideas about
improving Path handling, which could significantly save memory but could make
path operations more CPU intensive by one magnitude. In your experience with
large ZK installations, is CPU an issue? Is memory such an issue that you
would trade some CPU for it?

Best regards,

Thomas Koch, http://www.koch.ro