You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by Edward Ribeiro <ed...@gmail.com> on 2013/02/13 21:40:41 UTC

DataTree

Hi folks,

I was reading DataTree code last night, and I would like to ask some
snippets I didn't understand, so, please, bear with me, and excuse me in
advance if I say something stupid. I am still finding my way through the
code base.

1. createNode()

 480 HashSet<String> list = ephemerals.get(ephemeralOwner); 481 if (list ==
null) { 482 list = new HashSet<String>();
483ephemerals.put(ephemeralOwner, list);
484 } 485 synchronized (list) { 486 list.add(path); 487 }

The snippet above includes the path in the list of ephemeral nodes, and
creates an entry for that specific ephemeralOwner (lines 481-484), if it
doesn't exists. Suppose that "ephemerals" is initially empty for a given
ephemeralOwner, and you have two threads arriving, at roughly the same
time, at line 481. Both will modify the same entry (ephemeralOwner), that
is empty, so they will create a new List at line 482.

So far, no problem, because "ephemerals" is a ConcurrentMap so the access
will be thread fase, and the last write will win (overriding the previous
one). The first problem is that as each thread created its own "list" then
each one will synchronize on a different "list" reference at line 485
(defeating the purpose of synchronized for this specific case).

Even here there's no problem because each thread is modifying its own copy
of "list", but as one of the was overwritten at "ephemerals" then one of
the paths will be eventually be lost (because its "list" was overwritten
and not more present in "ephemerals"). I don't know if this specific
situation is really feasible or once in a million chance -- I suppose it
can happen  because if the code is synchronizing at line 485 then it says
it's possible that two threads alter the same Set<String> at the same time.
Again, excuse me and ignore this message, if I said something really
stupid. I am still finding my way through the code base. :)

The solution is I came with is very simple:

 480 HashSet<String> list = ephemerals.get(ephemeralOwner); 481 if (list ==
null) { 482 list = ephemerals.putIfAbsent(ephemeralOwner, new
HashSet<String>()); 483 484 } 485 synchronized (list) { 486 list.add(path);
487 }

The putIfAbsent() method will work as follow (quoted from javadoc):

     If the specified key is not already associated with a value, associate
it with the given value. This is equivalent to

   if (!map.containsKey(key))
      return map.put(key, value);
   else
      return map.get(key);

Now you will always have the two (or more) threads holding references to
the same "list" object, line 485 works nicely, so that they don't loose any
information to be included

2. The code below leaks the internal keySet of "ephemerals" to the outside
world (I looked at ConcurrentHashMap impl and, to my surprise it's not a
copy as I've always thought, it's the  internal Map's keySett !!! ), and
this is potentially dangerous.

 199 public Collection<Long> getSessions() { 200 return
ephemerals.keySet(); 201 }

I suggest to replace by this

 199 public Collection<Long> getSessions() { 200 return
Collections.unmodifiableSet<Long>(ephemerals.keySet()); 201 }

or this:

 199 public Collection<Long> getSessions() { 200 return
Collections.unmodifiableSet<Long>(new HashSet<Long>(ephemerals.keySet()));
201 }

3. Line 408-409 state the following (but they are more lines like this in
file):

 408 * the session id that owns this node. -1 indicates this is not 409 *
an ephemeral node.

But looking at the souce code you see:

 479 if (ephemeralOwner != 0) {

 1198 if (eowner != 0)

 779 createTxn.getEphemeral() ? header.getClientId() : 0,

Ooops, it's not -1, but 0 (zero) that determines a non-ephemeral node. Am I
right?

Regards,
Edward

Re: DataTree

Posted by Edward Ribeiro <ed...@gmail.com>.
Cool.

I was thinking about something along the lines of
Collections.newSetFromMap(new ConcurrentHashMap<...>()) too, so that it'd
possible to get rid of synchronized like you did. Another point would be
the use of Guava's Multimaps, but then it would loose the benefits of
ConcurrentMap.

Thanks for the explanation, Thawan!

Edward

On Thu, Feb 14, 2013 at 3:27 AM, Thawan Kooburat <th...@fb.com> wrote:

> In our branch, where we did a lot conversion from synchronize to atomic
> variables or concurrent data structure. Our DataTree.createNode() calls
> this method to update ephemeral map
>
> private void addEphemeralPath(String path, long ephemeralOwner) {
>   if (ephemeralOwner != 0) {
>     Set<String> list = ephemerals.get(ephemeralOwner);
>     if (list == null) {
>       Set<String> newList = Collections.newSetFromMap(
>           new ConcurrentHashMap<String, Boolean>(4, 0.75f, 2));
>       list = ephemerals.putIfAbsent(ephemeralOwner, newList);
>       if (list == null) {
>         list = newList;
>       }
>     }
>     list.add(path);
>   }
>     }
>
> However, race-condition respect to write-request is not possible at the
> moment due to ZOOKEEPER-1505. The new CommitProcessor only allow 1 write
> request at any given time. So there can be only one thread that can modify
> the DataTree  (unless we remove that constraint in the future)
>
>
>
> --
> Thawan Kooburat
>
>
>
>
>
>

Re: DataTree

Posted by Thawan Kooburat <th...@fb.com>.
In our branch, where we did a lot conversion from synchronize to atomic
variables or concurrent data structure. Our DataTree.createNode() calls
this method to update ephemeral map

private void addEphemeralPath(String path, long ephemeralOwner) {
  if (ephemeralOwner != 0) {
    Set<String> list = ephemerals.get(ephemeralOwner);
    if (list == null) {
      Set<String> newList = Collections.newSetFromMap(
          new ConcurrentHashMap<String, Boolean>(4, 0.75f, 2));
      list = ephemerals.putIfAbsent(ephemeralOwner, newList);
      if (list == null) {
        list = newList;
      }
    }
    list.add(path);
  }
    }

However, race-condition respect to write-request is not possible at the
moment due to ZOOKEEPER-1505. The new CommitProcessor only allow 1 write
request at any given time. So there can be only one thread that can modify
the DataTree  (unless we remove that constraint in the future)
 


-- 
Thawan Kooburat