You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Tobias Bocanegra <tr...@apache.org> on 2013/09/30 22:22:49 UTC

Tree.hasChild() / Tree.getChild()

Hi,

I was looking at some spots where the code can be optimized, and I
found some occurrences where a Tree.hasChild() is followed by a
Tree.getChild(). I think this pattern is used commonly. Currently the
hasChild() is implemented with getChild() anyways:

    public boolean hasChild(String name) {
        return createChild(checkNotNull(name)).exists();
    }

so I think that it could make sense to add an optimizable convenience method:

    /**
     * Get a possibly non existing child of this {@code Tree}. If the
child does not exist and
     * {@code mustExist} is true, null is returned.
     * <p>
     * This is basically an optimized convenience method for
     *
     * <pre>
     * child = tree.getChild(name); return !mustExist ||
child.exists() ? child : null;
     * </pre>
     *
     * @param name The name of the child to retrieve.
     * @param mustExist Flag enforcing existence
     * @return The child with the given {@code name} or null if the
chid does not exist
     *              and {@code mustExist} is true.
     */
    @Nullable
    Tree Tree.getChild(@Nonnull String name, boolean mustExist)

WDYT?
Regards, Toby

Re: Tree.hasChild() / Tree.getChild()

Posted by Michael Dürig <md...@apache.org>.

On 1.10.13 3:49 , Jukka Zitting wrote:
> On Tue, Oct 1, 2013 at 9:27 AM, Michael Dürig<md...@apache.org>  wrote:
>> >But since we have that exists call, getChild() can return just a "thunk".
>> >The actual round-trip is deferred to later accesses of the returned child.
>> >And the hasNode() optimisation you mention would go into the exists() call.
>> >
>> >I'm aware that things are currently not implemented that way and we even
>> >implement hasChild() through getChild().exists(). But maybe we should
>> >reconsider this!?
> We can, though I'm a bit skeptical on whether the benefit is worth the effort.

Ok, let's keep this in mind. We should look into whether we need to 
improve the current hasChild() implementation though.

Michael

Re: Tree.hasChild() / Tree.getChild()

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Oct 1, 2013 at 9:27 AM, Michael Dürig <md...@apache.org> wrote:
> But since we have that exists call, getChild() can return just a "thunk".
> The actual round-trip is deferred to later accesses of the returned child.
> And the hasNode() optimisation you mention would go into the exists() call.
>
> I'm aware that things are currently not implemented that way and we even
> implement hasChild() through getChild().exists(). But maybe we should
> reconsider this!?

We can, though I'm a bit skeptical on whether the benefit is worth the effort.

BR,

Jukka Zitting

Re: Tree.hasChild() / Tree.getChild()

Posted by Michael Dürig <md...@apache.org>.

On 1.10.13 3:09 , Jukka Zitting wrote:
> The double-checking pattern is a leftover, but the hasChild() method
> as such still has a place. Depending on the backend, a getChild() call
> may require the identified node to be fetched from the disk or over
> the network (SegmentMK uses lazy loading to avoid this). So if we're
> only interested in the existence of a node (like in
> Session.nodeExists), hasChild() can in some cases be orders of
> magnitude faster than getChild().

But since we have that exists call, getChild() can return just a 
"thunk". The actual round-trip is deferred to later accesses of the 
returned child. And the hasNode() optimisation you mention would go into 
the exists() call.

I'm aware that things are currently not implemented that way and we even 
implement hasChild() through getChild().exists(). But maybe we should 
reconsider this!?

Michael

Re: Tree.hasChild() / Tree.getChild()

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Oct 1, 2013 at 3:45 AM, Michael Dürig <md...@apache.org> wrote:
> We could remove the hasChild() method to enforce this pattern. As you say
> this is a leftover from the time we didn't have the exists() method. WDYT?

The double-checking pattern is a leftover, but the hasChild() method
as such still has a place. Depending on the backend, a getChild() call
may require the identified node to be fetched from the disk or over
the network (SegmentMK uses lazy loading to avoid this). So if we're
only interested in the existence of a node (like in
Session.nodeExists), hasChild() can in some cases be orders of
magnitude faster than getChild().

BR,

Jukka Zitting

Re: Tree.hasChild() / Tree.getChild()

Posted by Michael Dürig <md...@apache.org>.

On 1.10.13 3:56 , Jukka Zitting wrote:
> On Mon, Sep 30, 2013 at 4:22 PM, Tobias Bocanegra<tr...@apache.org>  wrote:
>> >I was looking at some spots where the code can be optimized, and I
>> >found some occurrences where a Tree.hasChild() is followed by a
>> >Tree.getChild(). I think this pattern is used commonly.
> Right. I think the pattern dates back to the time before we added the
> Tree.exists() method and made Tree.getChild() return non-null every
> time regardless of whether the identified child exists or is
> accessible.
>
> Instead of the old pattern ...
>
>      if (tree.hasChild(name)) {
>          Tree child = tree.getChild(name);
>          ...;
>      } else {
>          ...;
>      }
>
> ... a better pattern would now be ...
>
>      Tree child = tree.getChild(name);
>      if (child.exists()) {
>          ...;
>      } else {
>          ...;
>      }
>

We could remove the hasChild() method to enforce this pattern. As you 
say this is a leftover from the time we didn't have the exists() method. 
WDYT?

Michael

Re: Tree.hasChild() / Tree.getChild()

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Sep 30, 2013 at 11:08 PM, Tobias Bocanegra <tr...@apache.org> wrote:
> On Mon, Sep 30, 2013 at 6:56 PM, Jukka Zitting <ju...@gmail.com> wrote:
>> The only difference is between "child.exists()" and "child != null",
>> and AFAICT any optimizations done in such an extra getChild() method
>> could also be used to optimize the exists() method.
>
> right. but depending on the persistence implementation and caching, it
> could result in fetching the child twice if using just the exists()
> check.

The current MutableTree.exists() method just calls enter(), which contains:

    private boolean enter() {
        root.checkLive();
        if (isHidden(name)) {
            return false;
        } else if (applyPendingMoves()) {
            return reconnect();
        } else {
            return nodeBuilder.exists();
        }
    }

In the normal case when there have been no intermediate changes, none
of those calls (except reconnect) should require the creation of any
extra objects or doing anything else that could be expensive. If
needed, the method could be further optimized by memorizing the return
value, but I'd be surprised if this was a bottleneck for anything.

BR,

Jukka Zitting

Re: Tree.hasChild() / Tree.getChild()

Posted by Tobias Bocanegra <tr...@apache.org>.
Hi,

On Mon, Sep 30, 2013 at 6:56 PM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On Mon, Sep 30, 2013 at 4:22 PM, Tobias Bocanegra <tr...@apache.org> wrote:
>> I was looking at some spots where the code can be optimized, and I
>> found some occurrences where a Tree.hasChild() is followed by a
>> Tree.getChild(). I think this pattern is used commonly.
>
> Right. I think the pattern dates back to the time before we added the
> Tree.exists() method and made Tree.getChild() return non-null every
> time regardless of whether the identified child exists or is
> accessible.
>
> Instead of the old pattern ...
>
>     if (tree.hasChild(name)) {
>         Tree child = tree.getChild(name);
>         ...;
>     } else {
>         ...;
>     }
>
> ... a better pattern would now be ...
>
>     Tree child = tree.getChild(name);
>     if (child.exists()) {
>         ...;
>     } else {
>         ...;
>     }
>
>> so I think that it could make sense to add an optimizable convenience method:
>> [...]
>>     @Nullable
>>     Tree Tree.getChild(@Nonnull String name, boolean mustExist)
>
> I'm not sure if this helps much beyond the exists() mechanism. The
> case of mustExists == false is equivalent to the existing getChild()
> method, and the access pattern with the mustExist == true case would
> be:
>
>     Tree child = tree.getChild(name, true);
>     if (child != null) {
>         ...;
>     } else {
>         ...;
>     }
>
> The only difference is between "child.exists()" and "child != null",
> and AFAICT any optimizations done in such an extra getChild() method
> could also be used to optimize the exists() method.

right. but depending on the persistence implementation and caching, it
could result in fetching the child twice if using just the exists()
check. It might be nit-picky, but each createChild() creates 2-3
objects. of course this optimization only makes sense, if that
signature is propagated further down.

OTOH, I looked at the places where hasChild/getChild is used again,
and it's only in 3-4 places. so maybe not important to enough to
change.

regards, toby

Re: Tree.hasChild() / Tree.getChild()

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Sep 30, 2013 at 4:22 PM, Tobias Bocanegra <tr...@apache.org> wrote:
> I was looking at some spots where the code can be optimized, and I
> found some occurrences where a Tree.hasChild() is followed by a
> Tree.getChild(). I think this pattern is used commonly.

Right. I think the pattern dates back to the time before we added the
Tree.exists() method and made Tree.getChild() return non-null every
time regardless of whether the identified child exists or is
accessible.

Instead of the old pattern ...

    if (tree.hasChild(name)) {
        Tree child = tree.getChild(name);
        ...;
    } else {
        ...;
    }

... a better pattern would now be ...

    Tree child = tree.getChild(name);
    if (child.exists()) {
        ...;
    } else {
        ...;
    }

> so I think that it could make sense to add an optimizable convenience method:
> [...]
>     @Nullable
>     Tree Tree.getChild(@Nonnull String name, boolean mustExist)

I'm not sure if this helps much beyond the exists() mechanism. The
case of mustExists == false is equivalent to the existing getChild()
method, and the access pattern with the mustExist == true case would
be:

    Tree child = tree.getChild(name, true);
    if (child != null) {
        ...;
    } else {
        ...;
    }

The only difference is between "child.exists()" and "child != null",
and AFAICT any optimizations done in such an extra getChild() method
could also be used to optimize the exists() method.

BR,

Jukka Zitting