You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Tobias Bocanegra <tr...@apache.org> on 2013/11/01 06:18:27 UTC

Paths in Oak

Hi,

I debugged a simple call like:

session.getProperty("/a/b/foo").getString();

and was really astonished how many path conversion, checking,
manipulation, tree/parent/child access operations are performed. not
only in access control, but everywhere. it looks like most of the
time, oak is busy converting, checking, truncating string paths :-)

for example, the NameParthMapperImpl.getOakPath()
- is overly complicated for the trivial case, where no namespaces
mappings are present
- creates extra object PathListener and many elements
- reconcatenates the path allthough it's still the original string
- truncates the last char

also,
- PathUtils.isValid() / isAbsolute() is used very often.
- why check PathUtils.isValid() for a path that comes from the NamePathMapper

>From stepping though the code, it looks like the full path is not used
very often, but rather the individual segments. maybe it would make
sense to re-use jackrabbits Name and Path classes.

Further, getProperty().getString() actually fetches the property twice
and also checks access control twice. once in the getProperty() call,
and once when fetching the value. I would assume, that the value is
already stored in the PropertyState after the getProperty() call.

regards, toby

Re: Paths in Oak

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Nov 1, 2013 at 1:18 AM, Tobias Bocanegra <tr...@apache.org> wrote:
> Further, getProperty().getString() actually fetches the property twice
> and also checks access control twice. once in the getProperty() call,
> and once when fetching the value. I would assume, that the value is
> already stored in the PropertyState after the getProperty() call.

I came up with a few ideas on how to avoid the remaining duplicate
property lookup. See OAK-1139.

BR,

Jukka Zitting

Re: Paths in Oak

Posted by Michael Dürig <md...@apache.org>.

On 13.11.13 4:31 , Jukka Zitting wrote:
> Hi,
>
> On Mon, Nov 4, 2013 at 6:28 AM, Michael Dürig <md...@apache.org> wrote:
>> Wrapping the path and related entities into dedicated classes will surely
>> add some overhead at first. But it will OTOH clearly communicate the intend
>> of what otherwise are just naked strings. In addition it will introduce a
>> clear boundary for optimisations while in the string case these blur with
>> the client code.
>
> Would it be possible for us to achieve the best of both worlds here?
> I'm thinking of potential Path and Name classes as just thin type-safe
> wrappers around the underlying Strings.

Yes, that was my thinking. Not sure if we even need the name classes.

>
> When reading content that's already cached (which is the common case),
> path handling becomes a major component in access performance (see my
> comment on OAK-1168 for some figures), so it's critical for it to stay
> as lightweight as possible. That's the main reason why I'd like to do
> as little parsing of the given path strings as possible and avoid any
> extra transformations.

Agree. And I even think this should be easier with dedicated classes.

>
> But you're right in that extra type-safety would make things clearer
> and help avoid errors (currently we just rely on naming patterns like
> oakPath vs. jcrPath). Also, such wrappers might make it easier to
> address OAK-1168/OAK-1174 without too much overhead.

I created https://issues.apache.org/jira/browse/OAK-1179 to track this.

Michael

>
> BR,
>
> Jukka Zitting
>

Re: Paths in Oak

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Nov 4, 2013 at 6:28 AM, Michael Dürig <md...@apache.org> wrote:
> Wrapping the path and related entities into dedicated classes will surely
> add some overhead at first. But it will OTOH clearly communicate the intend
> of what otherwise are just naked strings. In addition it will introduce a
> clear boundary for optimisations while in the string case these blur with
> the client code.

Would it be possible for us to achieve the best of both worlds here?
I'm thinking of potential Path and Name classes as just thin type-safe
wrappers around the underlying Strings.

When reading content that's already cached (which is the common case),
path handling becomes a major component in access performance (see my
comment on OAK-1168 for some figures), so it's critical for it to stay
as lightweight as possible. That's the main reason why I'd like to do
as little parsing of the given path strings as possible and avoid any
extra transformations.

But you're right in that extra type-safety would make things clearer
and help avoid errors (currently we just rely on naming patterns like
oakPath vs. jcrPath). Also, such wrappers might make it easier to
address OAK-1168/OAK-1174 without too much overhead.

BR,

Jukka Zitting

Re: Paths in Oak

Posted by Michael Dürig <md...@apache.org>.

On 1.11.13 1:43 , Jukka Zitting wrote:
> In Oak it has been an explicit goal to avoid those conversions for the
> common case where no namespace remappings are present. Then in most
> cases it should be possible to just keep the original path string
> passed by the client and use String.substring() for the individual
> path segments.

TBH I still doubt that this is the right approach. While it might be 
possible to make this working initially, it will also make it way to 
easy to introduce performance regressions again later on. Having these 
path conversion functions around means people will use them whenever 
they feel like it thus breaking the initial goal to "just pass the 
strings around".

Wrapping the path and related entities into dedicated classes will 
surely add some overhead at first. But it will OTOH clearly communicate 
the intend of what otherwise are just naked strings. In addition it will 
introduce a clear boundary for optimisations while in the string case 
these blur with the client code.

In a nutshell: I think we can make it work but I fear it will break 
again and again further down the line.

Michael

Re: Paths in Oak

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Nov 1, 2013 at 1:18 AM, Tobias Bocanegra <tr...@apache.org> wrote:
> I debugged a simple call like:
>
> session.getProperty("/a/b/foo").getString();
>
> and was really astonished how many path conversion, checking,
> manipulation, tree/parent/child access operations are performed. not
> only in access control, but everywhere. it looks like most of the
> time, oak is busy converting, checking, truncating string paths :-)

Indeed! This has been a recurring topic, see
https://issues.apache.org/jira/browse/OAK-978 for the latest version
of that debate (and https://issues.apache.org/jira/browse/OAK-1015 for
followup).

> From stepping though the code, it looks like the full path is not used
> very often, but rather the individual segments. maybe it would make
> sense to re-use jackrabbits Name and Path classes.

One of the performance/memory issues I spent a lot of time on with
Jackrabbit was optimizing the internal Name and Path classes and I'm
still not happy with the overhead of all the parsing/serialization and
extra objects we need there.

In Oak it has been an explicit goal to avoid those conversions for the
common case where no namespace remappings are present. Then in most
cases it should be possible to just keep the original path string
passed by the client and use String.substring() for the individual
path segments.

To make this work really smoothly and efficiently, we need to separate
the tasks of path/name mapping (i.e. namespace prefixes) from
path/name validation (checking whether names are valid). Unfortunately
those tasks are currently mixed in NamePathMapper (which is what
you're seeing), and we haven't yet taken on the non-trivial effort of
untangling them. Perhaps we should try now.

> Further, getProperty().getString() actually fetches the property twice
> and also checks access control twice. once in the getProperty() call,
> and once when fetching the value. I would assume, that the value is
> already stored in the PropertyState after the getProperty() call.

Some while ago we'd actually fetch the property a dozen times (see
http://markmail.org/message/v2ydm5xksd25glm4), so twice is already a
major improvement... Though of course you're right that there should
only need to be a single PropertyState lookup for that pattern.

That's the kind of unnecessary work I was referring to in my comment
to OAK-1138. Before spending too much effort optimizing the lower
levels of code, we should ensure that higher levels of code isn't
causing duplicate or unnecessary work to be performed. Not doing
something at all is always faster than trying to optimize it.

BR,

Jukka Zitting