Posted to users@jackrabbit.apache.org by Charles Brooking <pu...@charlie.brooking.id.au> on 2009/09/18 08:16:08 UTC

Escaping/encoding of paths/names/values

Hi all,

In tackling the issue of escaping/encoding of paths, names, and values in
the context of a JCR-based web application, I've discovered it's not so
simple. From my searching at least, there is little information online to
help, so I thought I'd write with my understanding so far and perhaps
others can chip in (most likely to correct me).

There are utility methods for escaping/encoding in the
org.apache.jackrabbit.util.ISO9075 and org.apache.jackrabbit.util.Text
classes. Although developed under Jackrabbit, they are part of the JCR
Commons module which only depends on the JCR API.

If you're building a path from user-supplied names, you need to escape
illegal JCR characters (eg item:1 becomes item%3A1):

  String path = "/foo/" + Text.escapeIllegalJcrChars(name);

Such paths are useful for JCR methods like Session.getItem(...) etc.
(Related to this: is there a utility to escape illegal JCR characters in
paths as opposed to just names?)
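For anyone without jackrabbit-jcr-commons at hand, here is a minimal, JDK-only sketch of what name escaping does. It is not the library's code. The illegal-character set (% / : [ ] * | tab CR LF, plus a leading or trailing space) is an assumption approximating the JCR 2.0 name rules, and special cases such as the names "." and ".." are ignored. In real code, call Text.escapeIllegalJcrChars.

```java
import java.util.Locale;

// Simplified stand-in for org.apache.jackrabbit.util.Text.escapeIllegalJcrChars.
// ASSUMPTION: the illegal set below approximates the JCR 2.0 rules; the real
// method may handle further cases (e.g. the names "." and "..").
class JcrNameEscaping {

    // '%' is escaped too, so an escaped name can be unescaped unambiguously.
    private static final String ILLEGAL = "%/:[]*|\t\r\n";

    static String escapeIllegalJcrChars(String name) {
        StringBuilder sb = new StringBuilder(name.length() * 2);
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            boolean edgeSpace = ch == ' ' && (i == 0 || i == name.length() - 1);
            if (ILLEGAL.indexOf(ch) >= 0 || edgeSpace) {
                // Percent-encode as two upper-case hex digits, e.g. ':' -> "%3A".
                sb.append('%').append(String.format(Locale.ROOT, "%02X", (int) ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "item:1"; // user-supplied
        String path = "/foo/" + escapeIllegalJcrChars(name);
        System.out.println(path); // prints /foo/item%3A1
    }
}
```

Spaces inside a name are left alone because JCR allows them; only the characters that would change the meaning of a path (or of the escape itself) are encoded.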

If you want to use paths in XPath queries, though, you need to escape
according to ISO9075 rules (eg 1hr0 becomes _x0031_hr0):

  String query =
    "/jcr:root" + ISO9075.encodePath(node.getPath()) +
    "/" + ISO9075.encode(name);

For a user-supplied string, this could lead to something like
ISO9075.encode(Text.escapeIllegalJcrChars(name)).
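To make the ISO 9075 rule concrete, here is a simplified sketch of name and path encoding. The real helpers are ISO9075.encode and ISO9075.encodePath in jcr-commons; this version is an approximation that assumes Character.isLetter/isDigit are close enough to the XML name rules, and it does not handle same-name-sibling indexes such as "[2]".

```java
import java.util.Locale;

// Simplified sketch of ISO 9075 name encoding for XPath queries.
// ASSUMPTIONS: Character.isLetter/isDigit approximate the XML name rules;
// same-name-sibling indexes such as "[2]" are not handled.
class Iso9075Sketch {

    static String encode(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // A literal "_xHHHH_" already in the name must not be mistaken for
            // an escape, so its leading underscore is encoded as well.
            boolean looksEscaped = c == '_' && i + 7 <= name.length()
                && name.substring(i, i + 7).matches("_x\\p{XDigit}{4}_");
            boolean legal = (i == 0) ? isNameStart(c) : isNameChar(c);
            if (legal && !looksEscaped) {
                sb.append(c);
            } else {
                // e.g. '1' (0x31) at the start of a name becomes "_x0031_"
                sb.append(String.format(Locale.ROOT, "_x%04X_", (int) c));
            }
        }
        return sb.toString();
    }

    // Encodes every segment of an absolute path, keeping the '/' separators.
    static String encodePath(String path) {
        String[] segments = path.split("/", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(encode(segments[i]));
        }
        return sb.toString();
    }

    private static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    private static boolean isNameChar(char c) {
        return isNameStart(c) || Character.isDigit(c) || c == '.' || c == '-';
    }

    public static void main(String[] args) {
        String query = "/jcr:root" + encodePath("/foo/2009") + "/" + encode("1hr0");
        System.out.println(query); // prints /jcr:root/foo/_x0032_009/_x0031_hr0
    }
}
```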

For values inserted into the queries, you should do escaping to prevent
incorrect values and query injection. Generally, if you enclose values in
single quotes, you just need to replace any literal single quote character
with '' (two consecutive single quote characters). There is also a
Text.escapeIllegalXpathSearchChars(...) method you should use for calls to
jcr:contains(...).

  // q: user-supplied search text; itemID: user-supplied ID
  String query =
    "/jcr:root/foo/element(*, foo)" +
    "[jcr:contains(@title, '" +
    Text.escapeIllegalXpathSearchChars(q).replaceAll("'", "''") + "')]" +
    "[@itemID = '" + itemID.replaceAll("'", "''") + "']";
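The quote-doubling rule can be wrapped in a small helper so it is never forgotten. The name xpathLiteral is hypothetical, and Text.escapeIllegalXpathSearchChars is left out so the sketch runs with only the JDK; for jcr:contains you would still apply it to the search text first.

```java
// Sketch of the quoting rule: wrap a value in single quotes and double any
// single quote inside it. xpathLiteral is a hypothetical helper name.
class XPathLiterals {

    static String xpathLiteral(String value) {
        // String.replace treats both arguments literally (no regex), unlike replaceAll.
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String itemID = "O'Brien-42"; // untrusted input
        String query = "/jcr:root/foo/element(*, foo)[@itemID = " + xpathLiteral(itemID) + "]";
        System.out.println(query);
        // prints /jcr:root/foo/element(*, foo)[@itemID = 'O''Brien-42']
    }
}
```

An input such as x' or '1'='1 then stays inside a single string literal instead of closing it early.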

There are further encoding/decoding methods in the Text class for dealing
with URIs in a webapp. And this is where I get really confused: the JCR
encoding scheme mimics percent-encoding used in URIs but is only said to
be "loosely modeled after URI encoding". What is the recommended approach
in converting between URI paths and their mapping to/from JCR paths?

Apologies if I've missed any existing online guides about this. Hopefully
we can make a nice page based on examples like the ones above.

Later
Charlie


Re: Querying multi-valued properties

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

On Thu, Oct 1, 2009 at 17:01, Mohinder Singh <ms...@swri.org> wrote:
> Hi,
> I am using following query to get the multi-valued property 'keywords' from
> 'partialKeywords' (in order to achieve 'Google suggest' like behavior):
>
> "//element(*, sw:resource)/@sw:keywords[jcr:like(@sw:keywords, '%" +
> partialKeyword + "%')]";
>
> How can I now retrieve multi-valued 'keywords' quickly from the query
> result?

you get the NodeIterator from the query result and then get the
property values using the regular API:
node.getProperty("sw:keywords").getValues()
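A sketch of the client-side step, under one assumption worth checking: jcr:like on a multi-valued property matches a node when any one of its values matches, so getValues() returns all of that node's keywords, not only the matching ones. The hypothetical helper below therefore filters the values again (case-sensitively, like the '%...%' pattern) and removes duplicates. To stay JDK-only it takes plain String arrays, which in a real application you would fill from getValues() via Value.getString().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class KeywordSuggest {

    // Hypothetical helper: returns the distinct keywords containing the
    // partial keyword, in sorted order, across all result nodes.
    static Set<String> suggestions(List<String[]> keywordsPerNode, String partial) {
        Set<String> result = new TreeSet<>();
        for (String[] keywords : keywordsPerNode) {
            for (String k : keywords) {
                if (k.contains(partial)) {
                    result.add(k);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"java", "jackrabbit", "jcr"},
            new String[] {"javascript", "web", "java"});
        System.out.println(suggestions(rows, "ja")); // prints [jackrabbit, java, javascript]
    }
}
```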

please note that jcr:like with a leading wildcard might be slow,
especially if you have lots of distinct values for that property. I
suggest you use jcr:contains if possible.

regards
 marcel

Querying multi-valued properties

Posted by Mohinder Singh <ms...@swri.org>.
Hi,
I am using following query to get the multi-valued property 'keywords' from
'partialKeywords' (in order to achieve 'Google suggest' like behavior):

"//element(*, sw:resource)/@sw:keywords[jcr:like(@sw:keywords, '%" +
partialKeyword + "%')]";

How can I now retrieve multi-valued 'keywords' quickly from the query
result?

Thanks,
Mohinder



Re: Escaping/encoding of paths/names/values

Posted by Alexander Klimetschek <ak...@day.com>.
On Wed, Sep 30, 2009 at 09:22, Charles Brooking
<pu...@charlie.brooking.id.au> wrote:
>> On Fri, Sep 18, 2009 at 08:16, Charles Brooking
>> Good idea, you could put it onto the wiki, maybe on the examples page:
>> http://wiki.apache.org/jackrabbit/ExamplesPage
>
> See <http://wiki.apache.org/jackrabbit/EncodingAndEscaping>.

Great, thanks for the effort!

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Escaping/encoding of paths/names/values

Posted by Charles Brooking <pu...@charlie.brooking.id.au>.
> On Fri, Sep 18, 2009 at 08:16, Charles Brooking
> Good idea, you could put it onto the wiki, maybe on the examples page:
> http://wiki.apache.org/jackrabbit/ExamplesPage

See <http://wiki.apache.org/jackrabbit/EncodingAndEscaping>.

Later
Charlie


Re: Escaping/encoding of paths/names/values

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Sep 18, 2009 at 08:16, Charles Brooking
<pu...@charlie.brooking.id.au> wrote:
> In tackling the issue of escaping/encoding of paths, names, and values in
> the context of a JCR-based web application, I've discovered it's not so
> simple.

Yes ;-) But in practice it shouldn't be a problem, because the rules
are precise. And you have already deduced all of them correctly! (see
below)

> There are utility methods for escaping/encoding in the
> org.apache.jackrabbit.util.ISO9075 and org.apache.jackrabbit.util.Text
> classes. Although developed under Jackrabbit, they are part of the JCR
> Commons module which only depends on the JCR API.
>
> If you're building a path from user-supplied names, you need to escape
> illegal JCR characters (eg item:1 becomes item%3A1):
>
>  String path = "/foo/" + Text.escapeIllegalJcrChars(name);
>
> Such paths are useful for JCR methods like Session.getItem(...) etc.

Correct.

> (Related to this: is there a utility to escape illegal JCR characters in
> paths as opposed to just names?)

No, but in practice you will mostly create just a single node based on
a user-supplied value; if it's a path, you typically split it up
anyway and create nodes step by step, as there are often other things
to do (eg. mixin types, properties, etc.).
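A minimal sketch of that step-by-step approach: split the user-supplied path, escape each segment as a name, and handle one node per segment. The helper pathsToCreate is hypothetical; it only computes the escaped paths, parent first, and takes the escaper as a parameter so the example runs without Jackrabbit. In real code you would pass Text::escapeIllegalJcrChars and, at each step, call addNode and set mixins and properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class StepwisePaths {

    // Hypothetical helper: returns the absolute path of every node to create
    // under parentPath for a user-supplied relative path, parent first.
    static List<String> pathsToCreate(String parentPath, String userPath,
                                      UnaryOperator<String> escapeName) {
        List<String> paths = new ArrayList<>();
        String current = parentPath.endsWith("/")
            ? parentPath.substring(0, parentPath.length() - 1)
            : parentPath;
        for (String segment : userPath.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip leading, trailing or doubled slashes
            }
            current = current + "/" + escapeName.apply(segment);
            paths.add(current);
        }
        return paths;
    }

    public static void main(String[] args) {
        // Demo-only escaper that handles ':' the way the thread describes.
        UnaryOperator<String> escaper = s -> s.replace(":", "%3A");
        System.out.println(pathsToCreate("/foo", "projects/item:1", escaper));
        // prints [/foo/projects, /foo/projects/item%3A1]
    }
}
```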

> If you want to use paths in XPath queries, though, you need to escape
> according to ISO9075 rules (eg 1hr0 becomes _x0031_hr0):
>
>  String query =
>    "/jcr:root" + ISO9075.encodePath(node.getPath()) +
>    "/" + ISO9075.encode(name);

Correct.

> For a user-supplied string, this could lead to something like
> ISO9075.encode(Text.escapeIllegalJcrChars(name)).

Yes, although I haven't seen a need for that combination so far, as
you typically run such a query because you know the Node in question
and call getName() on it, or the paths you search in are already defined
by your application and are simple and ASCII-based (eg. /home/users).

> For values inserted into the queries, you should do escaping to prevent
> incorrect values and query injection. Generally, if you enclose values in
> single quotes, you just need to replace any literal single quote character
> with '' (two consecutive single quote characters). There is also a
> Text.escapeIllegalXpathSearchChars(...) method you should use for calls to
> jcr:contains(...).
>
>  // q: user-supplied search text; itemID: user-supplied ID
>  String query =
>    "/jcr:root/foo/element(*, foo)" +
>    "[jcr:contains(@title, '" +
>    Text.escapeIllegalXpathSearchChars(q).replaceAll("'", "''") + "')]" +
>    "[@itemID = '" + itemID.replaceAll("'", "''") + "']";

Correct.

> There are further encoding/decoding methods in the Text class for dealing
> with URIs in a webapp. And this is where I get really confused: the JCR
> encoding scheme mimics percent-encoding used in URIs but is only said to
> be "loosely modeled after URI encoding". What is the recommended approach
> in converting between URI paths and their mapping to/from JCR paths?

The set of characters allowed in JCR names contains the URI set plus a
few others (eg. spaces). Thus the URI set is actually more constrained.
Therefore, if you have a valid URI, you can map it directly onto a JCR
path without having to worry about escaping (this is by design).

If you go the other way, eg. you have a JCR path and want to create a
URI for it, you simply apply plain URI escaping (which often happens
anyway).

To make everything simpler in the context of URIs, I suggest you
always create only JCR nodes with names that are valid URIs.
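A minimal JDK-only sketch of both directions, using java.net.URI rather than the Text helpers (that choice is mine, not something prescribed in the thread). The multi-argument URI constructor quotes characters that are illegal in a URI path and always quotes '%', and getPath() decodes percent-escapes again. The sketch assumes absolute JCR paths such as those returned by getPath().

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriJcrMapping {

    // JCR path -> URI path: plain URI escaping (e.g. space -> %20, '%' -> %25).
    static String toUriPath(String jcrPath) {
        try {
            return new URI(null, null, jcrPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URI for " + jcrPath, e);
        }
    }

    // URI path -> JCR path: decode the percent-escapes.
    static String toJcrPath(String uriPath) {
        return URI.create(uriPath).getPath();
    }

    public static void main(String[] args) {
        String uri = toUriPath("/content/my item");
        System.out.println(uri);            // prints /content/my%20item
        System.out.println(toJcrPath(uri)); // prints /content/my item
    }
}
```

Because '%' is always quoted, a JCR name that was itself escaped (item%3A1) survives the round trip as item%253A1 in the URI. Decoding a URI that contains a raw %3A, on the other hand, yields ':', which a JCR name reads as a namespace delimiter; this is one reason to stick to URI-safe node names.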

> Apologies if I've missed any existing online guides about this. Hopefully
> we can make a nice page based on examples like the ones above.

Good idea, you could put it onto the wiki, maybe on the examples page:
http://wiki.apache.org/jackrabbit/ExamplesPage

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com