Posted to users@jackrabbit.apache.org by PThiemann <ph...@googlemail.com> on 2009/02/26 10:47:50 UTC

Slashes in wildcard query string do not work

Hello,

I am using Jackrabbit 1.4.2 in an Oracle and RedHat Linux environment.

I have a strange problem when searching for a term that contains a slash and
ends with a wildcard (e.g. F/OS*).

I have a node property XYZ with the value F/OSAM, which I am trying to find
using an XPath query.

When I search for the following, I get the correct result:
//element(*, custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F/OSAM')]/custom:extendedProperties/rep:excerpt(.)

With the next query string I get no result, even though it uses a
wildcard:
//element(*, custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F/OS*')]/custom:extendedProperties/rep:excerpt(.)

Now comes the strange part. When I search for the following (replacing the /
with a space), I get my result again:
//element(*, custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F OS*')]/custom:extendedProperties/rep:excerpt(.)

Is the slash not indexed by Lucene, or do I have to escape the slash so that
Jackrabbit does not treat it as a path delimiter?

I would be glad for any suggestions.

Thanks,
Philipp
-- 
View this message in context: http://www.nabble.com/Slashes-in-wildcard-query-string-do-not-work-tp22220831p22220831.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.


Re: Slashes in wildcard query string do not work

Posted by PThiemann <ph...@googlemail.com>.

Marcel Reutegger wrote:
> 
> Hi,
> 
> On Thu, Feb 26, 2009 at 21:47, Alexander Klimetschek <ak...@day.com>
> wrote:
>> On Thu, Feb 26, 2009 at 8:11 PM, PThiemann
>>> Does anyone know if query highlighting is supported with jcr:like?
>>
>> Yes, I think this is the case.
> 
> no, it's not supported. only query terms in a jcr:contains are
> considered for highlighting.
> 
>> It seems the default excerpt
>> highlighting is not able to handle wildcards.
> 
> yes, it is, but only from jcr:contains. it does not depend on the
> excerpt provider, but on the underlying lucene query being able to
> provide terms via Query.extractTerms(Set). The wildcard query does this
> by expanding the pattern and collecting all matching tokens in the index.
> this process is however limited to 1024 tokens. if your pattern
> matches more than 1024 distinct tokens in the index, you won't get any
> highlighted terms in the excerpt for the wildcard term.
> 
> regards
>  marcel
> 
> 

Thanks a lot. Now I see things much more clearly. :-)


Re: Slashes in wildcard query string do not work

Posted by Alexander Klimetschek <ak...@day.com>.
On Thu, Feb 26, 2009 at 10:26 PM, Marcel Reutegger
<ma...@gmx.net> wrote:
>>> Does anyone know if query highlighting is supported with jcr:like?
>>
>> Yes, I think this is the case.
>
> no, it's not supported. only query terms in a jcr:contains are
> considered for highlighting.

Ah, thanks, what I wanted to say was: "Yes, it is the case that query
highlighting is *not* supported with jcr:like." I read the question
above the wrong way ;-)

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Slashes in wildcard query string do not work

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

On Thu, Feb 26, 2009 at 21:47, Alexander Klimetschek <ak...@day.com> wrote:
> On Thu, Feb 26, 2009 at 8:11 PM, PThiemann
>> Does anyone know if query highlighting is supported with jcr:like?
>
> Yes, I think this is the case.

no, it's not supported. only query terms in a jcr:contains are
considered for highlighting.

> It seems the default excerpt
> highlighting is not able to handle wildcards.

yes, it is, but only from jcr:contains. it does not depend on the
excerpt provider, but on the underlying lucene query being able to
provide terms via Query.extractTerms(Set). The wildcard query does this
by expanding the pattern and collecting all matching tokens in the index.
this process is however limited to 1024 tokens. if your pattern
matches more than 1024 distinct tokens in the index, you won't get any
highlighted terms in the excerpt for the wildcard term.

regards
 marcel
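
The expansion limit described above can be pictured with a small,
self-contained Java sketch. It does not use the Lucene API: the class
WildcardExpansionDemo, the method termsForHighlighting and the in-memory
"index" are hypothetical names made up for illustration, and only the 1024
limit is taken from the message. It assumes index tokens are already
lower-cased and that a pattern with too many matches simply yields no
terms for highlighting.

```java
import java.util.Collections;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class WildcardExpansionDemo {

    // Limit quoted in the message above; everything else here is illustrative.
    static final int MAX_EXPANDED_TERMS = 1024;

    // Expands a prefix pattern against a sorted set of index tokens.
    // Returns the matching tokens, or an empty set when more than
    // MAX_EXPANDED_TERMS distinct tokens match (no highlighting then).
    static Set<String> termsForHighlighting(SortedSet<String> indexTokens, String prefix) {
        Set<String> expanded = new TreeSet<>();
        // In a sorted set, all tokens sharing a prefix are contiguous,
        // starting at the first token that is >= the prefix.
        for (String token : indexTokens.tailSet(prefix)) {
            if (!token.startsWith(prefix)) {
                break;
            }
            expanded.add(token);
            if (expanded.size() > MAX_EXPANDED_TERMS) {
                return Collections.emptySet();
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        SortedSet<String> index = new TreeSet<>();
        index.add("f");
        index.add("osam");
        // A narrow pattern expands to a handful of terms.
        System.out.println(termsForHighlighting(index, "os"));
    }
}
```

In this sketch a narrow pattern returns the tokens that could be
highlighted, while a pattern that matches more than 1024 distinct tokens
returns nothing, which mirrors the behaviour described for the excerpt.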

Re: Slashes in wildcard query string do not work

Posted by Alexander Klimetschek <ak...@day.com>.
On Thu, Feb 26, 2009 at 8:11 PM, PThiemann
<ph...@googlemail.com> wrote:
> But now I am facing another problem. The excerpt is not highlighted anymore.
> I am just getting an excerpt string of all property values of node
> 'custom:extendedProperties' without any highlighting.
> As our customers are currently used to query result highlighting this would
> be a real disadvantage.
>
> Does anyone know if query highlighting is supported with jcr:like?

Yes, I think this is the case. It seems the default excerpt
highlighting is not able to handle wildcards. Maybe one can implement
that in a custom ExcerptProvider (see [1]), but since it uses the
Lucene index (org/apache/jackrabbit/core/query/lucene/DefaultHighlighter.java),
I think this could be quite some effort.

BTW, the query could be simplified a bit by extracting the
custom:extendedProperties axis:

//element(*,custom:file)/custom:extendedProperties[jcr:like(@XYZ,'F/OS%')]/rep:excerpt(.)

[1] http://wiki.apache.org/jackrabbit/ExcerptProvider

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Slashes in wildcard query string do not work

Posted by PThiemann <ph...@googlemail.com>.


Alexander Klimetschek wrote:
> 
> On Thu, Feb 26, 2009 at 10:47 AM, PThiemann
> <ph...@googlemail.com> wrote:
>> //element(*,
>> custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F/OS*')]/custom:extendedProperties/rep:excerpt(.)
>>
>> Now there is the strange thing. When I search (leaving out the /) for the
>> following I can see my result again.
>> //element(*, custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F
>> OS*')]/custom:extendedProperties/rep:excerpt(.)
>>
>> Is the slash not indexed by lucene or do I have to escape the slash for
>> Jackrabbit for not being recognized as path delimiter?
> 
> Instead of the "fuzzy" jcr:contains() function, you could use jcr:like(),
> which is more accurate if you want to match simple properties. It
> uses "%" as the wildcard (just like SQL LIKE):
> 
> //element(*,custom:file)[jcr:like(custom:extendedProperties/@XYZ,'F/OS%')]/custom:extendedProperties/rep:excerpt(.)
> 
> Regards,
> Alex
> 
> -- 
> Alexander Klimetschek
> alexander.klimetschek@day.com
> 
> 

Thank you, that solved my problem getting search hits without any
workaround.

But now I am facing another problem. The excerpt is not highlighted anymore.
I am just getting an excerpt string of all property values of node
'custom:extendedProperties' without any highlighting.
As our customers are currently used to query result highlighting this would
be a real disadvantage.

Does anyone know if query highlighting is supported with jcr:like?


Re: Slashes in wildcard query string do not work

Posted by Alexander Klimetschek <ak...@day.com>.
On Thu, Feb 26, 2009 at 10:47 AM, PThiemann
<ph...@googlemail.com> wrote:
> //element(*,
> custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F/OS*')]/custom:extendedProperties/rep:excerpt(.)
>
> Now there is the strange thing. When I search (leaving out the /) for the
> following I can see my result again.
> //element(*, custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F
> OS*')]/custom:extendedProperties/rep:excerpt(.)
>
> Is the slash not indexed by lucene or do I have to escape the slash for
> Jackrabbit for not being recognized as path delimiter?

Instead of the "fuzzy" jcr:contains() function, you could use jcr:like(),
which is more accurate if you want to match simple properties. It
uses "%" as the wildcard (just like SQL LIKE):

//element(*,custom:file)[jcr:like(custom:extendedProperties/@XYZ,'F/OS%')]/custom:extendedProperties/rep:excerpt(.)

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com
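
As a rough illustration of the jcr:like() suggestion above, the following
Java sketch builds the XPath statement for a prefix match. The class
JcrLikeQueryBuilder and the method prefixQuery are hypothetical helpers,
not part of Jackrabbit. The sketch only assembles the string; it assumes
the statement would then be handed to the standard JCR call
QueryManager.createQuery(statement, Query.XPATH). Single quotes in the
search text are doubled, and any "%" or "_" in the search text is left
untouched, so it would still act as a wildcard.

```java
class JcrLikeQueryBuilder {

    // Builds an XPath statement of the form shown in the message above:
    // //element(*,TYPE)[jcr:like(CHILD/@PROP,'PREFIX%')]/CHILD/rep:excerpt(.)
    static String prefixQuery(String nodeType, String childPath,
                              String property, String prefix) {
        // A single quote inside an XPath string literal is written as ''.
        String escaped = prefix.replace("'", "''");
        return "//element(*," + nodeType + ")"
                + "[jcr:like(" + childPath + "/@" + property + ",'" + escaped + "%')]"
                + "/" + childPath + "/rep:excerpt(.)";
    }

    public static void main(String[] args) {
        // Reproduces the query from the message for the value prefix F/OS.
        System.out.println(prefixQuery("custom:file", "custom:extendedProperties", "XYZ", "F/OS"));
    }
}
```

Because jcr:like() compares the stored property value as a whole, the
slash needs no special treatment here.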

Re: Slashes in wildcard query string do not work

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

wildcards in jcr:contains are a bit tricky ;)

On Thu, Feb 26, 2009 at 10:47, PThiemann
<ph...@googlemail.com> wrote:
> When searching for the following I do get a correct result:
> //element(*,
> custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F/OSAM')]/custom:extendedProperties/rep:excerpt(.)

like content that gets indexed, 'F/OSAM' is also analyzed before the
query handler evaluates it. the result of the analysis depends on the
configured analyzer. the default implementation creates two tokens:
'f' and 'osam'.

> When searching for the next query string I do not get a result. Although
> using wildcards in my query:
> //element(*,
> custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F/OS*')]/custom:extendedProperties/rep:excerpt(.)

here the wildcard prevents the use of the analyzer because it is
impossible to run an analyzer on just a prefix of many possible
strings. the resulting query will search for tokens that start with
'f/os'. obviously neither 'f' nor 'osam' matches here.

> Now there is the strange thing. When I search (leaving out the /) for the
> following I can see my result again.
> //element(*, custom:file)[jcr:contains(custom:extendedProperties/@XYZ,'F
> OS*')]/custom:extendedProperties/rep:excerpt(.)

this in turn creates two tokens again for searching: 'f' and 'os*',
which both match the tokens that were indexed.

> Is the slash not indexed by lucene or do I have to escape the slash for
> Jackrabbit for not being recognized as path delimiter?

this is basically a limitation when you use a wildcard in the
jcr:contains clause.

as a rule of thumb you should avoid jcr:contains when your search
includes any special character.

regards
 marcel
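
The three cases explained above can be reproduced with a small Java
simulation. This is only a sketch under simplifying assumptions, not
Jackrabbit or Lucene code: the class SlashWildcardDemo and its methods are
hypothetical, the "analyzer" just lower-cases and splits on anything that is
not a letter or digit, and all query terms must match. Terms ending in '*'
skip the analyzer and are used as lower-cased prefixes, as described in the
message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SlashWildcardDemo {

    // Simplified stand-in for the configured analyzer: lower-case the text
    // and split it on every character that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Returns true if every whitespace-separated query term matches a token
    // of the indexed value. Wildcard terms bypass the analyzer.
    static boolean matches(String indexedValue, String queryString) {
        List<String> indexed = analyze(indexedValue);
        for (String term : queryString.trim().split("\\s+")) {
            if (term.endsWith("*")) {
                // Not analyzed: the raw prefix (including any '/') is compared.
                String prefix = term.substring(0, term.length() - 1).toLowerCase(Locale.ROOT);
                boolean found = false;
                for (String token : indexed) {
                    if (token.startsWith(prefix)) {
                        found = true;
                    }
                }
                if (!found) {
                    return false;
                }
            } else {
                // Analyzed like indexed content, so 'F/OSAM' becomes 'f' and 'osam'.
                for (String token : analyze(term)) {
                    if (!indexed.contains(token)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("F/OSAM", "F/OSAM")); // analyzed on both sides
        System.out.println(matches("F/OSAM", "F/OS*"));  // prefix 'f/os' matches no token
        System.out.println(matches("F/OSAM", "F OS*"));  // 'f' and prefix 'os' both match
    }
}
```

In this simulation the indexed value F/OSAM is found by 'F/OSAM' and by
'F OS*', but not by 'F/OS*', because no indexed token starts with 'f/os'.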