Posted to users@jackrabbit.apache.org by "H. Wilson" <wi...@randdss.com> on 2010/06/04 15:21:59 UTC
jcr:contains with wildcards and underscores
Hello,
I am using Jackrabbit 2.0 with OCM. After searching the forums both here
and on Lucene, as well as Google, I have yet to find an answer. (As an
aside, if this question should have gone to the Lucene users list,
please let me know!)
For starters, you should know our clients would like both case-sensitive
and case-insensitive options available to them. The searches are to be
on a property named fullName, which may contain underscores and always
starts with a leading dot. (Also a client requirement.) And while yes,
we are aware that leading wildcard searches perform poorly, the client
still plans to use them. Here is my issue:
* My searches using jcr:like work fine for all the scenarios I list
below.
* My searches with jcr:contains and exact names work fine (even with
underscores!).
* My jcr:contains searches using wildcards and underscores always
fail. I have even tried escaping them.
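Since the jcr:like searches work, one possible stopgap is to translate the
client's wildcard expression into a jcr:like pattern. The sketch below
assumes the JCR 1.0 XPath conventions for jcr:like ('%' for any run of
characters, '_' for exactly one character, backslash as the escape
character). The class and method names are hypothetical, and only the
query string is built here; executing it against a repository is left out.

```java
class LikePatternSketch {

    // Hypothetical helper: converts a '*' / '?' wildcard expression into a
    // jcr:like pattern. Literal '%', '_' and '\' are escaped with a backslash
    // so that an underscore in a name is not treated as a wildcard.
    static String toLikePattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*':
                    sb.append('%');
                    break;
                case '?':
                    sb.append('_');
                    break;
                case '%':
                case '_':
                case '\\':
                    sb.append('\\').append(c);
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds an XPath statement for the given property. Single quotes are
    // doubled so the pattern stays a valid XPath string literal.
    static String toXPath(String property, String wildcard) {
        String pattern = toLikePattern(wildcard).replace("'", "''");
        return "//*[jcr:like(@" + property + ", '" + pattern + "')]";
    }

    public static void main(String[] args) {
        System.out.println(toXPath("fullName", "*.West_Land"));
        System.out.println(toXPath("fullName", "*East?WestLand"));
    }
}
```

Note that jcr:like compares against the stored property value rather than
the full-text index, so it is unaffected by the analyzer but may be slower
on large repositories.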
Given there are two objects in our repository with the following
fullName properties:
.North.South.East.WestLand
.North.South.East.West_Land
Both of the following work fine, and each returns its respective object:
(jcr:contains(@fullName, '.North.South.East.WestLand'))
(jcr:contains(@fullName, '.North.South.East.West_Land'))
The following jcr:contains queries return BOTH objects successfully:
*North*
.North*
.North.*
The following queries successfully return the FIRST object:
*.South.East.WestLand
.*.South.East.WestLand
*South*.WestLand
*East.WestLand
*.WestLand
*East?WestLand
*?WestLand
*North.South.East.WestLand
And the following jcr:contains queries, identical except for the
underscore, return nothing, when I would expect the SECOND object:
*.South.East.West_Land
.*.South.East.West_Land
*South*.West_Land
*East.West_Land
*.West_Land
*East?West_Land
*?West_Land
*North.South.East.West_Land
UPDATE: After writing this long message, I remembered something. (I have
been tackling this off and on for weeks, so please bear with the slight
memory loss; maybe having seen all this will help others.) I remember
reading somewhere that Lucene treats underscores as token dividers. So
when a property value contains an underscore, it is split into tokens
and the underscore is essentially dropped. A wildcard term has to match
a single token, and no token would contain the underscore, which could
explain why the wildcard searches fail while the exact-name search still
works. (Is this correct?) The above examples were using the
StandardAnalyzer. I have previously tried the WhitespaceAnalyzer, but
doing so disabled leading wildcard searches, which our clients
absolutely require. I know there is a way to turn on leading wildcard
searches, but I could not work out how to do it with Jackrabbit. Any
advice on an Analyzer that would satisfy our clients would be GREATLY
appreciated.
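
The tokenization hypothesis above can be modeled in a few lines of plain
Java. This is only a simplified sketch, not Lucene's or Jackrabbit's actual
code: the tokenize method is a hypothetical stand-in that lowercases the
value and splits on underscores and whitespace, and all class and method
names are made up for illustration. It shows why a wildcard containing an
underscore would find nothing if wildcards are matched against one token at
a time, while an exact name that is tokenized the same way still matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TokenWildcardSketch {

    // Hypothetical stand-in for an analyzer: lowercases the value and splits
    // it on underscores and whitespace. Real analyzers apply more rules.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase().split("[_\\s]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Translates a wildcard ('*' = any run of characters, '?' = exactly one
    // character) into a regular expression; other characters are literal.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toLowerCase().toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Assumption: the wildcard is not tokenized and is compared against each
    // indexed token on its own, never against the whole original value.
    static boolean matchesAnyToken(String value, String wildcard) {
        Pattern pattern = wildcardToRegex(wildcard);
        for (String token : tokenize(value)) {
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    // Assumption: an exact (non-wildcard) query is tokenized like the value,
    // so the underscore disappears on both sides and the token lists agree.
    static boolean matchesExact(String value, String query) {
        return tokenize(value).equals(tokenize(query));
    }

    public static void main(String[] args) {
        String plain = ".North.South.East.WestLand";
        String underscored = ".North.South.East.West_Land";
        System.out.println(tokenize(underscored));
        System.out.println(matchesAnyToken(plain, "*.WestLand"));
        System.out.println(matchesAnyToken(underscored, "*.West_Land"));
        System.out.println(matchesExact(underscored, underscored));
    }
}
```

Under this model, an analyzer that keeps the underscore inside the token
would let the wildcard queries match, which is why the choice of Analyzer
matters here.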
Thanks for your time and patience,
H. Wilson
Re: jcr:contains with wildcards and underscores
Posted by "H. Wilson" <wi...@randdss.com>.
For anyone who stumbles onto this post with the same problem, head
on over here ( http://markmail.org/thread/t5hmrob3jdmz7nqm ) for more
discussion and the solution that ended up working for us.
H. Wilson